2. Intel 64 instructions demo

1. copy a string

code snippet :

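char * _strcpy(char* dst,const char* src)
{
        char *d = dst;          /* working copies: the asm advances both pointers */
        const char *s = src;
        asm volatile("cld\n\t"                  /* clear DF so lodsb/stosb increment */
                "1:lodsb \n\t"                  /* al = *s++ */
                "stosb \n\t"                    /* *d++ = al */
                "testb %%al,%%al\n\t"           /* was that the terminating NUL? */
                "jne 1b\n\t"                    /* no: jump back to label 1 */
        :"+D"(d),"+S"(s)                        /* rdi/rsi are read and modified */
        :
        :"eax","cc","memory"                    /* al, the flags and *dst are written */
        );
        return dst;
}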

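lodsX loads a string element [byte/word/doubleword/quadword] from the address in RSI into [al/ax/eax/rax], and stosX stores that register to the address in RDI; each instruction then advances its pointer by the element size (forward when the direction flag is clear, which is why we start with cld). test (like the arithmetic operations) sets the flags, but unlike them it writes no result to its operands, so it only affects the EFLAGS register. Then we branch according to the Z flag. Note the label: when jumping to a numeric local label we add a suffix, 'b' (backward, the nearest earlier definition) or 'f' (forward, the nearest later definition).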

2. read cpu timestamp register

code snippet:

Note that RDTSC reads the CPU timestamp counter into EDX:EAX (high 32 bits in EDX, low 32 bits in EAX). Both ways are fine: the first one composes the timestamp quadword in a register and stores it to memory, while the second one loads the effective address first and then stores EAX and EDX to memory separately.

The DPDK implementation is simpler: it binds EAX and EDX directly to the two halves of the result in the asm output operand list and leaves the memory stores to the compiler. Let the code show you:

The only premise is that you define a union first, nothing special.

3. compare and set implementation

here is my code:

let's take a look at how DPDK library implemented it:

Note that the DPDK version uses an "m" constraint for the destination, while I use a register to construct the effective address; both are OK. I use setz to test the result while DPDK uses sete; these are two mnemonics for the same instruction (set the byte if ZF = 1), so both are still OK.

4. scan least set bit

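The GCC builtin function __builtin_ffsll() scans for the least significant set bit: if the value is zero it returns 0, otherwise it returns the bit index (starting from 0) plus 1. Now I will implement it on my own:

This time I again omitted the q size suffix (writing bsf rather than bsfq), but the result is right, because the assembler infers the operand size from the 64-bit register operand.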

5. generate cmp mask with MMX instruction

here is the code:
