3. Intel 64 Base Architecture Notes
This article covers the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture.
We skip instructions that exist only for IA-32 and focus on Intel 64, although we point out the differences between 32-bit and 64-bit mode where they matter. We also cover only integer operations; floating-point operations are not discussed.
Chapter 3, Basic Execution Environment
Address space: in 64-bit mode the linear address space ranges from 0 to 2^64 - 1, and the physical address space from 0 to 2^46 - 1 (the implemented physical address width is processor specific). Note that since the P6 family, IA-32 processors can address physical memory from 0 to 2^36 - 1 through PAE paging.
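Because the physical address width differs between processors, software normally queries it with cpuid. A minimal sketch, assuming GCC or Clang on x86-64 (for the __get_cpuid helper in the cpuid.h header); address_widths is a hypothetical name, and the EAX bit fields of leaf 0x80000008 come from Intel's CPUID documentation rather than from this chapter.

```c
#include <cpuid.h>

/* Hypothetical helper: reads the implemented address widths from CPUID leaf
   0x80000008 (EAX[7:0] = physical address bits, EAX[15:8] = linear address bits).
   Returns 1 on success, 0 if the leaf is not supported. */
int address_widths(unsigned *phys_bits, unsigned *linear_bits) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
        return 0;
    *phys_bits = eax & 0xFFu;
    *linear_bits = (eax >> 8) & 0xFFu;
    return 1;
}
```

On current processors the linear width is typically 48 bits (57 with 5-level paging), while the physical width varies by model.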
Basic execution registers: there are 16 general-purpose registers: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP and R8-R15 (remember that only 8 are available in 32-bit mode). There are also the RIP instruction pointer and the RFLAGS (64-bit EFLAGS) status and control register. A REX prefix is used to specify a 64-bit operand size or to access R8-R15. Common prefixes in 64-bit mode include the REX prefix, the operand-size prefix 66h and the address-size prefix 67h.
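To make the REX discussion concrete, here is a small sketch using GCC/Clang extended inline assembly (AT&T syntax) on x86-64. add_via_r8 is a hypothetical helper; it routes a 64-bit addition through R8, which the assembler can only encode with a REX prefix (the q suffix also sets REX.W for 64-bit operand size).

```c
#include <stdint.h>

/* Hypothetical helper: adds two 64-bit values in R8, a register that exists
   only in 64-bit mode. R8 is listed as clobbered so the compiler keeps the
   inputs in other registers. */
uint64_t add_via_r8(uint64_t a, uint64_t b) {
    uint64_t result;
    __asm__ ("movq %1, %%r8\n\t"
             "addq %2, %%r8\n\t"
             "movq %%r8, %0"
             : "=r"(result)
             : "r"(a), "r"(b)
             : "r8", "cc");
    return result;
}
```

For example, add_via_r8(0xFFFFFFFF, 1) yields 0x100000000: the carry out of bit 31 is kept because the operation uses the 64-bit operand size.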
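Segment registers: in 64-bit mode segmentation is largely disabled, creating a flat address space. The processor treats the segment bases of CS, DS, ES and SS as 0; FS and GS are the exception and can still have nonzero bases.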
SIMD registers: 8 MM registers in MMX, 16 XMM registers introduced with SSE and 16 YMM registers introduced with AVX (in 64-bit mode). AVX-512 doubles the number of vector registers to 32 (ZMM0-ZMM31).
I/O address space: it is separate from memory and consists of 2^16 (64K) 8-bit ports; even in 64-bit mode, a single I/O transfer is at most 32 bits wide.
Next, how is an address offset calculated? offset = base + index * scale + displacement
In AT&T assembly syntax the same offset is written as disp(base, index, scale), where the displacement is an 8-bit or 32-bit signed value (16-bit displacements exist only with 16-bit addressing), base and index are 64-bit general-purpose registers, and the scale factor is 1, 2, 4 or 8. The exception is that RSP cannot be used as an index register.
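The offset formula can be checked with lea, which evaluates a memory operand without touching memory. This sketch assumes GCC/Clang inline assembly on x86-64 and hard-codes a scale of 4 and a displacement of 8; both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: computes base + index*4 + 8 using the AT&T memory
   operand syntax disp(base,index,scale). lea only does the arithmetic. */
uint64_t effective_address(uint64_t base, uint64_t index) {
    uint64_t ea;
    __asm__ ("leaq 8(%1,%2,4), %0" : "=r"(ea) : "r"(base), "r"(index));
    return ea;
}

/* The same arithmetic in plain C, for comparison. */
uint64_t effective_address_c(uint64_t base, uint64_t index) {
    return base + index * 4 + 8;
}
```

effective_address(1000, 3) returns 1000 + 3 * 4 + 8 = 1020, the same value as the plain C version.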
Chapter 4, Data Types
The fundamental data types are byte (8 bits), word (16 bits), doubleword (32 bits), quadword (64 bits) and double quadword (128 bits, introduced with the SSE extensions). The numeric data types include signed and unsigned integers, as well as single-precision (32-bit) and double-precision (64-bit) floating-point values.
The packed SIMD data types include packed bytes, packed words and packed doublewords (in 64-bit MMX and 128-bit SSE forms) and packed quadwords (128-bit).
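The relationship between the packed types can be pictured with a C union over one 128-bit double quadword. The type name packed128 is hypothetical, and the comments on element placement rely on x86 being little-endian.

```c
#include <stdint.h>

/* Hypothetical type: one 128-bit double quadword viewed as each packed
   integer type. On little-endian x86, element 0 is at the lowest address. */
typedef union {
    uint8_t  bytes[16];  /* packed bytes:       16 x 8-bit  */
    uint16_t words[8];   /* packed words:        8 x 16-bit */
    uint32_t dwords[4];  /* packed doublewords:  4 x 32-bit */
    uint64_t qwords[2];  /* packed quadwords:    2 x 64-bit */
} packed128;
```

Storing 0x0807060504030201 into qwords[0] makes bytes[0] equal 0x01 and words[0] equal 0x0201, because the least significant byte comes first in memory.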
We skip Chapter 5, which is an instruction set summary, and Chapter 6, which covers procedure calls, interrupts and exceptions.
Chapter 7, Programming with General-Purpose Instructions
In 64-bit mode, instructions with a REX prefix can operate on 64-bit registers; note that EFLAGS is extended to the 64-bit RFLAGS register, whose upper 32 bits are reserved. We will introduce these instructions step by step.
Data transfer instructions:
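General data movement instructions:
mov and cmovcc (conditional move). There is one limitation: they cannot move data directly between two memory operands, so one operand must be a register (or, for mov, the source may be an immediate).
Exchange instructions: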
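xchg swaps two registers, or a register and a memory location, without the help of a third register. When one operand is in memory, the processor applies the LOCK protocol automatically, which makes the exchange atomic.
cmpxchg compares the accumulator (AL/AX/EAX/RAX) with the destination operand: if they are equal, the source operand is stored into the destination; otherwise the destination is loaded into the accumulator. Combined with a LOCK prefix, it is the classic building block for locks and lock-free data structures.
Stack manipulation instructions: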
push/pop, and pusha/popa (32-bit mode only; they are invalid in 64-bit mode).
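The exchange instructions above are easiest to appreciate as atomic primitives. A sketch in GCC/Clang inline assembly for x86-64; atomic_swap and compare_and_swap are hypothetical names, and production C code would normally use the stdatomic.h operations, which typically compile to these instructions on x86-64.

```c
#include <stdint.h>

/* Hypothetical helper: atomically stores new_value into *p and returns the old
   value. xchg with a memory operand is implicitly locked. */
uint64_t atomic_swap(volatile uint64_t *p, uint64_t new_value) {
    __asm__ volatile ("xchgq %0, %1"
                      : "+r"(new_value), "+m"(*p)
                      :
                      : "memory");
    return new_value;  /* now holds the previous contents of *p */
}

/* Hypothetical helper: if *p == expected, store desired and return 1;
   otherwise return 0. cmpxchg compares RAX (expected) with *p and sets ZF
   on success; sete turns ZF into a 0/1 byte. */
int compare_and_swap(volatile uint64_t *p, uint64_t expected, uint64_t desired) {
    unsigned char success;
    __asm__ volatile ("lock cmpxchgq %3, %1\n\t"
                      "sete %0"
                      : "=q"(success), "+m"(*p), "+a"(expected)
                      : "r"(desired)
                      : "memory", "cc");
    return success;
}
```

When compare_and_swap fails, the current contents of *p have been loaded into RAX, which is why a retry loop built on cmpxchg does not need a separate reload.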
Binary arithmetic instructions:
add/sub, inc/dec and cmp. cmp subtracts the source operand from the destination operand but, unlike sub, does not store the result; it only updates the status flags in EFLAGS, which a following jcc or setcc can then test.
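Since cmp only sets flags, it is usually followed by an instruction that reads them. This sketch pairs cmp with cmovcc (branch-free) under GCC/Clang inline assembly on x86-64; the helper names are hypothetical. Note how the signed condition l and the unsigned condition b give different answers for the same bit patterns.

```c
#include <stdint.h>

/* Hypothetical helper: signed maximum. cmpq computes result - b and discards
   it; cmovl copies b when result < b as signed values (SF != OF). */
int64_t max_signed(int64_t a, int64_t b) {
    int64_t result = a;
    __asm__ ("cmpq %1, %0\n\t"
             "cmovlq %1, %0"
             : "+r"(result)
             : "r"(b)
             : "cc");
    return result;
}

/* Hypothetical helper: unsigned maximum. cmovb copies b when result < b
   as unsigned values (CF = 1). */
uint64_t max_unsigned(uint64_t a, uint64_t b) {
    uint64_t result = a;
    __asm__ ("cmpq %1, %0\n\t"
             "cmovbq %1, %0"
             : "+r"(result)
             : "r"(b)
             : "cc");
    return result;
}
```

max_signed(-1, 1) is 1, but max_unsigned((uint64_t)-1, 1) is UINT64_MAX, because the unsigned comparison treats the bit pattern of -1 as 2^64 - 1.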
Logical instructions:
and/or/xor/not. All of them can take a REX prefix to operate on 64-bit operands.
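A one-line sketch for each logical instruction, again assuming GCC/Clang inline assembly on x86-64 with hypothetical helper names.

```c
#include <stdint.h>

/* and: keep only the bits selected by the mask (here the low byte). */
uint64_t keep_low_byte(uint64_t x) {
    __asm__ ("andq $0xFF, %0" : "+r"(x) : : "cc");
    return x;
}

/* or: set the bits given in mask. */
uint64_t set_bits(uint64_t x, uint64_t mask) {
    __asm__ ("orq %1, %0" : "+r"(x) : "r"(mask) : "cc");
    return x;
}

/* xor: flip the bits given in mask. */
uint64_t flip_bits(uint64_t x, uint64_t mask) {
    __asm__ ("xorq %1, %0" : "+r"(x) : "r"(mask) : "cc");
    return x;
}

/* not: complement every bit; unlike the others it does not modify any flags. */
uint64_t complement(uint64_t x) {
    __asm__ ("notq %0" : "+r"(x));
    return x;
}
```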
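Shift and rotate instructions:
sal/sar/shl/shr perform logical or arithmetic shifts. sal is the same operation as shl (vacated low-order bits are filled with 0); sar fills the vacated high-order bits with copies of the sign bit, preserving the sign, while shr fills them with 0. The last bit shifted out is stored in the CF flag.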
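The difference between logical and arithmetic right shifts, and the role of CF, shows up clearly on negative values. A sketch assuming GCC/Clang inline assembly on x86-64; for the variable-count forms the count must be in CL, which the c constraint arranges. Helper names are hypothetical.

```c
#include <stdint.h>

/* shr: logical right shift, vacated high bits become 0. */
uint64_t shr_by(uint64_t x, uint8_t n) {
    __asm__ ("shrq %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
}

/* sar: arithmetic right shift, vacated high bits copy the sign bit. */
int64_t sar_by(int64_t x, uint8_t n) {
    __asm__ ("sarq %%cl, %0" : "+r"(x) : "c"(n) : "cc");
    return x;
}

/* Shifts x right by one and returns CF, i.e. the bit that was shifted out. */
int shr1_carry(uint64_t x) {
    unsigned char cf;
    __asm__ ("shrq $1, %1\n\t"
             "setc %0"
             : "=q"(cf), "+r"(x)
             :
             : "cc");
    return cf;
}
```

sar_by(-16, 2) gives -4, while shr_by on the same bit pattern would give a large positive number, because the sign bit is not propagated.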
Bit and byte instructions:
Bit test and modify instructions: bt/bts/btr/btc copy the selected bit into CF; bt leaves the bit unchanged, while bts sets it, btr resets it and btc complements it.
Bit scan:
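bsf/bsr store the index of the least significant (bsf) or most significant (bsr) set bit of the source operand in the destination register. If the source is 0, ZF is set and the destination is undefined.
Byte set: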
setcc sets a byte operand to 1 if the condition cc holds for the flags in EFLAGS, and to 0 otherwise.
Test instruction:
test is very similar to and, but it only updates the flags in EFLAGS and discards the result.
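The bit and byte instructions above can be exercised together in a short sketch (GCC/Clang inline assembly, x86-64; all helper names are hypothetical). Each helper uses setcc, or a zero check in C, to turn the flag-based result into a C value.

```c
#include <stdint.h>

/* bts: copies bit n of *word into CF, then sets it. Returns the old bit. */
int bit_test_and_set(uint64_t *word, uint64_t n) {
    unsigned char old;
    __asm__ ("btsq %2, %1\n\t"
             "setc %0"
             : "=q"(old), "+r"(*word)
             : "r"(n)
             : "cc");
    return old;
}

/* bsf/bsr: index of the lowest/highest set bit. The destination is undefined
   for a zero source, so zero is handled in C and reported as -1. */
int lowest_set_bit(uint64_t x) {
    uint64_t idx;
    if (x == 0)
        return -1;
    __asm__ ("bsfq %1, %0" : "=r"(idx) : "rm"(x) : "cc");
    return (int)idx;
}

int highest_set_bit(uint64_t x) {
    uint64_t idx;
    if (x == 0)
        return -1;
    __asm__ ("bsrq %1, %0" : "=r"(idx) : "rm"(x) : "cc");
    return (int)idx;
}

/* test + setcc: test ANDs x with 1 only to set ZF; setnz writes 1 if ZF = 0. */
int is_odd(uint64_t x) {
    unsigned char r;
    __asm__ ("testq $1, %1\n\t"
             "setnz %0"
             : "=q"(r)
             : "r"(x)
             : "cc");
    return r;
}
```

For 0x50 (binary 1010000), lowest_set_bit returns 4 and highest_set_bit returns 6.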
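String operations:
movs/cmps/scas/lods/stos use RSI as the source pointer and RDI as the destination pointer, and the direction flag DF controls whether these pointers are incremented (DF = 0) or decremented (DF = 1) after each element. A common usage is to combine a string instruction with a REP prefix, which repeats it while RCX is not zero and decrements RCX after each iteration (repe/repne additionally stop on the ZF condition when used with cmps or scas). A concrete demo will be presented in article 2.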
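A sketch of two REP-prefixed string operations, assuming GCC/Clang inline assembly on x86-64 and the System V ABI, which guarantees DF = 0 (forward direction) on function entry. fill_bytes and find_byte are hypothetical names; the D, c and a constraints pin the pointer, count and byte value to RDI, RCX and AL.

```c
#include <stddef.h>

/* rep stosb: stores AL into [RDI], advances RDI and decrements RCX until RCX = 0. */
void fill_bytes(void *dst, unsigned char value, size_t n) {
    __asm__ volatile ("rep stosb"
                      : "+D"(dst), "+c"(n)
                      : "a"(value)
                      : "memory");
}

/* repne scasb: compares AL with [RDI], advances RDI and decrements RCX, and
   stops on a match (ZF = 1) or when RCX reaches 0.
   Returns the index of the first byte equal to value, or n if there is none. */
size_t find_byte(const void *buf, unsigned char value, size_t n) {
    const unsigned char *p = buf;
    size_t count = n;
    if (n == 0)
        return 0;
    __asm__ volatile ("repne scasb"
                      : "+D"(p), "+c"(count)
                      : "a"(value)
                      : "memory", "cc");
    /* RDI now points one byte past the last byte examined. */
    return (p[-1] == value) ? (size_t)(p - (const unsigned char *)buf) - 1 : n;
}
```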
I/O instructions:
inb/outb and their variants (inw/inl, outw/outl) read from or write to an I/O port. Even in 64-bit mode, the maximum data width is a doubleword.
EFLAGS manipulation instructions:
These set or clear individual flags in EFLAGS. For example, cld clears the direction flag in EFLAGS, and std sets it.
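std and cld are mostly used around string instructions. As a sketch (GCC/Clang inline assembly on x86-64, hypothetical name move_backward), an overlapping copy toward higher addresses can run rep movsb from the last byte downward with DF set, and then clear DF again because the System V ABI expects it to be 0.

```c
#include <stddef.h>

/* Hypothetical helper: copies n bytes from src to dst, correct when dst > src
   and the regions overlap. With DF = 1, movsb decrements RSI and RDI. */
void move_backward(void *dst, const void *src, size_t n) {
    if (n == 0)
        return;
    unsigned char *d = (unsigned char *)dst + n - 1;        /* last destination byte */
    const unsigned char *s = (const unsigned char *)src + n - 1;  /* last source byte */
    __asm__ volatile ("std\n\t"
                      "rep movsb\n\t"
                      "cld"
                      : "+D"(d), "+S"(s), "+c"(n)
                      :
                      : "memory");
}
```

Copying the first four bytes of "abcdef" two positions to the right with move_backward(b + 2, b, 4) produces "ababcd"; a forward copy would overwrite source bytes before reading them and produce "ababab".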
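Miscellaneous instructions:
lea loads the effective address of a memory operand into a register without accessing memory.
cpuid returns processor identification and feature information in EAX, EBX, ECX and EDX; the leaf to query is selected by the value in EAX.
nop does nothing; it only occupies space in the instruction stream (for example, for code alignment).
rdrand reads a hardware-generated random number into the destination register and sets CF to 1 if the value is valid.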
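rdrand can fail transiently and reports this through CF, and not every processor implements it, so the sketch below first checks the feature bit (CPUID.01H:ECX bit 30) and then retries a bounded number of times. It assumes GCC/Clang on x86-64; the helper names and the retry count of 10 are illustrative choices.

```c
#include <cpuid.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if CPUID.01H:ECX bit 30 (RDRAND) is set. */
int rdrand_supported(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 30) & 1;
}

/* Hypothetical helper: rdrand sets CF = 1 when a valid value was returned.
   Returns 1 and stores the value on success, 0 after all retries fail. */
int rdrand64(uint64_t *out) {
    for (int attempt = 0; attempt < 10; attempt++) {
        uint64_t value;
        unsigned char ok;
        __asm__ volatile ("rdrand %0\n\t"
                          "setc %1"
                          : "=r"(value), "=qm"(ok)
                          :
                          : "cc");
        if (ok) {
            *out = value;
            return 1;
        }
    }
    return 0;
}
```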
Chapter 9, Programming with Intel MMX Technology
MMX, the first generation of SIMD instructions, was introduced into IA-32 in the Pentium II processor family and the Pentium processor with MMX technology. The MMX extension adds eight 64-bit registers named MM0 through MM7 (they are aliased onto the x87 FPU data registers). MMX instructions, and some SSE/SSE2 instructions, operate on these registers. MMX introduces three 64-bit data types: packed byte integers, packed word integers and packed doubleword integers.
MMX data transfer instructions:
movd (32-bit) and movq (64-bit) move data between MMX registers and memory or general-purpose registers; movq can also copy between two MMX registers.
MMX arithmetic instructions:
paddb/paddw/paddd and psubb/psubw/psubd perform element-wise addition and subtraction with wraparound. Additional instructions perform multiplication (pmullw, pmulhw, pmaddwd); MMX has no divide instruction.
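MMX code in C is usually written with intrinsics rather than inline assembly. A sketch assuming the mmintrin.h intrinsics provided by GCC and Clang on x86-64; add_packed_words is hypothetical. _mm_add_pi16 maps to paddw, and _mm_empty issues emms, which is needed because the MMX registers share state with the x87 FPU.

```c
#include <mmintrin.h>
#include <stdint.h>

/* Hypothetical helper: adds four packed 16-bit words (paddw), wrapping on overflow. */
uint64_t add_packed_words(uint64_t a, uint64_t b) {
    __m64 va = _mm_cvtsi64_m64((long long)a);
    __m64 vb = _mm_cvtsi64_m64((long long)b);
    uint64_t r = (uint64_t)_mm_cvtm64_si64(_mm_add_pi16(va, vb));
    _mm_empty();  /* emms: release the MMX state before any x87 code runs */
    return r;
}
```

Adding 0x0001000200030004 and 0x0010002000300040 gives 0x0011002200330044; adding 1 to a word holding 0xFFFF wraps that word to 0 without carrying into the next word.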
MMX comparison instructions:
pcmpeqb/pcmpeqw/pcmpeqd compare corresponding data elements (bytes, words or doublewords) and generate a mask: each result element is all 1s where the elements are equal and all 0s otherwise. This turns out to be very useful for deciding whether a particular element is present.
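The mask produced by pcmpeqb can be inspected directly. A sketch with the GCC/Clang MMX intrinsic _mm_cmpeq_pi8 (pcmpeqb) on x86-64; equal_byte_mask is hypothetical.

```c
#include <mmintrin.h>
#include <stdint.h>

/* Hypothetical helper: each result byte is 0xFF where the bytes of a and b
   are equal and 0x00 where they differ. */
uint64_t equal_byte_mask(uint64_t a, uint64_t b) {
    __m64 m = _mm_cmpeq_pi8(_mm_cvtsi64_m64((long long)a),
                            _mm_cvtsi64_m64((long long)b));
    uint64_t r = (uint64_t)_mm_cvtm64_si64(m);
    _mm_empty();  /* emms */
    return r;
}
```

Comparing 0x1122334455667788 with 0x1100334400667700 gives 0xFF00FFFF00FFFF00: only the bytes 0x11, 0x33, 0x44, 0x66 and 0x77 match.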
MMX logical instructions:
pand/por/pxor operate on the full 64 bits, so they have no element-size suffix.
MMX shift instructions:
psllw/pslld/psllq and psrlw/psrld/psrlq perform logical shifts on each word, doubleword or quadword element. Arithmetic shifts are not my concern here.
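Element-wise shifts keep each element independent, which is easy to see by comparing psllw with psllq on the same input. A sketch using the GCC/Clang intrinsics _mm_slli_pi16 (psllw) and _mm_slli_si64 (psllq) on x86-64; the helper names are hypothetical.

```c
#include <mmintrin.h>
#include <stdint.h>

/* psllw: each 16-bit word is shifted separately; bits leaving a word are dropped. */
uint64_t shift_words_left1(uint64_t x) {
    __m64 r = _mm_slli_pi16(_mm_cvtsi64_m64((long long)x), 1);
    uint64_t out = (uint64_t)_mm_cvtm64_si64(r);
    _mm_empty();  /* emms */
    return out;
}

/* psllq: the whole 64-bit quadword is shifted as a single element. */
uint64_t shift_quadword_left1(uint64_t x) {
    __m64 r = _mm_slli_si64(_mm_cvtsi64_m64((long long)x), 1);
    uint64_t out = (uint64_t)_mm_cvtm64_si64(r);
    _mm_empty();  /* emms */
    return out;
}
```

For 0x8001800180018001, psllw drops the top bit of every word (0x0002000200020002), while psllq carries it into the next word (0x0003000300030002).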
Chapter 10, Programming with Intel SSE Technology
SSE was introduced into IA-32 with the Pentium III processor family. Its main advance is bringing 128-bit wide data into the SIMD framework: there are 16 XMM registers, named XMM0 through XMM15, in 64-bit mode (8 in 32-bit mode), and they are separate from the MMX registers. Note that XMM registers can only hold data for computation; they cannot be used to address memory, which is done with general-purpose registers. SSE also adds the MXCSR control and status register for floating-point operations.
Only one new data type was introduced with SSE: packed single-precision floating-point. Since floating point is out of scope here, we go straight to the integer instructions.
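SSE/SSE2 add integer instructions that complement MMX technology: SSE introduced them for MMX registers, and SSE2 added 128-bit versions for XMM registers. Let's look at them:
pavgb/pavgw (there are no other element sizes) compute the rounded average of corresponding unsigned bytes or words.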
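pavgb rounds up: each result byte is (a + b + 1) >> 1, computed with a 9-bit intermediate so it cannot overflow. A sketch using the GCC/Clang intrinsic _mm_avg_pu8 from xmmintrin.h on x86-64; average_bytes is hypothetical and broadcasts each input to all eight bytes for simplicity.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: rounded average of two unsigned bytes via pavgb. */
uint8_t average_bytes(uint8_t a, uint8_t b) {
    __m64 va = _mm_set1_pi8((char)a);
    __m64 vb = _mm_set1_pi8((char)b);
    int low = _mm_cvtsi64_si32(_mm_avg_pu8(va, vb));  /* low 32 bits of the result */
    _mm_empty();  /* emms */
    return (uint8_t)(low & 0xFF);
}
```

average_bytes(1, 2) is 2 rather than 1, and average_bytes(255, 255) is 255 with no overflow.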
pextrw extracts a word from an MMX register and stores it in a general-purpose register. This is useful when we want to extract and inspect an individual word.
pinsrw inserts a word into an MMX register at a given position.
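pinsrw and pextrw take the word position as an immediate, so the corresponding intrinsics require a constant lane number. A sketch with _mm_insert_pi16 and _mm_extract_pi16 (GCC/Clang, xmmintrin.h, x86-64); the helpers and the choice of lane 2 (bits 32-47) are hypothetical.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: pinsrw writes value into word lane 2 of x. */
uint64_t set_word2(uint64_t x, uint16_t value) {
    __m64 r = _mm_insert_pi16(_mm_cvtsi64_m64((long long)x), value, 2);
    uint64_t out = (uint64_t)_mm_cvtm64_si64(r);
    _mm_empty();  /* emms */
    return out;
}

/* Hypothetical helper: pextrw reads word lane 2, zero-extended into an int. */
int get_word2(uint64_t x) {
    int w = _mm_extract_pi16(_mm_cvtsi64_m64((long long)x), 2);
    _mm_empty();  /* emms */
    return w;
}
```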
pmaxub/pmaxsw compute the element-wise maximum of unsigned bytes (pmaxub) or signed words (pmaxsw).
pmovmskb collects the most significant bit of each byte into a mask and stores it in a general-purpose register. This turns out to be very useful when a decision depends on the resulting bitmask.
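The pcmpeqb plus pmovmskb pattern is the classic way to locate a byte, shown here with the 128-bit SSE2 forms (_mm_cmpeq_epi8, _mm_movemask_epi8) rather than the MMX ones. A sketch assuming GCC/Clang on x86-64; find_byte16 is hypothetical, and __builtin_ctz is a GCC/Clang builtin returning the index of the lowest set bit.

```c
#include <emmintrin.h>

/* Hypothetical helper: index of the first byte equal to c in a 16-byte block,
   or -1. pcmpeqb marks equal bytes with 0xFF; pmovmskb packs the top bit of
   each byte into a 16-bit mask whose lowest set bit is the first match. */
int find_byte16(const unsigned char block[16], unsigned char c) {
    __m128i data = _mm_loadu_si128((const __m128i *)block);
    __m128i eq = _mm_cmpeq_epi8(data, _mm_set1_epi8((char)c));
    int mask = _mm_movemask_epi8(eq);
    return mask ? __builtin_ctz((unsigned)mask) : -1;
}
```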
We skip the multiply and sum-of-absolute-differences instructions. Next is the shuffle instruction pshufw, which rearranges the words in an MMX register according to an immediate control byte (the mask).
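pshufw selects, for each destination word i, the source word named by bits 2i+1:2i of the immediate. A sketch using the GCC/Clang intrinsic _mm_shuffle_pi16 and the _MM_SHUFFLE helper macro from xmmintrin.h; reverse_words is hypothetical and uses the control byte 0x1B to reverse the four words.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: _MM_SHUFFLE(0, 1, 2, 3) == 0x1B, so result word i
   takes source word 3 - i. */
uint64_t reverse_words(uint64_t x) {
    __m64 r = _mm_shuffle_pi16(_mm_cvtsi64_m64((long long)x), _MM_SHUFFLE(0, 1, 2, 3));
    uint64_t out = (uint64_t)_mm_cvtm64_si64(r);
    _mm_empty();  /* emms */
    return out;
}
```

reverse_words(0x0004000300020001) returns 0x0001000200030004.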
Next are the cacheability control, prefetch and memory ordering instructions.
Since SSE, new instructions offer non-temporal stores, which move data from registers to memory while minimizing cache pollution. In SSE, the non-temporal-hinted instructions include movntq (stores a quadword from an MMX register to memory), movntps (stores packed single-precision floating-point data from an XMM register to memory) and maskmovq (a byte-masked store from an MMX register). Non-temporal stores use write-combining semantics: if the destination is already in the cache hierarchy, the cached copy may be updated and/or evicted; otherwise the stores are collected in write-combining buffers and written to memory later. As a result, non-temporal stores are weakly ordered with respect to other stores, and software must maintain ordering itself with the sfence instruction, which guarantees that every store issued before it is globally visible before any store issued after it.
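A sketch of the non-temporal store pattern: stream the data, then fence. It uses the SSE2 intrinsic _mm_stream_si128 (movntdq on XMM registers) instead of the MMX movntq described above, plus _mm_sfence; GCC/Clang on x86-64 is assumed, stream_fill is hypothetical, and the destination must be 16-byte aligned (an array of __m128i is).

```c
#include <emmintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills count 16-byte blocks with a repeated 32-bit
   pattern using weakly ordered non-temporal stores, then issues sfence so
   they are globally visible before any later store. */
void stream_fill(__m128i *dst, size_t count, int32_t pattern) {
    __m128i v = _mm_set1_epi32(pattern);
    for (size_t i = 0; i < count; i++)
        _mm_stream_si128(&dst[i], v);  /* movntdq; dst[i] must be 16-byte aligned */
    _mm_sfence();
}
```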
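The last group is prefetch: prefetcht0/prefetcht1/prefetcht2/prefetchnta hint that data should be brought closer to the processor, either with a temporal hint that selects the cache level (t0, t1, t2) or with a non-temporal hint (nta) that minimizes cache pollution. Prefetches are only hints and do not change program results.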
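Because prefetches are only hints, they are typically issued a fixed distance ahead of the data being processed. A sketch using the GCC/Clang intrinsic _mm_prefetch with _MM_HINT_T0 (prefetcht0) on x86-64; sum_with_prefetch and the distance of 64 elements are hypothetical tuning choices, not values from the manual.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 64

/* Hypothetical helper: sums an array while hinting that data further ahead
   will be needed soon. The bounds check keeps the address inside the array. */
int64_t sum_with_prefetch(const int32_t *a, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            _mm_prefetch((const char *)&a[i + PREFETCH_DISTANCE], _MM_HINT_T0);
        sum += a[i];
    }
    return sum;
}
```

The result is identical with or without the prefetch; only the memory access timing may change.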