1.glibc malloc

why glibc malloc?

The first question is why glibc malloc is gonna be introduced here since the article is mainly about vpp employed technique.

VPP absolutely exploits X86 platform dependent technology to accelerate its own business ,I mean lots of vector instructions are involved and packet frame batch is handled at any time ,but this is not the whole meaning of vector processing ,another important reason why vpp is vectorized is that all the runtime data structure is vectorized which involves frequent memory allocation and deallocation,I suppose vpp memory allocator is much more efficient than glib allocator ,before verifying this ,I need to know how glic memory allocator works ,and comes to glibc malloc.

how could we obtain memory base from system(kernel)?

certainly through system call brk() and mmap().

first we check the initial memory mapping layout of a typical process (make sure the process's main heap is initialized by issuing a malloc() calling):

[root@localhost mywork]# cat /proc/33320/maps 
00400000-00401000 r-xp 00000000 08:03 1682720                            /root/mywork/tmp/delete/a.out
00600000-00601000 r--p 00000000 08:03 1682720                            /root/mywork/tmp/delete/a.out
00601000-00602000 rw-p 00001000 08:03 1682720                            /root/mywork/tmp/delete/a.out
00edd000-00efe000 rw-p 00000000 00:00 0                                  [heap]
7f47764e4000-7f477669a000 r-xp 00000000 08:03 69227064                   /usr/lib64/libc-2.17.so
7f477669a000-7f477689a000 ---p 001b6000 08:03 69227064                   /usr/lib64/libc-2.17.so
7f477689a000-7f477689e000 r--p 001b6000 08:03 69227064                   /usr/lib64/libc-2.17.so
7f477689e000-7f47768a0000 rw-p 001ba000 08:03 69227064                   /usr/lib64/libc-2.17.so
7f47768a0000-7f47768a5000 rw-p 00000000 00:00 0 
7f47768a5000-7f47768c5000 r-xp 00000000 08:03 78179217                   /usr/lib64/ld-2.17.so
7f4776aad000-7f4776ab0000 rw-p 00000000 00:00 0 
7f4776ac1000-7f4776ac4000 rw-p 00000000 00:00 0 
7f4776ac4000-7f4776ac5000 r--p 0001f000 08:03 78179217                   /usr/lib64/ld-2.17.so
7f4776ac5000-7f4776ac6000 rw-p 00020000 08:03 78179217                   /usr/lib64/ld-2.17.so
7f4776ac6000-7f4776ac7000 rw-p 00000000 00:00 0 
7fff42d42000-7fff43145000 rw-p 00000000 00:00 0                          [stack]
7fff431fe000-7fff43200000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

we notice that [heap] mapping is there as we allocate more and more memory by malloc() in main function(note that malloc() not happens in other thread),the heap mapping can extend upward itself,and it will never conflict with other mapping region since it's located in the low area of the whole process address space [, it seems that [stack] mapping can not increase automatically which means I can not allocate too much large temporary stack variable ?]

in Glibc malloc ,[heap] is called main arena,when it needs to extend ,it call brk() to extend the mapping boundary of the process.

another type of dynamic memory is obtained through mmap() with which we use 0 as file descriptor and private mapping flag is set. these memory areas are called non-main arena.

note that VPP clib memory is allocated by direct mapping.

how glibc malloc supports concurrency?

note glibc malloc derives from pthread malloc.

there are more than one arenas available for different threads,exactly there are 8 times of number of cores.

so a thread can choose a arena ,and lock it ,then allocate memory on that arena(,also initialize it if not yet).

how memory elements are organized?

there are three major data structures in glibc malloc,respectively they are malloc_state(also called arena), heap_info(heap struct maintained by non-main arenas ,note that an arena can contain more than one heap(allocated by mmap()))and malloc_chunk(basic memory element,two kinds of chunks,allocated and free).

the following figure show the difference between main arena and thread arena :

main arena contains no heapinfo structure ,and main_arena is present in data segment of libc.so,and its chunk pointer _top point at brk() allocated [heap],meanwhile every time a new area is mapped using mapp(),a heap_info is created,and it will be correlated with a arena(malloc_state). next figure shows how to correlate two heaps :

at the beginning of every newly mapped memory there is a heap_info header whose prev pointer points to previous heap_info ,and arena pointer ar_ptr points to the only arena malloc_state structure.

let's delve into malloc_chunk a little .

here is the definition:

struct malloc_chunk {
  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */
  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;
  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

the prev_size and size are fixed, and other fields vary with what type the chunk is.for detailed information ,please refer to the explanation in malloc.c(https://github.com/lattera/glibc/blob/master/malloc/malloc.c#L1138),originally, when the heap is allocated or the main arena is initialized ,the whole memory can be a huge chunk ,and arena can reference it by top pointer, once a allocation request is serviced, an appropriate chunk may be chosen and splitted into halves,one is returned to user.

after deallocation ,the chunk is returned to glibc ,not to the system,but where to accommodate it ?

note *fd and *bk pointer in malloc_chunk,they could act as next and prev pointer in doubly linked list,when a chunk is deallocated, it may coalesce with adjacent chunk,and insert it in a free list,in glibc malloc ,the free list is called bin.

there are four kinds of bins in glib:fastbin,unsorted bin,small bin and large bin.

when a allocation request is service ,the search order conforms to best-fit(http://www.memorymanagement.org/glossary/b.html#term-best-fit) strategy.and glibc's memory is still a buddy system(http://www.memorymanagement.org/mmref/alloc.html#buddy-system) ,next we will see how these are realized.

introduction of bins

fastbin is an array which holds singly linked free lists whose sizes range from 16-bytes to 80-bytes (adjacent list is 8-bytes apart),when a request ranging from 16 to 80 is serviced ,the corresponding fastbin index is calculated ,if list is not empty ,fetch one ,and return it to user. other small bin services it.

small bin host chunks which are not greater than 512 bytes ,it includes fast-bin chunk ,but it's doubly linked list,and it's still 8-bytes apart,the procedure is similar to fast bin except that when there is no available entry in small bin,unsorted bin will service it.

unsorted bin is a doulby linked list which can host any sized chunks ,when a chunk is freed ,it may coalesce with adjacent free chunk to be a larger free chunk ,and then add it into unsorted bin,when the unsorted bin is searched to match a best-fit chunk ,it may put other chunks into their corresponding bins(this is the only opportunity where chunks are genuinely deallacated)

but why there is unsorted bin(why should we just put a coalesced chunked into bins instead of unsorted bin)? I guess it will accelerate freeing speed and give the recently freed chunk a chance to used very soon later.

large bin host ant chunks which is greater than 512 bytes ,and different large bins can host different sized chunk ,they may not be evenly apart,and chunks in a bin are sorted in decreasing order,when a certain bin is searched and find no appropriate chunk ,next larger bin is search instead ,it's rather time-consuming ,at last ,the top chunk may service it.

for detailed steps ,please refer to :https://sourceware.org/glibc/wiki/MallocInternals

summary

glibc pools small memory objects and use indexing to speed up allocation,in order to decrease fragment, coalescence and unsorted bin are used to make allocated memory confined in a range(do not use new memory area as long as recent freed memory can fulfil the allocation reuest).

Last updated