Golang Map sample code to implement assignment and expansion

  • 2020-10-31 21:47:03
  • OfStack

The golang map operation is the more complex logic in the map implementation. When assigning, in order to reduce the length of the hash conflict chain, map expansion and data migration will be done. The expansion of map and the migration of data will also be a focus.

The data structure

First, we need to relearn the data structure implemented by map:


type hmap struct {
 count   int
 flags   uint8 
 B     uint8
 noverflow uint16
 hash0   uint32
 buckets  unsafe.Pointer
 oldbuckets unsafe.Pointer
 nevacuate uintptr
 extra *mapextra
}

type mapextra struct {
 overflow  *[]*bmap
 oldoverflow *[]*bmap
 nextOverflow *bmap
}

hmap is the structure of the map implementation. Most of the fields were studied in Section 1. The rest are nevacuate and extra.

First, you need to understand the concept of relocation: when the data link in hash is too long, or there are too many empty bucket, the data will be moved to a new bucket, and the number of bucket will be composed to oldbuckets. The relocation of bucket is not completed once, but may be triggered only when the corresponding bucket is visited. (Is this point similar to redis's capacity enlargement? The capacity enlargement is placed on multiple accesses, reducing the delay pressure of a single access)

nevactuate identifies the location of the move (you can also consider the progress of the move). Indicate where bucket has been relocated in oldbuckets (1 array). extra is the structure of 1 map. nextOverflow identifies the empty bucket of the application, which is used for later conflict resolution. overflow and oldoverflow identify the bucket data being used in the linked list that overflows. The difference between old and non-ES43en is that old is the data for relocation.

With a general understanding of the data structure, we are ready to learn the map assignment.

map assignment operation

map assignment operation is written as follows:


data := mapExample["hello"]

golang has been optimized for different types of k. The following are some implementation methods:


func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
func mapassign_fast32(t *maptype, h *hmap, key uint32) unsafe.Pointer {}
func mapassign_fast32ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {}
func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {}
func mapassign_fast64ptr(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer{}
func mapassign_faststr(t *maptype, h *hmap, s string) unsafe.Pointer {}

Much the same, we will study the implementation of mapassign.

The implementation of mapassign method is to find an empty bucket, assign key to bucket, then return the address of val, and then do memory copy directly through assembly.
Let's look at how to find leisure time step by step.

Before looking for key, it will do an exception detection to check whether map is not initialized, or is writing concurrently, and throw an exception if it exists :(this is why map writes back to panic concurrently)


if h == nil {
 panic(plainError("assignment to entry in nil map"))
}
//  Unexpectedly state inspection   and   Memory scanning 

if h.flags&hashWriting != 0 {
 throw("concurrent map writes")
}

The value of hash corresponding to key needs to be calculated. If buckets is null (map less than 1 fixed length will not initialize data when initialized), another bucket needs to be initialized


alg := t.key.alg
hash := alg.hash(key, uintptr(h.hash0))

//  Why do I need to hash  After setting up flags Because the  alg.hash May be panic
h.flags ^= hashWriting

if h.buckets == nil {
 h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)
}

Obtain the corresponding bucket by hash value. If map is still migrating data, you need to find the corresponding bucket in oldbuckets and move to the new bucket.


//  through hash  To calculate bucket Position deviation of 
bucket := hash & bucketMask(h.B)

//  Here is the relocation logic, we will explain in detail later 
if h.growing() {
 growWork(t, h, bucket)
}

//  Calculate the corresponding bucket  Position, and top hash  value 
b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
top := tophash(hash)

After you get the bucket, you need to look it up one by one in the linked list to find the corresponding key. It may be the existing key, or it may need to be added.


for {
 for i := uintptr(0); i < bucketCnt; i++ {

  //  if  tophash  It's not the same, so let's take it tophash  Under the 1 a 
  if b.tophash[i] != top {

   //  If it's an empty space, put it kv Got the pointer. 
   if isEmpty(b.tophash[i]) && inserti == nil {
    inserti = &b.tophash[i]
    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
    val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
   }

   //  If there is no follow-up evidence, then there is no need to find the pit 
   if b.tophash[i] == emptyRest {
    break bucketloop
   }
   continue
  }

  //  if tophash When the match 

  k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
  if t.indirectkey() {
   k = *((*unsafe.Pointer)(k))
  }

  //  To compare k No, we need to keep looking 
  if !alg.equal(key, k) {
   continue
  }

  //  if key  Also equal, indicating that the previous data, direct update k And get v "Will do 
  if t.needkeyupdate() {
   typedmemmove(t.key, k, key)
  }
  val = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
  goto done
 }
 //  Remove the 1 a overflow  (Linked list pointer) 
 ovf := b.overflow(t)
 if ovf == nil {
  break
 }
 b = ovf
}

To summarize the procedure, there are several main parts:

a. map hash mismatch is checked to see if kv is empty. If delete is called and kv is empty, leave the address first. If the corresponding k is not found later (that is, there is no corresponding Key in map before), then simply use the position of empty kv.
b. If map hash matches, you need to determine whether the literal value of key matches. If it doesn't match, you need to look it up. If there is a match, update key directly (because there may be references) and return the address of v.
c. If none of the above, look at bucket

Before inserting data, I will first check that the data is too much and need to expand capacity. If it needs to expand capacity, I will get the new bucket from and look for the corresponding position.


if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
 hashGrow(t, h)
 goto again // Growing the table invalidates everything, so try again
}

If there is no available space, add an bucket to the list to get kv.


if inserti == nil {
 // all current buckets are full, allocate a new one.
 newb := h.newoverflow(t, b)
 inserti = &newb.tophash[0]
 insertk = add(unsafe.Pointer(newb), dataOffset)
 val = add(insertk, bucketCnt*uintptr(t.keysize))
}

7. Update the literals of tophash and key and release the hashWriting restrictions


//  If non-pointer data (that is, directly assigned data), you also need to request memory and copy 
if t.indirectkey() {
 kmem := newobject(t.key)
 *(*unsafe.Pointer)(insertk) = kmem
 insertk = kmem
}
if t.indirectvalue() {
 vmem := newobject(t.elem)
 *(*unsafe.Pointer)(val) = vmem
}
//  update tophash, k
typedmemmove(t.key, insertk, key)
*inserti = top

done:
if h.flags&hashWriting == 0 {
  throw("concurrent map writes")
 }
 h.flags &^= hashWriting
 if t.indirectvalue() {
  val = *((*unsafe.Pointer)(val))
 }
 return val

At this point, the assignment of map is basically done. Next, learn the expansion of map in step.

The expanded Map

There are two cases where expansion is needed. One is that there are too many kv data stored, which has exceeded the current load of map. The other one is overflow. bucket is too much. This threshold is a fixed value, a result of experience, so we won't go into it.

When conditions are met, capacity expansion will begin. If condition 2 is met and the number of buckets after capacity expansion is the same as the original one, it means that there may be too many pits occupied by empty kv, so map capacity expansion is used for memory consolidation. If the map load is too high because of the large AMOUNT of kv, then the amount should be doubled.


data := mapExample["hello"]
0

To summarize the map expansion operation. First get the size of the expansion, then apply for a large array, and then do some initialization, switch the old buckets and overflow.

Migration of map data

After capacity expansion, data migration is needed. The data migration is not completed once, and the migration corresponding to bucket will only be done when it is used. This is a gradual data migration. Now let's learn.

In step of the data assignment, you will see if the bucket you need to operate is in the old buckets and if it is, move on. The following is the specific operation of relocation:


data := mapExample["hello"]
1

nevacuate indicates the current progress, if all the relocation is completed, it should be the same length as 2^B (B is the B in oldbuckets, after all, the new buckets may be 2^(B+1)).

The implementation of the evacuate method is to transfer the corresponding bucket at this location and the data on its conflict chain to the new buckets.

First to determine whether the current bucket has been transferred. (oldbucket identifies the location corresponding to bucket that needs to be moved)


data := mapExample["hello"]
2

The judgment of transfer can be made directly by tophash, the first hash value in tophash (the role of tophash can be referred to in Lesson 3).


func evacuated(b *bmap) bool {
 h := b.tophash[0]
 //  This interval flag  Both have been moved 
 return h > emptyOne && h < minTopHash
}

If not transferred, then the data will be migrated. When the data is migrated, it may be migrated to buckets of the same size or to buckets of twice the size. Here xy is both a marker marking the target migration location: x identifies the migration to the same location and y identifies the migration to twice the size. Let's first take a look at the determination of the target location:


data := mapExample["hello"]
4

After the location of bucket is determined, the migration should be carried out in accordance with one of kv's articles. (The purpose is to clear kv)


data := mapExample["hello"]
5

For data that IS not used indirectly by key (that is, non-pointer data), do memory recovery


data := mapExample["hello"]
6

If the location of the current relocation of bucket and the overall relocation of bucket is the same, we need to update the overall progress marker of nevacuate


data := mapExample["hello"]
7

conclusion

The difficulty of Map assignment lies in data expansion and data relocation operation. The bucket relocation is gradual, with at least one relocation being done for each assignment. Expansion is not 1 will definitely add space, may also be just done memory consolidation. The tophash logo can be used to determine whether it is empty or not, as well as whether to move and the location of the move is X or Y. The key in delete map has the potential to have a lot of empty kv, resulting in a relocation operation. If you can avoid it, avoid it.

Related articles: