UnityDictionary解析

Dictionary解析

说起Dictionary,先要了解Hash算法,可以看看

[TOC]

源码

源码地址:

URL: https://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs

Dictionary泛型实现

头部实现

1
2
3
4
5
6
private struct Entry {
public int hashCode; // Lower 31 bits of hash code, -1 if unused
public int next; // Index of next entry, -1 if last
public TKey key; // Key of entry
public TValue value; // Value of entry
}

Entry结构体

保存了HashCode,下一个index,key,value

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
private int[] buckets;
private Entry[] entries;
private int count;
private int version;
private int freeList;
private int freeCount;
private IEqualityComparer<TKey> comparer;
private KeyCollection keys;
private ValueCollection values;
private Object _syncRoot;

// constants for serialization
private const String VersionName = "Version";
private const String HashSizeName = "HashSize"; // Must save buckets.Length
private const String KeyValuePairsName = "KeyValuePairs";
private const String ComparerName = "Comparer";

Dictionary头部代码

buckets Hash桶

entries 数组,存放元素

count 当前entries的index位置

freeList 被删除Entry在entries中的下标index,这个位置是空闲

freeCount 有多少个被删除的Entry,有多少个空闲的位置

keys 存放Key的集合

values 存放数据的集合

buckets和entries是实现的关键,负责处理的是Hash桶和数据的关系

DIctionary跟List一样,一开始的数组大小是4,往后是翻倍

Insert的实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
        private void Insert(TKey key, TValue value, bool add) {

if( key == null ) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}

if (buckets == null) Initialize(0);
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int targetBucket = hashCode % buckets.Length;

#if FEATURE_RANDOMIZED_STRING_HASHING
int collisionCount = 0;
#endif

for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) {
if (add) {
ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate);
}
entries[i].value = value;
version++;
return;
}

#if FEATURE_RANDOMIZED_STRING_HASHING
collisionCount++;
#endif
}
int index;
if (freeCount > 0) {
index = freeList;
freeList = entries[index].next;
freeCount--;
}
else {
if (count == entries.Length)
{
Resize();
targetBucket = hashCode % buckets.Length;
}
index = count;
count++;
}

entries[index].hashCode = hashCode;
entries[index].next = buckets[targetBucket];
entries[index].key = key;
entries[index].value = value;
buckets[targetBucket] = index;
version++;

#if FEATURE_RANDOMIZED_STRING_HASHING

#if FEATURE_CORECLR
// In case we hit the collision threshold we'll need to switch to the comparer which is using randomized string hashing
// in this case will be EqualityComparer<string>.Default.
// Note, randomized string hashing is turned on by default on coreclr so EqualityComparer<string>.Default will
// be using randomized string hashing

if (collisionCount > HashHelpers.HashCollisionThreshold && comparer == NonRandomizedStringEqualityComparer.Default)
{
comparer = (IEqualityComparer<TKey>) EqualityComparer<string>.Default;
Resize(entries.Length, true);
}
#else
if(collisionCount > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(comparer))
{
comparer = (IEqualityComparer<TKey>) HashHelpers.GetRandomizedEqualityComparer(comparer);
Resize(entries.Length, true);
}
#endif // FEATURE_CORECLR

#endif

}

代码较长,逻辑也比较多

按照开始插入一个值开始算起

1
2
3
4
5
6
7
8
9
10
11
然后我们假设需要执行一个Add操作,dictionary.Add("a","b"),其中key = "a",value = "b"。

1、根据key的值,计算出它的hashCode。我们假设”a”的hash值为6(GetHashCode("a") = 6)。

2、通过对hashCode取余运算,计算出该hashCode落在哪一个buckets桶中。现在桶的长度(buckets.Length)为4,那么就是6 % 4最后落在index为2的桶中,也就是buckets[2]。

3、避开一种其它情况不谈,接下来它会将hashCode、key、value等信息存入entries[count]中,因为count位置是空闲的;继续count++指向下一个空闲位置。上图中第一个位置,index=0就是空闲的,所以就存放在entries[0]的位置。

4、将Entry的下标entryIndex赋值给buckets中对应下标的bucket。步骤3中是存放在entries[0]的位置,所以buckets[2]=0。

5、最后version++,集合发生了变化,所以版本需要+1。只有增加、替换和删除元素才会更新版本

我们继续执行一个Add操作,dictionary.Add("c","d"),假设GetHashCode(“c”)=6,最后6 % 4 = 2。最后桶的index也是2,按照之前的步骤1~3是没有问题的。

如果继续执行步骤4那么buckets[2] = 1,然后原来的buckets[2]=>entries[0]的关系就会丢失,这是我们不愿意看到的。现在Entry中的next就发挥大作用了。

1
2
3
4
5
6
7
如果对应的buckets[index]有其它元素已经存在,那么会执行以下两条语句,让新的entry.next指向之前的元素,让buckets[index]指向现在的新的元素,就构成了一个单链表。

entries[index].next = buckets[targetBucket];
...
buckets[targetBucket] = index;

实际上步骤4也就是做一个这样的操作,并不会去判断是不是有其它元素,因为buckets中桶初始值就是-1,不会造成问题。

查找操作

代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
private int FindEntry(TKey key) {
if( key == null) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}

if (buckets != null) {
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
}
}
return -1;
}

// This is a convenience method for the internal callers that were converted from using Hashtable.
// Many were combining key doesn't exist and key exists but null value (for non-value types) checks.
// This allows them to continue getting that behavior with minimal code delta. This is basically
// TryGetValue without the out param
internal TValue GetValueOrDefault(TKey key) {
int i = FindEntry(key);
if (i >= 0) {
return entries[i].value;
}
return default(TValue);
}

假设我们现在执行这样一条语句dictionary.GetValueOrDefault("a"),会执行以下步骤.

  1. 获取key的hashCode,计算出所在的桶位置。我们之前提到,”a”的hashCode=6,所以最后计算出来targetBucket=2
  2. 通过buckets[2]=1找到entries[1],比较key的值是否相等,相等就返回entryIndex,不想等就继续entries[next]查找,直到找到key相等元素或者next == -1的时候。这里我们找到了key == "a"的元素,返回entryIndex=0
  3. 如果entryIndex >= 0那么返回对应的entries[entryIndex]元素,否则返回default(TValue)。这里我们直接返回entries[0].value

删除操作

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
public bool Remove(TKey key) {
if(key == null) {
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}

if (buckets != null) {
int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
int bucket = hashCode % buckets.Length;
int last = -1;
// 遍历bucket对应的单链表
for (int i = buckets[bucket]; i >= 0; last = i, i = entries[i].next) {
if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) {
// 找到元素后,如果last< 0,代表当前是bucket中最后一个元素,那么直接让bucket内下标赋值为 entries[i].next即可
if (last < 0) {
buckets[bucket] = entries[i].next;
}
// last不小于0,代表当前元素处于bucket单链表中间位置,需要将该元素的头结点和尾节点相连起来,防止链表中断
else {
entries[last].next = entries[i].next;
}
// 清空HashCode
entries[i].hashCode = -1;
// 建立freeList单链表
entries[i].next = freeList;
entries[i].key = default(TKey);
entries[i].value = default(TValue);
// 关键的代码,freeList等于当前的entry位置,下一次Add元素会优先Add到该位置
freeList = i;
freeCount++;
version++;
return true;
}
}
}
return false;
}

resize 扩容操作

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
private void Resize(int newSize, bool forceNewHashCodes) {
Contract.Assert(newSize >= entries.Length);
int[] newBuckets = new int[newSize];
for (int i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;
Entry[] newEntries = new Entry[newSize];
Array.Copy(entries, 0, newEntries, 0, count);
if(forceNewHashCodes) {
for (int i = 0; i < count; i++) {
if(newEntries[i].hashCode != -1) {
newEntries[i].hashCode = (comparer.GetHashCode(newEntries[i].key) & 0x7FFFFFFF);
}
}
}
for (int i = 0; i < count; i++) {
if (newEntries[i].hashCode >= 0) {
int bucket = newEntries[i].hashCode % newSize;
newEntries[i].next = newBuckets[bucket];
newBuckets[bucket] = i;
}
}
buckets = newBuckets;
entries = newEntries;
}

总结出来就是开辟新空间,然后把原来的数据放进去,重新计算HashCode

版本号作用

每次新增,删除修改都会使这个version值增加1

遍历过程不允许Remove(foreach)

C#List解析

List解

C#List内部实现

首先用过的都知道,List支持下标操作,泛型实现

`

1
2
3
4
5
6
7
8
9
10
11
public class List<T> : IList<T>, System.Collections.IList, IReadOnlyList<T>
{
private const int _defaultCapacity = 4;
private T[] _items;
[ContractPublicPropertyName("Count")]
private int _size;
private int _version;
[NonSerialized]
private Object _syncRoot;

static readonly T[] _emptyArray = new T[0];

`

泛型数组

_items实际存放的数据(数组实现)

_size 当前的数组大小

_version 版本号,增删查改后这个值都会增加

`

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
public T this[int index] {
get {
// Following trick can reduce the range check by one
if ((uint) index >= (uint)_size) {
ThrowHelper.ThrowArgumentOutOfRangeException();
}
Contract.EndContractBlock();
return _items[index];
}

set {
if ((uint) index >= (uint)_size) {
ThrowHelper.ThrowArgumentOutOfRangeException();
}
Contract.EndContractBlock();
_items[index] = value;
_version++;
}
}

`

下标查找实现

`

1
2
3
4
5
public void Add(T item) {
if (_size == _items.Length) EnsureCapacity(_size + 1);
_items[_size++] = item;
_version++;
}

`

插入的时候先检查是否到达了最大数组长度,是的话扩容

`

1
2
3
4
5
6
7
8
9
10
private void EnsureCapacity(int min) {
if (_items.Length < min) {
int newCapacity = _items.Length == 0? _defaultCapacity : _items.Length * 2;
// Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
// Note that this check works even when _items.Length overflowed thanks to the (uint) cast
if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
if (newCapacity < min) newCapacity = min;
Capacity = newCapacity;
}
}

`

扩容是双倍增加

`

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
public bool Remove(T item) {
int index = IndexOf(item);
if (index >= 0) {
RemoveAt(index);
return true;
}

return false;
}

public void RemoveAt(int index) {
if ((uint)index >= (uint)_size) {
ThrowHelper.ThrowArgumentOutOfRangeException();
}
Contract.EndContractBlock();
_size--;
if (index < _size) {
Array.Copy(_items, index + 1, _items, index, _size - index);
}
_items[_size] = default(T);
_version++;
}

`

删除代码,删除是直接删除当前的,然后后面的往前顶替,需要考虑的是下标变换和频繁copy消耗

请我喝杯咖啡吧~

支付宝
微信