手写hashmap实现原理

tobyforever LV 5 SU @ 2026-1-25 9:22:23

这里是为您准备的《Unordered Map 模拟》题解资料。我将按照从暴力模拟到标准实现，再到竞赛最优优化的顺序，给出三个版本的代码。

版本一：暴力扫描法 (Brute Force)

适用场景：验证逻辑正确性、对拍（Validator）、极小数据范围 ( $N \le 1000$ )。思路：使用一个 std::vector 存储所有键值对。每次操作都遍历整个数组。瓶颈：查询和删除都需要遍历，单次操作 $O(N)$ ，总复杂度 $O(N^2)$ 。在 $N=10^5$ 时会严重超时（TLE）。

/*
 * 版本 1: 暴力扫描 (Brute Force)
 * 时间复杂度: O(N^2) - 必定 TLE
 * 空间复杂度: O(N)
 */
#include <iostream>
#include <vector>
#include <string>

using namespace std;

struct Node {
    int key;
    int value;
};

vector<Node> data_store;

// 查找 Key 在 vector 中的索引，找不到返回 -1
int find_index(int key) {
    for (size_t i = 0; i < data_store.size(); ++i) {
        if (data_store[i].key == key) {
            return i;
        }
    }
    return -1;
}

int main() {
    // 基础 IO 优化
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N;
    if (!(cin >> N)) return 0;

    string op;
    int x, y;

    while (N--) {
        cin >> op;
        if (op == "put") {
            cin >> x >> y;
            int idx = find_index(x);
            if (idx != -1) {
                data_store[idx].value = y; // 存在则更新
            } else {
                data_store.push_back({x, y}); // 不存在则追加
            }
        } else if (op == "get") {
            cin >> x;
            int idx = find_index(x);
            if (idx != -1) {
                cout << data_store[idx].value << "\n";
            } else {
                cout << "-1\n";
            }
        } else if (op == "remove") {
            cin >> x;
            int idx = find_index(x);
            if (idx != -1) {
                // 小技巧：将待删除元素与最后一个元素交换，然后 pop_back
                // 这样避免了 vector 移动元素的 O(N) 开销，虽然查找本身已经是 O(N)
                swap(data_store[idx], data_store.back());
                data_store.pop_back();
            }
        }
    }
    return 0;
}

版本二：标准拉链法哈希表 (Standard Chaining Hash Map)

适用场景：通用的哈希表实现，面试或标准的 CS 课程作业。思路：

动态数组 buckets：存放链表头指针。
链表 Node：解决哈希冲突。
Rehash：当元素数量超过容量的 0.75 时，容量翻倍，所有旧节点重新计算哈希值挂入新表。特点：使用 C++ 指针和 new/delete，逻辑非常清晰，符合面向对象设计。

/*
 * 版本 2: 动态扩容哈希表 (指针版)
 * 时间复杂度: 平均 O(1)，最坏 O(N)
 * 空间复杂度: O(N)
 * 知识点: 链表操作、动态内存管理、Rehash
 */
#include <iostream>
#include <string>
#include <vector>

using namespace std;

// 链表节点结构
struct Node {
    int key;
    int value;
    Node* next;
    
    Node(int k, int v, Node* n) : key(k), value(v), next(n) {}
};

class MyHashMap {
private:
    Node** buckets;     // 指针数组 (Bucket Array)
    int capacity;       // 当前容量 (桶的数量)
    int size;           // 当前元素个数
    const double LOAD_FACTOR = 0.75;

    // 哈希函数
    int hash_func(int key) {
        // 应对负数 key 的情况 (本题虽说是正数，但习惯要好)
        return (key % capacity + capacity) % capacity;
    }

    // 扩容函数 (Rehash)
    void resize() {
        int old_capacity = capacity;
        int new_capacity = capacity * 2;
        Node** old_buckets = buckets;

        // 1. 初始化新桶数组
        buckets = new Node*[new_capacity];
        for (int i = 0; i < new_capacity; ++i) buckets[i] = nullptr;
        capacity = new_capacity; // 更新容量，供 hash_func 使用

        // 2. 搬运节点 (Rehash)
        for (int i = 0; i < old_capacity; ++i) {
            Node* curr = old_buckets[i];
            while (curr != nullptr) {
                Node* next_node = curr->next; // 先保存下一个，因为 curr->next 即将修改

                // 计算在新表中的位置
                int idx = hash_func(curr->key);
                
                // 头插法插入新表
                curr->next = buckets[idx];
                buckets[idx] = curr;

                curr = next_node;
            }
        }

        // 3. 释放旧桶数组 (注意：节点本身没被删除，只是被移动了)
        delete[] old_buckets;
    }

public:
    MyHashMap() {
        capacity = 16; // 初始容量
        size = 0;
        buckets = new Node*[capacity];
        for (int i = 0; i < capacity; ++i) buckets[i] = nullptr;
    }

    ~MyHashMap() {
        // 析构函数：释放所有内存
        for (int i = 0; i < capacity; ++i) {
            Node* curr = buckets[i];
            while (curr) {
                Node* tmp = curr;
                curr = curr->next;
                delete tmp;
            }
        }
        delete[] buckets;
    }

    void put(int key, int value) {
        int idx = hash_func(key);
        Node* curr = buckets[idx];
        
        // 1. 检查是否存在，存在则更新
        while (curr) {
            if (curr->key == key) {
                curr->value = value;
                return;
            }
            curr = curr->next;
        }

        // 2. 不存在，创建新节点 (头插法)
        buckets[idx] = new Node(key, value, buckets[idx]);
        size++;

        // 3. 检查负载因子，是否需要扩容
        if (size >= capacity * LOAD_FACTOR) {
            resize();
        }
    }

    int get(int key) {
        int idx = hash_func(key);
        Node* curr = buckets[idx];
        while (curr) {
            if (curr->key == key) {
                return curr->value;
            }
            curr = curr->next;
        }
        return -1;
    }

    void remove(int key) {
        int idx = hash_func(key);
        Node* curr = buckets[idx];
        Node* prev = nullptr;

        while (curr) {
            if (curr->key == key) {
                // 找到了，准备删除
                if (prev == nullptr) {
                    // 删除的是头节点
                    buckets[idx] = curr->next;
                } else {
                    // 删除的是中间节点
                    prev->next = curr->next;
                }
                delete curr;
                size--;
                return;
            }
            prev = curr;
            curr = curr->next;
        }
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N;
    if (cin >> N) {
        MyHashMap map;
        string op;
        int x, y;
        while (N--) {
            cin >> op;
            if (op == "put") {
                cin >> x >> y;
                map.put(x, y);
            } else if (op == "get") {
                cin >> x;
                cout << map.get(x) << "\n";
            } else if (op == "remove") {
                cin >> x;
                map.remove(x);
            }
        }
    }
    return 0;
}

版本三：竞赛最优版 (Memory Pool Optimization)

适用场景：NOI/ICPC 金牌选手标准。对时间常数要求极高，防止内存碎片化。 优化点：

内存池 (Memory Pool)：不使用 new Node，而是预先开一个大数组 nodes[]。节点的 next 指针改为 int next (数组下标)。
- 优势：避免了 new 的系统调用开销，缓存命中率极高。
位运算取模：如果保证 capacity 永远是 2 的幂次，key % capacity 可以优化为 key & (capacity - 1)。
Fast IO：在 main 中使用。

/*
 * 版本 3: 静态内存池优化版 (OI Style)
 * 时间复杂度: 均摊 O(1)，常数极小
 * 空间复杂度: O(N)
 * 优化: 避免动态 new/delete，使用数组模拟链表
 */
#include <iostream>
#include <vector>
#include <cstring>

using namespace std;

// 预分配的最大节点数 (对应 N 的最大范围)
const int MAX_NODES = 100005;

struct Node {
    int key;
    int val;
    int next; // 存储的是 nodes 数组的下标，0 表示空指针
} nodes[MAX_NODES];

int idx_alloc = 0; // 内存池分配指针

// 简单的内存分配器
int new_node(int k, int v, int nxt) {
    ++idx_alloc;
    nodes[idx_alloc].key = k;
    nodes[idx_alloc].val = v;
    nodes[idx_alloc].next = nxt;
    return idx_alloc;
}

class FastHashMap {
private:
    vector<int> buckets; // 桶数组，存的是 node 的下标
    int capacity;
    int size;
    
    // 负载因子阈值：size * 4 > capacity * 3  <=> size > capacity * 0.75
    // 使用整数乘法避免浮点误差
    bool check_load_factor() {
        return (long long)size * 4 > (long long)capacity * 3;
    }

public:
    FastHashMap() {
        capacity = 16;
        size = 0;
        buckets.assign(capacity, 0); // 0 表示 nullptr
        idx_alloc = 0; // 重置内存池
    }

    void put(int key, int val) {
        // 位运算优化取模 (前提：capacity 是 2 的幂)
        // int idx = key % capacity; 
        int idx = key & (capacity - 1); 
        
        // 遍历链表查找是否存在
        for (int cur = buckets[idx]; cur != 0; cur = nodes[cur].next) {
            if (nodes[cur].key == key) {
                nodes[cur].val = val; // 更新
                return;
            }
        }

        // 不存在，插入新节点 (头插法)
        // new_node 从静态数组分配，速度极快
        buckets[idx] = new_node(key, val, buckets[idx]);
        size++;

        // 检查扩容
        if (check_load_factor()) {
            resize();
        }
    }

    int get(int key) {
        int idx = key & (capacity - 1);
        for (int cur = buckets[idx]; cur != 0; cur = nodes[cur].next) {
            if (nodes[cur].key == key) {
                return nodes[cur].val;
            }
        }
        return -1;
    }

    void remove(int key) {
        int idx = key & (capacity - 1);
        int cur = buckets[idx];
        int prev = 0;

        while (cur != 0) {
            if (nodes[cur].key == key) {
                if (prev == 0) {
                    // 删除头节点
                    buckets[idx] = nodes[cur].next;
                } else {
                    // 删除中间节点
                    nodes[prev].next = nodes[cur].next;
                }
                // 注意：静态内存池通常不回收内存(Lazy Delete)，因为总操作数有限
                // 仅仅断开链接即可。size 减小。
                size--;
                return;
            }
            prev = cur;
            cur = nodes[cur].next;
        }
    }

    void resize() {
        int old_capacity = capacity;
        int new_capacity = capacity << 1; // * 2
        
        // 创建新的桶数组
        vector<int> new_buckets(new_capacity, 0);
        
        // 遍历旧桶
        for (int i = 0; i < old_capacity; ++i) {
            int cur = buckets[i];
            while (cur != 0) {
                int next_node = nodes[cur].next; // 暂存后续
                
                // 重新计算 Hash (Rehash)
                int new_idx = nodes[cur].key & (new_capacity - 1);
                
                // 头插法搬移到新表
                nodes[cur].next = new_buckets[new_idx];
                new_buckets[new_idx] = cur;
                
                cur = next_node;
            }
        }
        
        // 接管新表
        buckets = move(new_buckets);
        capacity = new_capacity;
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int N;
    if (cin >> N) {
        FastHashMap map;
        string op;
        int x, y;
        while (N--) {
            cin >> op;
            if (op[0] == 'p') { // check 'p' of "put"
                cin >> x >> y;
                map.put(x, y);
            } else if (op[0] == 'g') { // check 'g' of "get"
                cin >> x;
                cout << map.get(x) << "\n";
            } else { // "remove"
                cin >> x;
                map.remove(x);
            }
        }
    }
    return 0;
}

👨‍🏫 核心知识点与复杂度分析

1. 为什么需要 Rehash (扩容)？

思考：如果不扩容，随着插入元素越来越多，size 远大于 capacity。根据鸽巢原理，每个桶（Bucket）里的链表会变得非常长。
后果：查找时间从 $O(1)$ 退化为 $O(N/M)$ （ $M$ 为桶数），甚至 $O(N)$ 。
策略：通过扩容（通常翻倍），capacity 变大，链表被拆散重新分布，平均长度再次回到常数级别。

2. 时间复杂度分析

普通操作 (put/get):
- 计算 Hash: $O(1)$
- 定位桶: $O(1)$
- 遍历链表: 平均链表长度 = $Load Factor$ (0.75)。这是一个常数。
- 结论: 平均 $O(1)$ 。
扩容操作:
- 需要遍历所有旧节点（ $N$ 个），重新计算 Hash。单次复杂度 $O(N)$ 。
- 均摊分析 (Amortized Analysis):
  - 第 1 次扩容：搬运 16 个。
  - 第 2 次扩容：搬运 32 个。
  - ...
  - 第 k 次扩容：搬运 $N$ 个。
  - 总搬运次数约为 $N + N/2 + N/4 + ... \approx 2N$ 。
  - 将这 $2N$ 的代价分摊到 $N$ 次插入操作中，每次操作只分摊了 $O(2) = O(1)$ 。
- 结论: 均摊 $O(1)$ 。

3. 空间复杂度分析

需要存储 $N$ 个节点 + $Capacity$ 个桶头指针。
因为 $Capacity \approx N$ ，所以总体为 $O(N)$ 。

4. 金牌选手的优化思路 (在版本 3 中体现)

静态数组 vs 动态指针: new 是一种非常慢的操作（涉及堆内存分配算法）。在竞赛中，预先分配一个大数组 Node pool[MAX_N] 并使用整数下标代替指针，可以显著提升速度（Cache Friendly）。
位运算取模: 计算机做除法（%）比位运算（&）慢得多。如果桶大小保证是 $2^k$ ，x % 2^k 等价于 x & (2^k - 1)。
字符串读取优化: 读取操作指令时，判断 op[0] ('p', 'g', 'r') 比比较整个字符串 op == "put" 更快。

View all 4 solutions

ID: 19472
时间: 1000ms
内存: 128MiB
难度: 10
标签: (无)
递交数: 5
已通过: 2
上传者: tobyforever

4 条题解

版本一：暴力扫描法 (Brute Force)

版本二：标准拉链法哈希表 (Standard Chaining Hash Map)

版本三：竞赛最优版 (Memory Pool Optimization)

👨‍🏫 核心知识点与复杂度分析

1. 为什么需要 Rehash (扩容)？

2. 时间复杂度分析

3. 空间复杂度分析

4. 金牌选手的优化思路 (在版本 3 中体现)

信息

状态

开发

支持

关于

4 条题解

版本一：暴力扫描法 (Brute Force)

版本二：标准拉链法哈希表 (Standard Chaining Hash Map)

版本三：竞赛最优版 (Memory Pool Optimization)

👨‍🏫 核心知识点与复杂度分析

1. 为什么需要 Rehash (扩容)？

2. 时间复杂度分析

3. 空间复杂度分析

4. 金牌选手的优化思路 (在版本 3 中体现)

手写hashmap实现原理

信息

状态

开发

支持

关于

还没有账户？

登录