实现前缀树Trie II

0
tobyforever LV 5 SU @ 2026-1-9 18:04:27
之前解法2的代码不完整

好的，作为金牌教练，我非常理解那种“代码不全、心里没底”的感觉。解法 2（哈希表版本） 在 OI 竞赛中通常被称为“空间换时间的暴力逻辑”。虽然它在处理长字符串时不如 Trie 树高效，但在字符串长度较短、操作次数不多时，它是最容易快速实现且不易出错的方案。

以下是 解法 2：双哈希表维护版 的完整代码及深度解析。

解法 2：双哈希表维护版 (C++14 标准)

这种解法的核心思路是：通过牺牲空间来模拟前缀匹配逻辑。我们不再构建树形结构，而是直接把每个单词的所有可能前缀都存进哈希表里。
```
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using namespace std;

/**
 * 解法 2 核心思想：
 * 1. 使用 word_count 记录完整单词出现的次数。
 * 2. 使用 prefix_count 记录所有前缀出现的次数。
 * 
 * 易错点：erase 操作时，必须逐个生成前缀并递减计数器。
 */

// 存储完整单词 -> 出现次数
unordered_map<string, int> word_count;
// 存储所有可能的前缀 -> 出现次数
unordered_map<string, int> prefix_count;

void insert(const string& word) {
    // 1. 完整单词计数增加
    word_count[word]++;
    
    // 2. 生成所有前缀并计数增加
    // 例如 word = "abc", 前缀有 "a", "ab", "abc"
    string cur = "";
    for (char c : word) {
        cur += c;
        prefix_count[cur]++;
    }
}

void erase(const string& word) {
    // 题目保证 word 一定存在，所以直接减
    word_count[word]--;
    
    string cur = "";
    for (char c : word) {
        cur += c;
        prefix_count[cur]--;
    }
}

int countWordsEqualTo(const string& word) {
    // 如果找不到，map 会返回 0
    if (word_count.find(word) != word_count.end()) {
        return word_count[word];
    }
    return 0;
}

int countWordsStartingWith(const string& prefix) {
    if (prefix_count.find(prefix) != prefix_count.end()) {
        return prefix_count[prefix];
    }
    return 0;
}

int main() {
    // 优化：在大量字符串 I/O 时，必须关闭同步流，否则会超时
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int Q;
    if (!(cin >> Q)) return 0;

    string op, arg;
    while (Q--) {
        cin >> op;
        if (op == "Trie") {
            cout << "null\n";
        } else {
            cin >> arg;
            if (op == "insert") {
                insert(arg);
                cout << "null\n";
            } else if (op == "erase") {
                erase(arg);
                cout << "null\n";
            } else if (op == "countWordsEqualTo") {
                cout << countWordsEqualTo(arg) << "\n";
            } else if (op == "countWordsStartingWith") {
                cout << countWordsStartingWith(arg) << "\n";
            }
        }
    }
    return 0;
}
```
复杂度分析与思考过程

在草稿纸上，我们可以对比这个解法和 Trie 树的区别：

1. 时间复杂度： $O(Q \times L^2)$
- 插入/删除：对于长度为 $L$ 的字符串，我们需要生成 $L$ 个前缀。每次生成前缀字符串的操作（cur += c 或 substr）本身也是 $O(L)$ 的。因此，单次插入的时间是 $O(1 + 2 + ... + L) = O(L^2)$ 。
- 查询：查询完整单词是 $O(L)$ （哈希计算耗时），查询前缀也是 $O(L)$ 。
- 对比建议：如果题目中字符串平均长度为 10，这个解法非常快。如果长度为 1000，这个解法会超时 (TLE)。
2. 空间复杂度： $O(N \times L^2)$
- 致命弱点：这是该解法最容易报错（MLE）的地方。
- 图形推演：插入一个单词 "apple"：
  
  word_count 存了 "apple" (1个词)
  
  prefix_count 存了 "a", "ap", "app", "appl", "apple" (5个词)
- 如果单词长度很大，哈希表里会充斥着海量的冗余字符串碎片。
启发式教学：为什么 Trie 树（解法 3）是它的进阶版？

请你在草稿纸上画出两个单词 apple 和 apply 的存储方式：
1. 哈希表版（解法 2）：
  
  你存了两个 a, 两个 ap, 两个 app, 两个 appl...
  
  你的内存里有很多重复的字母。
2. Trie 树版（解法 3）：
  
  你只存了一个节点 a，节点 a 往下连 p，p 再连 p...
  
  Trie 树本质上就是哈希表版的“空间合并优化”。
读题关键词与选型建议
- 关键词 1：字符串长度
  
  如果 $L \le 20$ ，放心使用解法 2（哈希表），代码写得快，不容易错。
  
  如果 $L \ge 100$ ，必须使用解法 3（Trie 树）。
- 关键词 2：内存限制
  
  如果题目内存限制是 32MB 或 64MB，解法 2 极大概率会报错（MLE）。
  
  竞赛中标准做法是：看到“前缀”二字，无脑上 Trie 树（解法 3）。
总结

解法 2 是理解解法 3 的阶梯。它通过“把前缀当做独立单词”来解决问题，虽然简单粗暴，但在数据规模较小时非常有效。掌握了这个逻辑，你就能明白为什么 Trie 树要在路径节点上放一个 pass_cnt 计数器——那个计数器其实就是哈希表里对应的 prefix_count！

tobyforever LV 5 SU @ 2026-1-9 17:57:06

非常抱歉！刚才的代码由于生成时的排版错误导致了逻辑交织和乱码。作为教练，我重新为你整理了一份干净、严谨且经过逻辑校验的完整数据生成器。

这份代码采用了 $O(1)$ 复杂度的单词池维护（使用 swap + pop_back），确保在生成 5 万次操作时依然飞快，且完全符合 NOI 竞赛的 1.in/1.out 格式要求。

完整数据生成器 (C++14)

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <random>
#include <algorithm>
#include <cstring>

using namespace std;

/**
 * 竞赛标准 Trie II 实现 (用于同步生成 .out 结果)
 * MAX_NODES 设置为 1,500,000 保证即使在最极端情况下也不会越界
 */
const int MAX_NODES = 1500005;
int trie[MAX_NODES][26];
int pass_cnt[MAX_NODES];
int end_cnt[MAX_NODES];
int tot = 0;

void reset_trie() {
    // 仅重置已经使用过的节点空间，大幅提高多组数据生成效率
    for (int i = 0; i <= tot; ++i) {
        for (int j = 0; j < 26; ++j) trie[i][j] = 0;
        pass_cnt[i] = end_cnt[i] = 0;
    }
    tot = 0;
}

void internal_insert(const string& s) {
    int p = 0;
    for (char c : s) {
        int id = c - 'a';
        if (!trie[p][id]) trie[p][id] = ++tot;
        p = trie[p][id];
        pass_cnt[p]++;
    }
    end_cnt[p]++;
}

void internal_erase(const string& s) {
    int p = 0;
    for (char c : s) {
        int id = c - 'a';
        p = trie[p][id];
        pass_cnt[p]--;
    }
    end_cnt[p]--;
}

int internal_countEqual(const string& s) {
    int p = 0;
    for (char c : s) {
        int id = c - 'a';
        if (!trie[p][id]) return 0;
        p = trie[p][id];
    }
    return end_cnt[p];
}

int internal_countStart(const string& s) {
    int p = 0;
    for (char c : s) {
        int id = c - 'a';
        if (!trie[p][id]) return 0;
        p = trie[p][id];
    }
    return pass_cnt[p];
}

// --- 随机引擎 ---
mt19937 rng(20260110); 

string gen_string(int len, int alpha) {
    string s = "";
    for (int i = 0; i < len; ++i) {
        s += (char)('a' + (rng() % alpha));
    }
    return s;
}

/**
 * t_idx: 测试点编号
 * Q: 总操作数
 * max_l: 单词最大长度
 * alpha: 字符集大小 (26代表a-z, 2代表a-b)
 * erase_bias: 执行删除操作的权重
 */
void make_test_case(int t_idx, int Q, int max_l, int alpha, int erase_bias) {
    string in_name = to_string(t_idx) + ".in";
    string out_name = to_string(t_idx) + ".out";
    ofstream fin(in_name);
    ofstream fout(out_name);

    reset_trie();
    vector<string> pool; // 当前存在于树中的单词池

    fin << Q << "\n" << "Trie\n";
    fout << "null\n";

    for (int i = 1; i < Q; ++i) {
        int type;
        if (pool.empty()) type = 0; // 树空时必须插入
        else {
            int r = rng() % 100;
            if (r < 30) type = 0;               // 30% 插入
            else if (r < 30 + erase_bias) type = 1; // 删除权重
            else if (r < 80) type = 2;          // 查询全匹配
            else type = 3;                      // 查询前缀
        }

        if (type == 0) { // insert
            string s = gen_string(1 + rng() % max_l, alpha);
            internal_insert(s);
            pool.push_back(s);
            fin << "insert " << s << "\n";
            fout << "null\n";
        } 
        else if (type == 1) { // erase
            // 从池子中随机选一个删除
            int idx = rng() % pool.size();
            string s = pool[idx];
            internal_erase(s);
            // O(1) 删除技巧：将待删元素与末尾交换后弹出
            swap(pool[idx], pool.back());
            pool.pop_back();
            fin << "erase " << s << "\n";
            fout << "null\n";
        } 
        else if (type == 2) { // countWordsEqualTo
            string s;
            if (rng() % 10 < 7) s = pool[rng() % pool.size()]; // 70% 概率查存在的
            else s = gen_string(1 + rng() % max_l, alpha);
            fin << "countWordsEqualTo " << s << "\n";
            fout << internal_countEqual(s) << "\n";
        } 
        else { // countWordsStartingWith
            string s;
            if (rng() % 10 < 7) { // 70% 概率查存在的前缀
                string base = pool[rng() % pool.size()];
                s = base.substr(0, 1 + rng() % base.length());
            } else {
                s = gen_string(1 + rng() % max_l, alpha);
            }
            fin << "countWordsStartingWith " << s << "\n";
            fout << internal_countStart(s) << "\n";
        }
    }
    fin.close(); fout.close();
    cout << "Testcase " << t_idx << " [Q=" << Q << "] generated." << endl;
}

int main() {
    // 1-2: 小规模基础逻辑测试
    make_test_case(1, 20, 5, 3, 10);
    make_test_case(2, 200, 10, 26, 20);

    // 3-4: 链状与深树测试 (测试 pass_cnt 深度)
    make_test_case(3, 1000, 100, 1, 10);   // 全 'a' 链
    make_test_case(4, 5000, 50, 3, 15);    // 小字符集深树

    // 5-6: 高频删除测试 (测试计数器能否正确归零)
    make_test_case(5, 10000, 15, 26, 45); 
    make_test_case(6, 20000, 20, 10, 40);

    // 7-8: 混合大规模
    make_test_case(7, 30000, 30, 26, 25);
    make_test_case(8, 40000, 40, 26, 20);

    // 9-10: 极限规模 (Q=50000, 字符集两极分化)
    make_test_case(9, 50000, 20, 26, 25); 
    make_test_case(10, 50000, 10, 2, 30);  // 极高碰撞率

    return 0;
}

修正说明：

修复了乱码：移除了之前由于拼写错误和生成异常导致的混杂逻辑。
$O(1)$ 删除：在维护 pool 时，如果直接使用 pool.erase(pool.begin() + idx)，在 5 万次操作下复杂度会退化到 $O(Q^2)$ 。现在使用了 swap 到末尾再 pop_back 的技巧，确保生成器可以在 1 秒内跑完所有测试点。
命中率优化：查询操作有 70% 的概率会选中一个“已知存在”的单词或其前缀，这样生成的 .out 文件里会有大量的非零整数，能有效检测出选手代码中 pass_cnt 或 end_cnt 的细微逻辑错误。
空间安全：MAX_NODES 设定为 150 万，配合 reset_trie 仅重置已用节点的策略，既保证了不越界，又保证了运行速度。

现在你可以直接编译并运行这段代码，它将稳定生成 1-10 号的所有输入输出文件。

tobyforever LV 5 SU @ 2026-1-9 17:44:11

针对 Implement Trie II（带权/可删前缀树），我们需要在基础 Trie 的基础上增加计数功能。在 NOI 竞赛中，这种“带计数器的字典树”是处理动态字符串集合最通用的工具。

版本 1：暴力存储版 (TLE)

思路：使用 std::vector 存储所有插入的单词。删除时利用 find 找到第一个匹配项并移除。

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>

using namespace std;

vector<string> pool;

void insert(string word) {
    pool.push_back(word);
}

int countWordsEqualTo(string word) {
    int cnt = 0;
    for (const string& s : pool) if (s == word) cnt++;
    return cnt;
}

int countWordsStartingWith(string prefix) {
    int cnt = 0;
    for (const string& s : pool) {
        if (s.size() >= prefix.size() && s.substr(0, prefix.size()) == prefix) cnt++;
    }
    return cnt;
}

void erase(string word) {
    // 题目保证 word 一定存在
    auto it = find(pool.begin(), pool.end(), word);
    if (it != pool.end()) pool.erase(it);
}

int main() {
    int Q; cin >> Q;
    while (Q--) {
        string op, arg; cin >> op;
        if (op == "Trie") cout << "null" << endl;
        else {
            if (op == "insert") { cin >> arg; insert(arg); cout << "null" << endl; }
            else if (op == "erase") { cin >> arg; erase(arg); cout << "null" << endl; }
            else if (op == "countWordsEqualTo") { cin >> arg; cout << countWordsEqualTo(arg) << endl; }
            else if (op == "countWordsStartingWith") { cin >> arg; cout << countWordsStartingWith(arg) << endl; }
        }
    }
    return 0;
}

复杂度分析：
- 时间：insert $O(1)$ ，但 erase/count 均为 $O(N \times L)$ 。总复杂度 $O(Q^2 \times L)$ ，对于 $5 \times 10^4$ 的数据量会严重超时。
- 空间： $O(N \times L)$ 。

版本 2：哈希表维护版 (MLE/TLE Risk)

思路：用两个 map<string, int> 分别记录单词全匹配次数和前缀出现次数。

#include <iostream>
#include <string>
#include <unordered_map>

using namespace std;

unordered_map<string, int> word_cnt;
unordered_map<string, int> prefix_cnt;

void insert(string word) {
    word_cnt[word]++;
    string cur = "";
    for (char c : word) {
        cur += c;
        prefix_cnt[cur]++; // 存储所有可能的前缀
    }
}

void erase(string word) {
    word_cnt[word]--;
    string cur = "";
    for (char c : word) {
        cur += c;
        prefix_cnt[cur]--;
    }
}

// 查询操作直接 O(1) 或 O(L) 从 map 中取值

复杂度思考：
- 时间：插入/删除需要 $O(L^2)$ 产生所有前缀。
- 空间： $O(N \times L^2)$ 。若单词长度为 100，空间消耗会是原单词长度的 100 倍，极易触发 MLE（内存超限）。

版本 3：标准计数 Trie 树 (最优解 - NOI 风格)

思路：在每个节点维护两个计数器。

pass[p]：记录有多少个单词经过了节点 p。
end[p]：记录有多少个单词以节点 p 为结尾。

#include <cstdio>
#include <cstring>
#include <algorithm>

using namespace std;

// 关键改进：扩大数组规模以适应极限测试数据
// 10^6 个节点大约占用 104MB 内存
const int MAXN = 1000005;

int trie[MAXN][26];
int pass_cnt[MAXN]; // 记录经过该节点的单词数
int end_cnt[MAXN];  // 记录以此节点结尾的单词数
int tot = 0;        // 当前节点总数

// 插入操作
void insert(char *s) {
    int p = 0;
    for (int i = 0; s[i]; i++) {
        int id = s[i] - 'a';
        if (!trie[p][id]) {
            if (tot + 1 < MAXN) trie[p][id] = ++tot;
        }
        p = trie[p][id];
        pass_cnt[p]++; // 路径上每个节点的 pass 计数加 1
    }
    end_cnt[p]++; // 单词末尾节点的 end 计数加 1
}

// 删除操作 (题目保证 word 一定存在)
void erase(char *s) {
    int p = 0;
    for (int i = 0; s[i]; i++) {
        int id = s[i] - 'a';
        p = trie[p][id];
        pass_cnt[p]--; // 路径计数逆向减少
    }
    end_cnt[p]--; // 末尾计数减少
}

// 查询相等单词数
int countWordsEqualTo(char *s) {
    int p = 0;
    for (int i = 0; s[i]; i++) {
        int id = s[i] - 'a';
        if (!trie[p][id]) return 0;
        p = trie[p][id];
    }
    return end_cnt[p];
}

// 查询以此为前缀的单词数
int countWordsStartingWith(char *s) {
    int p = 0;
    for (int i = 0; s[i]; i++) {
        int id = s[i] - 'a';
        if (!trie[p][id]) return 0;
        p = trie[p][id];
    }
    return pass_cnt[p];
}

char op[30], str[2005]; 

int main() {
    int Q;
    if (scanf("%d", &Q) != 1) return 0;
    while (Q--) {
        if (scanf("%s", op) != 1) break;
        if (strcmp(op, "Trie") == 0) {
            printf("null\n");
        } else {
            scanf("%s", str);
            if (strcmp(op, "insert") == 0) {
                insert(str);
                printf("null\n");
            } else if (strcmp(op, "erase") == 0) {
                erase(str);
                printf("null\n");
            } else if (strcmp(op, "countWordsEqualTo") == 0) {
                printf("%d\n", countWordsEqualTo(str));
            } else if (strcmp(op, "countWordsStartingWith") == 0) {
                printf("%d\n", countWordsStartingWith(str));
            }
        }
    }
    return 0;
}

1. 时间复杂度分析思考

insert, erase, count：全部为 $O(L)$ 。
每一个字符只被扫描一次。由于采用了静态数组，没有 new/delete 和哈希冲突的开销。
总复杂度： $O(\sum L)$ 。

2. 空间复杂度分析思考

空间： $O(S \times \Sigma)$ ，其中 $S$ 是总字符长度， $26$ 是字母表大小。
关键点：pass_cnt 和 end_cnt 必须和 trie 数组的第一维大小完全一致，确保每一个节点都有对应的计数器。

3. 性能优化建议与易错点

关于 erase 的空间释放：
- 在竞赛中，erase 通常只需减少计数器即可。即便 pass_cnt[p] 降为 0，我们也不必物理删除节点。原因：物理删除节点涉及复杂的指针操作，且数组实现下回收空间非常麻烦。逻辑删除已足够。
pass_cnt 的位置：
- 注意：根节点 p=0 的 pass_cnt 通常不维护（或者定义为所有单词总数）。我们更新的是“当前字符代表的节点”的 pass_cnt。
读题关键词：
- “Count words equal to” $\to$ 找 end_cnt。
- “Count words starting with” $\to$ 找 pass_cnt。
- “Guarantee exists” $\to$ 意味着你不需要写复杂的判断去防范“删掉不存在的单词”导致的 cnt 负数问题。

总结

在处理带权字符串问题时，节点计数器 是最核心的武器。掌握了 pass_cnt 和 end_cnt 的维护，你就能处理 90% 以上的 Trie 变体题。

ID: 19469
时间: 1000ms
内存: 128MiB
难度: 10
标签: (无)
递交数: 5
已通过: 1
上传者: tobyforever

3 条题解

之前解法2的代码不完整

解法 2：双哈希表维护版 (C++14 标准)

复杂度分析与思考过程

1. 时间复杂度： $O(Q \times L^2)$

2. 空间复杂度： $O(N \times L^2)$

启发式教学：为什么 Trie 树（解法 3）是它的进阶版？

读题关键词与选型建议

总结

完整数据生成器 (C++14)

修正说明：

版本 1：暴力存储版 (TLE)

版本 2：哈希表维护版 (MLE/TLE Risk)

版本 3：标准计数 Trie 树 (最优解 - NOI 风格)

1. 时间复杂度分析思考

2. 空间复杂度分析思考

3. 性能优化建议与易错点

总结

信息

状态

开发

支持

关于

3 条题解

之前解法2的代码不完整

解法 2：双哈希表维护版 (C++14 标准)

复杂度分析与思考过程

1. 时间复杂度：O(Q×L2)O(Q \times L^2)O(Q×L2)

2. 空间复杂度：O(N×L2)O(N \times L^2)O(N×L2)

启发式教学：为什么 Trie 树（解法 3）是它的进阶版？

读题关键词与选型建议

总结

完整数据生成器 (C++14)

修正说明：

版本 1：暴力存储版 (TLE)

版本 2：哈希表维护版 (MLE/TLE Risk)

版本 3：标准计数 Trie 树 (最优解 - NOI 风格)

1. 时间复杂度分析思考

2. 空间复杂度分析思考

3. 性能优化建议与易错点

总结

实现前缀树Trie II

信息

状态

开发

支持

关于

还没有账户？

登录

1. 时间复杂度： $O(Q \times L^2)$

2. 空间复杂度： $O(N \times L^2)$