单词替换

tobyforever LV 5 SU @ 2026-1-9 20:52:53

对于“单词替换”这道题，我们从最直观的暴力匹配开始，逐步过渡到 NOI 竞赛中最常用的 Trie 树最优解。

版本 1：暴力匹配版 (Brute Force)

思路：将字典存储在 vector 中。对于句子中的每个单词，遍历整个字典，检查是否有词根是该单词的前缀。如果有多个匹配，记录最短的一个。

#include <iostream>
#include <vector>
#include <string>
#include <sstream>

using namespace std;

// 检查 root 是否是 word 的前缀
bool isPrefix(const string& root, const string& word) {
    if (root.size() > word.size()) return false;
    return word.compare(0, root.size(), root) == 0;
}

int main() {
    int n;
    if (!(cin >> n)) return 0;
    vector<string> dict(n);
    for (int i = 0; i < n; ++i) cin >> dict[i];

    string line;
    getline(cin >> ws, line); // ws 过滤掉之前的换行符
    stringstream ss(line);
    string word, result = "";

    bool first = true;
    while (ss >> word) {
        string shortestRoot = word;
        for (const string& root : dict) {
            if (isPrefix(root, word)) {
                // 如果找到更短的词根，更新
                if (root.size() < shortestRoot.size()) {
                    shortestRoot = root;
                }
            }
        }
        if (!first) result += " ";
        result += shortestRoot;
        first = false;
    }
    cout << result << endl;
    return 0;
}

时间复杂度分析：
- 假设字典有 $N$ 个词，句子有 $M$ 个单词，单词平均长度为 $L$ 。
- 复杂度为 $O(M \times N \times L)$ 。当 $N$ 和 $M$ 都较大时，会严重超时。
空间复杂度： $O(N \times L)$ ，存储字典。

版本 2：哈希表优化版 (Hash Set)

思路：利用 unordered_set 存储词根。对于句子中的每个单词，从长度 1 开始截取前缀，去哈希表中查询。一旦查到，说明这是最短词根。

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <unordered_set>

using namespace std;

int main() {
    ios::sync_with_stdio(false); cin.tie(0); // 竞赛必备加速
    int n;
    cin >> n;
    unordered_set<string> dict;
    for (int i = 0; i < n; ++i) {
        string root; cin >> root;
        dict.insert(root);
    }

    string line;
    getline(cin >> ws, line);
    stringstream ss(line);
    string word;
    bool first = true;

    while (ss >> word) {
        string replace = word;
        // 尝试从长度 1 到 word.length() 截取前缀
        for (int i = 1; i <= word.length(); ++i) {
            string sub = word.substr(0, i);
            if (dict.count(sub)) {
                replace = sub;
                break; // 找到最短的，立即跳出
            }
        }
        if (!first) cout << " ";
        cout << replace;
        first = false;
    }
    cout << endl;
    return 0;
}

复杂度分析：
- 时间： $O(\sum L_{root} + M \times L^2)$ 。substr 和哈希计算都需要 $O(L)$ ，循环 $L$ 次，所以是 $O(L^2)$ 。
- 空间： $O(\sum L_{root})$ 。
优化建议：虽然比版本 1 快，但频繁构造 substr 依然有常数开销。

版本 3：Trie 树最优版 (NOIP 竞赛风格)

核心思想：利用 Trie 树的结构。在搜索单词时，只要碰到第一个 is_end 标记，就说明找到了最短词根。

#include <iostream>
#include <string>
#include <vector>
#include <sstream>

using namespace std;

// 竞赛常数定义：根据规模设定
// 总词根长度约 1000 * 100 = 10^5
const int MAXN = 100005;

int trie[MAXN][26];
bool is_end[MAXN];
int tot = 0;

// 标准插入函数
void insert(const string& s) {
    int p = 0;
    for (char c : s) {
        int id = c - 'a';
        if (!trie[p][id]) trie[p][id] = ++tot;
        p = trie[p][id];
    }
    is_end[p] = true;
}

// 核心逻辑：寻找最短匹配前缀
string findShortestRoot(const string& word) {
    int p = 0;
    string res = "";
    for (char c : word) {
        int id = c - 'a';
        // 如果路径断了，说明没有匹配的词根
        if (!trie[p][id]) return word;
        
        p = trie[p][id];
        res += c;
        // 关键点：一旦遇到标记，说明找到了最短词根
        if (is_end[p]) return res;
    }
    return word;
}

int main() {
    // 1. 加速 I/O
    ios::sync_with_stdio(false);
    cin.tie(0);

    int n;
    if (!(cin >> n)) return 0;
    for (int i = 0; i < n; ++i) {
        string root;
        cin >> root;
        insert(root);
    }

    // 2. 处理带空格的整行句子
    string line;
    getline(cin >> ws, line); 
    
    // 3. 单词拆分处理
    stringstream ss(line);
    string word;
    bool first = true;
    while (ss >> word) {
        if (!first) cout << " ";
        // 4. 调用 Trie 树查询
        cout << findShortestRoot(word);
        first = false;
    }
    cout << endl;

    return 0;
}

复杂度分析思考过程：

时间复杂度分析：
- 构建树： $O(\sum L_{root})$ ，每个字符只访问一次。
- 单词查询：对于句子中的每个字符，在 Trie 树上最多向下走一步。总时间复杂度为 $O(\text{句子总长度})$ 。
- 结论：这是最优复杂度 $O(S)$ ，其中 $S$ 是所有输入字符的总和。
空间复杂度分析：
- $O(\text{节点总数} \times 26)$ 。在本题规模下（ $10^5$ 节点），占用内存约为 $10^5 \times 26 \times 4$ 字节 $\approx 10$ MB，远低于常规竞赛 128MB 的限制。
易错点（教练提示）：
- 换行符污染：在 cin >> n 之后紧跟 getline 会读到一个残留的空行。使用 getline(cin >> ws, line) 可以完美跳过前导空格和换行。
- 最短原则：在 findShortestRoot 中，if (is_end[p]) 必须在循环内优先判断并立即 return。如果走到底才判断，就变成了“最长匹配”。
- 数组范围：MAXN 必须开够，否则会 Segmentation Fault。在 NOI 竞赛中，通常根据“字符串总长度”来估算节点数。

时间复杂度优化建议：

如果句子长度达到 $10^6$ 且对效率要求极高，可以不使用 stringstream 和 string +=，而是直接对 line 字符串进行单指针扫描，遇到空格直接输出并重置 Trie 指针，这样可以实现“零拷贝”处理，速度最快。

View all 2 solutions

ID: 19470
时间: 1000ms
内存: 128MiB
难度: 10
标签: (无)
递交数: 3
已通过: 1
上传者: tobyforever

2 条题解

版本 1：暴力匹配版 (Brute Force)

版本 2：哈希表优化版 (Hash Set)

版本 3：Trie 树最优版 (NOIP 竞赛风格)

复杂度分析思考过程：

时间复杂度优化建议：

信息

状态

开发

支持

关于

2 条题解

版本 1：暴力匹配版 (Brute Force)

版本 2：哈希表优化版 (Hash Set)

版本 3：Trie 树最优版 (NOIP 竞赛风格)

复杂度分析思考过程：

时间复杂度优化建议：

单词替换

信息

状态

开发

支持

关于

还没有账户？

登录