石油分馏模拟系统

tobyforever LV 5 SU @ 2025-12-12 17:21:56

一、思路提示

数据的有序化：
- 题目要求在一个范围内查找。如果数据是乱序的，每次查询都要遍历 $N$ 个元素，总复杂度是 $O(N \times M)$ 。
- 当 $N, M = 10^5$ 时，运算量达到 $10^{10}$ ，这会超时（Time Limit Exceeded）。
- 关键策略：先将所有物质按照沸点进行排序（Sorting）。
查找的加速：
- 排序后，沸点满足 $[L, R]$ 的物质在数组中一定是连续的一段。
- 我们需要找到：
  - 第一个沸点 $\ge L$ 的位置（区间的起点）。
  - 第一个沸点 $> R$ 的位置（区间的终点之后）。
- 在一个有序数组中查找特定值的位置，最快的方法是二分查找（Binary Search）。
标准库的利用：
- C++ 的 <algorithm> 库提供了非常好用的二分查找函数：
  - std::lower_bound(begin, end, val): 查找第一个 $\ge val$ 的元素迭代器。
  - std::upper_bound(begin, end, val): 查找第一个 $> val$ 的元素迭代器。
- 利用这两个函数，可以在 $O(\log N)$ 的时间内处理每一个查询。

二、预备知识点总结

结构体（Struct）：将名称和沸点打包存储。
排序（Sorting）：std::sort 以及自定义比较函数（Comparator）。
二分查找（Binary Search）：lower_bound 和 upper_bound 的理解与使用。
时间复杂度：理解为什么 $O(M \log N)$ 优于 $O(MN)$ 。

三、启发式教学：草稿纸上的推理过程

教练：“想象一下，仓库里乱七八糟地堆着几万瓶化学试剂。现在老板让你把沸点在 100 到 200 度的都挑出来。如果你一瓶瓶看，是不是累死了？”

学生：“是的，太慢了。”

教练：“那第一步我们该做什么？”

学生：“把它们按沸点从低到高排好队放在架子上！”

教练：“非常好。现在架子上的试剂沸点是这样的（画数轴）：” ... -161(M), -89(E), -1(B), 125(O), 174(D) ...

教练：“现在查询区间是 [-100, 0]。我们不需要从头数。我们只要找到两个‘切割点’。”

左刀：我们要找第一个 $\ge -100$ $\geq - 100$ 的位置。
- 看中间：-1。比 -100 大。往左找。
- 看左边：-161。比 -100 小。往右找。
- 找到了 -> -89 (Ethane)。这是起跑线。
右刀：我们要找第一个 $> 0$ $> 0$ 的位置（因为包含 0，所以我们要找大于 0 的才算越界）。
- 找到了 -> 125 (Octane)。这是终点线外面。

教练：“起跑线是下标 1，终点线外是下标 3。那么在这个范围内的数量就是 $3 - 1 = 2$ 个。它们分别是下标 1 和下标 2。沸点最低的就是起跑线那个——下标 1 (Ethane)。”

总结算法：

sort。
lower_bound 找 $L$ 。
upper_bound 找 $R$ 。
算距离，输出。

四、读题关键词总结

“快速计算” / $N, M = 10^5$ $\rightarrow$ 暗示 $O(N \log N)$ 算法，必用二分或高级数据结构。
“按沸点排序” $\rightarrow$ 提示解题的第一步。
“沸点最低的名称” $\rightarrow$ 排序后，区间内的第一个元素就是沸点最低的。

五、标准题解代码 (C++14)

/**
 * 题目：石油分馏模拟系统 (Petroleum Fractionation)
 * 算法：排序 + 二分查找 (Sorting + Binary Search)
 * 难度：GESP 5级 / CSP-J T2
 */

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>

using namespace std;

// 1. 定义结构体存储物质信息
struct Substance {
    string name;
    int bp; // Boiling Point
};

// 2. 自定义排序规则
// 按沸点从小到大排序；若沸点相同，按名称字典序排
bool compareSubstances(const Substance& a, const Substance& b) {
    if (a.bp != b.bp) {
        return a.bp < b.bp;
    }
    return a.name < b.name;
}

// 辅助比较函数，用于 lower_bound 仅比较 int
bool compareByBP(const Substance& a, int val) {
    return a.bp < val;
}
// 辅助比较函数，用于 upper_bound 仅比较 int
bool compareValByBP(int val, const Substance& a) {
    return val < a.bp;
}

int main() {
    // I/O 加速
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int N;
    if (!(cin >> N)) return 0;

    vector<Substance> subs(N);
    for (int i = 0; i < N; ++i) {
        cin >> subs[i].name >> subs[i].bp;
    }

    // 3. 排序：这是二分查找的前提
    sort(subs.begin(), subs.end(), compareSubstances);

    int M;
    cin >> M;
    while (M--) {
        int L, R;
        cin >> L >> R;

        // 4. 二分查找
        // lower_bound: 查找第一个 bp >= L 的位置
        // 注意：我们在结构体数组中查找整数，需要自定义 lambda 或者比较函数
        auto it_low = lower_bound(subs.begin(), subs.end(), L, 
            [](const Substance& s, int val) {
                return s.bp < val;
            });

        // upper_bound: 查找第一个 bp > R 的位置
        auto it_high = upper_bound(subs.begin(), subs.end(), R, 
            [](int val, const Substance& s) {
                return val < s.bp;
            });

        // 计算数量：迭代器相减
        int count = it_high - it_low;

        if (count > 0) {
            // it_low 指向的就是区间内第一个元素（沸点最低）
            cout << count << " " << it_low->name << "\n";
        } else {
            cout << "0 None\n";
        }
    }

    return 0;
}

时间与空间复杂度分析

时间复杂度：
- 排序： $O(N \log N)$ 。
- 查询：每次查询调用两次二分查找，复杂度 $O(\log N)$ 。共 $M$ 次，总计 $O(M \log N)$ 。
- 总时间： $O((N+M) \log N)$ 。完全满足 $10^5$ 数据量的要求。
空间复杂度：
- 存储数组 $O(N)$ 。

六、数据生成器 (Generator.cpp)

/**
 * 题目：石油分馏模拟系统 (Petroleum Fractionation)
 * 修复版数据生成器
 * 
 * 修复内容：
 * - 修正了 Case 3, 5, 10 中查询数量(M)与实际生成的查询数据不匹配的问题。
 * - 之前 M=10 但只 push 了一条数据，导致输入文件数据缺失，
 *   使做题程序在读取后续查询时遇到 EOF 产生错误输出，与标准答案行数不符。
 */

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
#include <random>
#include <ctime>

using namespace std;

mt19937 rng(time(0));

struct Sub {
    string name;
    int bp;
};

// 核心解法 (用于生成 .out)
void solve(int N, vector<Sub> subs, const vector<pair<int,int>>& queries, vector<string>& out) {
    // 排序逻辑
    sort(subs.begin(), subs.end(), [](const Sub& a, const Sub& b){
        if(a.bp != b.bp) return a.bp < b.bp;
        return a.name < b.name;
    });
    
    for(auto q : queries) {
        int L = q.first;
        int R = q.second;
        
        // 二分查找逻辑
        auto it_l = lower_bound(subs.begin(), subs.end(), L, [](const Sub& a, int val){ return a.bp < val; });
        auto it_r = upper_bound(subs.begin(), subs.end(), R, [](int val, const Sub& a){ return val < a.bp; });
        
        int cnt = distance(it_l, it_r);
        if(cnt > 0) out.push_back(to_string(cnt) + " " + it_l->name);
        else out.push_back("0 None");
    }
}

// 随机名称生成
string rand_name() {
    int len = rng() % 5 + 3; // 3-7 chars
    string s = "";
    s += (char)('A' + rng()%26);
    for(int i=1; i<len; ++i) s += (char)('a' + rng()%26);
    return s;
}

void make_case(int id) {
    int N, M;
    vector<Sub> subs;
    vector<pair<int,int>> queries;

    switch(id) {
        case 1: // 样例
            N = 5; M = 3;
            subs = {{"Methane", -161}, {"Butane", -1}, {"Octane", 125}, {"Decane", 174}, {"Ethane", -89}};
            queries = {{-200, -50}, {-100, 0}, {100, 200}};
            break;
            
        case 2: // 极小数据
            N = 10; M = 5;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), (int)(rng()%200)});
            for(int i=0; i<M; ++i) queries.push_back({0, 200});
            break;
            
        case 3: // 【已修复】范围外查询 (空)
            N = 100; M = 10;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), 0}); // 全是0
            // 修复：循环 M 次推入查询，确保输入文件有 M 行查询数据
            for(int k=0; k<M; ++k) queries.push_back({10, 20});
            break;
            
        case 4: // 范围内全包
            N = 100; M = 1;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), (int)(rng()%100)});
            queries.push_back({-100, 200});
            break;
            
        case 5: // 【已修复】沸点重复
            N = 1000; M = 100;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), 50});
            // 修复：循环 M 次推入查询
            for(int k=0; k<M; ++k) queries.push_back({50, 50});
            break;
            
        case 6: // 随机中等
            N = 1000; M = 1000;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), (int)(rng()%1000 - 200)});
            for(int i=0; i<M; ++i) {
                int l = rng()%1000 - 200;
                queries.push_back({l, l + 100});
            }
            break;
            
        case 7: // 负数区间
            N = 1000; M = 100;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), (int)(rng()%100 - 273)});
            for(int i=0; i<M; ++i) queries.push_back({-273, -100});
            break;
            
        case 8: // 大数据
            N = 100000; M = 100000;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), (int)(rng()%2000 - 1000)});
            for(int i=0; i<M; ++i) {
                int l = rng()%2000 - 1000;
                queries.push_back({l, l + rng()%500});
            }
            break;
            
        case 9: // 大数据，区间极小
            N = 100000; M = 100000;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), (int)(rng()%2000 - 1000)});
            for(int i=0; i<M; ++i) {
                int l = rng()%2000 - 1000;
                queries.push_back({l, l}); // 单点查询
            }
            break;
            
        case 10: // 【已修复】满区间
            N = 100000; M = 5;
            for(int i=0; i<N; ++i) subs.push_back({rand_name(), (int)(rng()%2000 - 1000)});
            // 修复：循环 M 次推入查询
            for(int k=0; k<M; ++k) queries.push_back({-5000, 5000});
            break;
    }

    // 写入 .in
    string in_file = to_string(id) + ".in";
    ofstream fout_in(in_file);
    fout_in << N << "\n";
    for(auto s : subs) fout_in << s.name << " " << s.bp << "\n";
    fout_in << M << "\n";
    for(auto q : queries) fout_in << q.first << " " << q.second << "\n";
    fout_in.close();

    // 计算并写入 .out
    vector<string> ans;
    solve(N, subs, queries, ans);
    
    string out_file = to_string(id) + ".out";
    ofstream fout_out(out_file);
    for(auto s : ans) fout_out << s << "\n";
    fout_out.close();

    cout << "Generated Case " << id << " (N=" << N << ", M=" << M << ")" << endl;
}

int main() {
    ios_base::sync_with_stdio(false);
    for(int i=1; i<=10; ++i) make_case(i);
    return 0;
}

这道题目将石油分馏的化学知识与排序、二分查找的算法知识完美融合。它直观地展示了为什么要对数据进行排序（为了分馏），以及如何高效地在有序数据中截取片段（二分），非常适合作为 GESP 5 级的训练题。

ID: 19316
时间: 1000ms
内存: 32MiB
难度: 10
标签: (无)
递交数: 2
已通过: 1
上传者: tobyforever

1 条题解

一、思路提示

二、预备知识点总结

三、启发式教学：草稿纸上的推理过程

四、读题关键词总结

五、标准题解代码 (C++14)

时间与空间复杂度分析

六、数据生成器 (Generator.cpp)

信息

状态

开发

支持

关于

1 条题解

一、 思路提示

二、 预备知识点总结

三、 启发式教学：草稿纸上的推理过程

四、 读题关键词总结

五、 标准题解代码 (C++14)

时间与空间复杂度分析

六、 数据生成器 (Generator.cpp)

石油分馏模拟系统

信息

状态

开发

支持

关于

还没有账户？

登录

一、思路提示

二、预备知识点总结

三、启发式教学：草稿纸上的推理过程

四、读题关键词总结

五、标准题解代码 (C++14)

六、数据生成器 (Generator.cpp)