# Text Rewriting Project

Text generation tasks based on the UniLM model and T5, implemented with the Keras framework. The data processing scripts are in the data_do folder.

Training data: train_yy.txt

## Training

Train T5: python task_seq2seq_t5.py

Train SimBERT: python simbert_train.py

## Prediction

SimBERT: python predict_sim.py

T5: python predict_t5.py

## API service

Start the service that accepts a sentence and returns a uuid: bash run_app_nohub_t5.sh

Start the service that looks up the rewriting result by uuid: bash run_app_nohub_search_redis.sh

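The two services above follow a submit-then-look-up pattern: one call hands back a uuid, and a second call retrieves the rewriting result stored under that uuid. The sketch below illustrates that pattern only. It assumes an in-memory dict in place of the Redis store suggested by the script name, and the names `submit_sentence`, `process_task`, `lookup_result`, and the placeholder `rewrite` function are hypothetical, not taken from this repository.

```python
import uuid

# Hypothetical in-memory stand-in for the Redis store suggested by the
# script name run_app_nohub_search_redis.sh; a real deployment would use
# a Redis client instead of a dict.
RESULTS = {}


def rewrite(sentence):
    """Placeholder for the T5/SimBERT model call (assumption): reverses the text."""
    return sentence[::-1]


def submit_sentence(sentence):
    """Register a sentence and return the uuid used to fetch its result later."""
    task_id = str(uuid.uuid4())
    RESULTS[task_id] = {"status": "pending", "text": sentence, "result": None}
    return task_id


def process_task(task_id):
    """Run the placeholder rewriting step and store its output under the uuid."""
    task = RESULTS[task_id]
    task["result"] = rewrite(task["text"])
    task["status"] = "done"


def lookup_result(task_id):
    """Return the stored record for a uuid, or None if the uuid is unknown."""
    return RESULTS.get(task_id)
```

Keeping submission and lookup separate lets a slow model run in the background while the caller holds only the uuid.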
## Request and response examples

Requesting a sentence uuid: https://console-docs.apipost.cn/preview/e3717e390cbdb50e/f4479038c8015f34

Requesting the rewriting result: https://console-docs.apipost.cn/preview/6b9de12817e8ef08/b158334d2c9534d2

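A client for these two endpoints would typically submit a sentence, keep the returned uuid, and poll until the result is available. The following is a minimal sketch under stated assumptions: the helper names `post_json` and `poll_for_result` are hypothetical, and the URLs and JSON field names in the commented usage are placeholders. The actual request format is described in the linked API pages.

```python
import json
import time
import urllib.request


def post_json(url, payload, timeout=10.0):
    """POST a JSON payload and decode the JSON response body."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_for_result(fetch, task_id, interval=1.0, max_tries=30):
    """Call fetch(task_id) until it returns a non-None value or tries run out."""
    for attempt in range(max_tries):
        result = fetch(task_id)
        if result is not None:
            return result
        if attempt < max_tries - 1:
            time.sleep(interval)  # wait before asking again
    return None


# Hypothetical usage; the URLs and field names below are placeholders:
# task = post_json("http://localhost:8000/submit", {"text": "..."})
# result = poll_for_result(
#     lambda tid: post_json("http://localhost:8001/search", {"uuid": tid}).get("result"),
#     task["uuid"],
# )
```

Passing the lookup as a `fetch` callable keeps the polling loop independent of the real endpoint, so it can be exercised without a running service.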
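## Generating training data from the yy data

Process the raw yy data: python data_do/处理yy数据原始数据.py

Further process the duplication-reduction (降重) data: python data_do/进一步处理降重数据.py

Process the yy training data: python data_do/yy训练数据处理.py

Filter the training data by string similarity: python 筛选训练数据strsim.py

Merge the data: python 合并数据.py
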
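The filtering step can be pictured as keeping only sentence pairs whose string similarity falls in a useful range: pairs that are nearly identical teach the model little, and pairs that share almost nothing are likely misaligned. The sketch below is an illustration, not the logic of 筛选训练数据strsim.py. It assumes `difflib.SequenceMatcher` as the similarity measure, and the thresholds `low` and `high` are hypothetical values.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def filter_pairs(pairs, low=0.3, high=0.9):
    """Keep (source, target) pairs whose similarity lies in [low, high].

    The thresholds are illustrative assumptions, not values from the repository.
    """
    kept = []
    for src, tgt in pairs:
        score = similarity(src, tgt)
        if low <= score <= high:
            kept.append((src, tgt))
    return kept
```

For example, with the default thresholds an identical pair (ratio 1.0) and a pair with no characters in common (ratio 0.0) are both dropped, while a pair that shares half of its characters is kept.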
## Testing on 11 documents

## Testing the data for bugs

python 测试10000篇数据.py