Layout recognition: identifying heading levels and body text

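import random

import pandas as pd

# Load the full training set as a list of [sentence, label] rows.
data = pd.read_csv("data/train_data_weipu.csv").values.tolist()

# Shuffle once in place; a second shuffle adds no extra randomness.
random.shuffle(data)

# Split the shuffled rows into two halves.
split = int(len(data) * 0.5)
train_1 = data[:split]
train_2 = data[split:]

# Write each half to its own CSV file, without the index column.
pd.DataFrame(train_1, columns=["sentence", "label"]).to_csv("data/train_1_data_weipu.csv", index=False)
pd.DataFrame(train_2, columns=["sentence", "label"]).to_csv("data/train_2_data_weipu.csv", index=False)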
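
The script above produces a different split on every run because the shuffle is unseeded. If a repeatable split is wanted, one possible variant is sketched below. It is not part of the original script: the helper name `split_in_half`, the `seed` parameter, and the demo rows are all hypothetical, and it uses `DataFrame.sample` in place of `random.shuffle`.

```python
import pandas as pd


def split_in_half(df: pd.DataFrame, seed: int = 42):
    """Shuffle rows reproducibly and return the two halves (hypothetical helper)."""
    # frac=1 returns every row in random order; random_state fixes that order.
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    split = int(len(shuffled) * 0.5)
    return shuffled.iloc[:split], shuffled.iloc[split:]


# Made-up rows standing in for data/train_data_weipu.csv.
demo = pd.DataFrame(
    {
        "sentence": ["s1", "s2", "s3", "s4", "s5"],
        "label": ["title", "body", "body", "title", "body"],
    }
)

part_1, part_2 = split_in_half(demo)
print(len(part_1), len(part_2))
```

With an odd number of rows the second half receives the extra row, matching the `int(len(data)*0.5)` rule in the script. On the real data, each half could then be written out with `to_csv(..., index=False)` as before.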