10 Commits

Author     SHA1        Date                        Message
LiangSong  32583a41a7  2023-05-09 14:47:59 +08:00  update wudao download and preprocess
LiangSong  758af69c73  2023-05-05 19:00:37 +08:00  update science instruct-tuning datasets
LiangSong  d24b4cce54  2023-05-05 18:20:59 +08:00  update preprocess format
LiangSong  85caa97a6a  2023-05-05 17:05:41 +08:00  add xP3 dataset and belle_2M
LiangSong  dba2e2d680  2023-05-04 08:34:38 +08:00  update ShareGPT_90K preprocess
LiangSong  f8f4cde228  2023-04-24 19:13:53 +08:00  using huggingface datasets to accelerate training, using open-llama to pretrain
LiangSong  9f140dc99f  2023-04-05 23:51:56 +08:00  update preprocess_instruction, add math/code/multiturn_chat and etc.
LiangSong  a62ac2658f  2023-03-30 23:43:12 +08:00  add instruction-tuning
LiangSong  918a8cdc3d  2023-03-27 14:34:59 +08:00  reformat code with black
LiangSong  73a81a4205  2023-03-26 23:59:53 +08:00  add high-performance Llama pre-train code