Commit Graph

13 Commits

Author     SHA1        Date                        Message
LiangSong  85caa97a6a  2023-05-05 17:05:41 +08:00  add xP3 dataset and belle_2M
LiangSong  f0d41f937b  2023-05-04 08:45:21 +08:00  update instruct_config and set all random seed to 42
LiangSong  c2184c6dd1  2023-05-03 00:02:01 +08:00  support multiple epochs
LiangSong  f05e929aad  2023-05-02 21:42:55 +08:00  update config
LiangSong  fc21a75d1e  2023-04-29 20:28:39 +08:00  add continue training
LiangSong  49118aad42  2023-04-27 23:42:11 +08:00  update header config and add padding to concat_multiple_sequence
LiangSong  db6cdb51d0  2023-04-27 19:42:06 +08:00  unified pre-training and instrcution-tuning both use train_lm and dataset
LiangSong  97aff0e051  2023-04-27 00:04:11 +08:00  use split_dataset_by_node instead accelerate.prepare to accelerate data loading by 50%
LiangSong  0377b43628  2023-04-26 18:53:30 +08:00  update tokenizer to LlamaTokenizer
LiangSong  f41f5558ec  2023-04-24 23:19:07 +08:00  update header
LiangSong  f8f4cde228  2023-04-24 19:13:53 +08:00  using huggingface datasets to accelerate training, using open-llama to pretrain
LiangSong  3f62a23ee2  2023-04-12 22:16:15 +08:00  update format
LiangSong  a4aa109dd3  2023-04-12 17:59:05 +08:00  add trainer and utils