update readme add ckpt from hf

This commit is contained in:
LiangSong 2023-04-16 23:50:36 +08:00
parent b21441b14b
commit ad3d943a7d
2 changed files with 15 additions and 2 deletions

README.md

@@ -2,7 +2,7 @@
* @Author: LiangSong(sl12160010@gmail.com)
* @Date: 2023-03-10 21:18:35
* @LastEditors: LiangSong(sl12160010@gmail.com)
* @LastEditTime: 2023-04-09 22:48:28
* @LastEditTime: 2023-04-16 23:49:06
* @FilePath: /Open-Llama/README.md
* @Description:
*
@@ -18,6 +18,12 @@ Open-Llama is an open source project that provides a complete set of training processes for building large language models
**Using the same evaluation method as the FastChat project, we compared Open-Llama against GPT-3.5; in our tests it reaches 84% of GPT-3.5's performance on Chinese questions. The detailed test results and checkpoint will be released soon.**
The Instruct-tuned checkpoint has been open-sourced on [HuggingFace](https://huggingface.co/s-JoL/Open-Llama-V1). To use the checkpoint, first install the latest version of Transformers with the following command:
```bash
pip install git+https://github.com/s-JoL/transformers.git@dev
```
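A minimal usage sketch follows. Assumptions not confirmed by this README: that the forked Transformers exposes the checkpoint through the standard `AutoModelForCausalLM`/`AutoTokenizer` entry points, and that the `user:`/`system:` prompt format matches the instruction-tuning data.

```python
# Minimal loading sketch -- assumes the s-JoL/transformers fork registers the
# model with the standard Auto classes, and that the instruction-tuned model
# expects a "user:...\nsystem:" prompt. Both are assumptions, not confirmed here.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("s-JoL/Open-Llama-V1")
model = AutoModelForCausalLM.from_pretrained("s-JoL/Open-Llama-V1")

# Encode a prompt and generate a short continuation.
inputs = tokenizer("user:Hello! How are you?\nsystem:", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```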
We have completed pre-training on 300B tokens, 80K steps in total, with a global batch size of 4M, the same as in Llama.
The instruction-tuning data is composed of 7 parts in total; the model shows certain programming, mathematical, and multi-turn dialogue abilities. See the Instruction-Tuning section for details on the data.

README_en.md

@@ -2,7 +2,7 @@
* @Author: LiangSong(sl12160010@gmail.com)
* @Date: 2023-03-10 21:18:35
* @LastEditors: LiangSong(sl12160010@gmail.com)
* @LastEditTime: 2023-04-08 00:03:57
* @LastEditTime: 2023-04-16 23:49:28
* @FilePath: /Open-Llama/README_en.md
* @Description:
*
@@ -15,6 +15,13 @@ Translated by ChatGPT.
Open-Llama is an open source project that provides a complete set of training processes for building large-scale language models, from data preparation to tokenization, pre-training, instruction tuning, and reinforcement learning techniques such as RLHF.
## Progress
The checkpoint after Instruct-tuning has been open-sourced on [HuggingFace](https://huggingface.co/s-JoL/Open-Llama-V1).
To use the checkpoint, first install the latest version of Transformers with the following command:
```bash
pip install git+https://github.com/s-JoL/transformers.git@dev
```
We completed pre-training on 300 billion tokens over a total of 80,000 steps, with a global batch size of 4 million, consistent with Llama. We constructed the instruction-tuning dataset from a total of 7 parts of data, giving the model certain programming, mathematical, and multi-turn dialogue abilities. For details on the data, please refer to the instruction-tuning section.
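As a quick consistency check on these numbers: 300 billion tokens over 80,000 steps works out to 3.0e11 / 8.0e4 = 3.75 million tokens per step, in line with the stated global batch size of roughly 4 million.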
[Demo](http://home.ustc.edu.cn/~sl9292/)