update readme

commit d269affb42 (parent 6988c69884)
````diff
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
  * @Author: LiangSong(sl12160010@gmail.com)
  * @Date: 2023-03-10 21:18:35
  * @LastEditors: LiangSong(sl12160010@gmail.com)
- * @LastEditTime: 2023-05-15 23:00:11
+ * @LastEditTime: 2023-05-17 21:16:42
  * @FilePath: /Open-Llama/README.md
  * @Description:
  *
@@ -52,7 +52,6 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 ```
 The CheckPoint after pre-training only is also uploaded to [s-JoL/Open-Llama-V2-pretrain](https://huggingface.co/s-JoL/Open-Llama-V2-pretrain).
-The model [PR](https://github.com/huggingface/transformers/pull/22795) has been submitted for merging into the Transformers main branch.
 
 We have completed 330B token pre-training, training a total of 80 K steps. The Global Batch Size is consistent with Llama at 4M.
 Using a total of 7 parts of data to constitute the Instruction-tuning data, the model has certain programming abilities, mathematical abilities, and multi-turn dialogue abilities. Specific data can be found in the Instruction-Tuning section.
@@ -74,7 +73,7 @@ Below is a display of the model's multi-turn dialogue ability regarding code:
 
 | | DeepSpeed Stage | Offload | Activation Checkpoint | Total Token | GPU hours | Speed token/s/gpu | Batch Size |
 |----------------|-----------------|---------|-----------------------|-------------|-----------|-------------------|------------|
-| Open-Llama 7B | 1 | False | False | 173.7B | 13412 | 3587 | 2 |
+| Open-Llama 7B | 1 | False | False | 173.7B | 13412 | 3620 | 2 |
 | Open-Llama 13B | 3 | False | True | - | - | 1856 | 24 |
 | Open-Llama 33B | 3 | False | True | - | - | 708 | 12 |
 | Open-Llama 65B | 3 | True | True | - | - | 369 | 12 |
@@ -85,7 +84,7 @@ Below is a display of the model's multi-turn dialogue ability regarding code:
 
 **[2023.4.28] Release v2.0**
 
-This update mainly includes the following aspects, increasing the effective training speed by **50%** compared to the v1 version, reducing padding from **30%** to **5%**, and improving training speed from **3200 tokens/s** to **3587 tokens/s**. 0.95 * 3587 / (0.7 * 3200) = 1.521
+This update mainly includes the following aspects, increasing the effective training speed by **50%** compared to the v1 version, reducing padding from **30%** to **5%**, and improving training speed from **3200 tokens/s** to **3620 tokens/s**. 0.95 * 3620 / (0.7 * 3200) = 1.521
 
 1. Use Hugging Face's datasets library for data reading, with the process as follows:
    1. Use the transform function to unify data formats from different datasets to {'text': 'xxx'}
````
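The final hunk above describes the v2.0 data-reading flow: unify every source dataset to the `{'text': 'xxx'}` schema with a transform function, then tokenize. Below is a minimal sketch of that flow, assuming a hypothetical JSON corpus path and field names and assuming the tokenizer loads from the published pretrain checkpoint; the repository's actual preprocessing scripts may differ.

```python
# Minimal sketch of the two steps named in the README (illustrative paths/fields).
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical raw corpus; the real project mixes several different sources.
raw = load_dataset("json", data_files="data/corpus-*.jsonl", split="train")

def transform(example):
    # Step 1: unify different schemas into the common {'text': 'xxx'} format.
    return {"text": example.get("content") or example.get("text", "")}

# Assumption: the tokenizer can be loaded from the published checkpoint.
tokenizer = AutoTokenizer.from_pretrained("s-JoL/Open-Llama-V2-pretrain")

def tokenize(example):
    # Step 2: tokenize the unified text field.
    return tokenizer(example["text"])

dataset = (
    raw.map(transform, remove_columns=raw.column_names)
       .map(tokenize, remove_columns=["text"])
)
```

Sequence packing/concatenation, which the README credits for cutting padding from 30% to 5%, would sit between these two steps and is omitted from the sketch.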
````diff
diff --git a/README_zh.md b/README_zh.md
@@ -2,7 +2,7 @@
  * @Author: LiangSong(sl12160010@gmail.com)
  * @Date: 2023-03-10 21:18:35
  * @LastEditors: LiangSong(sl12160010@gmail.com)
- * @LastEditTime: 2023-05-15 22:59:30
+ * @LastEditTime: 2023-05-17 21:17:41
  * @FilePath: /Open-Llama/README_zh.md
  * @Description:
  *
@@ -53,7 +53,6 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 ```
 只经过预训练的CheckPoint也上传至[s-JoL/Open-Llama-V2-pretrain](https://huggingface.co/s-JoL/Open-Llama-V2-pretrain)。
-模型已提交[PR](https://github.com/huggingface/transformers/pull/22795)合并至Transformers main分支。
 
 我们完成了330B token的预训练,总共训练80 K step,Global Batch Size和Llama中一致为4M。
 使用总共7部分数据构成Instruction-tuning数据,模型具有一定的编程能力、数学能力和多轮对话能力,具体数据见Instruction-Tuning部分。
@@ -75,7 +74,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 | | DeepSpeed Stage | Offload | Activation Checkpoint | Total Token | GPU hours | Speed token/s/gpu | Batch Size |
 |----------------|-----------------|---------|-----------------------|-------------|-----------|-------------------|------------|
-| Open-Llama 7B | 1 | False | False | 173.7B | 13412 | 3587 | 2 |
+| Open-Llama 7B | 1 | False | False | 173.7B | 13412 | 3620 | 2 |
 | Open-Llama 13B | 3 | False | True | - | - | 1856 | 24 |
 | Open-Llama 33B | 3 | False | True | - | - | 708 | 12 |
 | Open-Llama 65B | 3 | True | True | - | - | 369 | 12 |
@@ -86,7 +85,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 **[2023.4.28] Release v2.0**
 
-本次更新主要包含以下几个方面,相对于v1版本提升有效训练速度**50%**,其中pad从**30%**减少至**5%**,训练速度从**3200token/s**提升至**3587token/s**。0.95 * 3600/(0.7 * 3200)=1.521
+本次更新主要包含以下几个方面,相对于v1版本提升有效训练速度**50%**,其中pad从**30%**减少至**5%**,训练速度从**3200token/s**提升至**3620token/s**。0.95 * 3620/(0.7 * 3200)=1.521
 1. 使用Hugging Face的datasets库进行数据读取,具体流程如下
    1. 使用transform函数将不同数据集的数据统一格式为{'text': 'xxx'}
    2. 使用Tokenizer进行分词
````
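The training-configuration table in both READMEs pairs each model size with a ZeRO stage, an offload flag, an activation-checkpoint flag, and a per-GPU batch size. As a point of reference only, a DeepSpeed-style config mirroring the 65B row (stage 3, offload enabled, micro batch size 12) might look like the sketch below; the bf16 setting is an assumption, and none of this is taken from the repository's actual config files.

```python
# Illustrative DeepSpeed config for the "Open-Llama 65B" row of the table:
# ZeRO stage 3 with optimizer/parameter offload and micro batch size 12.
# (bf16 is an assumption; the project's real configs may differ.)
ds_config = {
    "train_micro_batch_size_per_gpu": 12,
    "gradient_accumulation_steps": 1,  # placeholder; tune to reach the 4M-token global batch
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
}
```

The Activation Checkpoint column is typically handled on the model side rather than in this dict, for example by enabling gradient checkpointing on the transformers model before wrapping it with DeepSpeed.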