update readme

LiangSong 2023-03-27 13:39:01 +08:00
parent ee80f3a5cf
commit 6dd4907629
3 changed files with 15 additions and 1 deletion


@@ -30,6 +30,9 @@ Open-Llama is an open-source project that provides a complete set of tools for building large language models
- **Fused CUDA kernels**: Using the fused CUDA kernels provided by xformers fuses multiple operations into a single kernel, cutting intermediate GPU memory reads/writes and kernel-launch overhead, which improves training efficiency.
- **Parallel training**: We use the Accelerate library to support parallel training on multiple GPUs and speed up training (see the sketch below).
For the 7B model, training with the native PyTorch Llama implementation in Transformers runs at 1378 tokens/s/GPU, while this codebase reaches 3290 tokens/s/GPU, essentially matching the 3370 tokens/s/GPU reported in the Llama paper.
Pretraining on 500B tokens would take about 43,000 GPU-hours. At Google Cloud's A100-80G Spot price of $12.6 per hour for an 8-GPU machine, that comes to $67,725 in total.
Training with the unaccelerated version would instead cost $158,744, so the accelerated setup cuts the training cost by roughly $90,000.
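To make these two points concrete, here is a minimal, hypothetical sketch; the toy model, data, and loss are placeholders, and only `xformers.ops.memory_efficient_attention` and the Accelerate calls are real library APIs (this is not the repo's actual training loop):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
import xformers.ops as xops

# Fused attention: a single fused CUDA kernel replaces separate
# matmul / softmax / matmul launches. Shapes: [batch, seq, heads, head_dim].
q = k = v = torch.randn(1, 2048, 32, 128, device="cuda", dtype=torch.float16)
out = xops.memory_efficient_attention(q, k, v)

# Parallel training with Accelerate: the same script runs on one or many
# GPUs when started via `accelerate launch train.py`.
model = torch.nn.Linear(128, 128)  # toy stand-in for the Llama model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
loader = DataLoader(TensorDataset(torch.randn(64, 128)), batch_size=8)

accelerator = Accelerator()
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for (x,) in loader:
    loss = model(x).pow(2).mean()  # placeholder loss
    accelerator.backward(loss)     # replaces loss.backward(); syncs gradients
    optimizer.step()
    optimizer.zero_grad()
```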
### Universality
When training language models, we want a general-purpose model that works across different languages and domains. To achieve this, we adopt the following strategies:
@@ -132,6 +135,10 @@ Trainable params: 6,885,879,808
Non-trainable params: 0
Total mult-adds (G): 6.89
```
Current Progress
![](assets/loss.png)
### Instruction-Tuning
### RLHF


@@ -26,6 +26,12 @@ Since training large language models is costly, high performance is also crucial
- **Fused CUDA kernels**: Using the fused CUDA kernels provided by xformers fuses multiple operations into a single kernel, reducing intermediate GPU memory reads/writes and kernel-launch overhead, and improving training efficiency.
- **Parallel training**: We use the Accelerate library to support parallel training on multiple GPUs, accelerating the training process.
For the 7B model, the training speed of the native PyTorch Llama implementation in the Transformers library is 1378 tokens/s/GPU. With our code, the training speed reaches 3290 tokens/s/GPU, close to the 3370 tokens/s/GPU reported in the Llama paper.
Pretraining with 500 billion tokens would take about 43,000 GPU-hours. At the Google Cloud A100-80G Spot price of $12.6 per hour for 8 GPUs, the total cost comes to $67,725; see the worked calculation below.
Without acceleration, the cost would be $158,744, so our method reduces the training cost by $91,019.
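The arithmetic behind these figures can be checked directly; a small sketch assuming the 500B-token budget and the 8-GPU Spot price quoted above:

```python
TOKENS = 500e9
PRICE_PER_GPU_HOUR = 12.6 / 8  # $12.6/hour for an 8x A100-80G Spot machine

def cost(tokens_per_s_per_gpu: float) -> tuple[float, float]:
    gpu_hours = TOKENS / tokens_per_s_per_gpu / 3600
    return gpu_hours, gpu_hours * PRICE_PER_GPU_HOUR

for name, speed in (("accelerated", 3290), ("native", 1378)):
    hours, dollars = cost(speed)
    print(f"{name}: {hours:,.0f} GPU-hours, ${dollars:,.0f}")

# Prints approximately:
#   accelerated:  42,215 GPU-hours, $66,489  (the text rounds hours up to 43,000 -> $67,725)
#   native:      100,790 GPU-hours, $158,745 (matching the ~$158,744 above)
```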
### Universality
When training language models, we aim to build a universal model that can be used for different languages and fields. To achieve this, we adopt the following strategies:
@@ -120,7 +126,8 @@ Trainable params: 6,885,879,808
Non-trainable params: 0
Total mult-adds (G): 6.89
```
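The summary above looks like torchinfo-style output; here is a minimal, hypothetical sketch of producing such a summary (the tiny config is only so the snippet runs anywhere — the real figures come from the full 7B model):

```python
import torch
from torchinfo import summary
from transformers import LlamaConfig, LlamaForCausalLM

# Deliberately tiny config; swap in the real 7B checkpoint for the numbers above.
config = LlamaConfig(hidden_size=256, intermediate_size=688,
                     num_hidden_layers=2, num_attention_heads=4)
model = LlamaForCausalLM(config)
summary(model, input_size=(1, 128), dtypes=[torch.long])
```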
Current Progress
![](assets/loss.png)
### Instruction-Tuning
### RLHF

BIN assets/loss.png (new binary file, 95 KiB)