update readme
parent 0f7751e2ec
commit 889e42fd45
@@ -33,6 +33,7 @@ Open-Llama is an open-source project that provides a complete set of … for building large language models
For the 7B model, training with the PyTorch-native Llama implementation in the Transformers library reaches 1378 tokens/s/GPU, while this codebase reaches 3290 tokens/s/GPU, essentially matching the 3370 tokens/s/GPU reported in the [Llama paper](https://arxiv.org/pdf/2302.13971.pdf).
Pretraining on 500B tokens would take 43,000 GPU hours. At the Google Cloud A100-80G Spot price of $12.6 per hour for 8 GPUs, the total cost comes to $67,725.
Training with the unaccelerated version would cost $158,744, so the optimization ultimately cuts the training cost by roughly $90,000.
More benchmark results can be found in [Performance Comparison with Other Open-Source Models](https://github.com/Bayes-Song/Open-Llama#%E5%92%8C%E5%85%B6%E4%BB%96%E5%BC%80%E6%BA%90%E6%A8%A1%E5%9E%8B%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94).
### Universality
When training a language model, we want to build a universal model that can be applied to different languages and different domains. To achieve this, we adopt the following strategies:
@@ -31,7 +31,7 @@ Since training large language models is costly, high performance is also crucial
For the 7B model, the training speed of the Llama model using the PyTorch native version in the Transformers library is 1378 tokens/s/GPU. With our code, the training speed reaches 3290 tokens/s/GPU, which is close to the reported 3370 tokens/s/GPU in the [Llama paper](https://arxiv.org/pdf/2302.13971.pdf).
If we pretrain with 500 billion tokens, it will take 43,000 GPU hours. Assuming the price of A100-80G Spot on Google Cloud is $12.6 per hour for 8 GPUs, the total cost will be $67,725.
Without acceleration, the cost would be $158,744. Our method reduces the training cost by $91,019 in total.
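
As a rough sanity check, the short Python sketch below recomputes these GPU-hour and cost figures from the numbers quoted above; the 500B-token budget, the two throughputs, and the $12.6/hour price for 8 GPUs are taken from this README, and the small gap on the accelerated row presumably comes from the README rounding the GPU-hour figure up to 43,000.

```python
# Back-of-the-envelope check of the pretraining cost figures quoted above.
# Inputs taken from this README: 500B tokens, 3290 vs. 1378 tokens/s/GPU,
# and $12.6/hour for an 8x A100-80G Spot instance on Google Cloud.

TOKENS = 500e9                   # pretraining token budget
PRICE_PER_GPU_HOUR = 12.6 / 8    # $12.6/hour for 8 GPUs -> $1.575 per GPU-hour


def gpu_hours_and_cost(tokens_per_s_per_gpu: float) -> tuple[float, float]:
    """GPU-hours and USD needed to process TOKENS at the given per-GPU throughput."""
    gpu_hours = TOKENS / tokens_per_s_per_gpu / 3600
    return gpu_hours, gpu_hours * PRICE_PER_GPU_HOUR


for name, speed in [("accelerated (this repo)", 3290), ("PyTorch-native Llama", 1378)]:
    hours, cost = gpu_hours_and_cost(speed)
    print(f"{name}: {hours:,.0f} GPU-hours, ~${cost:,.0f}")

# accelerated (this repo): 42,215 GPU-hours, ~$66,489
#   (rounding the hours up to 43,000 gives the quoted $67,725)
# PyTorch-native Llama: 100,790 GPU-hours, ~$158,745 (quoted above as $158,744)
```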
More comparisons can be found in [Comparison of Performance with Other Open-Source Models](https://github.com/Bayes-Song/Open-Llama/blob/main/README_en.md#performance-comparison-with-other-open-source-models).
### Universality
When training language models, we aim to build a universal model that can be used for different languages and fields. To achieve this, we adopt the following strategies: