diff --git a/README.md b/README.md
index 81e922e..2371d06 100644
--- a/README.md
+++ b/README.md
@@ -33,6 +33,7 @@ Open-Llama是一个开源项目,提供了一整套用于构建大型语言模
 对于7B模型,使用Transformers中Pytorch原生版本的Llama模型训练速度为1378 token/s/gpu,使用本代码库训练速度达到3290 token/s/gpu,基本达到[Llama原文](https://arxiv.org/pdf/2302.13971.pdf)中的3370 token/s/gpu。
 如果使用500B token进行预训练,需要训练43000 GPU时。按照Google Cloud上A100-80G Spot的价格计算,8卡每小时价格为12.6美元,则总价格为67725美元。
 当使用未加速版本训练时,价格为158744美元。最终降低训练成本9万美元。
+更多测试可见[和其他开源模型性能对比](https://github.com/Bayes-Song/Open-Llama#%E5%92%8C%E5%85%B6%E4%BB%96%E5%BC%80%E6%BA%90%E6%A8%A1%E5%9E%8B%E6%80%A7%E8%83%BD%E5%AF%B9%E6%AF%94)
 
 ### 通用性
 在训练语言模型时,我们希望能够构建一个通用的模型,可以适用于不同的语言和不同的领域。为了实现这一点,我们采用了以下策略:
diff --git a/README_en.md b/README_en.md
index 407d769..e5ebfee 100644
--- a/README_en.md
+++ b/README_en.md
@@ -31,7 +31,7 @@ Since training large language models is costly, high performance is also crucial
 For the 7B model, the training speed of the Llama model using the PyTorch native version in the Transformers library is 1378 tokens/s/GPU. With our code, the training speed reaches 3290 tokens/s/GPU, which is close to the reported 3370 tokens/s/GPU in the [Llama paper](https://arxiv.org/pdf/2302.13971.pdf).
 If we pretrain with 500 billion tokens, it will take 43,000 GPU hours. Assuming the price of A100-80G Spot on Google Cloud is $12.6 per hour for 8 GPUs, the total cost will be $67,725. Without acceleration, the cost would be $158,744. Our method reduces the training cost by $91,019 in total.
-
+More comparison can be found in [Comparison of Performance with Other Open-Source Models](https://github.com/Bayes-Song/Open-Llama/blob/main/README_en.md#performance-comparison-with-other-open-source-models).
 
 ### Universality
 When training language models, we aim to build a universal model that can be used for different languages and fields. To achieve this, we adopt the following strategies:
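The cost figures quoted in the patch follow from a short back-of-the-envelope calculation. The sketch below is not part of the patch; it simply recomputes those numbers in plain Python from the stated throughputs and the quoted $12.6/hour price for an 8x A100-80G Spot node, so small differences against the README come only from its rounding of ~42,200 GPU-hours up to 43,000.

```python
# Sanity check of the throughput/cost arithmetic quoted in the README diff above.
# Inputs are the figures from the text: 500B-token budget, 3290 vs. 1378 tokens/s/GPU,
# and $12.6 per hour for an 8-GPU Spot node (i.e. $12.6 / 8 per GPU-hour).

TOKENS = 500e9                  # pretraining budget: 500B tokens
FAST = 3290                     # tokens/s/GPU with this codebase
SLOW = 1378                     # tokens/s/GPU with the native Transformers implementation
PRICE_PER_GPU_HOUR = 12.6 / 8   # quoted 8x A100-80G Spot price, per GPU-hour

def gpu_hours(tokens_per_s_per_gpu: float) -> float:
    """GPU-hours needed to consume the token budget at the given per-GPU throughput."""
    return TOKENS / tokens_per_s_per_gpu / 3600

fast_hours = gpu_hours(FAST)    # ~42,200 GPU-hours; the README rounds this to 43,000
slow_hours = gpu_hours(SLOW)    # ~100,800 GPU-hours

fast_cost = fast_hours * PRICE_PER_GPU_HOUR   # ~$66,500; README's $67,725 uses the rounded 43,000
slow_cost = slow_hours * PRICE_PER_GPU_HOUR   # ~$158,700, matching the quoted $158,744

print(f"accelerated:   {fast_hours:,.0f} GPU-hours, ${fast_cost:,.0f}")
print(f"unaccelerated: {slow_hours:,.0f} GPU-hours, ${slow_cost:,.0f}")
print(f"saving:        ${slow_cost - fast_cost:,.0f}")   # roughly $90k, consistent with the README
```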