From 16811d0efeef750247cd6b0c6657505b4a52fe4e Mon Sep 17 00:00:00 2001
From: LiangSong <sl12160010@gmail.com>
Date: Mon, 8 May 2023 22:29:24 +0800
Subject: [PATCH] update readme

---
 README.md    | 10 ++++++----
 README_zh.md |  9 ++++++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 45af213..6b9e6a8 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
  * @Author: LiangSong(sl12160010@gmail.com)
  * @Date: 2023-03-10 21:18:35
  * @LastEditors: LiangSong(sl12160010@gmail.com)
- * @LastEditTime: 2023-05-08 22:25:57
+ * @LastEditTime: 2023-05-08 22:28:51
  * @FilePath: /Open-Llama/README.md
  * @Description:
  *
@@ -61,9 +61,11 @@ Below is a display of the model's multi-turn dialogue ability regarding code:
 
 **[2023.5.8] Release v2.1**
 
-This update adds support for larger model training. Using DeepSpeed stage3 + offload + activation checkpoint, you can **train a 65B model on a single machine with 8 A100-80G**.
-At the same time, the peft library is introduced to **support training such as lora**.
-The following table compares the training speed of Open-Llama and the original Llama, and the performance data of Llama is quoted from the original Llama paper.
+- This update adds support for larger model training. Using DeepSpeed stage3 + offload + activation checkpoint, you can **train a 65B model on a single machine with 8 A100-80G**.
+
+- The peft library is introduced to **support training such as lora**.
+
+- The following table compares the training speed of Open-Llama and the original Llama, and the performance data of Llama is quoted from the original Llama paper.
 | | DeepSpeed Stage | Offload | Activation Checkpoint | Total Token | GPU hours | Speed token/s/gpu | Batch Size | CPU Memory |
 |----------------|-----------------|---------|-----------------------|-------------|-----------|-------------------|------------|------------|
 | Open-Llama 7B | 1 | False | False | 173.7B | 13412 | 3587 | 2 | 94G |
diff --git a/README_zh.md b/README_zh.md
index f061ab5..0cd1616 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -2,7 +2,7 @@
  * @Author: LiangSong(sl12160010@gmail.com)
  * @Date: 2023-03-10 21:18:35
  * @LastEditors: LiangSong(sl12160010@gmail.com)
- * @LastEditTime: 2023-05-08 22:25:28
+ * @LastEditTime: 2023-05-08 22:28:40
  * @FilePath: /Open-Llama/README_zh.md
  * @Description:
  *
@@ -62,8 +62,11 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 **[2023.5.8] Release v2.1**
 
-本次更新加入对更大模型训练的支持,使用DeepSpeed stage3 + offload + activation checkpoint可以在**单机8卡A100-80G训练65B模型**。同时引入peft库**支持lora**等训练。
-下表对比了Open-Llama和Llama原文的训练速度,Llama性能数据引自Llama原文。
+- 本次更新加入对更大模型训练的支持,使用DeepSpeed stage3 + offload + activation checkpoint可以在**单机8卡A100-80G训练65B模型**。
+
+- 引入peft库**支持lora**等训练。
+
+- 下表对比了Open-Llama和Llama原文的训练速度,Llama性能数据引自Llama原文。
 | | DeepSpeed Stage | Offload | Activation Checkpoint | Total Token | GPU hours | Speed token/s/gpu | Batch Size | CPU Memory |
 |----------------|-----------------|---------|-----------------------|-------------|-----------|-------------------|------------|------------|
 | Open-Llama 7B | 1 | False | False | 173.7B | 13412 | 3587 | 2 | 94G |
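The v2.1 notes in the patch above attribute the 65B single-node result to DeepSpeed stage 3 with offload and activation checkpointing. As a rough illustration of what that combination looks like, here is a minimal sketch of a ZeRO stage-3 configuration with CPU offload, written as a Python dict; it is not taken from the Open-Llama repository, and the batch-size and precision values are placeholder assumptions.

```python
# Illustrative sketch only: a DeepSpeed ZeRO stage-3 config with CPU offload,
# expressed as a Python dict (DeepSpeed accepts the same structure as JSON).
# Batch size and bf16 settings are placeholder assumptions, not the
# repository's actual values.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # ZeRO stage 3: shard params, grads, optimizer states
        "offload_optimizer": {"device": "cpu"},  # keep optimizer states in CPU memory
        "offload_param": {"device": "cpu"},      # keep parameters in CPU memory when not in use
    },
}

# Activation (gradient) checkpointing is enabled on the model itself, e.g. for a
# Hugging Face transformers model:
# model.gradient_checkpointing_enable()
```

Offloading optimizer states and parameters to CPU trades GPU memory for host memory and PCIe traffic, which is consistent with the release notes reporting a CPU Memory column alongside the training-speed figures.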
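The same notes mention LoRA-style training via the peft library. Below is a minimal sketch of wrapping a causal language model with a LoRA adapter using peft; the checkpoint name and hyperparameters are illustrative assumptions, not the repository's actual settings.

```python
# Illustrative sketch of LoRA fine-tuning setup with peft; the checkpoint path
# and hyperparameters below are assumptions, not Open-Llama's training config.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("path/to/base-llama-checkpoint")  # placeholder path

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the small LoRA matrices remain trainable
```

With this wrapper only the low-rank adapter matrices receive gradients, which is what makes fine-tuning large checkpoints feasible on limited GPU memory.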