fix typo
This commit is contained in:
parent 2df3e622e9
commit 7da40f1c83
@@ -61,7 +61,7 @@ Below is a display of the model's multi-turn dialogue ability regarding code:
 
 **[2023.5.8] Release v2.1**
 
-This update adds support for larger model training. Using DeepSpeed stage3 + offload + activation checkpoint, you can train a 65B model on a **single machine with 8 A100-80G**.
+This update adds support for larger model training. Using DeepSpeed stage3 + offload + activation checkpoint, you can **train a 65B model on a single machine with 8 A100-80G**.
 
 The following table compares the training speed of Open-Llama and the original Llama, and the performance data of Llama is quoted from the original Llama paper.
 
 | | DeepSpeed Stage | Offload | Activation Checkpoint | Total Token | GPU hours | Speed token/s/gpu | Batch Size | CPU Memory |
 |----------------|-----------------|---------|-----------------------|-------------|-----------|-------------------|------------|------------|
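
For context (not part of this commit), a minimal sketch of the kind of setup the release note describes: a DeepSpeed ZeRO stage-3 configuration with CPU offload, combined with activation (gradient) checkpointing on the model. All names and values below are illustrative assumptions, not the repo's actual settings.

```python
# Assumed, illustrative DeepSpeed config: ZeRO stage 3 shards parameters,
# gradients, and optimizer states, and offloads params/optimizer to CPU.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

# Activation checkpointing is enabled on the model itself; with a Hugging Face
# LLaMA model this is typically:
#   model.gradient_checkpointing_enable()
# and the training engine is then built with something like:
#   engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```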