fix typo
parent 2df3e622e9
commit 7da40f1c83
@@ -61,7 +61,7 @@ Below is a display of the model's multi-turn dialogue ability regarding code:
 
 **[2023.5.8] Release v2.1**
 
-This update adds support for larger model training. Using DeepSpeed stage3 + offload + activation checkpoint, you can train a 65B model on a **single machine with 8 A100-80G**.
+This update adds support for larger model training. Using DeepSpeed stage3 + offload + activation checkpoint, you can **train a 65B model on a single machine with 8 A100-80G**.
 The following table compares the training speed of Open-Llama and the original Llama, and the performance data of Llama is quoted from the original Llama paper.
 |                | DeepSpeed Stage | Offload | Activation Checkpoint | Total Token | GPU hours | Speed token/s/gpu | Batch Size | CPU Memory |
 |----------------|-----------------|---------|-----------------------|-------------|-----------|-------------------|------------|------------|
	 LiangSong