Update README.md

HuggingFace -> Hugging Face
Ikko Eltociear Ashimine 2023-05-17 00:11:35 +09:00 committed by GitHub
parent 77b1c552c3
commit 7bacd6cb93


@@ -29,7 +29,7 @@ Join [discord](https://discord.gg/TrKxrTpnab) to discuss the development of larg
## **Main contents**
- - **Support Transformers/HuggingFace.** The CheckPoint after Instruct-tuning is open-source on [HuggingFace: s-JoL/Open-Llama-V2](https://huggingface.co/s-JoL/Open-Llama-V2).
+ - **Support Transformers/HuggingFace.** The CheckPoint after Instruct-tuning is open-source on [Hugging Face: s-JoL/Open-Llama-V2](https://huggingface.co/s-JoL/Open-Llama-V2).
- **By adopting the same evaluation method as the FastChat project, Open-Llama's performance is compared with GPT3.5's. After testing, it can reach 89% of GPT3.5's performance on Chinese questions.**
@@ -87,7 +87,7 @@ Below is a display of the model's multi-turn dialogue ability regarding code:
This update mainly includes the following aspects, increasing the effective training speed by **50%** compared to the v1 version: padding is reduced from **30%** to **5%**, and raw training speed improves from **3200 tokens/s** to **3587 tokens/s**, so effective (non-padding) throughput grows by 0.95 * 3587 / (0.7 * 3200) ≈ 1.52x.
- 1. Use HuggingFace's datasets library for data reading, with the process as follows:
+ 1. Use Hugging Face's datasets library for data reading, with the process as follows:
    1. Use the transform function to unify data formats from different datasets to {'text': 'xxx'}
    2. Tokenize using Tokenizer
    3. Sample long sequences; currently, three modes are provided: truncation, sampling (refer to the [Gopher paper](https://arxiv.org/abs/2112.11446)), and splitting (a minimal sketch of this pipeline is shown below)
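A minimal sketch of the three-step pipeline above, assuming the Hugging Face datasets and transformers libraries; the dataset name, tokenizer source, and maximum length here are illustrative placeholders, not the repository's actual configuration:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical stand-ins: any corpus and any tokenizer checkpoint would do here.
tokenizer = AutoTokenizer.from_pretrained("s-JoL/Open-Llama-V2")
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

# 1. Unify every source to the {'text': ...} format
unified = raw.map(lambda ex: {"text": ex["text"]})

# 2. Tokenize with the tokenizer
tokenized = unified.map(lambda ex: tokenizer(ex["text"]), remove_columns=["text"])

# 3. Handle long sequences; plain truncation is shown here (the repo also
#    describes Gopher-style sampling and splitting modes)
max_len = 2048
truncated = tokenized.map(lambda ex: {"input_ids": ex["input_ids"][:max_len]})
```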
@@ -100,7 +100,7 @@ This update mainly includes the following aspects, increasing the effective trai
[2023.4.16] Release v1.0
- Basic pre-training and instruction fine-tuning codes are provided, with a training speed comparable to that of the original Llama. The pre-trained and fine-tuned models are already open-sourced on HuggingFace.
+ Basic pre-training and instruction fine-tuning codes are provided, with a training speed comparable to that of the original Llama. The pre-trained and fine-tuned models are already open-sourced on Hugging Face.
v1 version code can be seen at https://github.com/s-JoL/Open-Llama/tree/v1.0
@@ -336,7 +336,7 @@ Developed based on Gradio.
### Training Framework
- In terms of training frameworks, we tested HuggingFace's open-source Accelerate library, PyTorch Lightning, and HPC-AI's open-source ColossalAI. We found that their performance differences are relatively small when fully utilizing GPUs. Therefore, we chose the relatively simple-to-implement Accelerate library as the training framework.
+ In terms of training frameworks, we tested Hugging Face's open-source Accelerate library, PyTorch Lightning, and HPC-AI's open-source ColossalAI. We found that their performance differences are relatively small when fully utilizing GPUs. Therefore, we chose the relatively simple-to-implement Accelerate library as the training framework.
The test code can be found in utils/speed_test.py.
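As a rough illustration of why Accelerate is simple to adopt, here is a minimal, hypothetical training-loop sketch; the toy model, data, and hyperparameters are placeholders and not the code in utils/speed_test.py:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy model and data purely for illustration.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = DataLoader(TensorDataset(torch.randn(64, 512), torch.randn(64, 512)), batch_size=8)

accelerator = Accelerator()  # handles device placement, distributed setup, mixed precision
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for inputs, targets in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # replaces the usual loss.backward()
    optimizer.step()
```

The same script can then be launched on a single GPU or across multiple processes with `accelerate launch` without code changes, which is the main appeal over hand-rolled distributed training.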
@@ -346,7 +346,7 @@ The model structure used during the testing process is:
| GPT2 | 2 | 6 | heads | 4096 | 250100 | 1024 |
The test results are shown below, indicating that when the GPUs are fully utilized, the differences in speed and memory consumption are not significant.
- | | HuggingFace | HuggingFace | ColossalAI | ColossalAI | ColossalAI |
+ | | Hugging Face | Hugging Face | ColossalAI | ColossalAI | ColossalAI |
|-----------------|-----------------------------------|------------------------------------|--------------------------------------------------------|--------------------------------------------------------|------------------------------------|
| config | without activation ckpt, bs2 | without activation ckpt, max_bs=12 | with activation ckpt, bs2 | without activation ckpt, bs2 | without activation ckpt, max_bs=10 |
| seconds per step | 0.336, fw=0.033, bw=0.3, opt=5e-6 | 1.25 | 0.347 | 0.308, fw=0.067, bw=0.152, opt=0.088 | 1.055 |
@@ -403,4 +403,4 @@ The following table summarizes the performance of currently available open-sourc
<a href="https://star-history.com/#s-JoL/Open-Llama&Date">
  <img src="https://api.star-history.com/svg?repos=s-JoL/Open-Llama&type=Date" alt="Star History Chart">
</a>
</p>