add relevant links

LiangSong 2023-03-27 02:42:22 +08:00
parent 0c77c87b8d
commit ee80f3a5cf
2 changed files with 11 additions and 11 deletions

README.md

@@ -2,7 +2,7 @@
 * @Author: LiangSong(sl12160010@gmail.com)
 * @Date: 2023-03-10 21:18:35
 * @LastEditors: LiangSong(sl12160010@gmail.com)
-* @LastEditTime: 2023-03-27 02:34:07
+* @LastEditTime: 2023-03-27 02:40:54
 * @FilePath: /Open-Llama/README.md
 * @Description:
 *
@@ -41,8 +41,8 @@ Open-Llama is an open-source project that provides a complete training pipeline for building large language models
 - Python 3.7 or higher
 - PyTorch 1.11 or higher
-- Transformers
+- [Transformers library](https://huggingface.co/docs/transformers/index)
-- Accelerate library
+- [Accelerate library](https://huggingface.co/docs/accelerate/index)
 - CUDA 11.1 or higher (for GPU acceleration; tested with CUDA 11.7)
 ## **Getting Started**
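As a quick sanity check against the requirements listed in the hunk above, a minimal sketch like the following can print the installed versions; the package names mirror the list and nothing here is specific to Open-Llama.

```python
# Minimal environment check mirroring the requirements above (nothing Open-Llama specific).
import sys

import torch
import transformers
import accelerate

print("Python       :", sys.version.split()[0])    # expect 3.7+
print("PyTorch      :", torch.__version__)          # expect 1.11+
print("CUDA (build) :", torch.version.cuda)         # e.g. 11.7
print("GPU available:", torch.cuda.is_available())
print("Transformers :", transformers.__version__)
print("Accelerate   :", accelerate.__version__)
```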
@@ -91,8 +91,8 @@ python3 dataset/pretrain_dataset.py
 ```
 ### Model Structure
-We modified Llama from the Transformers library following Section 2.4 "Efficient implementation" of the original paper,
+We modified [Llama](https://github.com/facebookresearch/llama) from the Transformers library following Section 2.4 "Efficient implementation" of the original paper,
-and also introduced optimizations from several other papers. Specifically, we use the memory_efficient_attention operation from the xformers library open-sourced by Meta to compute
+and also introduced optimizations from several other papers. Specifically, we use the memory_efficient_attention operation from the [xformers library](https://github.com/facebookresearch/xformers) open-sourced by Meta to compute
 self-attention, which noticeably improves performance, by roughly 30%.
 For details, see [modeling_llama.py](https://github.com/Bayes-Song/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L240).
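For readers unfamiliar with the operation referenced in the hunk above, here is a minimal sketch of how xformers' memory_efficient_attention is typically called; the tensor shapes and the LowerTriangularMask causal mask follow the public xformers API and are illustrative assumptions, not code from this repository's modeling_llama.py.

```python
# Minimal sketch of memory-efficient attention via xformers (illustrative only;
# not the project's modeling_llama.py). Requires a CUDA GPU and the xformers package.
import torch
import xformers.ops as xops

batch, seq_len, n_heads, head_dim = 2, 128, 8, 64

# xformers expects (batch, seq_len, n_heads, head_dim) tensors.
q = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal self-attention computed without materializing the full
# seq_len x seq_len attention matrix.
out = xops.memory_efficient_attention(q, k, v, attn_bias=xops.LowerTriangularMask())
print(out.shape)  # torch.Size([2, 128, 8, 64])
```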
@@ -174,7 +174,7 @@ Total mult-adds (G): 6.89
 2. Release the pretrained checkpoint of the multilingual Llama 6.9B model.
 3. Implement instruction-tuning code and release the related checkpoints.
 4. Build an online demo with Gradio.
-5. Use Triton to add more high-performance operators and further improve performance.
+5. Use [Triton](https://github.com/openai/triton) to add more high-performance operators and further improve performance.
 6. Add code for building a pretraining dataset from Common Crawl and release the related datasets.
 7. Add multimodal training code.

README_en.md

@@ -2,7 +2,7 @@
 * @Author: LiangSong(sl12160010@gmail.com)
 * @Date: 2023-03-10 21:18:35
 * @LastEditors: LiangSong(sl12160010@gmail.com)
-* @LastEditTime: 2023-03-27 02:35:39
+* @LastEditTime: 2023-03-27 02:41:39
 * @FilePath: /Open-Llama/README_en.md
 * @Description:
 *
@@ -34,8 +34,8 @@ When training language models, we aim to build a universal model that can be use
 ## **Requirements**
 - Python 3.7 or higher
 - PyTorch 1.11 or higher
-- Transformers library
+- [Transformers library](https://huggingface.co/docs/transformers/index)
-- Accelerate library
+- [Accelerate library](https://huggingface.co/docs/accelerate/index)
 - CUDA 11.1 or higher version (for GPU acceleration, tested based on CUDA 11.7)
 ## **Getting Started**
 ### Installation
@@ -81,7 +81,7 @@ Check the DataLoader output with the following command:
 python3 dataset/pretrain_dataset.py
 ```
 ### Model Structure
-We modified the Llama model in the Transformers library based on section 2.4 "Efficient Implementation" in the original paper and introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from the xformers library by META for computing self-attention, which significantly improves performance by about 30%. Please refer to modeling_llama.py for details.
+We modified the [Llama](https://github.com/facebookresearch/llama) model in the Transformers library based on section 2.4 "Efficient Implementation" in the original paper and introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from the [xformers library](https://github.com/facebookresearch/xformers) by META for computing self-attention, which significantly improves performance by about 30%. Please refer to modeling_llama.py for details.
 We also referred to Bloom for introducing stable embeddings for better training of token embeddings.
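As a concrete illustration of the Bloom-style stable embedding mentioned in the line above, here is a minimal sketch of the idea: a token-embedding lookup followed by LayerNorm. The class name and sizes are illustrative assumptions, not taken from this repository.

```python
# Minimal sketch of a Bloom-style "stable embedding": a token embedding followed
# by LayerNorm. Class name and sizes are illustrative, not from this repository.
import torch
import torch.nn as nn

class StableEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        # Normalizing right after the lookup keeps token-embedding training stable.
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.norm(self.embed(input_ids))

tokens = torch.randint(0, 32000, (2, 16))
print(StableEmbedding(vocab_size=32000, hidden_size=512)(tokens).shape)  # torch.Size([2, 16, 512])
```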
@@ -163,7 +163,7 @@ The paper mentions that they trained the 6.7B model with 1T tokens, and the GPU
 2. Release the pre-trained checkpoint for the multi-lingual Llama 6.9B model.
 3. Implement instruction-tuning code and open-source related checkpoints.
 4. Build an online demo using Gradio.
-5. Use Triton to add more high-performance operators and further improve performance.
+5. Use [Triton](https://github.com/openai/triton) to add more high-performance operators and further improve performance.
 6. Add code for building pre-training datasets based on Common Crawl and open-source related datasets.
 7. Add code for multi-modal training.
 ## Citation