add relevant links
commit ee80f3a5cf (parent 0c77c87b8d)

README.md (12 changed lines)
@@ -2,7 +2,7 @@
 * @Author: LiangSong(sl12160010@gmail.com)
 * @Date: 2023-03-10 21:18:35
 * @LastEditors: LiangSong(sl12160010@gmail.com)
-* @LastEditTime: 2023-03-27 02:34:07
+* @LastEditTime: 2023-03-27 02:40:54
 * @FilePath: /Open-Llama/README.md
 * @Description:
 *
@@ -41,8 +41,8 @@ Open-Llama is an open-source project that provides a complete suite for building large language models
 
 - Python 3.7 or higher
 - PyTorch 1.11 or higher
-- Transformers library
-- Accelerate library
+- [Transformers library](https://huggingface.co/docs/transformers/index)
+- [Accelerate library](https://huggingface.co/docs/accelerate/index)
 - CUDA 11.1 or higher (for GPU acceleration; tested with CUDA 11.7)
 
 ## **Getting Started**
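As a quick aside on the requirement list in this hunk: a small, illustrative Python check (not part of the repository) can confirm that an environment meets these versions before training is attempted.

```python
# Minimal sanity check against the requirement list above (illustrative only).
import sys

import accelerate
import torch
import transformers

assert sys.version_info >= (3, 7), "Python 3.7 or higher is required"
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Transformers:", transformers.__version__)
print("Accelerate:", accelerate.__version__)
```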
@@ -91,8 +91,8 @@ python3 dataset/pretrain_dataset.py
 ```
 
 ### Model Structure
-We modified the Llama implementation in the Transformers library based on section 2.4 "Efficient implementation" of the original paper,
-and also introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from META's open-source xformers library to compute
+We modified the [Llama](https://github.com/facebookresearch/llama) implementation in the Transformers library based on section 2.4 "Efficient implementation" of the original paper,
+and also introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from META's open-source [xformers library](https://github.com/facebookresearch/xformers) to compute
 self-attention, which noticeably improves performance, by roughly 30%.
 See [modeling_llama.py](https://github.com/Bayes-Song/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L240) for details.
 
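As background for the hunk above: memory_efficient_attention is, roughly, a drop-in replacement for standard scaled-dot-product attention. The sketch below is a standalone, illustrative example that assumes xformers is installed; it is not the project's actual modeling code (see the linked modeling_llama.py for that).

```python
# Illustrative use of xformers' memory-efficient attention for causal
# self-attention; tensor layout is (batch, seq_len, n_heads, head_dim).
import torch
import xformers.ops as xops

batch, seq_len, n_heads, head_dim = 2, 128, 8, 64
q = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# LowerTriangularMask gives decoder-style causal masking without ever
# materializing the full seq_len x seq_len attention matrix.
out = xops.memory_efficient_attention(q, k, v, attn_bias=xops.LowerTriangularMask())
print(out.shape)  # torch.Size([2, 128, 8, 64])
```

The speedup quoted in the README comes largely from fused kernels that avoid materializing the full attention matrix.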
@@ -174,7 +174,7 @@ Total mult-adds (G): 6.89
 2. Release the pre-trained checkpoint for the multi-lingual Llama 6.9B model.
 3. Implement instruction-tuning code and open-source the related checkpoints.
 4. Build an online demo using Gradio.
-5. Use Triton to add more high-performance operators and further improve performance.
+5. Use [Triton](https://github.com/openai/triton) to add more high-performance operators and further improve performance.
 6. Add code for building pre-training datasets from Common Crawl and open-source the related datasets.
 7. Add multi-modal training code.
 
README_en.md (10 changed lines)
@@ -2,7 +2,7 @@
 * @Author: LiangSong(sl12160010@gmail.com)
 * @Date: 2023-03-10 21:18:35
 * @LastEditors: LiangSong(sl12160010@gmail.com)
-* @LastEditTime: 2023-03-27 02:35:39
+* @LastEditTime: 2023-03-27 02:41:39
 * @FilePath: /Open-Llama/README_en.md
 * @Description:
 *
@@ -34,8 +34,8 @@ When training language models, we aim to build a universal model that can be use
 ## **Requirements**
 - Python 3.7 or higher
 - PyTorch 1.11 or higher
-- Transformers library
-- Accelerate library
+- [Transformers library](https://huggingface.co/docs/transformers/index)
+- [Accelerate library](https://huggingface.co/docs/accelerate/index)
 - CUDA 11.1 or higher version (for GPU acceleration, tested based on CUDA 11.7)
 ## **Getting Started**
 ### Installation
@@ -81,7 +81,7 @@ Check the DataLoader output with the following command:
 python3 dataset/pretrain_dataset.py
 ```
 ### Model Structure
-We modified the Llama model in the Transformers library based on section 2.4 "Efficient Implementation" in the original paper and introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from the xformers library by META for computing self-attention, which significantly improves performance by about 30%. Please refer to modeling_llama.py for details.
+We modified the [Llama](https://github.com/facebookresearch/llama) model in the Transformers library based on section 2.4 "Efficient Implementation" in the original paper and introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from the [xformers library](https://github.com/facebookresearch/xformers) by META for computing self-attention, which significantly improves performance by about 30%. Please refer to modeling_llama.py for details.
 
 We also referred to Bloom for introducing stable embeddings for better training of token embeddings.
 
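As background for the Bloom reference in this hunk: "stable embeddings" normalize the token-embedding output before it enters the transformer stack. The following is a minimal, illustrative sketch under that assumption, not the repository's actual module.

```python
# Sketch of a BLOOM-style "stable embedding": a LayerNorm applied directly
# after the token embedding to stabilize training of the embedding table.
import torch
import torch.nn as nn

class StableEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.norm(self.embed(input_ids))

# Example: a batch of 2 sequences, 16 tokens each.
emb = StableEmbedding(vocab_size=32000, hidden_size=512)
hidden = emb(torch.randint(0, 32000, (2, 16)))
print(hidden.shape)  # torch.Size([2, 16, 512])
```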
@@ -163,7 +163,7 @@ The paper mentions that they trained the 6.7B model with 1T tokens, and the GPU
 2. Release the pre-trained checkpoint for the multi-lingual Llama 6.9B model.
 3. Implement instruction-tuning code and open-source related checkpoints.
 Build an online demo using Gradio.
-4. Use Triton to add more high-performance operators and further improve performance.
+4. Use [Triton](https://github.com/openai/triton) to add more high-performance operators and further improve performance.
 5. Add code for building pre-training datasets based on Common Crawl and open-source related datasets.
 6. Add code for multi-modal training.
 ## Citation