add relevant links
parent 0c77c87b8d
commit ee80f3a5cf

README.md (12 changed lines)
@@ -2,7 +2,7 @@
 * @Author: LiangSong(sl12160010@gmail.com)
 * @Date: 2023-03-10 21:18:35
 * @LastEditors: LiangSong(sl12160010@gmail.com)
-* @LastEditTime: 2023-03-27 02:34:07
+* @LastEditTime: 2023-03-27 02:40:54
 * @FilePath: /Open-Llama/README.md
 * @Description:
 *
@@ -41,8 +41,8 @@ Open-Llama is an open-source project that provides a complete set of tools for building large language models
 
 - Python 3.7 or higher
 - PyTorch 1.11 or higher
-- Transformers library
-- Accelerate library
+- [Transformers library](https://huggingface.co/docs/transformers/index)
+- [Accelerate library](https://huggingface.co/docs/accelerate/index)
 - CUDA 11.1 or higher (for GPU acceleration; tested with CUDA 11.7)
 
 ## **Getting Started**
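As a quick way to verify the requirements listed in this hunk, a minimal check script might look like the sketch below (the import names are the packages' usual ones; this is illustrative, not part of the repository):

```python
# Minimal environment check for the listed requirements (illustrative sketch, not repo code).
import sys
import torch
import transformers
import accelerate

print("Python:", sys.version.split()[0])             # expect 3.7+
print("PyTorch:", torch.__version__)                  # expect 1.11+
print("Transformers:", transformers.__version__)
print("Accelerate:", accelerate.__version__)
print("CUDA available:", torch.cuda.is_available())   # needed for GPU acceleration
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)        # README tests against 11.7
```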
@@ -91,8 +91,8 @@ python3 dataset/pretrain_dataset.py
 ```
 
 ### Model Structure
-Following section 2.4 "Efficient implementation" of the original paper, we modified the Llama implementation in the Transformers library,
-and introduced further optimizations from other papers. Specifically, we use the memory_efficient_attention operation from META's open-source xformers library to compute
+Following section 2.4 "Efficient implementation" of the original paper, we modified the [Llama](https://github.com/facebookresearch/llama) implementation in the Transformers library,
+and introduced further optimizations from other papers. Specifically, we use the memory_efficient_attention operation from META's open-source [xformers library](https://github.com/facebookresearch/xformers) to compute
 Self Attention, which gives a clear performance improvement of roughly 30%.
 See [modeling_llama.py](https://github.com/Bayes-Song/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L240) for details.
 
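For readers unfamiliar with the operation referenced above, a minimal usage sketch of xformers' memory_efficient_attention is shown below; the shapes, dtypes, and the causal mask are illustrative assumptions, not code taken from this repository:

```python
# Illustrative call to xformers' fused attention kernel; all sizes are made-up toy values.
import torch
from xformers.ops import memory_efficient_attention, LowerTriangularMask

batch, seq_len, n_heads, head_dim = 2, 1024, 8, 64
q = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)
v = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)

# Causal self-attention computed without materializing the full seq_len x seq_len score
# matrix, which is where the memory savings and much of the speedup come from.
out = memory_efficient_attention(q, k, v, attn_bias=LowerTriangularMask())
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```

The kernel expects inputs shaped (batch, seq_len, heads, head_dim), so a model integrating it typically reshapes its projected query/key/value tensors into that layout before the call.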
@@ -174,7 +174,7 @@ Total mult-adds (G): 6.89
 2. Open-source the pre-trained checkpoint of the multilingual Llama 6.9B model
 3. Implement instruction-tuning code and open-source the related checkpoints
 4. Build an online demo with Gradio
-5. Use Triton to add more high-performance operators and further improve performance
+5. Use [Triton](https://github.com/openai/triton) to add more high-performance operators and further improve performance
 6. Add code for building a pre-training dataset from Common Crawl and open-source the related datasets
 7. Add multi-modal training code
 
README_en.md (10 changed lines)
@@ -2,7 +2,7 @@
 * @Author: LiangSong(sl12160010@gmail.com)
 * @Date: 2023-03-10 21:18:35
 * @LastEditors: LiangSong(sl12160010@gmail.com)
-* @LastEditTime: 2023-03-27 02:35:39
+* @LastEditTime: 2023-03-27 02:41:39
 * @FilePath: /Open-Llama/README_en.md
 * @Description:
 *
@@ -34,8 +34,8 @@ When training language models, we aim to build a universal model that can be use
 ## **Requirements**
 - Python 3.7 or higher
 - PyTorch 1.11 or higher
-- Transformers library
-- Accelerate library
+- [Transformers library](https://huggingface.co/docs/transformers/index)
+- [Accelerate library](https://huggingface.co/docs/accelerate/index)
 - CUDA 11.1 or higher version (for GPU acceleration, tested based on CUDA 11.7)
 ## **Getting Started**
 ### Installation
@@ -81,7 +81,7 @@ Check the DataLoader output with the following command:
 python3 dataset/pretrain_dataset.py
 ```
 ### Model Structure
-We modified the Llama model in the Transformers library based on section 2.4 "Efficient Implementation" in the original paper and introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from the xformers library by META for computing self-attention, which significantly improves performance by about 30%. Please refer to modeling_llama.py for details.
+We modified the [Llama](https://github.com/facebookresearch/llama) model in the Transformers library based on section 2.4 "Efficient Implementation" in the original paper and introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from the [xformers library](https://github.com/facebookresearch/xformers) by META for computing self-attention, which significantly improves performance by about 30%. Please refer to modeling_llama.py for details.
 
 We also referred to Bloom for introducing stable embeddings for better training of token embeddings.
 
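The "stable embeddings" mentioned in the hunk above amount, in Bloom, to a LayerNorm applied right after the token embedding. A hedged sketch of that idea follows; the class and argument names are illustrative, not this repository's actual code:

```python
# Illustrative Bloom-style "stable embedding": token embedding followed by LayerNorm.
import torch
import torch.nn as nn

class StableEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)  # normalizing embedding outputs stabilizes early training

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.norm(self.embed(input_ids))

# Usage sketch with made-up sizes.
emb = StableEmbedding(vocab_size=32000, hidden_size=512)
hidden = emb(torch.randint(0, 32000, (2, 16)))  # -> shape (2, 16, 512)
```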
@@ -163,7 +163,7 @@ The paper mentions that they trained the 6.7B model with 1T tokens, and the GPU
 2. Release the pre-trained checkpoint for the multi-lingual Llama 6.9B model.
 3. Implement instruction-tuning code and open-source related checkpoints.
 Build an online demo using Gradio.
-4. Use Triton to add more high-performance operators and further improve performance.
+4. Use [Triton](https://github.com/openai/triton) to add more high-performance operators and further improve performance.
 5. Add code for building pre-training datasets based on Common Crawl and open-source related datasets.
 6. Add code for multi-modal training.
 ## Citation