add relevant links
commit ee80f3a5cf (parent 0c77c87b8d)

README.md (12 changed lines)
@@ -2,7 +2,7 @@
 * @Author: LiangSong(sl12160010@gmail.com)
 * @Date: 2023-03-10 21:18:35
 * @LastEditors: LiangSong(sl12160010@gmail.com)
-* @LastEditTime: 2023-03-27 02:34:07
+* @LastEditTime: 2023-03-27 02:40:54
 * @FilePath: /Open-Llama/README.md
 * @Description:
 *
@@ -41,8 +41,8 @@ Open-Llama is an open-source project that provides a complete suite for building large language models
 
 - Python 3.7 or higher
 - PyTorch 1.11 or higher
-- Transformers library
-- Accelerate library
+- [Transformers library](https://huggingface.co/docs/transformers/index)
+- [Accelerate library](https://huggingface.co/docs/accelerate/index)
 - CUDA 11.1 or higher (for GPU acceleration; tested with CUDA 11.7)
 
 ## **Getting Started**
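As a quick aside on the requirement list in this hunk: a small, illustrative Python check (not part of the repository) can confirm that an environment meets these versions before training is attempted.

```python
# Minimal sanity check against the requirement list above (illustrative only).
import sys

import accelerate
import torch
import transformers

assert sys.version_info >= (3, 7), "Python 3.7 or higher is required"
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Transformers:", transformers.__version__)
print("Accelerate:", accelerate.__version__)
```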
@@ -91,8 +91,8 @@ python3 dataset/pretrain_dataset.py
 ```
 
 ### Model Structure
-We modified the Llama implementation in the Transformers library based on section 2.4 "Efficient implementation" of the original paper,
-and also introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from META's open-source xformers library to compute
+We modified the [Llama](https://github.com/facebookresearch/llama) implementation in the Transformers library based on section 2.4 "Efficient implementation" of the original paper,
+and also introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from META's open-source [xformers library](https://github.com/facebookresearch/xformers) to compute
 self-attention, which noticeably improves performance, by roughly 30%.
 See [modeling_llama.py](https://github.com/Bayes-Song/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L240) for details.
 
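As background for the hunk above: memory_efficient_attention is, roughly, a drop-in replacement for standard scaled-dot-product attention. The sketch below is a standalone, illustrative example that assumes xformers is installed; it is not the project's actual modeling code (see the linked modeling_llama.py for that).

```python
# Illustrative use of xformers' memory-efficient attention for causal
# self-attention; tensor layout is (batch, seq_len, n_heads, head_dim).
import torch
import xformers.ops as xops

batch, seq_len, n_heads, head_dim = 2, 128, 8, 64
q = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# LowerTriangularMask gives decoder-style causal masking without ever
# materializing the full seq_len x seq_len attention matrix.
out = xops.memory_efficient_attention(q, k, v, attn_bias=xops.LowerTriangularMask())
print(out.shape)  # torch.Size([2, 128, 8, 64])
```

The speedup quoted in the README comes largely from fused kernels that avoid materializing the full attention matrix.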
@@ -174,7 +174,7 @@ Total mult-adds (G): 6.89
 2. Release the pre-trained checkpoint for the multi-lingual Llama 6.9B model.
 3. Implement instruction-tuning code and open-source the related checkpoints.
 4. Build an online demo using Gradio.
-5. Use Triton to add more high-performance operators and further improve performance.
+5. Use [Triton](https://github.com/openai/triton) to add more high-performance operators and further improve performance.
 6. Add code for building pre-training datasets from Common Crawl and open-source the related datasets.
 7. Add multi-modal training code.
 
README_en.md (10 changed lines)
@@ -2,7 +2,7 @@
 * @Author: LiangSong(sl12160010@gmail.com)
 * @Date: 2023-03-10 21:18:35
 * @LastEditors: LiangSong(sl12160010@gmail.com)
-* @LastEditTime: 2023-03-27 02:35:39
+* @LastEditTime: 2023-03-27 02:41:39
 * @FilePath: /Open-Llama/README_en.md
 * @Description:
 *
@@ -34,8 +34,8 @@ When training language models, we aim to build a universal model that can be use
 ## **Requirements**
 - Python 3.7 or higher
 - PyTorch 1.11 or higher
-- Transformers library
-- Accelerate library
+- [Transformers library](https://huggingface.co/docs/transformers/index)
+- [Accelerate library](https://huggingface.co/docs/accelerate/index)
 - CUDA 11.1 or higher version (for GPU acceleration, tested based on CUDA 11.7)
 ## **Getting Started**
 ### Installation
@@ -81,7 +81,7 @@ Check the DataLoader output with the following command:
 python3 dataset/pretrain_dataset.py
 ```
 ### Model Structure
-We modified the Llama model in the Transformers library based on section 2.4 "Efficient Implementation" in the original paper and introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from the xformers library by META for computing self-attention, which significantly improves performance by about 30%. Please refer to modeling_llama.py for details.
+We modified the [Llama](https://github.com/facebookresearch/llama) model in the Transformers library based on section 2.4 "Efficient Implementation" in the original paper and introduced some optimizations from other papers. Specifically, we introduced the memory_efficient_attention operation from the [xformers library](https://github.com/facebookresearch/xformers) by META for computing self-attention, which significantly improves performance by about 30%. Please refer to modeling_llama.py for details.
 
 We also referred to Bloom for introducing stable embeddings for better training of token embeddings.
 
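As background for the Bloom reference in this hunk: "stable embeddings" normalize the token-embedding output before it enters the transformer stack. The following is a minimal, illustrative sketch under that assumption, not the repository's actual module.

```python
# Sketch of a BLOOM-style "stable embedding": a LayerNorm applied directly
# after the token embedding to stabilize training of the embedding table.
import torch
import torch.nn as nn

class StableEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.norm(self.embed(input_ids))

# Example: a batch of 2 sequences, 16 tokens each.
emb = StableEmbedding(vocab_size=32000, hidden_size=512)
hidden = emb(torch.randint(0, 32000, (2, 16)))
print(hidden.shape)  # torch.Size([2, 16, 512])
```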
@@ -163,7 +163,7 @@ The paper mentions that they trained the 6.7B model with 1T tokens, and the GPU
 2. Release the pre-trained checkpoint for the multi-lingual Llama 6.9B model.
 3. Implement instruction-tuning code and open-source related checkpoints.
 Build an online demo using Gradio.
-4. Use Triton to add more high-performance operators and further improve performance.
+4. Use [Triton](https://github.com/openai/triton) to add more high-performance operators and further improve performance.
 5. Add code for building pre-training datasets based on Common Crawl and open-source related datasets.
 6. Add code for multi-modal training.
 ## Citation