Update Tutorial

2024-06-15 18:41:15 +00:00 · 932192bc6d
commit 932192bc6d
parent 4808034128
1 changed file with 9 additions and 3 deletions
--- a/Tutorial.md
+++ b/Tutorial.md
@@ -113,8 +113,14 @@

 7. Run pretraining

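-   After connecting to the container, run the pretraining command as follows. Note: adjust the `--num_processes` value to match the number of GPUs.
+    * After connecting to the container, run the pretraining command as follows. Note: adjust the `--num_processes` value to match the number of GPUs.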

-   ```
-   $ accelerate launch --num_processes=1 --config_file configs/accelerate_configs/ds_stage1.yaml train_lm.py --train_config configs/pretrain_config.yaml --model_config configs/model_configs/7B.json 
+    ```
+    $ accelerate launch --num_processes=1 --config_file configs/accelerate_configs/ds_stage1.yaml train_lm.py --train_config configs/pretrain_config.yaml --model_config configs/model_configs/7B.json 
+    ```
+
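+   * Known issue: the given model is too large for the GPU's memory. Memory-related options (batch size, fragmentation settings, etc.) were adjusted, but training does not appear feasible unless the model itself is made smaller.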
+
+    ```
+    torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.11 GiB (GPU 0; 23.64 GiB total capacity; 13.15 GiB already allocated; 9.92 GiB free; 13.15 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
   ```
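
Note on the out-of-memory issue recorded above: a rough back-of-the-envelope estimate suggests why tuning batch size or fragmentation settings alone is unlikely to help. The sketch below is only an approximation under stated assumptions. It assumes the model has about 7 billion parameters (inferred from the `7B.json` file name) and that training uses mixed precision with an Adam-style optimizer, which is commonly accounted as roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance). Neither assumption is confirmed by the configs shown here, and activations are ignored entirely. The GPU capacity of 23.64 GiB is taken from the error message.

```python
# Rough memory estimate for single-GPU pretraining.
# Assumptions (not confirmed by the repository configs):
#   - ~7e9 parameters, inferred from the file name 7B.json
#   - fp16 weights: 2 bytes per parameter
#   - mixed-precision Adam training state: ~16 bytes per parameter
#     (fp16 weights + fp16 grads + fp32 master weights, momentum, variance)
#   - activation memory is ignored

GIB = 1024 ** 3


def estimate_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed for num_params values, in GiB."""
    return num_params * bytes_per_param / GIB


NUM_PARAMS = 7_000_000_000   # hypothetical nominal size of the 7B model
GPU_CAPACITY_GIB = 23.64     # total capacity reported in the error message

weights_fp16 = estimate_gib(NUM_PARAMS, 2)
training_state = estimate_gib(NUM_PARAMS, 16)

# Largest parameter count whose training state alone would fit on this GPU.
max_params = int(GPU_CAPACITY_GIB * GIB / 16)

print(f"fp16 weights only:        {weights_fp16:.2f} GiB")
print(f"weights + grads + Adam:   {training_state:.2f} GiB")
print(f"GPU capacity:             {GPU_CAPACITY_GIB:.2f} GiB")
print(f"approx. max params (1 GPU): {max_params / 1e9:.2f}B")
```

Under these assumptions the fp16 weights alone come to about 13 GiB, which is close to the 13.15 GiB reported as already allocated, while the full training state is several times larger than the card's capacity. If `ds_stage1.yaml` configures DeepSpeed ZeRO stage 1 (an assumption based on the file name), optimizer state is partitioned across processes, so `--num_processes=1` leaves nothing to partition. This is consistent with the conclusion above that the model size itself has to be reduced, or more GPUs used.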