runtime error

Exit code: 1. Reason: yphen_instruct/f39ac1d28e925b323eae81227eaba4464caced4e/modeling_phi3.py", line 842, in forward
    attn_outputs, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/microsoft/Phi_hyphen_3_hyphen_mini_hyphen_4k_hyphen_instruct/f39ac1d28e925b323eae81227eaba4464caced4e/modeling_phi3.py", line 346, in forward
    attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacity of 22.30 GiB of which 50.69 MiB is free. Process 20845 has 22.25 GiB memory in use. Of the allocated memory 21.64 GiB is allocated by PyTorch, and 316.88 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
  0%| | 0/375 [00:01<?, ?it/s]
[W106 16:04:18.798645776 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
============================================================
Starting EDA model training
============================================================
============================================================
Starting TensorBoard...
Logdir: /app/results
TensorBoard will be available in the HuggingFace Space interface
============================================================
✅ TensorBoard started in background
❌ Error during training: Command '['/usr/bin/python', '/app/train.py']' returned non-zero exit status 1.
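
The log above suggests an allocator setting and also warns that PYTORCH_CUDA_ALLOC_CONF is deprecated in favor of PYTORCH_ALLOC_CONF. A minimal sketch of how a launcher might pass the newer variable to the training subprocess follows. The helper names `build_train_env` and `run_training` are hypothetical, and the sketch assumes the launcher starts /app/train.py through `subprocess`, as the final error line suggests. Since only about 317 MiB is reported as reserved but unallocated, fragmentation may not be the main cause here, so this setting alone may not be enough to avoid the error.

```python
import os
import subprocess
import sys


def build_train_env(base_env=None):
    """Return a copy of the environment with the allocator option set.

    Hypothetical helper: the variable name and value are taken from the
    log messages above, not from the launcher's actual source.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Newer variable name, as requested by the deprecation warning in the log.
    env["PYTORCH_ALLOC_CONF"] = "expandable_segments:True"
    # Remove the deprecated name so the warning is not triggered again.
    env.pop("PYTORCH_CUDA_ALLOC_CONF", None)
    return env


def run_training(script="/app/train.py"):
    """Launch the training script in a child process with the adjusted environment.

    The variable must be set before the child process initializes CUDA,
    which is why it is passed through `env` rather than set inside train.py.
    """
    return subprocess.run([sys.executable, script], env=build_train_env(), check=True)
```

Calling `run_training()` from the launcher would replace the plain subprocess call; `check=True` raises `subprocess.CalledProcessError` on a non-zero exit status, which matches the error reported at the end of the log.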
