Training

In the last post, I shared my story of the Kaggle Jigsaw Multilingual Toxic Comment Classification competition. At that time, I only had a 1080Ti with 11G VRAM, and this made it impossible for me to train the SOTA Roberta-XLM large model which requires larger VRAM than what I had. In this post, I want to share some tips about how to reduce the VRAM usage so that you can train larger deep neural networks with your GPU....