Channel: CPU overheats and PC shuts down when swap is full - Unix & Linux Stack Exchange

CPU overheats and PC shuts down when swap is full

This isn't necessarily a Linux problem, but I'll ask it here anyway. I use a workstation mainly for training deep learning and machine learning models, and I run training code on both the CPU and the GPU.

CPU: AMD Ryzen 9 5950X 16-Core Processor

GPU: NVIDIA GeForce RTX 3090

OS: Ubuntu 22.04 LTS

The libraries that I use (PyTorch, XGBoost, LightGBM, etc.) use swap heavily during data loading. When I work on big datasets, swap usage grows slowly until it hits the limit (2 GB). When that happens, the load on all of the cores spikes and the CPU overheats. The workstation shuts itself down a couple of seconds later.
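To make the "swap fills up" condition measurable, here is a minimal Python sketch that computes swap usage from text in the format of /proc/meminfo. The function names `parse_meminfo` and `swap_used_fraction` are hypothetical helpers, and the sample values are illustrative only; they were not measured on this workstation.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_fraction(meminfo):
    """Return used swap as a fraction of total swap (0.0 if there is no swap)."""
    total = meminfo.get("SwapTotal", 0)
    free = meminfo.get("SwapFree", 0)
    if total == 0:
        return 0.0
    return (total - free) / total


# Illustrative sample only (assumption): roughly a 2 GB swap area, mostly used.
SAMPLE = """MemTotal:       65777152 kB
MemAvailable:    1048576 kB
SwapTotal:       2097148 kB
SwapFree:         524287 kB
"""

info = parse_meminfo(SAMPLE)
print(f"swap used: {swap_used_fraction(info):.0%}")
```

On a Linux machine, the contents of /proc/meminfo could be passed in place of `SAMPLE` and polled periodically during training to see how quickly swap fills.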

I'm a data scientist and I'm not good with hardware. It took me a couple of weeks to figure out why my workstation kept shutting itself down. I have to find a way to prevent this, since I can't make progress on my own tasks anymore. What are your suggestions?

To give you more details, this wasn't happening 3-4 months ago. It started very recently.

Edit: Added nvidia-smi and sensors outputs while training two models (UNet and YOLOv6) simultaneously.

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
|100%   79C    P2   338W / 350W |  14171MiB / 24576MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1361      G   /usr/lib/xorg/Xorg                 56MiB |
|    0   N/A  N/A      1568      G   /usr/bin/gnome-shell               10MiB |
|    0   N/A  N/A     27955      C   python                           2743MiB |
|    0   N/A  N/A     31692      C   python                          11355MiB |
+-----------------------------------------------------------------------------+

sensors

nvme-pci-0300
Adapter: PCI adapter
Composite:    +74.8°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +74.8°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +74.8°C  (low  = -273.1°C, high = +65261.8°C)

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +57.0°C

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +87.8°C
Tccd1:        +89.2°C
Tccd2:        +79.5°C
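For logging these readings during a training run, a rough Python sketch like the one below could parse the text that `sensors` prints and list the readings above a chosen limit. The helper names `parse_sensors` and `readings_above` are hypothetical, the 85.0 °C limit is an arbitrary illustrative threshold rather than a vendor specification, and the sample text is a trimmed copy of the output above.

```python
import re

# Matches lines such as "Tctl:         +87.8°C" or "Composite:    +74.8°C  (low = ...)".
TEMP_LINE = re.compile(r"^([^:]+):\s+([+-]?\d+(?:\.\d+)?)°C")


def parse_sensors(text):
    """Return {chip_name: {label: temperature in °C}} from `sensors` text output."""
    chips = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line[0].isspace():
            continue  # blank line or indented continuation such as "(crit = ...)"
        match = TEMP_LINE.match(line)
        if match and current is not None:
            chips[current][match.group(1).strip()] = float(match.group(2))
        elif ":" not in line:
            current = line.strip()  # a chip header such as "k10temp-pci-00c3"
            chips[current] = {}
    return chips


def readings_above(chips, limit_c):
    """List (chip, label, temperature) for every reading above limit_c."""
    return [
        (chip, label, temp)
        for chip, labels in chips.items()
        for label, temp in labels.items()
        if temp > limit_c
    ]


# Trimmed copy of the sensors output shown in this post.
SAMPLE = """nvme-pci-0300
Adapter: PCI adapter
Composite:    +74.8°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +87.8°C
Tccd1:        +89.2°C
Tccd2:        +79.5°C
"""

# 85.0 is an illustrative limit (assumption), not an official CPU specification.
for chip, label, temp in readings_above(parse_sensors(SAMPLE), 85.0):
    print(f"{chip} {label}: {temp:.1f} °C")
```

Feeding it the captured output of the `sensors` command at intervals would give a temperature log to compare against the moment swap fills up.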
