Message boards : Number crunching : Is there a reason why Python GPUGrid tasks are using over 50% of 12c/24t Ryzen 9 3900x?

Author Message
ncoded.com
Joined: 16 Aug 16
Posts: 20
Credit: 495,146,413
RAC: 6,891
Message 58835 - Posted: 22 May 2022 | 0:33:03 UTC
Last modified: 22 May 2022 | 0:53:32 UTC

Is there a reason why Python GPUGrid tasks are using over 50% CPU load on a 12c/24t Ryzen 9 3900x?

The only app running on this host is BOINC.

All non-GPUGrid projects have been suspended.

As you can see in the screenshot, Windows task manager is reporting that this Python task is using over 50% of the full Ryzen CPU. CoreTemp backs this up.

GPU-Z is showing that around 15% (average) is being used on the GPU.

Anyone know what is going on here?

https://imgur.com/KjPvmZn (screenshot)

Ian&Steve C.
Joined: 21 Feb 20
Posts: 744
Credit: 4,943,798,494
RAC: 524,854
Message 58836 - Posted: 22 May 2022 | 0:58:41 UTC

it's normal for the python tasks.
____________

jjch
Joined: 10 Nov 13
Posts: 91
Credit: 15,040,000,871
RAC: 1,015,809
Message 58837 - Posted: 22 May 2022 | 1:18:39 UTC - in response to Message 58835.
Last modified: 22 May 2022 | 1:23:45 UTC

ncoded,

I'm not the master guru here, but I'm impressed the Python app is actually running on your system. I have been jammed up with tasks that don't get past 2%. See the other post on this forum: https://www.gpugrid.net/forum_thread.php?id=5316

Your systems are hidden, so I can't see what version of Windows you are running, and I'm very curious how much memory and swap space you have. If you don't mind posting the specs back here, that would be interesting.

The short answer is I believe it is running as intended. The python app is very CPU and memory intensive. It doesn't appear to use the GPU as much. Be sure to let this task run and see if it actually completes properly.

The longer answer is that you should open the Performance tab and launch Resource Monitor. Check how many Python images are really running there. I have been finding multiples, each of which is also using a sizeable chunk of memory.
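If you want to count the Python images without clicking through Resource Monitor, a quick sketch like this works too (nothing GPUGrid-specific, it just greps the process list; `tasklist` on Windows, `ps` elsewhere):

```python
import platform
import subprocess

def count_python_processes():
    """Count running processes whose image name contains 'python'."""
    if platform.system() == "Windows":
        out = subprocess.run(["tasklist"], capture_output=True, text=True).stdout
    else:
        out = subprocess.run(["ps", "-eo", "comm"], capture_output=True, text=True).stdout
    return sum("python" in line.lower() for line in out.splitlines())

print(count_python_processes())
```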

ncoded.com
Joined: 16 Aug 16
Posts: 20
Credit: 495,146,413
RAC: 6,891
Message 58838 - Posted: 22 May 2022 | 1:45:11 UTC
Last modified: 22 May 2022 | 2:00:00 UTC

Ian&Steve C:

Thanks for the heads up.

jjch:

Host is:

Windows 10 Pro v10.0.19044
Ryzen 9 3900X
GTX 1080TI
80GB Memory
Windows default swap space

Resource monitor screenshot:
https://imgur.com/NNlqP7N

If I remember rightly, when we tested two hosts the RTX 3060 Ti had issues with the current GPUGrid tasks, but the GTX 1080 Ti was okay. Both hosts were identical apart from the GPU.

I will confirm if the actual task completes okay.

Keith Myers
Joined: 13 Dec 17
Posts: 1070
Credit: 1,450,990,714
RAC: 426,047
Message 58839 - Posted: 22 May 2022 | 2:22:28 UTC - in response to Message 58835.

The Python-on-GPU application is a machine-learning application that uses bursts of both CPU and GPU for reinforcement learning.

What you observe is normal.

Read this explanation of the application from the scientist-developer:

The core research idea is to train populations of reinforcement learning agents that learn independently for a certain amount of time and, once they return to the server, put their learned knowledge in common with other agents to create a new generation of agents equipped with the information acquired by previous generations. Each GPUgrid job is one of these agents doing some training independently.


Having alternating phases of lower and higher GPU utilisation is normal in Reinforcement Learning, as the agent alternates between data collection (generally low GPU usage) and training (higher GPU memory and utilisation). Once we solve most of the errors we will focus on maximizing GPU efficiency during the training phases.
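The alternating pattern described there can be sketched as a toy loop (hypothetical function names, no real learning; it only shows why utilisation swings between the two phases):

```python
def collect_rollout(steps):
    """Data collection phase: environment stepping, mostly CPU, low GPU use."""
    return [("obs", "action", "reward")] * steps

def train_on(batch):
    """Training phase: batched gradient updates, where GPU utilisation spikes."""
    return len(batch)  # stand-in for a loss/metric

# Each generation alternates collection (low GPU) and training (high GPU).
for generation in range(3):
    batch = collect_rollout(steps=128)  # low GPU utilisation here
    metric = train_on(batch)            # higher GPU utilisation here
```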

Keith Myers
Joined: 13 Dec 17
Posts: 1070
Credit: 1,450,990,714
RAC: 426,047
Message 58840 - Posted: 22 May 2022 | 2:34:03 UTC
Last modified: 22 May 2022 | 2:35:13 UTC

The "paging file too small" error on Windows has to do with PyTorch and Windows.

This piece of a Stack Overflow thread on the topic explains it:

It seems the issue is caused by NVidia fatbins (.nv_fatb) being loaded into memory. Several DLLs, such as cusolver64_xx.dll, torch_cuda_cu.dll, and a few others, have .nv_fatb sections in them. These contain tons of different variations of CUDA code for different GPUs, so it ends up being several hundred megabytes to a couple gigabytes.

When Python imports 'torch' it loads these DLLs, and maps the .nv_fatb section into memory. For some reason, instead of just being a memory mapped file, it is actually taking up memory. The section is set as 'copy on write', so it's possible something writes into it? I don't know. But anyway, if you look at Python using VMMap ( https://docs.microsoft.com/en-us/sysinternals/downloads/vmmap ) you can see that these DLLs are committing huge amounts of committed memory for this .nv_fatb section. The frustrating part is it doesn't seem to be using the memory. For example, right now my Python.exe has 2.7GB committed, but the working set is only 148MB.

Every Python process that loads these DLLs will commit several GB of memory loading these DLLs. So if 1 Python process is wasting 2GB of memory, and you try running 8 workers, you need 16GB of memory to spare just to load the DLLs. It really doesn't seem like this memory is used, just committed.

The problem ultimately comes down to how the .nv_fatb file sections are handled. They use up a lot of memory being mapped even if they are never being used.
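The "copy on write" behaviour the quote mentions can be reproduced on a small scale with an ordinary memory-mapped file. This sketch uses Python's mmap with ACCESS_COPY: the mapping is writeable, but writes are committed to private process memory and never reach the file, which is the same mechanism that makes the .nv_fatb sections count against commit:

```python
import mmap
import os
import tempfile

# A small file standing in for a DLL's .nv_fatb section.
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * 4096)
os.close(fd)

with open(path, "r+b") as f:
    # ACCESS_COPY = copy-on-write: writes go to private committed
    # memory in this process, never back to the file.
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    m[:4] = b"BBBB"
    in_memory = bytes(m[:4])   # what this process sees
    m.close()

with open(path, "rb") as f:
    on_disk = f.read(4)        # what is actually on disk

os.remove(path)
print(in_memory, on_disk)      # b'BBBB' b'AAAA'
```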

Supposedly this is solved in CUDA 11.7, but I don't believe that version has been released with the current general Nvidia drivers. You would need to install the latest developer version.

One workaround is to not let the application use all of the virtual threads of the CPU. I have noticed on my 32-thread Ryzens that 32 Python instances are loaded for each task.

So the workaround is to disable virtual threads and only use physical cores when crunching the task on Windows.
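On the BOINC side you can also tell the client how many CPUs to budget per task with an app_config.xml in the GPUGrid project directory. A hedged sketch only: the app name PythonGPU is my assumption (check the names in your client's app list), and note that cpu_usage changes BOINC's scheduling budget, it does not actually cap how many processes the task spawns:

```xml
<app_config>
  <app>
    <name>PythonGPU</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>12.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```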

ncoded.com
Joined: 16 Aug 16
Posts: 20
Credit: 495,146,413
RAC: 6,891
Message 58841 - Posted: 22 May 2022 | 3:23:15 UTC

Keith Myers:

Thank you, that was really helpful and informative.

jjch:

Task completed successfully in just under 13 hrs on the GTX 1080 Ti.

jjch
Joined: 10 Nov 13
Posts: 91
Credit: 15,040,000,871
RAC: 1,015,809
Message 58842 - Posted: 23 May 2022 | 0:54:37 UTC

I would expect that 80GB of memory is enough for the GPUgrid python tasks to run successfully. Two of my old HP workstations only have 8GB so that is why I was trying to bump up the swap space.

What I really need to know is the minimum or recommended amount of memory for the Python app. From what I can tell from a Linux user, it needs a minimum of 16GB to start, but even another Windows system with 32GB was having problems starting.

I do have a few servers with more memory, but those seemed to be having problems as well. I can try enabling just GPUGrid on those and see how it goes. If more tasks were available, it would be easier to troubleshoot this quickly.

ncoded.com
Joined: 16 Aug 16
Posts: 20
Credit: 495,146,413
RAC: 6,891
Message 58850 - Posted: 23 May 2022 | 16:06:51 UTC - in response to Message 58842.

I would be surprised if 32GB was not enough. The most I have seen used (not that I am logging use over time) is around 20GB, and that was while running the GPU Python tasks together with Rosetta tasks at 40%.

Something I have noticed, though (now that I have seen these Python tasks running on different systems), is that the tasks seem to use 50+% of the available cores/threads rather than a fixed number of threads.

E.g.:

50+% of threads used on a 12c/24t Ryzen 9
50+% of threads used on a 22c/44t Xeon V4
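One plausible (unconfirmed) explanation for that scaling: Python's multiprocessing defaults its worker count to os.cpu_count(), i.e. the number of logical threads, so any pool created without an explicit size grows with the host instead of being a fixed N:

```python
import multiprocessing as mp
import os

logical = os.cpu_count()   # logical threads, e.g. 24 on a 3900X
print(logical)

# A Pool created without a size gets one worker per logical CPU.
# (On Windows this would need an `if __name__ == "__main__":` guard.)
with mp.Pool() as pool:
    pool_size = pool._processes   # CPython private attribute
print(pool_size)
```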


Keith Myers
Joined: 13 Dec 17
Posts: 1070
Credit: 1,450,990,714
RAC: 426,047
Message 58851 - Posted: 23 May 2022 | 21:08:53 UTC

I just looked at the task running on my Epyc 7402P 24C/48T processor running Ubuntu 20.04.4 LTS Linux.

The task spawned 34 processes using 15% of the CPU.

So it's nowhere near the 50% of the CPU you are seeing on your Windows host.

ncoded.com
Joined: 16 Aug 16
Posts: 20
Credit: 495,146,413
RAC: 6,891
Message 58857 - Posted: 24 May 2022 | 6:58:21 UTC - in response to Message 58851.

Thank you Keith, that is very interesting. I'll probably test out some non-EPYC CPUs later on Linux.
