
Message boards : Number crunching : PythonGPU tasks: Issues And Answers Thread

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59879 - Posted: 5 Feb 2023 | 1:41:18 UTC

Hello and best wishes to all,

I'm starting this thread to document and discuss the behaviour of PythonGPU tasks in detail. I encourage everyone to discuss individual host issues here, so that the main news thread announcing Python development stays less cluttered.

Please continue this thread by noting any anomalies and errors you observe, along with any known remedies. This should serve as a useful reference for both us and abouh.

If you're experiencing errors and crashes, check the items below; they appear to cause most of the errors.
_____________

Let's begin with the characteristics of the PythonGPU app that I've observed most often:


* Needs up to 16GB of RAM (while expanding at startup) and insane amounts of commit charge (45-65GB or more). I recommend setting the swap file to at least 50GB manually, and be prepared to increase it if you need to.

* Needs at least 6GB of graphics memory (though some tasks will run on 4GB). Be sure to use the latest NVIDIA drivers. (I run GeForce Experience to keep them up to date automatically.)
Your GPU and driver must support CUDA 11.3.1.

* GPU (and CPU) usage oscillates, with frequent, sporadic spikes in GPU activity mirrored by simultaneous drops in CPU usage. (As I naively understand it, this may be the CPU handing work to the GPU and waiting for the results.)

* Uses more CPU than its properties state. This app will spawn threads on every available CPU and use up to (or more than) 60% of all logical CPUs. That appears to be why it reports CPU times much greater than the host's actual run times. Task completion time seems to depend more on CPU core count and speed than on GPU capability.

* BOINC Manager preferences cannot control how much of the system's resources Python tasks use. To avoid maxing out the CPU, you can limit tasks from other projects to the percentage of CPU left over while the Python app is running.

* May collide in memory with apps from other projects. Not proven, but circumstantial evidence suggests it has happened on my Ryzen host. Erich56 has stated that these tasks run best without competition from other projects for the BOINC-allotted CPU time.
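The memory figures above can be turned into a quick pre-flight check. Here is a minimal sketch in Python, assuming you read your RAM, swap/page-file and graphics-memory sizes yourself (for example from Task Manager or `free -g`, and `nvidia-smi`); the thresholds are the ones observed in this post, not official project requirements, and `HostSpec` and `preflight_warnings` are hypothetical names. It also relies on the Windows commit limit being roughly physical RAM plus page file, which is why a 50GB page file on a 16GB machine (about 66GB total) covers a 65GB commit peak.

```python
# Hypothetical pre-flight check for PythonGPU hosts; thresholds come from the
# forum observations above, not from any official GPUGrid requirement.
from dataclasses import dataclass
from typing import List

PEAK_RAM_GB = 16          # RAM used while the task expands at startup
PEAK_COMMIT_GB = 65       # upper end of the observed 45-65GB commit charge
MIN_SWAP_GB = 50          # suggested manual swap/page-file size
MIN_VRAM_GB = 6           # comfortable graphics memory
ABSOLUTE_MIN_VRAM_GB = 4  # some tasks still run on 4GB


@dataclass
class HostSpec:
    ram_gb: float   # installed RAM
    swap_gb: float  # swap file / page file size
    vram_gb: float  # graphics memory on the GPU


def preflight_warnings(host: HostSpec) -> List[str]:
    """Return warnings for anything below the observed thresholds (empty list = looks OK)."""
    warnings = []
    if host.ram_gb < PEAK_RAM_GB:
        warnings.append(f"RAM {host.ram_gb}GB is below the ~{PEAK_RAM_GB}GB startup peak")
    if host.swap_gb < MIN_SWAP_GB:
        warnings.append(f"swap {host.swap_gb}GB is below the suggested {MIN_SWAP_GB}GB")
    # Windows' commit limit is roughly physical RAM plus page file.
    commit_limit = host.ram_gb + host.swap_gb
    if commit_limit < PEAK_COMMIT_GB:
        warnings.append(f"commit limit ~{commit_limit}GB may not cover a {PEAK_COMMIT_GB}GB peak")
    if host.vram_gb < ABSOLUTE_MIN_VRAM_GB:
        warnings.append(f"{host.vram_gb}GB of graphics memory is below {ABSOLUTE_MIN_VRAM_GB}GB")
    elif host.vram_gb < MIN_VRAM_GB:
        warnings.append(f"{host.vram_gb}GB of graphics memory: some tasks run, {MIN_VRAM_GB}GB is safer")
    return warnings


if __name__ == "__main__":
    # 16GB RAM + 50GB page file + 8GB card: no warnings expected
    print(preflight_warnings(HostSpec(ram_gb=16, swap_gb=50, vram_gb=8)))  # []
```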

____________

Again, please add to this thread with any issues, observations, corrections, or tips on running Python for GPU WUs.
____________
Many thanks to all who have helped me here at GPUGrid.
____________
"Together we crunch
To check out a hunch
And wish all our credit
Could just buy us lunch"


Piasa Tribe - Illini Nation

Keith Myers
Joined: 13 Dec 17
Posts: 1330
Credit: 7,039,492,459
RAC: 15,642,634
Message 59883 - Posted: 5 Feb 2023 | 21:12:10 UTC
Last modified: 5 Feb 2023 | 21:12:40 UTC

For Linux users the tasks run very differently from what you've stated.

The tasks always spawn 32 separate python processes, no matter how many CPU cores are available. No more. No less.

I certainly don't use ALL of the available CPU cores on my 128-CPU Epyc hosts just for the one Python GPU task.

Most of the cpu cores on my hosts run other cpu applications for other projects, primarily Universe.

Looking in htop, the main python process uses 2-4% of the CPU, which is about 1 core. The 32 spawned processes each use 0.7% of the CPU, which adds up to about 0.224 of a core (32 × 0.7%).
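Keith's 0.224 figure is simply the 32 worker readings added together. A minimal sketch of that conversion, assuming htop's usual per-process convention in which 100% means one fully busy logical CPU (`cores_in_use` is a hypothetical helper, not part of htop):

```python
# Convert htop per-process CPU% readings into "logical CPUs in use".
# Assumption: htop's per-process convention, where 100% = one fully busy logical CPU.
from typing import Iterable


def cores_in_use(cpu_percentages: Iterable[float]) -> float:
    """Add up per-process CPU% values and express the total in logical CPUs."""
    return sum(cpu_percentages) / 100.0


if __name__ == "__main__":
    workers = [0.7] * 32  # 32 spawned processes at 0.7% each, as read from htop
    print(round(cores_in_use(workers), 3))  # 0.224
```

The same helper works on the readings of every process belonging to one task, giving that task's total CPU footprint in cores.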

Pop Piasa
Joined: 8 Aug 19
Posts: 252
Credit: 458,054,251
RAC: 0
Message 59885 - Posted: 6 Feb 2023 | 16:32:44 UTC - in response to Message 59883.

Thanks Keith,
It's challenging to get the "big picture" on these experimental tasks, since the parameters also keep changing.
I see that recently built tasks use much less graphics memory than I thought; they now run quite well on 4GB. CPU usage has also dropped noticeably under Windows on my hosts with more than 8 threads.

I have no experience with server-class processors or Linux, so my observations may often be somewhat naïve.
I'm not qualified to teach anyone; I hope others with more expertise will step in and do that here.
Your post is a step in that direction, so thanks again. Since so many folks are having trouble running these tasks, I hope we can all jot down any puzzling characteristics we see, along with workarounds for them.

In any case, my goal here is to provide a reference guide that helps hobbyists like me quickly identify and fix the issues that keep them from completing tasks and waste their time and power ($).
This will undoubtedly speed up the project for the researchers, who, in my view, are the reason we volunteer.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1064
Credit: 40,231,533,983
RAC: 55,339
Message 59887 - Posted: 6 Feb 2023 | 17:29:20 UTC - in response to Message 59883.



The tasks always spawn 32 separate python processes, no matter the cpu cores available. No more. No less.



it's actually 36 processes per task now.

32x for the agent training (multiprocessing.spawn)
4x for the main program (run.py)
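To check the 32 + 4 split on your own Linux host, you can tally process command lines by role. A minimal sketch, assuming the workers are started with Python's `spawn` start method (whose child processes show `from multiprocessing.spawn import spawn_main` on their command line) and that the main program's command line contains `run.py`; `classify`, `count_roles` and `read_linux_cmdlines` are hypothetical names, and the string matching is deliberately crude.

```python
# Tally a host's processes by role, reading /proc/<pid>/cmdline on Linux.
# Assumptions: workers use multiprocessing's "spawn" start method, and the
# task's main script shows up as "run.py" on its command line.
import os
from collections import Counter
from typing import Iterable, List


def classify(cmdline: str) -> str:
    """Label one command line as 'spawn_worker', 'main_program' or 'other'."""
    if "multiprocessing.spawn" in cmdline:
        return "spawn_worker"  # child started via multiprocessing's spawn method
    if "run.py" in cmdline:
        return "main_program"  # assumed name of the task's main script
    return "other"


def count_roles(cmdlines: Iterable[str]) -> Counter:
    """Tally command lines by role."""
    return Counter(classify(c) for c in cmdlines)


def read_linux_cmdlines() -> List[str]:
    """Read every visible process's command line from /proc ([] if /proc is absent)."""
    if not os.path.isdir("/proc"):
        return []
    cmdlines = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or cannot be read
        if raw:  # kernel threads have an empty cmdline
            # Arguments are NUL-separated; join them with spaces for matching.
            cmdlines.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return cmdlines


if __name__ == "__main__":
    roles = count_roles(read_linux_cmdlines())
    print(f"spawn workers: {roles['spawn_worker']}, run.py processes: {roles['main_program']}")
```

The counts are host-wide, so two concurrent PythonGPU tasks would show roughly double, and other Python programs using the spawn method would be counted too. Lines from `ps -eo args` can be passed to `count_roles` the same way.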

KAMasud
Joined: 27 Jul 11
Posts: 137
Credit: 523,901,354
RAC: 0
Message 59891 - Posted: 7 Feb 2023 | 16:59:33 UTC - in response to Message 59879.
Last modified: 7 Feb 2023 | 17:00:24 UTC

* May collide in memory with apps from other projects. Not proven, but circumstantial evidence suggests it has happened on my Ryzen host. Erich56 has stated that these tasks run best without competition from other projects for the BOINC-allotted CPU time.


-----
Regarding your last point, "May collide": it was happening before, but abouh has changed something. Now I can run three Einstein WUs on my six-core, twelve-thread Intel machines without errors.
