Message boards : Number crunching : problems since ACEMD 2.10

EdwardPF
Joined: 24 Nov 12
Posts: 17
Credit: 453,679,903
RAC: 0
Message 54338 - Posted: 15 Apr 2020 | 20:54:07 UTC

I have been running SETI and using GPUGRID as my backup project for quite some time.

Since the demise of SETI I have been running GPUGRID on my GPUs and Rosetta on my CPUs, and all has been fine. I have been running 2 WUs (one each) on my 2 nvidia 1070s and 9 WUs on my 12 CPU AMD computer (leaving 1 CPU and 12 logical CPUs idle). That gave me 2 GPUs busy and 2 WUs "in the wings" ready to run. BOINC 7.14.2, Win-10.

BUT

Since the introduction of ACEMD 2.10 I am only running 1 WU (on GPU 0) nothing on GPU 1 and 3 WUs "in the wings".

I have plenty of memory (32 GB total), plenty of page file (64 GB), and plenty of free disk space (232 GB), as well as "13 process slots". I have scheduled 4.5 days of work plus 0.1 day of "additional" work.

As far as I know, the only thing that changed is the move to V2.10.

Running Win-10 with the most current NVIDIA driver, 445.75.

BOINC is set to use 100% memory and 75% page/swap space.

also ...

GPUGRID will run 2 NVIDIA GPUs (0 and 1) IFF I suspend Rosetta. (Now there's a twist ... some sort of memory shortage?? I cut Rosetta down to 4 CPUs and it didn't change the GPUGRID problem at all.)

Ed Frybarger

P.S.

Ed, you may need to add this line to your cc_config.xml file:
<use_all_gpus>1</use_all_gpus>

This line was already in cc_config, as implied by:
I have been running 2 WUs (one each) on my 2 nvidia 1070s

but thanks for the thought.
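
For anyone following along, here is a minimal sketch of where that option sits. It assumes a default Windows install with the data directory at C:\ProgramData\BOINC. cc_config.xml lives in the data directory itself (not under projects\), and the option belongs inside the <options> element. Any options already in the file should be kept alongside it.

```xml
<!-- C:\ProgramData\BOINC\cc_config.xml (path assumes a default Windows install) -->
<cc_config>
  <options>
    <!-- Use every GPU in the host, not only the most capable one -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

Restarting the BOINC client is the safe way to make sure the change takes effect.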

Zalster
Joined: 26 Feb 14
Posts: 211
Credit: 4,496,324,562
RAC: 0
Message 54339 - Posted: 15 Apr 2020 | 21:22:40 UTC - in response to Message 54338.

Without knowing how Rosetta works, it's hard to say. But if turning it off gets both GPUs to run, then you already have your answer. It might be how much memory each work unit is using. Maybe the tasks need far more CPU cycles than they claim (I've seen this in several projects where they say 1 thread but end up using 2-3 per work unit). Maybe it's how much is being written to the hard drive each time. There could be a number of reasons. One method would be to start with only 1 Rosetta work unit running and increase by 1 each time until the second GPU drops off. Then go down by 1 to keep both GPUs running.
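
One way to run that experiment without touching global preferences is an app_config.xml in the Rosetta project directory. This is only a sketch: the directory name boinc.bakerlab.org_rosetta is an assumption, so check the actual name under projects\ in your BOINC data directory. The <project_max_concurrent> element caps how many tasks of that project run at once, and the value 1 is just the first step of the experiment.

```xml
<!-- projects\boinc.bakerlab.org_rosetta\app_config.xml (directory name is an assumption) -->
<app_config>
  <!-- Start at 1, then raise by 1 until the second GPU task stops being scheduled -->
  <project_max_concurrent>1</project_max_concurrent>
</app_config>
```

After each change, have the client re-read its config files (or restart BOINC) and watch whether both GPUs stay busy.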

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1065
Credit: 40,231,533,983
RAC: 22,690
Message 54340 - Posted: 15 Apr 2020 | 22:05:01 UTC

how much memory is actually being used on the system while it's running?

I've heard that Rosetta can use quite a lot of memory.
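
BOINC itself can help answer that. Below is a sketch of cc_config.xml log flags that look relevant here: <mem_usage_debug> reports memory use per task, <cpu_sched_debug> reports scheduler decisions about which tasks run, and <coproc_debug> reports GPU assignment. The flags go inside <log_flags>. Merge them into the existing file rather than replacing it, so that <options> such as <use_all_gpus> are kept.

```xml
<cc_config>
  <log_flags>
    <!-- Per-task memory usage -->
    <mem_usage_debug>1</mem_usage_debug>
    <!-- Scheduler decisions about which tasks run -->
    <cpu_sched_debug>1</cpu_sched_debug>
    <!-- GPU (coprocessor) assignment -->
    <coproc_debug>1</coproc_debug>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

The output appears in the Event Log. These flags are verbose, so set them back to 0 once the cause is found.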

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 8,995,312,024
RAC: 16,427,753
Message 54341 - Posted: 15 Apr 2020 | 22:24:26 UTC
Last modified: 15 Apr 2020 | 22:49:37 UTC

Since the introduction of ACEMD 2.10 I am only running 1 WU (on GPU 0) nothing on GPU 1 and 3 WUs "in the wings".

I see that you have set your preferences to leave enough CPU threads unused, and your system has as much as 32 GB RAM.
For a while, try adding this "app_config.xml" to your C:\ProgramData\BOINC\projects\www.gpugrid.net directory, then restart.

<app_config>
  <app>
    <name>acemd3</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>0.49</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

ACEMD3 will still use two full CPU threads through the wrapper, but BOINC Manager will "think" it needs only one CPU thread for two concurrent GPU tasks...

Another, more exotic remedy to try:
If your monitor has two inputs, you can connect each graphics card to one of them (one to the DVI input and the other to the HDMI input, for example), so that both cards are headed.

EdwardPF
Joined: 24 Nov 12
Posts: 17
Credit: 453,679,903
RAC: 0
Message 54342 - Posted: 15 Apr 2020 | 23:07:12 UTC

I have tried running with just 1 Rosetta WU and BOINC will still only run 1 GPUGRID WU.

I'll hold my breath, BUT specifying .49 CPU equivalents for each GPUGRID task IS WORKING for me now. Two CPUs AND 2 GPUs running!!

THANKS!!

If anything new crops up I'll give a call.

Ed F

EdwardPF
Joined: 24 Nov 12
Posts: 17
Credit: 453,679,903
RAC: 0
Message 54343 - Posted: 16 Apr 2020 | 0:33:40 UTC

Almost!

GPUGRID runs 2 GPU WUs until one of them finishes, then hangs with 1 WU running and 3 "Ready to Start".

If I suspend GPUGRID and restart it ... no joy
If I suspend Rosetta ... no joy. (not as good as before)
If I shut down BOINC and restart it ... Joy!!
If I set CPU load from .49 to .24 and shut down BOINC and restart it ... no joy

Ideas??


Ed F

ServicEnginIC
Joined: 24 Sep 10
Posts: 581
Credit: 8,995,312,024
RAC: 16,427,753
Message 54347 - Posted: 16 Apr 2020 | 6:48:58 UTC

Time for some other kind user with experience on Windows multi-GPU systems to step in.
I have three of them currently running with no such problems, but all of them are Linux systems.

Keith Myers
Joined: 13 Dec 17
Posts: 1332
Credit: 7,148,917,459
RAC: 14,552,567
Message 54349 - Posted: 16 Apr 2020 | 7:02:23 UTC

I found Rosetta CPU tasks tie up memory, even after you have suspended them or reduced the number running. And that was without the "leave tasks in memory" setting enabled in preferences.

I had umpteen tries at reducing the number of CPU tasks running and was still getting the "not enough memory" message and idling GPUs.

I finally found that running 50% of the number I was originally running freed up enough memory for all the GPUs to start crunching.

Running Rosetta alongside any GPU project is a challenge, and you won't be able to run the number of tasks you think your CPU and memory should be capable of running.

Erich56
Joined: 1 Jan 15
Posts: 1124
Credit: 9,049,770,176
RAC: 27,656,577
Message 54350 - Posted: 16 Apr 2020 | 7:51:48 UTC - in response to Message 54349.
Last modified: 16 Apr 2020 | 7:56:31 UTC

I found Rosetta cpu tasks tie up memory ...

My experience with Rosetta, over the years, is that the maximum RAM a task has ever used was about 1 GB (mostly far below).
Of course, the more Rosetta tasks are being processed concurrently, the more RAM they take.

EdwardPF
Joined: 24 Nov 12
Posts: 17
Credit: 453,679,903
RAC: 0
Message 54354 - Posted: 16 Apr 2020 | 14:17:42 UTC

Rosetta takes a lot of RAM ... that's for sure ... on occasion I had 6 of them running in the 1 to 1.5 GB range at the same time.

I would think (what do I know) that 32 GB of memory, 64 GB of swap on an NVMe M.2 drive, and 13 free CPUs would just shrug off the load ... I'll keep looking ...

Please keep feeding ideas to me!!

Thanks

Ed F

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 54355 - Posted: 16 Apr 2020 | 15:35:28 UTC

I am running Rosettas on 9 Ubuntu machines and 1 Win7 at the moment (over 100 cores). There is no problem running them with a GPU; you just reserve a core, as usual. I switch between GPUGrid and Folding on the GPU, depending on who has work (GPUGrid at the moment).

The Rosettas take up a lot of memory when a new series is first introduced. They reduce that after a few days to a more manageable level, usually less than 500 MB. But at the moment, I have one running at 2979 MB, and another at 2923 MB; a few more around 2 GB. All my machines have 32 GB memory, except for a Ryzen 2700 with only 16 GB. On that one, I run a mix of Rosettas (100% resource share) and TN-Grid (60% resource share) to keep the memory within bounds. They actually run quite well.

EdwardPF
Joined: 24 Nov 12
Posts: 17
Credit: 453,679,903
RAC: 0
Message 54356 - Posted: 17 Apr 2020 | 7:05:05 UTC

Since my last post I have done nothing but wait ...

This past A.M., 4/16/2020, I checked in on GPUGRID and it was running on 2 GPUs with 2 Ready to start!!

Now, as I retire, things are STILL running fine ... must be some kind of counter that needed to fix itself(??!!)

Ed F
