Message boards : Server and website : New user can't get any work

Keith Myers
Joined: 13 Dec 17
Posts: 508
Credit: 531,975,454
RAC: 1,711,807
Message 53719 - Posted: 21 Feb 2020 | 19:17:07 UTC

I have a teammate who has joined GPUGrid and is unable to get any work. His cruncher meets all the requirements.
This is his rig:
https://www.gpugrid.net/hosts_user.php?sort=rpc_time&rev=0&show_all=1&userid=552015

This is the output of work_fetch_debug in the Event Log. I don't see any reason why the schedulers respond with no work sent.

Fri 21 Feb 2020 01:45:45 PM EST | | [work_fetch] Request work fetch: Backoff ended for GPUGRID
Fri 21 Feb 2020 01:45:49 PM EST | | choose_project(): 1582310749.544944
Fri 21 Feb 2020 01:45:49 PM EST | | [work_fetch] ------- start work fetch state -------
Fri 21 Feb 2020 01:45:49 PM EST | | [work_fetch] target work buffer: 86400.00 + 864.00 sec
Fri 21 Feb 2020 01:45:49 PM EST | | [work_fetch] --- project states ---
Fri 21 Feb 2020 01:45:49 PM EST | Einstein@Home | [work_fetch] REC 34160464.119 prio -0.000 can't request work: suspended via Manager
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | [work_fetch] REC 0.000 prio 0.000 can request work
Fri 21 Feb 2020 01:45:49 PM EST | SETI@home | [work_fetch] REC 226750485.698 prio -0.000 can't request work: suspended via Manager (88.81 sec)
Fri 21 Feb 2020 01:45:49 PM EST | | [work_fetch] --- state for CPU ---
Fri 21 Feb 2020 01:45:49 PM EST | | [work_fetch] shortfall 1396224.00 nidle 16.00 saturated 0.00 busy 0.00
Fri 21 Feb 2020 01:45:49 PM EST | Einstein@Home | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 01:45:49 PM EST | SETI@home | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 01:45:49 PM EST | | [work_fetch] --- state for NVIDIA GPU ---
Fri 21 Feb 2020 01:45:49 PM EST | | [work_fetch] shortfall 5584896.00 nidle 64.00 saturated 0.00 busy 0.00
Fri 21 Feb 2020 01:45:49 PM EST | Einstein@Home | [work_fetch] share 0.000
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | [work_fetch] share 1.000
Fri 21 Feb 2020 01:45:49 PM EST | SETI@home | [work_fetch] share 0.000
Fri 21 Feb 2020 01:45:49 PM EST | | [work_fetch] ------- end work fetch state -------
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | choose_project: scanning
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | can't fetch CPU: blocked by project preferences
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | can fetch NVIDIA GPU
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | NVIDIA GPU needs work - buffer low
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | checking CPU
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | CPU can't fetch: blocked by project preferences
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | checking NVIDIA GPU
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | NVIDIA GPU set_request: 1.000000
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | [sched_op] Starting scheduler request
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (1.00 sec, 64.00 inst)
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | Sending scheduler request: To fetch work.
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | Requesting new tasks for NVIDIA GPU
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Fri 21 Feb 2020 01:45:49 PM EST | GPUGRID | [sched_op] NVIDIA GPU work request: 1.00 seconds; 64.00 devices
Fri 21 Feb 2020 01:45:50 PM EST | GPUGRID | Scheduler request completed: got 0 new tasks
Fri 21 Feb 2020 01:45:50 PM EST | GPUGRID | [sched_op] Server version 613
Fri 21 Feb 2020 01:45:50 PM EST | GPUGRID | No tasks sent
Fri 21 Feb 2020 01:45:50 PM EST | GPUGRID | No tasks are available for New version of ACEMD
Fri 21 Feb 2020 01:45:50 PM EST | GPUGRID | Project requested delay of 31 seconds
Fri 21 Feb 2020 01:45:50 PM EST | GPUGRID | [work_fetch] backing off NVIDIA GPU 518 sec
Fri 21 Feb 2020 01:45:50 PM EST | GPUGRID | [sched_op] Deferring communication for 00:00:31
Fri 21 Feb 2020 01:45:50 PM EST | GPUGRID | [sched_op] Reason: requested by project
Fri 21 Feb 2020 01:45:50 PM EST | | [work_fetch] Request work fetch: RPC complete
Fri 21 Feb 2020 01:45:55 PM EST | | choose_project(): 1582310755.574162
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] ------- start work fetch state -------
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] target work buffer: 86400.00 + 864.00 sec
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] --- project states ---
Fri 21 Feb 2020 01:45:55 PM EST | Einstein@Home | [work_fetch] REC 34160464.119 prio -0.000 can't request work: suspended via Manager
Fri 21 Feb 2020 01:45:55 PM EST | GPUGRID | [work_fetch] REC 0.000 prio 0.000 can't request work: scheduler RPC backoff (25.98 sec)
Fri 21 Feb 2020 01:45:55 PM EST | SETI@home | [work_fetch] REC 226750485.698 prio -0.000 can't request work: suspended via Manager (82.78 sec)
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] --- state for CPU ---
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] shortfall 1396224.00 nidle 16.00 saturated 0.00 busy 0.00
Fri 21 Feb 2020 01:45:55 PM EST | Einstein@Home | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 01:45:55 PM EST | GPUGRID | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 01:45:55 PM EST | SETI@home | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] --- state for NVIDIA GPU ---
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] shortfall 5584896.00 nidle 64.00 saturated 0.00 busy 0.00
Fri 21 Feb 2020 01:45:55 PM EST | Einstein@Home | [work_fetch] share 0.000
Fri 21 Feb 2020 01:45:55 PM EST | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 513.41, inc 600.00)
Fri 21 Feb 2020 01:45:55 PM EST | SETI@home | [work_fetch] share 0.000
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] ------- end work fetch state -------
Fri 21 Feb 2020 01:45:55 PM EST | GPUGRID | choose_project: scanning
Fri 21 Feb 2020 01:45:55 PM EST | GPUGRID | skip: scheduler RPC backoff
Fri 21 Feb 2020 01:45:55 PM EST | SETI@home | choose_project: scanning
Fri 21 Feb 2020 01:45:55 PM EST | SETI@home | skip: suspended via Manager
Fri 21 Feb 2020 01:45:55 PM EST | Einstein@Home | choose_project: scanning
Fri 21 Feb 2020 01:45:55 PM EST | Einstein@Home | skip: suspended via Manager
Fri 21 Feb 2020 01:45:55 PM EST | | [work_fetch] No project chosen for work fetch


Does anybody see something I'm not recognizing as an impediment to getting work?

Has anybody else had a similar problem getting work as a first-time volunteer to GPUGrid?

Keith Myers
Message 53720 - Posted: 21 Feb 2020 | 19:23:05 UTC

Just an FYI: the user has all the project configuration settings set correctly, and they match mine to a T.

Resource share ---
Use CPU no
Use ATI GPU no
Use NVIDIA GPU yes
Run test applications? no
Is it OK for GPUGRID and your team (if any) to email you? yes
Should GPUGRID show your computers on its web site? yes
Default computer location ---
Maximum CPU % for graphics 20
Run only the selected applications
ACEMD short runs (2-3 hours on fastest card): no
ACEMD long runs (8-12 hours on fastest GPU): no
ACEMD3: yes
Quantum Chemistry (CPU): no
Quantum Chemistry (CPU, beta): no
Python Runtime: no
If no work for selected applications is available, accept work from other applications? no
Use Graphics Processing Unit (GPU) if available yes
Use Central Processing Unit (CPU) yes

biodoc
Joined: 26 Aug 08
Posts: 167
Credit: 1,633,077,546
RAC: 738,979
Message 53721 - Posted: 21 Feb 2020 | 19:37:19 UTC

Try setting resource share to something above zero just to get the project going.

Keith Myers
Message 53722 - Posted: 21 Feb 2020 | 20:06:35 UTC - in response to Message 53721.

He already tried that with a resource share of 100, and tried the beta applications as well.

Still not getting any work from a request.

Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | sched RPC pending: Requested by user
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | piggyback_work_request()
Fri 21 Feb 2020 02:45:29 PM EST | | [work_fetch] ------- start work fetch state -------
Fri 21 Feb 2020 02:45:29 PM EST | | [work_fetch] target work buffer: 86400.00 + 864.00 sec
Fri 21 Feb 2020 02:45:29 PM EST | | [work_fetch] --- project states ---
Fri 21 Feb 2020 02:45:29 PM EST | Einstein@Home | [work_fetch] REC 34239357.438 prio -0.000 can't request work: suspended via Manager
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [work_fetch] REC 0.000 prio -0.000 can request work
Fri 21 Feb 2020 02:45:29 PM EST | SETI@home | [work_fetch] REC 226350578.976 prio 0.000 can't request work: suspended via Manager
Fri 21 Feb 2020 02:45:29 PM EST | | [work_fetch] --- state for CPU ---
Fri 21 Feb 2020 02:45:29 PM EST | | [work_fetch] shortfall 1396224.00 nidle 16.00 saturated 0.00 busy 0.00
Fri 21 Feb 2020 02:45:29 PM EST | Einstein@Home | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 02:45:29 PM EST | SETI@home | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 02:45:29 PM EST | | [work_fetch] --- state for NVIDIA GPU ---
Fri 21 Feb 2020 02:45:29 PM EST | | [work_fetch] shortfall 5584896.00 nidle 64.00 saturated 0.00 busy 0.00
Fri 21 Feb 2020 02:45:29 PM EST | Einstein@Home | [work_fetch] share 0.000
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [work_fetch] share 1.000
Fri 21 Feb 2020 02:45:29 PM EST | SETI@home | [work_fetch] share 0.000
Fri 21 Feb 2020 02:45:29 PM EST | | [work_fetch] ------- end work fetch state -------
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | piggyback: resource CPU
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | piggyback: can't fetch CPU: blocked by project preferences
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | piggyback: resource NVIDIA GPU
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 64 nused_total 0.00 nidle_now 64.00 fetch share 1.00 req_inst 64.00 req_secs 5584896.00
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [sched_op] Starting scheduler request
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (5584896.00 sec, 64.00 inst)
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | Sending scheduler request: Requested by user.
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | Requesting new tasks for NVIDIA GPU
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [sched_op] NVIDIA GPU work request: 5584896.00 seconds; 64.00 devices
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | Scheduler request completed: got 0 new tasks
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | [sched_op] Server version 613
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | No tasks sent
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | No tasks are available for New version of ACEMD
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | Project requested delay of 31 seconds
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | [sched_op] Deferring communication for 00:00:31
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | [sched_op] Reason: requested by project
Fri 21 Feb 2020 02:45:30 PM EST | | [work_fetch] Request work fetch: RPC complete
Fri 21 Feb 2020 02:45:35 PM EST | | choose_project(): 1582314335.760470
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] ------- start work fetch state -------
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] target work buffer: 86400.00 + 864.00 sec
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] --- project states ---
Fri 21 Feb 2020 02:45:35 PM EST | Einstein@Home | [work_fetch] REC 34239357.438 prio -0.000 can't request work: suspended via Manager
Fri 21 Feb 2020 02:45:35 PM EST | GPUGRID | [work_fetch] REC 0.000 prio 0.000 can't request work: scheduler RPC backoff (25.98 sec)
Fri 21 Feb 2020 02:45:35 PM EST | SETI@home | [work_fetch] REC 226350578.976 prio 0.000 can't request work: suspended via Manager
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] --- state for CPU ---
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] shortfall 1396224.00 nidle 16.00 saturated 0.00 busy 0.00
Fri 21 Feb 2020 02:45:35 PM EST | Einstein@Home | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 02:45:35 PM EST | GPUGRID | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 02:45:35 PM EST | SETI@home | [work_fetch] share 0.000 blocked by project preferences
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] --- state for NVIDIA GPU ---
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] shortfall 5584896.00 nidle 64.00 saturated 0.00 busy 0.00
Fri 21 Feb 2020 02:45:35 PM EST | Einstein@Home | [work_fetch] share 0.000
Fri 21 Feb 2020 02:45:35 PM EST | GPUGRID | [work_fetch] share 0.000
Fri 21 Feb 2020 02:45:35 PM EST | SETI@home | [work_fetch] share 0.000
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] ------- end work fetch state -------
Fri 21 Feb 2020 02:45:35 PM EST | GPUGRID | choose_project: scanning
Fri 21 Feb 2020 02:45:35 PM EST | GPUGRID | skip: scheduler RPC backoff
Fri 21 Feb 2020 02:45:35 PM EST | SETI@home | choose_project: scanning
Fri 21 Feb 2020 02:45:35 PM EST | SETI@home | skip: suspended via Manager
Fri 21 Feb 2020 02:45:35 PM EST | Einstein@Home | choose_project: scanning
Fri 21 Feb 2020 02:45:35 PM EST | Einstein@Home | skip: suspended via Manager
Fri 21 Feb 2020 02:45:35 PM EST | | [work_fetch] No project chosen for work fetch

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1003
Credit: 2,537,540,285
RAC: 3,303,364
Message 53723 - Posted: 21 Feb 2020 | 20:34:31 UTC - in response to Message 53722.

I think that the only two lines in there that have any meaning are:

Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [sched_op] NVIDIA GPU work request: 5584896.00 seconds; 64.00 devices
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | Scheduler request completed: got 0 new tasks

He asked for work, and I think we must regard that as a refusal, rather than a lack of availability.

Unfortunately, at this project we don't get the rejection letter with a checklist of possible reasons at the bottom that some projects send out. So we'll have to turn detective.

We know

* NVIDIA GeForce RTX 2080 (8192MB) driver: 420.69
* Linux Ubuntu

And not much else that the server will want to check. Version of CUDA available? Compute Capability? To be honest, I don't know - but I'm wondering if Turing cards are yet supported under Linux?

Keith Myers
Message 53724 - Posted: 21 Feb 2020 | 20:53:44 UTC - in response to Message 53723.

I think that the only two lines in there that have any meaning are:

Fri 21 Feb 2020 02:45:29 PM EST | GPUGRID | [sched_op] NVIDIA GPU work request: 5584896.00 seconds; 64.00 devices
Fri 21 Feb 2020 02:45:30 PM EST | GPUGRID | Scheduler request completed: got 0 new tasks

He asked for work, and I think we must regard that as a refusal, rather than a lack of availability.

Unfortunately, at this project we don't get the rejection letter with a checklist of possible reasons at the bottom that some projects send out. So we'll have to turn detective.

We know

* NVIDIA GeForce RTX 2080 (8192MB) driver: 420.69
* Linux Ubuntu

And not much else that the server will want to check. Version of CUDA available? Compute Capability? To be honest, I don't know - but I'm wondering if Turing cards are yet supported under Linux?


I already questioned whether his spoofed Nvidia driver version was confusing the schedulers (the driver is 440.59 under the covers), but he had already changed that in a previous try to report the standard version. No dice.

Yes, the Turing cards have been working under Linux since last July. He is running sufficient driver and CC level for the CUDA100 application to be sent.

I was hoping you would point out the obvious reason to us since you are more expert in reading and understanding the work_fetch_debug output.

ServicEnginIC
Joined: 24 Sep 10
Posts: 199
Credit: 1,458,715,434
RAC: 823,206
Message 53725 - Posted: 21 Feb 2020 | 22:04:09 UTC - in response to Message 53724.

I see that your friend's host ID #524248 is showing "[64] NVIDIA GeForce RTX 2080 (8192MB) driver: 440.59".
Perhaps there is new protection at the server to reject requests from systems with such a high number of GPUs?
Please try instructing this new user to have his system reflect its true number of GPUs for a while, to test whether that corrects the problem.
Toni kindly requested in this post that these kinds of practices not be used.

Zalster
Joined: 26 Feb 14
Posts: 208
Credit: 4,490,828,031
RAC: 7,407
Message 53726 - Posted: 21 Feb 2020 | 22:32:55 UTC

Try a reset of the project. Force it to send a test unit to see if the GPUs are viable.

Keith Myers
Message 53729 - Posted: 21 Feb 2020 | 23:06:27 UTC - in response to Message 53725.

I see that your friend's host ID #524248 is showing "[64] NVIDIA GeForce RTX 2080 (8192MB) driver: 440.59".
Perhaps there is new protection at the server to reject requests from systems with such a high number of GPUs?
Please try instructing this new user to have his system reflect its true number of GPUs for a while, to test whether that corrects the problem.
Toni kindly requested in this post that these kinds of practices not be used.

I've been getting work all day for all my hosts, the majority of which have spoofed GPU counts.

No problem there.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 68
Credit: 951,958,976
RAC: 6,392,869
Message 53732 - Posted: 22 Feb 2020 | 2:22:15 UTC

Hi guys, I'm the guy who was having issues. I was finally able to get some work and earn credit, so I can actually post.

I tried resetting the project, but that did not work. It still gave the same message that no tasks are available.

Then I removed my custom edits and let BOINC re-generate my coproc_info.xml with the correct number of GPUs [7]. It then downloaded some WUs and processed them.

I will have to play around with some things. I see that Keith uses the spoofing also without any detriment, so maybe something just wasn't right in my coproc file.


Second issue: GPUGrid doesn't seem to play nicely with Einstein when they are both set to a 0 resource share. Basically I want to run them BOTH as backups, so that when SETI runs out, they share the backup load equally. Is that possible? Right now, I was only able to download GPUGrid work if I suspended Einstein.

Keith Myers
Message 53735 - Posted: 22 Feb 2020 | 3:36:32 UTC - in response to Message 53732.
Last modified: 22 Feb 2020 | 3:43:09 UTC

Glad you finally got work, Ian. So how would that work if both Einstein and GPUGrid have a resource share of zero when SETI runs out of work?

Who gets the first shot at work? Is it based on where each project comes out in the alphabet? That seems to be the case with my 4-project hosts: Einstein goes first, then GPUGrid, then Milkyway, and finally SETI, when all the projects update at the same time or when you first fire up BOINC.

Or is it the project with the largest REC deficit? I would think that is what is supposed to happen. But Einstein will always send more work than you want unless you set a really small cache size, and if it gets to the schedulers first it will fill up your hard-drive space allocation and prevent the other projects from getting work.

That is what was occurring on my dual Einstein/GPUGrid host. GPUGrid never could get work because the GPU cache was always full with Einstein. Only when there was a brief outage at Einstein and the GPU cache had dwindled down did GPUGrid get a chance to download work. And once that happened, GPUGrid commandeered the GPUs totally because it had to work off a huge REC deficit to Einstein.

Keith Myers
Message 53736 - Posted: 22 Feb 2020 | 3:41:43 UTC

Maybe the servers thought that 64 GPUs was out of bounds, while 24 or 32 spoofed GPUs is acceptable.

Ian&Steve C.
Message 53737 - Posted: 22 Feb 2020 | 3:55:47 UTC - in response to Message 53736.

Glad you finally got work, Ian. So how would that work if both Einstein and GPUGrid have a resource share of zero when SETI runs out of work?

Who gets the first shot at work? Is it based on where each project comes out in the alphabet? That seems to be the case with my 4-project hosts: Einstein goes first, then GPUGrid, then Milkyway, and finally SETI, when all the projects update at the same time or when you first fire up BOINC.

Or is it the project with the largest REC deficit? I would think that is what is supposed to happen. But Einstein will always send more work than you want unless you set a really small cache size, and if it gets to the schedulers first it will fill up your hard-drive space allocation and prevent the other projects from getting work.

That is what was occurring on my dual Einstein/GPUGrid host. GPUGrid never could get work because the GPU cache was always full with Einstein. Only when there was a brief outage at Einstein and the GPU cache had dwindled down did GPUGrid get a chance to download work. And once that happened, GPUGrid commandeered the GPUs totally because it had to work off a huge REC deficit to Einstein.


With a 0 resource share, the project is "supposed" to only send you 1 WU per device, and basically send them back at a 1:1 ratio: send one, get one. It appears that GPUGrid is following this. It doesn't look like Einstein is following it, though; even when I had 64 GPUs listed, it would routinely send me 80-100 WUs. I think JStateson complained about this at Einstein as well.

How do you have resource share allocated for GPUGrid? I thought you ran GPUGrid as a backup project on one or more of your systems.

Maybe the servers thought that 64 GPUs was out of bounds, while 24 or 32 spoofed GPUs is acceptable.


If so, this must be some kind of project limit. BOINC's limit is 64, and it never prevented me from getting work at SETI or Einstein.

Keith Myers
Message 53738 - Posted: 22 Feb 2020 | 4:26:33 UTC - in response to Message 53737.
Last modified: 22 Feb 2020 | 4:28:00 UTC

No, I don't run any project as a backup. They all get a standard resource share.

I have never seen Einstein obey the standard BOINC protocols for work allotment.

They don't run any current server software; by now it's getting to be almost ten years old.

Zalster
Message 53740 - Posted: 22 Feb 2020 | 4:57:59 UTC - in response to Message 53738.
Last modified: 22 Feb 2020 | 4:58:11 UTC

Keith has started to do what I do. Dedicated machines for certain projects.

biodoc
Message 53741 - Posted: 22 Feb 2020 | 11:18:41 UTC
Last modified: 22 Feb 2020 | 11:35:19 UTC

So setting "0" resource share on both GPUGrid and Einstein results in only Einstein tasks downloading and processing. Is that correct? If so, have you tried adding an app_config.xml file to the Einstein project folder restricting the number of max concurrent tasks to 3 or 4?

<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>

EDIT: I'm hoping the above app_config.xml file will limit Einstein to 3 GPUs and leave the other 4 free to process GPUGrid tasks if they download.
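If limiting the whole project proves too blunt, app_config.xml also accepts a per-application cap. A minimal sketch, with the caveat that the app name below is only an example; the real name must match the <name> element for that app in the project's section of client_state.xml:

```xml
<app_config>
  <app>
    <!-- Example app name; replace with the GPU app name from client_state.xml -->
    <name>hsgamma_FGRPB1G</name>
    <!-- Cap just this app at 3 simultaneous tasks -->
    <max_concurrent>3</max_concurrent>
  </app>
</app_config>
```

The client re-reads this via Options > Read config files in BOINC Manager, so no restart should be needed.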

Ian&Steve C.
Message 53742 - Posted: 22 Feb 2020 | 15:21:35 UTC - in response to Message 53741.

Hmm, maybe that would work. I'll have to play around with it.

But since I'm planning to run GPUGrid as a backup, I think on this host I'll just leave Einstein set to NNT (No New Tasks), and only release it if GPUGrid doesn't have any work at the time.

I have 2 other hosts [7x RTX 2070, 10x RTX 2070] that run SETI (prime)/Einstein (backup), but one of them has all 7 of its GPUs on USB PCIe 3.0 x1 risers (PCIe 3.0 x1/x1/x1/x1/x1/x1/x1). That is fine for SETI/Einstein, but looking at PCIe usage on GPUGrid from the 2080 system, it looks to need at least a PCIe 3.0 x4 link. I'm observing ~20% PCIe use on a 3.0 x16 link and ~40% use on a 3.0 x8 link, meaning anything less than a 3.0 x4 link will introduce a bottleneck. The 7x 2080 system has all cards plugged into the board with 3.0 x16/x8/x8/x16/x8/x8/x8 links respectively, thanks to some clever PLX switches integrated into the motherboard.
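For what it's worth, the lane arithmetic above can be sanity-checked in a couple of lines. The inputs are the utilization figures quoted in this post, and this is only a linear back-of-the-envelope estimate, not a real bandwidth measurement:

```python
# Estimate how many fully-used PCIe 3.0 lanes of traffic an app generates,
# assuming reported utilization scales linearly with link width.
def lanes_of_traffic(utilization: float, link_width: int) -> float:
    """Equivalent number of saturated PCIe lanes at the same generation."""
    return utilization * link_width

x16_traffic = lanes_of_traffic(0.20, 16)  # ~3.2 lanes on the x16 card
x8_traffic = lanes_of_traffic(0.40, 8)    # ~3.2 lanes on the x8 card

# Both observations agree on roughly 3.2 lanes' worth of data, so a x4 link
# has a little headroom while a x1 riser would be about a 3x bottleneck.
```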

The other system has 8 GPUs on a 3.0 x8 link each and 2 on USB risers (PCIe 3.0 x8/x8/x8/x8/x8/x8/x8/x8/x1/x1). I could probably play around with the gpu_exclude flag, but it might be easier to just dedicate the one system to SETI (prime)/GPUGrid (backup), and let the other 2 be SETI (prime)/Einstein (backup).
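As a sketch of the gpu_exclude route: a cc_config.xml in the BOINC data directory can keep GPUGrid off specific devices. The device number below is hypothetical; the real numbers are listed in the event log at client startup, and the URL must match the project URL exactly as the client knows it:

```xml
<cc_config>
  <options>
    <!-- Keep GPUGrid off one riser-mounted card; repeat this block
         for each device number to exclude. -->
    <exclude_gpu>
      <url>https://www.gpugrid.net/</url>
      <device_num>8</device_num>
      <type>NVIDIA</type>
    </exclude_gpu>
  </options>
</cc_config>
```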
