Message boards : Graphics cards (GPUs) : Compute error with GTX 285 and 8800 GTS

Kaxaky
Joined: 7 Apr 09
Posts: 2
Credit: 236,045
RAC: 0
Message 8646 - Posted: 20 Apr 2009 | 10:40:35 UTC


Hi,

I get an error after a few seconds on every task, but only on the 8800 GTS.
The GTX 285 works correctly.

<core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion [Invalid function]. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 1
# Device 0: "GeForce GTX 285"
# Clock rate: 1476000 kilohertz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce 8800 GTS"
# Clock rate: 1188000 kilohertz
# Total amount of global memory: 671088640 bytes
# Number of multiprocessors: 12
# Number of cores: 96
Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol.

</stderr_txt>
]]>

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 47,698,744
RAC: 144,502
Message 8648 - Posted: 20 Apr 2009 | 11:10:27 UTC - in response to Message 8646.

If you look at the table of cards that can run GPUGRID (here), you'll see that the 8800 GTS has an older chipset (G80) that cannot perform the necessary calculations.

The error that you see is the correct result with that card.

Mike

____________
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Kaxaky
Joined: 7 Apr 09
Posts: 2
Credit: 236,045
RAC: 0
Message 8663 - Posted: 20 Apr 2009 | 18:06:47 UTC - in response to Message 8648.

How can I limit the tasks to the GTX 285?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8668 - Posted: 20 Apr 2009 | 20:45:09 UTC

Oh, that's a tricky case. BOINC recognizes both cards as CUDA-capable (correct), but doesn't know that one of them cannot in fact run GPU-Grid (it could still run SETI on the GPU).

Sadly, for now there is absolutely no way to influence how BOINC uses the GPUs (other than switching a project on or off). And your configuration is quite common, so something should be done about this.

Paul, heat up the alpha-list again? ;)

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8672 - Posted: 20 Apr 2009 | 22:27:31 UTC

I posted this:

On GPU Grid a situation arose where a participant has multiple CUDA devices in the system, but only one of them is capable of running GPU Grid tasks. This occurs because GPU Grid requires a later version of CUDA than other projects.

System error information:

I get an error after a few seconds on every task, but only on the 8800 GTS.
The GTX 285 works correctly.

<core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
Unzulässige Funktion [Invalid function]. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 1
# Device 0: "GeForce GTX 285"
# Clock rate: 1476000 kilohertz
# Total amount of global memory: 1073741824 bytes
# Number of multiprocessors: 30
# Number of cores: 240
# Device 1: "GeForce 8800 GTS"
# Clock rate: 1188000 kilohertz
# Total amount of global memory: 671088640 bytes
# Number of multiprocessors: 12
# Number of cores: 96
Cuda error in file 'nonbonded.cu' in line 189 : invalid device symbol.

It is critical that there be a way to limit or control the scheduling of tasks so that GPU Grid tasks are not scheduled on a GPU that cannot run those tasks even if the targeted GPU can run tasks from other projects.

GPU Grid thread: http://www.gpugrid.net/forum_thread.php?id=962#8668

Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8683 - Posted: 21 Apr 2009 | 10:58:56 UTC - in response to Message 8672.
Last modified: 21 Apr 2009 | 11:12:41 UTC

At risk of getting my head taken off ..... :)

In this kind of situation, isn't it the project app's responsibility to test suitability before running on a GPU? BOINC could be at this forever and a day as project requirements develop and come on stream, CUDA versions change, etc. It's almost a case of hard-coding a special case inside BOINC for GPUGRID, which seems a little unlikely.

(My thought is that the app marks it "Device Incompatible" and returns it. The mechanism for that already exists [aka "compute error"]; it's just a case of adding a new information label so the user [and project] realise what's happened.)

Regards
Zy

jrobbio
Joined: 13 Mar 09
Posts: 59
Credit: 324,366
RAC: 0
Message 8684 - Posted: 21 Apr 2009 | 11:42:53 UTC - in response to Message 8683.

BOINC already has a control to limit the number of CPUs used, so why could it not limit the number of GPUs, or offer the ability to specify which GPUs are disabled (i.e. CUDA 0, CUDA 1, etc.) through the cc_config.xml file?
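As a hedged note on this suggestion: the client version discussed here (6.6.20) has no such option, but later BOINC clients (around 6.13 and newer) added exactly this knob, a per-project GPU exclusion in cc_config.xml. A sketch of what that looks like, assuming device 1 is the 8800 GTS as in the stderr above:

```xml
<cc_config>
  <options>
    <!-- Not available in 6.6.20; added in later BOINC clients.
         Excludes device 1 (the 8800 GTS) from GPUGRID only;
         other projects may still use that GPU. -->
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```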

Rob

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 8690 - Posted: 21 Apr 2009 | 21:38:31 UTC - in response to Message 8684.

BOINC already has a control to limit the number of CPUs used, so why could it not limit the number of GPUs, or offer the ability to specify which GPUs are disabled (i.e. CUDA 0, CUDA 1, etc.) through the cc_config.xml file?

The problem with that is that it's an all-or-nothing solution. The card may be suitable for SaH or other projects, just not GPU Grid.

And the responsibility for scheduling lies with the BOINC client. The application build might have a responsibility to notify the BOINC client of a specific level of need (an API revision, for example), and then it would be the BOINC client's responsibility not to assign that task to a GPU that does not meet or exceed that level ...

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 8693 - Posted: 21 Apr 2009 | 21:50:09 UTC

Zydor,

you're right: ideally the app should be able to handle this, as it knows best which hardware it can run on and which it can't. However, right now the app can only return a different error than it currently does. That's nice for diagnostics, but we want more than this. We want it to work, ideally automatically. Some possibilities:

- BOINC knows which hardware capability level each coprocessor offers, and the app specifies which level is necessary for it to run. BOINC then only schedules tasks on the appropriate hardware.

- the user tells BOINC which project to run on which coprocessor. This is micromanagement and not in the spirit of BOINC. It would be a welcome hotfix, though.

The last point would be welcome anyway, as some users wish to use only selected GPUs for BOINC (be it due to lag, noise or whatever reason).

Jrobbio,

Setting it only via cc_config seems cumbersome (not good) and inflexible; e.g. imagine the driver suddenly decides to assign different numbers to the GPUs: everything gets screwed up, tasks crash, and it takes days until you notice the mess.
To me it seems much better to make BOINC a little smarter. This has to be done anyway, as coprocessors are here to stay (NV, ATI, Larrabee...).

MrS
____________
Scanning for our furry friends since Jan 2002


