Message boards : Graphics cards (GPUs) : Server won't give me work

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10074 - Posted: 22 May 2009 | 22:42:27 UTC

So, after the problems with the KASHIF work units, and coming in this morning to find that three work units had all failed with a computation error, I decided to go ahead and upgrade to 6.6.28. Now the server won't give me any work, saying I have no CUDA device. It's been running for a while now with no problem, so why this issue now? Any suggestions?

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10076 - Posted: 23 May 2009 | 0:21:42 UTC - in response to Message 10074.

Did you change the opt-in / opt-out setting in the preferences? If you changed from the right version of BOINC, the setting was opt-in; then they suddenly changed it to opt-out, so that CUDA cards are disabled by default.

Change the preference on the web site here, then update the machine...

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10083 - Posted: 23 May 2009 | 7:48:32 UTC - in response to Message 10076.

Thanks Paul. I'm gonna need a bit more help here (which is sad as I'm a professional scientist), but where is this opt-in/opt-out setting? There are three preferences tabs (computing/gpugrid/computing) and in none of them do I see anything labeled opt-in or opt-out. Searching the pages for 'opt' isn't turning up anything, and when I look at my preferences it looks like the GPU should be available for computing. I'd like to get back to crunching since I've done nothing for almost the last 2 days due to a series of compute failures. Thanks.

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 51,279,371
RAC: 0
Message 10084 - Posted: 23 May 2009 | 8:17:26 UTC

It is in the computing preferences - http://www.gpugrid.net/prefs.php?subset=global. Paul is talking about the setting "Suspend GPU work while computer is in use?".
____________

pixelicious.at - my little photoblog

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10091 - Posted: 23 May 2009 | 14:53:19 UTC

The error says you're in "device emulation" mode, meaning no CUDA device is found. Did you change anything else? Changed the GPU and installed drivers from the CD?

MrS
____________
Scanning for our furry friends since Jan 2002

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10100 - Posted: 23 May 2009 | 19:04:24 UTC

Ok, Time Bandit helped me out as I was not clear ... sorry about that ... but that is only one possibility that leapt off the top of my pate...

Can you give us the top lines of the message tab from a fresh start of BOINC?

It will look something like:

Fri May 22 16:58:11 2009 Starting BOINC client version 6.6.29 for x86_64-apple-darwin
Fri May 22 16:58:11 2009 Configured to use all coprocessors
Fri May 22 16:58:12 2009 log flags: task, file_xfer, sched_ops, cpu_sched, cpu_sched_debug, sched_op_debug
Fri May 22 16:58:12 2009 log flags: coproc_debug
Fri May 22 16:58:12 2009 Libraries: libcurl/7.19.4 OpenSSL/0.9.7l zlib/1.2.3 c-ares/1.6.0
Fri May 22 16:58:12 2009 Data directory: /Library/Application Support/BOINC Data
Fri May 22 16:58:12 2009 Milkyway@home Found app_info.xml; using anonymous platform
Fri May 22 16:58:12 2009 Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU X5482 @ 3.20GHz [x86 Family 6 Model 23 Stepping 6]
Fri May 22 16:58:12 2009 Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM SSE3 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1
Fri May 22 16:58:12 2009 OS: Darwin: 9.7.0
Fri May 22 16:58:12 2009 Memory: 16.00 GB physical, 235.23 GB virtual
Fri May 22 16:58:12 2009 Disk: 453.32 GB total, 234.98 GB free
Fri May 22 16:58:12 2009 Local time is UTC -7 hours
Fri May 22 16:58:12 2009 Can't load library libcudart
Fri May 22 16:58:12 2009 No coprocessors
Fri May 22 16:58:13 2009 Not using a proxy

.... bunch of project lines here

Fri May 22 16:58:14 2009 World Community Grid Host location: none
Fri May 22 16:58:14 2009 World Community Grid General prefs: using your defaults
Fri May 22 16:58:14 2009 Preferences limit memory usage when active to 13107.20MB
Fri May 22 16:58:14 2009 Preferences limit memory usage when idle to 16056.32MB
Fri May 22 16:58:14 2009 Preferences limit disk usage to 200.00GB

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10104 - Posted: 23 May 2009 | 20:04:05 UTC - in response to Message 10100.

No, I haven't changed anything else, just upgraded the BOINC client. I did notice, however, that after 4 tasks had failed due to a 'computation error', new work hadn't been requested or given and it had been sitting idle. I have the same card and the same drivers. I even reinstalled the drivers from NVidia's website (no CD for this computer) but no change. As requested, I'm pasting the error messages below.

5/23/2009 12:59:42 PM Starting BOINC client version 6.6.28 for windows_intelx86
5/23/2009 12:59:42 PM log flags: task, file_xfer, sched_ops
5/23/2009 12:59:42 PM Libraries: libcurl/7.19.4 OpenSSL/0.9.8j zlib/1.2.3
5/23/2009 12:59:42 PM Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
5/23/2009 12:59:42 PM Running under account anthonmg
5/23/2009 12:59:42 PM Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU E5440 @ 2.83GHz [x86 Family 6 Model 23 Stepping 6]
5/23/2009 12:59:42 PM Processor features: fpu tsc pae nx sse sse2 mmx
5/23/2009 12:59:42 PM OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
5/23/2009 12:59:42 PM Memory: 3.25 GB physical, 5.09 GB virtual
5/23/2009 12:59:42 PM Disk: 465.74 GB total, 394.19 GB free
5/23/2009 12:59:42 PM Local time is UTC -7 hours
5/23/2009 12:59:42 PM No CUDA devices found
5/23/2009 12:59:42 PM No coprocessors
5/23/2009 12:59:42 PM Not using a proxy
5/23/2009 12:59:42 PM Docking@Home URL: http://docking.cis.udel.edu/; Computer ID: 26200; location: (none); project prefs: default
5/23/2009 12:59:42 PM GPUGRID URL: http://www.gpugrid.net/; Computer ID: 32550; location: (none); project prefs: default
5/23/2009 12:59:42 PM GPUGRID General prefs: from GPUGRID (last modified 23-May-2009 00:54:35)
5/23/2009 12:59:42 PM GPUGRID Host location: none
5/23/2009 12:59:42 PM GPUGRID General prefs: using your defaults
5/23/2009 12:59:42 PM Preferences limit memory usage when active to 1663.63MB
5/23/2009 12:59:42 PM Preferences limit memory usage when idle to 2994.54MB
5/23/2009 12:59:42 PM Preferences limit disk usage to 100.00GB
5/23/2009 1:00:03 PM GPUGRID update requested by user
5/23/2009 1:00:07 PM GPUGRID Sending scheduler request: Requested by user.
5/23/2009 1:00:07 PM GPUGRID Requesting new tasks
5/23/2009 1:00:12 PM GPUGRID Scheduler request completed: got 0 new tasks
5/23/2009 1:00:12 PM GPUGRID Message from server: No work sent
5/23/2009 1:00:12 PM GPUGRID Message from server: Can't use CUDA app for Full-atom molecular dynamics: Your computer has no CUDA device
5/23/2009 1:00:12 PM GPUGRID Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
5/23/2009 1:01:47 PM GPUGRID Sending scheduler request: To fetch work.
5/23/2009 1:01:48 PM GPUGRID Requesting new tasks
5/23/2009 1:01:53 PM GPUGRID Scheduler request completed: got 0 new tasks
5/23/2009 1:01:53 PM GPUGRID Message from server: No work sent
5/23/2009 1:01:53 PM GPUGRID Message from server: Can't use CUDA app for Full-atom molecular dynamics: Your computer has no CUDA device
5/23/2009 1:01:53 PM GPUGRID Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
5/23/2009 1:02:28 PM GPUGRID Sending scheduler request: To fetch work.
5/23/2009 1:02:28 PM GPUGRID Requesting new tasks
5/23/2009 1:02:33 PM GPUGRID Scheduler request completed: got 0 new tasks
5/23/2009 1:02:33 PM GPUGRID Message from server: No work sent
5/23/2009 1:02:33 PM GPUGRID Message from server: Can't use CUDA app for Full-atom molecular dynamics: Your computer has no CUDA device
5/23/2009 1:02:33 PM GPUGRID Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.
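
For anyone comparing their own startup messages against the two pastes above, here is a minimal sketch that picks out the lines showing the client found no usable GPU. The marker strings are copied from the logs quoted in this thread; the function name `find_gpu_problems` is made up for illustration, and other BOINC versions may word these messages differently.

```python
# Startup messages quoted in this thread that indicate no usable GPU was found.
# Assumption: other client versions may phrase these differently.
NO_GPU_MARKERS = (
    "No CUDA devices found",
    "No coprocessors",
    "Can't load library libcudart",
)


def find_gpu_problems(log_text):
    """Return the log lines that contain any of the no-GPU markers."""
    hits = []
    for line in log_text.splitlines():
        if any(marker in line for marker in NO_GPU_MARKERS):
            hits.append(line.strip())
    return hits


sample = """\
5/23/2009 12:59:42 PM Local time is UTC -7 hours
5/23/2009 12:59:42 PM No CUDA devices found
5/23/2009 12:59:42 PM No coprocessors
5/23/2009 12:59:42 PM Not using a proxy
"""

for hit in find_gpu_problems(sample):
    print(hit)
```

If this prints anything for a fresh client start, the problem is on the local machine (driver, session type, install mode) rather than on the project server.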

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10106 - Posted: 23 May 2009 | 20:28:33 UTC - in response to Message 10104.

Which driver are you using? In case you don't know and have already deleted the installation file, GPU-Z should tell you.

MrS
____________
Scanning for our furry friends since Jan 2002

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10108 - Posted: 23 May 2009 | 20:46:05 UTC - in response to Message 10100.

Also, I should say that the driver version is 6.14.11.8265. Just a month old.

Stefan Ledwina
Joined: 16 Jul 07
Posts: 464
Credit: 51,279,371
RAC: 0
Message 10109 - Posted: 23 May 2009 | 20:47:51 UTC - in response to Message 10108.

Have you already tried to reinstall the drivers?
Somehow BOINC can't find a CUDA device...

5/23/2009 12:59:42 PM No CUDA devices found
5/23/2009 12:59:42 PM No coprocessors


____________

pixelicious.at - my little photoblog

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10110 - Posted: 23 May 2009 | 20:56:43 UTC - in response to Message 10109.

Um, yes, I can tell from the messages that it can't find the device. When I first joined the project in early April, BOINC was unable to find a CUDA device. I upgraded the drivers from the ones that came with the machine when I got it last September. With that update, 182.46, everything worked fine and I've been crunching away.

Then, lots of timed-out workunits and never-ending workunits. I was told to upgrade BOINC. Since then, no CUDA device found. I just upgraded again to 182.65, which didn't help. I tried reinstalling the original drivers, 182.46, and 182.65, with no success.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10116 - Posted: 24 May 2009 | 0:06:08 UTC
Last modified: 24 May 2009 | 0:07:25 UTC

I don't know why, but it sure does not see the CUDA device. And I am stumped as to what to try next.

Are you running any sort of remote login or remote desktop?

{edit}
Virtualization software, alternative OS hosting?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 10124 - Posted: 24 May 2009 | 10:22:59 UTC - in response to Message 10110.

I was told to upgrade BOINC.


Who was that? Must have been someone who doesn't like you? ;)

No, seriously... if your answer to Paul's question does not include any "yes", I'd revert to 6.6.20 and see if that gets you going again. If not, we'll all scratch our heads.

You could also try 6.5.0. The drivers you list, 182.46 and 182.65, are both beta, as far as I know. 182.50 is the last WHQL driver of the non-185 series. I'd try 182.50 or 185.6x or 185.8x, though the latter 2 seem not to be trouble free yet.

MrS
____________
Scanning for our furry friends since Jan 2002

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10147 - Posted: 24 May 2009 | 23:45:31 UTC - in response to Message 10124.

Hmm, today it started working again on its own. Argh. Ok, well, I will watch and see if I get errors or if it runs smoothly again. Thanks everyone for your input.

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10176 - Posted: 26 May 2009 | 5:14:26 UTC - in response to Message 10147.

Well, it's getting work, but ALL the work units seem to be failing after only a few seconds (<10-20 sec). Are we having more workunit problems?

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1947
Credit: 629,356
RAC: 0
Message 10181 - Posted: 26 May 2009 | 9:03:10 UTC - in response to Message 10176.

You are running in device emulation.
When you install BOINC, you should use the advanced tab and choose to run in unprotected mode.

# Device 0: "Device Emulation (CPU)"

gdf
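
The `# Device 0: ...` line quoted above comes from a task's output, so a failed task can be checked for emulation mode mechanically. The following is a rough sketch only: it assumes every device line follows the same `# Device N: "name"` pattern as the single line quoted here, and `emulated_devices` is a hypothetical helper, not part of BOINC or the GPUGRID application.

```python
import re

# Matches lines shaped like:  # Device 0: "Device Emulation (CPU)"
# Assumption: all device lines in the task output share this layout.
DEVICE_LINE = re.compile(r'#\s*Device\s+(\d+):\s*"([^"]*)"')


def emulated_devices(stderr_text):
    """Return the indices of devices whose reported name mentions emulation."""
    emulated = []
    for match in DEVICE_LINE.finditer(stderr_text):
        index, name = int(match.group(1)), match.group(2)
        if "Device Emulation" in name:
            emulated.append(index)
    return emulated


print(emulated_devices('# Device 0: "Device Emulation (CPU)"'))  # [0]
```

An empty list would suggest the application saw a real device; a non-empty one points to the same emulation problem described in this post.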

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10196 - Posted: 26 May 2009 | 18:37:35 UTC - in response to Message 10181.

Interesting about the emulation mode and that it wasn't a problem before. Can you say where in the advanced tab this setting is? I don't see it under any of the subheadings in the advanced menus. Are you suggesting I find the config file and change it at the level of the text? Is unprotected mode going to affect the stability of the system?

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10200 - Posted: 26 May 2009 | 18:49:01 UTC - in response to Message 10196.

It is the third or fourth screen in during the install. I cannot recall if the repair install allows you to change this setting or not.

Uninstall only removes BOINC; it should not change your data or project settings unless you whack the directories yourself. You can also do a down-level and then an up-level of the version ... but you have to catch the screen as it goes by and make sure that the setting is unchecked.

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10201 - Posted: 26 May 2009 | 19:34:18 UTC - in response to Message 10200.

The repair utility doesn't give you access there. I uninstalled and reinstalled. The option to run in protected mode (an unprivileged account) was already unchecked, so it seems like that's not the issue.

Also, I'm still getting the "Project has no work" error when I try to update and fetch new work.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10203 - Posted: 26 May 2009 | 20:42:15 UTC - in response to Message 10201.

Ok, there is still the issue where the device emulation comes up. Are you running remote monitoring software, a remote desktop, VM emulation, anything like that?

If the video device is "virtualized" it cannot be used for work. Many of these software systems make the video card a "virtual" device that can be shared. Sadly, that means that it cannot be used by BOINC.

Once you have errored out a certain number of tasks, you cannot get any more until 24 hours have passed. This is to prevent you from "trashing" all the available tasks with a bad system set-up. When you get another task and return it safely, then you will be able to get another and another ... until you are back in good graces and can get the maximum.

But, if the device is "broken" we have to fix that first...
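
To make the "good graces" idea concrete, here is a toy model of a per-host daily task allowance. It is only an illustration of the behaviour described above: the specific rules (lose one slot per error, double on each good result) and the cap of 8 are assumptions chosen for the example, not the project's confirmed settings, and `DailyQuota` is a made-up name.

```python
class DailyQuota:
    """Toy model of a per-host daily task allowance (numbers are illustrative)."""

    def __init__(self, maximum=8):
        self.maximum = maximum  # assumed project-wide cap per day
        self.quota = maximum    # current allowance for this host

    def report_error(self):
        # Assumption: each failed task lowers the allowance, never below 1.
        self.quota = max(1, self.quota - 1)

    def report_success(self):
        # Assumption: each good task doubles the allowance, up to the cap.
        self.quota = min(self.maximum, self.quota * 2)


host = DailyQuota(maximum=8)
for _ in range(10):  # a run of failures, like the ones in this thread
    host.report_error()
print(host.quota)

history = []
for _ in range(4):   # recovery: one good result at a time
    host.report_success()
    history.append(host.quota)
print(history)
```

The point of the sketch is the shape of the recovery: a host that keeps failing is throttled to a trickle, and a few clean results bring it back to the maximum quickly.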

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10216 - Posted: 26 May 2009 | 23:47:43 UTC - in response to Message 10203.

Ok, the server just let me have a workunit. They take 17-22 hours each (despite the large number of shaders on my card) so it'll be a while before I'm 'redeemed' but so far it's running smoothly.

Thanks for the comments about virtualization. I didn't realize that was a problem. I had to be away from the machine with the good graphics card and was doing some work on it using Remote Desktop Connection. Looks like there's no way to 'unvirtualize' the machine until I've logged back in to it while sitting in front of it. Strange that it only came up now after having run Remote Desktop successfully since I signed up for GPUGRID. Well, at least it's running, even though it's likely my RAC won't recover from this for a long time to come.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10219 - Posted: 27 May 2009 | 1:14:20 UTC

Cool ...

Ok, if you need remote access you CAN use one of the many varieties of VNC to get there from here. Almost all versions of MS Remote Desktop, and some other remoting software, will virtualize the GPU, and once that is done it is, as you noted, not easily undone.
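
If you want a quick way to tell whether the session you are logged into is a Remote Desktop one, a small heuristic can help. This sketch relies on the `SESSIONNAME` environment variable, which Windows commonly sets to `Console` for a local login and to a name beginning with `RDP-` for a Remote Desktop login. Treat that as an assumption and the result as a hint only: the variable can be missing (for example under a service), and the function name `looks_like_rdp_session` is made up for this example.

```python
import os


def looks_like_rdp_session(environ=None):
    """Guess whether the current Windows session is a Remote Desktop session.

    Heuristic only: checks whether SESSIONNAME starts with "RDP-".
    A missing variable is treated as "not RDP".
    """
    env = os.environ if environ is None else environ
    return env.get("SESSIONNAME", "").upper().startswith("RDP-")


print(looks_like_rdp_session({"SESSIONNAME": "RDP-Tcp#0"}))  # True
print(looks_like_rdp_session({"SESSIONNAME": "Console"}))    # False
```

Called with no argument, the function reads the real environment of the process it runs in, so it only says something about the session that launched it, not about sessions opened earlier.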

anthonmg
Joined: 11 Apr 09
Posts: 17
Credit: 11,086,149
RAC: 0
Message 10308 - Posted: 29 May 2009 | 18:00:46 UTC - in response to Message 10219.

Thanks for the note about VNC. I'd have to install it across too many different computers just to be able to support this one card on this one project. Unfortunately, the time and resources required to continue to participate are getting a little high. Remote Desktop didn't cause problems for the first few months I was on the project, so it's weird to see it happening now. Maybe earlier BOINC versions just paused it or something, I don't know. At this point I think it's best to detach from the project.

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 10337 - Posted: 30 May 2009 | 22:46:39 UTC - in response to Message 10308.

ugh, well, we will miss you ... we always do ... :)

Come back when it is easier ...

Oh, and GDF indicated that they might be doing CPU work later ... (will that eventually trigger a name change to "CPU and GPU CUDA and OpenCL with ATI cards and Larrabee Grid.net" when they finish adding capabilities ???)

:)

Sorry guys, only sense of humor I got ...

Anyway, come back when you can ...
