Message boards : Number crunching : GTX 770 won't get work
Author | Message |
---|---|
I don't know what's wrong, but this system hasn't been able to get work for weeks, although the server indicates that tasks are available, and I have completed a lot of tasks on this same machine before: | |
ID: 47212 | Rating: 0 | rate: / Reply Quote | |
Could you please tell us which OS this is, which graphic card (including driver version) and which crunching (acemd ...) software. | |
ID: 47215 | Rating: 0 | rate: / Reply Quote | |
Could you please tell us which OS this is, which graphic card (including driver version) and which crunching (acemd ...) software.
Ubuntu Linux 16.04 LTS x64, kernel: 4.4.0-75-generic
BOINC version: 7.6.31
CPU: GenuineIntel Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz [Family 6 Model 30 Stepping 5] (8 processors)
GPU: NVIDIA GeForce GTX 770 (1998MB)
GPU driver: 375.39
Credits generated on this machine: 73,260,609
When did the machine stop crunching? On April 14?
Unfortunately, I can't give the exact date, but it has been a problem for several weeks (I can no longer find the last successfully processed task in the GPUGRID database under my user account).
Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47222 | Rating: 0 | rate: / Reply Quote | |
I have the very same issue (no longer getting any work), only with a different GPU. Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz [Family 6 Model 94 Stepping 3] with (8 processors) NVIDIA GeForce GTX 950M (2048MB) driver: 369.9 http://gpugrid.net/show_host_detail.php?hostid=421103 For about two weeks I was getting WUs just fine. Then suddenly, out of the blue (I made no changes) I was no longer getting ANY tasks. I have tried about everything I can: I set the computing preferences store at least __ days and store up to an additional ___ days each to various larger amounts (e.g., 2, 3, 4, 6.6 etc, etc) I have reset the project (several times) I have even 'Removed' the project and then added it again (I did this twice). ALL to no avail. I have sort of given up and have been using my GPU for another project. However, I would prefer to be able to run GPUgrid tasks. (Since this thread is about 770s, I think I need to post to a new thread.) Thanks, LP ____________ Essential biomedical science: At fertilization, a new and unique member of the species homo sapiens is formed. Abortion wounds the Mother, and kills a very tiny baby girl or baby boy. Life! Les P., PhD Prof. Engr. | |
ID: 47242 | Rating: 0 | rate: / Reply Quote | |
Update the driver and you should get the new app and WU's | |
ID: 47248 | Rating: 0 | rate: / Reply Quote | |
Update the driver and you should get the new app and WU's
In my case, the driver is certainly not the problem: it is the same driver that I use for my GTX 970, and that machine receives one WU after the other without any issues. Moreover, this proprietary NVIDIA driver (375.39) is the latest you get with an Ubuntu console update. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47258 | Rating: 0 | rate: / Reply Quote | |
The driver is not the problem in my case either. | |
ID: 47263 | Rating: 0 | rate: / Reply Quote | |
Today I updated to Linux kernel 4.4.0-78-generic keeping NVIDIA driver 375.39. Still no tasks for my GTX 770 even after resetting GPUGRID. | |
ID: 47285 | Rating: 0 | rate: / Reply Quote | |
Today I updated to Linux kernel 4.4.0-78-generic keeping NVIDIA driver 375.39. Still no tasks for my GTX 770 even after resetting GPUGRID. I know little about Linux drivers, except that they must be matched to the Linux version, and work for me when I get them from the Ubuntu software center. And even if the drivers are apparently installed properly, they must implement CUDA properly to work here. Given that the GTX 770 is now an older card and you are now using the 375.39 drivers, it is much more likely that it is a Linux problem rather than a GPUGrid problem. | |
ID: 47293 | Rating: 0 | rate: / Reply Quote | |
Given that the GTX 770 is now an older card and you are now using the 375.39 drivers, it is much more likely that it is a Linux problem rather than a GPUGrid problem.
No, because otherwise the other machine (GTX 970) with the same Linux kernel would not receive tasks either. But it does, on a daily basis. I know for sure that the GTX 970 box uses CUDA 8. I am not sure whether the GTX 770 does too (originally that machine was set up using CUDA 7), but I assume(d) that the auto-update (apt update & apt upgrade) also updates CUDA, as it updates the NVIDIA drivers. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47294 | Rating: 0 | rate: / Reply Quote | |
You seem to be assuming that a Kepler card runs CUDA, or even has it available in those drivers, the same way that a Maxwell card does, just because the version numbers of Linux and the drivers are comparable for the cards. I would not assume that. | |
ID: 47295 | Rating: 0 | rate: / Reply Quote | |
Please restart your PC, and check the first lines of the event log of BOINC manager for the GPU report. 2017. 05. 17. 3:44:59 CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 382.05, CUDA version 8.0, compute capability 6.1, 4096MB, 3557MB available, 9654 GFLOPS peak)
2017. 05. 17. 3:44:59 OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 382.05, device version OpenCL 1.2 CUDA, 8192MB, 3557MB available, 9654 GFLOPS peak) Could you please post yours? | |
ID: 47296 | Rating: 0 | rate: / Reply Quote | |
Please restart your PC, and check the first lines of the event log of BOINC manager for the GPU report. Here it is: Mo 22 Mai 2017 11:06:31 CEST | | CUDA: NVIDIA GPU 0: GeForce GTX 770 (driver version 375.39, CUDA version 8.0, compute capability 3.0, 1999MB, 1948MB available, 3693 GFLOPS peak) Mo 22 Mai 2017 11:06:31 CEST | | OpenCL: NVIDIA GPU 0: GeForce GTX 770 (driver version 375.39, device version OpenCL 1.2 CUDA, 1999MB, 1948MB available, 3693 GFLOPS peak) Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47306 | Rating: 0 | rate: / Reply Quote | |
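For anyone comparing several hosts, the GPU report line can be picked apart mechanically. The following is a minimal sketch, assuming the event-log line has exactly the shape of the two lines posted above; the pattern and the function name `parse_gpu_line` are illustrative and not part of BOINC.

```python
import re

# Pattern modelled on the "CUDA:" event-log lines quoted in this thread.
# It is an assumption that other BOINC versions print the same layout.
GPU_LINE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>[\d.]+), "
)


def parse_gpu_line(line):
    """Return the GPU facts from one event-log line as strings, or None if no match."""
    match = GPU_LINE.search(line)
    return match.groupdict() if match else None


line = (
    "Mo 22 Mai 2017 11:06:31 CEST | | CUDA: NVIDIA GPU 0: GeForce GTX 770 "
    "(driver version 375.39, CUDA version 8.0, compute capability 3.0, "
    "1999MB, 1948MB available, 3693 GFLOPS peak)"
)
info = parse_gpu_line(line)
print(info["name"], info["driver"], info["cuda"], info["cc"])
```

The three values that matter for this thread are the driver version, the CUDA version the driver reports, and the compute capability of the card.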
You have a CC3 card and need an earlier driver before CUDA 8.0 and you will get Cuda 6.5 app. | |
ID: 47307 | Rating: 0 | rate: / Reply Quote | |
You have a CC3 card and need an earlier driver before CUDA 8.0 and you will get Cuda 6.5 app.
According to the applications page, there's no CUDA6.5 client for Linux, so this won't work under Linux. | |
ID: 47308 | Rating: 0 | rate: / Reply Quote | |
You should check the venue of your host #342877. Then check, under "GPUGrid settings" in your profile, that this venue has "Use NVidia GPU" selected, and that "Run only the selected applications" has "ACEMD long runs (8-12 hours on fastest GPU)" selected.
If there is no cc_config.xml in the BOINC data directory yet, create one with the following content:
<cc_config>
<log_flags>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
</cc_config>
If there's already a cc_config.xml, copy the following after its first line:
<log_flags>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
Then click settings -> re-read configuration files, and update the GPUGrid project. Then post the messages that appear in the event log after the line: GPUGRID | update requested by user | |
ID: 47309 | Rating: 0 | rate: / Reply Quote | |
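The cc_config.xml change described in the post above can also be made with a small script instead of by hand. This is only a sketch: the function name `enable_work_fetch_debug` is made up for illustration, it works on the file content as text, and reading and writing the actual cc_config.xml in the BOINC data directory is left to the caller. It assumes the flag is switched on with the value 1.

```python
import xml.etree.ElementTree as ET


def enable_work_fetch_debug(xml_text=None):
    """Return cc_config XML text with <work_fetch_debug>1</work_fetch_debug> set.

    xml_text is the current content of cc_config.xml, or None if the file
    does not exist yet. Other options already in the file are kept.
    """
    root = ET.fromstring(xml_text) if xml_text else ET.Element("cc_config")
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    flag = log_flags.find("work_fetch_debug")
    if flag is None:
        flag = ET.SubElement(log_flags, "work_fetch_debug")
    flag.text = "1"
    return ET.tostring(root, encoding="unicode")


# Case 1: a config file already exists (hypothetical content).
existing = "<cc_config><log_flags><task>1</task></log_flags></cc_config>"
merged = enable_work_fetch_debug(existing)
print(merged)

# Case 2: no config file yet.
fresh = enable_work_fetch_debug()
print(fresh)
```

After writing the result back, the client still has to re-read its configuration files, as described in the post.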
As said above, all the settings are correct as that machine picked tasks on a daily basis. Nothing was changed on the website's end. | |
ID: 47310 | Rating: 0 | rate: / Reply Quote | |
Also, I never used an .xml configuration files for GPUGRID.
Exactly. With cc_config.xml you can set options that are not accessible through the GUI of the BOINC manager (this parameter does not change the settings of GPUGrid, or of any project). See the BOINC client configuration wiki for details. See this post about how to read the elaborate info provided by work_fetch_debug.
Something has changed at GPUGRID's end such that my card does not receive work anymore.
The previous GPUGrid client was CUDA6.5; your host has processed 126 (short runs) + 249 (long runs) of them. The CUDA6.5 client has been deprecated, and there's only a CUDA8.0 client available for Linux. I think that you wouldn't receive CUDA6.5 tasks either.
It appears to me that the auto-downloaded CUDA 8 app just won't work under Linux when shader model 3 is in use.
The single CUDA8.0 task your host has received finished successfully, so the CUDA 8 app is working on your host. We need to find the reason why your host does not ask for, or is not provided with, new tasks. BTW, how many GPU projects is your host attached to? | |
ID: 47311 | Rating: 0 | rate: / Reply Quote | |
You should check the venue of your host #342877. Then you should check the "GPUGrid settings" in your profile that the venue of your host #342877 should have the "Use NVidia GPU" selected, also the "Run only the selected applications" should have the "ACEMD long runs (8-12 hours on fastest GPU)" selected.
Here are the messages:
Di 23 Mai 2017 10:27:18 CEST | | Re-reading cc_config.xml
Di 23 Mai 2017 10:27:18 CEST | | Config: GUI RPCs allowed from:
Di 23 Mai 2017 10:27:18 CEST | | log flags: file_xfer, sched_ops, task, work_fetch_debug
Di 23 Mai 2017 10:27:18 CEST | | [work_fetch] Request work fetch: Core client configuration
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:27:22 CEST | GPUGRID | [work_fetch] REC 6457.612 prio -1.000 can request work
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] shortfall 299527.82 nidle 0.00 saturated 24073.06 busy 0.00
Di 23 Mai 2017 10:27:22 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] shortfall 84909.97 nidle 0.00 saturated 1490.03 busy 0.00
Di 23 Mai 2017 10:27:22 CEST | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 693.27, inc 600.00)
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:27:22 CEST | | [work_fetch] No project chosen for work fetch
Di 23 Mai 2017 10:27:26 CEST | GPUGRID | update requested by user
Di 23 Mai 2017 10:27:26 CEST | | [work_fetch] Request work fetch: project updated by user
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] REC 6457.612 prio -1.000 can request work
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] shortfall 299556.98 nidle 0.00 saturated 24061.56 busy 0.00
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] shortfall 84914.93 nidle 0.00 saturated 1485.07 busy 0.00
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] share 1.000
Di 23 Mai 2017 10:27:27 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 0.00 fetch share 1.00 req_inst 1.00 req_secs 84914.93
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (84914.93 sec, 1.00 inst)
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | Sending scheduler request: Requested by user.
Di 23 Mai 2017 10:27:27 CEST | GPUGRID | Requesting new tasks for NVIDIA GPU
Di 23 Mai 2017 10:27:29 CEST | GPUGRID | Scheduler request completed: got 0 new tasks
Di 23 Mai 2017 10:27:29 CEST | GPUGRID | No tasks sent
Di 23 Mai 2017 10:27:29 CEST | | [work_fetch] Request work fetch: RPC complete
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:27:34 CEST | GPUGRID | [work_fetch] REC 6457.612 prio 0.000 can't request work: scheduler RPC backoff (25.93 sec)
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] shortfall 299601.76 nidle 0.00 saturated 24047.75 busy 0.00
Di 23 Mai 2017 10:27:34 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] shortfall 84921.89 nidle 0.00 saturated 1478.11 busy 0.00
Di 23 Mai 2017 10:27:34 CEST | GPUGRID | [work_fetch] share 0.000
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:27:34 CEST | | [work_fetch] No project chosen for work fetch
Di 23 Mai 2017 10:27:45 CEST | | Contacting account manager at https://bam.boincstats.com/
Di 23 Mai 2017 10:27:47 CEST | | Account manager: BAM! User: 3739, Michael H.W. Weber
Di 23 Mai 2017 10:27:47 CEST | | Account manager: BAM! Host: 653689
Di 23 Mai 2017 10:27:47 CEST | | Account manager: Number of BAM! connections for this host: 7467
Di 23 Mai 2017 10:27:47 CEST | | Account manager contact succeeded
Di 23 Mai 2017 10:28:00 CEST | | [work_fetch] Request work fetch: Backoff ended for GPUGRID
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] REC 6457.612 prio -1.000 can request work
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] shortfall 299801.40 nidle 0.00 saturated 23987.94 busy 0.00
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] shortfall 84953.52 nidle 0.00 saturated 1446.48 busy 0.00
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] share 1.000
Di 23 Mai 2017 10:28:04 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 0.00 fetch share 1.00 req_inst 1.00 req_secs 84953.52
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (84953.52 sec, 1.00 inst)
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | Sending scheduler request: To fetch work.
Di 23 Mai 2017 10:28:04 CEST | GPUGRID | Requesting new tasks for NVIDIA GPU
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | Scheduler request completed: got 0 new tasks
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | No tasks sent
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | [work_fetch] backing off NVIDIA GPU 740 sec
Di 23 Mai 2017 10:28:05 CEST | | [work_fetch] Request work fetch: RPC complete
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:28:10 CEST | GPUGRID | [work_fetch] REC 6457.612 prio 0.000 can't request work: scheduler RPC backoff (25.92 sec)
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] shortfall 299843.54 nidle 0.00 saturated 23976.43 busy 0.00
Di 23 Mai 2017 10:28:10 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] shortfall 84959.51 nidle 0.00 saturated 1440.49 busy 0.00
Di 23 Mai 2017 10:28:10 CEST | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 734.50, inc 600.00)
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:28:10 CEST | | [work_fetch] No project chosen for work fetch
Di 23 Mai 2017 10:28:37 CEST | | [work_fetch] Request work fetch: Backoff ended for GPUGRID
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] ------- start work fetch state -------
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] target work buffer: 86400.00 + 0.00 sec
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] --- project states ---
Di 23 Mai 2017 10:28:41 CEST | GPUGRID | [work_fetch] REC 6457.298 prio -1.000 can request work
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] --- state for CPU ---
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] shortfall 300039.40 nidle 0.00 saturated 23916.62 busy 0.00
Di 23 Mai 2017 10:28:41 CEST | GPUGRID | [work_fetch] share 0.000 account manager prefs
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] --- state for NVIDIA GPU ---
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] shortfall 84989.53 nidle 0.00 saturated 1410.47 busy 0.00
Di 23 Mai 2017 10:28:41 CEST | GPUGRID | [work_fetch] share 0.000 project is backed off (resource backoff: 704.19, inc 600.00)
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] ------- end work fetch state -------
Di 23 Mai 2017 10:28:41 CEST | | [work_fetch] No project chosen for work fetch
Remember: All GPUGRID projects are chosen for this machine. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47313 | Rating: 0 | rate: / Reply Quote | |
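A work_fetch_debug log like the one above is long, but the two facts that matter here (how many tasks each scheduler request returned, and which resource was backed off for how long) can be pulled out with two regular expressions. A minimal sketch, assuming the message wording stays as in the posted log; `summarize` is a hypothetical helper, not a BOINC tool.

```python
import re

# Both patterns are taken from the wording of the log lines posted above.
TASKS = re.compile(r"Scheduler request completed: got (\d+) new tasks")
BACKOFF = re.compile(r"backing off (.+?) (\d+) sec")


def summarize(log_text):
    """Count scheduler replies and collect resource backoffs from event-log text."""
    got = [int(n) for n in TASKS.findall(log_text)]
    backoffs = [(resource, int(sec)) for resource, sec in BACKOFF.findall(log_text)]
    return {"requests": len(got), "tasks_received": sum(got), "backoffs": backoffs}


# Three lines copied from the log above.
sample = """\
Di 23 Mai 2017 10:27:29 CEST | GPUGRID | Scheduler request completed: got 0 new tasks
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | Scheduler request completed: got 0 new tasks
Di 23 Mai 2017 10:28:05 CEST | GPUGRID | [work_fetch] backing off NVIDIA GPU 740 sec
"""
summary = summarize(sample)
print(summary)
```

In the posted log the client does ask for GPU work (about 84,900 seconds for one instance) and the server answers with zero tasks both times, which points at the server side of the exchange rather than at the client refusing to ask.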
Thank you for posting the details, but I'm none the wiser.
Remember: All GPUGRID projects are chosen for this machine.
That's ok. Do you have "Use NVidia GPU" and "Use Graphics Processing Unit (GPU) if available" selected in the GPUGrid preferences? Do you have at least 8 GB of free disk space on the partition where the BOINC data directory resides? How many other GPU projects is this host attached to? You could try increasing the work buffer (it is set to 1 day now) for testing. If nothing works, try the following:
1. detach this host from all projects
2. uninstall BOINC manager
3. restart your host
4. install BOINC manager
5. attach this host only to GPUGrid, and test it
6. attach this host to the other projects one by one, only one per day. | |
ID: 47318 | Rating: 0 | rate: / Reply Quote | |
Do you have the "Use NVidia GPU" and the "Use Graphics Processing Unit (GPU) if available" selected in GPUGrid preferences? Yes. Do you have at least 8 GB disk space in the partition the BOINC data directory resides? Will check, but would quite certainly say yes. How many other GPU project this host is attached to? Just Primegrid and GPUGRID but even if I suspend Primegrid, the machine won't fetch work for GPUGRID. You could try to increase the work buffer (it is set to 1 day now) for testing. I did, but it won't change the situation even if I set the work buffer to 10/10 days. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47321 | Rating: 0 | rate: / Reply Quote | |
Getting new work is a two-part collaboration between your computer and the project server. | |
ID: 47322 | Rating: 0 | rate: / Reply Quote | |
After that, it's a question of verifying that your GPU's 'compute capability' and graphics driver together match the current minimum project requirements.
It's been done. Moreover, this host has already successfully completed a single CUDA8.0 task, but the project has sent no more. I experienced the same behavior once when I was trying my GTX 1080 under Linux. I thought then that I'd messed something up while trying to make SWAN_SYNC work under Linux (well, I couldn't). That host stopped receiving work, even though GPUGrid was the only project on it all the time. Then I installed Windows 10 on the same hardware, and it received work, and it hasn't stopped receiving work since I set SWAN_SYNC on. | |
ID: 47323 | Rating: 0 | rate: / Reply Quote | |
One more to check - are you allowing 'ACEMD long runs' (project preferences)? As said above: Yes, ALL GPUGRID subprojects are allowed for this machine. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47324 | Rating: 0 | rate: / Reply Quote | |
I would detach from everything and reattach only to GPUGrid as Retvari suggests. If that doesn't work, you have found the one hardware/software configuration that just doesn't obey the rules as we know them. It happens as you know. | |
ID: 47325 | Rating: 0 | rate: / Reply Quote | |
I would detach from everything and reattach only to GPUGrid as Retvari suggests.
I detached from GPUGRID, rebooted the system and re-attached to GPUGRID. No improvement.
If that doesn't work, you have found the one hardware/software configuration that just doesn't obey the rules as we know them. It happens as you know.
No, I do not know or accept that. This is science, not homeopathy (although homeopathy at least offers the placebo effect (for those of us who are believers), which can't in principle be ruled out as something scientifically accessible, too - although we still have not found a clue how that might be possible). Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47326 | Rating: 0 | rate: / Reply Quote | |
I am not sure you followed the instructions. Homeopathy is your idea, not mine. | |
ID: 47327 | Rating: 0 | rate: / Reply Quote | |
It's frustrating that the server doesn't give more details in its reply. | |
ID: 47328 | Rating: 0 | rate: / Reply Quote | |
I would detach from everything and reattach only to GPUGrid as Retvari suggests.
Are you sure it's not your work buffer or some other config in BOINC? | |
ID: 47329 | Rating: 0 | rate: / Reply Quote | |
Are you sure its not your work buffer or some other config in BOINC Yes, I am sure about that. This machine just did not receive any work anymore from one day to the other without me having altered any of the BOINC or project settings. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47330 | Rating: 0 | rate: / Reply Quote | |
I have two 750ti's on different machines at different physical locations using the same BOINC and GPUGrid settings/prefs. I noticed one was getting GPUGrid work and the other wasn't. After a couple of days of not getting work I began to investigate. I found this thread, did some reading, and noticed the driver difference between the two. I updated the driver to the newest (382.33) and got work. | |
ID: 47332 | Rating: 0 | rate: / Reply Quote | |
Moreover this host has already successfully completed a single CUDA8.0 task...
How did you actually find out about that? I couldn't see any of the completed WUs in my client's history even before I started this thread. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47333 | Rating: 0 | rate: / Reply Quote | |
First post has a link to a host. Then on that page, you can click Application Details, to see application details for that host. | |
ID: 47334 | Rating: 0 | rate: / Reply Quote | |
First post has a link to a host. Then on that page, you can click Application Details, to see application details for that host. Indeed. Never checked that link. One question though: Why aren't all the tasks completed using CUDA 6.5 indicated as valid (although they all were valid)? Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47335 | Rating: 0 | rate: / Reply Quote | |
First post has a link to a host. Then on that page, you can click Application Details, to see application details for that host. Your assumption, that they were all valid, seems invalid :) From my experience, if a task is suspended+resumed, or stopped+resumed, then it has a chance of being invalid, even if you watched it complete without error. Something in the validator must not like the output, sometimes, when those scenarios happen. Getting back on topic, I'm sure that GPUGrid changed their logic to decide when to give hosts work, and I'm fairly certain that "driver version detected" has a hand in that criteria. I wonder if they screwed something up for the app version criteria, for the 700-series-GPUs on linux? Also, can you see if you can upgrade your driver (I looked briefly and there might be a minor update available to you). | |
ID: 47336 | Rating: 0 | rate: / Reply Quote | |
Your assumption, that they were all valid, seems invalid :)
Not really. They generated at least 73 million credits, so a few should have been OK. :) The point is that not a single valid task is listed (and no invalid ones, either).
Getting back on topic, I'm sure that GPUGrid changed their logic to decide when to give hosts work, and I'm fairly certain that "driver version detected" has a hand in that criteria. I wonder if they screwed something up for the app version criteria, for the 700-series-GPUs on linux?
Two things: (1) The NVIDIA proprietary driver is updated from time to time by Ubuntu's auto-update. I actually do not like to change this manually, as everything except GPUGRID works perfectly. (2) This GTX 770 machine uses the same driver as my GTX 970 machine. The latter receives tasks on a daily basis, the former does not. So I don't really see why the current driver should be the problem, especially since, as stated above, even the GTX 770 completed a CUDA 8 WU successfully. But why should I care? It is not my project, and the GTX 770 now contributes to some other project until the GPUGRID team decides to do something in order to keep or increase their number of contributors. I find it really kind of strange that - if I got it correctly - so far this topic has exclusively been discussed by volunteers? Thank you guys, I think you did your best. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47338 | Rating: 0 | rate: / Reply Quote | |
The point is that not a single valid task is listed (and none invalid, too). Don't worry about that. Task data is kept in a short-term 'transactional' database, and purged (to save space and processing time) when no longer needed - usually after 10 days or so. The important scientific data is transferred to a long-term scientific database and kept indefinitely. But from the same 'application details' link for your machine, we can see for cuda65 (long tasks): Number of tasks completed 249 Max tasks per day 1 Number of tasks today 0 Consecutive valid tasks 0 'Max tasks per day' and 'consecutive valid tasks' together imply that your machine produced a considerable number of invalid tasks at some point: no shame in that, we all did the same thing when the cuda65 licence expired, but it shows the sort of inferences you can draw. Two things: I'm not a Linux user, but I have read comments that Linux GPU drivers tend to be compiled against a specific Linux kernel. If your kernel also auto-updates, you may need to take precautions to ensure that your kernel and driver updates are kept in sync. | |
ID: 47339 | Rating: 0 | rate: / Reply Quote | |
But from the same 'application details' link for your machine, we can see for cuda65 (long tasks):
Hm, I do not understand how that conclusion can be drawn. The number of tasks is limited to two per day by the GPUGRID server anyway. The GTX 770 mostly got long runs, so it can rarely complete more than one per day. Moreover, I checked the machine and its output virtually every day during the whole of 2016 and early 2017. I rarely saw an invalid task, and when it happened, I caused it by accidentally updating the system, including the NVIDIA drivers, during full DC operation. I must confess, though, that around the time when GPUGRID stopped sending tasks to my system, I had not checked regularly for probably a few weeks. IF there had been many, many consecutive errors at the transition from CUDA 6.5 to 8.0, wouldn't it be possible that some information flag is stored somewhere locally on my machine (or on the GPUGRID server) that causes my system to be marked as permanently unreliable? And that this flag somehow has not yet been removed and now causes WUs not to be sent? Hm, probably also not the case as it completed a CUDA 8.0 task...
See, that is exactly why I am hesitant to manually install a more recent NVIDIA driver: When you use the console to update the whole system, everything is coordinately (!) brought to the most recent state. A new kernel plus the corresponding and tested GPU driver will be delivered. For now, I will just wait and see whether GPUGRID will again retrieve WUs for my GTX 770 after a future system update with even more recent drivers than the ones I currently have in use. Until then, other DC projects will be supported. Michael. ____________ President of Rechenkraft.net - Germany's first and largest distributed computing organization. | |
ID: 47342 | Rating: 0 | rate: / Reply Quote | |
IF there had been many, many consecutive errors at the transition from CUDA 6.5 to 8.0, wouldn't it be possible that some information flag is stored somewhere locally on my machine (or on the GPUGRID server) that causes my system being marked as permanently unreliable? And that this flag somehow has not yet been removed and now causes WUs not to be sent? Hm, probably also not the case as it completed a CUDA 8.0 task...
This came to my mind too. Perhaps you should try to force the BOINC manager to request a new host ID for your host. You can do it by stopping the BOINC manager, editing client_state.xml, searching for <hostid>342877</hostid>, replacing the number with that of a previous host of yours (or with a random number, if you don't have an older host), saving client_state.xml, and restarting the BOINC manager. | |
ID: 47345 | Rating: 0 | rate: / Reply Quote | |
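The edit to client_state.xml described above is a plain text substitution, so it can be sketched in a few lines. This is an illustration only: `replace_hostid` is a made-up helper, the XML fragment and the new ID 123456 are invented sample values, and the real file should only be touched while BOINC is stopped and after making a backup copy.

```python
def replace_hostid(state_text, old_id, new_id):
    """Swap one specific <hostid> value in client_state.xml text.

    Returns the new text and the number of replacements made, so the
    caller can check that exactly the expected entry was changed.
    """
    old_tag = f"<hostid>{old_id}</hostid>"
    new_tag = f"<hostid>{new_id}</hostid>"
    count = state_text.count(old_tag)
    return state_text.replace(old_tag, new_tag), count


# Hypothetical fragment; a real client_state.xml is much larger.
fragment = (
    "<project>\n"
    "    <project_name>GPUGRID</project_name>\n"
    "    <hostid>342877</hostid>\n"
    "</project>\n"
)
new_text, changed = replace_hostid(fragment, 342877, 123456)
print(changed)
print(new_text)
```

Matching the full tag with the old number, rather than any `<hostid>` element, keeps the change limited to the entry named in the post, since a client attached to several projects stores one host ID per project.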
Thanks for this suggestion. | |
ID: 47346 | Rating: 0 | rate: / Reply Quote | |
The information sent out to our clients (on Windows, at least) suggests that even the current cuda80 applications require 512 MB of GPU RAM - which sounds too small, and is probably a left-over from earlier, simpler days. | |
ID: 47347 | Rating: 0 | rate: / Reply Quote | |
Ok, so today Ubuntu has released an NVIDIA driver update to version 375.66. | |
ID: 47351 | Rating: 0 | rate: / Reply Quote | |
I think the bottom line is: Gpugrid changed servers ("we'll be right back"). When they did that, something changed. I was averaging approx. 500,000 credits per day before the change - now: "... got 0 tasks ...". | |
ID: 47734 | Rating: 0 | rate: / Reply Quote | |
Michael H.W. Weber: | |
ID: 47735 | Rating: 0 | rate: / Reply Quote | |
If you can go back to a driver that supports cuda 6.5 and you should get the old App as I do on my 660ti cc 3.0 | |
ID: 47740 | Rating: 0 | rate: / Reply Quote | |
If you can go back to a driver that supports cuda 6.5 and you should get the old App as I do on my 660ti cc 3.0
This solution works only for Windows hosts, as there's no CUDA6.5 client present for Linux (see the applications page). | |
ID: 47741 | Rating: 0 | rate: / Reply Quote | |
If you can go back to a driver that supports cuda 6.5 and you should get the old App as I do on my 660ti cc 3.0
This solution works only for Windows hosts, as there's no CUDA6.5 client present for Linux (see the applications page)
Aah, sorry. | |
ID: 47743 | Rating: 0 | rate: / Reply Quote | |