
Message boards : Graphics cards (GPUs) : GPUs not being used?

caffeineyellow5
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Message 38982 - Posted: 20 Nov 2014 | 23:13:01 UTC

I built a box and put one EVGA GeForce GTX 780 in it. It worked fine with GPUGRID for 2 months. I just put 2 more in there: one EVGA and one PNY (yes, completely compatible and working in the system). Now that they are in, the first one is working exactly as it was, at 90-92%. The PNY is also working, at 90%. The BOINC Manager shows 2 tasks running now. The third card shows as properly installed, works as the main display with a monitor attached, and shows up in NVidiaInspector (like the other 2), but it is not working on any tasks and is sitting at 0%. This third one will, on occasion, go up to 9% or somewhere lower than that as I switch windows, but I cannot get it to do a task with GPUGRID. I don't see any settings for this in the project or in the BOINC Manager itself, so I need some advice or help on how to get this third graphics card working on its own task like the other 2 are. TY

Carlesa25
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 206
Message 38985 - Posted: 20 Nov 2014 | 23:44:20 UTC - in response to Message 38982.
Last modified: 20 Nov 2014 | 23:45:06 UTC

Hi, possibly by modifying cc_config.xml so that BOINC uses all GPUs in the system:

<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

This configuration forces BOINC to report finished tasks without waiting and to use all GPUs present.

I hope this is useful. Greetings.
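
For anyone who would rather generate the file than type it by hand, here is a minimal Python sketch that builds the same two options and checks that the result is well-formed XML. The helper names `build_cc_config` and `write_cc_config` are made up for this illustration, and the data directory in the final comment is an assumption that you should replace with your own BOINC data directory.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_cc_config(options):
    """Return cc_config.xml text containing the given <options> entries."""
    lines = ["<cc_config>", "  <options>"]
    for name, value in options.items():
        lines.append(f"    <{name}>{value}</{name}>")
    lines += ["  </options>", "</cc_config>"]
    text = "\n".join(lines) + "\n"
    # Raises ParseError if the generated XML is not well-formed
    # (for example, an option name that contains a space).
    ET.fromstring(text)
    return text


def write_cc_config(data_dir, text):
    """Write the text to cc_config.xml in the given BOINC data directory."""
    target = Path(data_dir) / "cc_config.xml"
    target.write_text(text, encoding="utf-8")
    return target


config_text = build_cc_config({
    "report_results_immediately": 1,
    "use_all_gpus": 1,
})
print(config_text)
# To install it (the path is an assumption, adjust it to your data directory):
# write_cc_config(r"C:\ProgramData\BOINC", config_text)
```

Restart BOINC afterwards so that the file is read.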

caffeineyellow5
Message 38986 - Posted: 21 Nov 2014 | 0:20:18 UTC - in response to Message 38985.

OK, when I do a search for that file, I can't find it. I found reference to it in stdoutgpudetect.txt where it keeps repeating the message:

20-Nov-2014 05:10:51 [---] cc_config.xml not found - using defaults

Should I make a new file with that name and the text you gave, should I reinstall, or is there a way to force the program to create a new copy of it?

caffeineyellow5
Message 38987 - Posted: 21 Nov 2014 | 1:16:09 UTC - in response to Message 38986.

I went ahead and made that file and started BOINC again. It looks like it has been accepted so I will wait out a full cycle of tasks to see if the other GPU kicks in.

Carlesa25
Message 38990 - Posted: 21 Nov 2014 | 12:08:31 UTC - in response to Message 38987.

Hello: "cc_config.xml" is in the BOINC data folder (boinc / data, on Windows). If you are using a BOINC version newer than 7.2.42, it will be a file with many variables (all 0); just look for the ones I pointed out and change them to 1 instead of 0.

If you have an older version, just paste the cc_config.xml file into boinc / data and restart. In the BOINC Manager "Event Log" you will see whether it read the configuration file and whether it detects all GPUs. Greetings.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Message 38999 - Posted: 21 Nov 2014 | 23:47:00 UTC

It looks like this thread needs a link on how to set up cc_config.xml.
Here:
http://boinc.berkeley.edu/wiki/Client_configuration

caffeineyellow5
Message 39151 - Posted: 16 Dec 2014 | 3:44:24 UTC - in response to Message 38999.

Carlesa, thank you again.

Jacob, thank you very much for that page (and linked pages from that page)!

Sorry it took a while. Shortly after I got it running, that PC stopped working: by putting in the second and third GPUs and adding another 4TB HDD, I exceeded the PSU's capacity and killed it. Now that I have a new PSU in it, I have modified the cc_config.xml a little more. It is still only running 2 GPUs, and the third does nothing without help.

By "help" I mean that I have cheated the instructions that BOINC is running under and manually added a slot "2" to the slots folder of BOINC in the ProgramData folder. Then I copy the contents of the older of the 2 slots (minus the lockfile) to the new slot folder. After that, I open cmd, cd to the slot 2 folder, and run the command:

C:\ProgramData\BOINC\slots\2>C:\ProgramData\BOINC\projects\www.gpugrid.net\acemd.847-65.exe projects/www.gpugrid.net/acemd.847-65.exe --device 2
The output to this is:
# ACEMD Molecular Dynamics Version [3212]
# CUDA Synchronisation mode: BLOCKING
# CUDA Synchronisation mode: BLOCKING
# SWAN: Created context 0 on GPU 2
# SWAN Device 2 :
# Name : GeForce GTX 780
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.5
# PCI ID : 0000:03:00.0
# Device clock : 993MHz
# Memory clock : 3004MHz
# Memory width : 384bit
# SWAN Device 2 :
# Name : GeForce GTX 780
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.5
# PCI ID : 0000:03:00.0
# Device clock : 993MHz
# Memory clock : 3004MHz
# Memory width : 384bit
# Driver version : r343_00 : 34475
# SWAN: Configuring Peer Access:
# -
# SWAN NVAPI Version: NVidia Complete Version 1.10


Hopefully I am not corrupting the results, but it then goes and does the one task twice as fast taking the total GPU usage from 48% to 72.5%. Notwithstanding, it only works per task and once it is done (twice as fast) the cmd comes back to a command prompt and BOINC continues with only the 2 tasks running on the first two GPUs.

So I still need help. I am just not sure of anything right now when it comes to what is going wrong, but that may be my lack of knowledge about the program.
____________
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!"
Ephesians 6:18-20, please ;-)
http://tbc-pa.org

caffeineyellow5
Message 39152 - Posted: 16 Dec 2014 | 3:56:10 UTC - in response to Message 39151.

I actually do see a few corrupted (errored out) tasks in my online task logs, so it looks like cheating does have its consequences. I need to find a solution that actually downloads and works 3 tasks as one per GPU the way the program is supposed to and not 'rigged to blow'.

caffeineyellow5
Message 39153 - Posted: 16 Dec 2014 | 4:01:36 UTC - in response to Message 39152.

In addition to any help that can be offered on these forums, is anyone here willing to check my installation, files, settings, etc. directly via something like TeamViewer? Direct help can save a lot of frustration, both for me trying to make it work and for those helping, by just troubleshooting and doing the fix instead of going back and forth. TYYTYTYTYTYVM in advance for any help or suggestions that are given.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 39195 - Posted: 18 Dec 2014 | 19:47:32 UTC - in response to Message 39153.
Last modified: 18 Dec 2014 | 19:48:33 UTC

Your GPUs are too hot. You need to keep them reasonably cool. Use MSI Afterburner or similar to set fan speeds.

FAQ - Useful Tools
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

caffeineyellow5
Message 39221 - Posted: 19 Dec 2014 | 23:42:24 UTC - in response to Message 39195.
Last modified: 19 Dec 2014 | 23:43:16 UTC

I am using nvidiaInspector's overclocking options to do nothing but raise the fan speed, but what makes you think they are too hot? Do hot GPUs cause the BOINC Manager to load only 2 slots and use only 2 devices when 3 are seen by the OS and nvidiaInspector, and all 3 can be loaded manually via the command line? I wouldn't think heat is the reason it only loads the top 2 (device 0 and device 1) even when there are 3 slots with units in them, which rarely happens unless I "Suspend" one unit of work and it loads a new third one to run in place of the one I turned off. But if I "Resume" any third unit after 2 are already running, it will sit in "Waiting" mode until one of the other 2 finishes, and then it will turn back on and run.

I even tried changing the third "init_data.xml" while the manager was off to

<gpu_device_num>2</gpu_device_num>
<gpu_opencl_dev_index>2</gpu_opencl_dev_index>

but as soon as the manager is started again, it changes those values back to 0 or 1 and the same result happens.

At this point I have to ask...
Does anyone run 3 different GPUs in one computer and all 3 GPUs load and run work units continuously? Is the program built to even allow that?

caffeineyellow5
Message 39222 - Posted: 19 Dec 2014 | 23:48:01 UTC - in response to Message 39221.
Last modified: 19 Dec 2014 | 23:51:00 UTC

I mean I honestly bought 2 extra $400 GPUs to run THIS project and it is frustrating that one refuses to be used. I don't even game!

Which BTW, I have switched positions of the GPU cards and no matter which configuration, the top 2 are used by GPUGRID and the bottom one sits idle. So device 0 and device 1 run the project and device 2 will not.

mikey
Joined: 2 Jan 09
Posts: 291
Credit: 2,044,691,115
RAC: 10,281,271
Message 39223 - Posted: 20 Dec 2014 | 11:42:34 UTC - in response to Message 39222.
Last modified: 20 Dec 2014 | 11:43:28 UTC

Are you leaving any CPU cores free for the GPUs to use? I'm guessing you DID set <use_all_gpus> in the cc_config.xml file too? Does BOINC itself see all 3 GPUs? Look at the 'Event Log' on startup and it should list all 3 GPUs; if not, you may have to load the drivers again for the 3rd card. Windows sometimes requires that for each GPU in the system, other times it doesn't. After that it may come down to the motherboard: what brand and model do you have?

caffeineyellow5
Message 39230 - Posted: 20 Dec 2014 | 19:01:49 UTC - in response to Message 39223.

The log does list all three GPUs on 3 different lines and numbers them 0, 1, and 2.
I did change <use_all_gpus> to a value of 1.
I have 1 CPU allocated at the value of 1%. The rest of my CPU cores and usage I have allocated to distributed.net and have for years. Does the amount of cores or the % of cores from the CPU affect the project's usage of GPUs to where it would only allow 2 GPUs to run tasks?

As far as the specs, I am running an Intel Core i7 4960X CPU @ 3.60GHz OCd to 4124.9 MHz (33.0 x 125.0 MHz), 64GB Quad Channel DDR3 RAM @ 833.4 MHz, on an ASUSTeK X79-DELUXE Rev 1.xx, with American Megatrends Inc. BIOS 0701 - 01/07/2014 ROM size 8192 KB.

Hope this helps you help me. (Again, I have TeamViewer running if any kind soul would like to pop in and take a look, I would be happy to allow that.)

mikey
Message 39241 - Posted: 21 Dec 2014 | 12:39:01 UTC - in response to Message 39230.
Last modified: 21 Dec 2014 | 12:39:48 UTC

Try suspending your cpu project and see if the 3rd gpu starts crunching, if so then yes it's causing problems.

As for "I did change <use_all_gpus> to a value of 1", 1 means yes and zero means no, so yes you should be using all 3.

There IS a problem at some projects where Boinc won't use two Nvidia cards no matter what the settings are, I wonder if you have found a new problem with 3 cards? The only thing someone can do at those projects is use the <exclude_gpu> line to make one crunch for a different project. To test that do you happen to have an AMD card laying around? If so can you take out the 3rd Nvidia gpu and put in the AMD one and see if it tries to get work or not?

Have you tried using a 'dummy plug' on the cards that do NOT have a monitor plugged into them yet? Windows has a bad habit of disabling things during startup if nothing is plugged into a device, if a gpu is disabled that way it won't be enabled except thru a restart.

The only other thing I can think of is have you looked on the Asus message boards to see if there is a problem using 3 cards on that model motherboard?

I do not use Team Viewer so would not feel comfortable using it, sorry.

caffeineyellow5
Message 39248 - Posted: 21 Dec 2014 | 18:21:47 UTC - in response to Message 39241.

OK, quick update.

After killing the CPU project, still no love for the GPUGRID project getting more units than 2 to work on at one time.

I never had any graphics cards and only ever used on-board graphics before building this rig. First time I ever had the money to make rather than buy mass produced cheap rigs.

I have not tried using a dummy plug, but thanks for asking. Thinking about that question, I noticed that the one that won't get a task is the one that I have the monitor actually plugged in to. That may or may not make a difference since I rebooted with the monitor unplugged from the PC completely and it still won't load a third task.

I have been watching the Event Log as I try to Update for new tasks and start and stop the BOINC Manager and I noticed that even when I Update for a new task, it reads

12/21/2014 1:03:20 PM | GPUGRID | Sending scheduler request: Requested by user.
12/21/2014 1:03:20 PM | GPUGRID | Not requesting tasks
12/21/2014 1:03:22 PM | GPUGRID | Scheduler request completed
This leads me to believe that the issue is in the program itself and not with the hardware. This may be a false lead, but it is not a big leap to get to that conclusion either. If the program sees 3 GPUs
12/21/2014 12:56:04 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2779MB available, 4878 GFLOPS peak)
12/21/2014 12:56:04 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 12:56:04 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 12:56:04 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2779MB available, 4878 GFLOPS peak)
12/21/2014 12:56:04 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 12:56:04 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 12:56:04 PM | | Host name: BeastMode
12/21/2014 12:56:04 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz [Family 6 Model 62 Stepping 4]
12/21/2014 12:56:04 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrandsyscall nx lm avx vmx tm2 dca pbe fsgsbase smep
12/21/2014 12:56:04 PM | | OS: Microsoft Windows 8.1: Professional x64 Edition, (06.03.9600.00)
12/21/2014 12:56:04 PM | | Memory: 63.94 GB physical, 107.43 GB virtual
12/21/2014 12:56:04 PM | | Disk: 465.42 GB total, 337.59 GB free
12/21/2014 12:56:04 PM | | Local time is UTC -5 hours
12/21/2014 12:56:04 PM | | Config: report completed tasks immediately
12/21/2014 12:56:04 PM | | Config: use all coprocessors
12/21/2014 12:56:04 PM | | Config: fetch minimal work
12/21/2014 12:56:04 PM | | Config: fetch on update
12/21/2014 12:56:04 PM | GPUGRID | URL http://www.gpugrid.net/; Computer ID xxxxxx; resource share 100
12/21/2014 12:56:04 PM | GPUGRID | General prefs: from GPUGRID (last modified 19-Dec-2014 18:57:09)
12/21/2014 12:56:04 PM | GPUGRID | Computer location: home
12/21/2014 12:56:04 PM | GPUGRID | General prefs: no separate prefs for home; using your defaults
12/21/2014 12:56:04 PM | | Preferences:
12/21/2014 12:56:04 PM | | max memory usage when active: 65470.82MB
12/21/2014 12:56:04 PM | | max memory usage when idle: 65470.82MB
12/21/2014 12:56:04 PM | | max disk usage: 232.71GB
12/21/2014 12:56:04 PM | | max CPUs used: 1
12/21/2014 12:56:04 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
12/21/2014 12:56:04 PM | | Not using a proxy
but won't get tasks for them all, then a hardware issue seems less likely than something in the code itself or a setting I am just missing. I do have it set on the site to fetch work for 5 days, but I am not sure if that setting is only valid if you have other connection settings set?

I will check the ASUS website for issues with 3 GPUs.

If I did find a new bug with using 3 GPUs, how/to whom would I report such a thing? Is this forum enough for them to see that and respond, test, or fix the issue?
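
As a side note, the GPU detection lines in the startup log above can be counted mechanically rather than by eye. The sketch below is only fitted to the line format shown in this thread; the regular expression and the `detected_gpus` helper are assumptions made for illustration, and other BOINC versions may word these lines differently.

```python
import re

# Matches startup lines such as:
#   ... | | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, ...
# OpenCL lines are ignored so that each card is counted only once.
GPU_LINE = re.compile(r"CUDA: NVIDIA GPU (\d+): (.+?) \(driver version")


def detected_gpus(log_text):
    """Map device number -> model name for each CUDA GPU line in the log."""
    return {int(m.group(1)): m.group(2) for m in GPU_LINE.finditer(log_text)}


sample = """\
12/21/2014 12:56:04 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2779MB available, 4878 GFLOPS peak)
12/21/2014 12:56:04 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 12:56:04 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 12:56:04 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2779MB available, 4878 GFLOPS peak)
"""
gpus = detected_gpus(sample)
print(len(gpus), "GPUs detected:", gpus)
```

If the count is lower than the number of cards installed, the problem is in detection (drivers, hardware); if it matches, as it does here, the problem lies elsewhere.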

Jacob Klein
Message 39250 - Posted: 21 Dec 2014 | 21:11:52 UTC
Last modified: 21 Dec 2014 | 21:29:24 UTC

There is a lot of confusion in this thread.

I run 3 GPUs, and have no work fetch issues with them while running BOINC 7.4.36.

Manipulating slots directories, or running executables directly, or editing projects/slots .xml files, is wrong wrong wrong. Don't do it.

I see you have BOINC showing 3 GPUs in the Event Log when BOINC starts up. That's good. So, it is finding them. Is the concern that you have downloaded tasks that won't run? Or is the concern that it won't even download 3 tasks?

Assuming it is a work fetch concern... Okay, edit your cc_config.xml and turn on <work_fetch_debug>, then restart BOINC, then show us what a work fetch iteration looks like.
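
A minimal Python sketch of that edit, assuming the debug flags go in a <log_flags> section next to <options>, as the client configuration page linked earlier in this thread describes. `enable_log_flags` is a hypothetical helper name, not part of BOINC, and the starting text is the cc_config.xml posted earlier in the thread.

```python
import xml.etree.ElementTree as ET


def enable_log_flags(config_text, flags):
    """Return cc_config.xml text with each named log flag set to 1."""
    root = ET.fromstring(config_text)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # Create the section if the file only has <options> so far.
        log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        node = log_flags.find(flag)
        if node is None:
            node = ET.SubElement(log_flags, flag)
        node.text = "1"
    return ET.tostring(root, encoding="unicode")


existing = """\
<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>
"""
updated = enable_log_flags(existing, ["work_fetch_debug"])
print(updated)
```

The existing <options> entries are left untouched, and running the helper twice does not duplicate the flag.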

Also, please note that, at the moment, GPUGrid is on fumes, and may not be able to provide GPU work for every request. If you are attached to other projects that have GPU apps, you should be able to get GPU work from them.

The best way for us to help you diagnose a work fetch behavior, is to turn on work_fetch_debug and give us some output to look at. I'm an expert at looking at work fetch output. If you'd like to try to translate it yourself, feel free to have a look at this post:
http://www.bitcoinutopia.net/bitcoinutopia/forum_thread.php?id=691&postid=7369

Once you give us some debug output, we can try to help you further.

PS: If you are still up for doing a TeamViewer session, I would be willing to connect and take a peek. I'm an excellent troubleshooter, usually. Send me a Private Message with details.

Regards,
Jacob

caffeineyellow5
Message 39252 - Posted: 22 Dec 2014 | 2:52:17 UTC - in response to Message 39250.
Last modified: 22 Dec 2014 | 2:53:16 UTC

{{{Warning, long answer on its way.}}}

Thank you very much for your reply Jacob. I think I caused some of the confusion by being ignorant of what information to provide, some because of my ignorance of how BOINC works compared to other distributed projects I have worked on in the past, and some because of my fondness and willingness to troubleshoot and tinker to fix things in order to understand them rather than understand them in order to troubleshoot and tinker. Some of the confusion was also caused by replies to my issues with good answers that I just didn't understand or did not apply.

I think I understand your information better than much of the answers and help that I have gotten so far. Deciphering (not writing) code and reading the manual are two of my strong points because of my previous job working with the coders and doing customer support for a project I learned first for the company after our company acquired a different company that relied more on code than on Windows front end programs. I was pretty much in charge of helping rewrite the manual while simultaneously going through the manual with the product in hand making sure what the manual said is what the product did. Then I was tasked with teaching much of the rest of the staff on the product line that the company eventually made its main line for years. After that, I did customer service and troubleshooting along with working back and forth between the coders and the CS dept on bugs, new issues, upgrades, and old versions. So reading your information about logs and reading the linked 'man' pages on scheduling, I know much of why I confused people in my requests and why they were confused with my answers.

So I turned off the BOINC client. Then I turned on <work_fetch_debug> and thought that, while I was at it, I would turn on <sched_op_debug> to see the output of both. Here is the result:

12/21/2014 8:35:17 PM | | Starting BOINC client version 7.4.27 for windows_x86_64
12/21/2014 8:35:17 PM | | log flags: file_xfer, sched_ops, task, sched_op_debug, slot_debug, task_debug
12/21/2014 8:35:17 PM | | log flags: work_fetch_debug
12/21/2014 8:35:17 PM | | Libraries: libcurl/7.33.0 OpenSSL/1.0.1h zlib/1.2.8
12/21/2014 8:35:17 PM | | Data directory: C:\ProgramData\BOINC
12/21/2014 8:35:17 PM | | Running under account Mike
12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2665MB available, 4878 GFLOPS peak)
12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2665MB available, 4878 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 8:35:17 PM | | Host name: BeastMode
12/21/2014 8:35:17 PM | | Processor: 12 GenuineIntel Intel(R) Core(TM) i7-4960X CPU @ 3.60GHz [Family 6 Model 62 Stepping 4]
12/21/2014 8:35:17 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes f16c rdrandsyscall nx lm avx vmx tm2 dca pbe fsgsbase smep
12/21/2014 8:35:17 PM | | OS: Microsoft Windows 8.1: Professional x64 Edition, (06.03.9600.00)
12/21/2014 8:35:17 PM | | Memory: 63.94 GB physical, 107.43 GB virtual
12/21/2014 8:35:17 PM | | Disk: 465.42 GB total, 337.70 GB free
12/21/2014 8:35:17 PM | | Local time is UTC -5 hours
12/21/2014 8:35:17 PM | | Config: report completed tasks immediately
12/21/2014 8:35:17 PM | | Config: use all coprocessors
12/21/2014 8:35:17 PM | | Config: fetch minimal work
12/21/2014 8:35:17 PM | | Config: fetch on update
12/21/2014 8:35:17 PM | GPUGRID | URL http://www.gpugrid.net/; Computer ID 189656; resource share 100
12/21/2014 8:35:17 PM | GPUGRID | General prefs: from GPUGRID (last modified 21-Dec-2014 12:59:37)
12/21/2014 8:35:17 PM | GPUGRID | Computer location: home
12/21/2014 8:35:17 PM | GPUGRID | General prefs: no separate prefs for home; using your defaults
12/21/2014 8:35:17 PM | | Preferences:
12/21/2014 8:35:17 PM | | max memory usage when active: 65470.82MB
12/21/2014 8:35:17 PM | | max memory usage when idle: 65470.82MB
12/21/2014 8:35:17 PM | | max disk usage: 232.71GB
12/21/2014 8:35:17 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
12/21/2014 8:35:17 PM | | [work_fetch] Request work fetch: Prefs update
12/21/2014 8:35:17 PM | | [work_fetch] Request work fetch: Startup
12/21/2014 8:35:17 PM | | Not using a proxy
12/21/2014 8:35:18 PM | | [work_fetch] ------- start work fetch state -------
12/21/2014 8:35:18 PM | | [work_fetch] target work buffer: 180.00 + 432000.00 sec
12/21/2014 8:35:18 PM | | [work_fetch] --- project states ---
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] REC 355175.655 prio -1.000 can request work
12/21/2014 8:35:18 PM | | [work_fetch] --- state for CPU ---
12/21/2014 8:35:18 PM | | [work_fetch] shortfall 5186160.00 nidle 12.00 saturated 0.00 busy 0.00
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] share 1.000
12/21/2014 8:35:18 PM | | [work_fetch] --- state for NVIDIA GPU ---
12/21/2014 8:35:18 PM | | [work_fetch] shortfall 1296540.00 nidle 3.00 saturated 0.00 busy 0.00
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] share 1.000
12/21/2014 8:35:18 PM | | [work_fetch] ------- end work fetch state -------
12/21/2014 8:35:18 PM | GPUGRID | [sched_op] Starting scheduler request
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] request: CPU (1.00 sec, 12.00 inst) NVIDIA GPU (1.00 sec, 3.00 inst)
12/21/2014 8:35:18 PM | GPUGRID | Sending scheduler request: To fetch work.
12/21/2014 8:35:18 PM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
12/21/2014 8:35:18 PM | GPUGRID | [sched_op] CPU work request: 1.00 seconds; 12.00 devices
12/21/2014 8:35:18 PM | GPUGRID | [sched_op] NVIDIA GPU work request: 1.00 seconds; 3.00 devices
12/21/2014 8:35:20 PM | GPUGRID | Scheduler request completed: got 0 new tasks
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Server version 613
12/21/2014 8:35:20 PM | GPUGRID | No tasks sent
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for ACEMD beta version
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for the applications you have selected.
12/21/2014 8:35:20 PM | GPUGRID | Project requested delay of 31 seconds
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/logogpugrid.png to projects/www.gpugrid.net/stat_icon
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_1.png to projects/www.gpugrid.net/slideshow_ga_00
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_1.png to projects/www.gpugrid.net/slideshow_cellmd_00
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_2.png to projects/www.gpugrid.net/slideshow_ga_01
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_2.png to projects/www.gpugrid.net/slideshow_cellmd_01
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_3.png to projects/www.gpugrid.net/slideshow_ga_02
12/21/2014 8:35:20 PM | GPUGRID | [slot] linked projects/www.gpugrid.net/project_3.png to projects/www.gpugrid.net/slideshow_cellmd_02
12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off CPU 580 sec
12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off NVIDIA GPU 312 sec
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Deferring communication for 00:00:31
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Reason: requested by project
12/21/2014 8:35:20 PM | | [work_fetch] Request work fetch: RPC complete
12/21/2014 8:35:52 PM | | [work_fetch] Request work fetch: Backoff ended for GPUGRID
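
The work-fetch lines above can be read with a few lines of Python. This is only a sketch fitted to the log format shown in this thread; the regular expressions and the `summarize` helper are assumptions made for illustration, not anything BOINC ships. It pulls out the target work buffer and the per-resource shortfall. In this log, each shortfall works out to the buffer (180 + 432000 sec) multiplied by the number of idle instances, which is consistent with every CPU and GPU instance being counted as idle right after startup. The 432000 sec part is 5 days, which would correspond to the 5-day setting on the site.

```python
import re

BUFFER = re.compile(r"target work buffer: ([\d.]+) \+ ([\d.]+) sec")
STATE = re.compile(r"--- state for (.+?) ---")
SHORTFALL = re.compile(r"shortfall ([\d.]+) nidle ([\d.]+)")


def summarize(log_text):
    """Return ((min_sec, extra_sec), {resource: (shortfall, nidle)})."""
    buffer = None
    resource = None
    states = {}
    for line in log_text.splitlines():
        m = BUFFER.search(line)
        if m:
            buffer = (float(m.group(1)), float(m.group(2)))
            continue
        m = STATE.search(line)
        if m:
            resource = m.group(1)  # e.g. "CPU" or "NVIDIA GPU"
            continue
        m = SHORTFALL.search(line)
        if m and resource is not None:
            states[resource] = (float(m.group(1)), float(m.group(2)))
    return buffer, states


sample = """\
12/21/2014 8:35:18 PM | | [work_fetch] target work buffer: 180.00 + 432000.00 sec
12/21/2014 8:35:18 PM | | [work_fetch] --- state for CPU ---
12/21/2014 8:35:18 PM | | [work_fetch] shortfall 5186160.00 nidle 12.00 saturated 0.00 busy 0.00
12/21/2014 8:35:18 PM | | [work_fetch] --- state for NVIDIA GPU ---
12/21/2014 8:35:18 PM | | [work_fetch] shortfall 1296540.00 nidle 3.00 saturated 0.00 busy 0.00
"""
buffer, states = summarize(sample)
total = sum(buffer)
print(f"extra buffer in days: {buffer[1] / 86400}")
for resource, (shortfall, nidle) in states.items():
    print(f"{resource}: idle instances {nidle}, shortfall / buffer = {shortfall / total}")
```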

Something you can tell from the log, and that I had not previously mentioned but should have, is how I use BOINC. I am not sure I made it clear in all that I have written, or maybe I did but it was scattered across several posts:
The ONLY project I run with BOINC is GPUGRID. Since starting to troubleshoot, I have made all of my computer's resources available to the BOINC client, now knowing that the GPUGRID project has very little use for my CPUs, my memory (virtual or physical), my network resources, or my drive space, and that what little it does need, it has plenty to draw from without denting anything else at all. I do have a CPU-intensive distributed project running from distributed.net, and it does NOT use the BOINC client at all; it is a completely separate install. The only other distributed project I ever worked on before distributed.net was the United Devices project that ran under several names such as grid.org, Intel's Crunch for the Cure, and UD.com/uniteddevices.org. That was pretty much the first ever publicly accessible distributed project, and it also was not BOINC but a stand-alone install. So, this being my first and only BOINC project, I was not aware of project priorities (which added to your confusion about what might be causing my issue, though your input still helped and hopefully will continue [and conclude] immensely), how the work fetch even works, or why I was asked about my CPU availability when GPUGRID seems to use only about 1% for each running task anyway.

So looking at the logs and the results after both debugs are turned on, it seems that it is asking for work for 3 GPUs and that it sees all 3 GPUs both with the debug flags and without them. I have not tinkered enough to see how many tasks it has actually stored in memory or on the hard drive to work on, but as I mentioned earlier in this thread, I once was able to "Suspend" (pause) one task and another one started. In that one instance, I was not able to pause any more and get more to start. I assumed that was because it knew I only had 3 GPUs, so it would not allow 4 active tasks, even if some were "Suspended", but now I think it may be because it only collected 3 tasks when fetching, so it only had 3 to work with until they were done. But the question still remains: why will only 2 GPUs work on tasks at one time, even when it has 3 tasks to work on and knows I have 3 GPUs to work them on?

Additional: I have also, during the course of the past 2 days, uninstalled and reinstalled the BOINC client completely so as to undo any tinkering and troubleshooting I had done. I know some information gets passed to the servers, which in turn gets passed back down to the client, but those things, I think, relate more to practical use than to my experimental troubleshooting.

Also, yeah, I did confirm that manually adding slots and copying files and running from the command line DOES return tasks that are either a complete error or cannot be validated. So yeah, learned my lesson there on that troubleshooting/tinkering escapade.

To answer one of your direct questions, and hopefully you have already figured out the answer
So, it is finding them. Is the concern that you have downloaded tasks that won't run? Or is the concern that it won't even download 3 tasks?
The answer is no. lol My issue is not tasks that download and never run. My issue is not the client failing to download 3 tasks. The issue is that it may download 3 tasks, but never runs more than 2 tasks at one time. It will load 1 task on one GPU and then a second task on a second GPU and then not run a third task while those other 2 are running. So most times (when not on holiday) it is running 2 tasks (one each on 2 different GPUs), and it will never run 3, although occasionally it is only running 1, because once the first 2 tasks finish, the third task has to complete before the client gets more tasks. So if it downloads 2, it will run those two until those 2 are done. If it downloads 3, it will finish all 3 before getting any more. If it downloads 1, then it will go get a second one, but will then, in turn, not get more until both are done. I hope that is clear on all the iterations I have witnessed. I realize now that it would seem my "min" is 1 and my "max" is 3 (based on the amount of resources available when the work_fetch does its evaluations). I also may not have changed (before the holiday slowdown) the report_results_immediately setting, which may or may not have an effect on the work fetch process, or may just have to do with the way results are reported for 'scoring' purposes.

You say you are running BOINC 7.4.36, but according to http://boinc.berkeley.edu/download_all.php?xml=1 the recommended Windows 64-bit version is 7.4.27 and that is the version I am running. Should I find an update or run the 32bit version, which seems to have a higher version number in order to try to fix this? Or a Beta version that I don't have?

I realize that now, as the work units are tough to find out of GPUGRID, may not be the best time to get you in here to troubleshoot, as I am not working any tasks at all. When the holidays are over I will certainly be back all over this and ready to let someone take a personal look. If you think you can figure it out without actual tasks loaded, I am willing to have you take a look.

A question sort of off topic: when I was doing the work for UD/grid.org, they would set a minimum (with no max) on how many times any one work unit would be sent out. Many of them were probably run hundreds of times. The reasons for this were error reduction through getting consistent results from different end users (reducing "jitter"), the fact that some end users would take too long or not return results at all, and that during times (like this holiday) when the staff would all simply be away, they would let the servers give out copies of the same tasks over and over. Why doesn't GPUGRID do this? The first answer that comes to mind is that BOINC has so many projects running that a great majority of the users could get active tasks from so many other sources that some time off from GPUGRID won't even be noticed. But I would think that as long as somebody somewhere wants to work on your project, keep feeding them work, even if just for validation and jitter reduction reasons.

Mike

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39253 - Posted: 22 Dec 2014 | 3:45:05 UTC - in response to Message 39252.
Last modified: 22 Dec 2014 | 3:52:25 UTC

Ok, nice answer :) I see you have a technical background, that's good. I'll give you answers that are hopefully at the right "level" of making sense for you. PS: Because of the GPUGrid work shortage, I won't be able to conclusively tell you what the problem is. But I am willing to troubleshoot this as long as it takes to help you solve it.

Here goes.

BOINC is primarily meant to be set-it-and-forget-it. Configure your settings, attach to some projects, and let it do its thing. The Advanced view has tabs for Projects and Tasks, and you should familiarize yourself with the buttons there, especially the Tasks grid. When a GPU task is running, the task's status will include something like "Running (0.667 CPUs + 1 NVIDIA GPU (device 0))", telling you how much CPU it is budgeting as "used" for the GPU task, and which GPU it is running on.

I'd also recommend attaching to more projects than just GPUGrid, and setting up your "Use at most X% of the processors" computing preference to be equal to the number of threads you'd like to let BOINC manage.

Looking at your log posting, I see...

Starting BOINC client version 7.4.27
... If you'd like to upgrade to the latest release candidates (I do recommend 7.4.36), feel free to bookmark:
http://boinc.berkeley.edu/download_all.php

12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2665MB available, 4878 GFLOPS peak)
12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 8:35:17 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, CUDA version 6.5, compute capability 3.5, 3072MB, 2809MB available, 4576 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2665MB available, 4878 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4698 GFLOPS peak)
12/21/2014 8:35:17 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 780 (driver version 344.75, device version OpenCL 1.1 CUDA, 3072MB, 2809MB available, 4576 GFLOPS peak)
... perfectly seeing your 3 GPUs (note: driver 347.09 beta is available now, and appears to work fine in BOINC)

Damn nice machine! 12-threads, 64GB RAM, 3 GTX 780 GPUs -- #Jealous

12/21/2014 8:35:18 PM | | [work_fetch] ------- start work fetch state -------
12/21/2014 8:35:18 PM | | [work_fetch] target work buffer: 180.00 + 432000.00 sec
... I see your buffer settings are maintain at least 0 days (BOINC enforces a 3-minute low-water-mark to allow some time to ask projects for work, which is why you see 180.00 there)
... and allow an additional 432000secs = 5 days cache

12/21/2014 8:35:18 PM | | [work_fetch] --- project states ---
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] REC 355175.655 prio -1.000 can request work
... "can request work" is good. BOINC would say "can't request work" and give a reason, if a reason applied. Some reasons are: Project set to No New Tasks, Project set to Suspended, or one of the Project's tasks is suspended (that's right, it will not request more work from a Project, if you have a suspended task for that project).

12/21/2014 8:35:18 PM | | [work_fetch] --- state for CPU ---
12/21/2014 8:35:18 PM | | [work_fetch] shortfall 5186160.00 nidle 12.00 saturated 0.00 busy 0.00
... 12 CPUs, times 432180 high-water-mark-per-resource, equals 5186160.00 instance-seconds of shortfall. nidle (number idle) shows 12 idle CPUs.

12/21/2014 8:35:18 PM | | [work_fetch] --- state for NVIDIA GPU ---
12/21/2014 8:35:18 PM | | [work_fetch] shortfall 1296540.00 nidle 3.00 saturated 0.00 busy 0.00
... 3 GPUs, times 432180 high-water-mark-per-resource, equals 1296540.00 instance-seconds of shortfall. nidle (number idle) shows 3 idle NVIDIA GPUs.
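For anyone who wants to check the shortfall arithmetic, here is a minimal sketch. It only reproduces the multiplication described above from the two buffer values in the "target work buffer" log line; the function name is made up for illustration and this is not how the BOINC client computes shortfall in general (it only matches when every instance is idle).

```python
# Buffer values taken from the "[work_fetch] target work buffer" log line.
MIN_BUFFER_SEC = 180.0        # low-water mark (3 minutes)
EXTRA_BUFFER_SEC = 432000.0   # additional cache: 5 days * 86400 s/day


def idle_shortfall(idle_instances: int) -> float:
    """Instance-seconds needed to fill the buffer when all instances are idle.

    Hypothetical helper: it mirrors the arithmetic in the post, not BOINC's code.
    """
    high_water_mark = MIN_BUFFER_SEC + EXTRA_BUFFER_SEC  # 432180 s per instance
    return idle_instances * high_water_mark


print(idle_shortfall(12))  # CPU line in the log
print(idle_shortfall(3))   # NVIDIA GPU line in the log
```

The two results should line up with the "shortfall" values in the CPU and NVIDIA GPU log lines.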

12/21/2014 8:35:18 PM | | [work_fetch] ------- end work fetch state -------
... time for work fetch to decide if it should ask a project for work

12/21/2014 8:35:18 PM | GPUGRID | [sched_op] Starting scheduler request
12/21/2014 8:35:18 PM | GPUGRID | [work_fetch] request: CPU (1.00 sec, 12.00 inst) NVIDIA GPU (1.00 sec, 3.00 inst)
12/21/2014 8:35:18 PM | GPUGRID | Sending scheduler request: To fetch work.
12/21/2014 8:35:18 PM | GPUGRID | Requesting new tasks for CPU and NVIDIA GPU
... it has decided to ask GPUGRID for 12 CPU instances, and 3 NVIDIA GPU instances. I'm a bit curious why the sec values are only 1.00. I would have expected the full shortfalls on each. But this is okay for now.

12/21/2014 8:35:18 PM | GPUGRID | [sched_op] CPU work request: 1.00 seconds; 12.00 devices
12/21/2014 8:35:18 PM | GPUGRID | [sched_op] NVIDIA GPU work request: 1.00 seconds; 3.00 devices
... The "scheduler" (a BOINC Project server-side-process) received the request successfully.

12/21/2014 8:35:20 PM | GPUGRID | Scheduler request completed: got 0 new tasks
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Server version 613
12/21/2014 8:35:20 PM | GPUGRID | No tasks sent
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for Short runs (2-3 hours on fastest card)
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for ACEMD beta version
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for Long runs (8-12 hours on fastest card)
12/21/2014 8:35:20 PM | GPUGRID | No tasks are available for the applications you have selected.
12/21/2014 8:35:20 PM | GPUGRID | Project requested delay of 31 seconds
... and has replied that it has no tasks for either of the resources, per your selections. NOTE: GPUGrid actually DOES have CPU tasks now, with their "Test application for CPU MD" multi-threaded application. You could edit your web preferences, to turn on that app (might also have to check the "Run test applications?" checkbox too), if you'd like to run CPU tasks from GPUGrid. I'm sure they'd appreciate your CPU support.

12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off CPU 580 sec
12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off NVIDIA GPU 312 sec
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Deferring communication for 00:00:31
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Reason: requested by project
12/21/2014 8:35:20 PM | | [work_fetch] Request work fetch: RPC complete
... Because you needed work for CPU and didn't get any from this Project, BOINC enforces a semi-random "resource backoff timer" for that Project. 580 secs, in this case.
... Because you needed work for NVIDIA GPU and didn't get any, it also enforced a backoff of 312 secs for that resource.
... And the server said "don't come back here for at least 31 seconds please" :)
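If you end up staring at a lot of these logs, a small script can pull the per-resource backoff timers out for you. This is only a sketch that assumes the event log lines look exactly like the ones pasted above; the regular expression and function name are my own, not anything BOINC ships.

```python
import re

# Matches lines such as "... | [work_fetch] backing off NVIDIA GPU 312 sec".
BACKOFF_RE = re.compile(
    r"\[work_fetch\] backing off (?P<resource>.+?) (?P<secs>\d+) sec"
)


def parse_backoffs(log_text: str) -> dict[str, int]:
    """Return {resource name: backoff seconds} for every backoff line found."""
    backoffs = {}
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            backoffs[match.group("resource")] = int(match.group("secs"))
    return backoffs


SAMPLE = """\
12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off CPU 580 sec
12/21/2014 8:35:20 PM | GPUGRID | [work_fetch] backing off NVIDIA GPU 312 sec
12/21/2014 8:35:20 PM | GPUGRID | [sched_op] Deferring communication for 00:00:31
"""

print(parse_backoffs(SAMPLE))
```

Lines without a backoff message (like the "Deferring communication" one) are simply skipped.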

Sorry, but it is a bad time to be troubleshooting this with GPUGrid, since they are "on fumes" in terms of having GPU tasks available. See the Server Status "unsent" column here: http://www.gpugrid.net/server_status.php

Once you get 3 GPUGrid tasks in the Task grid, I'd like to see a screenshot of the behavior you saw. Are you sure you saw 3 GPUGrid GPU tasks listed in the Tasks grid, and BOINC was only running 2 of them? I don't know how to troubleshoot further without seeing the issue.

One other thing to note is that, I believe GPUGrid has a server-side rule that only allows "2 GPU tasks per GPU" to be on a client. For you, that means you should only ever see up-to-6 GPUGrid GPU tasks in your Task grid, regardless of your buffer settings.

Once you get 3-or-more tasks, host a screenshot and let us take a look. :) Then we might have to turn on more of those awesome debug flags to get geeky!

Oh, you asked about whether tasks get sent out multiple times. BOINC Projects set this up on a per-application basis; they can decide how many "instances" of a work unit are initially replicated, and can also set how many results must be in agreement before considering it completed, and how many error results should trigger abortion of the work unit. If you look at some tasks (here are mine: http://www.gpugrid.net/results.php?hostid=153764)... if you click the "Work Unit", you'll see that unit's values for "minimum quorum" (# results that must agree), "initial replication" (# tasks initially created/sent), and "max # of error/total/success tasks". GPUGrid does not do verification on their GPU apps, but most other projects choose to verify their results with at least 2 successful returns.
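To make the "minimum quorum" idea concrete, here is a toy sketch. It is not BOINC's validator (real project validators typically compare numerical output within a tolerance); it just assumes each returned result can be reduced to a comparable value, and the function name is hypothetical.

```python
from collections import Counter


def canonical_result(results, min_quorum):
    """Return the value at least `min_quorum` results agree on, else None.

    Illustrative only: assumes results are hashable and compared for exact
    equality, which is a simplification of what a real validator does.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= min_quorum else None


# With a quorum of 2, two matching returns are enough to accept the result.
print(canonical_result(["abc123", "abc123", "zzz999"], min_quorum=2))
# With a quorum of 1 (no cross-checking), a single return is accepted.
print(canonical_result(["abc123"], min_quorum=1))
```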

I too used to run Distributed.net back in the day, but I go for the science research now. You might consider attaching BOINC to more projects, like World Community Grid, Citizen Science Grid, etc.

Merry Christmas,
Jacob

PS: Here is my list of running projects (I give WCG 4x the Resource Share of my others, and I have a couple of 0-Resource-Share projects, which are "Backup Projects" to BOINC, meaning it will only get work from those if there is no other work from other projects)... and here's what my Tasks grid looks like. I use an app_config.xml to tell BOINC to "consider 0.667 CPU budgeted per GPUGrid Task", and then I run with "Use at most 100% CPUs." That way, if BOINC runs 3 GPUGrid tasks, it automatically budgets ("frees up") 0.667*3=2.001 CPUs. Fun.

http://1drv.ms/13tRIpR
http://1drv.ms/1zTv9I5
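For reference, a rough sketch of generating an app_config.xml like the one described in the PS. The actual file is not shown in this thread, so the app name ("acemdlong", borrowed from a later post here) and the gpu_usage of 1 are assumptions; only the 0.667 CPU figure comes from the post.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, cpu_usage: float, gpu_usage: float = 1.0) -> str:
    """Build app_config.xml text budgeting `cpu_usage` CPUs per GPU task.

    Assumed values: `app_name` and `gpu_usage` are illustrative defaults.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


print(build_app_config("acemdlong", cpu_usage=0.667))

# CPU budget when 3 GPUGrid tasks run at once, as in the post.
print(round(0.667 * 3, 3))
```

The output would go in the project's folder under the BOINC data directory, and the client needs to re-read its config files (or be restarted) to pick it up.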

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39254 - Posted: 22 Dec 2014 | 10:57:04 UTC - in response to Message 39221.

I am using nvidiaInspector's Overclocking options to do nothing but up the fan speed, but what would make you think they are too hot?


Just answering this question, if you click on a task you ran you can see the logs which include temperature:

http://www.gpugrid.net/result.php?resultid=13575551

Name I11R36-SDOERR_BARNA5-62-100-RND9235_0
Workunit 10453012
Created 21 Dec 2014 | 11:26:11 UTC
Sent 21 Dec 2014 | 11:26:33 UTC
Received 21 Dec 2014 | 17:17:11 UTC
Server state Over
Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 189656
Report deadline 26 Dec 2014 | 11:26:33 UTC
Run time 20,692.87
CPU time 2,424.98
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.47 (cuda65)
Stderr output

<core_client_version>7.4.27</core_client_version>
<![CDATA[
<stderr_txt>
# GPU [GeForce GTX 780] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 780
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.5
# PCI ID : 0000:01:00.0
# Device clock : 1058MHz
# Memory clock : 3104MHz
# Memory width : 384bit
# Driver version : r343_00 : 34475
# GPU 0 : 43C
# GPU 1 : 34C
# GPU 2 : 27C
# GPU 0 : 47C
# GPU 0 : 50C
# GPU 0 : 53C
# GPU 0 : 55C
# GPU 0 : 58C
# GPU 0 : 60C
# GPU 0 : 63C
# GPU 0 : 64C
# GPU 0 : 66C
# GPU 0 : 67C
# GPU 0 : 68C
# GPU 0 : 70C
# GPU 0 : 71C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 0 : 77C
# GPU 0 : 78C
# GPU 1 : 35C
# GPU 2 : 28C
# GPU 0 : 79C
# GPU 0 : 80C
# GPU 1 : 36C
# GPU 1 : 37C
# GPU 0 : 81C
# GPU 2 : 29C
# GPU 0 : 82C
# GPU 1 : 38C
# GPU 0 : 83C
# GPU 1 : 44C
# GPU 2 : 30C
# GPU 1 : 46C
# GPU 1 : 47C
# GPU 0 : 84C
# BOINC suspending at user request (exit)
# GPU [GeForce GTX 780] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 780
# ECC : Disabled
# Global mem : 3072MB
# Capability : 3.5
# PCI ID : 0000:01:00.0
# Device clock : 1058MHz
# Memory clock : 3104MHz
# Memory width : 384bit
# Driver version : r343_00 : 34475
# GPU 0 : 66C
# GPU 1 : 42C
# GPU 2 : 30C
# GPU 0 : 68C
# GPU 0 : 70C
# GPU 0 : 71C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 74C
# GPU 0 : 75C
# GPU 0 : 76C
# GPU 1 : 43C
# GPU 0 : 77C
# GPU 0 : 78C
# GPU 2 : 31C
# GPU 0 : 79C
# GPU 0 : 80C
# GPU 1 : 44C
# GPU 2 : 32C
# GPU 1 : 45C
# GPU 2 : 33C
# GPU 2 : 34C
# GPU 1 : 46C
# GPU 2 : 38C
# GPU 2 : 43C
# GPU 2 : 46C
# GPU 2 : 49C
# GPU 2 : 51C
# GPU 2 : 54C
# GPU 2 : 56C
# GPU 2 : 58C
# GPU 2 : 60C
# GPU 2 : 61C
# GPU 2 : 63C
# GPU 1 : 49C
# GPU 2 : 64C
# GPU 0 : 81C
# GPU 1 : 57C
# GPU 2 : 66C
# GPU 1 : 61C
# GPU 2 : 67C
# GPU 1 : 64C
# GPU 2 : 68C
# GPU 1 : 68C
# GPU 2 : 69C
# GPU 1 : 71C
# GPU 2 : 70C
# GPU 0 : 82C
# GPU 1 : 74C
# GPU 2 : 71C
# GPU 1 : 76C
# GPU 2 : 72C
# GPU 1 : 77C
# GPU 0 : 83C
# GPU 1 : 78C
# GPU 1 : 79C
# GPU 2 : 73C
# GPU 0 : 84C

# GPU 1 : 80C
# GPU 0 : 85C
# GPU 0 : 86C
# GPU 0 : 87C
# GPU 0 : 88C
# GPU 0 : 89C
# GPU 0 : 90C
# GPU 0 : 91C
# GPU 0 : 92C
# GPU 0 : 93C
# GPU 0 : 94C
# GPU 0 : 95C

# GPU 1 : 81C
# GPU 1 : 82C
# GPU 0 : 96C
# GPU 1 : 83C
# GPU 2 : 74C
# GPU 1 : 84C
# GPU 2 : 75C
# GPU 1 : 85C
# GPU 2 : 76C
# GPU 1 : 86C
# GPU 2 : 77C
# GPU 1 : 87C
# GPU 1 : 88C
# Time per step (avg over 3675000 steps): 5.520 ms
# Approximate elapsed time for entire WU: 20700.631 s
# PERFORMANCE: 87466 Natoms 5.520 ns/day 0.000 ms/step 0.000 us/step/atom
12:15:44 (6544): called boinc_finish

Outcome Validate error


While Boinc might not be seeing the GPU's the GPUGrid App clearly sees all 3 GPU's; GPU 0, 1 and 2 are underlined above.
Whatever the problem there you are not sufficiently cooling all the GPU's. 95C or 96C is dangerously high IMO and my primary concern would be that you could damage your GPU's or other hardware. I suggest you start by working safely - hard drives don't like being cooked, neither do motherboards, RAM modules...
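If you'd rather not scroll through the stderr by eye, a short script can report the peak temperature per GPU. This is a sketch that assumes the temperature lines keep the "# GPU n : xxC" format seen in the task output above; the names are made up for illustration.

```python
import re

# Matches "# GPU 0 : 84C" but not the "# GPU [GeForce GTX 780] ..." header line.
TEMP_RE = re.compile(r"#\s*GPU\s+(\d+)\s*:\s*(\d+)C")


def max_temps(stderr_text: str) -> dict[int, int]:
    """Return {GPU index: highest temperature in C} found in the stderr text."""
    peaks = {}
    for gpu, temp in TEMP_RE.findall(stderr_text):
        gpu, temp = int(gpu), int(temp)
        peaks[gpu] = max(temp, peaks.get(gpu, temp))
    return peaks


# A few lines copied from the task output above.
SAMPLE = """\
# GPU [GeForce GTX 780] Platform [Windows] Rev [3212] VERSION [65]
# GPU 0 : 95C
# GPU 1 : 81C
# GPU 0 : 96C
# GPU 2 : 74C
# GPU 1 : 88C
# GPU 2 : 77C
"""

peaks = max_temps(SAMPLE)
print(peaks)
print([gpu for gpu, temp in sorted(peaks.items()) if temp > 80])  # GPUs above 80C
```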
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39256 - Posted: 22 Dec 2014 | 13:30:26 UTC
Last modified: 22 Dec 2014 | 13:32:09 UTC

I agree with skgiven -- Your GPUs are getting too hot. For a GTX 780, although it is rated to support up-to-100*C, that is its "absolute maximum operating temperature" (TjMax). It actually starts thermal downclocking at 80*C, since GPU Boost v2.0 GPUs use an 80*C threshold. And 80*C is generally about the "maximum comfort zone" for a GPU that you want to take care of.

I'd investigate your cooling, and then consider using a program like Precision X or Afterburner to enforce a custom fan curve that keeps it below 80*C.
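If you want to keep an eye on live temperatures from a script as well, here is a hedged sketch using nvidia-smi. It assumes nvidia-smi is installed with the driver and reachable on your PATH (on Windows it may live in the NVIDIA driver folder instead), and the helper names and sample readings are hypothetical.

```python
import subprocess

THRESHOLD_C = 80  # the thermal downclocking threshold mentioned above


def parse_smi(csv_text: str) -> dict[int, int]:
    """Parse "index, temperature" CSV rows into {GPU index: temperature in C}."""
    temps = {}
    for line in csv_text.strip().splitlines():
        index, temp = (field.strip() for field in line.split(","))
        temps[int(index)] = int(temp)
    return temps


def read_temps() -> dict[int, int]:
    """Query current GPU temperatures; assumes nvidia-smi is on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi(out)


def too_hot(temps: dict[int, int], threshold: int = THRESHOLD_C) -> list[int]:
    """Return the indices of GPUs at or above the threshold."""
    return sorted(i for i, t in temps.items() if t >= threshold)


# Made-up readings in the same shape nvidia-smi prints with the flags above.
print(too_hot(parse_smi("0, 84\n1, 61\n2, 80\n")))
```

On the actual machine, `too_hot(read_temps())` would list the cards that need more fan.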

Profile caffeineyellow5
Avatar
Send message
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwat
Message 39259 - Posted: 22 Dec 2014 | 17:58:27 UTC - in response to Message 39256.

Well, I have good news. If the current gold version of the BOINC client does have a bug that keeps it from using all 3 GPUs, the .36 version does not. I updated the driver and BOINC, and now it is crunching on 3 GPUs with 3 tasks!

Funny that this whole conversation could have been solved (but I would not have learned as much) if someone just gave me the link to 7.4.36 and said, "Here, try this." lol

I will check on that heat thing.

Thank you thank you thank you so much for all your help. (I will not close the door on this until I see this repeated, but I will operate as if it is.)

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39263 - Posted: 22 Dec 2014 | 19:26:01 UTC - in response to Message 39259.

Both your GPUs 0 & 1 are hitting 95-96C. I wouldn't take too long to solve the temp problem.

Profile caffeineyellow5
Avatar
Send message
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwat
Message 39265 - Posted: 22 Dec 2014 | 20:06:32 UTC - in response to Message 39263.

OK done, using Afterburner.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39267 - Posted: 22 Dec 2014 | 22:47:25 UTC - in response to Message 39265.
Last modified: 23 Dec 2014 | 12:46:59 UTC

Glad to hear you got things working and your priorities sorted out, hopefully.
The 780's are probably quite difficult to keep cool (and quiet), but it's well worth it.

- Make sure you set a fan profile for each/all of the cards.

PS. Boinc not seeing the GPU was probably down to the driver or card swapping:
In layman's terms it's not Boinc's fault as Boinc only reads what the drivers tell it. The ACEMD app doesn't ask Boinc, it looks & reports, but as GPUGrid uses Boinc it can still only use the resources Boinc says are available!

Profile caffeineyellow5
Avatar
Send message
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwat
Message 39398 - Posted: 6 Jan 2015 | 6:44:58 UTC

This seems like a new topic, but it also seems like something those already responding here can answer for me. Sorry if this is answered all over the forums and for my own lack of investigation. I could not find anything with a few searches of the terms I was looking for.

I don't quite understand how the GPU works compared to a CPU with shaders and all that goes with them. With that in mind, I found that my laptop has a mobile version of an NVidia card and was able to run GPUGrid on it. After it started running, I noticed that it is running at 78 degrees, which is fine, but it is running at 97% GPU load also. This struck me as odd since the GPU loads on the 780s in the big rig run at loads in the mid to high 70s. Since my last post, I found a new cooling solution for that rig (it being super cold out now) with a window open and a room fan aimed at it. And since that is keeping the temps in check without limiting with Afterburner and knowing overclocking a GPU will corrupt GPUGrid results, I want to know how to push the GPUs to use closer to 100% load. I do have it running cooler and can increase the cooling to make it run even cooler yet, if need be.

1) The same GPU load is reported by Afterburner, OpenHardwareMonitor, and NVidiaInspector, but does that reported load reflect the actual load, or is it just a semi-accurate approximation?
2) Is the reported load of 7x% actually close to 100% load but with things I don't understand (like shaders, VPU, MCU, Bus speed, Memory speed, "Boost", etc.), it is showing me a number that looks much lower?
3) Is the GPU load a task specific item? I only ask this because I do see some tasks in the 60s and some in the 80s.
4) If it really is running in the 70s and 80s, how can I get it to run the full GPU load?

If you would like to give simple answers, I can take that, or if you can give or point me to definitive detailed answers, I would appreciate those.

Thanks again,
Mike
____________
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!"
Ephesians 6:18-20, please ;-)
http://tbc-pa.org

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39399 - Posted: 6 Jan 2015 | 10:54:58 UTC - in response to Message 39398.
Last modified: 6 Jan 2015 | 12:14:30 UTC

Mike, there are four main things to consider here; the system architecture (especially GPU), the WU/app, exactly what is GPU load/utilization/usage and what else you are doing on the system.

Your systems are both of high spec, so there are no obvious bottlenecks or performance differences that would impact GPU usage.
The Quadro K2100M is a GK106 with a 576:48:16 core config (Unified Shaders : Texture mapping units : Render output units).
It's often the case that a smaller core results in greater GPU utilization (according to the apps/tools you mentioned).
Compared to your 780 (2304:192:48), your K2100M is basically 1/4 the size and has a slightly higher Render Output Unit ratio. It also has a 128-bit bus compared to the 780's 384-bit bus, which is relatively high (1/3rd), and I expect that results in a reduced memory controller load for the Quadro.
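The ratios quoted above can be checked with a few lines. This sketch only uses the core configs and bus widths given in this post; the variable names are my own.

```python
from fractions import Fraction

# Core config (shaders : TMUs : ROPs) and memory bus width from the post.
k2100m = {"shaders": 576, "tmus": 48, "rops": 16, "bus_bits": 128}
gtx780 = {"shaders": 2304, "tmus": 192, "rops": 48, "bus_bits": 384}

size_ratio = Fraction(k2100m["shaders"], gtx780["shaders"])
bus_ratio = Fraction(k2100m["bus_bits"], gtx780["bus_bits"])
rops_per_shader_quadro = Fraction(k2100m["rops"], k2100m["shaders"])
rops_per_shader_780 = Fraction(gtx780["rops"], gtx780["shaders"])

print("shader ratio:", size_ratio)        # Quadro is this fraction of the 780
print("bus width ratio:", bus_ratio)      # bus shrinks less than the core does
print("ROPs per shader:", rops_per_shader_quadro, "vs", rops_per_shader_780)
```

Since the bus is 1/3 as wide while the core is 1/4 the size, the Quadro has more bus width per shader, which fits the expectation of a lower memory controller load.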

GPUGrid factors that change GPU utilization are the Work Unit type; some result in a higher GPU utilization than others. Incidentally, the WU's that utilise the GPU less tend to use less power. It's likely that these WU's simply don't need to utilize the GPU as much. Different WU types use different amounts of GDDR, different amounts of the CPU and impact on the Memory Controller Load differently too. Only the researchers could tell you exactly how each WU utilizes your GPU (especially the shaders) differently and why some tasks use more CPU, GDDR or bandwidth, but it stems from the molecules being examined; the number of atoms.
Note that the relative performance of different cards varies with the WU type, making it more difficult to compare GPU's. Different apps can also impact performance, though in your case the apps are the same.

Exactly what is being measured to determine GPU utilization by the tools is a good question, but not one I can answer. I know that NVidia facilitates this, so perhaps they would be the ones to ask.

If you are using a system heavily it will reduce performance somewhat. If you are trying to run 10 CPU apps and 3 GPU apps, for example, the GPU performance will drop off a bit.

ecafkid
Send message
Joined: 31 Dec 10
Posts: 4
Credit: 1,359,947,817
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwat
Message 39400 - Posted: 6 Jan 2015 | 12:48:54 UTC - in response to Message 39398.

To get better GPU utilization, I added a file called app_config.xml in C:\ProgramData\BOINC\projects\www.gpugrid.net with the contents below. I have 3 GPU's in the machine. This lets you run two WU's per GPU with each getting half. My GPU utilization runs around 98% on each GPU.


<?xml version="1.0"?>
<app_config>
  <app>
    <name>acemdlong</name>
    <max_concurrent>6</max_concurrent>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
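To see how those settings add up, here is a back-of-the-envelope sketch. It is a simplified model with made-up function names, not the BOINC client's actual scheduling logic: it assumes each task claims `gpu_usage` of one GPU and that `max_concurrent` caps the total for the app.

```python
def tasks_per_gpu(gpu_usage: float) -> int:
    """How many tasks fit on one GPU if each claims `gpu_usage` of it."""
    return int(1.0 / gpu_usage + 1e-9)  # small epsilon guards float rounding


def concurrent_tasks(num_gpus: int, gpu_usage: float, max_concurrent: int) -> int:
    """Total tasks running at once, capped by max_concurrent (simplified)."""
    return min(num_gpus * tasks_per_gpu(gpu_usage), max_concurrent)


running = concurrent_tasks(num_gpus=3, gpu_usage=0.5, max_concurrent=6)
print(running)        # tasks running at once with the config above
print(running * 1.0)  # CPUs budgeted, at cpu_usage = 1 per task
```

With 3 GPUs this works out to two tasks per card, which is why max_concurrent is set to 6 in the file above.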

mikey
Send message
Joined: 2 Jan 09
Posts: 291
Credit: 2,044,691,115
RAC: 10,281,271
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39401 - Posted: 6 Jan 2015 | 12:58:19 UTC - in response to Message 39398.
Last modified: 6 Jan 2015 | 12:58:48 UTC

I want to know how to push the GPUs to use closer to 100% load. I do have it running cooler and can increase the cooling to make it run even cooler yet, if need be.

Thanks again,
Mike


A simple app_config.xml file will let you run multiple gpu units on one gpu at once, thereby utilizing your gpu to its max. The problem is that since you are then pushing your gpu to work harder, the units will EACH take longer, and you may not get any 'bonus' credits for finishing the units within the shorter times you are now. A unit is using about 70% of your gpu right now, which means there is not enough room to load a full 2nd unit, so it will have to share some of the gpu's resources to run, slowing down each one. At most projects that isn't a problem as there are no 'bonus' credits for finishing units faster, but here there are. I did not even address the heat issue of pushing your gpu harder!

All this comes down to you have a gpu that has more capability than what the programmers designed their software to run on, and it is just cruising thru the units, while the rest of us with our older gpu's are struggling. You are at the tippy top of the spear right now, in a few years, when the rest of us upgrade to something even better, we will pass you on by and you will be the one struggling, enjoy your time out front while you can, it will end! I for one am envious, but shopping!

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 39402 - Posted: 6 Jan 2015 | 19:42:23 UTC
Last modified: 6 Jan 2015 | 19:45:02 UTC

Is the GPU load a task specific item? I only ask this because I do see some tasks in the 60s and some in the 80s.

Yes. It depends on:

- The speed of your GPU. The faster it is, the lower its load (since there are always small pauses where CPU support is needed)

- The number of atoms being simulated, i.e. the complexity of the WU. This can be seen in the task output in your profile.

- The physical model chosen by the scientist. The more work the GPU has to do before it needs CPU support again, the fewer pauses occur per second.

Edit regarding running multiple concurrent WUs: if you're running at 85% GPU load or better, there's little benefit or even a performance loss from doing it. Below 80% load, throughput improves. Those numbers are not exact, but the turning point is somewhere between them.
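One way to check which side of that turning point you are on is to compare measured run times. The sketch below is only an illustration: the function name and the run times in the example are hypothetical, so plug in your own numbers from the task list.

```python
def throughput_ratio(t_single_s: float, t_each_concurrent_s: float,
                     n_concurrent: int = 2) -> float:
    """Throughput with n concurrent WUs relative to one WU at a time.

    A value above 1.0 means running concurrently finishes more work per day.
    """
    single_rate = 1.0 / t_single_s                       # WUs per second, one at a time
    concurrent_rate = n_concurrent / t_each_concurrent_s  # WUs per second, n at once
    return concurrent_rate / single_rate


# Hypothetical timings: one WU alone takes 20000 s.
print(throughput_ratio(20000, 34000))  # each of 2 takes 34000 s: a net gain
print(throughput_ratio(20000, 42000))  # each of 2 takes 42000 s: a net loss
```

Keep in mind this only measures raw throughput; it ignores the time-based bonus credit mentioned earlier in the thread, which can be lost when each WU takes longer.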

MrS
____________
Scanning for our furry friends since Jan 2002

Profile caffeineyellow5
Avatar
Send message
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwat
Message 39491 - Posted: 16 Jan 2015 | 1:36:20 UTC - in response to Message 39402.

I found a way to get over 90% of the GPUs working. Since Dnetc is not a BOINC project, I set the Dnetc GPU priority to 2 and it took up the slack from GPUGrid. Unfortunately there is no happy medium and it is slowing down the GPUGrid project: my BOINC stats went from 1,500,000 to roughly 850,000 a day, but my Dnetc stats went from 7,000 a day to over 50,000. I wish I could find some happy place where Dnetc only took the idle that GPUGrid doesn't touch, but any other priority either takes all the GPU or almost none of it. Cancers and other diseases are my moral priority, but getting the GPU as close to 100% as possible is why I spent the money on the cards, making it the financial priority for the moment. And the extra stats on the other side, where I've spent almost 10 years, don't hurt either. Maybe if I can think about it one day, I can figure out a way to make BOINC and Dnetc play nicer with each other.

Profile caffeineyellow5
Avatar
Send message
Joined: 30 Jul 14
Posts: 225
Credit: 2,658,976,345
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwat
Message 39535 - Posted: 20 Jan 2015 | 3:51:19 UTC - in response to Message 39491.

OK, so I see that the distributed.net GPU client can set its GPU runtime priority. I can't find any information out there on how to set the GPU priority in Windows like you can with the CPU. Does BOINC or GPUGrid have any settings that can change its GPU runtime priority? Since I find that I need to run Dnetc at level 2, running BOINC GPUGrid at level 3 would then allow BOINC to use as much of the GPU as it can, and Dnetc would take the remaining percentage. If BOINC/GPUGrid does not have any way of changing or forcing this, does anyone know how to change it per task at the Windows OS level?

When I want to change the CPU runtime, I use Process Explorer. It is like Task Manager on steroids. It allows manipulation at program level, OS level, and CPU level and gives usage graphs and statistics. Now this also comes with "System Information" that gives graphs of your usages. Drilling into them, you can view your GPU engines. But even this program that can change all things program and CPU/OS level, it seems to not be able to manipulate the GPU in the same way (yet maybe.) Anyone? Anything? Even speculation if there is no clear answer.
