Hello fellow crunchers!
I am pretty new to crunching, started 4 days ago with Rosetta@Home and GPUGRID, contributing to grcpool. I mainly left the PC alone and it crunched almost 24h every day and I didn't really pay attention to BOINC.
Recently I realized that a GPUGRID task put out a computation error at 100% and I looked further into it. I looked through the log and to my surprise, literally every WU was being cancelled and spit out a computation error.
The logs look like this:
26.03.2020 13:09:09 | GPUGRID | Computation for task 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0 finished
26.03.2020 13:09:09 | GPUGRID | Output file 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0_0 for task 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0 absent
26.03.2020 13:09:09 | GPUGRID | Output file 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0_9 for task 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0 absent
26.03.2020 13:10:40 | GPUGRID | Sending scheduler request: To report completed tasks.
26.03.2020 13:10:40 | GPUGRID | Reporting 1 completed tasks
26.03.2020 13:10:40 | GPUGRID | Requesting new tasks for NVIDIA GPU
Another example:
26.03.2020 14:28:29 | GPUGRID | Computation for task 3vu1B03_348_1-TONI_MDADpr4sv-2-10-RND7340_0 finished
26.03.2020 14:28:29 | GPUGRID | Output file 3vu1B03_348_1-TONI_MDADpr4sv-2-10-RND7340_0_0 for task 3vu1B03_348_1-TONI_MDADpr4sv-2-10-RND7340_0 absent
26.03.2020 14:28:29 | GPUGRID | Output file 3vu1B03_348_1-TONI_MDADpr4sv-2-10-RND7340_0_9 for task 3vu1B03_348_1-TONI_MDADpr4sv-2-10-RND7340_0 absent
26.03.2020 14:28:29 | GPUGRID | Starting task 3ncvA01_379_0-TONI_MDADpr4sn-2-10-RND1169_0
This occurs with almost every WU; finding one in the log that (seemingly) finished correctly was harder than finding one that failed. For a task that finished correctly, the log goes from task started, to computation for task finished, to the upload.
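The pattern in the log excerpts above can be tallied with a short script instead of by eye. This is only a minimal sketch, assuming the event log has been copied into plain text with the same `date | project | message` layout shown above. The function name `summarize_log` and the "clean"/"suspect" labels are made up for illustration and are not part of BOINC. A missing output file is treated as a hint that the task failed, not as proof.

```python
import re
from collections import Counter

# Message patterns taken from the log lines quoted in the post.
FINISHED = re.compile(r"Computation for task (\S+) finished")
ABSENT = re.compile(r"Output file \S+ for task (\S+) absent")


def summarize_log(lines, project="GPUGRID"):
    """Split finished tasks into (clean, suspect) lists for one project.

    A task is 'suspect' if at least one of its output files was reported absent.
    """
    finished = set()
    missing_outputs = Counter()
    for line in lines:
        # Expected layout: "date time | project | message"
        fields = [field.strip() for field in line.split("|", 2)]
        if len(fields) != 3 or fields[1] != project:
            continue
        message = fields[2]
        match = FINISHED.search(message)
        if match:
            finished.add(match.group(1))
            continue
        match = ABSENT.search(message)
        if match:
            missing_outputs[match.group(1)] += 1
    clean = sorted(task for task in finished if missing_outputs[task] == 0)
    suspect = sorted(task for task in finished if missing_outputs[task] > 0)
    return clean, suspect


SAMPLE_LOG = [
    "26.03.2020 13:09:09 | GPUGRID | Computation for task 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0 finished",
    "26.03.2020 13:09:09 | GPUGRID | Output file 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0_0 for task 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0 absent",
    "26.03.2020 13:09:09 | GPUGRID | Output file 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0_9 for task 3e3xA02_320_1-TONI_MDADpr4se-2-10-RND2806_0 absent",
    "26.03.2020 13:10:40 | GPUGRID | Reporting 1 completed tasks",
]

if __name__ == "__main__":
    clean, suspect = summarize_log(SAMPLE_LOG)
    print("clean tasks:  ", len(clean))
    print("suspect tasks:", len(suspect))
    for task in suspect:
        print("  ", task)
```

Run over a whole saved log, the two counts give a quick idea of how many tasks are affected.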
Maybe it is helpful to mention that every GPUGRID WU takes about 1 to 1.5 hours to complete, while Rosetta's WUs take from 5 to 10 hours.
This PC has an AMD Ryzen 5 3600 and an NVIDIA GeForce RTX 2070, so pretty recent and decent hardware. I have 16 GB RAM, almost 1 TB of SSD space free and am running Windows 10 Education 64 bit.
At grcpool, I assigned this host to Rosetta@Home and GPUGRID; both are at 100% resource share. In the hosts section, I have a link at Rosetta@Home which redirects me to "task details"; this link is not present at GPUGRID.
At the hosts overview, on the left of the PC's name there is a dropdown-menu with a "1", clicking it displays my contribution for Rosetta@Home; no GPUGRID there.
I have another host, my notebook with an i7 4710HQ and a GTX 850M, here the "task details"-link is present at both projects and both projects are shown at the overview.
I don't want my GPU to constantly compute at 100% power while just producing errors. Am I doing something wrong? Is my hardware not supported? Did I just configure something wrong?
|
|
|
|
I think I may have fixed it. I noticed a message that the GPUGRID host already exists in the pool. I figured it had to do with the name of my computer. I removed my host in grcpool, cleaned my PC of everything BOINC-related, renamed my PC and restarted. I set up BOINC again, connected my PC with its new name to grcpool, attached the projects, and this time didn't get the message that the host already exists in the pool. In grcpool, I now also have the link to the task details at GPUGRID, so I assume it is attached correctly now.
I will now leave it working overnight; tomorrow will show whether everything is set up right.
|
|
Aurum
Joined: 12 Jul 17 Posts: 401 Credit: 16,755,010,632 RAC: 180,567
|
Good work Kaddaman. My 1080s & 2080s don't seem to be having any problems. I assume you have the latest Nvidia graphics driver circa 444.75.
|
|
Keith Myers
Joined: 13 Dec 17 Posts: 1341 Credit: 7,678,783,251 RAC: 13,432,718
|
Some of your errors are caused by BOINC's "finish file present" error.
This is because of your old client. To eliminate those errors, you would need a newer client in which the bug has been fixed.
The error occurs when your computer is too busy for BOINC to clean up its files when it stops itself or when you shut the computer down.
You could try the latest BOINC client.
https://boinc.berkeley.edu/dl/boinc_7.16.5_windows_x86_64.exe
|
|
Toni (Volunteer moderator, Project administrator, Project developer, Project tester, Project scientist)
Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0
|
Remarkable detective-ing, people
|
|
|
Even after setting everything up, I got some computing errors. I am thinking the WUs were somehow bad, since I absolutely can't imagine what could have been wrong with my PC.
Anyway, I gathered some GRC and started solo crunching. Everything works perfectly now; I haven't seen a computing error since I started.
Keith Myers wrote:
Some of your errors are caused by BOINC's "finish file present" error.
This is because of your old client. You would need to get a newer client to eliminate those errors where the bug has been fixed.
The error is caused by your computer being too busy so BOINC can't clean up its files when stopping itself or when you shut the computer down.
You could try the latest BOINC client.
https://boinc.berkeley.edu/dl/boinc_7.16.5_windows_x86_64.exe
I don't know... This version is a development version and, according to the BOINC website, may be unstable and should only be used for testing. I've read some posts that recommend installing the version the BOINC website itself recommends (7.14.2).
Since I started crunching by myself, I haven't seen any errors, so it seems to be working now. If something hits the fan again, I will try the development version.
|
|
Keith Myers
|
That is just the usual disclaimer to cover their butt. It means there may be as yet undiscovered bugs in the "test" version. The "use at your discretion" disclaimer applies.
If it has made it to the download page, there are no "showstopper" bugs or they would have pulled it immediately.
They did in fact pull the 7.16.4 version immediately because of a showstopper bug. The 7.16.5 is the bugfixed version.
The "test" versions are normally perfectly fine and usable. Desirable in fact because they have the most current fixes in place.
Like the fix for the "finish file present" bug, which is still present in 7.14.2.
I run the latest 7.17.0 code branch with no issues. I have always run the latest code branch with no issues. I like testing for the developers to find bugs; I always seem to hit the rare corner cases that are worth raising a bug issue for.
If you want to see the current state of development, read the issues tab at:
https://github.com/BOINC/boinc/issues
|
|
Keith Myers
|
"I am thinking the WUs were somehow bad"
Yes, there have been a few badly formatted tasks lately.
If in doubt over whether your hardware is acting up, just look at the WU on the tasks page and see how many times the task has been resent because of errors on other hosts. If the task has been resent 7 times and errored out, it will be pulled from distribution because it is a bad task.
https://www.gpugrid.net/workunit.php?wuid=18610278
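The rule of thumb above can be written down as a small decision helper. This is a hedged sketch only: the function `triage_workunit`, the outcome strings "error" and "success", and the returned labels are all hypothetical, and the outcomes have to be read off the workunit page by hand. Only the limit of 7 errors comes from the post.

```python
def triage_workunit(other_host_outcomes, error_limit=7):
    """Guess whether a failed task points at the task or at your own host.

    other_host_outcomes: outcomes of the same workunit on OTHER hosts,
    given as the strings "error" or "success" (hypothetical labels).
    error_limit: number of errors after which the task is withdrawn,
    taken from the figure of 7 mentioned in the post.
    """
    errors = other_host_outcomes.count("error")
    successes = other_host_outcomes.count("success")
    if successes > 0:
        # Someone else completed it, so the task itself is probably fine.
        return "check your host"
    if errors >= error_limit:
        return "bad task, should be withdrawn"
    if errors > 0:
        return "probably a bad task"
    return "not enough data"


if __name__ == "__main__":
    print(triage_workunit(["error", "error", "error"]))
    print(triage_workunit(["error", "success"]))
```

If other hosts also fail on the same workunit and nobody succeeds, the hardware is unlikely to be the culprit.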
|
|