
Message boards : Number crunching : All WUs terminating with error

SJC_Steve
Joined: 31 Oct 12
Posts: 19
Credit: 184,741,704
RAC: 0
Message 38929 - Posted: 16 Nov 2014 | 17:36:39 UTC

My cruncher has produced ~70 terminated WUs in quick succession, each with the following STDERR output:

Stderr output
<core_client_version>7.4.23</core_client_version>
<![CDATA[
<message>
process exited with code 212 (0xd4, -44)
</message>
<stderr_txt>

</stderr_txt>
]]>
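The three figures in that line appear to be one exit status written three ways: decimal 212, hexadecimal 0xd4, and the same byte read as a signed 8-bit integer (212 - 256 = -44). Below is a minimal Python sketch of that conversion, assuming this is how the client formats the value; it does not say what the application itself meant by 212.

```python
# Decode the status BOINC printed: "process exited with code 212 (0xd4, -44)".
# Assumption: the third number is the low byte reinterpreted as a signed 8-bit value.

def decode_exit_code(code: int) -> tuple[str, int]:
    """Return the hex form and the signed 8-bit reading of an exit code."""
    low_byte = code & 0xFF
    # Values 128..255 wrap around to -128..-1 when read as a signed byte.
    signed = low_byte - 256 if low_byte >= 128 else low_byte
    return hex(code), signed

if __name__ == "__main__":
    print(decode_exit_code(212))  # ('0xd4', -44)
```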

Here are the BOINC start-up messages showing its starting state:

16-Nov-2014 09:58:24 (low) [] Starting BOINC client version 7.4.23 for x86_64-pc-linux-gnu
16-Nov-2014 09:58:24 (low) [] log flags: file_xfer, sched_ops, task
16-Nov-2014 09:58:24 (low) [] Libraries: libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
16-Nov-2014 09:58:24 (low) [] Data directory: /var/lib/boinc-client
16-Nov-2014 09:58:24 (low) [] CUDA: NVIDIA GPU 0: GeForce GTX 650 Ti BOOST (driver version 331.38, CUDA version 6.0, compute capability 3.0, 1023MB, 996MB available, 1746 GFLOPS peak)
16-Nov-2014 09:58:24 (low) [] Host name: stippy
16-Nov-2014 09:58:24 (low) [] Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz [Family 6 Model 15 Stepping 11]
16-Nov-2014 09:58:24 (low) [] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm lahf_lm dtherm tpr_shadow vnmi flexpriority
16-Nov-2014 09:58:24 (low) [] OS: Linux: 3.13.0-39-generic

Any ideas on how to resolve this, and what does error 212 mean?

Thanks,
Steve

SJC_Steve
Joined: 31 Oct 12
Posts: 19
Credit: 184,741,704
RAC: 0
Message 38930 - Posted: 16 Nov 2014 | 19:22:31 UTC - in response to Message 38929.
Last modified: 16 Nov 2014 | 19:23:18 UTC

Interesting update on my failed GPUGRID WUs. I stopped WUs for GPUGRID and loaded Einstein@Home. So far, the machine is crunching E@H GPU WUs with no errors while the GPUGRID WUs were failing immediately. Anyone else having issues?
Steve

sis651
Joined: 25 Nov 13
Posts: 66
Credit: 193,925,538
RAC: 0
Message 38931 - Posted: 16 Nov 2014 | 19:56:11 UTC - in response to Message 38930.

Same problem here. It is probably related to the NVIDIA 331.38 driver; the project was going to drop support for it. We should update to the newest drivers, but the drivers in the repos have problems with CUDA and BOINC.

CPU tasks are crunching, but when I restart BOINC all the progress is lost...

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,249,865,968
RAC: 4,089,892
Message 38932 - Posted: 16 Nov 2014 | 22:48:00 UTC

I'm not experienced with Linux, but you should check the "Important news for Linux crunchers" thread.
You are right that your drivers need to be updated.
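Whether a host's driver is "new enough" can be read straight off the BOINC start-up log, which prints `driver version NNN.NN` on the GPU detection line. Below is a minimal Python sketch of that check. The threshold `MIN_DRIVER` is a hypothetical placeholder set to 343.22 only because that driver is reported working later in this thread; it is not an official project requirement, and the helper names are illustrative.

```python
import re
from typing import Optional

# Hypothetical threshold: 343.22 is simply the driver reported working in this
# thread, not a documented GPUGRID minimum.
MIN_DRIVER = (343, 22)

DRIVER_RE = re.compile(r"driver version (\d+)\.(\d+)")

def driver_version(log_line: str) -> Optional[tuple[int, int]]:
    """Extract (major, minor) from a BOINC GPU detection line, or None."""
    m = DRIVER_RE.search(log_line)
    if m is None:
        return None  # e.g. "driver version unknown"
    return int(m.group(1)), int(m.group(2))

def driver_is_new_enough(log_line: str, minimum=MIN_DRIVER) -> Optional[bool]:
    """True/False if a version was found, None if the line has no version."""
    version = driver_version(log_line)
    if version is None:
        return None
    return version >= minimum  # tuple comparison: major first, then minor

if __name__ == "__main__":
    line = ("CUDA: NVIDIA GPU 0: GeForce GTX 650 Ti BOOST (driver version 331.38, "
            "CUDA version 6.0, compute capability 3.0, 1023MB)")
    print(driver_version(line), driver_is_new_enough(line))  # (331, 38) False
```

Returning `None` rather than `False` for an unparseable line keeps "unknown driver" distinct from "old driver".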

sis651
Joined: 25 Nov 13
Posts: 66
Credit: 193,925,538
RAC: 0
Message 38933 - Posted: 17 Nov 2014 | 0:07:50 UTC - in response to Message 38932.

I had read it and stopped crunching for GPUGRID. But after some days I checked again, and it had received CUDA42 jobs and crunched them successfully. Now the tasks are labelled cuda42 but end with an error; probably they are cuda60 internally and need the driver update.
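The guess above rests on the usual CUDA compatibility rule: a driver can run applications built for its own reported CUDA version or an older one, but not a newer one. Below is a minimal Python sketch of that check. It assumes plan-class names follow the `cudaNN` pattern seen in this thread (one-digit major and minor, so `cuda42` means 4.2), and the helper names are hypothetical.

```python
import re

def plan_class_cuda(plan_class: str) -> tuple[int, int]:
    """Map a plan-class name such as 'cuda42' to (4, 2).

    Assumption: one-digit major and one-digit minor version.
    """
    m = re.fullmatch(r"cuda(\d)(\d)", plan_class)
    if m is None:
        raise ValueError(f"unrecognised plan class: {plan_class!r}")
    return int(m.group(1)), int(m.group(2))

def can_run(plan_class: str, driver_cuda: tuple[int, int]) -> bool:
    """True if the driver's reported CUDA version is at least the app's version."""
    return driver_cuda >= plan_class_cuda(plan_class)

if __name__ == "__main__":
    driver = (6, 0)  # "CUDA version 6.0" from the first post's start-up log
    for pc in ("cuda42", "cuda60", "cuda65"):
        print(pc, can_run(pc, driver))
```

With the 6.0 reported in the first post's log, cuda42 and cuda60 tasks pass this check, while a task needing a CUDA 6.5 runtime would not.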

SJC_Steve
Joined: 31 Oct 12
Posts: 19
Credit: 184,741,704
RAC: 0
Message 38938 - Posted: 17 Nov 2014 | 4:41:59 UTC - in response to Message 38933.

I thought the new drivers weren't required until 2015? My cruncher had been completing CUDA 42 WUs successfully until today. I've spent several hours trying to upgrade to a driver newer than 331, with no luck, and was hoping to keep running CUDA 42 WUs through the end of 2014. For the time being I've stopped accepting work for this project until someone finds a solution to the CUDA 42 WUs terminating with errors. I'm fairly certain it's a project issue, since the cruncher is completing E@H GPU WUs successfully.
Thanks,
Steve

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 38946 - Posted: 17 Nov 2014 | 21:44:04 UTC - in response to Message 38938.

SJC_Steve wrote: "I thought the new drivers weren't required until 2015? My cruncher has been completing CUDA 42 WUs successfully until today."

That's fairly strange, since your BOINC still reports that it was using the same 8.03 CUDA 4.2 app as before. "Removing support" for this would mean disabling this app completely.

It could be that something in the new WUs is not compatible with the old app, in which case it is probably a mistake.

MrS
____________
Scanning for our furry friends since Jan 2002

fractal
Joined: 16 Aug 08
Posts: 87
Credit: 1,248,879,715
RAC: 0
Message 38948 - Posted: 18 Nov 2014 | 3:32:45 UTC

I don't check in often, so I upgraded my box earlier this month.


01-Nov-2014 16:02:50 [---] Starting BOINC client version 7.0.28 for x86_64-pc-linux-gnu
01-Nov-2014 16:02:50 [---] log flags: file_xfer, sched_ops, task
01-Nov-2014 16:02:50 [---] Libraries: libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
01-Nov-2014 16:02:50 [---] Running as a daemon
01-Nov-2014 16:02:50 [---] Data directory: /home/boinc/BOINC
01-Nov-2014 16:02:50 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz [Family 6 Model 42 Stepping 7]
01-Nov-2014 16:02:50 [---] OS: Linux: 3.2.0-29-generic
01-Nov-2014 16:02:50 [---] Memory: 15.57 GB physical, 15.89 GB virtual
01-Nov-2014 16:02:50 [---] Disk: 43.64 GB total, 37.49 GB free
01-Nov-2014 16:02:50 [---] Local time is UTC -7 hours
01-Nov-2014 16:02:50 [---] NVIDIA GPU 0: GeForce GTX 650 Ti (driver version unknown, CUDA version 6.50, compute capability 3.0, 134214656MB, 134214626MB available, 1646 GFLOPS peak)
01-Nov-2014 16:02:50 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 650 Ti (driver version 343.22, device version OpenCL 1.1 CUDA, 1024MB, 134214626MB available)

It completes the jobs in about the same 18 hrs as before, but now it uses only 3% of a core instead of 98%.

Consider giving the 343.22 driver a try. I had to search the NVIDIA site for a while, but I eventually found a 343 driver that was not listed as beta. The machine is running Ubuntu 12.04.01 LTS Server, if that makes any difference. I like running the server without X Windows to save a few cycles.
