Message boards : Graphics cards (GPUs) : Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

Author Message
Dayle Diamond
Joined: 5 Dec 12
Posts: 83
Credit: 1,564,689,193
RAC: 185
Message 57382 - Posted: 26 Sep 2021 | 15:09:52 UTC

3090 FE.
Driver Date Aug. 27th, 2021

----------------------------------
Name e1s247_I282-ADRIA_AdB_KIXCMYB_HIP-1-2-RND4280_1
Workunit 27079548
Created 26 Sep 2021 | 7:33:50 UTC
Sent 26 Sep 2021 | 7:33:56 UTC
Received 26 Sep 2021 | 7:36:06 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 195 (0xc3) EXIT_CHILD_FAILED
Computer ID 140554
Report deadline 1 Oct 2021 | 7:33:56 UTC
Run time 10.12
CPU time 0.00
Validate state Invalid
Credit 0.00
Application version New version of ACEMD v2.18 (cuda101)
Stderr output
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
00:34:25 (21456): wrapper (7.9.26016): starting
00:34:25 (21456): wrapper: running bin/acemd3.exe (--boinc --device 0)
ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

00:34:28 (21456): bin/acemd3.exe exited; CPU time 0.000000
00:34:28 (21456): app exit status: 0x1
00:34:28 (21456): called boinc_finish(195)
0 bytes in 0 Free Blocks.
268 bytes in 4 Normal Blocks.
1144 bytes in 1 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 0 bytes.
Total allocations: 190200 bytes.
Dumping objects ->
{323252} normal block at 0x0000018D079E9B30, 126 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
..\api\boinc_api.cpp(309) : {323249} normal block at 0x0000018D079A6B10, 8 bytes long.
Data: < &#149; > 00 00 95 07 8D 01 00 00
{322607} normal block at 0x0000018D079E9A70, 126 bytes long.
Data: <<project_prefere> 3C 70 72 6F 6A 65 63 74 5F 70 72 65 66 65 72 65
{321996} normal block at 0x0000018D079A6E80, 8 bytes long.
Data: <&#192;&#196;&#158; > C0 C4 9E 07 8D 01 00 00
..\zip\boinc_zip.cpp(122) : {147} normal block at 0x0000018D079ADD40, 260 bytes long.
Data: < > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{134} normal block at 0x0000018D079A7290, 16 bytes long.
Data: <p&#171;&#154; > 70 AB 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{133} normal block at 0x0000018D079AAB70, 40 bytes long.
Data: < r&#154; conda-pa> 90 72 9A 07 8D 01 00 00 63 6F 6E 64 61 2D 70 61
{126} normal block at 0x0000018D079AA9B0, 48 bytes long.
Data: <--boinc --device> 2D 2D 62 6F 69 6E 63 20 2D 2D 64 65 76 69 63 65
{125} normal block at 0x0000018D079A6930, 16 bytes long.
Data: <8&#236;&#154; > 38 EC 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{124} normal block at 0x0000018D079A7330, 16 bytes long.
Data: < &#236;&#154; > 10 EC 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{123} normal block at 0x0000018D079A76A0, 16 bytes long.
Data: <&#232;&#235;&#154; > E8 EB 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{122} normal block at 0x0000018D079A68E0, 16 bytes long.
Data: <&#192;&#235;&#154; > C0 EB 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{121} normal block at 0x0000018D079A6F20, 16 bytes long.
Data: < &#235;&#154; > 98 EB 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{120} normal block at 0x0000018D079A7600, 16 bytes long.
Data: <p&#235;&#154; > 70 EB 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{119} normal block at 0x0000018D079A6F70, 16 bytes long.
Data: <P&#235;&#154; > 50 EB 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{118} normal block at 0x0000018D079A6FC0, 16 bytes long.
Data: <(&#235;&#154; > 28 EB 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{117} normal block at 0x0000018D079A6840, 16 bytes long.
Data: < &#235;&#154; > 00 EB 9A 07 8D 01 00 00 00 00 00 00 00 00 00 00
{116} normal block at 0x0000018D079AEB00, 496 bytes long.
Data: <@h&#154; bin/acem> 40 68 9A 07 8D 01 00 00 62 69 6E 2F 61 63 65 6D
{66} normal block at 0x0000018D079A6890, 16 bytes long.
Data: < &#234;f&#232;&#247; > 80 EA 66 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{65} normal block at 0x0000018D079A75B0, 16 bytes long.
Data: <@&#233;f&#232;&#247; > 40 E9 66 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{64} normal block at 0x0000018D079A6D40, 16 bytes long.
Data: <&#248;Wc&#232;&#247; > F8 57 63 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{63} normal block at 0x0000018D079A7560, 16 bytes long.
Data: <&#216;Wc&#232;&#247; > D8 57 63 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{62} normal block at 0x0000018D079A6B60, 16 bytes long.
Data: <P c&#232;&#247; > 50 04 63 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{61} normal block at 0x0000018D079A6A20, 16 bytes long.
Data: <0 c&#232;&#247; > 30 04 63 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{60} normal block at 0x0000018D079A71F0, 16 bytes long.
Data: <&#224; c&#232;&#247; > E0 02 63 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{59} normal block at 0x0000018D079A7740, 16 bytes long.
Data: < c&#232;&#247; > 10 04 63 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{58} normal block at 0x0000018D079A6AC0, 16 bytes long.
Data: <p c&#232;&#247; > 70 04 63 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
{57} normal block at 0x0000018D079A6C00, 16 bytes long.
Data: < &#192;a&#232;&#247; > 18 C0 61 E8 F7 7F 00 00 00 00 00 00 00 00 00 00
Object dump complete.

</stderr_txt>
]]>

Keith Myers
Joined: 13 Dec 17
Posts: 808
Credit: 1,077,149,831
RAC: 985,453
Message 57384 - Posted: 26 Sep 2021 | 16:53:51 UTC - in response to Message 57382.

Known issue: the CUDA101 app will fail on Ampere cards.
See this thread: https://www.gpugrid.net/forum_thread.php?id=5246

Dayle Diamond
Joined: 5 Dec 12
Posts: 83
Credit: 1,564,689,193
RAC: 185
Message 57386 - Posted: 26 Sep 2021 | 18:43:25 UTC

Maybe I'm missing some context, but the link shows that the issue had been fixed, and it does not mention my error code specifically.

This host has run several successful tasks since then, but perhaps they were another GPUGrid application.

I'm surprised there are still known issues with Ampere cards.
While getting mine was a struggle, the architecture has been out for over a year at this point.

Keith Myers
Joined: 13 Dec 17
Posts: 808
Credit: 1,077,149,831
RAC: 985,453
Message 57388 - Posted: 26 Sep 2021 | 18:57:27 UTC - in response to Message 57386.
Last modified: 26 Sep 2021 | 18:58:28 UTC

The latest posts in that thread do in fact mention the exact error message in this thread's title.
https://www.gpugrid.net/forum_thread.php?id=5246&nowrap=true#57363

ACEMD failed:
Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)


The CUDA1121 application runs fine on Ampere cards. Tasks fail only when the scheduler assigns them the CUDA101 application.

The issue is that the driver level does not match the CUDA101 application.

The simplest solution is to remove the CUDA101 app from the scheduler and force all hosts to use the CUDA1121 application, which requires drivers at CUDA 11.2 level or later.
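
For context, the error text comes from NVRTC, the runtime compiler that ships with the CUDA toolkit an application is built against. Below is a rough sketch of why a CUDA 10.1 build would reject an Ampere card. It assumes the app passes the card's own compute capability as the -arch value; the thread does not show the actual command line, so that part is a guess. The table and function names are made up for illustration and are not ACEMD's code. The capability limits are the ones listed in NVIDIA's toolkit release notes.

```python
# Illustrative only: highest compute capability that each toolkit's NVRTC
# accepts for --gpu-architecture. Names here are hypothetical, not ACEMD's.
MAX_ARCH_BY_TOOLKIT = {
    (10, 1): (7, 5),  # CUDA 10.1 tops out at Turing
    (11, 0): (8, 0),  # CUDA 11.0 adds Ampere GA100
    (11, 1): (8, 6),  # CUDA 11.1 adds Ampere GA10x (RTX 30 series)
    (11, 2): (8, 6),
}


def nvrtc_arch_flag(compute_capability):
    """Build the -arch option for a given (major, minor) capability."""
    major, minor = compute_capability
    return f"--gpu-architecture=compute_{major}{minor}"


def can_compile(toolkit, compute_capability):
    """True if this toolkit's NVRTC knows the requested architecture."""
    return compute_capability <= MAX_ARCH_BY_TOOLKIT[toolkit]


RTX_3090 = (8, 6)   # Ampere
GTX_1660 = (7, 5)   # Turing

for toolkit in [(10, 1), (11, 2)]:
    for name, cc in [("RTX 3090", RTX_3090), ("GTX 1660", GTX_1660)]:
        verdict = "ok" if can_compile(toolkit, cc) else "invalid value for -arch"
        print(f"CUDA {toolkit[0]}.{toolkit[1]} {name} {nvrtc_arch_flag(cc)}: {verdict}")
```

Under these assumptions only the CUDA 10.1 plus RTX 3090 combination is rejected, which matches what is being reported: Turing and older cards are fine on either app, Ampere cards only on CUDA1121.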

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1942
Credit: 629,356
RAC: 0
Message 57412 - Posted: 1 Oct 2021 | 10:11:17 UTC - in response to Message 57388.

We have now changed the scheduler; let's see if it's better now.

gdf

PDW
Joined: 7 Mar 14
Posts: 12
Credit: 909,754,286
RAC: 1,672,438
Message 57418 - Posted: 1 Oct 2021 | 13:50:46 UTC - in response to Message 57412.

GDF wrote:
"We have now changed the scheduler; let's see if it's better now.

gdf"

Is this a result of the scheduler changes or something else?

The result http://gpugrid.net/result.php?resultid=32646962 failed to launch CUDA (see below), which isn't surprising, as the host doesn't show a GPU. Host: http://gpugrid.net/show_host_detail.php?hostid=514156


New version of ACEMD v2.18 (cuda1121)
Stderr output

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
process got signal 67</message>
<stderr_txt>
14:40:06 (57305): wrapper (7.7.26016): starting
14:40:06 (57305): wrapper (7.7.26016): starting
14:40:06 (57305): wrapper: running /bin/tar (xf conda-pack.tar.bz2)
14:42:47 (57305): /bin/tar exited; CPU time 127.344146
14:42:47 (57305): wrapper: running bin/acemd3 (--boinc --device 0)
ACEMD failed:
Error invoking kernel: CUDA_ERROR_LAUNCH_FAILED (719)
19:16:23 (57305): bin/acemd3 exited; CPU time 6047.267986
19:16:23 (57305): app exit status: 0x1
19:16:23 (57305): called boinc_finish(195)

</stderr_txt>
]]>

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57424 - Posted: 2 Oct 2021 | 8:49:10 UTC

I'm seeing my Linux machines receive the cuda1121 plan class more consistently, but my Windows machines receive cuda101 - I don't think I've ever seen cuda1121 under Windows.

Cards are from the same range (GTX 1660), and drivers are up-to-date - Linux 470.63, Windows 472.12
