
Message boards : Graphics cards (GPUs) : New version tasks failing on Windows hosts

jjch
Joined: 10 Nov 13
Posts: 59
Credit: 14,591,077,215
RAC: 1,944,796
Message 57318 - Posted: 18 Sep 2021 | 3:13:03 UTC
Last modified: 18 Sep 2021 | 3:15:51 UTC

I'm getting a bunch of new work for my Windows hosts but they are all failing with Error while computing. Here is a sample stderr log.

https://www.gpugrid.net/result.php?resultid=32640734

It's happening on both the cuda101 and 1121 types.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57319 - Posted: 18 Sep 2021 | 8:39:13 UTC - in response to Message 57318.

You're getting error code 0xc0000135, which officially means STATUS_DLL_NOT_FOUND.

So, you're missing an essential system component. The internet will tell you that you need to re-install the .NET libraries, but that's wrong in this case: the GPUGrid application ('acemd3') doesn't - or at least shouldn't - use .NET.

We really need to find the name of the DLL that's missing, and find out whether it's just missing from your system, or a wider problem. It might be a driver problem, or it might be something needed by acemd3 itself.

If and when one of my Windows machines manages to catch one of these new applications, I can investigate further. There's a tool called Dependency Walker which can show you what's needed but missing, but it's a bit techie and takes some experience to use.

If anyone else has this problem, please chip in with whatever information you can find - especially the name of the missing DLL.
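For anyone comparing logs: the same exit code can show up either as hex (0xc0000135) or as a negative decimal number, depending on where you read it. Here is a minimal Python sketch that normalises both forms and names the code. The lookup table is a small illustrative subset chosen for this example, not a complete NTSTATUS list, and the function name is made up for the sketch.

```python
# Small illustrative subset of Windows NTSTATUS codes (not a complete table).
NTSTATUS_NAMES = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code (signed or unsigned 32-bit) as hex plus a name."""
    unsigned = code & 0xFFFFFFFF  # fold a negative decimal into unsigned form
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


if __name__ == "__main__":
    # The negative decimal and the hex value describe the same code.
    print(describe_exit_code(-1073741515))
    print(describe_exit_code(0xC0000135))
```

Note that the code only says that some DLL could not be found. It does not say which one, so the name still has to come from the crash report or a dependency tool.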

Bedrich Hajek
Joined: 28 Mar 09
Posts: 404
Credit: 5,815,469,764
RAC: 1,007,028
Message 57320 - Posted: 18 Sep 2021 | 9:30:14 UTC - in response to Message 57318.

My first successful WU from this particular batch:

https://www.gpugrid.net/result.php?resultid=32640974

Output file 09 is very large, over 472 megs.


ServicEnginIC
Joined: 24 Sep 10
Posts: 459
Credit: 2,122,879,742
RAC: 886,440
Message 57321 - Posted: 18 Sep 2021 | 10:08:28 UTC - in response to Message 57320.

Bedrich Hajek wrote: "My first successful WU from this particular batch:"

Looking at the Hosts Ranking page, many other Windows hosts are starting to complete their tasks successfully and are quickly climbing in rank.
Nice to see that Windows hosts are back contributing to the project alongside the Linux ones!

Bedrich Hajek wrote: "Output file 09 is very large, over 472 megs."

Just a reminder that if a file exceeds 512 MB, its upload to the server will stall.
Ian&Steve C. mentioned this, and it was discussed in the "can't upload results. file size too big being blocked?" thread.
I experienced the same problem on one of my hosts; see Message #57194.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57322 - Posted: 18 Sep 2021 | 10:15:27 UTC - in response to Message 57321.

The file size limit is set on the server, but implemented by the client on your own machine. It's possible to inspect the current value in client_state.xml, and even change it if you catch it soon enough.

But the ultimate fix would have to be made by the project, so it's worth reporting if you see this error happening again.
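As a rough illustration of the inspection step, here is a Python sketch that lists files whose size is close to their upload limit. It assumes that client_state.xml parses as ordinary XML and that each file entry carries `name`, `nbytes` and `max_nbytes` children; those element names and the sample data are assumptions made for this example, so check them against your own file first. The sketch only reads values and changes nothing.

```python
import xml.etree.ElementTree as ET


def files_near_limit(xml_text: str, fraction: float = 0.9):
    """Return (name, nbytes, max_nbytes) for entries at or above
    `fraction` of their limit. Element names are assumed, see above."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        limit = elem.findtext("max_nbytes")
        if limit is None:
            continue  # this element is not a file entry with a limit
        max_nbytes = float(limit)
        nbytes = float(elem.findtext("nbytes") or 0)
        if max_nbytes > 0 and nbytes >= fraction * max_nbytes:
            hits.append((elem.findtext("name"), nbytes, max_nbytes))
    return hits


# Made-up sample data: one file close to a 512 MB limit, one tiny file.
SAMPLE = """<client_state>
  <file><name>example_task_9</name><nbytes>495000000.000000</nbytes>
        <max_nbytes>536870912.000000</max_nbytes></file>
  <file><name>example_task_0</name><nbytes>1024.000000</nbytes>
        <max_nbytes>536870912.000000</max_nbytes></file>
</client_state>"""

if __name__ == "__main__":
    for name, nbytes, max_nbytes in files_near_limit(SAMPLE):
        print(f"{name}: {nbytes:.0f} of {max_nbytes:.0f} bytes")
```

To try it on a real host, read your own client_state.xml into a string and pass that in place of `SAMPLE`.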

ServicEnginIC
Joined: 24 Sep 10
Posts: 459
Credit: 2,122,879,742
RAC: 886,440
Message 57323 - Posted: 18 Sep 2021 | 10:17:17 UTC - in response to Message 57318.

jjch wrote: "I'm getting a bunch of new work for my Windows hosts but they are all failing with Error while computing. Here is a sample stderr log."

Sometimes, resetting the GPUGrid project in BOINC Manager may help, since all app-related files will be downloaded again.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57324 - Posted: 18 Sep 2021 | 10:39:05 UTC - in response to Message 57323.

jjch wrote: "I'm getting a bunch of new work for my Windows hosts but they are all failing with Error while computing. Here is a sample stderr log."

ServicEnginIC wrote: "Sometimes, resetting the GPUGrid project in BOINC Manager may help, since all app-related files will be downloaded again."

Perhaps worth a try, but I think it's unlikely to help with this one. Another tried-and-tested solution to some rare problems is to perform a full system restart.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 404
Credit: 5,815,469,764
RAC: 1,007,028
Message 57325 - Posted: 18 Sep 2021 | 13:05:32 UTC - in response to Message 57320.

Another WU is done:

https://www.gpugrid.net/result.php?resultid=32640779

And with no more WUs available, it's back to folding.



jjch
Joined: 10 Nov 13
Posts: 59
Credit: 14,591,077,215
RAC: 1,944,796
Message 57326 - Posted: 18 Sep 2021 | 21:23:48 UTC

I have had three tasks complete successfully. They all ran on Quadro RTX 4000 GPUs. Is it possible that only RTX series cards will work?

Everything else is mostly GTX 1080 series, but there are a couple of Quadro P6000s and Titan Xps.

These all have driver version 471.11, which is fairly recent, but I haven't checked for the latest.

The systems are all Windows-based, of one flavor or another depending on whether it's a server or a workstation.

Keith Myers
Joined: 13 Dec 17
Posts: 808
Credit: 1,077,149,831
RAC: 985,453
Message 57327 - Posted: 19 Sep 2021 | 6:14:44 UTC

I have completed two tasks so far, one on an RTX 2080 and another on a GTX 1080 Ti.
Two more completed successfully AFAIK and are uploading now.
Same cards.

Erich56
Joined: 1 Jan 15
Posts: 825
Credit: 3,453,836,727
RAC: 383,462
Message 57328 - Posted: 19 Sep 2021 | 14:05:41 UTC

Can anyone tell whether these new tasks run on Ampere cards, too?

Keith Myers
Joined: 13 Dec 17
Posts: 808
Credit: 1,077,149,831
RAC: 985,453
Message 57330 - Posted: 19 Sep 2021 | 15:11:06 UTC - in response to Message 57328.

Erich56 wrote: "Can anyone tell whether these new tasks run on Ampere cards, too?"

Yes, they run fine on Ampere cards also.

Erich56
Joined: 1 Jan 15
Posts: 825
Credit: 3,453,836,727
RAC: 383,462
Message 57332 - Posted: 19 Sep 2021 | 16:17:33 UTC - in response to Message 57330.

Erich56 wrote: "Can anyone tell whether these new tasks run on Ampere cards, too?"

Keith Myers wrote: "Yes, they run fine on Ampere cards also."

Thanks, Keith, for the valuable information.
Just to make sure: are you talking about Windows too, or just Linux?

ServicEnginIC
Joined: 24 Sep 10
Posts: 459
Credit: 2,122,879,742
RAC: 886,440
Message 57333 - Posted: 19 Sep 2021 | 16:19:00 UTC - in response to Message 57330.

Erich56 wrote: "Can anyone tell whether these new tasks run on Ampere cards, too?"

Keith Myers wrote: "Yes, they run fine on Ampere cards also."

In both Windows and Linux systems with updated drivers, at last.

jjch
Joined: 10 Nov 13
Posts: 59
Credit: 14,591,077,215
RAC: 1,944,796
Message 57334 - Posted: 19 Sep 2021 | 19:56:47 UTC
Last modified: 19 Sep 2021 | 20:55:09 UTC

I found one failed task where the acemd3 application had crashed.

The appcrash is in vcruntime140_1.dll, with code c0000135.

This server was running Windows Server 2012, so that could be a clue.

I remember a while back I had problems with another GPU app and I had to update the Visual C++ Redistributable.

I'll keep checking to see if I can find what will get this to work.

Erich56
Joined: 1 Jan 15
Posts: 825
Credit: 3,453,836,727
RAC: 383,462
Message 57339 - Posted: 20 Sep 2021 | 8:02:26 UTC - in response to Message 57334.


jjch wrote: "I remember a while back I had problems with another GPU app and I had to update the Visual C++ Redistributable."

I had this happen with Folding@Home, several weeks ago. Maybe you are talking about the same project anyway :-)

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57345 - Posted: 20 Sep 2021 | 10:14:13 UTC - in response to Message 57334.

I've now got a Windows v218, flavour cuda101, running and saved for posterity.

This is the DLL show of the task actually running (from Process Explorer) - the list of system DLLs is too long to show in this format, but I can look them up.

jjch
Joined: 10 Nov 13
Posts: 59
Credit: 14,591,077,215
RAC: 1,944,796
Message 57353 - Posted: 21 Sep 2021 | 4:41:31 UTC
Last modified: 21 Sep 2021 | 4:41:55 UTC

I think the problem is due to older Windows OS versions that don't have the newer Microsoft Visual C++ 2015-2019 Redistributable. The new ACEMD app seems to require this to work.

If you are running a newer OS and have at least a minimum version of the Microsoft Visual C++ 2015-2019 Redistributable, GPUGRID should work.

If you don't have it you can download and install the latest version from here: https://support.microsoft.com/en-us/topic/the-latest-supported-visual-c-downloads-2647da03-1eea-4433-9aff-95f26a218cc0

I always install both the x86 and x64 versions, as I don't know which one GPUGRID really needs.

There isn't much work available at the moment to fully test this, but I do have a couple of WUs running that should finish tonight or tomorrow.

I think the other GPU program I had trouble with a few years ago was Einstein@Home.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57356 - Posted: 21 Sep 2021 | 13:04:41 UTC - in response to Message 57353.

Well, as you can see from my screenshot yesterday (which was taken on a machine running Windows 7, BTW), acemd3 has loaded vcruntime140 from the BOINC slot directory - i.e., from an image circulated by the project via its various downloads. That file is actually v14.28.29325.2, dated Fri Sep 25 2020.

Further down the list, it has also loaded vcruntime140_1 from my Windows\System32 directory. That file is v14.25.28508.3, dated Wed Jan 8 2020. Both files are 64-bit images, which is right for my system.

Dependency Walker also finds a requirement for both versions of vcruntime140.

But inspecting the acemd3.exe application with a hex editor, there's only vcruntime140.dll, at offset 000B5A78: no sign of the _1 version.
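The hex-editor check can be roughly reproduced with a short Python sketch that scans a binary for ASCII DLL names and reports their offsets. The function name and the default prefix are made up for this example. It is only a plain string scan, so it says nothing about DLLs that are pulled in indirectly by other DLLs.

```python
import re


def find_dll_names(data: bytes, prefix: bytes = b"vcruntime140"):
    """Return (offset, name) for each ASCII DLL name in `data` that
    starts with `prefix`, matched case-insensitively."""
    pattern = re.compile(re.escape(prefix) + rb"[\w.]*?\.dll", re.IGNORECASE)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Synthetic bytes standing in for an executable image.
    blob = b"\x00" * 16 + b"VCRUNTIME140.dll\x00" + b"junk" + b"vcruntime140_1.dll\x00"
    for offset, name in find_dll_names(blob):
        print(f"{offset:08X}  {name}")
```

On a real file you would pass the bytes of the executable, for example `open("acemd3.exe", "rb").read()`, with the path adjusted to wherever your copy lives.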

Looking around Microsoft, I think VCruntime140.dll is the VS2015 version, and VCruntime140_1.dll is the VS2017 version. It rather looks as if the final acemd3 package was linked together from components compiled with two different generations of Visual Studio.

That suggests that jjch's solution in the previous post is the best one - if you encounter any problems in vcruntimeXXX with this project, install the combined redistributable for Visual Studio 2015, 2017 and 2019 from the link in his post.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57357 - Posted: 21 Sep 2021 | 16:56:04 UTC

Well, welcome to DLL Hell (yes, that's a thing).

Looked at my full task list, and found that another machine had a failed task with, you guessed it, error 0xc0000135. The two machines arrived here as exact twins, but diverged over the years as I upgraded bits, used them for different tasks, and so on.

Checked C:\Windows\System32, and sure enough no sign of vcruntime140_1.dll. Downloaded the redistributables package - it's hiding right at the bottom of the VS2019 page, in the 'Other Tools, Frameworks, and Redistributables' drop-down list - ran it, and 140_1 appeared.

We've got to wait for another spin of the task-issue lottery wheel to be certain, but that increasingly looks like the explanation and solution. It does look silly to build a single application that requires two different sets of runtime files, and then only distribute one of them in the conda-pack.zip download. I'll send a note to the admins once I've got confirmation that this second machine is working properly.
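The System32 check is easy to script. Below is a minimal Python sketch that reports which runtime DLLs are absent from a directory. The list of names comes from the files mentioned in this thread; the function name and the `SystemRoot` fallback path are assumptions made for the example. It only covers the one directory it is given, whereas Windows also searches other locations such as the application's own folder.

```python
import os
from pathlib import Path

# Runtime files mentioned in this thread; adjust the list as needed.
RUNTIME_DLLS = ("vcruntime140.dll", "vcruntime140_1.dll", "msvcp140.dll")


def missing_dlls(directory, names=RUNTIME_DLLS):
    """Return the entries of `names` that are not files in `directory`.
    Names are compared case-insensitively, as on Windows."""
    directory = Path(directory)
    if not directory.is_dir():
        return list(names)  # nothing can be present in a missing directory
    present = {p.name.lower() for p in directory.iterdir() if p.is_file()}
    return [n for n in names if n.lower() not in present]


if __name__ == "__main__":
    # Assumed default location; falls back to C:\Windows if SystemRoot is unset.
    system32 = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32"
    print("Missing:", missing_dlls(system32) or "none")
```

Running it before and after installing the redistributable should show vcruntime140_1.dll drop off the missing list, if the diagnosis above is right.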

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57359 - Posted: 22 Sep 2021 | 15:44:52 UTC

Good news. The machine which failed yesterday is now running an acemd3 cuda101 task. Here is the full set of C runtime library files, and where they're being run from:



I'll drop a line to Gianni, to pass on to the development / deployment team.

Erich56
Joined: 1 Jan 15
Posts: 825
Credit: 3,453,836,727
RAC: 383,462
Message 57374 - Posted: 25 Sep 2021 | 3:42:43 UTC - in response to Message 57359.

Richard Haselgrove wrote: "I'll drop a line to Gianni, to pass on to the development / deployment team."

Any reaction from their side so far?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57375 - Posted: 25 Sep 2021 | 7:44:51 UTC - in response to Message 57374.

Sadly, none at all. I might try Toni, after our conversation in the news thread yesterday.

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1942
Credit: 629,356
RAC: 0
Message 57413 - Posted: 1 Oct 2021 | 10:23:19 UTC - in response to Message 57375.

This is already fixed in acemdbeta and will be fixed as soon as we deploy the new app.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57415 - Posted: 1 Oct 2021 | 12:27:58 UTC

Sadly, I'm not completely convinced by this.

My host 43404 ran one of the Beta tasks on 29 September. I have two downloaded files from that day in my project directory:

windows_x86_64__cuda101.zip
job.xml

Significantly, there is no new conda-pack.zip: that file is still dated 20 September.

I've examined the new acemd3.exe file from the zip - dated 27 September - using Dependency Walker. That still shows vcruntime140_1.dll as a required file, invoked by msvcp140.dll

My machine has the full triple-pack of VC runtime files installed, so this may be an artifact of that. Did anyone get one of the Beta test jobs without having the manual VC runtime pack installed? If so, did it run successfully?

bozz4science
Joined: 22 May 20
Posts: 104
Credit: 21,759,591
RAC: 77,228
Message 57416 - Posted: 1 Oct 2021 | 12:43:42 UTC - in response to Message 57415.

I didn't manually intervene with anything and got 3 beta tasks that ran successfully on my Win10 machine. They all finished successfully within a few minutes. Does that help?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1245
Credit: 3,344,411,168
RAC: 867,709
Message 57417 - Posted: 1 Oct 2021 | 12:49:37 UTC - in response to Message 57416.

That's reassuring - thanks. But I'll keep an eye on my Windows 7 machines when the project next ramps up.

marsinph
Joined: 11 Feb 18
Posts: 38
Credit: 529,840,974
RAC: 56,181
Message 57542 - Posted: 8 Oct 2021 | 7:37:53 UTC - in response to Message 57413.

GDF wrote: "This is already fixed in acemdbeta and will be fixed as soon as we deploy the new app."

No further explanation?
How can we fix it? What is the problem?
It's not only the C++ runtime (I have all the DLLs and C++ redistributables manually installed).
Until June everything was working. Now it no longer does, without any change in my configuration.
It seems the problem comes from the WUs.
