Message boards : News : All acemd3 apps updated (210)

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52852 - Posted: 16 Oct 2019 | 10:47:28 UTC
Last modified: 16 Oct 2019 | 10:58:14 UTC

Currently there should be no major *known* bugs. We should cover Win64 and Linux, with reasonably recent cards.

Unfortunately, an internal cleanup in the filenames will make *existing* WUs fail. Sorry about that. Will send new test WUs soon.

By the way, the scheduler for this app will base its decision simply on the CUDA version supported by your driver, rather than other heuristics.

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,677,994,526
RAC: 13,490,590
Level
Tyr
Message 52853 - Posted: 16 Oct 2019 | 14:41:05 UTC

Yes, I have had several task failures today, when I had never had any before. I validated three test tasks with the new 2.10 app and one normal task.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52854 - Posted: 16 Oct 2019 | 14:49:15 UTC - in response to Message 52853.
Last modified: 16 Oct 2019 | 14:50:12 UTC

The DHFR210 set was botched because old versions were still lurking around. I deprecated all the old apps now. The 210a set was created after this so should be ok.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 52857 - Posted: 16 Oct 2019 | 21:26:05 UTC - in response to Message 52852.

CUDA version supported by your driver, rather than other heuristics

Which version is recommended as the minimum now?

MrS
____________
Scanning for our furry friends since Jan 2002

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 52860 - Posted: 17 Oct 2019 | 0:06:41 UTC - in response to Message 52857.
Last modified: 17 Oct 2019 | 0:07:53 UTC

Looking at the supported applications page - http://www.gpugrid.net/apps.php

Supported "New version of ACEMD" applications:
Linux - GPU/driver capable of CUDA80 or better
Windows - GPU/driver capable of CUDA92 or better

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Level
Phe
Message 52861 - Posted: 17 Oct 2019 | 2:17:38 UTC - in response to Message 52857.
Last modified: 17 Oct 2019 | 2:18:08 UTC

Which version is recommended as the minimum now?


As per Nvidia deployment documentation (previously posted by Keith Myers): https://docs.nvidia.com/deploy/cuda-compatibility/index.html

CUDA80 Minimum Driver r367.48 or higher
CUDA92 Minimum Driver r396.26 or higher
CUDA100 Minimum Driver r410.48 or higher
CUDA101 Minimum Driver r418.39 or higher
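
For anyone who wants to check a host against that table, here is a minimal Python sketch. It only encodes the four driver thresholds listed above; the function names are made up for illustration, and it is not the project scheduler's actual logic.

```python
# Minimum driver release per CUDA plan class, copied from the list above.
# Insertion order (oldest to newest) is relied on below (Python 3.7+).
MIN_DRIVER = {
    "cuda80": (367, 48),
    "cuda92": (396, 26),
    "cuda100": (410, 48),
    "cuda101": (418, 39),
}


def parse_driver(version):
    """Turn a driver string such as '430.26' or 'r418.39' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("r").split("."))


def supported_plan_classes(driver):
    """Return every plan class whose minimum driver is not newer than `driver`."""
    have = parse_driver(driver)
    return [name for name, need in MIN_DRIVER.items() if have >= need]


def best_plan_class(driver):
    """Return the newest plan class the driver can handle, or None if none fit."""
    ok = supported_plan_classes(driver)
    return ok[-1] if ok else None


if __name__ == "__main__":
    for drv in ("340.10", "396.26", "430.26"):
        print(drv, "->", best_plan_class(drv))
```

Which of the eligible versions a host actually receives is still up to the server; the sketch only answers "is my driver new enough for plan class X".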

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52863 - Posted: 17 Oct 2019 | 7:06:29 UTC - in response to Message 52861.

More failures because the old app (206) is still being sent out despite being deprecated.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52864 - Posted: 17 Oct 2019 | 9:47:14 UTC - in response to Message 52861.

Which version is recommended as the minimum now?



CUDA80 Minimum Driver r367.48 or higher
CUDA92 Minimum Driver r396.26 or higher
CUDA100 Minimum Driver r410.48 or higher
CUDA101 Minimum Driver r418.39 or higher


Exactly. Updated drivers are necessary for RTX users. They should go for r418.39 or higher.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52866 - Posted: 17 Oct 2019 | 12:26:51 UTC - in response to Message 52864.

Seems to be working. I added a FAQ item. Old WUs may still fail.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 52867 - Posted: 17 Oct 2019 | 12:49:36 UTC

It also depends on what generation of card you have. My GTX 980 with the 430.26 drivers running under Ubuntu 18.04 is getting the CUDA 100 work units. It ran a total time of 43 minutes for an a81-TONI_TESTTESTLONG210 task. But it uses a whole CPU core, so I just reserve one for it.

I will await the real work units before doing more.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 180,567
Level
Trp
Message 52869 - Posted: 18 Oct 2019 | 15:15:53 UTC

Now we need 10,000 WUs loaded.

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,720,229,693
RAC: 1,979,566
Level
Arg
Message 52870 - Posted: 18 Oct 2019 | 15:41:48 UTC - in response to Message 52869.

Now we need 10,000 WUs loaded.

+1

STARBASEn
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level
His
Message 52871 - Posted: 18 Oct 2019 | 20:09:41 UTC

Over the past several days I have received 4 ACEMD 210 WUs on two Linux machines with 3 GTX 1060s, and all validated fine (3 x 7,500 and 1 x 75,000 points). The Linux machines are ready for production WUs anytime.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 52877 - Posted: 20 Oct 2019 | 11:46:02 UTC
Last modified: 20 Oct 2019 | 12:30:32 UTC

Could you please answer these 2 questions...
... and consider including their answers in the FAQ thread here:
http://www.gpugrid.net/forum_thread.php?id=5002

2 questions:

1) Is the new app capable of resuming on a different GPU? I ask, because my main 2 crunching PCs are below, each having 3 different GPU types, and you said "no major known bugs", but I thought I saw some other report saying earlier how resuming on a different GPU type wasn't working yet.

2) Will this app work for my GTX 660 Ti that is in the same PC as newer generation GPUs? I believe mixing Maxwell and Pascal was a problem previously, which is why I'm asking about your new app.

Depending on your answers, both of these would be major problems for me, as I'm trying to keep these PCs stable while also working long-running RNA World tasks.

Please let me know,
Thanks,
Jacob Klein

PC 1: RTX 2080, GTX 980 Ti, GTX 980
PC 2: GTX 1050 Ti, GTX 970, GTX 660 Ti

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,677,994,526
RAC: 13,490,590
Level
Tyr
Message 52879 - Posted: 20 Oct 2019 | 22:33:51 UTC - in response to Message 52877.

As far as I know, you still can't resume a task on a different card type. I solved it by changing my preferences so that BOINC switches between apps every 360 minutes instead of the default 60 minutes. The task then starts and finishes on the same card. I haven't seen any task take that long to finish yet, so this will probably be adequate until the app is declared to Main and we start getting Long tasks again with the new apps.

If you have the same card type in a multiple card host, there is no issue starting on one card and finishing on another.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52880 - Posted: 21 Oct 2019 | 10:13:24 UTC - in response to Message 52877.
Last modified: 21 Oct 2019 | 10:14:37 UTC

Sadly, one still can't restart on a different card type.

Regarding the mixing of cards: I don't see why it shouldn't work, but a real world test will confirm.

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 52881 - Posted: 21 Oct 2019 | 11:39:21 UTC

Are there any plans to allow the app to resume on a different GPU?

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52882 - Posted: 21 Oct 2019 | 11:56:49 UTC - in response to Message 52881.
Last modified: 21 Oct 2019 | 11:57:08 UTC

We looked into it, but do not know if and when there will be progress on that front. For the time being, I've amended the FAQ with a pointer on GPU exclusion.
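
For reference, a rough sketch of what such an exclusion could look like. BOINC reads `exclude_gpu` entries from the `<options>` section of cc_config.xml; the Python below just builds that fragment. The project URL and the device numbers are assumptions for illustration: the URL has to match the one your client shows for the project, and the device numbers are the ones BOINC reports in its event log at startup.

```python
import xml.etree.ElementTree as ET


def exclude_gpu_config(project_url, device_nums, gpu_type="NVIDIA"):
    """Build a cc_config.xml document with one <exclude_gpu> entry per device."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for num in device_nums:
        entry = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(entry, "url").text = project_url
        ET.SubElement(entry, "device_num").text = str(num)
        ET.SubElement(entry, "type").text = gpu_type
    # ET.indent only exists on Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical host: keep this project off devices 1 and 2 so that every
    # task starts and resumes on device 0 only.
    print(exclude_gpu_config("http://www.gpugrid.net/", [1, 2]))
```

The output is meant to be merged by hand into an existing cc_config.xml rather than written over it; restarting the client afterwards is the safe way to make sure the exclusion is picked up.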

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 52883 - Posted: 21 Oct 2019 | 12:08:55 UTC
Last modified: 21 Oct 2019 | 12:13:32 UTC

Thank you. That is suitable, and I plan on implementing that approach shortly for one of my systems that gets suspended/resumed a lot. Did you know I'm responsible for exclude_gpu existing? ;)

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52884 - Posted: 21 Oct 2019 | 12:11:19 UTC - in response to Message 52883.

Thank you. That is suitable, and I plan on implementing that approach shortly for one of my systems that gets suspended/resumed a lot. Did you know I'm responsible for exclude_gpu existing? ;)


No I didn't and let me add: well done :)

Jacob Klein
Joined: 11 Oct 08
Posts: 1127
Credit: 1,901,927,545
RAC: 0
Level
His
Message 52885 - Posted: 21 Oct 2019 | 12:15:24 UTC - in response to Message 52884.

Thanks :)

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Message 52897 - Posted: 25 Oct 2019 | 18:22:42 UTC

Toni, a minor request for the next version: the previous one had some nice information in the Stderr output, i.e. GPU, driver, clocks and natoms. The latter was useful for determining small performance differences, as the "credits per time" always depended on the number of atoms in a simulation. So please include whatever you can port without too much trouble.

MrS
____________
Scanning for our furry friends since Jan 2002

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,720,229,693
RAC: 1,979,566
Level
Arg
Message 52912 - Posted: 30 Oct 2019 | 15:26:50 UTC

I would like to ask: as I understand it, a GTX 670 with the latest driver (440.97) should work with the acemd3 app, since this driver supports CUDA 10.2, as BOINC confirms. Is this correct?

I ask because the GTX 670 currently fails all acemd2 v9.22 (cuda65) tasks, as others have reported. What I do not understand, however, is why the acemd2 app does not switch to acemd2 v9.22 (cuda80) with the new driver installed.

I would be very grateful if somebody could answer these two questions. If the GTX 670 will no longer work on this project, I might as well pass it on to another BOINC user.

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,677,994,526
RAC: 13,490,590
Level
Tyr
Message 52913 - Posted: 30 Oct 2019 | 15:34:23 UTC

Are ANY of the 9.22 or 9.23 apps working? I thought I saw mention that none of them work anymore because their license expired again.

Aurum
Joined: 12 Jul 17
Posts: 401
Credit: 16,755,010,632
RAC: 180,567
Level
Trp
Message 52914 - Posted: 30 Oct 2019 | 16:04:16 UTC - in response to Message 52912.

latest driver: 440.97 should work with the acemd3 app

440.97 is working fine for me but I don't have a 670.

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,720,229,693
RAC: 1,979,566
Level
Arg
Message 52915 - Posted: 30 Oct 2019 | 19:00:20 UTC

The 9.23 app works on this computer: http://www.gpugrid.net/show_host_detail.php?hostid=441816, but not on this: http://www.gpugrid.net/show_host_detail.php?hostid=486229
Both Windows though..

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,677,994,526
RAC: 13,490,590
Level
Tyr
Message 52916 - Posted: 30 Oct 2019 | 19:35:16 UTC

Any reason why you don't want to move on to the acemd3 app? Other than there is no work for it again.
According to the prerequisites page https://www.gpugrid.net/join.php the 670 qualifies as a CC 3.0 capable card. The docs for the 441 drivers say the 670 is supported.
BOINC's normal mechanism tries out every application compatible with your hardware; it has apparently decided that the old CUDA65 app still qualifies and has not tried the CUDA80 app. I don't know what you can do other than wait it out, or go with an anonymous platform and define the CUDA80 9.23 app as your GPU app of choice.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,214,765,968
RAC: 1,002,217
Level
Trp
Message 52917 - Posted: 30 Oct 2019 | 19:47:39 UTC - in response to Message 52915.

The 9.23 app works on this computer: http://www.gpugrid.net/show_host_detail.php?hostid=441816, but not on this: http://www.gpugrid.net/show_host_detail.php?hostid=486229
Both Windows though..

The second one is 9.22 (CUDA 6.5). Its license has expired.

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,677,994,526
RAC: 13,490,590
Level
Tyr
Message 52918 - Posted: 30 Oct 2019 | 19:53:46 UTC - in response to Message 52917.

The 9.23 app works on this computer: http://www.gpugrid.net/show_host_detail.php?hostid=441816, but not on this: http://www.gpugrid.net/show_host_detail.php?hostid=486229
Both Windows though..
The second one is 9.22 (CUDA 6.5). It's license got expired.

Thanks Zoltan. I thought I saw somewhere that some(one) of the apps had an expired license again.

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,720,229,693
RAC: 1,979,566
Level
Arg
Message 52919 - Posted: 30 Oct 2019 | 20:16:09 UTC - in response to Message 52917.

The 9.23 app works on this computer: http://www.gpugrid.net/show_host_detail.php?hostid=441816, but not on this: http://www.gpugrid.net/show_host_detail.php?hostid=486229
Both Windows though..
The second one is 9.22 (CUDA 6.5). It's license got expired.

That is why I thought installing the newest driver would automatically trigger BOINC to download the CUDA80 9.23 app until acemd3 is widely available. Otherwise I will just dismantle this particular computer, as it is not a very efficient GPU anymore, and pass the card to a user who still uses a GTX5XX for another BOINC project.

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,677,994,526
RAC: 13,490,590
Level
Tyr
Message 52920 - Posted: 30 Oct 2019 | 21:11:19 UTC - in response to Message 52919.

Well the project should send out the available 9.23 CUDA80 app for applicable hardware . . . if they have configured the scheduler correctly for deprecating the CUDA65 9.22 app. Obviously that hasn't happened.

Nick Name
Joined: 3 Sep 13
Posts: 53
Credit: 1,533,531,731
RAC: 0
Level
His
Message 52921 - Posted: 31 Oct 2019 | 7:58:51 UTC

It's gotten too confusing keeping track of what runs on what. I'm setting my hosts to normal preferences, which for me is long runs and ACEMD3. The server shouldn't be sending the wrong apps to the wrong hosts. Any failures I have will be dealt with via the normal BOINC mechanisms.
____________
Team USA forum | Team USA page
Join us and #crunchforcures. We are now also folding:join team ID 236370!

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level
Ser
Message 52922 - Posted: 31 Oct 2019 | 9:16:45 UTC - in response to Message 52921.

It's gotten too confusing keeping track of what runs on what. I'm setting my hosts to normal preferences, which for me is long runs and ACEMD3. The server shouldn't be sending the wrong apps to the wrong hosts. Any failures I have will be dealt with via the normal BOINC mechanisms.


I think it's the sensible approach.

t

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,214,765,968
RAC: 1,002,217
Level
Trp
Message 52924 - Posted: 1 Nov 2019 | 20:16:05 UTC - in response to Message 52922.
Last modified: 1 Nov 2019 | 20:25:04 UTC

It's gotten too confusing keeping track of what runs on what. I'm setting my hosts to normal preferences, which for me is long runs and ACEMD3. The server shouldn't be sending the wrong apps to the wrong hosts. Any failures I have will be dealt with via the normal BOINC mechanisms.

I think it's the sensible approach.

t

Is this policy of app assignment still in effect?
This might be the reason for sending the 9.22 (CUDA6.5) app to hosts with CC3.0 cards.
It's time to deprecate these apps if their license has in fact expired.
I've saved a WU which had failed 7 times before my host picked it up:
9.22 (CUDA65) app: 4 times
Turing card: 2 times
exit code -55: 1 time
It's time to release the new app and deprecate the non-working ones, as this project is spamming itself into the void.

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,214,765,968
RAC: 1,002,217
Level
Trp
Message 52995 - Posted: 18 Nov 2019 | 13:28:09 UTC

Now that the queues have run completely dry, it's time to deprecate the old app, release the new one, and fill up the queue.
Please, clear all the old errors from the database.

Erich56
Joined: 1 Jan 15
Posts: 1132
Credit: 10,256,332,676
RAC: 29,209,412
Level
Trp
Message 52996 - Posted: 18 Nov 2019 | 17:11:21 UTC - in response to Message 52995.

Now that the queues run completely dry, it's time to deprecate the old app, release the new, and fill up the queue.

You are a real optimist, Zoltan :-)))

STARBASEn
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Level
His
Message 53000 - Posted: 19 Nov 2019 | 21:28:11 UTC

Now that the queues run completely dry, it's time to deprecate the old app, release the new, and fill up the queue.
Please, clear all the old errors from the database.

I'll second that. I don't have any Windows machines, so all my Linux hosts have been crunching only E@H and WCG since May.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Message 53001 - Posted: 19 Nov 2019 | 21:53:06 UTC

They are dependent on only a few people for work. I am sure some of them have teaching obligations and other course work.

When they are between PhD dissertations or other publications, we are out of luck.
