Message boards : Number crunching : failing tasks lately
Author | Message |
---|---|
This afternoon, I had 4 tasks in a row which failed after a few seconds; see here: http://www.gpugrid.net/results.php?userid=125700&offset=0&show_names=1&state=0&appid= | |
ID: 52174 | Rating: 0 | rate: / Reply Quote | |
I've had three failed tasks over the last two days, but all the others have run normally. All the failed tasks had PABLO_V3_p27_sj403_IDP in their name. | |
ID: 52176 | Rating: 0 | rate: / Reply Quote | |
Yes, part of the PABLO_V3_p27_sj403_IDP series seems to be erroneous. | |
ID: 52177 | Rating: 0 | rate: / Reply Quote | |
The server status page shows an error rate of 56.37% for them. Which is high, isn't it? Overnight, the failure rate has risen to 57.98%. The remaining tasks from this series should be cancelled from the queue. | |
ID: 52179 | Rating: 0 | rate: / Reply Quote | |
The server status page shows an error rate of 56.37% for them. Which is high, isn't it? Meanwhile, the failure rate has passed the 60% mark. It's 60.12%, to be exact. And these faulty tasks are still in the download queue. WHY??? | |
ID: 52182 | Rating: 0 | rate: / Reply Quote | |
I thought we'd got rid of these, but I've just sent back e15s24_e1s258p1f302-PABLO_V3_p27_sj403_IDP-0-2-RND4645_0 - note the _0 replication. I was the first victim since the job was created at 11:25:23 UTC today, seven more to go. | |
ID: 52189 | Rating: 0 | rate: / Reply Quote | |
The failure rate now is close to 64%, so it's still climbing up. | |
ID: 52194 | Rating: 0 | rate: / Reply Quote | |
The failure rate now is close to 64%, so it's still climbing up. It's a holiday. Some admins won't even cancel tasks like that even if they are active; some will just let them error out the max # of times. | |
ID: 52195 | Rating: 0 | rate: / Reply Quote | |
Some will just let them error out the max # of times. The bad thing is that once a host gets more than 2 or 3 such faulty tasks in a row, it is considered unreliable and will no longer receive tasks for the next 24 hours. So the host is penalized for something that is not its fault. What surprises me even more is that the GPUGRID people don't seem to care :-( | |
ID: 52197 | Rating: 0 | rate: / Reply Quote | |
the failure rate has passed the 70% mark now. Great ! | |
ID: 52204 | Rating: 0 | rate: / Reply Quote | |
Meanwhile, the failure rate has passed the 75% mark. It is now 75.18%, to be exact. | |
ID: 52208 | Rating: 0 | rate: / Reply Quote | |
If you are so unhappy running the available Windows tasks, just stop getting any work. Problem solved. You are happy now. | |
ID: 52210 | Rating: 0 | rate: / Reply Quote | |
If you are so unhappy running the available Windows tasks, just stop getting any work. Problem solved. You are happy now. The question isn't whether or not I am unhappy. The question is rather what makes sense and what doesn't. Don't you think the only real solution to the problem would logically be to simply withdraw the remaining tasks of this faulty series from the download queue? Or can you explain the rationale for leaving them there? In a few more weeks, when all these tasks are used up, the error rate will be 100%. How does that serve the project? As I explained before: once a host happens to download such a faulty task 2 or 3 times in a row, it is blocked for 24 hours. So what sense does that make? | |
ID: 52211 | Rating: 0 | rate: / Reply Quote | |
So far as I can tell from my account pages, my machines are processing GPUGrid tasks just fine and at the normal rate. | |
ID: 52215 | Rating: 0 | rate: / Reply Quote | |
My machine has also failed numerous GPUGrid tasks lately, running on 2 GTX 1070 cards (individual, not SLI'd). | |
ID: 52216 | Rating: 0 | rate: / Reply Quote | |
http://www.gpugrid.net/result.php?resultid=7412820 This WU is from 2013.
I'll be skipping GPUGrid tasks from now on until it is resolved, as it is wasting CPU/GPU time that I can use for other projects on the machine.
The 3 recent errors wasted 17 seconds on your host in the past 4 days, so there's no reason to panic (even though your host didn't receive work for 3 days).
I'll refer back to these forums to check on updates though so I know when to restart GPUGRID tasks.
The project is running fine apart from this one bad batch, so you can do that right away. The number of resends may increase as this bad batch runs out, and that may cause a host to be "blacklisted" for 24 hours, but it takes many failing workunits in a row (so it is unlikely to happen, as the maximum number of daily workunits gets reduced by 1 after each error). The daily limit of the "Long runs (8-12 hours on fastest card) 9.22 windows_intelx86 (cuda80)" app for your host is currently 28, so this host would have to be extremely unlucky and receive 28 bad workunits in a row to get "banned" for 24 hours. | |
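As an illustration only, here is a minimal Python sketch of the daily-quota behaviour described above (not the actual BOINC server code): the "minus 1 per error" rule is taken from this post, and the recovery-on-success rule is an assumption.

MAX_DAILY_QUOTA = 28  # per-host limit quoted above for the cuda80 long-run app

def remaining_quota(outcomes, quota=MAX_DAILY_QUOTA):
    """outcomes: sequence of 'ok' / 'error' results in the order they were reported."""
    for outcome in outcomes:
        if outcome == "error":
            quota = max(0, quota - 1)                 # each error lowers the quota by 1
        else:
            quota = min(MAX_DAILY_QUOTA, quota * 2)   # assumed: a valid result restores it quickly
    return quota

print(remaining_quota(["error"] * 3))    # 25 -> the host still gets work
print(remaining_quota(["error"] * 28))   # 0  -> no new work for roughly 24 hours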
ID: 52217 | Rating: 0 | rate: / Reply Quote | |
Oops my bad, i sorted the tasks by 'errored' and mixed up the ones to paste. | |
ID: 52218 | Rating: 0 | rate: / Reply Quote | |
There are two more 'bad' batches at the moment in the 'long' queue: | |
ID: 52342 | Rating: 0 | rate: / Reply Quote | |
any idea why all tasks downloaded within the last few hours fail immediately? | |
ID: 52385 | Rating: 0 | rate: / Reply Quote | |
any idea why all tasks downloaded within the last few hours fail immediately?
No idea, but it's the same for others. I'm using Win7 Pro, and the work units crash at once:

Stderr output
<core_client_version>7.10.2</core_client_version>
<![CDATA[
<message> (unknown error) - exit code -44 (0xffffffd4)</message>
]]>

07.08.2019 14:17:11 | GPUGRID | Sending scheduler request: To fetch work.
07.08.2019 14:17:11 | GPUGRID | Requesting new tasks for NVIDIA GPU
07.08.2019 14:17:13 | GPUGRID | Scheduler request completed: got 1 new tasks
07.08.2019 14:17:15 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-LICENSE
07.08.2019 14:17:15 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-COPYRIGHT
07.08.2019 14:17:17 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-LICENSE
07.08.2019 14:17:17 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-COPYRIGHT
07.08.2019 14:17:17 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-coor_file
07.08.2019 14:17:17 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-vel_file
07.08.2019 14:17:18 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-vel_file
07.08.2019 14:17:18 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-idx_file
07.08.2019 14:17:19 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-idx_file
07.08.2019 14:17:19 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-pdb_file
07.08.2019 14:17:21 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-coor_file
07.08.2019 14:17:21 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-psf_file
07.08.2019 14:17:30 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-pdb_file
07.08.2019 14:17:30 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-par_file
07.08.2019 14:17:33 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-par_file
07.08.2019 14:17:33 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-conf_file_enc
07.08.2019 14:17:34 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-conf_file_enc
07.08.2019 14:17:34 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-metainp_file
07.08.2019 14:17:35 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-metainp_file
07.08.2019 14:17:35 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-hills_file
07.08.2019 14:17:36 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-hills_file
07.08.2019 14:17:36 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-xsc_file
07.08.2019 14:17:37 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-xsc_file
07.08.2019 14:17:37 | GPUGRID | Started download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-prmtop_file
07.08.2019 14:17:38 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-psf_file
07.08.2019 14:17:38 | GPUGRID | Finished download of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-prmtop_file
07.08.2019 14:19:22 | GPUGRID | Starting task e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4
07.08.2019 14:19:29 | GPUGRID | Computation for task e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4 finished
07.08.2019 14:19:29 | GPUGRID | Output file e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4_0 for task e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4 absent
07.08.2019 14:19:29 | GPUGRID | Output file e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4_1 for task e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4 absent
07.08.2019 14:19:29 | GPUGRID | Output file e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4_2 for task e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4 absent
07.08.2019 14:19:29 | GPUGRID | Output file e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4_3 for task e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4 absent
07.08.2019 14:19:37 | GPUGRID | Started upload of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4_7
07.08.2019 14:19:39 | GPUGRID | Finished upload of e14s18_e8s70p1f46-PABLO_V4_UCB_p27_sj403_005_salt_IDP-0-2-RND1985_4_7

Another member of our team has the same problem on Win10. I'd really like to compare this with Linux, but I haven't received any work unit on my Debian machine for weeks.
____________
Greetings, Jens | |
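For anyone who wants to spot these instant failures in their own logs, a small hypothetical helper, a sketch only: it assumes the BOINC event log is saved to a text file with the same timestamp format as the excerpt above, and the file name is an assumption that depends on your BOINC data directory.

import re
from datetime import datetime

LOG_FILE = "stdoutdae.txt"  # assumed path of a saved BOINC event log; adjust as needed

started, finished = {}, {}
pattern = re.compile(r"(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) \| GPUGRID \| "
                     r"(Starting task|Computation for task) (\S+)")

with open(LOG_FILE, encoding="utf-8", errors="replace") as f:
    for line in f:
        m = pattern.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")
        task = m.group(3)
        (started if m.group(2) == "Starting task" else finished)[task] = ts

for task, t0 in started.items():
    if task in finished:
        runtime = (finished[task] - t0).total_seconds()
        if runtime < 60:  # a task that "finishes" this quickly has almost certainly errored out
            print(f"{task} failed after {runtime:.0f} s")

Run against the excerpt above it would flag the RND1985_4 task as failing after about 7 seconds.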
ID: 52386 | Rating: 0 | rate: / Reply Quote | |
any idea why all tasks downloaded within the last few hours fail immediately? Yes, I had checked that before I wrote my post above. I wonder whether the GPUGRID team has noticed this problem yet. | |
ID: 52390 | Rating: 0 | rate: / Reply Quote | |
Same here, all WUs fail with the same error code. | |
ID: 52392 | Rating: 0 | rate: / Reply Quote | |
It seems that the license for Windows 10 (and maybe for Windows 7/8, too) has expired. | |
ID: 52400 | Rating: 0 | rate: / Reply Quote | |
any idea why all tasks downloaded within the last few hours fail immediately? Things left to themselves tend to go from bad to worse. | |
ID: 52405 | Rating: 0 | rate: / Reply Quote | |
Several more tasks with computation errors, but nothing definite about just what kind of error. | |
ID: 52407 | Rating: 0 | rate: / Reply Quote | |
Same here... | |
ID: 52410 | Rating: 0 | rate: / Reply Quote | |
I actually got one to finish successfully: | |
ID: 52411 | Rating: 0 | rate: / Reply Quote | |
I actually got one to finish successfully: So it's clear that the license has expired. Changing the host's date can indeed be tricky, all the more so if other BOINC projects are running, which could be thoroughly confused by doing this. That happened to me the last time the license expired; it all ended up in a total mess. Let's hope it won't take too long until there is a new acemd with a valid license. | |
ID: 52415 | Rating: 0 | rate: / Reply Quote | |
I actually got one to finish successfully: I thought one of the reasons for the new app was to not need the license that keeps expiring. Plus Turing support in a BOINC wrapper to separate the science part from the BOINC part. | |
ID: 52416 | Rating: 0 | rate: / Reply Quote | |
They are not using the new app yet; the reason the app expired is that it's still the old app. | |
ID: 52418 | Rating: 0 | rate: / Reply Quote | |
They are not using the new app yet, the reason the app expired is because it's still the old app. And? I was replying to this part "new acemd with a valid license." The new app won't need a license from what I recall. | |
ID: 52419 | Rating: 0 | rate: / Reply Quote | |
I've seen some mentions of tasks still completing properly on some rather old versions of Windows, such as Windows XP. Could some people with at least one computer with such a version give more details? | |
ID: 52432 | Rating: 0 | rate: / Reply Quote | |
the "older versions" also include an expiration check. | |
ID: 52433 | Rating: 0 | rate: / Reply Quote | |
I'm using Win XP 64 and having just errors as well. | |
ID: 52436 | Rating: 0 | rate: / Reply Quote | |
No, you are using Windows 7 x64. | |
ID: 52437 | Rating: 0 | rate: / Reply Quote | |
Stderr output | |
ID: 52472 | Rating: 0 | rate: / Reply Quote | |
Hello everyone, | |
ID: 52474 | Rating: 0 | rate: / Reply Quote | |
No, you are using Windows 7 x64. You are right, my bad. But I was having errors with the new drivers. Then I rolled back to the 378.94 driver and it's running fine now. http://www.gpugrid.net/show_host_detail.php?hostid=413063 http://www.gpugrid.net/workunit.php?wuid=16717273 | |
ID: 52500 | Rating: 0 | rate: / Reply Quote | |
Hello everyone, That's fixed now. But the errors continue: 2 seconds into a Pablo unit and poof, they error out. I turned off the long-run units, and it seems there aren't any short-run units for the GPUs to do. | |
ID: 52503 | Rating: 0 | rate: / Reply Quote | |
But the errors continue, 2 seconds into a Pablo unit and poof they error out mikey, the tasks with errors were run on a Turing-based card (GTX 1660 Ti). These GPUs are not currently supported by the ACEMD2 app. The admins are working on the ACEMD3 app, which will support Turing-based GPUs. Hopefully it will be released soon. There are currently no short tasks in the queue. | |
ID: 52504 | Rating: 0 | rate: / Reply Quote | |
the faulty tasks seem to be back (erroring out after a few seconds): | |
ID: 52534 | Rating: 0 | rate: / Reply Quote | |
I had a task fail after a few seconds. | |
ID: 52807 | Rating: 0 | rate: / Reply Quote | |
At least it went wrong for everyone, not just for you. A bad workunit. | |
ID: 52808 | Rating: 0 | rate: / Reply Quote | |
here another one, from this morning, with error message: | |
ID: 52819 | Rating: 0 | rate: / Reply Quote | |
Same here | |
ID: 52820 | Rating: 0 | rate: / Reply Quote | |
Same here. Until the new app (ACEMD3) is released, you should assign this host to a venue that receives work only from the ACEMD3 queue, as the other two queues run the old client, which is incompatible with Turing cards. | |
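If changing venues is inconvenient, a client-side alternative is BOINC's <exclude_gpu> option in cc_config.xml, which keeps a given GPU away from specific applications. This is a rough sketch only: the app short names "acemdlong" and "acemdshort" are assumptions here and should be checked against the <app> entries in GPUGRID's section of client_state.xml, and device_num must match the Turing card on your host.

<cc_config>
  <options>
    <!-- keep the Turing card (device 0 here) away from the old ACEMD2 queues -->
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>0</device_num>
      <app>acemdlong</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>0</device_num>
      <app>acemdshort</app>
    </exclude_gpu>
  </options>
</cc_config>

After saving the file in the BOINC data directory, re-read the config files (or restart the client) for the exclusions to take effect.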
ID: 52822 | Rating: 0 | rate: / Reply Quote | |
obviously, the faulty tasks are back, here the next one from a minute ago: | |
ID: 52829 | Rating: 0 | rate: / Reply Quote | |
the next ones: | |
ID: 52892 | Rating: 0 | rate: / Reply Quote | |
and here some more: | |
ID: 52893 | Rating: 0 | rate: / Reply Quote | |
I think the license of the v9.22 app has expired this time. | |
ID: 52894 | Rating: 0 | rate: / Reply Quote | |
I think the license of the v9.22 app has expired this time. That's what I'm now suspecting, too :-( | |
ID: 52895 | Rating: 0 | rate: / Reply Quote | |
Any prediction when a continuous supply of new WUs will become available again? | |
ID: 52896 | Rating: 0 | rate: / Reply Quote | |
this is an increasingly annoying situation: | |
ID: 52923 | Rating: 0 | rate: / Reply Quote | |
Hi: | |
ID: 52928 | Rating: 0 | rate: / Reply Quote | |
I would like to contribute some useful results here with my Alienware laptop
I am afraid that laptop GPUs are not made for this kind of load :-( | |
ID: 52929 | Rating: 0 | rate: / Reply Quote | |
I would like to contribute some useful results here with my Alienware laptop | |
ID: 52930 | Rating: 0 | rate: / Reply Quote | |
I would like to contribute some useful results here with my Alienware laptop
I am afraid that laptop GPUs are not made for this kind of heavy load :-( | |
ID: 52931 | Rating: 0 | rate: / Reply Quote | |
My Dell G7 15 laptop is happily crunching. It's another matter that I have to give it a blast of air every day to get the dust out. | |
ID: 52932 | Rating: 0 | rate: / Reply Quote | |
Hi: The issue is with the scheduler on the GPUgrid servers. The scheduler is sending CUDA65 tasks to your laptop, all of which will fail due to an expired license (server end). Your laptop can process CUDA80 tasks, but you are at the mercy of the scheduler. For most hosts it sends the correct tasks; for a handful of hosts it is sending the wrong ones. This issue tends to affect Kepler GPUs (600-series), even though they are still supported. Some relevant posts discussing this issue are here: http://www.gpugrid.net/forum_thread.php?id=5000&nowrap=true#52924 http://www.gpugrid.net/forum_thread.php?id=5000&nowrap=true#52920 The project is in the middle of changing the application to a newer version; hopefully, when the new application (ACEMD3) is released, these issues will be smoothed out. | |
ID: 52933 | Rating: 0 | rate: / Reply Quote | |
... when the new Application is released (ACEMD3)... I am curious WHEN this will be the case. | |
ID: 52934 | Rating: 0 | rate: / Reply Quote | |
... when the new Application is released (ACEMD3)... I think you speak for all of us on this point.... | |
ID: 52935 | Rating: 0 | rate: / Reply Quote | |
At one point, when I saw the acemd2 long-task buffer dwindle down, I thought it was in preparation for the project deprecating the acemd2 applications and moving on to the new acemd3 applications. | |
ID: 52936 | Rating: 0 | rate: / Reply Quote | |
Sorry for my English. | |
ID: 52943 | Rating: 0 | rate: / Reply Quote | |
Sorry for my English. This post here applies to your issues as well: http://www.gpugrid.net/forum_thread.php?id=4954&nowrap=true#52933 | |
ID: 52944 | Rating: 0 | rate: / Reply Quote | |
Thank you. | |
ID: 52952 | Rating: 0 | rate: / Reply Quote | |
Sorry for my English. | |
ID: 53135 | Rating: 0 | rate: / Reply Quote | |
Is my graphics card too old for this? :\ Yes. | |
ID: 53145 | Rating: 0 | rate: / Reply Quote | |
Message boards : Number crunching : failing tasks lately