
Message boards : Graphics cards (GPUs) : Two Computer Errors

Paul D. Buck
Message 6555 - Posted: 11 Feb 2009 | 8:30:21 UTC

Well this is depressing. Two tasks with compute errors. The good news is that the errors are different and they happened on different systems.

The first error is "ERROR: tclutil.cu, line 23: get_Dvec() not a 3 vector", which I have not seen at all on the boards.

The second is "Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E", which I believe I have seen before ...

Though I do find it interesting that they seem to be the same "class" task from the task name:

jh21064-SMD05-0-1-SH2_SMD_1_0
ik16247-SMD01-1-4-SH2_SMD_1_0

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 6556 - Posted: 11 Feb 2009 | 8:54:04 UTC - in response to Message 6555.

The job names are SMD05 and SMD01, so they are also different.

gdf
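
For readers comparing task names, here is a minimal sketch of how the series token could be pulled out of a name. The field layout (id, series, two counters, suffix, separated by hyphens) is an assumption inferred only from the names quoted in this thread, not a documented naming scheme, and `series_of` is a hypothetical helper.

```python
import re

# Assumed layout, inferred only from names quoted in this thread:
#   <id>-<series>-<counter>-<counter>-<suffix>
TASK_NAME = re.compile(
    r"^(?P<id>[^-]+)-(?P<series>[^-]+)-(?P<a>\d+)-(?P<b>\d+)-(?P<suffix>.+)$"
)

def series_of(task_name):
    """Return the series token (e.g. 'SMD05') from a task name."""
    match = TASK_NAME.match(task_name)
    if match is None:
        raise ValueError(f"unexpected task name layout: {task_name!r}")
    return match.group("series")

names = [
    "jh21064-SMD05-0-1-SH2_SMD_1_0",
    "ik16247-SMD01-1-4-SH2_SMD_1_0",
]
series = [series_of(name) for name in names]
print(series)
```

Under that assumed layout the two failed tasks share the suffix SH2_SMD_1_0 but belong to different series, which is the distinction being made here.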

Paul D. Buck
Message 6557 - Posted: 11 Feb 2009 | 9:27:03 UTC - in response to Message 6556.

The job names are SMD05 and SMD01, so they are also different.

gdf


Ok, well, they both died ... they have that in common ...

Since I have been running like forever with no errors ... well ... this is worrisome ...

ignasi
Message 6558 - Posted: 11 Feb 2009 | 11:25:09 UTC - in response to Message 6557.
Last modified: 11 Feb 2009 | 11:55:55 UTC

No reason to get depressed Paul.

We've probably spotted the source of the error: it has to do with the nature of the WU type and its input parameters.
These SMD* series (SMD01, SMD02, SMD05, SMD10) are one-off tests needed to improve the performance of the main WUs (the SH2_US_* series). Improved performance here means quicker convergence to the goal of these simulations, which is to obtain results equivalent to the experimental values reported for the interaction affinity of our main system of study, the SH2-ligand complex.

To analyze the SMD* WUs properly, we changed the frequency at which a certain output file is written. This may have caused the problem by pushing the size of that output file above the limits set in the templates.
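
A rough sketch of why a higher write frequency can trip a fixed size limit. Every number below is made up for illustration, since the thread does not give the real step count, record size, or template limit, and `output_size_bytes` is a hypothetical helper.

```python
def output_size_bytes(total_steps, write_every, bytes_per_record):
    """Estimate output file size when one record is written every `write_every` steps."""
    records = total_steps // write_every
    return records * bytes_per_record

# All values are hypothetical placeholders, not the project's real settings.
MAX_NBYTES = 10_000_000   # size limit assumed to be set in the template
TOTAL_STEPS = 1_000_000   # simulation steps per WU
BYTES_PER_RECORD = 100    # bytes appended per write

for write_every in (1000, 5):
    size = output_size_bytes(TOTAL_STEPS, write_every, BYTES_PER_RECORD)
    status = "exceeds limit" if size > MAX_NBYTES else "within limit"
    print(f"write every {write_every} steps: {size} bytes ({status})")
```

The file size scales inversely with the write interval, so a limit that was comfortable for the original interval can be exceeded once the file is written much more often.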

We are working on a workaround for this problem. Sorry for the inconvenience.

ignasi

ignasi
Message 6565 - Posted: 11 Feb 2009 | 14:50:46 UTC - in response to Message 6558.
Last modified: 11 Feb 2009 | 16:31:48 UTC

It should be fixed.
Please report any malfunction.

New WUs look like this one:
lF22075-SMD10_1-0-1-SH2

Expect shorter computation times for the SMD10_1 set.

thanks,
ignasi

Paul D. Buck
Message 6568 - Posted: 11 Feb 2009 | 16:09:43 UTC - in response to Message 6558.

No reason to get depressed Paul.


Sadly, I don't need a reason to get depressed. The state of my life ...

BUT, the important thing is that "we", (me and the mouse in my pocket?), discovered the problem and that is the main point. Every project has those tasks that fail and it is just part of the business...

I just want to see that the problems get fixed early and often ... :)

Thanks for the feedback ...

ignasi
Message 6572 - Posted: 11 Feb 2009 | 17:59:46 UTC - in response to Message 6568.

Unfortunately, there seems to be something else wrong with these uncommon WUs (the SMD0*_1 series).

They will be totally discontinued for now.

Sorry for the inconvenience,
ignasi

Paul D. Buck
Message 6573 - Posted: 11 Feb 2009 | 18:22:22 UTC - in response to Message 6572.

Unfortunately, there seems to be something else wrong with these uncommon WUs (the SMD0*_1 series).

They will be totally discontinued for now.

Sorry for the inconvenience,
ignasi


Thank you for cancelling them ...

For all those lurking ... do a manual update to flush the bad tasks ... the server will cancel them for you ... thank you for watching ... :)

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 6575 - Posted: 11 Feb 2009 | 19:36:25 UTC

Thanks for the detailed feedback, it's appreciated :)

MrS
____________
Scanning for our furry friends since Jan 2002

rebirther
Message 6674 - Posted: 16 Feb 2009 | 12:33:52 UTC

http://www.ps3grid.net/result.php?resultid=310418

Maximum disk usage exceeded. I am confused?!

4GB HDD free ^^, WU finished completely.

ignasi
Message 6677 - Posted: 16 Feb 2009 | 16:19:41 UTC - in response to Message 6674.
Last modified: 16 Feb 2009 | 16:20:09 UTC

http://www.ps3grid.net/result.php?resultid=310418

Maximum disk usage exceeded. I am confused?!

4GB HDD free ^^, WU finished completely.


What disk usage preferences do you have?

rebirther
Message 6688 - Posted: 16 Feb 2009 | 21:35:18 UTC - in response to Message 6677.
Last modified: 16 Feb 2009 | 21:41:34 UTC

http://www.ps3grid.net/result.php?resultid=310418

Maximum disk usage exceeded. I am confused?!

4GB HDD free ^^, WU finished completely.


What disk usage preferences do you have?


Sorry for the late answer. I allow almost 100% of the total disk space to be used:

<disk_interval>60.000000</disk_interval>
<disk_max_used_gb>100.000000</disk_max_used_gb>
<disk_max_used_pct>100.000000</disk_max_used_pct>
<disk_min_free_gb>0.000000</disk_min_free_gb>
<vm_max_used_pct>75.000000</vm_max_used_pct>
<ram_max_used_busy_pct>90.000000</ram_max_used_busy_pct>
<ram_max_used_idle_pct>100.000000</ram_max_used_idle_pct>

The WU ran for around 20,000 sec.

RyanChen
Message 6709 - Posted: 17 Feb 2009 | 8:49:45 UTC

I got the same error here.
My disk usage preference settings:

<disk_interval>60</disk_interval>
<disk_max_used_gb>100</disk_max_used_gb>
<disk_max_used_pct>50</disk_max_used_pct>
<disk_min_free_gb>0.1</disk_min_free_gb>
<vm_max_used_pct>80</vm_max_used_pct>
<ram_max_used_busy_pct>75</ram_max_used_busy_pct>
<ram_max_used_idle_pct>90</ram_max_used_idle_pct>
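
As a sanity check on settings like these, the sketch below parses the fragment and estimates how much disk the client would allow in total. The combination rule (smallest of the absolute cap, the percentage cap, and the free-space cap) is my understanding of how BOINC applies these preferences and should be treated as an assumption; the disk figures and the `allowed_disk_gb` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Preference fragment as quoted above; it has no root element, so one is added.
PREFS_FRAGMENT = """
<disk_interval>60</disk_interval>
<disk_max_used_gb>100</disk_max_used_gb>
<disk_max_used_pct>50</disk_max_used_pct>
<disk_min_free_gb>0.1</disk_min_free_gb>
"""

def parse_prefs(fragment):
    """Wrap the fragment in a dummy root and return {tag: float value}."""
    root = ET.fromstring(f"<prefs>{fragment}</prefs>")
    return {child.tag: float(child.text) for child in root}

def allowed_disk_gb(prefs, total_gb, free_gb, boinc_used_gb):
    """Assumed rule: the smallest of the three caps wins."""
    by_absolute = prefs["disk_max_used_gb"]
    by_percent = total_gb * prefs["disk_max_used_pct"] / 100.0
    by_free = boinc_used_gb + free_gb - prefs["disk_min_free_gb"]
    return min(by_absolute, by_percent, by_free)

prefs = parse_prefs(PREFS_FRAGMENT)
# Hypothetical host: 200 GB disk, 4 GB free, BOINC currently using 1 GB.
allowance = allowed_disk_gb(prefs, total_gb=200.0, free_gb=4.0, boinc_used_gb=1.0)
print(f"allowed for BOINC: {allowance:.1f} GB")
```

Even with only a few GB free, the overall allowance stays in the GB range, so a "Maximum disk usage exceeded" error on a single task more plausibly comes from a per-task limit, which would fit ignasi's earlier explanation about the output file exceeding the template limits.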

ignasi
Message 6713 - Posted: 17 Feb 2009 | 10:32:38 UTC

That is more than enough.

Anyway, we have just cancelled all these WUs for the moment. The main concern here is not to waste your crunching time.

sorry for that,
ignasi

rebirther
Message 6714 - Posted: 17 Feb 2009 | 10:57:25 UTC - in response to Message 6713.

That is more than enough.

Anyway, we have just cancelled all these WUs for the moment. The main concern here is not to waste your crunching time.

sorry for that,
ignasi


Please run test WUs under the "run test applications" setting in prefs. This avoids most computation errors and WUs being aborted when the server cancels them. Not everyone can keep checking their hosts, and they end up wasting much more time.

Bender10
Message 6716 - Posted: 17 Feb 2009 | 13:48:19 UTC - in response to Message 6714.
Last modified: 17 Feb 2009 | 14:01:38 UTC

Please run test WUs under the "run test applications" setting in prefs. This avoids most computation errors and WUs being aborted when the server cancels them. Not everyone can keep checking their hosts, and they end up wasting much more time.


I thought all (GPU) WUs run here were TEST WUs......?? The GPUgrid portion of this project is still Beta, right..?

Or maybe I missed a memo...
____________


Consciousness: That annoying time between naps......

Experience is a wonderful thing: it enables you to recognize a mistake every time you repeat it.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 6727 - Posted: 17 Feb 2009 | 19:44:48 UTC - in response to Message 6716.

I thought all (GPU) WUs run here were TEST WUs......?? The GPUgrid portion of this project is still Beta, right..?

Or maybe I missed a memo...


I agree.. so maybe I also missed the memo ;)

(I think Rebirther suggests using the *new* BOINC functionality to handle special test WUs, which are more of a test than the normal test WUs.)

MrS
____________
Scanning for our furry friends since Jan 2002
