Message boards : Graphics cards (GPUs) : What do "results" look like, why no independent validation?

Profile JStateson
Joined: 31 Oct 08
Posts: 183
Credit: 3,327,276,529
RAC: 6,665
Message 6300 - Posted: 1 Feb 2009 | 18:40:40 UTC

I have never seen any "results" on this project though it is not as if all other projects return visible results.

The only results I have ever seen are those showing info about the hardware: milliseconds per step, elapsed time, type of GPU, etc., including computation errors.

I do not see any way to compare the results I return to GPUGRID with the results returned by other participants.

How does one know that the results returned are actually valid? Unlike other projects there appear to be no wingmen who process the same WU and thus perform a sanity check that the results match.

The reason I ask is that it has become apparent on the SETI CUDA forum that once an Nvidia display error occurs, subsequent CUDA work units can be processed incorrectly without any computation error showing up. In addition, the same problem on a wingman's system can seemingly provide confirmation that an invalid result is actually good.
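
To illustrate the concern, here is a toy sketch of wingman-style validation. It assumes results are plain lists of floats compared within a relative tolerance; the function names, the tolerance, and the sample values are hypothetical, and this is not the actual BOINC or SETI validator.

```python
import math

def results_agree(a, b, rel_tol=1e-5):
    """True if two result vectors match element-wise within rel_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def validate_pair(mine, wingman, rel_tol=1e-5):
    """Accept a result only if the wingman's copy agrees; otherwise return None."""
    return mine if results_agree(mine, wingman, rel_tol) else None

good = [1.0, 2.0, 3.0]
bad = [1.0, 2.5, 3.0]  # hypothetical output from a GPU left in a corrupted state

print(validate_pair(good, good))  # both hosts healthy: result accepted
print(validate_pair(good, bad))   # one host corrupted: mismatch, nothing accepted
print(validate_pair(bad, bad))    # both corrupted the same way: bad result accepted
```

The last case is the worrying one: a comparison between two hosts only catches errors that differ between the hosts.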

Question: If a SETI CUDA work unit leaves the Nvidia board in some corrupted state that renders subsequent SETI CUDA work units invalid, how does one know that a GPUGRID WU processed afterwards does not also return a messed-up result?

peace

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 6332 - Posted: 2 Feb 2009 | 22:47:35 UTC - in response to Message 6300.

That's surely a question worth asking.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1947
Credit: 629,356
RAC: 0
Message 6333 - Posted: 3 Feb 2009 | 0:15:03 UTC - in response to Message 6332.

Hi,
in molecular simulations there is no easy (automatic) way to check whether results are correct. It most likely depends on the specific simulation.
So far, we have found only very few bad results, for instance ones with truncated output.

This high rate of good results is due to the fact that bad results get discarded, because they generate errors in the following WUs (just 5 errors will abort the WU). So, the fact that the output of one WU is used as the input of another WU (most likely delivered to another host) prevents errors from propagating to the point where we analyze the results.
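
The chaining described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the project's server code: `run_step`, `run_chain`, the range check, and the toy update rule are all hypothetical. Only the limit of 5 errors comes from the post.

```python
MAX_ERRORS = 5  # from the post: "just 5 errors will abort the WU"

def run_step(state, host_is_faulty):
    """Hypothetical WU: advance a toy state, refusing input that is clearly corrupted."""
    if not all(abs(x) < 1e6 for x in state):
        raise ValueError("input state is unusable")
    new_state = [x * 0.5 + 1.0 for x in state]
    if host_is_faulty:
        new_state[0] = float("inf")  # silent corruption: no error raised on this host
    return new_state

def run_chain(initial_state, faulty_steps, n_steps):
    """Feed each WU's output into the next WU; abort a WU after MAX_ERRORS failures.

    Returns (steps_completed, final_state), with final_state None if the chain aborted.
    """
    state = initial_state
    for step in range(n_steps):
        errors = 0
        while True:
            try:
                # Each retry stands for the same WU being sent to another host.
                state = run_step(state, host_is_faulty=(step in faulty_steps))
                break
            except ValueError:
                errors += 1
                if errors >= MAX_ERRORS:
                    return step, None  # bad input never reaches the analysis stage
    return n_steps, state

print(run_chain([0.0, 0.0], set(), 4))  # healthy chain runs to the end
print(run_chain([0.0, 0.0], {1}, 4))    # corruption at step 1 makes step 2 fail 5 times
```

Note that this only catches corruption gross enough to make the next WU fail. A slightly wrong but plausible number would pass through this kind of check.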

Hope it helps.

gdf

Profile Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Message 6335 - Posted: 3 Feb 2009 | 4:32:32 UTC - in response to Message 6333.


This is one of those areas that sadly most projects neglect: explaining to the participants what the project is doing and how it is doing it. In the dark ages of history I used to try to capture nuggets like these and then flesh them out so that the participant base could better understand what the project is doing.

My gut feeling is that one of the reasons we have so much difficulty attracting new and less committed participants is that almost no information about what the projects are doing actually makes it out in any organized fashion.

That was why I had pushed so hard for a BOINC-wide wiki, so that we could develop the explanations of what each project was doing and how the experiments worked. Sadly the only project that took this task seriously (or was it just one guy on one project?) was CPDN, where the mechanics of each experiment and model were explained in non-technical ways so that you could understand the point of the work we are doing ...

ExtraTerrestrial Apes
Message 6363 - Posted: 3 Feb 2009 | 20:56:43 UTC - in response to Message 6333.

I think the main question is: if a fault does not lead to an obvious computation error but rather to a slightly wrong number here and there, can this be detected without a wingman? Depending on how chaotic the system is, this could lead to big errors in the final results, or it could easily be corrected by the following WUs.
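
As a rough illustration of how a tiny error can grow in a chaotic system, here is a sketch using the logistic map as a stand-in. This is an assumption made for illustration only: it is not a molecular simulation, and the starting value and error size are arbitrary.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), which is chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

clean = logistic_trajectory(0.3, 100)
faulty = logistic_trajectory(0.3 + 1e-12, 100)  # one slightly wrong number at the start

# Distance between the two runs after each step.
diffs = [abs(a - b) for a, b in zip(clean, faulty)]

print("after 5 steps:", diffs[5])
print("largest gap within 100 steps:", max(diffs))
```

The gap stays tiny for the first few steps and later becomes as large as the values themselves, so the individual trajectory is lost even though nothing ever raised an error.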

MrS

Profile JStateson
Message 6375 - Posted: 4 Feb 2009 | 1:52:53 UTC - in response to Message 6363.


Thanks for the observation ETA. It is nice to know that not everyone smokes the same stuff here.

Profile Paul D. Buck
Message 6378 - Posted: 4 Feb 2009 | 5:43:49 UTC - in response to Message 6363.


Or the system could depend on the chaos in the result stream to "properly" allow the system to diverge along the potential paths and only the statistical aggregation of all of the models is of interest.

If I recall correctly this is something of the nature of what CPDN is doing ... though they are not using the output of one model to feed the next ... The only other project that I can think of that is using the output of models to feed forward is Milky Way ...

Profile GDF
Message 6383 - Posted: 4 Feb 2009 | 9:12:06 UTC - in response to Message 6378.

Either it recovers, as the system will move towards the right sampling, or it will fail. This applies to non-systematic errors. A card which produces continuous memory errors will simply fail the workunits.
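
A minimal sketch of the "it recovers" case, assuming a much simpler model than a real simulation: one particle in a harmonic well with overdamped Langevin dynamics. The function name, parameters, and kick size are hypothetical. Both runs use the same random noise so that the effect of the one-off error can be isolated.

```python
import math
import random

def langevin(x0, steps, kick_at=None, kick=0.0, seed=1, dt=0.01, k=1.0, temp=1.0):
    """Overdamped Langevin dynamics in a harmonic well (Euler-Maruyama scheme)."""
    rng = random.Random(seed)
    x = x0
    for step in range(steps):
        if step == kick_at:
            x += kick  # one-off, non-systematic error
        # Restoring force pulls x back towards 0; the noise term models the thermostat.
        x += -k * x * dt + math.sqrt(2.0 * temp * dt) * rng.gauss(0.0, 1.0)
    return x

reference = langevin(0.0, 6000)
perturbed = langevin(0.0, 6000, kick_at=1000, kick=50.0)

print("difference at the end:", abs(perturbed - reference))
```

In this toy model the kicked run relaxes back onto the reference run. A real, chaotic simulation would not return to the same trajectory, only to sampling the same distribution, which is the sense of "right sampling" here.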

gdf
