Message boards : Graphics cards (GPUs) : Recent hard drive failure

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 13175 - Posted: 14 Oct 2009 | 23:40:55 UTC

I recently had a WD Velociraptor refuse to boot.

Before the failure, I noticed the hard drive was making noise. Well, I thought, hardware failure, what are you going to do? So I RMAed the faulty hard drive, thinking there was some manufacturing fault.

So, the new drive arrives. I reload Windows, drivers, etc. Then I start up BOINC for the first time on the new drive. As soon as the GPUGRID WUs load up, the hard drive (the new one) starts making the same noises as the previous failed drive.

Does GPUGRID obey the disk drive preferences of normal BOINC applications? I am boycotting GPUGRID until something changes.

GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Message 13195 - Posted: 16 Oct 2009 | 22:49:15 UTC - in response to Message 13175.

That's decided by the client itself. Sorry, you'll have to blame something else.

gdf

zpm
Joined: 2 Mar 09
Posts: 159
Credit: 13,639,818
RAC: 0
Message 13196 - Posted: 16 Oct 2009 | 23:43:07 UTC - in response to Message 13195.
Last modified: 16 Oct 2009 | 23:43:41 UTC

Raptors have sometimes had a bad rap for coming out bad...

Best thing I can suggest: a solid-state drive.

In the BOINC settings, change the write-to-disk time... that may help.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 13197 - Posted: 17 Oct 2009 | 1:26:22 UTC - in response to Message 13196.

Sorry if I gave you a bad rap, but no other BOINC project has the same effect.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 13198 - Posted: 17 Oct 2009 | 1:32:02 UTC - in response to Message 13196.

Raptors have sometimes had a bad rap for coming out bad...

Best thing I can suggest: a solid-state drive.

In the BOINC settings, change the write-to-disk time... that may help.


I doubled the write-to-disk time from 30 sec. to 60 sec., seemingly to no effect.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 13199 - Posted: 17 Oct 2009 | 1:35:46 UTC - in response to Message 13198.

The rig in question has 12 GB of good RAM.

I'm wondering why the disk is active at all.

Nognlite
Joined: 9 Nov 08
Posts: 69
Credit: 25,106,923
RAC: 0
Message 13201 - Posted: 17 Oct 2009 | 3:18:26 UTC - in response to Message 13199.
Last modified: 17 Oct 2009 | 3:20:23 UTC

I found that the indexing service in Vista accesses the drive quite frequently, so I shut it off. It helped the problem a bit, but I have the same drive access problems you describe. One drive is a 150 GB Raptor and the other is a 500 GB Seagate.

I have also tried setting the disk access interval to 300 sec, with no success.

Looks like another feature that does not work, like remote access using the BOINC client, but that is for another thread. I understand that BOINC uses checkpoints, and I expect the drive to be accessed, but not as often as it is.


Pat

MarkJ
Volunteer moderator
Volunteer tester
Joined: 24 Dec 08
Posts: 738
Credit: 200,909,904
RAC: 0
Message 13202 - Posted: 17 Oct 2009 | 3:53:44 UTC

Whether a task checkpoints, and how often, depends on the science app. BOINC has the setting that you have adjusted, but it's still up to the science app how often it does a write. Usually there is little overhead on a disk write, as the files are fairly small but updated frequently, so they stay in the cache.

On all my crunching rigs I have the print spooler and the indexing services disabled, so there is less competition for disk access. The drive LED is blinking every 2 seconds, but I have i7s, so typically 8-10 tasks are running at a time, all doing their checkpoints and result files.
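
A rough way to see why a multi-core rig keeps its drive LED blinking: the sketch below computes an upper bound on checkpoint writes per minute. It assumes every running task checkpoints as soon as the BOINC interval allows, which is a simplification, since the actual frequency is up to each science app. The function name and the task count of 10 are illustrative only.

```python
def max_checkpoint_writes_per_minute(tasks: int, interval_seconds: float) -> float:
    """Upper bound on checkpoint writes per minute across all running tasks.

    Assumes each task writes exactly once per interval, the most the
    'write to disk at most every N seconds' preference would permit.
    """
    if tasks < 0:
        raise ValueError("tasks must not be negative")
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return tasks * 60.0 / interval_seconds


# Illustrative figures: 10 concurrent tasks at the intervals tried in this thread.
for interval in (30, 60, 300):
    writes = max_checkpoint_writes_per_minute(10, interval)
    print(f"{interval:>3} s interval: at most {writes:g} checkpoint writes per minute")
```

With 10 tasks, a 60-second interval allows at most 10 checkpoint writes a minute and a 300-second interval at most 2, so checkpointing alone should not keep a drive busy continuously.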
____________
BOINC blog

_Ryle_
Joined: 7 Jun 09
Posts: 24
Credit: 1,138,093,416
RAC: 14,161
Message 13208 - Posted: 17 Oct 2009 | 6:58:26 UTC

I second what zpm said about the Velociraptors being faulty.

There's a thread about it over on the StorageReview forum; maybe you should read it.

http://forums.storagereview.net/index.php?showtopic=27303&st=50

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 13224 - Posted: 18 Oct 2009 | 20:55:28 UTC - in response to Message 13208.

I second what zpm said about the Velociraptors being faulty.

There's a thread about it over on the StorageReview forum; maybe you should read it.

http://forums.storagereview.net/index.php?showtopic=27303&st=50



Excellent link, very informative. Thank you.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14104 - Posted: 3 Jan 2010 | 16:26:50 UTC - in response to Message 13224.

Well, I RMAed the Velociraptor and flashed the drive with new firmware, but the hard drive still only calms down if I turn off GPUGrid. With GPUGrid running, the drive is working constantly. If I run a game, even with low graphics, the drive begins to rattle or clatter. I wonder if anyone else is experiencing this same issue. Perhaps there is a workaround.

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 115,639
Message 14119 - Posted: 4 Jan 2010 | 15:14:58 UTC - in response to Message 14104.

While the Velociraptor firmware problem is a serious issue, it doesn't seem that it would explain the disk activity you're seeing while running GPUGRID.

I'm also running GPUGRID under Vista SP2 and BOINC client 6.10.18, and looking at the process in the task manager shows exactly the behavior one would expect: a low memory footprint (around 60 megs), minimal I/O and minimal page faults. No untoward disk activity.

So what's different?

There are two significant differences between your machine and mine. The first is that you have two GPUs while I only have one, and the second is that you are running a later driver version (195 vs. my 191). If I had to guess, it's either a driver problem or an issue with dual GPUs that's causing the disk access.

You said that the disk access only occurs with GPUGRID -- are you running any other CUDA projects? I doubt it's an issue with GPUGRID (*nobody* else has ever reported anything like this to my knowledge), but it might have something to do with any project that uses the GPU.

Oh, one thing you could check: are you running BOINC as a service? My understanding is that when using CUDA, BOINC shouldn't be a service. I don't know what exactly breaks, but maybe this is what happens? Just a shot in the dark here; I could be barking up the wrong tree altogether.

Mike

P.S. You mentioned changing your checkpoint interval from 30 to 60 seconds. On this machine, I increased the interval to 300 seconds (5 minutes). I don't suspend tasks while the user is active, and the task-switch interval is 24 hours (allowing most tasks to complete in one shot), so I don't have a lot of tasks being preempted. Checkpointing slows the tasks down and keeps the disk busy (especially with multi-core & multi-GPU systems), and isn't really necessary if you're not preempting the tasks frequently.
____________
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14121 - Posted: 4 Jan 2010 | 23:26:34 UTC - in response to Message 14104.
Last modified: 4 Jan 2010 | 23:28:04 UTC

BOINC is running as an application. I turned off Windows Search and updated my NVIDIA drivers. Still, the hard drive is working overtime and occasionally hiccups (the drive stops and becomes quiet), interrupting graphics applications such as games. Oddly, if there is a hiccup while I'm at the desktop, the clock's second hand keeps running. I guess the clock runs in memory. Still, if I suspend GPUGRID in BOINC manager, the drive quiets down. I changed the checkpoint interval to 300 seconds. No effect. Check local prefs, also no effect. No other CUDA applications are currently active. (More than not active, none have ever been loaded.)

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 115,639
Message 14123 - Posted: 5 Jan 2010 | 0:15:48 UTC - in response to Message 14121.

As a diagnostic tool, you might want to try connecting to one of the other CUDA projects such as SETI or Milkyway just to see if you have the same problem with those. That will at least let you know if it's a generic problem with the CUDA installation on your system or something specific to GPUGRID.

Those two projects have both CPU and GPU applications, so before you download any tasks go to the preferences part of "your account" on their website to deselect the CPU tasks. For testing purposes, Milkyway is probably best -- its WUs are VERY short. Just set your cache to 0 so you don't download a gazillion WUs, and you should get only a single task for testing. Those only take about 15 minutes to run.

Also, SETI is down until at least Tuesday morning PST, so Milkyway is your best bet for this test.
____________
Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 19,482
Message 14133 - Posted: 6 Jan 2010 | 2:41:53 UTC - in response to Message 14121.

BOINC is running as an application. I turned off Windows Search and updated my NVIDIA drivers. Still, the hard drive is working overtime and occasionally hiccups (the drive stops and becomes quiet), interrupting graphics applications such as games. Oddly, if there is a hiccup while I'm at the desktop, the clock's second hand keeps running. I guess the clock runs in memory. Still, if I suspend GPUGRID in BOINC manager, the drive quiets down. I changed the checkpoint interval to 300 seconds. No effect. Check local prefs, also no effect. No other CUDA applications are currently active. (More than not active, none have ever been loaded.)


Do you have an antivirus program allowed to run when it chooses? Mine (Norton Internet Security 2010) keeps the disk active about half the time and does NOT seem to offer a way to pause the disk accesses when desired.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14134 - Posted: 6 Jan 2010 | 2:53:04 UTC - in response to Message 14123.

Milkyway is keeping the hard drive fairly inactive. Unlike GPUGRID, I hear no hard drive noise.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14135 - Posted: 6 Jan 2010 | 2:54:46 UTC - in response to Message 14133.

While I won't publicly reveal which antivirus/firewall I use, rest assured there is no issue there.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14136 - Posted: 6 Jan 2010 | 2:57:28 UTC - in response to Message 14135.

Mr. Goetz,

Do you suspect the CUDA driver may be an issue? If so, can you help me to roll back? Where is the download for previous versions of nVidia drivers?

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 115,639
Message 14137 - Posted: 6 Jan 2010 | 3:41:04 UTC - in response to Message 14134.

Milkyway is keeping the hard drive fairly inactive. Unlike GPUGRID, I hear no hard drive noise.


That's decidedly odd. At this point, I admit to being totally stumped. I'm out of ideas. I can't think of anything that would cause this to happen with one CUDA application but not another.

Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 115,639
Message 14138 - Posted: 6 Jan 2010 | 3:49:16 UTC - in response to Message 14136.

Mr. Goetz,

Do you suspect the CUDA driver may be an issue? If so, can you help me to roll back? Where is the download for previous versions of nVidia drivers?


Go here:

http://www.nvidia.com/Download/Find.aspx?lang=en-us

Enter the correct info (GTX260, Vista 64, etc.), and you'll get a page that lists all the archived versions of the driver.

I did notice another difference -- I'm running 32 bit and you're running 64 bit.

Maybe one of the GPUGRID project guys has an idea what's going on. There's really no reason a project should be doing disk access like that.

Mike

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 14187 - Posted: 12 Jan 2010 | 23:35:11 UTC - in response to Message 14138.

It is most likely the Windows Indexing Service (search) thrashing the drive.
Turn it off!

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14193 - Posted: 13 Jan 2010 | 20:40:12 UTC - in response to Message 14187.
Last modified: 13 Jan 2010 | 20:40:47 UTC

most likely the Windows Indexing Service (search) thrashing the drive.


Already done away with.

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14210 - Posted: 16 Jan 2010 | 16:46:04 UTC - in response to Message 14193.
Last modified: 16 Jan 2010 | 16:47:44 UTC

I've now begun to run Milkyway half of the time just to give my hard drive a rest. I have flashed the latest eVGA BIOS on the mobo. Still no answer to why my hard drive thrashes under GPUGRID.

Anyone?

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14211 - Posted: 16 Jan 2010 | 16:50:18 UTC - in response to Message 14210.

I just noticed something. The drive noise stops when I scroll down in the forum. Ever so briefly, but this could be a clue.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 14212 - Posted: 16 Jan 2010 | 18:36:10 UTC - in response to Message 14211.

Perhaps BOINC is writing to the disk too often?
Look at the settings. It should say "Tasks checkpoint to disk at most every 60 seconds". I usually up this to 2 or 3 minutes.

Open BOINC in Advanced View and select Advanced, Preferences, Disk and Memory Usage. Also check around the other settings there.

It still sounds like the disk is caching something. If you're sure it is not the Windows Indexing Service, it might be a disk defragmenter running in the background, a disk sweeper, or an antivirus product.

Try using Task Manager to see what processes are running, and look them up if you don't know what they are.
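
For anyone who would rather confirm the checkpoint setting on disk than in the GUI, here is a minimal sketch. It rests on two assumptions to verify against your own installation: that local preferences are stored in a file named global_prefs_override.xml in the BOINC data directory, and that the checkpoint interval lives in a disk_interval element. The sample XML is made up and parsed inline, so nothing here touches a real BOINC install.

```python
import xml.etree.ElementTree as ET

# Made-up sample of what an override file might contain (assumed layout).
SAMPLE_OVERRIDE = """<global_preferences>
   <disk_interval>300</disk_interval>
</global_preferences>"""


def checkpoint_interval(xml_text: str, default: float = 60.0) -> float:
    """Return the checkpoint interval in seconds, or the default if unset.

    The 60-second default matches the value quoted in this thread; it is
    an assumption here, not something read from BOINC itself.
    """
    root = ET.fromstring(xml_text)
    node = root.find("disk_interval")  # assumed element name
    if node is None or not (node.text or "").strip():
        return default
    return float(node.text)


print("Checkpoint interval:", checkpoint_interval(SAMPLE_OVERRIDE), "seconds")
```

If the value on disk does not match what the GUI shows, the preference change probably never reached the client, which would be worth ruling out before blaming the project.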

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14219 - Posted: 17 Jan 2010 | 4:09:38 UTC - in response to Message 14212.

Perhaps BOINC is writing to the disk too often?
Look at the settings. It should say "Tasks checkpoint to disk at most every 60 seconds". I usually up this to 2 or 3 minutes.

Open BOINC in Advanced View and select Advanced, Preferences, Disk and Memory Usage. Also check around the other settings there.

It still sounds like the disk is caching something. If you're sure it is not the Windows Indexing Service, it might be a disk defragmenter running in the background, a disk sweeper, or an antivirus product.

Try using Task Manager to see what processes are running, and look them up if you don't know what they are.


Tasks Checkpoint... is set to 300 seconds. I increased the maximum disk space to 50 GB, which should include space for the kitchen sink. Clicked on read prefs: no change.

Windows Search service is turned off.

The only other software using CPU time is taskmgr, at a whopping 1% occasionally. No antivirus issues (I run the same software on 5 rigs.), no disk defragmenter, no disk sweeper, no bloody hand under the table, etc.

The rig in question is pretty much loaded with Vista 64 Home Premium, BOINC and a few games gathering dust.

I noticed something else. Now that I am running two CUDA projects (Milkyway and GPUGRID), sometimes a Milkyway task and a GPUGRID task will be running simultaneously. When one of each is running, the sound level from hard drive usage is approximately half the noise of two GPUGRID tasks running at the same time. As I stated before, two Milkyway tasks running together make no appreciable hard drive noise.

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 14224 - Posted: 17 Jan 2010 | 17:27:01 UTC - in response to Message 14219.
Last modified: 17 Jan 2010 | 17:37:28 UTC

OK, I think this is a virtual memory / paging issue, even though you have 12 GB RAM and are using 64-bit Vista SP2!
Try setting the virtual memory to the same as the physical memory, by disabling paging, and see if it makes any difference.

To change VM:
Start, right-click on Computer, select Properties,
Advanced System Settings,
in the System Properties window select the Advanced tab,
under Performance select Settings...
Advanced,
Virtual Memory, Change,
and just make the VM identical to the total RAM:

untick "Automatically manage paging file size for all drives",
then select "No paging file".

Note the warning message!

Stephenish
Joined: 24 Mar 09
Posts: 37
Credit: 35,698,253
RAC: 0
Message 14229 - Posted: 17 Jan 2010 | 21:38:04 UTC - in response to Message 14224.
Last modified: 17 Jan 2010 | 21:46:28 UTC

OK, I did that.

No more paging file. Also, no change in the hard drive noise either.

Rebooted with new settings, still no change.

Restoring page file.

Good guess, though.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,791,311,851
RAC: 9,244,272
Message 14231 - Posted: 17 Jan 2010 | 22:11:27 UTC

Have you tried looking at Task Manager and viewing the optional columns, like I/O Read/Write and I/O Read/Write Bytes?

On my XP system, the real-time virus scanner has read over a terabyte since I last rebooted - that's several multiples of the whole hard disk size. BOINC.exe has read 725MB, and written just under 400MB. Einstein has read 1.3GB, and SETI under 2MB.

That's the sort of comparison you could be looking for. More reads/writes = more noise.
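
The comparison described above can be sketched in a few lines: take two snapshots of each process's cumulative read and write byte counters, then rank processes by bytes moved per second in between. The process names and byte counts below are hypothetical placeholders, not measurements from this thread. In practice the counters could be copied by hand from the Task Manager columns, or collected with a third-party library such as psutil (not used here, to keep the example standard-library only).

```python
from typing import Dict, List, Tuple

# Cumulative (read_bytes, write_bytes) per process name.
Snapshot = Dict[str, Tuple[int, int]]

MB = 1024 * 1024


def io_rates(before: Snapshot, after: Snapshot, seconds: float) -> List[Tuple[str, float]]:
    """Return (process, bytes per second) pairs, busiest process first."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rates = []
    for name, (read_after, write_after) in after.items():
        # A process missing from the first snapshot is treated as starting at zero.
        read_before, write_before = before.get(name, (0, 0))
        moved = (read_after - read_before) + (write_after - write_before)
        rates.append((name, moved / seconds))
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


# Hypothetical counters taken 60 seconds apart.
before = {"gpugrid_app.exe": (10 * MB, 5 * MB), "boinc.exe": (725 * MB, 400 * MB)}
after = {"gpugrid_app.exe": (40 * MB, 95 * MB), "boinc.exe": (726 * MB, 401 * MB)}

ranked = io_rates(before, after, 60.0)
for name, rate in ranked:
    print(f"{name:<18} {rate / MB:8.3f} MB/s")
```

Looking at the rate between two snapshots, rather than the totals since boot, avoids blaming a process that did a lot of I/O hours ago but is idle now.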

skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Message 14241 - Posted: 18 Jan 2010 | 17:43:40 UTC - in response to Message 14231.
Last modified: 18 Jan 2010 | 17:46:12 UTC

Have a good look and see what is going on.

Task Manager, Processes, View, Select Columns,
Select Page Faults, and anything else you want to look into.

You can Right Click on a Task and select, End Process, to see if it makes any difference - you should be able to identify your loud task that way, assuming the issue is not a GPUGrid write to disk issue, or directly related to the hard disk.

P.S. If this does not work, make sure you don't have a screen saver or desktop background that keeps changing/playing. You might also want to have a look in the BIOS for HDD settings, and check for any pending BOINC updates that resolve writing-to-disk problems or noise issues.

O/T For anyone running GPUGrid who has automatic updates on, turn them off. They kill tasks following the forced restarts, and reset some system changes to their defaults.
