
Message boards : Number crunching : The Simulation has become unstable. Terminating to avoid lock-up.

Author Message
Dayle Diamond
Joined: 5 Dec 12
Posts: 84
Credit: 1,663,883,415
RAC: 7,668
Message 38799 - Posted: 3 Nov 2014 | 4:19:35 UTC

My information is below. Any suggestions would be welcomed.


Name I4R48-SDOERR_BARNA5-43-100-RND4453_1
Workunit 10224954
Created 1 Nov 2014 | 12:11:35 UTC
Sent 1 Nov 2014 | 13:54:44 UTC
Received 1 Nov 2014 | 19:04:15 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status -97 (0xffffffffffffff9f) Unknown error number
Computer ID 140554
Report deadline 6 Nov 2014 | 13:54:44 UTC
Run time 15,370.11
CPU time 991.51
Validate state Invalid
Credit 0.00
Application version Long runs (8-12 hours on fastest card) v8.47 (cuda65)
Stderr output
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1342MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r344_32 : 34448
# GPU 0 : 53C
# GPU 0 : 55C
# GPU 0 : 56C
# GPU 0 : 57C
# GPU 0 : 58C
# GPU 0 : 59C
# GPU 0 : 60C
# GPU 0 : 61C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 770000)
# GPU [GeForce GTX 970] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1342MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# Driver version : r344_32 : 34448
# The simulation has become unstable. Terminating to avoid lock-up (1)

</stderr_txt>
]]>
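
The exit status appears above both as 0xffffffffffffff9f and as 0xffffff9f. These look like the 64-bit and 32-bit two's-complement encodings of the same value, -97. A minimal Python sketch to check that reading (the helper names are made up for illustration and are not part of BOINC or the GPUGRID app):

```python
def as_unsigned_hex(value: int, bits: int) -> str:
    """Return the two's-complement encoding of a signed integer as hex."""
    mask = (1 << bits) - 1  # e.g. 0xffffffff for 32 bits
    return hex(value & mask)


def as_signed(raw: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a signed two's-complement value."""
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw


print(as_unsigned_hex(-97, 32))   # 0xffffff9f
print(as_unsigned_hex(-97, 64))   # 0xffffffffffffff9f
print(as_signed(0xffffff9f, 32))  # -97
```

So the two hex strings are the same error code printed at different widths, not two different errors.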

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 38809 - Posted: 4 Nov 2014 | 19:56:59 UTC

This message shows you that something went horribly wrong. If it happens just once the app can usually recover, but if it happens too often or too quickly, it terminates.

Normally your WUs seem fine, so it does not look like a fundamental issue. What I do notice, though, is that your GPU clock of 1342 MHz is rather high. Do you have a heavily factory-overclocked card? If so: the manufacturer may have chosen the clock speed too aggressively. Or if it's your overclock: you may have set the clock too high.

I would let the system run for some more time and watch for errors. If you get more, lower the GPU clock by 26 MHz (one step is 13 MHz, you can't change less than that) and see if it helps.
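
As a rough illustration of the step arithmetic (a sketch only: the 13 MHz step and the 26 MHz reduction come from the advice above, the function name is made up, and nothing here talks to a real card):

```python
STEP_MHZ = 13  # smallest clock change, per the advice above


def lowered_clock(current_mhz: int, steps: int, step_mhz: int = STEP_MHZ) -> int:
    """Return the clock after lowering it by a whole number of steps."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return current_mhz - steps * step_mhz


# 1342 MHz is the device clock reported in the stderr output; two steps = 26 MHz.
print(lowered_clock(1342, 2))  # 1316
```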

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Message 38833 - Posted: 5 Nov 2014 | 23:20:48 UTC - in response to Message 38809.

Normally your WUs seem fine, so it does not look like a fundamental issue. What I do notice, though, is that your GPU clock of 1342 MHz is rather high. Do you have a heavily factory-overclocked card? If so: the manufacturer may have chosen the clock speed too aggressively. Or if it's your overclock: you may have set the clock too high.

MrS

What I have noticed after reading many stderr output files is that the clock listed there is the one the manufacturer has "put in the card", so to speak the speed printed on the box.
For instance, one of my 780Ti cards almost always runs higher than the value listed in the stderr file.
____________
Greetings from TJ

Dayle Diamond
Joined: 5 Dec 12
Posts: 84
Credit: 1,663,883,415
RAC: 7,668
Message 38919 - Posted: 16 Nov 2014 | 6:03:41 UTC

For the record, I've recently activated a pair of GTX 970s. They're EVGA's Super-Superclocked edition. They are running without SLI mode (old motherboard). I haven't overclocked them beyond factory settings.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 38923 - Posted: 16 Nov 2014 | 11:06:58 UTC

By now you seem to be running fine, mostly. I inspected about 5 WUs and only found 1 more error, from which the simulation recovered. What I also notice, though, is that your runtimes are really long for such cards and WUs. How is your CPU & GPU load?

MrS
____________
Scanning for our furry friends since Jan 2002

Dayle Diamond
Joined: 5 Dec 12
Posts: 84
Credit: 1,663,883,415
RAC: 7,668
Message 38951 - Posted: 18 Nov 2014 | 5:02:16 UTC - in response to Message 38923.

Good evening Mr. S and thank you for all of your feedback.

My CPU Load is kept between 99 and 100%.
I'm running an AMD Phenom II X3 720 processor and two GTX 970's.

There's a pretty gaping disparity - I'm upgrading parts as I can afford them, and as awesome stuff gets onto the market. I'm pretty late to the game in learning about buying CPUs, but my understanding is that both AMD and Intel are at the end of their lifespans for their current sockets. Meanwhile the nicer solutions, like 6 and 8 core CPUs or a comparable dual CPU socket motherboard, will remain cost prohibitive for quite a while.

At least I can vouch for the CPU's stability when running CPU tasks like World Community Grid. This processor has done roughly 9,100 work units with that project, and my error rate there is well under 1%.

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 38955 - Posted: 18 Nov 2014 | 16:41:38 UTC - in response to Message 38951.
Last modified: 18 Nov 2014 | 16:42:44 UTC

Good evening Mr. S and thank you for all of your feedback.

My CPU Load is kept between 99 and 100%.
I'm running an AMD Phenom II X3 720 processor and two GTX 970's.

You must also be running CPU WUs. How many?

There's a pretty gaping disparity - I'm upgrading parts as I can afford them, and as awesome stuff gets onto the market. I'm pretty late to the game in learning about buying CPUs, but my understanding is that both AMD and Intel are at the end of their lifespans for their current sockets. Meanwhile the nicer solutions, like 6 and 8 core CPUs or a comparable dual CPU socket motherboard, will remain cost prohibitive for quite a while.

At least I can vouch for the CPU's stability when running CPU tasks like World Community Grid. This processor has done roughly 9,100 work units with that project, and my error rate there is well under 1%.

The fastest (for DC) CPU currently for your socket is the Phenom II X6, available in 95W-125W models. They're available on eBay but usually go for prices that are about what they originally cost (sometimes more). My X6s handle 2 NVidia cards running GPUGRID and 5 CPU WUs from various projects. If you want more wiggle room you can limit it to 4 WUs. There's also the 83xx series which runs 8 cores but which is also quite a bit slower (for DC) per core. A good cooler for any CPU is advised.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Message 38957 - Posted: 18 Nov 2014 | 20:45:06 UTC
Last modified: 18 Nov 2014 | 20:49:54 UTC

I suspect your GPUs would appreciate more CPU support. Limit BOINC's CPU usage with "use at most 66% of CPUs" in the advanced settings. That should make BOINC run one fewer CPU task. Let's see if this lets your GPUs stretch their wings! For WUs like this one your cards should take about 22 ks instead of 28 ks. This won't help stability, though ;)
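
To put the 22 ks versus 28 ks estimate in perspective, here is a small Python sketch of the arithmetic. It only uses the two runtimes quoted above; the function name is made up for illustration, and the real gain will depend on the WU and the system.

```python
from typing import Tuple


def runtime_change(old_ks: float, new_ks: float) -> Tuple[float, float]:
    """Return (fraction of runtime saved, throughput ratio old/new)."""
    saved = (old_ks - new_ks) / old_ks
    speedup = old_ks / new_ks
    return saved, speedup


saved, speedup = runtime_change(28.0, 22.0)
print(f"runtime saved: {saved:.1%}, throughput: {speedup:.2f}x")
# runtime saved: 21.4%, throughput: 1.27x

# Kiloseconds to hours: 1 ks = 1000 s, 1 h = 3600 s.
for ks in (28.0, 22.0):
    print(f"{ks:.0f} ks = {ks * 1000 / 3600:.1f} h")
# 28 ks = 7.8 h
# 22 ks = 6.1 h
```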

Edit: I don't think a CPU upgrade would make all that much sense on that old socket. And you're right, buying 6 or 8 core CPUs is prohibitive and will remain so for some time. On the AMD side I don't see anything changing this anytime soon, whereas on the Intel side the current Haswells are nice but expensive, as always. There's probably going to be an upgrade for that socket to 14 nm Broadwell around summer next year, but around the same time the next CPU architecture step should also arrive. I'm sure I'd want the newer one, but we don't have all facts yet.
To summarize: if one needs or wants to buy now it's fine, otherwise better stuff will come for those patient enough.

MrS
____________
Scanning for our furry friends since Jan 2002

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 38960 - Posted: 19 Nov 2014 | 1:59:35 UTC - in response to Message 38957.
Last modified: 19 Nov 2014 | 2:01:22 UTC

Edit: I don't think a CPU upgrade would make all that much sense on that old socket. And you're right, buying 6 or 8 core CPUs is prohibitive and will remain so for some time. On the AMD side I don't see anything changing this anytime soon, whereas on the Intel side the current Haswells are nice but expensive, as always. There's probably going to be an upgrade for that socket to 14 nm Broadwell around summer next year, but around the same time the next CPU architecture step should also arrive. I'm sure I'd want the newer one, but we don't have all facts yet.

Maybe, maybe not. If he wants to run CPU projects then a simple CPU upgrade for that AM3 socket would give him another 3 cores to play with. As I said the Phenom II X6 CPUs are going for near retail (often more) but there's a reason for that. They're good. Sure Haswell is the latest thing. I've got 15 of them running in my in home shop right now (Folding). For laptops they're GREAT due to low power usage in certain states. For performance desktops: yawn. No real improvement over Ivy Bridge. For DC a lot of it depends on your project. One of my favorite projects is Yoyo and there the Phenom X6 is still king. In other projects Ivy Bridge is the best. We really have to ditch our brand loyalty and paid-performance-site hype and look at the numbers rationally. I've been building/ocing/modding boxes since the 8088/8086 to V20/V30 days. Heard a lot of hype. Been bored by a lot of fans. If I was building a new desktop box right now I'd go with Haswell or Ivy Bridge. Would I ditch an AM3 platform for those: NOT. The desktop CPU landscape of late has been pretty boring. Nothing worth much on either the Intel or AMD fronts. Hopefully that will change, but I wouldn't hold my breath unless you like yourself in blue.

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38966 - Posted: 19 Nov 2014 | 12:47:32 UTC - in response to Message 38960.

Edit: I don't think a CPU upgrade would make all that much sense on that old socket. And you're right, buying 6 or 8 core CPUs is prohibitive and will remain so for some time. On the AMD side I don't see anything changing this anytime soon, whereas on the Intel side the current Haswells are nice but expensive, as always. There's probably going to be an upgrade for that socket to 14 nm Broadwell around summer next year, but around the same time the next CPU architecture step should also arrive. I'm sure I'd want the newer one, but we don't have all facts yet.

Maybe, maybe not. If he wants to run CPU projects then a simple CPU upgrade for that AM3 socket would give him another 3 cores to play with. As I said the Phenom II X6 CPUs are going for near retail (often more) but there's a reason for that. They're good. Sure Haswell is the latest thing. I've got 15 of them running in my in home shop right now (Folding). For laptops they're GREAT due to low power usage in certain states. For performance desktops: yawn. No real improvement over Ivy Bridge. For DC a lot of it depends on your project. One of my favorite projects is Yoyo and there the Phenom X6 is still king. In other projects Ivy Bridge is the best. We really have to ditch our brand loyalty and paid-performance-site hype and look at the numbers rationally. I've been building/ocing/modding boxes since the 8088/8086 to V20/V30 days. Heard a lot of hype. Been bored by a lot of fans. If I was building a new desktop box right now I'd go with Haswell or Ivy Bridge. Would I ditch an AM3 platform for those: NOT. The desktop CPU landscape of late has been pretty boring. Nothing worth much on either the Intel or AMD fronts. Hopefully that will change, but I wouldn't hold my breath unless you like yourself in blue.


The standard 85W quad-core Haswell/Ivy desktop CPUs are readily available - the 25W/35W quad-core Haswell/Ivy parts are hard to find and carry a premium asking price. Lowering the CPU cooling requirement helps a multi-GPU setup. At a higher power draw, AMD runs integer code at or near Haswell/Ivy speeds. AMD's AVX floating point is gimped. AVX runs hotter on Intel (at or above TDP with FMA/AVX).
Intel Broadwell adds 512-bit instruction sets. AVX IPC for Haswell is slightly better than Ivy due to more (2) execution ports.

You're right about CPUs being stagnant. Haswell has been here for a year and a half. But look at the GPGPU side: Nvidia has been with Kepler for 3 years, AMD with GCN for over two. Maxwell is not a true compute architecture yet; the big Maxwell should change this. One thing, though: hardware has been steady while software has opened up in the last few years. There are more tools being released for developers.

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Message 38967 - Posted: 19 Nov 2014 | 16:56:07 UTC - in response to Message 38966.
Last modified: 19 Nov 2014 | 16:58:07 UTC

Maybe, maybe not. If he wants to run CPU projects then a simple CPU upgrade for that AM3 socket would give him another 3 cores to play with. As I said the Phenom II X6 CPUs are going for near retail (often more) but there's a reason for that. They're good. Sure Haswell is the latest thing. I've got 15 of them running in my in home shop right now (Folding). For laptops they're GREAT due to low power usage in certain states. For performance desktops: yawn. No real improvement over Ivy Bridge. For DC a lot of it depends on your project. One of my favorite projects is Yoyo and there the Phenom X6 is still king. In other projects Ivy Bridge is the best. We really have to ditch our brand loyalty and paid-performance-site hype and look at the numbers rationally. I've been building/ocing/modding boxes since the 8088/8086 to V20/V30 days. Heard a lot of hype. Been bored by a lot of fans. If I was building a new desktop box right now I'd go with Haswell or Ivy Bridge. Would I ditch an AM3 platform for those: NOT. The desktop CPU landscape of late has been pretty boring. Nothing worth much on either the Intel or AMD fronts. Hopefully that will change, but I wouldn't hold my breath unless you like yourself in blue.

The standard 85W quad-core Haswell/Ivy desktop CPUs are readily available - the 25W/35W quad-core Haswell/Ivy parts are hard to find and carry a premium asking price. Lowering the CPU cooling requirement helps a multi-GPU setup. At a higher power draw, AMD runs integer code at or near Haswell/Ivy speeds. AMD's AVX floating point is gimped. AVX runs hotter on Intel (at or above TDP with FMA/AVX).
Intel Broadwell adds 512-bit instruction sets. AVX IPC for Haswell is slightly better than Ivy due to more (2) execution ports.

You're right about CPUs being stagnant. Haswell has been here for a year and a half. But look at the GPGPU side: Nvidia has been with Kepler for 3 years, AMD with GCN for over two. Maxwell is not a true compute architecture yet; the big Maxwell should change this. One thing, though: hardware has been steady while software has opened up in the last few years. There are more tools being released for developers.

The Phenom X6 1035T/1045T/1065T and some 1055T CPUs are 95W and to tell the truth they're not that much slower for DC than the 125W 1090T. CPU-Z reports my newest Phenom X6 1045T as 92.6W TDP. The all-time highest producing CPU at Yoyo is a Phenom X6 1035T with a mild OC. Except for a gaggle of Github 40 core Xeons it's also the highest in RAC and even surpasses a few of those (and all the 32 core Xeons). The Haswells are definitely much more efficient in an idle state but for us DC junkies our machines are never at idle :-)

eXaPower
Joined: 25 Sep 13
Posts: 293
Credit: 1,897,601,978
RAC: 0
Message 38970 - Posted: 19 Nov 2014 | 23:02:22 UTC - in response to Message 38967.

All Project Stats has you at #1 in the USA for total credit! (No ASIC projects are included in the rankings.)
For YoYo, excluding the Ivy/Sandy Xeons, your AMD Phenom X6 1045T is above: 6c12t (Gulftown) Westmere > 6c12t Ivy-E > low-power Haswell > standard Haswell > standard Ivy > low-power Ivy.

Haswell Xeons operate at a lower core clock for AVX code, along with lower quad-channel DDR4 IMC speeds. Core clocks are higher for non-AVX code. The core and quad-channel DDR3 IMC speeds of the Ivy Xeons' three die set-ups are higher than those of Haswell's three different die configurations for 4-18C/8T-36T products.

Running full bore, Haswell and Ivy are both efficient. An advanced motherboard BIOS can offer strong energy management. A lower-voltage operating point for any circuitry powered for years at a time will help with longevity.

A caveat: Intel integrated GPU drivers can hinder the power management of discrete Nvidia boards on an Intel motherboard. Almost every SLI laptop with an Intel or Gigabyte motherboard (V3 Gigabyte/Sager/Lenovo) has the Intel iGPU (including Iris Pro) disabled in the BIOS. With a non-OEM BIOS you can enable the iGPU at the expense of CPU performance and of overclocking or eco-tuning the Nvidia card(s). For a multi-GPU set-up on a desktop or server: the Intel iGPU is off-die with Xeons and the X99 platform with Haswell-E, and can be disabled on Z97 multi-GPU motherboards. Is OpenCL included for the X6?




Killersocke
Joined: 18 Oct 13
Posts: 53
Credit: 406,647,419
RAC: 0
Message 39027 - Posted: 25 Nov 2014 | 21:37:54 UTC - in response to Message 38799.

same here

http://www.gpugrid.net/result.php?resultid=13447659

Stderr output
<core_client_version>7.4.27</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -97 (0xffffff9f)
</message>
<stderr_txt>
# GPU [GeForce GTX 760] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 760
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:01:00.0
# Device clock : 1071MHz
# Memory clock : 3004MHz
# Memory width : 256bit
# Driver version : r343_00 : 34475
# GPU 0 : 51C
# GPU 0 : 54C
# GPU 0 : 56C
# GPU 0 : 58C
# GPU 0 : 61C
# GPU 0 : 62C
# GPU 0 : 64C
# GPU 0 : 65C
# GPU 0 : 67C
# GPU 0 : 68C
# GPU 0 : 69C
# GPU 0 : 70C
# GPU 0 : 71C
# GPU 0 : 72C
# GPU 0 : 73C
# GPU 0 : 74C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 152045000)
# GPU [GeForce GTX 760] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 760
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:01:00.0
# Device clock : 1071MHz
# Memory clock : 3004MHz
# Memory width : 256bit
# Driver version : r343_00 : 34475
# The simulation has become unstable. Terminating to avoid lock-up (1)

</stderr_txt>
]]>
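
For comparing logs like the two in this thread, a small script can pull out the peak temperature, the number of instability messages, and the restart steps. This is only a sketch: the regular expressions are inferred from the stderr lines shown here, the function name is made up, and other app versions may format their output differently.

```python
import re

# Patterns inferred from the stderr lines in this thread (assumptions).
TEMP_RE = re.compile(r"#\s*GPU\s+(\d+)\s*:\s*(\d+)C")
RESTART_RE = re.compile(r"#\s*Attempting restart \(step (\d+)\)")
UNSTABLE_MARKER = "The simulation has become unstable"


def summarize_stderr(text: str) -> dict:
    """Summarize temperatures, instability messages and restarts in a log."""
    temps = [int(m.group(2)) for m in TEMP_RE.finditer(text)]
    restarts = [int(m.group(1)) for m in RESTART_RE.finditer(text)]
    return {
        "max_temp_c": max(temps) if temps else None,
        "unstable_count": text.count(UNSTABLE_MARKER),
        "restart_steps": restarts,
    }


# A few lines copied from the GTX 760 log above.
SAMPLE = """\
# GPU 0 : 73C
# GPU 0 : 74C
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 152045000)
# GPU [GeForce GTX 760] Platform [Windows] Rev [3212] VERSION [65]
# The simulation has become unstable. Terminating to avoid lock-up (1)
"""

print(summarize_stderr(SAMPLE))
# {'max_temp_c': 74, 'unstable_count': 2, 'restart_steps': [152045000]}
```

Run over both logs, this shows the GTX 970 peaking at 61C and the GTX 760 at 74C before the first instability message, which makes it easier to spot whether temperature is a common factor.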
