Message boards : Multicore CPUs : Simultaneously starting MCs

DRSMT
Joined: 23 Feb 17
Posts: 21
Credit: 4,872,335,990
RAC: 68,586,062
Message 49378 - Posted: 2 May 2018 | 11:38:08 UTC

Problem with simultaneously starting Multicore CPU tasks has not been fixed yet!

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,672,242,755
RAC: 0
Message 49382 - Posted: 2 May 2018 | 14:48:28 UTC - in response to Message 49378.

Problem with simultaneously starting Multicore CPU tasks has not been fixed yet!

I have heard about this error, but maybe I don't understand the symptoms. I have had three 4-thread WUs start at once; sometimes they work, sometimes they don't. Could this be the cause of my errors? The tasks from the system are linked below:

http://www.gpugrid.net/results.php?hostid=424454

DRSMT
Message 49383 - Posted: 2 May 2018 | 16:15:15 UTC - in response to Message 49382.

If two WUs start at the same time, in most cases one of the WUs fails right at the start and throws a calculation error, which is very inconvenient if you often have to start more than one WU at the same time (in my case up to 20 WUs on my 80-thread machine). I would like to hear a statement from the developer(s), or simply get a bugfix within the WUs, because at the moment I do not see any suitable workaround. I don't think it's just your fault or mine, because several users have been reporting this issue for a while now.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 49614 - Posted: 6 Jun 2018 | 12:29:25 UTC - in response to Message 49383.
Last modified: 6 Jun 2018 | 12:34:48 UTC

If it is still failing, please provide a task number for me to check.

@Thomas: please try to reset the project

captainjack
Joined: 9 May 13
Posts: 171
Credit: 3,627,468,018
RAC: 18,522,614
Message 49615 - Posted: 6 Jun 2018 | 12:32:41 UTC
Last modified: 6 Jun 2018 | 12:36:41 UTC

Toni,

Here are two task numbers that started at the same time. Both failed.

17737243
17737150

Let me know if you need more info.

EDIT:

Here is a task that started by itself and failed.

17714391

Toni
Message 49616 - Posted: 6 Jun 2018 | 12:36:17 UTC - in response to Message 49615.
Last modified: 6 Jun 2018 | 12:37:19 UTC

@captainjack: please try two things for me

1. Open a terminal and run the flock command. See if it gives an error (command not found) or a longer message.
2. Reset the project.

Thanks
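
For anyone following along, the check in step 1 can be scripted roughly like this. It is only a sketch, assuming a POSIX shell; check_cmd is an illustrative helper name, not part of BOINC or the GPUGRID app.

```shell
#!/usr/bin/env bash
# Report whether a command is available on PATH.
# check_cmd is a hypothetical helper written for this example.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: command not found"
    fi
}

# flock normally ships with util-linux on Linux distributions.
check_cmd flock
```

If this prints "flock: found", running flock with no arguments should then give the longer usage message rather than "command not found".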

captainjack
Message 49617 - Posted: 6 Jun 2018 | 12:50:46 UTC

Toni,

The "flock" command was found and asked for more arguments.

After a project reset, the following two tasks were started and both failed.

17737309
17737152

The following task was started by itself and failed.

17737252

What next?

Toni
Message 49618 - Posted: 6 Jun 2018 | 12:55:33 UTC - in response to Message 49617.

:(

Which means that for your host the new app is a regression. I need some enlightenment.

T

Stefan
Project administrator
Project developer
Project tester
Project scientist
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Message 49620 - Posted: 6 Jun 2018 | 13:18:42 UTC

Statistically though the new app seems to have worked on other hosts. We went from 900 WU to 1500 WU in progress.

DRSMT
Message 49621 - Posted: 6 Jun 2018 | 13:20:38 UTC

It's just the same with all my computers... Toni, would it help if I sent you, by private message, the remote access credentials for one of my Linux machines, so you can test it yourself?

Toni
Message 49622 - Posted: 6 Jun 2018 | 13:40:14 UTC - in response to Message 49621.

Thomas, that would help, but perhaps let me ask another thing first:

What OS do you have, and which procedure did you use to install boinc?

(Also, I assume QM tasks were working ok before, right?)

DRSMT
Message 49623 - Posted: 6 Jun 2018 | 13:52:34 UTC - in response to Message 49622.

Sometimes they work and sometimes not. If two or more WUs start at the same time, they all throw calculation errors. That was the state until now. But with the brand-new version you released today, it seems they are not working at all anymore. My operating system is Linux Mint 18.3 64-bit with the current Linux kernel. I installed BOINC with "sudo apt-get install boinc". gcc-5 and g++-5 are installed, as is python-support.

Toni
Message 49624 - Posted: 6 Jun 2018 | 14:16:27 UTC - in response to Message 49623.
Last modified: 6 Jun 2018 | 14:18:08 UTC

Ok thanks. This will need a while to debug. As you can see there is no error info to help (as usual in boinc...).

[VENETO] boboviz
Joined: 10 Sep 10
Posts: 158
Credit: 388,132
RAC: 0
Message 49625 - Posted: 6 Jun 2018 | 14:33:24 UTC
Last modified: 6 Jun 2018 | 14:43:00 UTC

All tasks error out on my VM (with 4 virtual cores).

<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
16:29:25 (2940): wrapper (7.7.26016): starting
16:29:25 (2940): wrapper (7.7.26016): starting
16:29:25 (2940): wrapper: running /bin/bash (-c "flock /var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock ./miniconda-installer -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda")
Please run using "bash" or "sh", but not "." or "source"\n16:29:26 (2940): /bin/bash exited; CPU time 0.000000
16:29:26 (2940): app exit status: 0x1


I do not have an app_config.

Addendum: I also tried stopping all WUs and starting them manually one by one. Same error.

Toni
Message 49626 - Posted: 6 Jun 2018 | 14:57:07 UTC - in response to Message 49624.

Version 3.20 is out.

Toni
Message 49627 - Posted: 6 Jun 2018 | 15:10:13 UTC - in response to Message 49626.

Also, you need the libc6-dev package

sudo apt install libc6-dev
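
To see what is already there before installing anything, something like the following may help. It is a sketch that assumes a Debian-family system with dpkg; pkg_status is a made-up helper name, and the package list is simply the packages mentioned in this thread.

```shell
#!/usr/bin/env bash
# Print "<pkg>: installed", "<pkg>: missing", or "<pkg>: unknown"
# (the last one when dpkg-query is not available on this system).
# pkg_status is a hypothetical helper written for this example.
pkg_status() {
    if ! command -v dpkg-query >/dev/null 2>&1; then
        echo "$1: unknown"
        return 0
    fi
    # dpkg-query exits non-zero for packages it does not know about.
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || status=""
    case "$status" in
        "install ok installed") echo "$1: installed" ;;
        *)                      echo "$1: missing" ;;
    esac
}

# Packages mentioned in this thread; adjust the list to your distribution.
for pkg in gcc libc6-dev python-support; do
    pkg_status "$pkg"
done
```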

bormolino
Joined: 16 May 13
Posts: 41
Credit: 88,126,864
RAC: 683
Message 49629 - Posted: 6 Jun 2018 | 15:29:59 UTC

All tasks exit with error:

14:19:18 (8806): wrapper: running /bin/bash (-c "flock /var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock ./miniconda-installer -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda")
Please run using "bash" or "sh", but not "." or "source"\n14:19:19 (8806): /bin/bash exited; CPU time 0.001596
14:19:19 (8806): app exit status: 0x1

Toni
Message 49630 - Posted: 6 Jun 2018 | 15:32:00 UTC - in response to Message 49629.

These were 3.19. Try again with 3.20.

captainjack
Message 49631 - Posted: 6 Jun 2018 | 15:51:43 UTC

When I tried to start two at the same time, one of them runs okay and the other one aborts.

Task id for the aborted work unit is 17714759
Work unit number for the aborted work unit is 13679491

Running version 3.20

Let me know if you need more info.

DRSMT
Message 49632 - Posted: 6 Jun 2018 | 17:57:54 UTC

I already had libc6-dev installed... Earlier I got a bunch of new WUs, but all of them failed after several minutes of calculation (~5-15 minutes).

STARBASEn
Joined: 17 Feb 09
Posts: 91
Credit: 1,603,303,394
RAC: 0
Message 49634 - Posted: 6 Jun 2018 | 21:16:59 UTC

Started two QC WUs simultaneously using app version 3.20, and one failed with this stderr report:

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
13:45:21 (5569): wrapper (7.7.26016): starting
13:45:21 (5569): wrapper (7.7.26016): starting
13:45:21 (5569): wrapper: running /bin/bash (-c "flock /var/lib/boinc/projects/www.gpugrid.net/miniconda.lock ./miniconda-installer.sh -b -u -p /var/lib/boinc/projects/www.gpugrid.net/miniconda")
flock: failed to execute ./miniconda-installer.sh: Text file busy
13:45:30 (5569): /bin/bash exited; CPU time 0.000265
13:45:30 (5569): app exit status: 0x45
13:45:30 (5569): called boinc_finish(195)

</stderr_txt>
]]>

Machine is an AMD FX8350 at 4.1 GHz with Fedora 28 kernel 4.16.13 with both gcc and glibc-devel installed.

I had been running both of my 8-core machines in 4-core mode with 2 concurrent QC WUs for several months, with only one simultaneous-start error the whole time, but with no other project competing for the CPU. I found that when running both WCG and QC CPU WUs, simultaneous starts occurred more frequently and, unfortunately, when one did happen, the failures would leapfrog through the queue quickly and blacklist the computer from more work for a day.

Toni
Message 49636 - Posted: 6 Jun 2018 | 21:35:46 UTC - in response to Message 49634.

Yes, now simultaneous starts crash with "text file busy". Will look for yet another workaround.

Toni
Message 49640 - Posted: 7 Jun 2018 | 8:32:32 UTC - in response to Message 49636.

Attempting a fix in version 3.21.

PappaLitto
Message 49641 - Posted: 7 Jun 2018 | 10:48:54 UTC

Just got a couple of errors from last night's WUs version 3.20.

Here are the links:
https://www.gpugrid.net/result.php?resultid=17740487
https://www.gpugrid.net/result.php?resultid=17740592

All the rest ran fine.

Toni
Message 49642 - Posted: 7 Jun 2018 | 12:03:25 UTC - in response to Message 49641.

^^ These seem to be connection errors (network down or similar)

PappaLitto
Message 49643 - Posted: 7 Jun 2018 | 12:37:07 UTC - in response to Message 49642.

^^ These seem to be connection errors (network down or similar)

Network can affect computation after the WU is already downloaded?

kain
Joined: 3 Sep 14
Posts: 152
Credit: 834,330,407
RAC: 3,860,166
Message 49644 - Posted: 7 Jun 2018 | 13:25:26 UTC

I have exactly the same problems, so I don't think this is connection-related.

captainjack
Message 49645 - Posted: 7 Jun 2018 | 13:26:02 UTC

Version 3.21 looks promising. Just finished two that started at the same time and they finished normally. Just started three at the same time and they are all processing as they should.

STARBASEn
Message 49646 - Posted: 7 Jun 2018 | 14:31:14 UTC

Success with a simultaneous start of two QC WUs using app 3.21. Both WUs are happily crunching, at 1.098% completed so far.

Event Log Excerpt:

Thu 07 Jun 2018 07:22:04 AM MST | GPUGRID | task m0000000638_2278bdac_n00020-SDOERR_QMML50-0-1-RND5957_0 resumed by user
Thu 07 Jun 2018 07:22:04 AM MST | GPUGRID | task m0000000643_5ed133e9_n00020-SDOERR_QMML50-0-1-RND3172_0 resumed by user
Thu 07 Jun 2018 07:22:05 AM MST | GPUGRID | Starting task m0000000638_2278bdac_n00020-SDOERR_QMML50-0-1-RND5957_0
Thu 07 Jun 2018 07:22:05 AM MST | GPUGRID | Starting task m0000000643_5ed133e9_n00020-SDOERR_QMML50-0-1-RND3172_0

Toni
Message 49647 - Posted: 7 Jun 2018 | 14:55:56 UTC - in response to Message 49643.

^^ These seem to be connection errors (network down or similar)

Network can affect computation after the WU is already downloaded?


Yes. WUs check the latest version of conda packages/libraries right after start (from conda cloud).

mmonnin
Joined: 2 Jul 16
Posts: 337
Credit: 7,649,422,647
RAC: 10,415,452
Message 49652 - Posted: 7 Jun 2018 | 20:30:07 UTC - in response to Message 49647.

^^ These seem to be connection errors (network down or similar)

Network can affect computation after the WU is already downloaded?


Yes. WUs check the latest version of conda packages/libraries right after start (from conda cloud).


Dang, is that only after start? If tasks were started, paused, another one started, etc., could networking be disabled after they start? Or is it checked at each start/resume? Not me, but I've heard of some people setting up schedules to allow downloads only at certain times of the day due to varying bandwidth costs.

STARBASEn
Message 49654 - Posted: 7 Jun 2018 | 21:14:00 UTC

Today I have had three successful simultaneous starts and returns without error on two different FX8350 machines. As far as I am concerned, this bug has been resolved at least for the Fedora distro.

PappaLitto
Message 49655 - Posted: 8 Jun 2018 | 1:40:53 UTC

Haven't gotten a single error with version 3.21 and it's been running all day. What did you change?

Toni
Message 49656 - Posted: 8 Jun 2018 | 12:01:04 UTC - in response to Message 49655.
Last modified: 8 Jun 2018 | 12:06:29 UTC

The main change was locking the miniconda directory upon initial installation/update. This in turn required some workarounds. May not be perfect but should be much better.

Regarding network access, it is attempted at each WU start or restart. The amount of downloaded data should usually be negligible (except the first time).

Edit to add: the network accesses are only to "conda cloud", a Python distribution and package manager.
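
To illustrate the kind of locking described above, here is a minimal sketch. It is not the actual GPUGRID wrapper code, which isn't shown in this thread; the lock file, the with_lock helper and the fake_install stand-in are all made up for the example. Two background jobs try to run the same "install" step, and flock makes the second one wait until the first has finished.

```shell
#!/usr/bin/env bash
# Hypothetical demo: serialise a setup step across concurrently starting tasks.
workdir=$(mktemp -d)
lockfile="$workdir/install.lock"   # stand-in for a lock such as miniconda.lock
logfile="$workdir/order.log"

# Run a command while holding an exclusive lock on $lockfile.
# File descriptor 9 is opened on the lock file for the subshell's lifetime;
# the lock is released automatically when the subshell exits.
with_lock() {
    (
        flock -x 9      # blocks until the exclusive lock is acquired
        "$@"
    ) 9>"$lockfile"
}

# Pretend installer: records when it starts and ends.
fake_install() {
    echo "start $1" >>"$logfile"
    sleep 1
    echo "end $1" >>"$logfile"
}

# Two "tasks" starting at the same moment.
with_lock fake_install A &
with_lock fake_install B &
wait

# Each start line is followed by the matching end line (either task may go first).
cat "$logfile"
```

Without the flock call, the two start lines would typically appear back to back, which is the overlapping-install situation the lock is meant to prevent. The temporary directory is left in place so the log can be inspected; remove it with rm -r when done.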

[VENETO] boboviz
Message 49657 - Posted: 8 Jun 2018 | 15:18:33 UTC

No more errors, but some strange behaviour.
The remaining time shown for every WU is over 20 h, but WUs are crunched in 40-50 minutes...

Keith Myers
Joined: 13 Dec 17
Posts: 1341
Credit: 7,693,387,292
RAC: 13,149,905
Message 49661 - Posted: 9 Jun 2018 | 1:27:58 UTC

Didn't get any response from my post in the cpu tasks thread.

How do you get the cpu tasks? I tried and failed. Is the QC app still considered a Test app? That was the only Preference toggle I didn't select.

klepel
Joined: 23 Dec 09
Posts: 189
Credit: 4,721,425,539
RAC: 1,808,309
Message 49662 - Posted: 9 Jun 2018 | 1:59:22 UTC - in response to Message 49661.
Last modified: 9 Jun 2018 | 2:02:42 UTC

Did you check "use CPU" as well? You might not have allowed it.

No, it is not necessary to check BETA tasks. I have not checked it either, but I do get QC tasks.

Keith Myers
Message 49663 - Posted: 9 Jun 2018 | 18:47:33 UTC - in response to Message 49662.

Yes, CPU was checked for use in both places, as was the QC app. I didn't get any QC tasks either time I tried, just GPU tasks. The scheduler request was for both CPU and GPU work. I know there are plenty of CPU tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without GPU work checked, I guess.

Jim1348
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Message 49664 - Posted: 9 Jun 2018 | 21:42:15 UTC - in response to Message 49663.
Last modified: 9 Jun 2018 | 22:28:02 UTC

Yes, CPU was checked for use in both places, as was the QC app. I didn't get any QC tasks either time I tried, just GPU tasks.

That is your problem. The BOINC scheduler can get all mixed up when you select both CPU and GPU work on the same project, and you are eventually left high and dry on one or the other.

There are several discussions on it at Einstein, which has the same problem since they do both CPU and GPU work. Here is one recent discussion, where the moderator explains why the requester is not getting GPU work.
https://einsteinathome.org/content/not-getting-gpu-wus-anymore#comment-165295

I use separate machines for the CPU work and the GPU work on GPUGrid.

Keith Myers
Message 49665 - Posted: 9 Jun 2018 | 23:05:59 UTC - in response to Message 49664.

Thanks for the reply. I don't know. It worked a couple of months ago when the QC app and tasks first showed up. I was crunching both gpu and cpu at the same time. I know that shutting off a gpu request will probably work just to get some of the new QC tasks along with the latest 3.21 app.

I was just wondering how well the app works now with concurrent starts.

Jim1348
Message 49666 - Posted: 10 Jun 2018 | 2:06:18 UTC - in response to Message 49665.

I was just wondering how well the app works now with concurrent starts.

Let's put it this way. I was running three 3.21 QC on my i7-8700 and rebooted. They all resumed normally without error. So it is solved enough.

kain
Message 49667 - Posted: 10 Jun 2018 | 9:34:50 UTC

My Threadripper crunched 50 WUs last night, no issues. Good job :)

Toni
Message 49668 - Posted: 10 Jun 2018 | 9:44:30 UTC - in response to Message 49663.

Yes, CPU was checked for use in both places, as was the QC app. I didn't get any QC tasks either time I tried, just GPU tasks. The scheduler request was for both CPU and GPU work. I know there are plenty of CPU tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without GPU work checked, I guess.


At risk of stating trivialities: did you check in the log if by chance it's a matter of disk space (either allocated to boinc, or actually free)? QC tasks are unusually demanding on disk space.

Anyway: thanks for trying to make it work :)
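
For the "actually free" half of that question, a quick check could look like the sketch below. Assumptions: /var/lib/boinc-client is the Debian/Ubuntu package location seen in the logs earlier in this thread (your data directory may differ), BOINC_DIR is an environment variable invented for this example, and free_kb is a made-up helper name. The space allocated to BOINC is a client preference and is not covered here.

```shell
#!/usr/bin/env bash
# Print the free space, in KiB, on the filesystem that holds the given directory.
# free_kb is a hypothetical helper written for this example.
free_kb() {
    # -P forces one line per filesystem, -k reports in 1024-byte blocks;
    # column 4 of the data line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumed default location; override with BOINC_DIR if yours lives elsewhere.
boinc_dir="${BOINC_DIR:-/var/lib/boinc-client}"
[ -d "$boinc_dir" ] || boinc_dir="."   # fall back so the sketch still runs

echo "Free space for $boinc_dir: $(free_kb "$boinc_dir") KiB"
```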

kain
Message 49669 - Posted: 10 Jun 2018 | 9:58:13 UTC

Not an important thing, but are you planning to implement separate badges for CPU points?

PappaLitto
Message 49673 - Posted: 11 Jun 2018 | 18:18:03 UTC

Hello, I just got a new error with v3.21. Does anyone have any idea what could be causing it?

Task is available below:
http://www.gpugrid.net/result.php?resultid=17764870

Keith Myers
Message 49675 - Posted: 12 Jun 2018 | 4:10:29 UTC - in response to Message 49668.

Yes, CPU was checked for use in both places, as was the QC app. I didn't get any QC tasks either time I tried, just GPU tasks. The scheduler request was for both CPU and GPU work. I know there are plenty of CPU tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without GPU work checked, I guess.


At risk of stating trivialities: did you check in the log if by chance it's a matter of disk space (either allocated to boinc, or actually free)? QC tasks are unusually demanding on disk space.

Anyway: thanks for trying to make it work :)

Well the disk space allotted to BOINC is 10GB. Have about 8GB free for BOINC/project use.

That wasn't the issue. Probably the request for both gpu and cpu at the same time. I got loaded up with gpu work on my multiple requests for cpu work along with my normal gpu work. Waiting till I clear out the gpu work and can set only cpu work requested. Will see if that makes the scheduler send me cpu work.

PappaLitto
Message 49699 - Posted: 20 Jun 2018 | 1:38:08 UTC

Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below

http://www.gpugrid.net/results.php?hostid=480159&offset=0&show_names=0&state=5&appid=

Keith Myers
Message 49700 - Posted: 20 Jun 2018 | 7:01:18 UTC - in response to Message 49699.

Lots of things don't work the same in 18.04 the way they did on 16.04. I figure the change in GTK and Python is the base cause of why compute doesn't work the same.

Toni
Message 49701 - Posted: 20 Jun 2018 | 8:28:20 UTC - in response to Message 49699.

Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below

http://www.gpugrid.net/results.php?hostid=480159&offset=0&show_names=0&state=5&appid=


Have you carried over the boinc dir from a previous installation?

In any case, try resetting the project.

PappaLitto
Message 49702 - Posted: 20 Jun 2018 | 10:58:25 UTC - in response to Message 49701.

Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below

http://www.gpugrid.net/results.php?hostid=480159&offset=0&show_names=0&state=5&appid=


Have you carried over the boinc dir from a previous installation?

In any case, try resetting the project.

Unfortunately I tried resetting the project with no luck. This is a brand new installation. gcc and libc6-dev said they were already the most recent version after I sudo apt update'd. Do I need to sudo apt upgrade after sudo apt update?

Toni
Message 49703 - Posted: 20 Jun 2018 | 11:28:42 UTC - in response to Message 49702.
Last modified: 20 Jun 2018 | 11:31:34 UTC

Can you try installing python-support (if it isn't installed already)?

I am under the impression that the WUs failing with segmentation fault is the root cause, and other errors are consequences.

tullio
Joined: 8 May 18
Posts: 190
Credit: 104,426,808
RAC: 0
Message 49705 - Posted: 20 Jun 2018 | 16:11:36 UTC

CPU tasks do not fail on my HP laptop with SuSE Leap 42.3 which is constantly updated by SuSE. They fail instead on my SUN workstation, also with SuSE Linux 42.3 which is not updated by SuSE, I don't know why. On the other hand, GPU tasks run on the GTX 750 Ti board on the SUN, giving me huge credits.
Tullio

PappaLitto
Message 49706 - Posted: 20 Jun 2018 | 23:37:10 UTC - in response to Message 49703.
Last modified: 20 Jun 2018 | 23:38:22 UTC

Can you try installing python-support (if it isn't installed already)?

I am under the impression that the WUs failing with segmentation fault is the root cause, and other errors are consequences.

I tried installing python-support and was able to get different errors!

Here is the link again to the error page:
http://www.gpugrid.net/results.php?hostid=480159&offset=0&show_names=0&state=5&appid=

The abandoned tasks are due to completely detaching and then re-adding the project, which didn't help.

Toni
Message 49710 - Posted: 22 Jun 2018 | 10:58:08 UTC - in response to Message 49706.
Last modified: 22 Jun 2018 | 14:56:29 UTC

From what I can tell the error is the same, the segmentation fault in pthread. Try to play around with your gcc installation, e.g. see if you have the latest one, g++, and so on. Ubuntu 18.04 is a widespread distro so it's surprising it doesn't work.
