Message boards : Multicore CPUs : Simultaneously starting MCs
Author | Message |
---|---|
Problem with simultaneously starting Multicore CPU tasks has not been fixed yet! | |
ID: 49378 | Rating: 0 | rate: / Reply Quote | |
Problem with simultaneously starting Multicore CPU tasks has not been fixed yet! I have heard about this error but maybe I don't understand the symptoms. I have had three 4 thread WUs start at once and sometimes they work, sometimes they don't. Could this be the cause of my errors? Linked below are the tasks from the system: http://www.gpugrid.net/results.php?hostid=424454 | |
ID: 49382 | Rating: 0 | rate: / Reply Quote | |
If two WUs start at the same time, in most cases one of them fails right at the start and throws a calculation error. This is very inconvenient if you often have to start more than one WU at the same time (in my case up to 20 WUs on my 80-thread machine). I would like to hear a statement from the developer(s), or simply get a bugfix in the WUs, because at the moment I do not see any suitable workaround. I don't think it's just your fault or mine, because several users have been reporting this issue for a while now. | |
ID: 49383 | Rating: 0 | rate: / Reply Quote | |
If it is still failing, please provide a task number for me to check. | |
ID: 49614 | Rating: 0 | rate: / Reply Quote | |
Toni, | |
ID: 49615 | Rating: 0 | rate: / Reply Quote | |
@captainjack: please try two things for me: 1. Run the flock command and see whether it gives an error (command not found) or a longer message. 2. Reset the project. Thanks | |
ID: 49616 | Rating: 0 | rate: / Reply Quote | |
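For anyone who wants to script the flock check requested above instead of typing it in a terminal, here is a minimal sketch. The function name `find_flock` is made up for illustration; the only thing it relies on is the standard `shutil.which` lookup, which searches PATH the same way the shell does.

```python
import shutil


def find_flock():
    """Return the full path of the flock executable, or None if it is not on PATH."""
    return shutil.which("flock")


if __name__ == "__main__":
    path = find_flock()
    if path is None:
        # Same situation as the shell reporting "command not found".
        print("flock: command not found (it normally ships with util-linux)")
    else:
        print(f"flock found at {path}")
```

This only tells you whether the command exists; it does not exercise the lock itself.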
Toni, | |
ID: 49617 | Rating: 0 | rate: / Reply Quote | |
:( | |
ID: 49618 | Rating: 0 | rate: / Reply Quote | |
Statistically though the new app seems to have worked on other hosts. We went from 900 WU to 1500 WU in progress. | |
ID: 49620 | Rating: 0 | rate: / Reply Quote | |
with all my computers just the same... Toni, does it help if I give you by private message the remote control access credentials of one of my Linux machines, so you can test on your own? | |
ID: 49621 | Rating: 0 | rate: / Reply Quote | |
Thomas, that would help, but perhaps let me ask another thing first: | |
ID: 49622 | Rating: 0 | rate: / Reply Quote | |
Sometimes they work and sometimes not. If two or more WUs start at the same time, they all throw calculation errors. This was the state until now. But with the very new version you just released today, it seems they no longer work at all. My operating system is Linux Mint 18.3 64-bit with the current Linux kernel. I installed boinc with "sudo apt-get install boinc". gcc-5 and g++-5 are installed, as is python-support. | |
ID: 49623 | Rating: 0 | rate: / Reply Quote | |
Ok thanks. This will need a while to debug. As you can see there is no error info to help (as usual in boinc...). | |
ID: 49624 | Rating: 0 | rate: / Reply Quote | |
All tasks error out on my VM (with 4 virtual cores). <message> I have no app_config. Addendum: I also tried stopping all WUs and starting them manually one by one. Same error. | |
ID: 49625 | Rating: 0 | rate: / Reply Quote | |
Version 320 out | |
ID: 49626 | Rating: 0 | rate: / Reply Quote | |
Also, you need the libc6-dev package: sudo apt install libc6-dev | |
ID: 49627 | Rating: 0 | rate: / Reply Quote | |
All tasks exit with error: 14:19:18 (8806): wrapper: running /bin/bash (-c "flock /var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock ./miniconda-installer -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda") Please run using "bash" or "sh", but not "." or "source"\n14:19:19 (8806): /bin/bash exited; CPU time 0.001596 14:19:19 (8806): app exit status: 0x1 | |
ID: 49629 | Rating: 0 | rate: / Reply Quote | |
These were 3.19. See with 3.20 | |
ID: 49630 | Rating: 0 | rate: / Reply Quote | |
When I tried to start two at the same time, one of them runs okay and the other one aborts. | |
ID: 49631 | Rating: 0 | rate: / Reply Quote | |
I already had libc6-dev installed... Earlier I got a bunch of new WUs, but all failed after several minutes of calculation (~5-15 minutes). | |
ID: 49632 | Rating: 0 | rate: / Reply Quote | |
Started two QC WUs simultaneously using app version 3.20, and one failed with this stderr report: | |
ID: 49634 | Rating: 0 | rate: / Reply Quote | |
Yes, now simultaneous starts crash with "text file busy". Will look for yet another workaround. | |
ID: 49636 | Rating: 0 | rate: / Reply Quote | |
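For context on "text file busy": on Linux this is the ETXTBSY error, which is raised when a process tries to execute a file that is still open for writing. The thread does not say which workaround the project chose. One common way to avoid the error, shown here purely as an illustrative sketch with a hypothetical helper name (`install_atomically`), is to write the new file under a temporary name and rename it into place, so that a task starting at the same moment never sees a half-written executable.

```python
import os
import tempfile


def install_atomically(data: bytes, dest: str) -> None:
    """Write data to a temp file next to dest, then rename it into place."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # The temp file must live on the same filesystem as dest for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.chmod(tmp_path, 0o755)      # make it executable before it becomes visible
        os.replace(tmp_path, dest)     # atomic rename on POSIX within one filesystem
    except BaseException:
        # Clean up the partial temp file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the file is closed before the rename, nothing that executes `dest` can catch it while it is open for writing.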
Attempting fix at version 321 | |
ID: 49640 | Rating: 0 | rate: / Reply Quote | |
Just got a couple of errors from last night's WUs version 3.20. | |
ID: 49641 | Rating: 0 | rate: / Reply Quote | |
^^ These seem connection errors (network down or so) | |
ID: 49642 | Rating: 0 | rate: / Reply Quote | |
^^ These seem connection errors (network down or so) Can the network affect computation after the WU has already been downloaded? | |
ID: 49643 | Rating: 0 | rate: / Reply Quote | |
I have exactly the same problems, so I don't think this is connection related. | |
ID: 49644 | Rating: 0 | rate: / Reply Quote | |
Version 321 looks promising. Just finished two that started at the same time and they finished normally. Just started three at the same time and they are all processing as they should. | |
ID: 49645 | Rating: 0 | rate: / Reply Quote | |
Success with two QC WU's simultaneous start using app 3.21. Both WU's happily crunching @ 1.098% completed so far. | |
ID: 49646 | Rating: 0 | rate: / Reply Quote | |
^^ These seem connection errors (network down or so) Yes. WUs check the latest version of conda packages/libraries right after start (from conda cloud). | |
ID: 49647 | Rating: 0 | rate: / Reply Quote | |
^^ These seem connection errors (network down or so) Dang is that only after start? If tasks were started, paused, another started, etc could networking be disabled after they start? Or at each start/resume? Not me, but I've heard of some setup schedules to allow downloads at certain times of the day due to varying bandwidth costs. | |
ID: 49652 | Rating: 0 | rate: / Reply Quote | |
Today I have had three successful simultaneous starts and returns without error on two different FX8350 machines. As far as I am concerned, this bug has been resolved at least for the Fedora distro. | |
ID: 49654 | Rating: 0 | rate: / Reply Quote | |
Haven't gotten a single error with version 3.21 and it's been running all day. What did you change? | |
ID: 49655 | Rating: 0 | rate: / Reply Quote | |
The main change was locking the miniconda directory upon initial installation/update. This in turn required some workarounds. May not be perfect but should be much better. | |
ID: 49656 | Rating: 0 | rate: / Reply Quote | |
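The locking idea described above can be sketched in a few lines. This is not the project's actual code: the wrapper log earlier in the thread shows the app calling the `flock` command on a `miniconda.lock` file, and the sketch below uses Python's `fcntl.flock`, which takes the same kind of advisory lock. The helper name `run_locked` and the demo lock file are assumptions for illustration only.

```python
import fcntl
import os
import tempfile


def run_locked(lock_path, action):
    """Run action() while holding an exclusive advisory lock on lock_path."""
    # "a" creates the lock file if needed without truncating an existing one.
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            return action()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    demo_lock = os.path.join(tempfile.mkdtemp(), "demo.lock")
    # Stand-in for the install/update step that must not run twice at once.
    print(run_locked(demo_lock, lambda: "install step ran under the lock"))
```

If two tasks start together, the second one simply waits at `LOCK_EX` until the first has finished its install or update, instead of both writing into the same directory.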
No more error, but a strange behaviour. | |
ID: 49657 | Rating: 0 | rate: / Reply Quote | |
Didn't get any response from my post in the cpu tasks thread. | |
ID: 49661 | Rating: 0 | rate: / Reply Quote | |
Did you check: use CPU as well? You might not have allowed it. | |
ID: 49662 | Rating: 0 | rate: / Reply Quote | |
Yes CPU was checked for use both places and the QC app. Didn't get any QC tasks both time I tried. Just gpu tasks. The scheduler request was for both cpu and gpu work. I know there is plenty of cpu tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without gpu work checked I guess. | |
ID: 49663 | Rating: 0 | rate: / Reply Quote | |
Yes CPU was checked for use both places and the QC app. Didn't get any QC tasks both time I tried. Just gpu tasks. That is your problem. The BOINC scheduler can get all mixed up when you select both CPU and GPU work on the same project, and you are eventually left high and dry on one or the other. There are several discussions on it at Einstein, which has the same problem since they do both CPU and GPU work. Here is one recent discussion, where the moderator explains why the requester is not getting GPU work. https://einsteinathome.org/content/not-getting-gpu-wus-anymore#comment-165295 I use separate machines for the CPU work and the GPU work on GPUGrid. | |
ID: 49664 | Rating: 0 | rate: / Reply Quote | |
Thanks for the reply. I don't know. It worked a couple of months ago when the QC app and tasks first showed up. I was crunching both gpu and cpu at the same time. I know that shutting off a gpu request will probably work just to get some of the new QC tasks along with the latest 3.21 app. | |
ID: 49665 | Rating: 0 | rate: / Reply Quote | |
I was just wondering how well the app works now with concurrent starts. Let's put it this way. I was running three 3.21 QC on my i7-8700 and rebooted. They all resumed normally without error. So it is solved enough. | |
ID: 49666 | Rating: 0 | rate: / Reply Quote | |
My threadripper this night crunched 50 WU's, no issues. Good job :) | |
ID: 49667 | Rating: 0 | rate: / Reply Quote | |
Yes CPU was checked for use both places and the QC app. Didn't get any QC tasks both time I tried. Just gpu tasks. The scheduler request was for both cpu and gpu work. I know there is plenty of cpu tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without gpu work checked I guess. At risk of stating trivialities: did you check in the log if by chance it's a matter of disk space (either allocated to boinc, or actually free)? QC tasks are unusually demanding on disk space. Anyway: thanks for trying to make it work :) | |
ID: 49668 | Rating: 0 | rate: / Reply Quote | |
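A quick way to rule out the disk-space question raised above is to ask the operating system directly. The sketch below is a rough check only: it reports free space on the filesystem, not the quota BOINC is allowed to use, and the 5 GiB threshold is a made-up number, not a documented requirement of the QC tasks. The data directory is taken from the wrapper log earlier in the thread and may differ on your system.

```python
import os
import shutil


def free_gib(path="."):
    """Return the free space, in GiB, on the filesystem containing path."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path=".", needed_gib=5.0):
    """True if at least needed_gib GiB are free (needed_gib is a hypothetical threshold)."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    # Fall back to the current directory if the BOINC data directory is elsewhere.
    boinc_dir = "/var/lib/boinc-client"
    path = boinc_dir if os.path.exists(boinc_dir) else "."
    print(f"{free_gib(path):.1f} GiB free at {path}; enough: {has_room(path)}")
```

The BOINC event log remains the authoritative place to see whether the scheduler refused work because of disk limits.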
Not an important thing, but are you planning to implement separate badges for CPU points? | |
ID: 49669 | Rating: 0 | rate: / Reply Quote | |
Hello, I just got a new error with v3.21. Does anyone have any idea what could be causing it? | |
ID: 49673 | Rating: 0 | rate: / Reply Quote | |
Yes CPU was checked for use both places and the QC app. Didn't get any QC tasks both time I tried. Just gpu tasks. The scheduler request was for both cpu and gpu work. I know there is plenty of cpu tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without gpu work checked I guess. Well the disk space allotted to BOINC is 10GB. Have about 8GB free for BOINC/project use. That wasn't the issue. Probably the request for both gpu and cpu at the same time. I got loaded up with gpu work on my multiple requests for cpu work along with my normal gpu work. Waiting till I clear out the gpu work and can set only cpu work requested. Will see if that makes the scheduler send me cpu work. | |
ID: 49675 | Rating: 0 | rate: / Reply Quote | |
Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below | |
ID: 49699 | Rating: 0 | rate: / Reply Quote | |
Lots of things don't work the same in 18.04 the way they did on 16.04. I figure the change in GTK and Python is the base cause of why compute doesn't work the same. | |
ID: 49700 | Rating: 0 | rate: / Reply Quote | |
Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below Have you carried over the boinc dir from a previous installation? In any case, try resetting the project. | |
ID: 49701 | Rating: 0 | rate: / Reply Quote | |
Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below Unfortunately I tried resetting the project with no luck. This is a brand new installation. gcc and libc6-dev said they were already the most recent version after I sudo apt update'd. Do I need to sudo apt upgrade after sudo apt update? | |
ID: 49702 | Rating: 0 | rate: / Reply Quote | |
Can you try installing python-support (if not already?) | |
ID: 49703 | Rating: 0 | rate: / Reply Quote | |
CPU tasks do not fail on my HP laptop with SuSE Leap 42.3 which is constantly updated by SuSE. They fail instead on my SUN workstation, also with SuSE Linux 42.3 which is not updated by SuSE, I don't know why. On the other hand, GPU tasks run on the GTX 750 Ti board on the SUN, giving me huge credits. | |
ID: 49705 | Rating: 0 | rate: / Reply Quote | |
Can you try installing python-support (if not already?) I tried installing python-support and was able to get different errors! Here is the link again to the error page: http://www.gpugrid.net/results.php?hostid=480159&offset=0&show_names=0&state=5&appid= The abandons are due to completely detaching then adding project again which didn't help. | |
ID: 49706 | Rating: 0 | rate: / Reply Quote | |
From what I can tell the error is the same, the segmentation fault in pthread. Try to play around with your gcc installation, e.g. see if you have the latest one, g++, and so on. Ubuntu 18.04 is a widespread distro so it's surprising it doesn't work. | |
ID: 49710 | Rating: 0 | rate: / Reply Quote | |