Message boards : News : More CPU jobs
...with a new and improved application (Linux only). The current version should eliminate dependencies on gcc and devel libraries.
ID: 49769

By the way, the new app downloads updated libraries. Feel free to reset the project to free up disk space taken by the old ones.
ID: 49770

Why are these CPU jobs for Linux only, and not for Windows, too?
ID: 49771
Because they can make the app work under Linux but are not successful yet in creating a Windows app that works.
ID: 49773

Because they can make the app work under Linux but are not successful yet in creating a Windows app that works.
Hm, this makes me wonder why it is so much more difficult to create an app for Windows than for Linux... Further, an easy way to solve this would be to have the Linux app run in a Virtual Machine (as, for example, LHC is doing for some of its sub-projects).
ID: 49774

Making BOINC apps is like building a ship in a bottle, in the sense that your tools are very limited and you don't control the environment. In the case of Windows, the bottle is dark. ;)
ID: 49776
Erich56 said: further, an easy way to solve this would be to have the Linux app run in a Virtual Machine
Erich56, if you want to run the Linux app in a Virtual Machine, you can create your own virtual machine, install Linux and BOINC, then run the QC tasks from there. That is what I have done on my Windows machines and it works fine.
ID: 49780

I have been running CERN LHC@home Virtual Machines for more than ten years, and I have been rewarded with a CERN polo shirt. But yes, they do present some problems. Now your CPU tasks seem to run fine on my old SUN workstation with SuSE Leap 42.3 Linux.
ID: 49781

That is what I have done on my Windows machines and it works fine.
+1, VirtualBox on my Win10. But I think it's not the best solution for performance...
ID: 49783
As far as I know, virtualization is almost native speed these days, especially for computing.
ID: 49784

The recent batch of CPU WUs seems to be done. Will there be more soon?
ID: 49785

Yes, I am making some now. I'll try to submit new ones today.
ID: 49786

Sorry, sorry, sorry. I messed up due to a small mistake. Had to nuke the WUs. Redoing them now.
ID: 49787
No issue at all. I'm glad the team communicates openly.
ID: 49788

All of my Stefan CPU WUs are stuck at 10% and I aborted them after about 4 hours. This is the machine (16.04 LTS) that has never had any issues with pretty much any of the WUs.
ID: 49789

Holy cow, the website is SOO SLOW. I had to use a proxy in Sweden just to get anything to load. I can't even get tasks even though the site says there are plenty.
ID: 49790
GPUGRID is taking 3.34 GB of disk space on my main Linux host, 3.90 GB on a Linux laptop. On the same laptop LHC@home is taking 5.75 GB.
ID: 49791

Yes, this is the stuff I resent to the beta queue, I guess. They are much larger molecules, so they were crashing on the QM queue because they ran out of scratch space. I have seen them use up to 18 GB of scratch space, so at the moment I don't know yet how to run these on GPUGRID, as it seems to be an issue for many users.
ID: 49792

http://gpugrid.net/results.php?hostid=470907
ID: 49793

Must we update conda?
ID: 49794
They are much larger molecules, so they were crashing on the QM queue because they ran out of scratch space. I have seen them use up to 18 GB of scratch space, so at the moment I don't know yet how to run these on GPUGRID, as it seems to be an issue for many users.
The most recent Betas have worked OK for me. But I have 32 GB memory, which may help. http://www.gpugrid.net/results.php?hostid=334241&offset=0&show_names=0&state=0&appid=35 You could set up a special sub-project for the large molecules if you want to.
ID: 49795

We are using QC beta to test large molecules and how much disk space they take. I think they can (temporarily, of course) go up to 20 GB of space (!). I am not sure about RAM - they should be < 4 GB.
ID: 49796

We are using QC beta to test large molecules and how much disk space they take. I think they can (temporarily, of course) go up to 20 GB of space (!). I am not sure about RAM - they should be < 4 GB.
Well... So my Threadripper could need up to 160 GB of space?! It has just 32 GB...
ID: 49797

We are talking about DISK space. Only a few WUs will be that big - unless we make a "big" queue.
ID: 49798
As far as I know virtualization is almost native speed these days, especially for computing.
Yes, if you are using "hard" virtualization like ESXi and Hyper-V. "Soft" virtualization like VirtualBox or VMware Player may suffer bottlenecks.
ID: 49799

We are using QC beta to test large molecules and how much disk space they take.
Toni, if I may ask, what molecule size are we (roughly) talking about? As you know, because of my son I have a personal interest in HCF1 research, and I would like to get a feel for how far science still is from handling such large molecules. Thanks in advance, and my apologies for coming up with my personal issues once in a while.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 49800

I have a degree in theoretical physics obtained in 1967, but that was related to elementary particle physics. Then in the Nineties, while at the Trieste Area Science Park as manager of a UNIX BULL laboratory, I attended a few lectures at the UN Center for Genetic Engineering and Biotechnology on Density Functional Theory. Since retirement, I have run a few BOINC projects, including one on the Monte Carlo method applied to quantum chemistry, but it no longer exists. This is the first time I am running a project which uses neural networks.
ID: 49801
I'm also talking about disk space. I'm using a 32 GB Optane module as boot drive. It's time to change it for something bigger.
ID: 49802

@kain: the disk space is used in the directory BOINC is running in. Usually (if you use the distribution installers) it is indeed on the disk holding the root of the file system.
ID: 49804

I had to enlarge the root partition to accommodate QC beta 3.31, since BOINC from the Fedora distro by default installs in /var/lib/boinc and runs as a daemon under systemctl. After that, the WUs seemed to run fine, but they sure ate up a lot of RAM. Both my 8-cores have 16 GB RAM and I was running them 2 concurrent with 4 cores each. I think only two errored out and the rest completed and validated. Guess I'll have to max out my 8-core machines with 32 GB RAM to run the bigger molecules.
ID: 49805

Are you able to set the disk limit in the BOINC preferences to prevent too many WUs from running?
ID: 49806
Easier and more precise simply to set max_concurrent in app_config.xml.
ID: 49807

Great to know, thanks. Actually I was also wondering if BOINC respects the disk limits.
ID: 49808

Worth it to perform the experiment, certainly. Possibly depends on whether it respects the declared space needed (<rsc_disk_bound>) or the actual space used. If the latter, there might be a problem if the actual usage increases gradually during the run - BOINC might only check it when deciding whether to start a(nother) new task. Lots of fun to be had with those possibilities...
ID: 49809

@Toni, yes, BOINC does appear to respect the client disk settings, as it lets one know in the event log if disk space is too low to run certain projects. I usually set a high arbitrary GB size, but the client appears to react to the real amount available in the execution partition and uses the percentage limits to notify the user when disk space is too low. I had to readjust the percent limits higher (in the client settings) a few days ago to run the 3.30 app on one of my machines. Probably due to the project directory getting too full. I hate to reset the project and lose WUs, but I suppose I will have to eventually.
ID: 49810
We are using QC beta to test large molecules and how much disk space they take.
I get it. Thank you so much for your help.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 49811

The molecules for QM are max 50 atoms or so. The size is, however, not very indicative. This is a specific "chemistry-oriented" type of calculation.
ID: 49812

Thank you VERY much for that line, I really appreciate it. I was already afraid of being a constant bother. Of course I understand that we are still years or even decades away from handling huge proteins like HCF1, and I don't want to be obtrusive. Having said that, I would like to keep sight of those long-term targets. Thanks again... if I may, I will get back to you with this question in a couple of years. But I am glad that GPUGRID and its team are more than just "exclusively academic". There actually is a vision of the future we can believe in.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 49813

In an edX online course on quantum computers which I followed recently, there was a professor at Dartmouth who uses a quantum computer to do quantum chemistry calculations.
ID: 49814
Hey JoergF, while we are not doing proteins with QM yet (some other groups are trying to do that with networks), what we are calculating is directly related to drug design, so I think it is very relevant.
ID: 49820

Thank you very much. Which kind of contribution will help you most to make progress on proteins (in the long run, of course)? Because I am just considering whether to buy an additional GPU or CPU this autumn.
ID: 49822

We as a group are not really focusing on applying QM to proteins. The problem is twofold:
ID: 49823

Thank you... no problem. So we just keep on crunching on all sides and see where the road leads us. :)
ID: 49824
I'm trying out the AMD EPYC trial from Packet; it runs 48 QM WUs at a time... all valid. I'm thinking of leaving it crunching these instead of Rosetta (the rest of my PCs all run Windows...). Hope it helps!
ID: 49825

I'm trying out the AMD EPYC trial from Packet; it runs 48 QM WUs at a time... all valid. I'm thinking of leaving it crunching these instead of Rosetta (the rest of my PCs all run Windows...). Hope it helps!
Wow! That's a lot of compute!
ID: 49830

I'm trying out the AMD EPYC trial from Packet; it runs 48 QM WUs at a time... all valid. I'm thinking of leaving it crunching these instead of Rosetta (the rest of my PCs all run Windows...). Hope it helps!
EPYC with 48 threads... I go green with envy :-))
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.
ID: 49831

48 QM WUs are 192 (CPU) threads. I need 4 computers to reach that.
ID: 49834
48 QM WUs are 192 (CPU) threads. I need 4 computers to reach that.
I just realized from your comment that it actually crunches 12 WUs at a time (I just saw all 48 threads running @ 100% and immediately thought it was running 48 WUs, just like Rosetta). I am not a smart man.
ID: 49839

Well, I run QC on a 2-core PC just for fun :-)
ID: 49840

I'm trying out the AMD EPYC trial from Packet; it runs 48 QM WUs at a time... all valid.
To everybody using hyper-threaded CPUs for crunching: you should test how well the given app scales with HT on or off on your system. The other approach is to leave HT on, but lower the percentage of usable CPUs in BOINC manager (down to 50%). Too many simultaneous memory-intensive apps would cause too many cache misses, resulting in degraded combined performance. With HT off (or with the usable CPUs set to 50%) calculation time should be halved (because two threads share one FPU). If it's more than half, then the number of usable CPUs could be increased, provided the RAC rises accordingly (= in direct ratio). I can't test it myself until the Windows app has been released, but I'm interested. A simultaneous GPU task could also degrade the performance of the CPU tasks, and vice versa.
ID: 49842
Most tasks benefit from HT, but I recall only one doing better overall with HT off on my 2670v1s.
ID: 49843

I have a related question I cannot answer myself.
ID: 49856

2) Can I limit the number of cores used by QC?
Use an "app_config.xml" file to limit the number of cores per work unit, and also the number of QC work units running if you wish. http://www.gpugrid.net/forum_thread.php?id=4748&nowrap=true#49369 I have found that QC is tough on resources too. Even though I reserved a CPU core to support a GTX 1070 on Folding, running QC still caused a drop in Folding points, showing that the GPU was being starved for CPU support. To fix that, I now run only six cores of my i7-4770 on CPU work and leave two cores to support the GPU. But even that was not enough, so I run 4 cores on QC (two work units running two cores each) with the other two on LHC/native ATLAS. That frees up enough CPU resources that I see only a minimal drop in Folding points.
ID: 49857
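For anyone wanting to try this, here is a minimal app_config.xml sketch along the lines Jim1348 describes: cap each QC task at two threads and run at most two at once. The short app name (qc) and the --nthreads flag are assumptions - check client_state.xml for the real short name, drop the file into the gpugrid.net project folder, and use "Options / Read config files" to apply it:

```xml
<app_config>
    <app>
        <!-- short app name is an assumption; verify it in client_state.xml -->
        <name>qc</name>
        <!-- run at most two QC work units at the same time -->
        <max_concurrent>2</max_concurrent>
    </app>
    <app_version>
        <app_name>qc</app_name>
        <plan_class>mt</plan_class>
        <!-- reserve two CPU threads per work unit in the scheduler -->
        <avg_ncpus>2</avg_ncpus>
        <!-- tell the app itself to use two threads (flag name assumed) -->
        <cmdline>--nthreads 2</cmdline>
    </app_version>
</app_config>
```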
Use an "app_config.xml" file to limit the number of cores per work unit, and also the number of QC work units running if you wish.
Thanks Jim1348, I just tried, but without improvement. Seems to be connected to the algorithm.
ID: 49858

Use an "app_config.xml" file to limit the number of cores per work unit, and also the number of QC work units running if you wish.
You must tell BOINC to re-read the config files to pick up the changes. Tasks already downloaded will still say 4C; only new ones will say 1C or 2C, but all will run at your new setting. It's a BOINC thing to sometimes squeeze in more tasks than cores. I've seen it happen on my 3570K: when a single-threaded task completes and a 4C task starts, it will show more running for a bit, but it eventually corrects itself.
ID: 49859

You must tell BOINC to re-read the config files to pick up the changes. Tasks already downloaded will still say 4C; only new ones will say 1C or 2C, but all will run at your new setting.
I did, but it still required a reboot. Tasks that were 4-core previously appeared as x-core after a reboot and were crunched as such as well. Credit might take a hit, but I didn't mind for this test.
It's a BOINC thing to sometimes squeeze in more tasks than cores. I've seen it happen on my 3570K: when a single-threaded task completes and a 4C task starts, it will show more running for a bit, but it eventually corrects itself.
I thought that was the issue, but it wasn't. I even suspended one/several/all QC tasks, but if/when BOINC could start another task it always did, despite CPU% > 400%. CPU% stayed > 700% (i.e. 7 tasks running) for an hour plus. It is working now.
ID: 49860
Looks like the CPU WU queue is almost running dry.
ID: 49968

Holidays... mumble... something... something... holidays :D hahah. I restocked them now. From Monday I'll be back at work, so I'll take more care of my WUs.
ID: 49969

I downloaded 4 QC tasks on my Windows 10 PC and of course they failed. But why does the server send me QC tasks on a Windows PC?
ID: 50017

I downloaded 4 QC tasks on my Windows 10 PC and of course they failed. But why does the server send me QC tasks on a Windows PC?
The same is true for GPU tasks - one can download them on a Windows OS, and they fail after a few seconds.
ID: 50019
Just a notice to Stefan: only a few days' worth of CPU WUs left in the queue.
ID: 50273

Thanks, I noticed :) I'm in the process of creating new WUs, but the issue is that they are more demanding than the last ones, so we are trying to figure out ways to make them use less disk at the cost of more computation time, because the largest one used 50 GB of scratch space to calculate.
ID: 50276

My HP Linux laptop running SuSE Leap 15.0 after Leap 42.3 (any relationship to SLES 15.0?) has 752.37 GB available to BOINC. Instead, my older SUN WS running SuSE Leap 42.3 has at most 30 GB of a 1 TB disk available to BOINC 7.8.3.
ID: 50277

So the ones I am sending out now should use a maximum of around 6 GB of scratch space in /tmp/. If you hit any problems, feel free to report here.
ID: 50278
Minor note: barring changes I am unaware of, the scratch space used during the run is in the slot directory. (/tmp is limited on many systems.)
ID: 50280
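To see what each task is actually writing, you can total up the slot directories yourself. A sketch in Python - the /var/lib/boinc/slots path is the usual location for distro-packaged clients and is an assumption here; adjust it to your BOINC data directory:

```python
import os

def dir_usage_bytes(path: str) -> int:
    """Total size in bytes of all regular files under `path`
    (a rough stand-in for `du -sb`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

# Report the scratch usage of every BOINC slot (path is an assumption).
slots = "/var/lib/boinc/slots"
if os.path.isdir(slots):
    for slot in sorted(os.listdir(slots)):
        p = os.path.join(slots, slot)
        print(f"slot {slot}: {dir_usage_bytes(p) / 1e9:.2f} GB")
```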
I have a QC task running on my Linux laptop. It is at 73% after 9:07:27 hours. But its slot is empty.
ID: 50281

Two QC tasks failed on my main Linux box, which has a 30 GB limit for BOINC 7.8.3, with the same message: DISK USAGE LIMIT EXCEEDED. A GPU task is running fine on its GTX 750 Ti at 61 C.
ID: 50282

OK, I have a feeling we hit a file-size limit of BOINC and not of the drives. I'll chat it up with Toni and see what we can do.
ID: 50283

OK, I have a feeling we hit a file-size limit of BOINC and not of the drives. I'll chat it up with Toni and see what we can do.
Every workunit sent out by a BOINC server has an associated value <rsc_disk_bound> (in bytes). That value - set by the project - has to be large enough to accommodate all anticipated disk usage. If you use more than you've declared in advance, 'DISK USAGE LIMIT EXCEEDED' is exactly the error message you'd expect.
ID: 50285
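For context, that bound lives server-side in the workunit (input) template. A hedged sketch of the relevant fragment, with the other template elements omitted:

```xml
<workunit>
    <!-- maximum total disk usage for the task, in bytes (30 GiB here) -->
    <rsc_disk_bound>32212254720</rsc_disk_bound>
</workunit>
```

The separate per-file upload cap, <max_nbytes>, is declared in the result (output) template instead, which is why an upload can still fail with a "file size too big" error even when the disk bound is generous.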
Completed and validated two QC tasks on my main Linux host. A GPU task is running on it at a 1202 MHz clock, 5400 MHz memory transfer, temperature 61 C on its GTX 750 Ti, driver 384.111.
ID: 50292

Completed and validated two QC tasks on my main Linux host.
It's really too bad that QC is not available for Windows :-(
ID: 50293

Just get Linux installed.
ID: 50294

Two more QC tasks completed, two ready to start. Thanks.
ID: 50296
CPU usage reaches 197% on my old Opteron 1210 with 2 cores, 145% when a GPU task is also running. RAM is 8 GB.
ID: 50301

Sorry, but these are still the same old QC jobs. Thanks for the reports, but it should not have changed much. I had to put some more of the old ones in the queue while we fix the app space configuration so that I can send the new "SELE*" workunits.
ID: 50302

I cancelled the remaining QMML50_2 jobs because I found out that some of them might be duplicates of already calculated WUs, since there was a minor issue when retrieving them which left some behind. I am redoing the calculation of the missing WUs now to make sure the ones I send out are correct. It might take me a day, so please be patient.
ID: 50306

SELE2 WUs are being sent out now. Toni increased the allowed space of the app to 30 GB. From my tests the WUs should not use more than 6 GB of space each (the largest molecule). If you run many in parallel you might hit the limit, though? I'm not certain about that.
ID: 50307
CPU usage reaches 197% on my old Opteron 1210 with 2 cores, 145% when a GPU task is also running. RAM is 8 GB.
Wouldn't it be more energy efficient to run a newer CPU? It's 100 W for 2 cores @ 1 GHz. Unless your electricity is free :D
ID: 50308

New WUs don't seem to work: they consume a lot of memory, throw computation errors or just rest at 10% progress forever.
ID: 50309

I see 89 successes and 17 errors. Seems OK for a start. I'll look into the errors, but they don't seem to be broken as a whole.
ID: 50310

Actually, 14 of the total 17 failures are on your machines, Thomas, so it might be specific to your case. Generally they seem OK.
ID: 50311
It's running at 1.8 GHz and I have a 1220 Opteron in my drawer at 2.8 GHz. It's been running since January 2008. My electricity costs me 0.21 euro/kWh and I have 3 computers running 24/7: this Opteron, an AMD E-450 and an A10-6700 which should have 4 cores, but Windows Task Manager says 2 cores and 4 logical processors. My total electricity expenditure is about 60 euro/month.
ID: 50312

I have an Intel 8-core (16-thread) Xeon server that has a 146 GB disk drive (it has 2 of them, but one died). It also has 24 GB RAM.
ID: 50314

@Conan: the QM calculations need to store lots of data in memory for best performance. Since we cannot ask for 20 GB of RAM, the software instead writes any amount of calculation data that exceeds the RAM limit (4 GB) to the hard drive.
ID: 50316

The current "disk limit" for CPU jobs is set at 20 GB. This is a ballpark estimate to accommodate both the software and libraries (largish by themselves) and the temporary (scratch) data.
ID: 50317
New WUs don't seem to work: they consume a lot of memory, throw computation errors or just rest at 10% progress forever.
On your failures I see "connection errors". Could be firewall filtering, or the like.
ID: 50318

First SELE task done by my Old Faithful Opteron 1210 running SuSE Linux Leap 42.3.
ID: 50323

I have a funny SELE task on my Linux laptop. It is stuck at 10% after 14 hours 38 min, but the remaining estimated time has risen to more than 5 days. All seems normal by the "top" command and it has lots of disk space.
ID: 50335

New WUs don't seem to work: they consume a lot of memory, throw computation errors or just rest at 10% progress forever.
No firewall here. And the same problem.
ID: 50336
As said, those WUs do not work properly. I am away on another project and will come back if they are fixed.
ID: 50337

OK, thanks Toni and Stefan for the information, that explains a lot.
ID: 50343

In the slot of a running task there is an output directory which leads to a report of what the program is doing in physical terms. Maybe some explanation by the admins would be welcome.
ID: 50347

We investigated another algorithm which doesn't use scratch disk space. Unfortunately, in my test it was 13x slower than the one that uses disk (25 minutes became 5:30 hours).
ID: 50353
I got plenty of disk space on my two Linux boxes because the slots directory is in my /home/user partition, which has more than 700 GB, on my SuSE Linux Leap 42.3 and Leap 15.0 OS. What amazes me is that QC tasks are always stuck at 10% progress while GPU tasks show progress as increasing.
ID: 50354

I got plenty of disk space on my two Linux boxes because the slots directory is in my /home/user partition, which has more than 700 GB, on my SuSE Linux Leap 42.3 and Leap 15.0 OS. What amazes me is that QC tasks are always stuck at 10% progress while GPU tasks show progress as increasing.
The 10% progress is explained as follows: updating (if necessary) the app is 10%, and usually happens immediately. The remaining 90% advances as molecules are calculated (e.g. 5 molecules = 90%/5 increments). However, very big WUs have only one molecule, so there is no apparent progress until the end. (We have no finer-grained progress.)
ID: 50355
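Toni's rule can be written down directly. A sketch of the reported fraction as described in the post above (the exact mechanism inside the app is my assumption):

```python
def qc_progress(molecules_done: int, molecules_total: int) -> float:
    """Reported progress: a flat 10% for the app setup/update step,
    plus the remaining 90% split evenly across molecules."""
    return 0.10 + 0.90 * (molecules_done / molecules_total)

# A 5-molecule WU advances in 18% steps: 10%, 28%, 46%, 64%, 82%, 100%.
# A 1-molecule WU sits at 10% until its single molecule completes.
```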
I got plenty of disk space on my two Linux boxes because the slots directory is in my /home/user partition, which has more than 700 GB, on my SuSE Linux Leap 42.3 and Leap 15.0 OS. What amazes me is that QC tasks are always stuck at 10% progress while GPU tasks show progress as increasing.
So how much space do these WUs need? I'm running 12 at a time with 64 GB of RAM, but no swap space. I see that not all 48 threads are at 100%; I'm thinking it's the lack of swap.
ID: 50389

In the old UNIX days a rule of thumb was that you needed swap space twice the RAM, which was usually small. Now RAM is plentiful. I got 22 GB RAM on the Windows 10 PC, and 8 GB RAM on each Linux box. GPUGRID CPU tasks use some swap, but most is not used.
ID: 50390

I upped the swap to 300 GB, but it only seems to be using RAM. Is this "scratch space" used in swap space, or does the WU use the file directory for storage? I'm thinking it is the latter, since the BOINC space usage goes up and down.
ID: 50391

I see temporary files in the slots/0 directory. They are named psi.25019.number
ID: 50392
Yes, AFAIK it doesn't use swap space, so increasing that will not help. It's probably where Tullio mentioned. The files are called `psi.XXXXX.XX`. Usually there are two, and the second can grow significantly.
ID: 50393

Stefan, I see 4, plus one which says psi.30091.clean
ID: 50394

I'm fixing an issue with SELE2, so I cancelled them and will send out SELE3 in a bit.
ID: 50397

I'm fixing an issue with SELE2, so I cancelled them and will send out SELE3 in a bit.
Is this related to the 'upload failure - file size too big' problem reported for SELE2 last week? Whether or not, please double-check the <max_nbytes> value for the new batch.
ID: 50398
I had to add WCG to this 48-thread beast because it isn't using all of the threads @ 100% when running GPUGRID only. I'd wager it's because of a scratch-space bottleneck (it's running an SSD though, 200 MB/s according to hdparm)... ?
ID: 50399

No, the issue was with an old version of psi4 giving wrong results on large molecules when using the scratch space. This is fixed in the latest version now.
ID: 50400

No, the issue was with an old version of psi4 giving wrong results on large molecules when using the scratch space. This is fixed in the latest version now.
I'll set WCG to "don't allow new work" and I'll report back!
ID: 50405

I am running 3.31 SELE6.
ID: 50406
Something is wrong. The BOINC Manager says it is running, but python does not appear in the "top" console.
ID: 50408

I just had about 20 of these fly through before they corrected and started to run correctly.
<core_client_version>7.8.3</core_client_version>
ID: 50409

Yes, we had to do some testing with SELE3-5. SELE6 ought to work fine, though. 1741/88 success/fail ratio.
ID: 50413

Things seem rather stable for SELE6. For further discussion let's please go to the multicore forum.
ID: 50416
I'm trying out the AMD EPYC trial from Packet; it runs 48 QM WUs at a time... all valid.
To everybody using hyper-threaded CPUs for crunching:
Zoltan, I think you have a great point here. I am noticing much higher CPU utilization and half the RAM usage since I switched to 50% CPU in BOINC on these new QC WUs. I think it's mostly due to the much lower hard-drive bandwidth required, and perhaps also the cache on the CPU being more efficiently allocated.
ID: 50428

I'm trying out the AMD EPYC trial from Packet; it runs 48 QM WUs at a time... all valid.
To everybody using hyper-threaded CPUs for crunching:
Yup, I added Rosetta and WCG to the mix and the few GPUGRID WUs run constantly @ 400%.
ID: 50429

Do you have any tips for getting higher utilization out of these new large-molecule QC WUs? I am already running 4 WUs on a 16-core system, which is 50% usage in BOINC, but the utilization is all over the place. It's using up to 23 GB of RAM (I have 32 GB) with only 4 WUs, and I have plenty of space on the SSD.
ID: 50435

CPU tasks - unsent: 44,723; in progress: 848; users in last 24 hrs: 76
Quantum Chemistry - unsent: 13,191; in progress: 866
Looks like we are cutting that number down to size quickly...
ID: 50488
QC WUs are almost out, less than 600 to send out. | |
ID: 50518 | Rating: 0 | rate:
![]() ![]() ![]() | |
They may be waiting until the 3.31 jobs finish before introducing the new 3.32 version. I expect they have plenty more. | |
ID: 50519 | Rating: 0 | rate:
![]() ![]() ![]() | |
Hopefully. We are officially out of cpu work. | |
ID: 50521 | |
I am running two resends. One of them failed with "file too big" error. The other is running. | |
ID: 50528 | |
I submitted some WUs but I am warning you :P This batch will use lots of scratch space. | |
ID: 50529 | |
This batch will use lots of scratch space. I am set up to run four work units at a time. How much will that need? I can change it as necessary; it is "only" a 120 GB SSD, with maybe 80 GB free at the moment. | |
ID: 50530 | |
I think the largest one took 50 GB of scratch space, but the total should scale linearly with the number of tasks (most are not that big). So whether you can run them all in parallel is practically up to chance, depending on whether you get some of the smaller ones or the larger ones. | |
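A back-of-the-envelope way to read that: with the roughly 50 GB worst case quoted above, whether a set of tasks fits is just a sum against your free space. A trivial sketch, nothing project-specific:

```python
def fits(task_scratch_gb, free_gb):
    """True if the listed tasks' scratch demands fit in the free space."""
    return sum(task_scratch_gb) <= free_gb

# Four worst-case ~50 GB tasks on a drive with 80 GB free will not fit...
print(fits([50, 50, 50, 50], 80))   # False
# ...but four smaller ones easily do.
print(fits([10, 15, 20, 25], 80))   # True
```

So with a 120 GB SSD, a run of four large tasks landing together is the unlucky case.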
ID: 50531 | |
OK, the 250 GB SSDs are a good buy at the moment in the U.S. | |
ID: 50532 | |
I submitted some WUs but I am warning you :P This batch will use lots of scratch space. Most errors I get from these work units have this: </stderr_txt> | |
ID: 50538 | |
Two have completed on my Linux box. | |
ID: 50544 | |
One more task gained 1848.00 credits. | |
ID: 50548 | |
One more task gained 1848.00 credits. 9 1/2 hours? That's longer than most Long Run GPU tasks ;) | |
ID: 50549 | |
Again, 1526.20 credits. | |
ID: 50553 | |
Again, 1526.20 credits. Starting to see some of the longer-run work units:
Run time - CPU time - Credit
3,812.94 - 11,088.78 - 2,307.16
4,054.73 - 11,758.62 - 2,637.67 | |
ID: 50565 | |
Longer runs don't seem to be affected by the DISK_LIMIT_EXCEEDED error which happens in some shorter runs. My latest long run gave me 1151 credits. | |
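To watch how much scratch each running task is actually using before it trips that limit, one can total up the files under the client's slot directories. A sketch only: `/var/lib/boinc-client` is the Debian/Ubuntu default data directory and may differ on your install.

```python
import os

def slot_usage_gb(data_dir="/var/lib/boinc-client"):
    """Return GB of files under each BOINC slot directory (one per task)."""
    slots_dir = os.path.join(data_dir, "slots")
    usage = {}
    for slot in sorted(os.listdir(slots_dir)):
        total = 0
        for root, _dirs, files in os.walk(os.path.join(slots_dir, slot)):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    pass  # file vanished mid-scan; tasks rewrite scratch often
        usage[slot] = total / 1e9
    return usage

# Only attempt the scan if the default data directory actually exists.
if os.path.isdir("/var/lib/boinc-client/slots"):
    for slot, gb in slot_usage_gb().items():
        print(f"slot {slot}: {gb:.1f} GB")
```

A slot climbing toward the ~50 GB worst case mentioned earlier is a candidate for the disk-limit error.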
ID: 50567 | |
I am now submitting some more of the faster QMML50_3 workunits. These should be quite quick, and they have a higher priority than the SELE6 ones, so you might be getting these for a while now. | |
ID: 50569 | |
Run time 3,241.13 | |
ID: 50570 | |
Run time 3,241.13 Hm, a marked drop in the credit, compared to what Zalster got (see a few postings above):
| |
ID: 50571 | |
Yes, but his CPU time is much higher, probably because of the number of cores he has. I have only two. | |
ID: 50572 | |
Yes, but his CPU time is much higher, probably because of the number of cores he has. I have only two. Couple of things I've noticed. The computer that is getting the higher time/credit tasks only has 12 threads. My 10-core/20-thread machine is still getting the shorter, quicker work units. Not sure why. Also, most of the long runs are resends that errored out on other computers. Maybe disk space was the issue; I don't know. Just thought I would point that out as well. | |
ID: 50573 | |
My Linux HP laptop cannot get GPUGRID tasks because it has only 24.90 GB available and the server says it needs 32. So I am running SETI@home on it, which does not require that much space. | |
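If the limiting factor is BOINC's own disk allowance rather than the physical disk, the computing preferences can be overridden locally. A sketch of a global_prefs_override.xml (the values are examples, not recommendations; BOINC applies the most restrictive of the three limits):

```xml
<!-- global_prefs_override.xml in the BOINC data directory -->
<global_preferences>
    <!-- Let BOINC use up to 40 GB... -->
    <disk_max_used_gb>40</disk_max_used_gb>
    <!-- ...but always leave at least 2 GB free... -->
    <disk_min_free_gb>2</disk_min_free_gb>
    <!-- ...and never use more than 90% of the volume. -->
    <disk_max_used_pct>90</disk_max_used_pct>
</global_preferences>
```

The client reads it via Options → Read local prefs file in BOINC Manager, or `boinccmd --read_global_prefs_override`. Of course, this only helps when the space physically exists on the drive.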
ID: 50574 | |
I have no clue how the BOINC scheduler works, but if it works as I hope, you should be getting only the QMML50 workunits for a while now. Maybe some SELE6 were still scheduled from before. | |
ID: 50575 | |
Yes, the new QMML50s are flowing. Running between 2 and 4 minutes currently. Will keep an eye on them. | |
ID: 50578 | |
Help! I have no work coming in and it has been a number of days. I see there is plenty of work on the server??? Please advise soonest. Thanks, Gary | |
ID: 52318 | |
There is GPU work available. And you will need an Nvidia GPU. | |
ID: 52319 | |