Message boards : Number crunching : Workaround for the "CUDA3.1 client sent to CUDA4.2 capable hosts" problem
Author | Message |
---|---|
Aborting CUDA3.1 tasks is easy (though detrimental to the project), but what can you do if you have a CUDA3.1 workunit that has been running for hours and you don't want to waste it? | |
ID: 25924 | Rating: 0 | rate: / Reply Quote | |
Hopefully this will not impact on the building of subsequent tasks? I will see it soon, my first converted tasks will finish in two hours. I will try your workaround if I get any more CUDA 4.2 tasks on Windows, but I fully expect it to work. You don't have to wait, the CUDA4.2 files are there in the directory of the project. | |
ID: 25925 | Rating: 0 | rate: / Reply Quote | |
Am I correct in that you need to do this every time you notice a cuda3.1 WU running? | |
ID: 25926 | Rating: 0 | rate: / Reply Quote | |
"Am I correct in that you need to do this every time you notice a cuda3.1 WU running?" Not if the next task runs in the same slot (and I think it does). Might be a coincidence but I have not received any 3.1 tasks since resetting yesterday, just 4.2. acemd.win.2352 therefore doesn't exist, so you can't do this in advance (and after a project reset). | |
ID: 25927 | Rating: 0 | rate: / Reply Quote | |
Nice work Retvari!!! | |
ID: 25928 | Rating: 0 | rate: / Reply Quote | |
"Am I correct in that you need to do this every time you notice a cuda3.1 WU running?" The bad news is that the BOINC manager notices that I've overwritten the cuda3.1 client and downloads the original one again (so you have to do it every time). The event log shows "2012. 06. 27. 15:08:15 GPUGRID [error] File acemd.win.2352 has wrong size: expected 2349568, got 3454464", followed by "2012. 06. 27. 15:09:29 GPUGRID Started download of acemd.win.2352". The good news is that I'm writing a little batch program to do the job. Stay tuned. | |
ID: 25929 | Rating: 0 | rate: / Reply Quote | |
Does anyone know how it notices? Is it smart enough to read the file's header, or is it only looking at the timestamp? If it is only the timestamp, I have code (at work) that uses the Win32 API to set the stamp to whatever you tell it to be. | |
ID: 25930 | Rating: 0 | rate: / Reply Quote | |
"Does anyone know how it notices ... is it smart enough to read the file's header or is it only looking at the timestamp. If it is only the time stamp I have code (at work) that uses the win32 api to set the stamp to whatever you tell it to be." According to the error message, it compares the size of the files. | |
ID: 25933 | Rating: 0 | rate: / Reply Quote | |
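The size check described in the error message can be illustrated with a short Python sketch. This is not BOINC's own code; it only assumes, as the log suggests, that the file's size on disk is compared with an expected size. The function name `size_matches` and the temporary stand-in file are hypothetical; the two byte counts are the ones from the log.

```python
import os
import tempfile

def size_matches(path, expected_size):
    """Return True when the file at `path` has exactly `expected_size` bytes."""
    return os.path.getsize(path) == expected_size

EXPECTED = 2349568  # size the client expected for acemd.win.2352 (from the log)
ACTUAL = 3454464    # size it found after the CUDA4.2 file was copied over it

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in file created only for this demo; it is not a real executable.
    stand_in = os.path.join(tmp, "acemd.win.2352")
    with open(stand_in, "wb") as fh:
        fh.truncate(ACTUAL)  # make the file exactly ACTUAL bytes long
    print(size_matches(stand_in, EXPECTED))  # the mismatch that triggers a re-download
    print(size_matches(stand_in, ACTUAL))
```

A check of this kind never looks at the timestamp or the file header, which would explain why changing the timestamp alone would not help.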
Workaround V3.0 :) | |
ID: 25934 | Rating: 0 | rate: / Reply Quote | |
I tried your batch files. Seemed to run fine on the running task (cuda3.1) and made it run like a cuda4.2. | |
ID: 25942 | Rating: 0 | rate: / Reply Quote | |
Thank you for your report. This error could be a coincidence. | |
ID: 25943 | Rating: 0 | rate: / Reply Quote | |
It sounds like all of you are receiving a combination of CUDA 3.1 and 4.2 tasks. I have the same issue. My real concern is that all of my cards are overclocked and CUDA 4.2 tasks may require a different hardware config than CUDA 3.1 tasks. It would be great to get only 4.2 work units going forward. | |
ID: 25945 | Rating: 0 | rate: / Reply Quote | |
It would be quicker and simpler to create an app_info.xml file from the information already available in client_state, as I did for Running multiple tasks per GPU - count=0.5. | |
ID: 25947 | Rating: 0 | rate: / Reply Quote | |
"I have a different problem with the scheduling of this batch program: I think I've had a cuda3.1 download after a cuda4.2, without an 'error while downloading'." I'll let your batch run for 24 hours and see what other errors it generates. So far, having a single WU error out immediately while running a cuda3.1 WU in only 60% of the time (with another well on its way to completion) has been worth the effort. Definitely an interesting way to try to solve the issue, though probably not the silver bullet a separate queue selection would be. | |
ID: 25949 | Rating: 0 | rate: / Reply Quote | |
"I think I've had a cuda3.1 download after a cuda4.2, without an 'error while downloading'." OK. On my second system, I definitely just had this scenario. No issues and it is 20 minutes in. One interesting thing is that the slot appears to have changed from 8 (this afternoon) to 9 (right now). Looking into the slot 9 folder, it definitely has the cuda4.2 file size under the cuda3.1 filename. However, it has the cuda3.1 .dll files (no sign of the cuda4.2 DLLs). This doesn't seem to impact the running WU (it's been plugging along for 20+ minutes now), but I'm not sure why I don't see the cuda4.2 DLLs. Speed seems on par with the cuda4.2 WUs. I'm not an expert, so does it make sense that the cuda4.2 executable can run with the cuda3.1 DLLs? | |
ID: 25951 | Rating: 0 | rate: / Reply Quote | |
"I'm not an expert, so does it make sense that the cuda4.2 executable can run with the cuda3.1 DLLs?" No. Use Dependency Walker to see which support files an application executable needs: the cuda42 app here needs, as you would expect, the cu...32_42_9 DLLs. Use Process Explorer to see which DLLs a running application is using, and where it's loading them from. My guess is that the application is finding the right DLLs somewhere else in the path, and loading those in preference to the ones in the slot directory. I've been caught that way in the past. If the project is doing a copy-rename on those DLLs, then it's wasting a lot of time and disk access on something which is going to be no use at all. Somebody should have a look at the <app_version> section of client_state.xml to see what's going on. | |
ID: 25954 | Rating: 0 | rate: / Reply Quote | |
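To test the guess that the DLLs are being found "somewhere else in the path", a small Python sketch can list every PATH directory that contains a given file name. The helper `find_on_path` is hypothetical, and it is only a rough aid: Windows searches more locations than PATH when loading a DLL (for example the application's own directory and the system directories), so Process Explorer remains the authoritative way to see what a running process actually loaded.

```python
import os

def find_on_path(filename, search_dirs=None):
    """List every directory in `search_dirs` that contains `filename`.

    When `search_dirs` is None, the directories of the PATH environment
    variable are used.
    """
    if search_dirs is None:
        search_dirs = os.environ.get("PATH", "").split(os.pathsep)
    hits = []
    for directory in search_dirs:
        # Skip empty PATH entries; keep directories that hold the file.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits
```

Calling it with the exact DLL name reported by Dependency Walker shows which directories could be supplying that DLL instead of the slot directory.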
There is a Boinc flag to help with the earlier problem.. | |
ID: 25965 | Rating: 0 | rate: / Reply Quote | |
Here is my first converted workunit, which ran from the start to completion as a converted wu. | |
ID: 25967 | Rating: 0 | rate: / Reply Quote | |
"There is a BOINC flag to help with the earlier problem.." Wow! I put this in my cc_config.xml. If it works, then we only have to apply my batch program once. | |
ID: 25968 | Rating: 0 | rate: / Reply Quote | |
It's working! | |
ID: 25971 | Rating: 0 | rate: / Reply Quote | |
Trying to round out the conversation ... | |
ID: 25973 | Rating: 0 | rate: / Reply Quote | |
It would be beneficial if there was some way to tell after the fact that a WU labeled 3.1 was processed using 4.2 for the user and the project in case troubleshooting was required for some problems with results or whatever. Hopefully no problems will arise. | |
ID: 25975 | Rating: 0 | rate: / Reply Quote | |
Mark, thanks for pointing out the solution for the flaw of my workaround! "It would be beneficial if there was some way to tell after the fact that a WU labeled 3.1 was processed using 4.2 for the user and the project in case troubleshooting was required for some problems with results or whatever." The CUDA4.2 and the CUDA3.1 clients produce very different stderr output files, so if you see a CUDA4.2-like stderr output for a CUDA3.1 task, then it's a converted one. The runtimes are also easy to tell apart. "Hopefully no problems will arise." I am confident that no problems will arise from converted workunits. The CUDA3.1 and the CUDA4.2 clients have the same version (and sub-version) numbers (v6.16). My guess is that the only difference between them is the version of the CUDA compiler used to build them. "Using a bat file to make a copy of a file and rename it seems overly complex but mapping out what needed to happen has been extremely valuable!" Yes, it can be done with Total Commander in a single step. I did it this way with this utility, but I had to provide a workaround for everyone and every host. V3.0 also operated on the slots, but that seems unnecessary, since the BOINC manager runs the client from the project's directory. I don't know why the BOINC manager copies the executables to the slot directory if it won't use them afterwards. | |
ID: 25978 | Rating: 0 | rate: / Reply Quote | |
Looks like you need to have already run a 4.2 task in order to use the batch program? | |
ID: 25982 | Rating: 0 | rate: / Reply Quote | |
"Looks like you need to have already run a 4.2 task in order to use the batch program?" Yes. | |
ID: 25983 | Rating: 0 | rate: / Reply Quote | |
Looks like after a system reboot, the old cuda3.1 executable is downloaded again. So, it appears the batch file needs to be run after every reboot. | |
ID: 25984 | Rating: 0 | rate: / Reply Quote | |
"---the default location of the cc_config.xml file is" cc_config.xml is a BOINC global file, not a project-specific file. It belongs in the root of the BOINC data folder structure. The easiest way to verify whether your own installation is using the default location is to look at the BOINC Manager message/event log: the working BOINC data directory is listed at around the fourth line after every BOINC restart. | |
ID: 25996 | Rating: 0 | rate: / Reply Quote | |
You are right. Sorry, it's a copy-paste bug. Let's blame it on the heat. | |
ID: 26002 | Rating: 0 | rate: / Reply Quote | |
The 3.1 DLLs are being pulled to the slot directory with the renamed 4.2 acemd file. Is this correct? | |
ID: 26011 | Rating: 0 | rate: / Reply Quote | |
Yes. It's because the BOINC manager doesn't know that it's a CUDA 4.2 client. | |
ID: 26025 | Rating: 0 | rate: / Reply Quote | |
"It would be quicker and simpler to create an app_info.xml file from the information already available in client_state" I'm very new to GPU crunching, so would you be willing to help a noob and post a copy of the app_info file you're using to make the 3.1 run like a 4.2? Thanks. | |
ID: 26026 | Rating: 0 | rate: / Reply Quote | |
I had my first two 4.2 tasks complete and validate. Now everything is failing with the message below. What happened? | |
ID: 26048 | Rating: 0 | rate: / Reply Quote | |
From the posts I have kept up with that error is often caused by overclocking. | |
ID: 26056 | Rating: 0 | rate: / Reply Quote | |
"From the posts I have kept up with that error is often caused by overclocking." The card is an MSI GTX465 GE unlocked to a 470. I bought it used and had been running it at the settings it had when I bought it: 1.025 vcore, 700 MHz core clock, 1400 shader clock, 1848 memory clock. I thought after the first 2 tasks completed I was good to go, but everything else after that failed. This is my first real dive into Nvidia GPUs and I'm finding that what works on ATI doesn't work on Nvidia. I've been trying different things, but it seems that all the card needed was a little bump in the vcore voltage. I'm 3 1/2 hours into a long run task, so we'll see how that goes. If it and the 1 in cache make it to validation, then I'll try the 3.1 to 4.2 workaround, or maybe someone can point me to an app_info that will do the trick. Thanks for the help. | |
ID: 26059 | Rating: 0 | rate: / Reply Quote | |
I managed to get this running on a Windows XP computer with one video card, with no problem, by following the instructions. On a Windows 7 computer with 2 video cards, it doesn't work for me. I followed the instructions, made sure the project and slot directories were correct (mine were in a different location than listed in the postings, so I adjusted the commands to the appropriate locations), and ran as administrator. Nothing! | |
ID: 26076 | Rating: 0 | rate: / Reply Quote | |
Works for me on 2 XP machines and a W7 machine. Great post Retvari. | |
ID: 26081 | Rating: 0 | rate: / Reply Quote | |
It's working! | |
ID: 26085 | Rating: 0 | rate: / Reply Quote | |
Retvari, Thank you! | |
ID: 26288 | Rating: 0 | rate: / Reply Quote | |
"Of course, the wu still has to validate, but I am optimistic that it will do so and will qualify for 24 hr bonus." All of my converted workunits have been validated. I'm sure that yours will validate too. "I'm a noob at any kind of programming, and really far better with hardware than software, so I really appreciate the detail in your instructions." You're welcome! It's my pleasure if my workaround helps you. "I'll let you know if the task validates." I'll keep an eye on it. | |
ID: 26293 | Rating: 0 | rate: / Reply Quote | |
The wu validated and I feel really good about my little programming venture, even though all I did was follow your instructions. | |
ID: 26297 | Rating: 0 | rate: / Reply Quote | |
Retvari, | |
ID: 26367 | Rating: 0 | rate: / Reply Quote | |
This works by updating the file in the project directory. If you set up that short batch file: | |
ID: 26368 | Rating: 0 | rate: / Reply Quote | |
One way you can verify is to confirm that in your GPUGrid project directory (potentially at C:\ProgramData\BOINC\projects\www.gpugrid.net on Win7), the files "acemd.win.2352" (CUDA3.1) and "acemd.2562.cuda42" (CUDA4.2) are the same size. | |
ID: 26369 | Rating: 0 | rate: / Reply Quote | |
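The copy step and the size comparison described above can be combined in one short Python sketch. It is an illustration only, not the batch program from this thread: the function name `convert` is hypothetical, while the two file names are the ones quoted in the post. The project directory is passed in by the caller because its location differs between installations.

```python
import os
import shutil

CUDA42_NAME = "acemd.2562.cuda42"  # CUDA4.2 executable name quoted in this thread
CUDA31_NAME = "acemd.win.2352"     # CUDA3.1 executable name quoted in this thread

def convert(project_dir):
    """Overwrite the CUDA3.1 executable with a copy of the CUDA4.2 one.

    Returns True when both files have the same size afterwards.
    """
    src = os.path.join(project_dir, CUDA42_NAME)
    dst = os.path.join(project_dir, CUDA31_NAME)
    if not os.path.isfile(src):
        # The CUDA4.2 file is only present once a 4.2 task has been downloaded.
        raise FileNotFoundError(src)
    shutil.copyfile(src, dst)
    return os.path.getsize(src) == os.path.getsize(dst)
```

Raising an error when the CUDA4.2 file is missing reflects the point made earlier in the thread that the conversion cannot be done before a 4.2 task has been received.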
I did reboot recently and did not run the batch program. Ah ha moment! Now for the noob question. I don't know how to run a batch program so I manually copied the files and renamed as per the long version of the original workaround instructions. I have your exact cc_config file installed. I also checked the files you suggested and they are the same size. With the speed increase I'm seeing, I feel confident I did it correctly, but would much rather run a batch every time I reboot. | |
ID: 26370 | Rating: 0 | rate: / Reply Quote | |
"1. Why did it switch slots?" Slots store workunits that have already been processed to some extent. New or empty slots are (re)assigned to workunits in the order they are processed, so after a couple of workunits this order gets quite random if you participate in more than one project, because slots aren't assigned to projects.
"2. Do I need to re-implement the workaround every time a cuda 31 task gets sent my way?" From my experience: no. In addition, I don't have to bother with the files in the slot directories at all.
"I was hoping there was a way to process 31's as 42's without my involvement, since I cannot sit in front of the screen all day to look for cuda 31's that download." That's the aim of this workaround, but there's no guarantee that it will work on every system, with every version of the BOINC manager, etc.
"Do you have another rabbit to pull out of your hat? :)" I'm not a magician. But I guess that the "don't check file sizes" option hasn't been applied by your BOINC manager. You can check it in the event log of the BOINC manager. If you find the following error message, then this option is not set correctly:
2012. 06. 27. 15:08:15 GPUGRID [error] File acemd.win.2352 has wrong size: expected 2349568, got 3454464
2012. 06. 27. 15:09:29 GPUGRID Started download of acemd.win.2352
You can check whether you have the correct cc_config.xml, on the correct path and with the correct name, as follows. Click on the start button. Type in the search box: notepad c:\ProgramData\BOINC\cc_config.xml and press enter. If you see an empty document, copy the following text, paste it into notepad, then save the file.
<cc_config>
<options>
<report_results_immediately>1</report_results_immediately>
<dont_check_file_sizes>1</dont_check_file_sizes>
</options>
</cc_config>
Then re-read the local configuration file in the BOINC manager (it's in the Advanced menu).
"So, how do I run"
SET GPUGRIDDIR=c:\ProgramData\BOINC\projects\www.gpugrid.net\
COPY "%GPUGRIDDIR%acemd.2562.cuda42" "%GPUGRIDDIR%acemd.win.2352" /y
Click on the start button. Type in the search box: notepad workaround.bat and press enter. Copy the two lines of text from the post, and paste them into notepad. Click File, then click Save As... and choose the Desktop as the destination. After this, you should see an icon named "workaround" with two gears in it somewhere on your desktop. Right click on it, and choose "Run as administrator". However, I do not recommend running this batch program at every startup, because it can interfere with the BOINC manager if a CUDA3.1 task is already running when the batch program starts. | |
ID: 26376 | Rating: 0 | rate: / Reply Quote | |
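Whether a given cc_config.xml actually carries the option can also be checked programmatically. The Python sketch below parses the XML text and looks for `<dont_check_file_sizes>1</dont_check_file_sizes>` under `<options>`, matching the layout quoted in the post above. The function name `file_size_check_disabled` is hypothetical, and the sketch only inspects the file contents; it cannot tell whether the BOINC client has re-read the file.

```python
import xml.etree.ElementTree as ET

def file_size_check_disabled(xml_text):
    """Return True if the cc_config text sets dont_check_file_sizes to 1."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # An empty or malformed document cannot carry the option.
        return False
    if root.tag != "cc_config":
        return False
    value = root.findtext("options/dont_check_file_sizes")
    return value is not None and value.strip() == "1"

# The configuration text quoted in this thread.
SAMPLE = """<cc_config>
  <options>
    <report_results_immediately>1</report_results_immediately>
    <dont_check_file_sizes>1</dont_check_file_sizes>
  </options>
</cc_config>"""

print(file_size_check_disabled(SAMPLE))
```

To test a real installation, read the file's text from the BOINC data directory (the post above uses c:\ProgramData\BOINC\cc_config.xml) and pass it to the function.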
"However I do not recommend to run this batch program at every startup, because it can interfere with the BOINC manager, if a CUDA3.1 task is already running when this batch program starts." Good advice. I manually kick off the batch file after a reboot, following a check to see if a CUDA3.1 is running. They have been few and far between recently, so that's good (practically negating the need for the batch file -- but just in case...). | |
ID: 26377 | Rating: 0 | rate: / Reply Quote | |
Thank you both for the follow up! Good advice on what to do and when to do it. | |
ID: 26379 | Rating: 0 | rate: / Reply Quote | |
I noticed a couple of things about this workaround conversion. First, when I convert a cuda3.1 and reboot the computer afterwards, the unit often crashes. To avoid the crash, you can suspend the unit before rebooting and then resume it after the computer has booted up. Second, if you convert a cuda3.1 unit and receive another one right after it, you don't have to run the conversion for the second unit for it to run as a cuda4.2. Of course, if you receive a cuda3.1, then one or more cuda4.2 in between, and then another cuda3.1, you have to convert both cuda3.1 units in order for them to run as cuda4.2. | |
ID: 26537 | Rating: 0 | rate: / Reply Quote | |
All I did was a detach/reattach of the project. That will clean out the project folder and it will re-download apps. If your driver is a high enough version it should only give you a cuda 42 app and DLLs. It seems to have worked for me as I don't appear to be getting any cuda 31 tasks. | |
ID: 26538 | Rating: 0 | rate: / Reply Quote | |
"It seems to have worked for me as I don't appear to be getting any cuda 31 tasks." Same here. Don't think I've seen a cuda3.1 in several weeks. | |
ID: 26540 | Rating: 0 | rate: / Reply Quote | |
MarkJ - It looks like you are running GTX 670s (nice cards). The CUDA 3.1 app was never released for Kepler, so yes, you always get the CUDA 4.2 app. | |
ID: 26543 | Rating: 0 | rate: / Reply Quote | |
"All I did was a detach/reattach of the project. That will clean out the project folder and it will re-download apps. If your driver is a high enough version it should only give you a cuda 42 app and DLLs. It seems to have worked for me as I don't appear to be getting any cuda 31 tasks." That's what I've done twice but I still get the occasional 3.1 task. Is the 301.42 driver not high enough to avoid this? | |
ID: 26546 | Rating: 0 | rate: / Reply Quote | |
"All I did was a detach/reattach of the project. That will clean out the project folder and it will re-download apps. If your driver is a high enough version it should only give you a cuda 42 app and DLLs. It seems to have worked for me as I don't appear to be getting any cuda 31 tasks." It should be fine. My understanding was that they were going to look at the compute capability of the card and the driver version. If both are high enough (which your GTX570 and 301.42 are), then they would supply only the cuda42 apps. Only if the driver version was too low or the compute capability was 1.3 would they supply the cuda31 app. Maybe GDF could confirm that's how it's been set up. There was talk of making cuda40 the minimum version, but when I asked what was happening with that, they replied that there are too many people using older drivers. There was also talk of making a compute capability of 1.3 the minimum, but I don't think that's been done either. | |
ID: 26547 | Rating: 0 | rate: / Reply Quote | |
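The selection rule described in the post above can be written down as a small Python sketch. It encodes only the poster's stated understanding, not the project's actual scheduler, and the thread itself shows hosts still receiving cuda31 tasks despite meeting both conditions. The function `pick_app` is hypothetical, the minimum driver version is a parameter because the thread never states it, and treating compute capabilities at or below 1.3 as cuda31 is an assumption.

```python
def pick_app(compute_capability, driver_version, min_driver_for_cuda42):
    """Return "cuda42" or "cuda31" following the rule as described in the post.

    Versions are (major, minor) tuples so that comparisons are exact.
    """
    if compute_capability <= (1, 3):
        # Assumption: cards at or below compute capability 1.3 get the old app.
        return "cuda31"
    if driver_version < min_driver_for_cuda42:
        # Driver too old for the CUDA4.2 app.
        return "cuda31"
    return "cuda42"

# (300, 0) is a made-up threshold used only to run the example.
print(pick_app((2, 0), (301, 42), min_driver_for_cuda42=(300, 0)))
```

Under this rule, a compute capability 2.0 card on driver 301.42 is offered only the cuda42 app for any threshold at or below 301.42.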
I have only long runs selected to be sent to my machine but I was sent a cuda31 wu today. My gtx 470 is compute capability 2.0 and I have driver version 301.42. If this is supposed to happen still then disregard my post. | |
ID: 26811 | Rating: 0 | rate: / Reply Quote | |
"I have only long runs selected to be sent to my machine but I was sent a cuda31 wu today. My gtx 470 is compute capability 2.0 and I have driver version 301.42. If this is supposed to happen still then disregard my post." It's not supposed to happen; even so, it happens with my host all the time. That's why I've implemented (with the help of some fellow crunchers) and published my workaround for it. Just to highlight a few tasks: 5821216, 5823097, 5824470, 5818221, 5817268, 5816346, 5812236, 5806997. Thanks to my workaround, all of the above workunits were processed by the CUDA 4.2 client. | |
ID: 26814 | Rating: 0 | rate: / Reply Quote | |