Message boards : Number crunching : RuntimeError: Unable to find a valid cuDNN algorithm to run convolution when running python

kotenok2000
Joined: 18 Jul 13
Posts: 48
Credit: 11,353,293
RAC: 6,936
Message 58983 - Posted: 7 Jul 2022 | 20:06:30 UTC
Last modified: 7 Jul 2022 | 20:08:01 UTC

I have an NVIDIA GTX 1650.
Maybe it is too old?

kotenok2000
Message 58984 - Posted: 7 Jul 2022 | 20:55:38 UTC - in response to Message 58983.
Last modified: 7 Jul 2022 | 20:57:20 UTC

Another task failed with RuntimeError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
A third work unit finished and validated in 1645 seconds.

jjch
Joined: 10 Nov 13
Posts: 91
Credit: 15,040,000,871
RAC: 1,015,809
Message 58987 - Posted: 8 Jul 2022 | 22:15:59 UTC - in response to Message 58984.

First off, I would say that the Python apps seem to have a high error rate. I'm seeing about a 40% failure rate on my Windows systems without finding a good reason why. There could be a cause for this, but it might also be normal.

The error you noted seems to come from variation in the amount of memory used on the GPU. I think the GTX 1650 should be adequate to run the Python apps, so it could be a problem with the Python app.

What might be happening is you are also using GPU memory for something else at the same time or prior to GPUgrid. Don't run any other GPU projects or play games etc.

I also noted some of your tasks failed where it looked like you were running out of system memory. 16GB is on the low side of what will work well with other things running.

I would suggest setting things up so you are only running one GPUgrid Python app, and then looking at your system memory usage. I have seen it be around 10 GB, but it can be more.

Also check your available free disk space and the swap space you are using while you are monitoring it. Make sure you are not pushing the limits there and running out too.
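
For the disk-space part of that check, here is a minimal sketch using only the Python standard library. The 10 GiB threshold (`MIN_FREE_GIB`) and the choice of the current directory are assumptions for illustration, not project requirements; point `path` at the drive that holds your BOINC data directory.

```python
import shutil

# Assumption: 10 GiB is a hypothetical safety margin, not a project requirement.
MIN_FREE_GIB = 10


def free_gib(path="."):
    """Return free space in GiB on the volume that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


free = free_gib()
status = "OK" if free >= MIN_FREE_GIB else "LOW"
print(f"free disk space: {free:.1f} GiB ({status})")
```

RAM and swap are not covered by the standard library. If the third-party psutil package is installed, `psutil.virtual_memory()` and `psutil.swap_memory()` report them in the same spirit.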

Keith Myers
Joined: 13 Dec 17
Posts: 1070
Credit: 1,450,990,714
RAC: 426,047
Message 58988 - Posted: 9 Jul 2022 | 5:16:36 UTC

There's a problem with how Windows allocates virtual memory for Python libraries.

Linux does not have the issue because it allocates memory differently.

See this message of mine.

https://www.gpugrid.net/forum_thread.php?id=5322&nowrap=true#58908

kotenok2000
Message 58993 - Posted: 9 Jul 2022 | 19:34:27 UTC - in response to Message 58988.

One task also crashed with: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
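
The error text itself points at `max_split_size_mb`, which is passed to PyTorch through the `PYTORCH_CUDA_ALLOC_CONF` environment variable. A minimal sketch of how that variable is formed is below. The value 128 is an arbitrary illustration, not a recommendation from this thread, and nothing here confirms that the GPUgrid app would pick up a variable set this way.

```python
import os

# Assumption: 128 is an arbitrary illustrative value, not a tested setting.
MAX_SPLIT_SIZE_MB = 128


def alloc_conf(max_split_size_mb):
    """Build the value for the PYTORCH_CUDA_ALLOC_CONF environment variable."""
    return f"max_split_size_mb:{max_split_size_mb}"


# The variable has to be in place before PyTorch initializes CUDA,
# so in a script of your own this would go before `import torch`.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf(MAX_SPLIT_SIZE_MB)
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```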

jjch
Message 58994 - Posted: 10 Jul 2022 | 2:48:24 UTC - in response to Message 58993.

The GTX 1650 is a 4GB card so it should have plenty of memory for the Python app. There is something else going on there.

I'm not a CUDA expert but there could be a problem with your driver. It looked like the driver you have is the current version.

I would suggest doing a full uninstall and cleanup with DDU (Display Driver Uninstaller) and then reinstalling the driver. If that still doesn't work, go back to the previous version and see if that helps.

It could just be a problem with the Python/PyTorch programs and their interaction with CUDA, or an error in the programming.

Other than that, my only guess is a problem with your card. Make sure it isn't overheating, and if you are overclocking, revert that to stock settings.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 744
Credit: 4,943,798,494
RAC: 524,854
Message 58996 - Posted: 10 Jul 2022 | 13:04:18 UTC - in response to Message 58994.

From what I remember, the Python app was using more than 4GB of VRAM. It's definitely possible that 4GB isn't enough.

jjch
Message 58998 - Posted: 11 Jul 2022 | 4:06:44 UTC - in response to Message 58996.

That would be an interesting development. From what I have been gathering the Python app is not putting much of a load on the GPU. Not quite sure about the actual memory usage.

I searched the forum for a reference on how much GPU memory is needed, but I found only one post, which mentioned a GTX 980 Ti: ".... gpu memory usage is almost constant at 2.679MB" (presumably 2,679 MB).

If you find something that indicates they need 4 GB or more, I would like to see it. I don't know of a good way to check GPU memory usage, because you have to catch the task while it is actually using the memory.

The error mentioned in this thread asked for only 28.00 MiB on top of the 1.36 GiB already reserved, while 1011.70 MiB was still free:

CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; 1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch)


That actually seems more like a memory error related to CUDA or the driver, not the memory capacity of the card.
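
To make the arithmetic explicit, here is a small sketch that pulls the numbers out of the error text and converts them to MiB. The field names (`requested`, `total`, and so on) are my own labels, and the regular expressions only assume the message layout quoted above; other PyTorch versions may word it differently.

```python
import re

# Error text copied from the message quoted above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 4.00 GiB total capacity; "
    "1.32 GiB already allocated; 1011.70 MiB free; 1.36 GiB reserved in total by PyTorch)"
)

# Multipliers that convert each unit to MiB.
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# One pattern per field; each captures a number and its unit.
FIELDS = {
    "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
    "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
    "free": r"([\d.]+) (KiB|MiB|GiB) free",
    "reserved": r"([\d.]+) (KiB|MiB|GiB) reserved",
}


def parse_oom(message):
    """Return each field of the error message as a float in MiB."""
    result = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field {name!r} not found")
        value, unit = match.groups()
        result[name] = float(value) * UNIT_TO_MIB[unit]
    return result


stats = parse_oom(MESSAGE)
for name, mib in stats.items():
    print(f"{name:>9}: {mib:8.2f} MiB")
print("request smaller than reported free memory:", stats["requested"] < stats["free"])
```

The request is far smaller than the reported free memory, which is why the failure does not look like a simple capacity limit. It does not say what the real cause is.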

Keith Myers
Message 58999 - Posted: 11 Jul 2022 | 5:32:19 UTC - in response to Message 58998.

The memory utilization seems to be constant on my GPUs when they are running a Python task. One is currently using 3349 MB of the 8 GB on the card.

You can see that with nvidia-smi in a terminal.

Or if you want to watch it in real time, you can use this:

watch -n 1 nvidia-smi --query-gpu=name,pci.bus_id,utilization.gpu,utilization.memory,clocks.current.sm,power.draw,memory.used,pcie.link.gen.current,pcie.link.width.current --format=csv

Besides the amount of memory in use, this also shows memory-bus and GPU utilization, the SM clock, power draw, and the PCIe link generation and width.
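
If you would rather log the numbers than watch them, the same query can be run from a script. This is a rough sketch: it uses a subset of the fields from the command above plus `memory.total`, and the `SAMPLE` line is made-up text so the parser can be tried on a machine without a GPU. `query_gpus()` needs nvidia-smi on the PATH.

```python
import csv
import io
import subprocess

# Subset of the fields from the watch command above, plus memory.total.
FIELDS = ["name", "utilization.gpu", "memory.used", "memory.total"]


def parse_rows(text):
    """Parse nvidia-smi CSV output (noheader, nounits) into a list of dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in reader if row]


def query_gpus():
    """Run nvidia-smi once; requires the NVIDIA driver tools on PATH."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return parse_rows(subprocess.check_output(cmd, text=True))


# Hypothetical sample line, not real output from any card in this thread.
SAMPLE = "Example GPU, 35, 3349, 8192\n"
for gpu in parse_rows(SAMPLE):
    print(f"{gpu['name']}: {gpu['memory.used']} / {gpu['memory.total']} MiB used")
```

Calling `query_gpus()` in a loop with a short sleep and keeping the largest `memory.used` value would catch the peak usage over a whole task.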

jjch
Message 59000 - Posted: 12 Jul 2022 | 0:40:35 UTC - in response to Message 58999.

I found a few tasks running on my Windows servers and checked them with GPU-Z. The GPU memory used was between 2518 and 3287 MB. I think with that usage these should run OK on a 4GB card.
