
Use waifu2x Docker image without Nvidia #444

Open
nodecentral opened this issue Mar 10, 2023 · 18 comments

@nodecentral

Hi,

Reading the instructions, the Docker image (https://hub.docker.com/r/nagadomi/waifu2x) requires [nvidia-docker](https://github.com/NVIDIA/nvidia-docker).

I appreciate it will not be as quick without GPU support, but I'd like to use waifu2x on my QNAP NAS, which does not have an Nvidia GPU or other graphics support.

@1374363910

1374363910 commented Mar 10, 2023 via email

@nagadomi
Owner

nagadomi commented Mar 10, 2023

This repo does not support CPU inference.

The PyTorch version supports CPU inference with the --gpu -1 option.
https://github.com/nagadomi/nunif

But without a GPU, it may be hundreds of times slower.
You can try CPU inference on the following website:
https://unlimited.waifu2x.net/

@nodecentral
Author

Many thanks for responding. What are my options? I'm very keen to upscale some old photos.

My home setup is an iPad and a NAS (with no GPU), where I run a number of Docker containers. Speed would be nice, but it's not crucial. Is there a guide for idiots I can follow?

The other options I have are an old laptop running Windows and an old desktop PC running Linux, but they both have ATI graphics cards by the looks of it. Can they be used?

@nagadomi
Owner

I just added a Dockerfile to the nunif repo.
I don't know your use case; if you have any questions, post them here.

Build the Docker image:

git clone https://github.com/nagadomi/nunif.git
docker build -t nunif .

Run the web server in CPU mode:

docker run -p 8812:8812 --rm nunif python3 -m waifu2x.web --port 8812 --bind-addr 0.0.0.0 --max-pixels 16777216 --max-body-size 100 --gpu -1

Open http://localhost:8812/
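For reference, the --max-pixels 16777216 limit in that command corresponds to a 4096x4096 input image; the arithmetic can be checked in the shell:

```shell
# 16777216 pixels is exactly a 4096x4096 image, so --max-pixels 16777216
# caps uploads at roughly that resolution.
echo $((4096 * 4096))   # prints 16777216
```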

CLI command examples are described in the following link.
https://github.com/nagadomi/nunif/blob/master/waifu2x/docs/cli.md

For Docker, you need to mount the host volume where the images are stored.
If you want to run in CPU mode, add the --gpu -1 option.
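As a sketch of the volume mounting, assuming hypothetical host paths on the NAS (the -i/-o option names here are assumptions; check the linked cli.md for the exact CLI arguments):

```shell
# Hypothetical host paths -- replace with the folders on your NAS.
IMAGES_DIR=/share/Photos           # where the source images live (assumption)
OUTPUT_DIR=/share/Photos/upscaled  # where results should be written (assumption)
mkdir -p "$OUTPUT_DIR"

# Mount both folders into the container and run the CLI in CPU mode (--gpu -1).
# Guarded so it is skipped if Docker or the nunif image is not present.
if command -v docker >/dev/null 2>&1 && docker image inspect nunif >/dev/null 2>&1; then
  docker run --rm \
    -v "$IMAGES_DIR":/images \
    -v "$OUTPUT_DIR":/output \
    nunif python3 -m waifu2x.cli --gpu -1 -i /images -o /output
fi
```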

@nodecentral
Author

nodecentral commented Mar 12, 2023

Thanks so much @nagadomi. I've not built an image from a Dockerfile before, so this is new ground for me, but I'm excited to try it.

@nodecentral
Author

Hi @nagadomi, a quick update to say I've created the image and the container too (thank you!!)

FYI, from the logs:

……
Successfully built d41deba96bd7
Successfully tagged nunif:latest
[/tmp/nunif] # docker run -p 8812:8812 --rm nunif python3 -m waifu2x.web --port 8812 --bind-addr 0.0.0.0 --max-pixels 16777216 --max-body-size 100 --gpu -1

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

Bottle v0.12.25 server starting up (using WaitressServer(preload_app=True, threads=32, outbuf_overflow=10485760, inbuf_overflow=209715200, max_request_body_size=104857600, connection_limit=256, channel_timeout=120))...
Listening on http://0.0.0.0:8812/
Hit Ctrl-C to quit.

2023-03-12 22:53:58,404:nunif: [    INFO] diskcache: cache=0.03MB, RAM=426.29
[W NNPACK.cpp:53] Could not initialize NNPACK! Reason: Unsupported hardware.

The first photo I tried was 1.2 MB and it gave me an error saying it was too large. The next one was 780 KB and looked like it was imported, but after that, for a long time, it just showed a blank page with the tab saying "loading", and it never seemed to complete. I'm assuming this was caused by not defining a mapped volume? If so, what do I map to?

@nagadomi
Owner

NNPACK! Reason: Unsupported hardware

That CPU may not support AVX2.
I am not familiar with NAS devices, but if yours runs Linux, you can check the CPU support flags with the following command:

cat /proc/cpuinfo
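As a sketch, a grep over the flags lines is quicker than reading the full output (GNU grep assumed):

```shell
# List any AVX-family flags the CPU advertises (avx, avx2, avx512f, ...),
# deduplicated across cores. No output means the CPU does not support AVX/AVX2.
grep -o -w 'avx[a-z0-9_]*' /proc/cpuinfo | sort -u
```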

@nodecentral
Author

Hi, sure, here is the CPU information:

[~] # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 122
model name      : Intel(R) Celeron(R) J4125 CPU @ 2.00GHz
stepping        : 8
microcode       : 0x16
cpu MHz         : 2595.840
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 24
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms mpx rdt_a rdseed smap clflushopt intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts umip rdpid md_clear arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs ept_mode_based_exec tsc_scaling
bugs            : spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 3993.60
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

(processors 1, 2, and 3 report identical values, differing only in core id and apicid)

[~] # 

@nagadomi
Owner

The problem is that PyTorch's prebuilt library uses AVX/AVX2 instructions, so it cannot run on devices that do not support AVX (note that the flags list above has no avx entry). I am planning to create a Dockerfile that builds PyTorch without AVX for older CPUs and embedded devices.

However, it will probably be even slower than normal CPU processing, so if you have a CPU that includes avx2 in its flags, it would be better to use that.

@nodecentral
Author

Thanks so much, good to know.

Sadly I don't have another device to use, but I'm happy to help you test out a non-AVX build. I'd love to give life to some of my old family photos.

While I have you: it seems likely there's a clear performance versus image size (MB) correlation? If I'm using CPU only, what is the largest image file I should use?

@nagadomi
Owner

Conversion time depends on the resolution (pixel size) of the input image, not the file size.
With a modern GPU, a 1024x1024 image can be converted in less than 1 second, but with a CPU it takes about 1 minute. With an older CPU, it would take much longer.
For 2048x2048, it takes 4 times as long ((2048*2048)/(1024*1024) == 4).
The maximum size depends on RAM.
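That quadratic scaling can be sketched with shell arithmetic (the ~1 minute per 1024x1024 worth of pixels is the rough CPU-mode estimate above, not a benchmark):

```shell
# Relative cost of a 2048x2048 image vs. the 1024x1024 baseline: 4x.
echo $(( (2048 * 2048) / (1024 * 1024) ))   # prints 4

# Rough CPU-mode estimate in minutes for an arbitrary image, assuming
# ~1 minute per 1024x1024 worth of pixels, rounded up. W and H are examples.
W=3000; H=2000
echo $(( (W * H + 1024 * 1024 - 1) / (1024 * 1024) ))   # prints 6
```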


If you do not have that many photos to convert, I recommend using the web service I have made available:
https://waifu2x.udp.jp/
It does not expose the converted images to the public.
https://waifu2x.udp.jp/privacy_policy.txt


Also, I am currently developing a new photo model and will release it this month.
If your purpose is to convert photos rather than illustrations, I think you will get better results after that.

@nodecentral
Author

nodecentral commented Mar 14, 2023

Thanks @nagadomi

More than happy to wait for your new release, as my focus is on finding tools to restore/enhance all the old family photos I have and will be scanning in over the coming weeks.

It looks like I will eventually need to invest in a device that has a GPU. Can I confirm that the GPU has to be Nvidia, not another brand/make, e.g. Intel, ATI, AMD, etc.? Also, what specification of machine should be used (CPU type/speed, memory, etc.)? I assume a Raspberry Pi is not viable?

@nagadomi
Owner

PyTorch supports AMD GPUs (ROCm) and macOS (MPS), so it will work if the device is not too old.
However, I do not have those devices, so I have not tested them.
A better CPU and more memory help, but they are not as important. It is good to have at least 4 GB of GPU memory (VRAM).

If you scan a photo, it may be enough to scan it at as large a size as possible and then downsize it.
Super-resolution is useful when the source image is lost or only small digital data exists. It is best not to use it if it can be avoided.

@nodecentral
Author

nodecentral commented Mar 15, 2023

Hi @nagadomi

While I still want to honour the focus of this post and see how I can work with a non-GPU device (like my NAS)...

Looking at Nvidia cards on eBay to potentially build an entry-level PC suitable for waifu2x: other than at least 4 GB of VRAM, is there anything else I should look for? Do I need a certain type of Nvidia chipset or version, or can I just use anything Nvidia-branded?

@nagadomi
Owner

I am not a hardware consultant, so I cannot make authoritative recommendations.

cuDNN (a GPU-accelerated library) requires Compute Capability 5.0 or later.
https://en.wikipedia.org/wiki/CUDA (Compute Capability section)
Practically speaking, I think the GTX 1050 (Pascal architecture) is the minimum.
The Turing architecture is 2x faster than Pascal for 16-bit float operations.
An RTX 3060 (Ampere architecture) can run most other AI-related products without stress.

Note that GPUs are large and consume a lot of power, so there may not be enough space to fit one, or the power supply unit may not be sufficient.

@nagadomi
Owner

I have created a Dockerfile that builds PyTorch from source.
This should make it possible to avoid AVX instructions, but I have not tested it on a no-AVX device.
doc: https://github.com/nagadomi/nunif/tree/master/Dockerfiles

Build:

git clone https://github.com/nagadomi/nunif.git
docker build -t nunif -f Dockerfiles/Dockerfile.cpu_noavx Dockerfiles

Run the web server in CPU mode:

docker run -p 8812:8812 --rm nunif python3 -m waifu2x.web --port 8812 --bind-addr 0.0.0.0 --max-pixels 16777216 --max-body-size 100 --gpu -1

When I tried it, it took 1 minute to convert one image with style=photo (the old model) and 10 minutes with style=Artwork (the new model).

@nodecentral
Author

nodecentral commented Mar 17, 2023

Thanks. I'm still new to using Docker from the command line; I tried the above commands, but got an error message.

[/tmp] # git clone https://github.com/nagadomi/nunif.git
fatal: destination path 'nunif' already exists and is not an empty directory.

If I try in another location (not /tmp), it progresses a bit further but still ends with an error (see below):

[/] # git clone https://github.com/nagadomi/nunif.git
Cloning into 'nunif'...
remote: Enumerating objects: 3099, done.
remote: Counting objects: 100% (486/486), done.
remote: Compressing objects: 100% (252/252), done.
remote: Total 3099 (delta 285), reused 391 (delta 229), pack-reused 2613
Receiving objects: 100% (3099/3099), 5.66 MiB | 2.89 MiB/s, done.
Resolving deltas: 100% (2020/2020), done.
[/] # docker build -t nunif -f Dockerfiles/Dockerfile.cpu_noavx Dockerfiles
unable to prepare context: path "Dockerfiles" not found

@nagadomi
Owner

Sorry, a cd is needed:

git clone https://github.com/nagadomi/nunif.git
cd nunif
docker build -t nunif -f Dockerfiles/Dockerfile.cpu_noavx Dockerfiles
