NSNet2 is a deep recurrent neural network (RNN) used for background noise reduction in speech audio files. Microsoft originally released NSNet2 as an updated comparison baseline for its annual Deep Noise Suppression (DNS) challenge, but the release is inconvenient to use and is not suited for real-time processing. Running the released NSNet2 model requires correct versions of Python and supporting libraries (including PyTorch and ONNX Runtime) to be installed and properly linked together. These are relatively large software packages to install and configure just to run this one neural network model. Running NSNet2 unmodified also uses a large amount of unnecessary memory (RAM) that scales with the size of the audio file.
There are currently few broadly applicable, pre-trained, and effective speech noise suppressors that are easy to use. Projects like RNNoise have several quirks, and NSNet2 can be the next step up for cleaning up captured speech. NSNet2 just needed to be converted into a form that more people looking for additional audio filters for recorded speech could use, which is the major focus of this project. This (currently nameless) project is a user-friendly conversion of the NSNet2 released by Microsoft Research. An example where this software was applied can be found on this project's website.
- Noise Reduction of Speech Audio
- Usable, Fast, and Straightforward versions of NSNet2
- Input WAVE (.wav) audio file -> Output equivalent noise-suppressed audio file
- Offline | Non-Live and Real-time | Live versions
- Low RAM Usage while maintaining the same processing speed as NSNet2 (Offline version)
- Utilize Single-Instruction-Multiple-Data (SIMD) instructions (AVX2 and FMA) of modern x64 CPUs (see the kernel sketch after this list)
- Thorough Explanation (with diagrams) of how NSNet2 performs effective Noise Reduction
- Thorough Explanation (with diagrams) of what the code is doing and why it was written that way
- Simple to Compile and Modify
- No reliance on math or general matrix calculation libraries for any repeating calculations
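As a hedged illustration of the SIMD bullet above, the following C sketch shows the kind of AVX2 + FMA matrix-vector kernel the hand-written assembly subroutines perform. The function name and data layout are illustrative assumptions; the project's real kernels are FASM assembly, not C intrinsics.

```c
/* Illustrative AVX2 + FMA matrix-vector product y = W * x.
   W is row-major and ncols is assumed to be a multiple of 8.
   Compile with: gcc -O2 -mavx2 -mfma */
#include <immintrin.h>
#include <stddef.h>

void matvec_fma(float *y, const float *W, const float *x,
                int nrows, int ncols)
{
    for (int r = 0; r < nrows; r++) {
        __m256 acc = _mm256_setzero_ps();
        const float *row = W + (size_t)r * ncols;
        for (int c = 0; c < ncols; c += 8) {
            __m256 w = _mm256_loadu_ps(row + c);
            __m256 v = _mm256_loadu_ps(x + c);
            acc = _mm256_fmadd_ps(w, v, acc);  /* 8 fused multiply-adds */
        }
        /* Horizontal sum of the 8 accumulator lanes. */
        __m128 lo = _mm256_castps256_ps128(acc);
        __m128 hi = _mm256_extractf128_ps(acc, 1);
        __m128 s  = _mm_add_ps(lo, hi);
        s = _mm_hadd_ps(s, s);
        s = _mm_hadd_ps(s, s);
        y[r] = _mm_cvtss_f32(s);
    }
}
```

Each `_mm256_fmadd_ps` performs eight single-precision multiply-adds in one instruction, which is where most of the network's matrix math time goes.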
This project was created as a starting point for creating open audio noise reduction software. Audio noise suppression research (with and without neural networks) produces various publications and code snippets throughout the web but rarely leads to open (and pre-trained) usable software. Deep learning models are compared in the Microsoft DNS challenge, and while some of the model designs are published (usually with only minimal information), the exact implementations and trained model parameter values are kept private. However, these light publications sometimes give enough information that the network model can be mostly recreated or used to synthesize hybrid designs. Models can then be trained with customizable training file sets (like the one from the DNS challenge). The final results could then be run through a comparison process against each other and possibly against the DNS challenge results.
Since NSNet2 was published with the exact implementation and trained values, converting the model to a more user-friendly version was straightforward. The model was published in the open ONNX format, which meant testing whether the ONNX Runtime software could be used as the main (and biggest) dependency alongside compilable code. Unfortunately, the current ONNX Runtime software cannot carry over the model's Gated Recurrent Unit (GRU) hidden states from a previous run, which is the biggest reason it is unsuitable for real-time versions of NSNet2. The model value data was therefore extracted (and reorganized) from the ONNX file to be used with the converted version of the model.
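To make the hidden-state issue concrete, here is a minimal C sketch of a single GRU time step, assuming pre-computed gate pre-activations and an illustrative layer width; the names and input split are assumptions, not this project's actual code. The point is that `h` is read and rewritten in place, so a real-time converter must preserve it between frames instead of resetting it on every run:

```c
/* Hedged sketch of one GRU time step. z_in, r_in, n_in are input-side
   pre-activations and n_rec is the recurrent pre-activation of the
   candidate state; the names and width H are illustrative assumptions. */
#include <math.h>

#define H 400  /* assumed GRU layer width */

static float sigmoidf(float x) { return 1.0f / (1.0f + expf(-x)); }

/* h is both read and updated in place: the caller must keep it alive
   across frames, which is exactly the carryover ONNX Runtime lacked. */
void gru_step(float h[H], const float z_in[H], const float r_in[H],
              const float n_in[H], const float n_rec[H])
{
    for (int i = 0; i < H; i++) {
        float z = sigmoidf(z_in[i]);              /* update gate     */
        float r = sigmoidf(r_in[i]);              /* reset gate      */
        float n = tanhf(n_in[i] + r * n_rec[i]);  /* candidate state */
        h[i] = (1.0f - z) * n + z * h[i];         /* blend old/new   */
    }
}
```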
TO ADD (software / code principles and links to detailed explanation site with examples)
- Convert a video or audio file to a WAVE (.wav) audio file using ffmpeg and the Command Prompt (or Windows PowerShell):
ffmpeg.exe -i inputFile -vn -ac 1 output.wav
where "-vn" removes any video element and "-ac 1" mixes the audio into one channel (mono) - Run (Double-click) the latest NSNet2Offline.exe executable downloaded from the releases page of this project
- Navigate to and select the "output.wav" audio file created in step 1
- The converter will create "output-Enhanced.wav" in the same directory that "output.wav" resides in. This process should take about 2-8 seconds for every 60 seconds of audio.
- Listen and compare the resulting noise suppressed file to the original
TO ADD (use ffmpeg)
TO ADD (pre-process with Audacity Noise Reduction Effect)
- Works only with 48kHz 1-Channel (Mono) WAVE audio files (see the header-check sketch after this list)
- Only the Offline version has been released
- Does not convert 2-Channel (Stereo) Audio
- Real-time version is RAM bound (reads ~23.5MB of data for every 10ms of converted audio)
- WAVE file error checking is very basic and minimal
- Only works on Windows OS (Tested with fresh install of Windows 10)
- Requires a relatively recent x64 CPU with AVX2 and FMA support
- TO ADD
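As a hedged illustration of the format restriction and the minimal error checking noted in the list above, a header check on roughly this level might look like the following (hypothetical code, not the converter's actual implementation):

```c
/* Hypothetical minimal WAVE header check: accept only the canonical
   44-byte RIFF/WAVE header describing 48 kHz, 1-channel (mono) audio,
   and verify little else. Field offsets follow the canonical layout. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int check_wav_header(FILE *f)
{
    uint8_t h[44];
    if (fread(h, 1, 44, f) != 44) return 0;
    if (memcmp(h, "RIFF", 4) || memcmp(h + 8, "WAVE", 4)) return 0;

    uint16_t channels   = h[22] | (h[23] << 8);                  /* offset 22 */
    uint32_t sampleRate = h[24] | (h[25] << 8)                   /* offset 24 */
                        | ((uint32_t)h[26] << 16) | ((uint32_t)h[27] << 24);

    return channels == 1 && sampleRate == 48000;
}
```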
- Live Version
- 2-Channel (Stereo) conversion: convert each channel separately, or mix stereo down to mono and then convert
- Multithreading Capability for Offline Version
- Adjustable RAM usage (might increase offline processing speed by a tiny amount)
- More Code documentation
- Linux and FreeBSD support
- TO ADD
All C code is currently written to be compiled with gcc for Windows using the MinGW-w64 software. The latest versions can be found here. The C code will be modified in the future to compile with gcc no matter the operating system.
All x64 assembly code (currently containing only subroutine functions) was written to be assembled by the flat assembler (FASM). The assembly code contains the functions that do the main processing and make the AVX2 and FMA calls. The assembled object files get linked into the final executable by gcc / ld.
The FFTW library is used for performing the Discrete Fourier Transform (DFT) and its inverse. The single-precision floating point static library version (for Windows) needs to be compiled and will get linked into the final executable by gcc / ld. In the current source of FFTW, CMake can be used with MinGW-w64 on Windows to create the static library after a couple of modifications. The memory allocation file needs to be modified before running the CMake script, and the script needs to specify the following options: (TO ADD)
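For reference, a minimal sketch of the single-precision FFTW calls involved looks like this (the frame length here is an assumption, and real code would reuse the plans for every audio frame); the `fftwf_` prefix is what requires the single-precision build of the library:

```c
/* Minimal sketch of single-precision FFTW usage (link with -lfftw3f).
   FRAME is an assumed analysis length, not necessarily the project's. */
#include <fftw3.h>

#define FRAME 1024

int main(void)
{
    float *frame = fftwf_alloc_real(FRAME);
    fftwf_complex *spec = fftwf_alloc_complex(FRAME / 2 + 1);

    /* Plans are created once and reused for every audio frame. */
    fftwf_plan fwd = fftwf_plan_dft_r2c_1d(FRAME, frame, spec, FFTW_ESTIMATE);
    fftwf_plan inv = fftwf_plan_dft_c2r_1d(FRAME, spec, frame, FFTW_ESTIMATE);

    /* ... fill frame[] with windowed audio samples ... */
    fftwf_execute(fwd);   /* time domain -> spectrum */
    /* ... apply the network's per-bin gain to spec[] ... */
    fftwf_execute(inv);   /* spectrum -> time domain; FFTW's inverse is
                             unnormalized, so scale the result by 1/FRAME */

    fftwf_destroy_plan(fwd);
    fftwf_destroy_plan(inv);
    fftwf_free(frame);
    fftwf_free(spec);
    return 0;
}
```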
The Makefile can be used to create the executables found on the releases page. MinGW-w64 comes with Make, which can process the Makefile to compile the source code once the MinGW and FASM binaries are added to the PATH. Using the Command Prompt (or Windows PowerShell), change directory into the root folder of this project and run: mingw32-make.exe
The resulting executable and the necessary networkData.bin file can be found in the bin subdirectory. The Makefile directs Make to use both gcc and FASM to create the intermediate object files from the source code, which then get linked together with the FFTW library into the executable by gcc / ld.
TO ADD
- How to (re)-train the neural network data from the DNS Challenge set
- Optimized version of the RNNoise project with principles taken from this project
- New noise reduction project utilizing neural networks and ideas taken from other projects and published works
- TO ADD
Email me: [email protected]