An extremely fast directory exploration tool to find:
- Largest files
- Duplicated files
- ... to be continued
In directories of any size and structure.
- ⚡️ Analyzes large software projects in < 1 sec
- 🤝 Respects `.gitignore` files
- 🏠 Works locally and doesn't send your data anywhere
- 📖 Performs only read operations and doesn't modify files
- 💾 Has been tested on directories up to 100 GB in size, with 20,000 files and 5,000 subfolders
Unfolder can be useful for:
- Software maintainers to reduce repo size and eliminate duplicate files, within or across projects.
- Project managers to avoid extra data storage costs and have a single location for each key artifact.
Unfolder analyzes codebases of large open-source projects in under half a second:
Project | Files | Folders | Elapsed time, ms |
---|---|---|---|
Apache Airflow | 7,558 | 1,713 | 310 |
Ruff | 7,374 | 615 | 182 |
React | 6,467 | 532 | 156 |
CPython | 5,182 | 420 | 136 |
Kedro | 527 | 122 | 176 |
Time values are measured during local runs on a MacBook Pro with Apple M1 Max chip, 32 GB RAM.
Warning
This is a personal pet project of Yury Fedotov, implemented to learn Rust by doing. It may contain bugs or incomplete features. It is shared "as is" without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.
Use at your own risk. The author is not responsible for any damage, data loss, or other issues that may arise from using this software.
Currently, only installation from source is supported:
- Make sure you have the Rust toolchain set up.
  - This can either be done as the Rust guide suggests.
  - Or, if you're using the RustRover IDE, it manages the toolchain automatically.
- Clone the project repo locally and `cd` into it.
- Run `cargo build --release` to build the binary executable for the tool.
- Run `cargo install --path .` to install this executable and make it available under the `unfolder` name in the CLI.
The tool currently has just one CLI command which is available as:
unfolder path/to/directory/
In addition to the path to a directory, it accepts 3 optional arguments:
Argument | Short | Long | Options | Default |
---|---|---|---|---|
List of file extensions to analyze | -e | --extensions | Comma-separated: e.g. py,png | All |
Minimum file size to consider for duplicate analysis | | --min_file_size | One of the following aliases: blank, config, code, excel, document, image, gif, audio, video, large | code (100 Kb) |
Number of largest files to return based on size | -n | --n_top | Any positive integer | 5 |
So for example:
unfolder path/to/directory/ -e csv,pkl,png,gif --min_file_size image
Would:
- Analyze `path/to/directory/`.
- Consider only files with `csv`, `pkl`, `png` and `gif` extensions.
- While identifying duplicates, ignore files smaller than what the `image` alias implies (10 Mb).
You can also run `unfolder -h` to get info on arguments.
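
To make the three optional arguments more concrete, here is a minimal Rust sketch of the kind of filtering they describe. It is not taken from Unfolder's source: the `FileEntry` type and the function names are hypothetical, extension matching is assumed to be case-sensitive, and the `--min_file_size` alias is assumed to have already been resolved to a byte count.

```rust
use std::path::Path;

/// Hypothetical record for a scanned file: its path and size in bytes.
#[derive(Debug, Clone, PartialEq)]
struct FileEntry {
    path: String,
    size: u64,
}

/// Keep only files whose extension is listed (the `-e` argument).
/// An empty list means "all extensions", matching the documented default.
fn filter_by_extension(files: &[FileEntry], extensions: &[&str]) -> Vec<FileEntry> {
    files
        .iter()
        .filter(|f| {
            extensions.is_empty()
                || Path::new(&f.path)
                    .extension()
                    .and_then(|e| e.to_str())
                    .map_or(false, |e| extensions.contains(&e))
        })
        .cloned()
        .collect()
}

/// Return the `n` largest files, biggest first (the `-n` argument).
fn largest_files(files: &[FileEntry], n: usize) -> Vec<FileEntry> {
    let mut sorted = files.to_vec();
    sorted.sort_by(|a, b| b.size.cmp(&a.size));
    sorted.truncate(n);
    sorted
}

/// Keep only files that are not smaller than `min_size` bytes, so that
/// smaller files are ignored during duplicate analysis (`--min_file_size`).
fn duplicate_candidates(files: &[FileEntry], min_size: u64) -> Vec<FileEntry> {
    files.iter().filter(|f| f.size >= min_size).cloned().collect()
}
```

In this sketch the extension filter would run first, and the size threshold would apply only to the duplicate search, not to the list of largest files. This mirrors the wording of the table above, but the actual order of operations in the tool is an assumption.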
- Unfolder is written in pure Rust, which gives a very performant baseline.
- It leverages parallelism to analyze files faster: the speedup scales with the number of cores you have.
- To check for duplicate files, it leverages a very fast hashing algorithm: `xxHash64`.
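
The combination of parallel hashing and hash-based duplicate detection described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Unfolder's actual code: `hash_bytes` and `find_duplicates` are hypothetical names, file contents are passed in memory rather than read from disk, and the standard library's `DefaultHasher` is used as a stand-in for xxHash64 so that the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;
use std::thread;

/// Hash a file's bytes. `DefaultHasher` is a stand-in for xxHash64 here.
fn hash_bytes(content: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(content);
    hasher.finish()
}

/// Group (name, content) pairs by content hash, hashing chunks of the
/// input on separate threads. Returns groups with more than one file.
fn find_duplicates(files: &[(String, Vec<u8>)], workers: usize) -> Vec<Vec<String>> {
    let workers = workers.max(1);
    // Ceiling division, never zero, because `chunks(0)` would panic.
    let chunk_size = ((files.len() + workers - 1) / workers).max(1);

    // Each scoped thread hashes one chunk and returns (hash, name) pairs.
    let hashed: Vec<(u64, String)> = thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|(name, content)| (hash_bytes(content), name.clone()))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("hashing thread panicked"))
            .collect()
    });

    // Files that share a hash are reported as one duplicate group.
    let mut groups: HashMap<u64, Vec<String>> = HashMap::new();
    for (hash, name) in hashed {
        groups.entry(hash).or_default().push(name);
    }
    let mut duplicates: Vec<Vec<String>> =
        groups.into_values().filter(|g| g.len() > 1).collect();

    // Sort names and groups so the output order is deterministic.
    for group in &mut duplicates {
        group.sort();
    }
    duplicates.sort();
    duplicates
}
```

Treating equal hashes as equal files is a simplification: a 64-bit hash can collide in principle, so a stricter variant could compare file sizes or bytes before reporting a match.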
No. As can be validated from its open-source code, it performs only read operations.