Optimize fs::write #134730

Open
wants to merge 1 commit into
base: master

ChrisDenton
Member

Doing a write then truncate instead of truncate then write is much faster on Windows (and potentially some filesystems on other systems too). A downside is that it may leave the file in an inconsistent state if File::set_len fails.

Fixes #127606

I'm nominating for libs-api because this may not honour the API of std::fs::write. Maybe t-libs can also think of a reason not to do this.
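For illustration, the write-then-truncate order described above can be sketched as follows. This is a minimal sketch, not the PR's actual diff: the helper name `write_then_truncate` and the demo file name are hypothetical, and it assumes the file is opened without truncation and trimmed with `File::set_len` afterwards.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Seek, Write};
use std::path::Path;

/// Hypothetical helper (not the PR's code): overwrite in place, then trim.
fn write_then_truncate(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false) // keep the old contents until the new data is written
        .open(path)?;
    file.write_all(data)?;
    // If set_len fails, the file is left holding the new bytes followed by
    // whatever tail of the old contents extended past them.
    let len = file.stream_position()?;
    file.set_len(len)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("write_then_truncate_demo.txt");
    // Start with a longer file so the trailing truncation has work to do.
    fs::write(&path, b"a much longer original payload")?;
    write_then_truncate(&path, b"short")?;
    assert_eq!(fs::read(&path)?, b"short".to_vec());
    fs::remove_file(&path)
}
```

The demo overwrites a longer file with a shorter one, which is the case where a failed `set_len` would leave stale bytes at the end.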

Write then truncate instead of truncate then write.
@ChrisDenton ChrisDenton added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Dec 24, 2024
@rustbot
Collaborator

rustbot commented Dec 24, 2024

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 24, 2024
@ChrisDenton ChrisDenton added the A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` label Dec 24, 2024
@Urgau
Member

Urgau commented Dec 24, 2024

Do you have numbers? How much faster are we talking?

We should have a library benchmark (if we don't already have one).

@ChrisDenton
Member Author

ChrisDenton commented Dec 24, 2024

I need to run a proper benchmark but if the end file size is about the same then there seems to be an order of magnitude difference, which seems significant regardless:

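use std::fs::OpenOptions;
use std::io::{Seek, Write};

// Baseline: truncate the file on open, then write the data.
fn write_file_truncate(file_name: &str, data: &[u8]) {
    if let Ok(mut file) = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(file_name)
    {
        file.write_all(data).unwrap();
    }
}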

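// Proposed order: write over the existing contents, then trim the file
// to the number of bytes written.
fn write_file_set_len(file_name: &str, data: &[u8]) {
    if let Ok(mut file) = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(file_name)
    {
        file.write_all(data).unwrap();
        let pos = file.stream_position().unwrap();
        // Don't silently ignore a failed truncation.
        file.set_len(pos).unwrap();
    }
}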

static DATA: &str = include_str!("p&p.txt");

fn main() {
    let now = std::time::Instant::now();
    for _ in 0..1000 {
        write_file_truncate("p&p.txt", DATA.as_bytes());
        //write_file_set_len("p&p.txt", DATA.as_bytes());
    }
    println!("{} ms", now.elapsed().as_millis());
}

Where p&p.txt is a copy of Pride and Prejudice.

The total for 1000 writes was 200 to 500 ms for truncate vs. 60 to 80 ms for set_len.

So this allows writing Pride and Prejudice an order of magnitude faster.
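To compare both variants in a single run rather than by commenting lines in and out, a small harness along these lines could be used. It is only a sketch under assumptions: the function names, the temp-file names, the 100 iterations, and the 700 KiB stand-in payload are all hypothetical choices, and it is not a substitute for a proper library benchmark.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Seek, Write};
use std::path::Path;
use std::time::{Duration, Instant};

// Truncate on open, then write.
fn write_truncate_first(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().write(true).create(true).truncate(true).open(path)?;
    file.write_all(data)
}

// Write over the old contents, then trim to the bytes written.
fn write_truncate_last(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().write(true).create(true).truncate(false).open(path)?;
    file.write_all(data)?;
    let len = file.stream_position()?;
    file.set_len(len)
}

/// Time `iters` consecutive calls of `write` against the same path.
fn time_writes(
    write: fn(&Path, &[u8]) -> io::Result<()>,
    path: &Path,
    data: &[u8],
    iters: u32,
) -> io::Result<Duration> {
    let start = Instant::now();
    for _ in 0..iters {
        write(path, data)?;
    }
    Ok(start.elapsed())
}

fn main() -> io::Result<()> {
    // Stand-in payload; the size is an arbitrary assumption.
    let data = vec![b'x'; 700 * 1024];
    let dir = std::env::temp_dir();
    let first = dir.join("bench_truncate_first.bin");
    let last = dir.join("bench_truncate_last.bin");

    let t_first = time_writes(write_truncate_first, &first, &data, 100)?;
    let t_last = time_writes(write_truncate_last, &last, &data, 100)?;
    println!("truncate first: {} ms", t_first.as_millis());
    println!("truncate last:  {} ms", t_last.as_millis());

    // Both strategies must leave identical contents behind.
    assert_eq!(fs::read(&first)?, fs::read(&last)?);
    fs::remove_file(&first)?;
    fs::remove_file(&last)
}
```

Timings will vary with the OS, filesystem, and mount options, so the printed numbers are only meaningful relative to each other on the same machine.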

@clubby789
Contributor

Benchmark on Linux:
tmpfs:

# truncate
  Time (mean ± σ):     195.1 ms ±  12.4 ms    [User: 0.5 ms, System: 193.7 ms]
  Range (min … max):   177.7 ms … 213.0 ms    16 runs
# set_len
  Time (mean ± σ):      82.8 ms ±   8.0 ms    [User: 0.9 ms, System: 80.9 ms]
  Range (min … max):    74.6 ms … 100.6 ms    31 runs

ext4 (on SSD):

# truncate
  20 seconds for one run
# set_len
  Time (mean ± σ):      85.5 ms ±   6.1 ms    [User: 0.7 ms, System: 84.8 ms]
  Range (min … max):    82.9 ms … 107.5 ms    28 runs

@the8472
Member

the8472 commented Dec 25, 2024

Would marking a file as sparse make any difference here? At $work I've got significant speedups from that when incrementally writing a file on NTFS, but it was a different IO pattern than this.

ext4

That's likely due to auto_da_alloc. It's trying to be "helpful" here by adding an implicit fsync for this particular pattern.
An extreme waste of performance when you don't need durability. Potential data loss avoidance if your application isn't doing persistence properly.
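As a point of reference for what explicit persistence can look like, one common pattern is to write to a temporary file, sync it, and rename it over the target. The sketch below is an illustration only and is not part of this PR: `write_durably` and the `.tmp` sibling name are hypothetical, and syncing the parent directory (needed for full durability on some systems) is left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: replace `path` with `data` via a temporary sibling file.
fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    // Assumes the ".tmp" sibling name is free to use.
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    // Ask the OS to flush file data and metadata to the storage device.
    file.sync_all()?;
    drop(file);
    // Replace the target; readers see either the old or the new contents.
    fs::rename(&tmp, path)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("write_durably_demo.txt");
    fs::write(&path, b"old contents")?;
    write_durably(&path, b"new contents")?;
    assert_eq!(fs::read(&path)?, b"new contents".to_vec());
    fs::remove_file(&path)
}
```

With the sync made explicit like this, the application pays for durability only where it asks for it, instead of relying on a filesystem heuristic.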

Development

Successfully merging this pull request may close these issues.

File truncation is slow on Windows
6 participants