Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a check for it and now we will see error: 'is a directory' #1

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kunal
Copy link

@kunal kunal commented Nov 23, 2018

Added a check for it and now we will see error: 'is a directory'. Resolves daokoder#523 in main repo https://github.com/daokoder/dao

modules/stream/dao_stream.c Outdated Show resolved Hide resolved
@kunal
Copy link
Author

kunal commented Dec 7, 2018

Added the suggested error message

@kunal
Copy link
Author

kunal commented Jan 3, 2019

@dumblob Any updates? :)

@dumblob
Copy link
Owner

dumblob commented Jan 3, 2019

@kunal thanks for asking - I'm actually quitting my job, moving to a different place and changing even a country (state). I was expecting I'll manage better, but it seems I'll be busy until the end of January. Starting from February I'm though quite free on the other hand, so I'll devote significantly more time to community work than during the last 2 years.

I've started testing this patch and found out we're having the same issue as Python 3 had (or maybe still has under certain circumstances due to its modularity) - namely calling stat() two times consecutively in real scenarios instead of once. I'm afraid we can't let this flow into the upstream. Any ideas how to solve this issue differently?

@kunal
Copy link
Author

kunal commented Jan 10, 2019

@dumblob I will check and get back to you.

@kunal
Copy link
Author

kunal commented Apr 23, 2019

Not sure how to resolve this :) Checked the Python implementation as well. Check line 455: https://github.com/python/cpython/blob/bcda8f1d42a98d9022736dd52d855be8e220fe15/Modules/_io/fileio.c

There is a check for a directory.

@dumblob
Copy link
Owner

dumblob commented Apr 23, 2019

I gave it a thought and I'm pretty confident we shall:

a) open the file without checking anything (like the current upstream)
b) first then check, whether the opened file descriptor is a directory using fstat() in case open() didn't error out

This has basically all advantages (no atomicity issue like when calling stat() before open(); on Windows will open() error out on a directory like it is now; on Linux will open() error out on a directory in case the open mode flag is not O_RDONLY) and just one slight disadvantage (it takes 2.6x longer in the worst case than plain stat() - see below). I think this delay is acceptable considering we gained atomicity.

I did some measurements for the very worst case on my 10 years old machine with HDD (no SSD nor anything faster) on Linux 4.19.34-1-lts with ext4 filesystem mounted with relatime,lazytime (lazytime is way more important than relatime).

variant 1.5 GByte file, 1 million iterations empty directory, 1 million iterations
f = open( p, O_RDONLY ); fstat( f, &s ); close( f ); real 0m3.563s user 0m0.420s sys 0m3.140s real 0m3.279s user 0m0.369s sys 0m2.899s
f = open( p, O_RDONLY ); close( f ); real 0m2.955s user 0m0.430s sys 0m2.518s real 0m2.629s user 0m0.347s sys 0m2.262s
stat( p, &s ); real 0m1.370s user 0m0.161s sys 0m1.210s real 0m1.002s user 0m0.170s sys 0m0.830s

So, basically we'll do something like:

static FILE* DaoIO_OpenFile( DaoProcess *proc, DString *name, const char *mode, int silent )
{
  ...
  fin = Dao_OpenFile( fname->chars, mode );
  Dstring_Delete( fname );
  #ifndef WIN32
  struct stat s;
  fstat( fileno( fin ), &s );
  #endif
  if( fin == NULL && silent == 0 &&
  #ifndef WIN32
  S_ISDIR( s.st_mode ) != 0
  #endif
  ){
  ...

So, now we also know, that stat() is about 2.25x slower than fstat() - this "huge" difference seems to be the cost of the path lookup.

@kunal
Copy link
Author

kunal commented Aug 21, 2019

Thanks for the details :) I will try this out. Only downside I feel here is we wont be able to specifically let the user know that it was a directory that he/she was trying to open and hence the error.

Also not sure why should we let open the file at first place. Why dont we just fstat first and then once we are sure its not a directory we let file to be opened because anyway we are doing this check later once we open the file. This also allows us to raise a specific error saying opening directories is not allowed. Let me know what do you think.

@dumblob
Copy link
Owner

dumblob commented Aug 27, 2019

Only downside I feel here is we wont be able to specifically let the user know that it was a directory that he/she was trying to open and hence the error.

Well, there is actually a check for whether it's a directory or not (see S_ISDIR( s.st_mode ) != 0 in the code snippet in my last comment). Or am I missing something?

Also not sure why should we let open the file at first place. Why dont we just fstat first and then once we are sure its not a directory we let file to be opened because anyway we are doing this check later once we open the file.

Dao is a language with first class support for concurrent parallelism. Thus we must not allow any race conditions. File systems on UNIXes (including Linux) are not transactional and thus it's impossible to run fstat() before open() in a Dao thread (belonging to some process) as there might be another (e.g. non-Dao) process or thread e.g. removing the directory and creating a file with the same name - both right after the Dao thread ran fstat(), but before the Dao thread ran open() (this situation is not uncommon as context switches are unpredictable and might happen immediately after fstat() in the Dao thread returned).

This also allows us to raise a specific error saying opening directories is not allowed.

I don't see any issue with raising a specific error - see the code snippet from my last comment.

@dumblob
Copy link
Owner

dumblob commented May 9, 2020

Btw. I've run another benchmark to see if we could first just open the path as read only (disregarding which opening mode the user has chosen) and if it's a file (not a directory), then just freopen() the opened file with the user-chosen mode. Note, freopen() shall be atomic, so no race conditions there.

#include <stdio.h>  // fopen, fclose, printf, FILE

int main( int a, char *aa[] ){
  a = a; aa = aa;
  FILE *f = NULL;
  for( int i=0; i<10000000; ++i ){
    f = fopen( "file.txt", "r" );
    if( f == NULL ){ printf( "ERR fopen() failed\n" ); return 1; }
    fclose( f );
  }
  return 0;
}

(gcc -O3 -std=c11 -pedantic bench.c -o bench && time ./bench)

#include <stdio.h>  // fopen, fclose, printf, FILE

int main( int a, char *aa[] ){
  a = a; aa = aa;
  FILE *f = NULL;
  for( int i=0; i<10000000; ++i ){
    f = fopen( "file.txt", "w+" );
    if( f == NULL ){ printf( "ERR fopen() failed\n" ); return 1; }
    fclose( f );
  }
  return 0;
}

(gcc -O3 -std=c11 -pedantic bench.c -o bench && time ./bench)

#include <stdio.h>  // fopen, fclose, printf, FILE

int main( int a, char *aa[] ){
  a = a; aa = aa;
  FILE *f = NULL;
  for( int i=0; i<10000000; ++i ){
    f = fopen( "file.txt", "r" );
    if( f == NULL ){ printf( "ERR fopen() failed\n" ); return 1; }
    f = freopen( NULL, "w+", f );
    if( f == NULL ){ printf( "ERR freopen() failed\n" ); return 1; }
    fclose( f );
  }
  return 0;
}

(gcc -O3 -std=c11 -pedantic bench.c -o bench && time ./bench)

Results

The file already existed, had about 5 bytes inside and wasn't a sparse file. The filesystem was ext4 under Linux LTS 5.4.32.

# 10 000 000, fopen r, file already existed
0$ time ./my

real    0m33,393s
user    0m8,526s
sys     0m24,652s
0$ time ./my  # second run to get a glimpse of variance

real    0m33,666s
user    0m8,198s
sys     0m25,320s
# 10 000 000, fopen w+, file already existed
0$ time ./my

real    2m43,142s
user    0m12,710s
sys     2m28,899s
# 10 000 000, fopen r, freopen w+, file already existed
0$ time ./my

real    4m23,731s
user    0m25,985s
sys     3m57,064s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

io.open() may create invalid stream
2 participants