
Performance and Optimizing


Below are notes on optimization of spiffs.

All performance graphs are collected from spiffs running on a Cortex CPU: an STM32F103VET6 @ 72MHz. The code has a home-spun preemptive OS. The SPI bus is running at 36MHz, and the SPI communication is interrupt and DMA driven. The SPI flash chip is a Micron M25P16.

Code can be found here. Spiffs is configured according to this.

##FILE SYSTEM STRUCTURE#

Spiffs can be configured in many ways. Looking at the structure, spiffs normally does not complain even if things are oddly configured. The penalty is efficiency, either in flash usage or in time.

The following statements are guidelines.

Spiffs needs two free logical blocks. Hence, having fewer than 6 logical blocks is suboptimal. Having a great number of logical blocks (more than 512) is valid, but spiffs will need to process a lot of data and this takes time. It might be better to have bigger logical blocks, giving a smaller number of blocks.

Considering the ratio of logical block size to logical page size, one block should have at least eight pages. More is better. The upper limit is untested, but more than 1024 pages per block will probably be suboptimal. 32 to 512 pages per block feels reasonable.

Remember, it is perfectly valid to have a logical block size that spans multiple physical blocks. It is not necessary to span 2^n physical blocks; any multiple will do. However, the size of the entire spiffs file system must be an integer multiple of the logical block size.
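
As a rough sketch, assume a 2 MB spi flash with 64 kB physical erase blocks, and SPIFFS_SINGLETON disabled so the geometry is given at run time. Picking 256 byte logical pages and 128 kB logical blocks (each spanning two physical blocks) stays within the guidelines above. The figures and the HAL function names are illustrative assumptions only:

#include "spiffs.h"

// hypothetical HAL glue to the actual spi flash driver, implemented elsewhere
s32_t my_spi_flash_read(u32_t addr, u32_t size, u8_t *dst);
s32_t my_spi_flash_write(u32_t addr, u32_t size, u8_t *src);
s32_t my_spi_flash_erase(u32_t addr, u32_t size);

static spiffs_config cfg;

static void setup_spiffs_config(void) {
  cfg.phys_size        = 2 * 1024 * 1024; // use the whole 2 MB chip
  cfg.phys_addr        = 0;               // start spiffs at the beginning of the flash
  cfg.phys_erase_block = 64 * 1024;       // physical erase block size of the chip
  cfg.log_block_size   = 128 * 1024;      // logical block spanning two physical blocks
  cfg.log_page_size    = 256;             // gives 512 pages per logical block

  cfg.hal_read_f  = my_spi_flash_read;
  cfg.hal_write_f = my_spi_flash_write;
  cfg.hal_erase_f = my_spi_flash_erase;
}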

This all depends on how large your file system is, the median size of your files, and how the files are accessed and modified. To find your global maximum: test, test, test.

Then again, as mentioned, spiffs swallows pretty much anything. If you're not having problems with speed, you should perhaps not spend too much time testing around.

##EXECUTION FOOTPRINT#

Spiffs crunches quite a lot of data from time to time. Stating the obvious: use compiler optimizations.

If you have some spare tightly coupled memory, you might want to put the spiffs work buffer there. If there is even more room, put the cache there also. If there is still space left, toss in the file descriptor buffer as well.
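
A minimal sketch of how the buffers could be placed, assuming gcc, a linker section named ".tcm" for the tightly coupled memory, 256 byte logical pages and four file descriptors (all of these are assumptions to adapt):

#include "spiffs.h"

#define LOG_PAGE_SIZE   (256)  // assumed logical page size
#define MAX_OPEN_FILES  (4)    // assumed number of simultaneously open files

// place the spiffs buffers in tightly coupled memory via a dedicated linker section
static u8_t spiffs_work_buf[LOG_PAGE_SIZE * 2] __attribute__((section(".tcm")));
static u8_t spiffs_fd_buf[32 * MAX_OPEN_FILES] __attribute__((section(".tcm")));               // file descriptor buffer, rough sizing
static u8_t spiffs_cache_buf[(LOG_PAGE_SIZE + 32) * 8] __attribute__((section(".tcm")));       // roughly eight cache pages

s32_t mount_spiffs(spiffs *fs, spiffs_config *cfg) {
  // the work buffer must be two logical pages big
  return SPIFFS_mount(fs, cfg,
                      spiffs_work_buf,
                      spiffs_fd_buf, sizeof(spiffs_fd_buf),
                      spiffs_cache_buf, sizeof(spiffs_cache_buf),
                      0 /* no check callback */);
}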

If you only run one spiffs instance, some cpu cycles can be saved by using the SPIFFS_SINGLETON config.
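
A minimal sketch of what that could look like in spiffs_config.h, assuming a 1 MB file system at flash offset 0 with 64 kB blocks and 256 byte pages (all example values):

// in spiffs_config.h - enable singleton mode and hard-code the geometry
#define SPIFFS_SINGLETON                  1

#define SPIFFS_CFG_PHYS_SZ(ignore)        (1024 * 1024) // total file system size
#define SPIFFS_CFG_PHYS_ERASE_SZ(ignore)  (65536)       // physical erase block size
#define SPIFFS_CFG_PHYS_ADDR(ignore)      (0)           // start offset in the spi flash
#define SPIFFS_CFG_LOG_PAGE_SZ(ignore)    (256)         // logical page size
#define SPIFFS_CFG_LOG_BLOCK_SZ(ignore)   (65536)       // logical block size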

##READ AND WRITE CACHE#

Configs SPIFFS_CACHE (read cache) and SPIFFS_CACHE_WR (write cache).

Enabling read cache is highly recommended, and gives a performance boost at the cost of some memory. The more pages you can give, the better, of course. But do note that spiffs only supports up to 32 cache pages. Giving more is a waste of RAM.

Also, there should be at least one write cache page. Otherwise, one must take care when writing files, as all writes will be write-through. Without write cache, the largest possible quantity of data should be written at each call. A loop like the following would wear the spi flash heavily without write cache:

// Bad example when not having write cache enabled
int write_one_million_fibonacci_numbers(void) {
  spiffs_file fd = SPIFFS_open(fs, "fib", SPIFFS_O_CREAT | SPIFFS_O_TRUNC | SPIFFS_O_RDWR, 0);
  if (fd <= 0) return SPIFFS_errno(fs);
  int i = 1000000;
  int a = 0;
  int b = 1;
  while (i--) {
    // calc fibonacci number
    int c = a + b;
    a = b;
    b = c;
    // DO NOT DO THIS WITHOUT WRITE CACHE
    // write one number at a time
    if (SPIFFS_write(fs, fd, (u8_t *)&c, sizeof(int)) < SPIFFS_OK) {
      (void)SPIFFS_close(fs, fd);
      return SPIFFS_errno(fs);
    }
  }
  if (SPIFFS_close(fs, fd) < SPIFFS_OK) return SPIFFS_errno(fs);
  return SPIFFS_OK;
}

Instead of the above, you should buffer as many fibonacci numbers as you can and then write the whole buffer, but only when running without write cache. With write cache enabled, the above is OK.
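
A minimal sketch of the buffered variant, reusing the same fs handle as above (the buffer size of 64 numbers is an arbitrary assumption):

// Better example when not having write cache enabled: collect numbers, write in chunks
int write_one_million_fibonacci_numbers_buffered(void) {
  spiffs_file fd = SPIFFS_open(fs, "fib", SPIFFS_O_CREAT | SPIFFS_O_TRUNC | SPIFFS_O_RDWR, 0);
  if (fd <= 0) return SPIFFS_errno(fs);
  int buf[64]; // buffer 64 numbers before each write - size picked arbitrarily
  int ix = 0;
  int i = 1000000;
  int a = 0;
  int b = 1;
  while (i--) {
    // calc fibonacci number
    int c = a + b;
    a = b;
    b = c;
    buf[ix++] = c;
    // flush the buffer when it is full, or when this was the last number
    if (ix == (int)(sizeof(buf) / sizeof(buf[0])) || i == 0) {
      if (SPIFFS_write(fs, fd, (u8_t *)buf, ix * sizeof(int)) < SPIFFS_OK) {
        (void)SPIFFS_close(fs, fd);
        return SPIFFS_errno(fs);
      }
      ix = 0;
    }
  }
  if (SPIFFS_close(fs, fd) < SPIFFS_OK) return SPIFFS_errno(fs);
  return SPIFFS_OK;
}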

If you have space for read cache pages, then in 99.999% of all cases you should also enable SPIFFS_CACHE_WR. It will not cost extra memory.

##OPENING FILES#

Opening a file given a filename will have spiffs search all blocks' first pages (the lookup pages) for object index headers - the file headers. Each found file header is then visited to see if the filename matches.

If lucky, the file is found quickly. If unlucky, lots of data needs to be filtered before the file is found.

There are a bunch of ways to speed this up.

###Temporal cache#

If your application needs to frequently open and close files, and files are opened in a manner of temporal locality, enable config

SPIFFS_TEMPORAL_FD_CACHE

Temporal locality in our case simply means that if a file was opened briefly before, it is likely to be opened again.

Examples could be some log file being opened and updated often; or an http server that uses the same css for all html pages: each access to an html file will also need to open the css file.

Let's construct a model of the latter example, the http server, on an ESP8266 for instance. Spiffs is configured with 4*2kB logical blocks and 256 byte pages on 1 MB of the spi flash.

Cache is enabled, and spiffs is given eight cache pages and four file descriptors.

Our example has seven html files on the server residing on spiffs, each file being a couple of kilobytes. Each html file refers to a common css file. When a user accesses an html page, the http server implementation opens that html file, reads it, and then closes it. After that, the css is opened, read, and closed.

Each html page has a certain probability of being visited by users.

| File   | Hit probability |
|--------|-----------------|
| 1.html | 25%             |
| 2.html | 20%             |
| 3.html | 16%             |
| 4.html | 12%             |
| 5.html | 8%              |
| 6.html | 5%              |
| 7.html | 4%              |

Running this in the testbench, randomizing access according to the above probabilities 10,000 times, we get:

|                     | Sum HAL read accesses | Sum read bytes from spi flash |
|---------------------|-----------------------|-------------------------------|
| no temporal cache   | 1595258               | 389395837                     |
| with temporal cache | 276361                | 63361517                      |

In this example, with the temporal cache enabled, spiffs is more than 80% more efficient at opening files, both in the number of HAL spi flash read accesses and in the number of bytes read from the spi flash.

The temporal fd cache piggybacks on the file descriptor update mechanism. The more file descriptors you have, the more file locations will be remembered. As a result of this, if you open all file descriptors at some point, all cache history is lost.

The price you pay for the temporal cache is six extra bytes per file descriptor and some cpu cycles.

It is recommended to always enable SPIFFS_TEMPORAL_FD_CACHE.
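
In spiffs_config.h this could look like the following sketch (the hit score value is just an example to tune for your access pattern):

// in spiffs_config.h - enable the temporal fd cache
#define SPIFFS_TEMPORAL_FD_CACHE          1
// score a cached file location gains when it is opened again; entries lose
// points as other files are opened, so a higher score keeps hot files longer
#define SPIFFS_TEMPORAL_CACHE_HIT_SCORE   4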

###Just one or a few important files#

There is a quick way to avoid wasting time opening files: do not close them. If you have but a few files that must be accessed quickly, and have the memory for it, you can simply open those files in the beginning and never close them.

Opened files are tracked: if the file is moved, the file descriptor is updated internally. It is in fact valid to open the same file with multiple file descriptors. They will all be updated upon changes to the spi flash.

Just make sure to call SPIFFS_fflush after SPIFFS_write (i.e. where your SPIFFS_close should have been).
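
A minimal sketch of this approach, assuming a single log file named "log" that is kept open for the lifetime of the application (the file name and helper functions are hypothetical):

// keep the important file open forever, flush instead of close
static spiffs_file log_fd;

int open_log_file_once(void) {
  log_fd = SPIFFS_open(fs, "log", SPIFFS_O_CREAT | SPIFFS_O_APPEND | SPIFFS_O_RDWR, 0);
  return log_fd <= 0 ? SPIFFS_errno(fs) : SPIFFS_OK;
}

int append_to_log(u8_t *data, u32_t len) {
  if (SPIFFS_write(fs, log_fd, data, len) < SPIFFS_OK) return SPIFFS_errno(fs);
  // flush where a SPIFFS_close would otherwise have been
  if (SPIFFS_fflush(fs, log_fd) < SPIFFS_OK) return SPIFFS_errno(fs);
  return SPIFFS_OK;
}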

###Handle your own cache#

Say you have three file descriptors and ten files, named "0", "1" through "9". If these files are always opened and closed in sequence from "0" to "9", multiple times, the temporal cache will be sidelined. This kind of access pattern is not temporal enough as we only have three descriptors.

Or you may simply have critical files that must be opened within a certain time.

If opening files is too slow in such cases, it is possible to implement your own cache for these files.

Assuming these files are created, we can scan for them and store the page index for each one of them in an array. We also hook a callback function into spiffs - SPIFFS_set_file_callback_func. This provides us with notifications when file headers are modified. Thus, we can track our cached files and update our cache. With this, we simply use SPIFFS_open_by_page, which will open the file in O(1) time.

// Example of rolling your own cache for files that must be opened swiftly

#define NBR_OF_CACHE_FILES    (10)

typedef struct {
  spiffs_obj_id obj_id; // the file id
  spiffs_page_ix pix; // the file header location
} spiffs_cache_entry;

static spiffs_cache_entry own_cache[NBR_OF_CACHE_FILES];

static const char *CACHED_FNAMES[NBR_OF_CACHE_FILES] = {
  "0","1","2","3","4","5","6","7","8","9"
};

static void spiffs_cb(spiffs *fs, spiffs_fileop_type op, 
                      spiffs_obj_id obj_id, spiffs_page_ix pix);
int init_cache(void);

int start(void) {
  // here goes code for initializing spiffs and mounting it
  ...
  
  // register our callback function to spiffs
  (void)SPIFFS_set_file_callback_func(fs, spiffs_cb);
  
  return init_cache();
}

// populate cache, first time
int init_cache(void) {
  int res;
  int i;
  for (i = 0; i < NBR_OF_CACHE_FILES; i++) {
    const char *fname = CACHED_FNAMES[i];
    spiffs_cache_entry *entry = &own_cache[i];
    spiffs_stat s;
    res = SPIFFS_stat(fs, fname, &s);
    if (res != SPIFFS_OK) {
      break;
    }
    entry->obj_id = s.obj_id;
    entry->pix = s.pix;
  }
  return res;
}

// open one of our cached file by giving index in cache map
// index 0 will thus open file "0", index 1 will open "1" etc
spiffs_file open_cached_file(const int file_index, spiffs_flags flags) {
  if (own_cache[file_index].pix == 0) {
    // file deleted, return bad fd
    return -1;
  }
  spiffs_file fd = SPIFFS_open_by_page(fs, own_cache[file_index].pix, flags, 0);
  return fd;
}

// callback function implementation - update cache if any of our files
// were touched
// no SPIFFS_nn calls within here please
static void spiffs_cb(spiffs *fs, spiffs_fileop_type op, 
                      spiffs_obj_id obj_id, spiffs_page_ix pix) {
  int i;
  for (i = 0; i < NBR_OF_CACHE_FILES; i++) {
    spiffs_cache_entry *entry = &own_cache[i];
    if (entry->obj_id != obj_id) {
      continue;
    }

    switch (op) {
      case SPIFFS_CB_UPDATED:
        entry->pix = pix;
        break;
      case SPIFFS_CB_DELETED:
        // our cached file was deleted
      case SPIFFS_CB_CREATED:
        // our old cached file was deleted and now spiffs is reusing
        // the obj.id for a new file - should never get here
      default:
        // huh?
        entry->pix = 0; // file no longer valid, clear cache mark
        break;
    }    
    
    break;
  }
}

##READING FILES#

###Read as much as possible#

If possible, read chunks of data instead of reading byte by byte.

Slow example, reading byte per byte:

uint8_t byte;
while (SPIFFS_read(fs, fd, &byte, 1) >= SPIFFS_OK) {
  process_one_byte(byte);
}

Fast example, reading chunks of 256 bytes:

uint8_t data[256];
int32_t len;
while ((len = SPIFFS_read(fs, fd, data, sizeof(data))) >= SPIFFS_OK) {
  int32_t i;
  for (i = 0; i < len; i++) {
    process_one_byte(data[i]);
  }
}

On the stm32 test platform, the following graph is from reading a 512kB file with different chunk sizes. The time is measured in milliseconds. The red boxes define the chunk size for each reading.

Read by chunks graph

This translates to the following read speeds:

| Chunk size | Bytes per second |
|------------|------------------|
| 1024       | 856680           |
| 512        | 836185           |
| 256        | 788403           |
| 128        | 699051           |
| 64         | 583840           |
| 32         | 439470           |
| 16         | 294709           |
| 8          | 177966           |
| 4          | 98922            |
| 2          | 52403            |
| 1          | 27064            |

TODO

###Index mapping#

TODO

##WRITING FILES#

TODO tip: call gc beforehand

Write in as big chunks as possible.
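
When a large write is coming up, it is possible to make room in advance so the garbage collector does not kick in halfway through the write. A minimal sketch, reusing the fs handle from the earlier examples:

// make room for an upcoming big write, so gc runs at a point of our own choosing
int write_big_blob(spiffs_file fd, u8_t *data, u32_t len) {
  // ask spiffs to garbage collect until roughly len bytes are free
  if (SPIFFS_gc(fs, len) < SPIFFS_OK) return SPIFFS_errno(fs);
  if (SPIFFS_write(fs, fd, data, len) < SPIFFS_OK) return SPIFFS_errno(fs);
  return SPIFFS_OK;
}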

Below is a time graph of writing 100000 bytes, 256 bytes at a time, on a pretty full system. The high spikes are the garbage collector kicking in, making room for the data. It can be seen that the spikes grow larger as the system gets more crammed.

Write crammed graph

##STACK USAGE#

These stack figures are collected from the stm32 platform. The project is compiled using gcc 5.3.0, cross-compiled for cortex-M3 with softfloat. The optimizer flag is optimizing for size (-Os).

Included in this stack usage are formatted prints, the spi driver, and some OS stuff. Rather than having a totally clean measurement of the stack usage without all the noise, I personally prefer a more real-world measurement. The latter statement depends heavily on the fact that I would otherwise have to rewrite everything, and I'm far too lazy for that.

| Function                        | Stack usage in bytes |
|---------------------------------|----------------------|
| SPIFFS_mount                    | 376                  |
| SPIFFS_format                   | 312                  |
| SPIFFS_open (non-existing file) | 432                  |
| SPIFFS_open (read)              | 432                  |
| SPIFFS_read                     | 552                  |
| SPIFFS_write (no gc required)   | 576                  |
| SPIFFS_write (gc required)      | 960                  |
| SPIFFS_remove                   | 520                  |
| SPIFFS_rename                   | 416                  |
| SPIFFS_gc                       | 760                  |
| SPIFFS_gc_quick                 | 344                  |
| SPIFFS_check                    |                      |