Skip to content

Latest commit

 

History

History
625 lines (532 loc) · 35.3 KB

README.adoc

File metadata and controls

625 lines (532 loc) · 35.3 KB

TABFS-28

The specification

Bitfields (flags) do not utilize the endianess used in all other numbers; their storage is always big-endian, meaning the further left an byte of these fields stands, the lower it’s address. For example: 00000000 00000000, here the first group of zeros/bits (until the space) is one byte. Its address is one lower than the other group of zeros/bits. Inside an byte itself, the further an bit is to the left, the higher it is. Meaning the value of 100000000 is 0x80.

TABFS-header

The first sector of an partition must contain an valid TABFS header:

+=================+==============+======+============================================+
| name            | offset       | size | description                                |
+=================+==============+======+============================================+
| TABFS_magic     | 448 / 0x1C0  | 16   | "TABFS-28" (zero-terminated)               |
|                 |              |      | or if you prefer numeric:                  |
|                 |              |      | 0x1C0: 0x38322D5346424154                  |
|                 |              |      | 0x1C8: 0x0000000000000000                  |
|                 |              |      |                                            |
|                 |              |      | This field is 16 bytes, so custom versions |
|                 |              |      | have enough space to encode their own magic|
+-----------------+--------------+------+--------------------------------------------+
| private_data    | 464 / 0x1D0  | 32   | Private data; can be used for custom use   |
+-----------------+--------------+------+--------------------------------------------+
| flags           | 496 / 0x1F0  | 2    | Flags for the filesystem:                  |
|                 |              |      |   00000000 000000EA                        |
|                 |              |      | A: If set, the lba's of the filesystem     |
|                 |              |      |    are absolute; if cleared, they are      |
|                 |              |      |    partition based                         |
|                 |              |      | E: If set, the endianess of the filesystem |
|                 |              |      |    is big-endian; if cleared, it uses      |
|                 |              |      |    little-endian (not for bitfields)       |
+-----------------+--------------+------+--------------------------------------------+
|                 | 498 / 0x1F2  | 4    | Four unused bytes                          |
+-----------------+--------------+------+--------------------------------------------+
| info_LBA        | 500 / 0x1F6  | 8    | The LBA of the volume information block.   |
|                 |              |      | Dependent on the version of TABFS, only    |
|                 |              |      | the first 28 or 48 bits are used           |
+-----------------+--------------+------+--------------------------------------------+
| boot-signature  | 510 / 0x1FE  | 2    | 0xAA55, normal x86 boot signature          |
+-----------------+--------------+------+--------------------------------------------+
bytefield tabfs header

Volume Information Block

+=================+==============+======+============================================+
| name            | offset       | size | description                                |
+=================+==============+======+============================================+
| TABFS_magic     | 0x0000       | 16   | "TABFS-28" (zero-terminated)               |
|                 |              |      | or if you prefer numeric:                  |
|                 |              |      | 0x000: 0x38322D5346424154                  |
|                 |              |      | 0x008: 0x0000000000000000                  |
|                 |              |      |                                            |
|                 |              |      | This field is 16 bytes, so custom versions |
|                 |              |      | have enough space to encode their own magic|
|                 |              |      |                                            |
|                 |              |      | This is actually a copy of the magic found |
|                 |              |      | in the header inside the bootsector        |
+-----------------+--------------+------+--------------------------------------------+
| bat_LBA         | 0x0010       | 4    | The LBA 28 of the first block of the BAT   |
+-----------------+--------------+------+--------------------------------------------+
| min_LBA         | 0x0014       | 4    | The min LBA of the filesystem, including   |
|                 |              |      | any preload (this info section, bootloader |
|                 |              |      | or else)                                   |
+-----------------+--------------+------+--------------------------------------------+
| bat_start_LBA   | 0x0018       | 4    | The first LBA the BAT is allowed to use    |
+-----------------+--------------+------+--------------------------------------------+
| max_LBA         | 0x001C       | 4    | The max LBA (inclusive) that is allowed to |
|                 |              |      | be used by tabfs. It's also the max LBA    |
|                 |              |      | that an BAT can hold informations about    |
+-----------------+--------------+------+--------------------------------------------+
| blockSize       | 0x0020       | 4    | The absolute bytesize of an block.         |
|                 |              |      | For example: for ATA this would be 512     |
|                 |              |      | This size must not be the physical size,   |
|                 |              |      | but it should be an multiple of it.        |
+-----------------+--------------+------+--------------------------------------------+
| BS              | 0x0024       | 1    | Count of multiplies of an physical block   |
|                 |              |      | to make an block. Used to calculate LBA's  |
+-----------------+--------------+------+--------------------------------------------+
|                 | 0x0025       | 1    | Unused byte                                |
+-----------------+--------------+------+--------------------------------------------+
| flags           | 0x0026       | 2    | Flags for the filesystem:                  |
|                 |              |      |   00000000 000000EA                        |
|                 |              |      | A: If set, the lba's of the filesystem     |
|                 |              |      |    are absolute; if cleared, they are      |
|                 |              |      |    partition based                         |
|                 |              |      | E: If set, the endianess of the filesystem |
|                 |              |      |    is big-endian; if cleared, it uses      |
|                 |              |      |    little-endian (not for bitfields)       |
|                 |              |      |                                            |
|                 |              |      | This is actually a copy of the flags found |
|                 |              |      | in the header inside the bootsector        |
+-----------------+--------------+------+--------------------------------------------+
| root_LBA        | 0x0028       | 4    | The LBA of the first block of the first    |
|                 |              |      | section of the root entrytable.            |
+-----------------+--------------+------+--------------------------------------------+
| root_size       | 0x002C       | 4    | tablesize; for limitations and usage see   |
|                 |              |      | directorys; belongs to root_LBA            |
+-----------------+--------------+------+--------------------------------------------+
|                 | 0x0030       | 32   | Reserved for future usage                  |
+-----------------+--------------+------+--------------------------------------------+
| volume_label    | 0x0050       | 176  | Zero-terminated string of the label for    |
|                 |              |      | this volume.                               |
+-----------------+--------------+------+--------------------------------------------+
| private data    | 0x0100       | 256  | Private data; can be used for custom use   |
+-----------------+--------------+------+--------------------------------------------+
bytefield tabfs volumeInfo
The fields BS and blockSize are linked together. This means they must not contain data that contradict each other. This means, if BS is 1 then blockSize should be 1 * <physical block size>. If BS is 2, 2 * <physical block size> and so on. For example, if we have an physical block size of 512 (like ATA drives / HDD’s mostly have), this means by an BS of 2, we have an blockSize of 1024.
When refering to an LBA inside this document, its refers to an address with a block size as specfied in blockSize. You can obtain the "real" LBA, by multipling it with BS. The result can then be used to access the data from the physical device.

Block Allocation Table (BAT)

To simply know where free blocks are, tabfs maintains an Block Allocation table or BAT. The BAT is an simple bitmap (+ header) in wich an unset bit means the block is free and an set bit means the block is used.

The position of an bit for an block, is determined through its LBA:

abs_bitpos = lba;

// relative to an byte
bytepos = lba / 8;
rel_bitpos = lba % 8;  // counted from the left side of an byte; zero means the left most bit in a byte

When implementing this, remember that the further left a bit stands, the higher its numerical value:

0b10000000 = 0x80 = 128

An BAT is build as following:

+=================+========+=================+====================================+
| name            | offset | size            | description                        |
+=================+========+=================+====================================+
| next_bat        | 0      | 4               | first block of the next section    |
+-----------------+--------+----------------.+------------------------------------+
| block_count (N) | 4      | 2               | how many blocks this section spans |
+-----------------+--------+-----------------+------------------------------------+
| BAT_data        | 6      | blockSize*N - 6 | bitmap data                        |
+-----------------+--------+-----------------+------------------------------------+

The BAT uses at total 32MB of space (with an blockSize of 512), wich enables devices/partitions up to 128GB.

ℹ️
To find out when to stop allocating BAT regions (i.e. your drive/partition is going to be full), you can use the max_LBA field from the header.
ℹ️
When implementing this, one might find it easier to allocate the BAT completly when creating the filesystem, since you then dosnt get into the position of needing an algorithm that ansures that you always have an BAT to find an free block.

Entrytables

Each directory as an entrytable which consits of entrys with the following format:

+=================+========+=======+============================================+
| name            | offset | size  | description                                |
+=================+========+=======+============================================+
| flags           | 0      | 2     | flags for this entry:                      |
|                 |        |       | TTTTUGSA AAAAAAAA                          |
|                 |        |       | - A = acl                                  |
|                 |        |       |   - RWX:U RWX:G RWX:O (hi to low)          |
|                 |        |       | - S = sticky bit                           |
|                 |        |       | - U/G = set-user-id/set-group-id bit       |
|                 |        |       | - T = type                                 |
|                 |        |       |   - 0: entry is not valid (free)           |
|                 |        |       |   - 1: directory                           |
|                 |        |       |   - 2: regular FAT file                    |
|                 |        |       |        uses a FAT to track blocks          |
|                 |        |       |   - 3: regular segmented file              |
|                 |        |       |        uses segment headers                |
|                 |        |       |   - 4: char device                         |
|                 |        |       |   - 5: blk device                          |
|                 |        |       |   - 6: fifo                                |
|                 |        |       |   - 7: link                                |
|                 |        |       |   - 8: socket                              |
|                 |        |       |   - 9: continuous file                     |
|                 |        |       |        file is just one a group of blocks  |
|                 |        |       |   - A: long name                           |
|                 |        |       |   - B-D: not used; free for extensions!    |
|                 |        |       |   - E: tableinfo entry                     |
|                 |        |       |   - F: kernel; is a continuous file        |
+-----------------+--------+-------+--------------------------------------------+
| ctime           | 2      | 8     | creating time as a u64 timestamp           |
+-----------------+--------+-------+--------------------------------------------+
| mtime           | 10     | 8     | modify time as a u64 timestamp             |
+-----------------+--------+-------+--------------------------------------------+
| atime           | 18     | 8     | access time as a u64 timestamp             |
+-----------------+--------+-------+--------------------------------------------+
| uid             | 26     | 4     | u32 userid                                 |
+-----------------+--------+-------+--------------------------------------------+
| gid             | 30     | 4     | u32 groupid                                |
+-----------------+--------+-------+--------------------------------------------+
| data            | 34     | 8     | usage depents on the type:                 |
|                 |        |       | - dir/file/kernel:                         |
|                 |        |       |   - LBA 28 of first block                  |
|                 |        |       |   - 32bit file-/tablesize in bytes         |
|                 |        |       | - char/blk-dev:                            |
|                 |        |       |   - 32bit device id                        |
|                 |        |       |   - 32bit device flags                     |
|                 |        |       | - sock: ipv4 address / socket serial num   |
|                 |        |       | - symlink:                                 |
|                 |        |       |   - 32 bit offset                          |
|                 |        |       | - fifo:                                    |
|                 |        |       |   - 32bit buffer size in bytes             |
+-----------------+--------+-------+--------------------------------------------+
| name            | 42     | 22    | zero-terminated name of the entry          |
+-----------------+--------+-------+--------------------------------------------+
bytefield tabfs genericEnt
ℹ️
the above layout is the generic layout; for special entrys (longname, tableinfo), this dosnt applies, and they have their destinct layout. For more information about them and their layout, see below.

An entrytable itself has no limitation on how many entries it can hold, since there is no limitation about how many sections an entrytable can have.

The only limitation are offsets for symlinks, since they only store an offset for the path of theit target. This entry can only be 2 ^ 32 entries (across sections) after the symlink itself! However this would mean 256GB of entries inbetween them!

Directories

The type of entry that allows us to build a tree. Its data field is a LBA 28 of the first block of the first section of it’s entrytable and the bytesize of this first section; the size must be divideable through blockSize.

FAT files

These files use an File Allocation Table or FAT to store the addresses to their blocks. One FAT-file has one FAT. FAT-files allow a file history.

An FAT has the following format:

+=================+========+============+=======================================+
| name            | offset | size       | description                           |
+=================+========+============+=======================================+
| next_section    | 0      | 4          | LBA 28 of the first block of the next |
|                 |        |            | section                               |
+-----------------+--------+------------+---------------------------------------+
| size (N)        | 4      | 2          | count of blocks that belongs to the   |
|                 |        |            | current FAT                           |
+-----------------+--------+------------+---------------------------------------+
| unused          | 6      | 10         | unused                                |
+-----------------+--------+------------+---------------------------------------+
| entries         | 6      | 496        | the entries                           |
|                 |        | +(N*512)   |                                       |
+-----------------+--------+------------+---------------------------------------+
bytefield tabfs fat

An entry in a FAT looks like this:

+=================+========+============+=======================================+
| name            | offset | size       | description                           |
+=================+========+============+=======================================+
| index           | 0      | 4          | index (in blocks) at which the block  |
|                 |        |            | provide data for the file             |
+-----------------+--------+------------+---------------------------------------+
| lba             | 4      | 4          | lba28 of the block                    |
+-----------------+--------+------------+---------------------------------------+
| modifiy_date    | 9      | 8          | date at wich a modification has       |
|                 |        |            | created this entry                    |
+-----------------+--------+------------+---------------------------------------+
bytefield tabfs fatEnt

The file history functions are based of the principle that we have versioned blocks. First we need to find out which version we want. This is done by picking an timestamp at wich you want to view the content of an file.

ℹ️
The file history functionality only works for reading. Writing is always done on the latest version; but you can revert the file to an point in the past.

Then we traverse the FAT until we find an entry with index 0 and an matched version. If no exact match is found, the next lower version is used. This block lba can now be used for the first 512 bytes of the file. We repeat the proccess for every block we want to read from the file.

The revert function is done by picking an timestamp and deleting all entrys in the FAT that have an modify date grater than that what we picked.

ℹ️
The file history is not a tradditional COW, for example you cannot reference the same block in difference files. The limitation here is not the FAT: its the BAT, because we have no way to indicate that an given block has more than one user.

Segmented files

These files uses an segment header at the start of each segment to link the segments of the file together. Because this header consumes space inside the block, the first block of an segment contains always less then 512 bytes of space for file data.

The data field is nearly the same as in regular FAT files:

  • an LBA 28 to the first block of the first segment

  • u32 filesize in bytes

+=================+========+============+=======================================+
| name            | offset | size       | description                           |
+=================+========+============+=======================================+
| lba_next        | 0      | 4          | LBA 28 of the next block;             |
|                 |        |            | 0 if no more segments                 |
+-----------------+--------+------------+---------------------------------------+
| count           | 4      | 1          | count of blocks (including this one)  |
|                 |        |            | that belongs to this segment          |
+-----------------+--------+------------+---------------------------------------+
| data            | 5      | 507        | file content data                     |
|                 |        | +((N-1) *  |                                       |
|                 |        |  (512*BS)) |                                       |
+-----------------+--------+------------+---------------------------------------+

Continuous files

These are regular files but with a twist: their data is one continous range of blocks. No link traservel or other trickery! These files are good for huge data or generally for data that needs to be read from on multiple locations inside the file.

They are also fairly easily to implement.

Continuous files are good for images for vm’s and such or for database files. But there is one big disadvantage of continous files: they cannot grow.

ℹ️
You can try and re-balance the entire fs to make space, or move the blocks into areas on the harddrive where a wider range of blocks is available, but the standard specs dosnt require not deny this.

Their data field consists of:

  • an LBA 28 for the first block

  • a u32 filesize in bytes

The blockcount for this file is calculated through the filesize:

blockCount = ((fileSize / 512) + ( (fileSize % 512) > 0 ? 1 : 0 )) / BS;

(char- and block-) device files

They are special files that reference a device in the system, wich are in linux typically found inside of '/dev'! Their data field is split into two u32: one for the device id, and one for additional flags.

fifo, socket

Just what their name says:

  • fifo : first-in-first-out; data field contains the buffer size of the fifo

  • socket: either an unix-socket (or named pipe) or an tcp4 socket file. The later is not required.

An symlink links to another entry inside the filesystem.

Their data consists of an 32 bit offset relative to the current section of the entry table to an long-name entry, wich holds the path to the link’s target.

kernel

This is just an alias for an continous files; but its id can be used by an custom bootstrap/bootloader code to quickly find the kernel file.

Thats also why the kernel file needs to be in the first block of the root entrytable!

tableinfo entry

An special entry in each section of an entrytable to specify some informations about an entrytable! In acts like an header for the entrytable.

+=================+========+=======+============================================+
| name            | offset | size  | description                                |
+=================+========+=======+============================================+
| flags           | 0      | 1     | Flags for this entry:                      |
|                 |        |       | TTTT---                                    |
+-----------------+--------+-------+--------------------------------------------+
|                 | 1      | 39    | unused                                     |
+-----------------+--------+-------+--------------------------------------------+
| parent_lba      | 40     | 4     | LBA 28 of the first section of the parent  |
+-----------------+--------+-------+--------------------------------------------+
| parent_size     | 44     | 4     | bytesize of the first section of the parent|
+-----------------+--------+-------+--------------------------------------------+
| prev_lba        | 48     | 4     | LBA 28 of the previous section             |
+-----------------+--------+-------+--------------------------------------------+
| prev_size       | 52     | 4     | bytesize of the previous section           |
+-----------------+--------+-------+--------------------------------------------+
| next_lba        | 56     | 4     | LBA 28 of the next section                 |
+-----------------+--------+-------+--------------------------------------------+
| next_size       | 60     | 4     | bytesize of the next section               |
+-----------------+--------+-------+--------------------------------------------+
bytefield tabfs tableinfo entry

Should be the first entry in each entrytable, so we dont need to search if we want more sections.

long name

Another special entry: this time its used to allow for longer names! To use this extension, instead of writing the entry name into the name field of an entry, we create an long-name entry. Its built like this:

+=================+========+=======+============================================+
| name            | offset | size  | description                                |
+=================+========+=======+============================================+
| flags           | 0      | 1     | Flags for this entry:                      |
|                 |        |       | TTTT---                                    |
+-----------------+--------+-------+--------------------------------------------+
| name            | 1      | 63    | Zero-terminated name                       |
+-----------------+--------+-------+--------------------------------------------+

To use it, we change the name field in the normal entry from an simple 22 byte long zero-terminated string into:

| ...                                                                           |
+-----------------+--------+-------+--------------------------------------------+
| name_old        | 42     | 9     | Not used by the extension; free to use     |
+-----------------+--------+-------+--------------------------------------------+
| longname-lba    | 51     | 4     | LBA 28 of the entrytable section that      |
|                 |        |       | holds the long-name-ext                    |
+-----------------+--------+-------+--------------------------------------------+
| longname-lba-sz | 55     | 4     | bytesize of the entrytable refered to by   |
|                 |        |       | the longname-lba field                     |
+-----------------+--------+-------+--------------------------------------------+
| longname-offset | 59     | 4     | Offset into the entrytable section         |
|                 |        |       | to the long-name-ext                       |
+-----------------+--------+-------+--------------------------------------------+
| longname-byte   | 63     | 1     | Non-zero; indicates that a long-name-ext   |
|                 |        |       | is used!                                   |
+-----------------+--------+-------+--------------------------------------------+
bytefield tabfs extNameEnt

FAQ-ish

(not really frequently asked but you know, just to be save)

Q: Why are there so many (3) regular file types? which should i really use as regular file?

Tabfs is designed for osdev-ing. all 3 types work differently, and people have different difficulty to understand each of them. Also: tabfs is NOT designed that you implement all features (especially the types). Just pick what you think its best. And: try out! odev-ing is (at least for me) an project where you can try everything in your own peace.

Q: Why is there an extra type of entry for the kernel?

A: Tabfs is also designed in mind that osdever’s might want to use or write their own bootstrap code. I personally even recommend to do so, so you can understand the x86 boot sequence better! And because bootstrap/bootloader code tends to be assembler heavy and you might dont want to implement device drivers and an stub fs-driver for your bootloader, tabfs has a kernel type. Its easier just to search the first block of the root entrytabe after 0xF?.

Q: Why LBA 28? why not LBA 48?

With LBA 28 and an blockSize of 512 we can address 128GB of space. I doubt that any osdever ever generate an imgage this big. And LBA 28 (or better: 32 bit / 4 byte) fitted more nicely into the structures when I first designed this filesystem. Note however that the TABFS header already supports LBA 48, so an future TABFS-48 is not from the table :3 And you can always increase your BS/blockSize field at the creation so you can have access to more blocks. Note however that this does not change per-file-limitations!

Q: Why 64bit timestamps? are 32bit not enough for osdev purposes?

Nowhere; hardlinks can be very confusing, in usage as well as in implementation since you have multiple "users" of an dataregion (entrytable, fat-table or even only an range of blocks). Implementing them ment that we would need an additional layer inbetween; ext2 calls them inodes. So I designed tabfs to only contain one type: symlinks!

Q: Where comes the name "tabfs" from?

Its comes form its first implementation: an simple 1-sector big table with 32 byte entrys. However, given its simple structure it’s only supported continious files. No special files, not even directorys. So I reworked it into this.

Q: Can i use tabfs in my (commercial) project(s)?

See the LICENSE for detailed information!

TL;DR: the specifications for tabfs are licensed under CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/). This means that you can share and adapt this specifications how you like, but you must give appropriate credit if you do so.

⚠️
Disclaimer: this FAQ does NOT replace the LICENSE file. Please read the LICENSE file and all linked ressources in it for the full license.

Q: Under what license are the other files?

All files have the same license as mentioned above: CC BY-SA 4.0 (see LICENSE)

Q: How stable is it?

I don’t know to be honest, but I appreciate any help to proof if it is or not. (But you might head over to one of the implementations)

Q: Are there utils or reference implementations?

Not yet, but I’am working on it; when they’re ready I will link them here!

It’s finally ready! I’ve found the time and created libtabfs and tabfs-fuse! (They’re also listed in the implementations.adoc file.)

Q: For what purpose is the 'offsprings' folder?

If you want to share your own version of tabfs with other but have reason to not make a own repo, you can make a pr with an new file with your own version inside this folder. And so osdever’s that stumple uppon this system can maybe look into what others have suggested how the fs could work differently.

Q: It’s way to complicated for an bootloader written in assembler!

For exactly that purpose bootfs exists.

Q: Can I help to improve the specs?

Yes, you can always fork this repo and make your own patches. If you want to submit merge/pull request however, your patches must be contained in a file inside the 'offsprings' folder. Why? So others can follow how the foundation spec (this file) influenced it’s offsprings, and so on.

Q: I have an made an implementation!

Awesome! If you like you can add a link to the repository (and a short description) to implementations.adoc !

Q: Where are whiteouts? What even are they?

I don’t know. I only know that linux supports them. But they seem to only to exists for performance or something.

Q: How do I add an offspring?

Simple: make a new directory (when you have own diagramms) or file in the offspring folder. If you do have diagramms, please add some autobuild comments:

:imagesdir: ./assets
//$AUTOBUILDOUT assets (1)
//$AUTOBUILD tabfs-extNameEnt.bytefield (2)
//$AUTOBUILD tabfs-genericEnt.bytefield
//$AUTOBUILD tabfs-header.bytefield

// ...

ifdef::env-github[]
image::bytefield-tabfs-extNameEnt.svg[] (3)
endif::[]
ifndef::env-github[]
[bytefield]
....
include::tabfs-extNameEnt.bytefield[] (4)
....
endif::[]
  1. This sets the output dir for the generated diagramms; you also then want to set :imagesdir:

  2. This tells the build-util that the file tabfs-extNameEnt.bytefield should be read, and send to kroki as the type bytefield. The resulting SVG file will be saved as: assets/bytefield-tabfs-extNameEnt.svg.

  3. Reference the generated svg file! But only when we are on github.

  4. If we are non-github (local), we simply can use kroki directly via an addon (asciidoctor-kroki).

You then only need to run build_diagrams.rb to build all diagrams that are referenced with $AUTOBUILD!

Q: How do I add an implementation?

Add a link to the repository and a short description to implementations.adoc