- org -- bootstrappable.org project #bootstrappable on libera.chat

What is this all about?

A full source bootstrapping for all free software programs (starting with GuixSD)

Full source bootstrapping for the GNU System

A package in GNU Guix is uniquely identified by the hash of its source code, its dependencies, and its build recipe.
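
The idea of identifying a package by a hash over its inputs can be sketched as below. This is a toy illustration only: the function `package_id`, the length-prefixed encoding and the choice of SHA-256 are assumptions made for the example, not the derivation hashing that Guix actually performs.

```python
import hashlib


def package_id(source: bytes, dependency_ids: list[str], recipe: str) -> str:
    """Toy identity: a SHA-256 over source, build recipe and dependency ids.

    Illustrative only; this is not how Guix really computes store hashes.
    """
    h = hashlib.sha256()
    fields = [source, recipe.encode(), *(d.encode() for d in sorted(dependency_ids))]
    # Length-prefix every field so that field boundaries cannot be confused.
    for field in fields:
        h.update(len(field).to_bytes(8, "big"))
        h.update(field)
    return h.hexdigest()


libc = package_id(b"libc sources", [], "make")
hello_a = package_id(b"hello sources", [libc], "make && make install")
hello_b = package_id(b"hello sources", [libc], "make CFLAGS=-O2 && make install")
# Same source and dependencies, different recipe: a different identity.
print(hello_a != hello_b)
```

Because a dependency's identity feeds into the identity of everything built on top of it, a change anywhere in the graph propagates upward, which is why the unhashed bootstrap binaries at the root matter so much.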

Every package can be built from source, except for the bootstrap binaries.

From the GNU Guix manual

The distribution is fully “bootstrapped” and “self-contained”: each package is built based solely on other packages in the distribution.

The root of this dependency graph is a small set of “bootstrap binaries”, provided by the ‘(gnu packages bootstrap)’ module

Guix v1.0 bootstrap binary seed (May 2, 2019)

$ du -schx $(readlink $(guix build bootstrap-tarballs)/*)
2.1M /gnu/store/9623n4bq6iq5c8cwwdq99qb7d0xj93ym-binutils-static-stripped-tarball-2.28.1/binutils-static-stripped-2.28.1-x86_64-linux.tar.xz
18M /gnu/store/437xwygmmwwpkddcyy1qvjcv4hak89pb-gcc-stripped-tarball-5.5.0/gcc-stripped-5.5.0-x86_64-linux.tar.xz
1.8M /gnu/store/55ccx18a0d1x5y6a575jf1yr0ywizvdg-glibc-stripped-tarball-2.26.105-g0890d5379c/glibc-stripped-2.26.105-g0890d5379c-x86_64-linux.tar.xz
5.7M /gnu/store/bqf0ajclbvnbm0a46819f30804y3ilx0-guile-static-stripped-tarball-2.2.3/guile-static-stripped-2.2.3-x86_64-linux.tar.xz
5.8M /gnu/store/j8yzjmh9sy4gbdfwjrhw46zca43aah6x-static-binaries-tarball-0/static-binaries-0-x86_64-linux.tar.xz
33M total
$ for i in $(readlink $(guix build bootstrap-tarballs)/*);\
  do sudo tar xf $i; done
$ du -schx *
130M bin
13M include
54M lib
51M libexec
5.2M share
252M total

Guix Reduced Binary Seed bootstrap binary seed (October 8, 2019)

$ du -schx $(readlink $(guix build bootstrap-tarballs)/*)
5.7M /gnu/store/9f8gi8raqfx9j3l9d00qrrc0jg3r1kyj-guile-static-stripped-tarball-2.2.6/guile-static-stripped-2.2.6-x86_64-linux.tar.xz
80K /gnu/store/b6rjl52hibhmvyw4dg8678pwryhla0h2-linux-libre-headers-stripped-tarball-4.19.56/linux-libre-headers-stripped-4.19.56-x86_64-linux.tar.xz
12K /gnu/store/d7zlxsjcnqilmvqwx7scija9x9bjw8cw-mescc-tools-static-stripped-tarball-0.5.2-0.bb062b0/mescc-tools-static-stripped-0.5.2-0.bb062b0-x86_64-linux.tar.xz
428K /gnu/store/n7zc4kpi8ny6jlfaikkzxlwhc5fvr1vr-mes-minimal-stripped-tarball-0.19/mes-minimal-stripped-0.19-x86_64-linux.tar.xz
6.0M /gnu/store/nv4djwlrljfqmynqr2cqvfwz0ydx7kxb-static-binaries-tarball-0/static-binaries-0-x86_64-linux.tar.xz
13M total
$ for i in $(readlink $(guix build bootstrap-tarballs)/*);\
  do sudo tar xf $i; done
Password:
$ du -schx *
93M bin
700K include
38M lib
14M share
145M total

Guix Scheme-only bootstrap binary seed (June 15, 2020)

$ du -schx $(readlink $(~/src/guix/wip-bootstrap/pre-inst-env guix build bootstrap-tarballs)/*)
5.7M /gnu/store/1mq2pcd2h7g54xpi2jrgj6ibbi4lgi3c-guile-static-stripped-tarball-2.2.6/guile-static-stripped-2.2.6-x86_64-linux.tar.xz
80K /gnu/store/bl1r2bpk6fam8r2gjvr5mvr48i3dm2hn-linux-libre-headers-stripped-tarball-4.19.56/linux-libre-headers-stripped-4.19.56-x86_64-linux.tar.xz
12K /gnu/store/w0dlz486dhb8aiq8pxm5akllz628fqin-mescc-tools-static-stripped-tarball-0.5.2-0.bb062b0/mescc-tools-static-stripped-0.5.2-0.bb062b0-x86_64-linux.tar.xz
428K /gnu/store/15j6l18q44ymlrh1cfp4s4hc9835xic5-mes-minimal-stripped-tarball-0.19/mes-minimal-stripped-0.19-x86_64-linux.tar.xz
6.2M total
$ for i in $(readlink $(~/src/guix/wip-bootstrap/pre-inst-env guix build bootstrap-tarballs)/*);\
  do sudo tar xf $i; done
$ du -schx *
4.9M bin
700K include
38M lib
14M share
57M total
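
To put the three unpacked totals side by side, here is a small arithmetic sketch. The sizes are copied from the `du -schx *` totals of the three seeds; the helper name `reduction_percent` is made up for this example.

```python
# Unpacked seed sizes in MB, copied from the three `du -schx *` totals.
unpacked_mb = {
    "Guix v1.0 (May 2, 2019)": 252,
    "Reduced Binary Seed (October 8, 2019)": 145,
    "Scheme-only (June 15, 2020)": 57,
}


def reduction_percent(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


baseline = unpacked_mb["Guix v1.0 (May 2, 2019)"]
for name, size in unpacked_mb.items():
    shrink = reduction_percent(baseline, size)
    print(f"{name}: {size}M unpacked, {shrink:.1f}% smaller than v1.0")
```

Since `du` rounds its totals, the percentages are only as precise as those rounded figures.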

Why?

“recipe for yoghurt: add yoghurt to milk”

Trust

Due to the activities of nation states in regards to compromising critical infrastructure and undermining critical freedoms (including software), we don’t know if we can even trust our compilers.

Reproducibility

We can’t collaborate to verify if we are running the same software if our build processes don’t produce identical results.

Responsibility

We need to stop ignoring the problem and, as a community, work to fulfill our responsibilities to our community and promote the trust and good will that made our communities places we wanted to become members of.

Self-empowerment

We need to stop feeling powerless to the problems in this world. Pick up a shovel and do some serious damage to the problems we want gone.

Understandability

I want code that is easy to reason about at the heart of this bootstrap, so that everyone will be able to sit down in the morning and be done by lunch time, understanding how every piece of it works.

What are you doing about it?

Starting from the smallest possible source, we create the foundation upon which we depend, in a clean bootstrapping process that is auditable and stable.

Stage0

A universal core bootstrap that produces identical results across arbitrary hardware and software foundations. https://savannah.nongnu.org/projects/stage0

knight

Using a hardware specification that was implemented back in the 1970s in TTL and reduced down to the essentials, we give ourselves an alien hardware platform to verify the stage0 steps for x86 bootstraps and know if a Nexus Intruder attack occurred in any of the steps.

Stage0/stage0-posix/stage0-uefi steps

  • hex0 monitor (280 bytes) ensures you don’t need a text editor or any other software period. Trivial to make by hand (toggling in bytes if you want) or using a trivial program of your own written in any language you desire.

  • hex0 assembler (260 bytes) Only supports line comments (# and ;) [Could be smaller and if you trust your text editor, you can use this as the bootstrap instead of the hex monitor]

  • hex1 assembler (488 bytes) written in hex0 and provides single char labels and relative displacements only (16bit for knight-vm, 32bit for i386 and amd64)

  • hex2 assembler (1036 bytes) written in hex1 and provides long labels, adds absolute addresses and the missing set of sizes (8bit relative, 16bit relative, 16bit absolute and 32bit absolute)

  • M0 macro assembler (1792 bytes) written in hex2 and now allows arbitrary definitions (like DEFINE ADD 05000 or DEFINE ADDI32_to_RAX 4805) which are then used to write programs (thus it can support Knight, x86 and ARM assembly alike)

  • cc_x86 (16,370 bytes) written in M0 and now allows C syntax, structs, unions, inline assembly, gotos and other standard C goodies.

  • M2-Planet (64,011 bytes) written in the subset of C that cc_x86 can compile and is capable of self-hosting. Weighing in at 1910 lines of C code and slowly expanding in terms of functionality.

  • M2-Mesoplanet (128,366 bytes) written in the subset of C that M2-Planet can compile; it provides a more standard C preprocessor and simplifies compilation, assembly and linking.

  • mescc-tools (M1 [67,186 bytes], hex2 [67,109 bytes], blood-elf [49,601 bytes], get_machine [34,125 bytes] and kaem [79,340 bytes]) a cross-platform assembler, linker, dwarf stub generator, a tool for detecting what architecture we are running on and a basic shell which provides enough functionality to drive any further bootstrap required.

  • mescc-tools-extra (catm [22,649 bytes], chmod [36,610 bytes], cp [43,447 bytes], match [33,717 bytes], mkdir [34,913 bytes], replace [39,649 bytes], rm [34,062 bytes], sha256sum [50,008 bytes], unbz2 [61,587 bytes], ungz [62,265 bytes] and untar [46,574 bytes]) finishing off the stage0 steps, we provide some tools to enable a semi-comfortable development basis. The ability to combine files, mark them as executable, copy files around, support conditional paths in kaem scripts, make folders, perform sed like file alterations, delete files, checksum everything, unpack bz2, gz and tar archives (allowing much more standard package sources).
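
To give a feel for how little the first step has to do, here is a minimal sketch of what a hex0-style assembler computes, written in Python purely for readability. It assumes only what is stated in the list (hex digits in, bytes out, line comments starting with # or ;); the function name `hex0` and the error handling are choices made for this example, and the real hex0 is hand-written machine code, not Python.

```python
def hex0(source: str) -> bytes:
    """Convert hex0-style text into raw bytes.

    Everything from '#' or ';' to the end of the line is a comment; any
    other non-hex character (spaces, newlines) is ignored in this sketch.
    """
    digits = []
    for line in source.splitlines():
        for ch in line:
            if ch in "#;":
                break  # the rest of the line is a comment
            if ch in "0123456789abcdefABCDEF":
                digits.append(ch)
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    return bytes.fromhex("".join(digits))


example = """
# the ELF magic number, one byte per pair of hex digits
7F 45 4C 46   ; 0x7F 'E' 'L' 'F'
"""
print(hex0(example))
```

Every later step in the list adds one convenience on top of this (labels, macros, C syntax), which is what keeps each individual step small enough to audit.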

abandoned Stage0 paths

  • Stage0 FORTH (4008 bytes) written in M0 macro assembly and extends itself in its own FORTH Primitives and has a slowly growing initial library (approaching GFORTH parity thanks to reepca). No FORTH programs of real use in bootstrapping have been created.

  • Stage0 Lisp (8400 bytes) written in M0 macro assembly. Supports all of the LISP primitives defined in McCarthy’s 1960 paper [Turns out he missed many essential things] with some modern improvements like lexical scope, let expressions and raw string support. Turns out you need proper LISP macros in order to produce something useful in bootstrapping. Adding LISP macros in assembly simply is a task no one wants to do.

Stage0 future

  • VHDL Knight-vm on FPGA
  • Knight on TTL with manually punched paper tape (Game over Trusting trust attack/Nexus Intruder attack)

helping

  • Simply verify our sha256sum’d steps produce identical binaries on your weird shit (git clone 'https://git.savannah.nongnu.org/git/stage0.git' && cd stage0 && make && make test)

  • Or help porting your architecture to stage0-posix or stage0-uefi (https://github.com/oriansj/stage0-posix and https://git.stikonas.eu/andrius/stage0-uefi)

  • Find/report bugs
  • Audit stage0
  • Do something cool
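
The verification step in the first item boils down to comparing digests of the binaries you built against the published ones. A minimal sketch of that comparison follows; the helper names (`sha256_of`, `mismatches`) and the stand-in file are hypothetical, and the real check is whatever `make test` runs in the stage0 repository.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatches(expected: dict[str, str], root: Path) -> list[str]:
    """Names of files under `root` that are missing or whose digest differs."""
    return [name for name, digest in expected.items()
            if not (root / name).is_file() or sha256_of(root / name) != digest]


# Self-contained demonstration using a stand-in "binary", not real stage0 output.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "hex0").write_bytes(b"\x7fELF stand-in bytes")
    expected = {"hex0": hashlib.sha256(b"\x7fELF stand-in bytes").hexdigest()}
    ok_result = mismatches(expected, root)        # identical build
    (root / "hex0").write_bytes(b"tampered")
    tampered_result = mismatches(expected, root)  # differing build

print(ok_result, tampered_result)
```

An empty list on many unrelated machines is the interesting result: the more diverse the hardware that agrees, the harder it is for a single compromised platform to go unnoticed.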

mescc-tools

A port of Stage0 to Linux (i386, AMD64, armv7l, AArch64, RISC-V 32 and 64bit) using ELF format binaries https://github.com/oriansj/mescc-tools

exec_enable

legacy piece no longer required.

hex2_linker

A universal cross-platform linker buildable via M2-Planet, with support for absolute addressing, long labels, multiple offset sizes, arbitrary base addresses and of course line comments. It is written in a subset of C. It is bootstrapped by M2-Planet in stage0-posix and supports Knight, x86, AMD64, armv7l and AArch64.
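
The core of such a linker is a two-pass walk: first record where every label lands, then emit bytes with the references filled in. The sketch below shows only that idea for 32-bit absolute addresses. The `:name` and `&name` notation is loosely modeled on hex2 and should be treated as an assumption of this example; relative displacements, other offset sizes and per-architecture details are deliberately left out.

```python
def link(source: str, base: int = 0) -> bytes:
    """Two-pass toy linker: ':name' defines a label, '&name' emits its
    32-bit little-endian absolute address, anything else is hex bytes."""
    tokens = []
    for line in source.splitlines():
        for mark in "#;":
            line = line.split(mark, 1)[0]  # drop line comments
        tokens.extend(line.split())

    # Pass 1: walk the tokens, recording the address of every label.
    labels, address = {}, base
    for tok in tokens:
        if tok.startswith(":"):
            labels[tok[1:]] = address
        elif tok.startswith("&"):
            address += 4
        else:
            address += len(bytes.fromhex(tok))

    # Pass 2: emit bytes, substituting resolved addresses for references.
    out = bytearray()
    for tok in tokens:
        if tok.startswith(":"):
            continue
        if tok.startswith("&"):
            out += labels[tok[1:]].to_bytes(4, "little")
        else:
            out += bytes.fromhex(tok)
    return bytes(out)


program = """
:start
90 90        # two filler bytes
&message     ; forward reference, resolved in pass 2
:message
48 69        # the bytes 'H' 'i'
"""
print(link(program, base=0x1000).hex())
```

The `base` parameter plays the role of the arbitrary base address mentioned above: the same input links to different absolute addresses depending on where the binary will be loaded.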

M1 macro assembler

A universal cross-platform macro assembler, with support for DEFINE line-macros, raw strings, hex literals, numerics, alignment operations, padding operations and arbitrary byte and bit endianness of output. It is written in a subset of C. It is bootstrapped by M2-Planet in stage0-posix and supports Knight, x86, AMD64, armv7l and AArch64.
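
DEFINE line-macros are plain textual substitution: a name stands for a fixed run of hex. A minimal sketch of that one feature is shown here; the function name `expand` is made up, the `NOP` definition is a hypothetical addition for the example, and strings, numerics, alignment and endianness handling are all omitted.

```python
def expand(source: str) -> str:
    """Collect 'DEFINE name hex' lines, then replace each use of a name."""
    macros, body = {}, []
    for line in source.splitlines():
        for mark in "#;":
            line = line.split(mark, 1)[0]  # drop line comments
        words = line.split()
        if len(words) == 3 and words[0] == "DEFINE":
            macros[words[1]] = words[2]
        else:
            body.extend(words)
    # Words that are not macro names pass through untouched.
    return " ".join(macros.get(word, word) for word in body)


example_m1 = """
DEFINE ADDI32_to_RAX 4805   # definition quoted in the stage0 steps
DEFINE NOP 90               # hypothetical definition for this example
NOP
ADDI32_to_RAX 2A000000      ; the macro name becomes 4805
"""
print(expand(example_m1))
```

Because the assembler knows nothing about any instruction set, supporting a new architecture is a matter of writing a new file of definitions rather than changing the tool.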

blood-elf

Since debugging is painful when gdb and objdump have no idea how to handle M1-macro files, blood-elf creates a dwarf footer segment from an M1-macro input that is in M1-macro format. Not actually needed in bootstrapping but rather helpful for those wishing to develop in M1-Macro assembly. It is written in a subset of C. It is bootstrapped by M2-Planet in stage0-posix and supports all 32 and 64 bit ELF targets.

get_machine

Because mescc-tools is cross-platform and hardware neutral, automatic tests that execute generated binaries will always fail on hosts that cannot run them. This program exists to allow hardware specific tests to be run on generated binaries, e.g. have your i386 tests run on your i386 hardware but not on your ARM, SPARC or RISC-V board. Not actually needed in bootstrapping but rather helpful for those wishing to have proper tests for their M1-macro programs. It is written in a subset of C. It is bootstrapped by M2-Planet in stage0-posix and supports all Posix hosts (if we don’t support yours let us know).
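
The gating idea can be sketched as follows: ask the host what it is, and only run the tests that were built for it. This example uses Python's standard `platform.machine()` as a stand-in for get_machine; the function `should_run` and its alias table are hypothetical and do not reflect get_machine's actual output format.

```python
import platform


def should_run(test_arch: str, host_arch: str = "") -> bool:
    """True when a test built for `test_arch` can run on this host."""
    host = host_arch or platform.machine()
    # Hypothetical alias table: systems name the same CPU family differently.
    aliases = {"amd64": "x86_64", "i686": "i386", "arm64": "aarch64"}

    def normalize(name: str) -> str:
        return aliases.get(name.lower(), name.lower())

    return normalize(test_arch) == normalize(host)


for arch in ("x86_64", "aarch64", "riscv64"):
    print(arch, "run" if should_run(arch) else "skip")
```

A test suite would call something like this before executing each generated binary, so that cross-built outputs are still produced and checksummed everywhere but only executed where they can run.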

mescc-tools future

  • Add support for more architectures

helping

  • Port mescc-tools to your weird hardware/Operating system combinations.
  • Write tests for alternate hardware targets
  • Find bugs

M2-Planet

A PLAtform NEutral Transpiler that happens to look and behave enough like C that you can do development in GCC and use M2-Planet to compile the result. https://github.com/oriansj/M2-Planet

Currently supports

knight-native, knight-posix, x86, AMD64, ARMv7l, AArch64, RISC-V 32 and 64bit

Types

  • void
  • void*
  • int
  • int*
  • unsigned
  • unsigned*
  • char
  • char*
  • char const
  • char const*
  • long
  • long*
  • SCM (unsigned long)
  • SCM* (unsigned long*)
  • FUNCTION (void (*FUNCTION) ())
  • FUNCTION (void* (*FUNCTION) ())
  • any struct you wish to define (with unions or arrays supported as well)
  • Pointers to any struct you wish to define
  • typedef statements

All in a trivial to understand implementation https://github.com/oriansj/M2-Planet/blob/master/cc_types.c

Standard C strings

All in a trivial to understand implementation https://github.com/oriansj/M2-Planet/blob/master/cc_strings.c

Comments to an amusing result

M2-Planet in --bootstrap-mode supports 2 types of comments: /* Stuff */ block comments and # Stuff line comments.

In order to maximize compatibility with C, M2-Planet does something funny with C line comments: // code; is actually compiled by M2-Planet, thus allowing M2-Planet specific code to be embedded in your C sources.

It and any other odd parsing behavior can be found in the rather trivial parser https://github.com/oriansj/M2-Planet/blob/master/cc_reader.c

C primitives

M2-Planet is written using only a subset of the features that it supports https://github.com/oriansj/M2-Planet/blob/master/cc_core.c

The only parts of the C language not supported are C macros, switch statements and features that are not useful in bootstrapping and thus are ignored (until someone comes up with a reasonable use case)

Bootstrapping extras

M2-Planet supports M1-macro assembly being inlined within functions via asm(..); statements. Support for CONSTANT FOO 4 statements to replace #define FOO 4, and CONSTANT CELL_SIZE sizeof(struct cell) to replace #define CELL_SIZE 1, eliminates the need for a C preprocessor entirely.
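
For simple constants the correspondence is mechanical, as this sketch tries to show. It rewrites object-like #define lines into CONSTANT statements; the helper `defines_to_constants` is hypothetical and not part of M2-Planet, `MAX_STRING` is an invented name, and function-like macros are left untouched because CONSTANT has no equivalent for them.

```python
import re

# Matches object-like macros such as '#define FOO 4'. Function-like macros
# ('#define F(x) ...') have '(' right after the name and are left alone.
_DEFINE = re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)\s+(\S.*)$")


def defines_to_constants(c_source: str) -> str:
    """Rewrite simple '#define NAME value' lines as 'CONSTANT NAME value'."""
    out = []
    for line in c_source.splitlines():
        m = _DEFINE.match(line)
        if m:
            out.append(f"CONSTANT {m.group(1)} {m.group(2).rstrip()}")
        else:
            out.append(line)
    return "\n".join(out)


c_example = """#define FOO 4
#define MAX_STRING 4096
int main() { return FOO; }"""
print(defines_to_constants(c_example))
```

Anything that actually needs macro expansion or conditional compilation is outside what CONSTANT covers, which is the gap the more standard preprocessor in M2-Mesoplanet fills.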

M2-Planet future

  • Porting to SPARC
  • Porting to MIPS
  • Porting to PowerPC
  • Porting to z80 (maybe)
  • Porting to 6502 (maybe)
  • Port to your personal architecture

helping

  • Find bugs
  • Improve documentation
  • Send patches
  • Port to your weird hardware

Mes

A late stage bootstrap core component that ensures that, once you have achieved a certain minimal floor, you have a solid path to producing GCC and thus everything you desire. https://gitlab.com/janneke/mes https://git.savannah.gnu.org/git/mes.git

mes.c

A Scheme interpreter prototyped in C (~5000 lines) that stands as our baseline target of minimal functionality. If you can build this or provide equivalent functionality, you are good to go. This is buildable by M2-Planet or a better C compiler.

mescc.scm

Provided a reasonable scheme exists and is functional, we leverage that to provide a C compiler written in Scheme (uses Nyacc C99 parser in Scheme) that is the core of this project and is the path to full GCC bootstrapping. mescc along with mescc-tools are capable of self bootstrapping.

NYACC

Not Yet Another Compiler Compiler is a set of Guile modules for generating parsers and lexical analyzers. https://savannah.nongnu.org/projects/nyacc

Gash

A Guile replacement for shell+binutils that can in the future run on GNU Mes https://savannah.gnu.org/projects/gash

Features

  • awk.scm
  • basename.scm
  • cat.scm
  • chmod.scm
  • cmp.scm
  • command.scm
  • compress.scm
  • cp.scm
  • cut.scm
  • diff.scm
  • dirname.scm
  • expr.scm
  • false.scm
  • find.scm
  • grep.scm
  • gzip.scm
  • head.scm
  • ln.scm
  • ls.scm
  • mkdir.scm
  • mv.scm
  • printf.scm
  • pwd.scm
  • reboot.scm
  • rmdir.scm
  • rm.scm
  • sed.scm
  • sleep.scm
  • sort.scm
  • tar.scm
  • test.scm
  • touch.scm
  • tr.scm
  • true.scm
  • uname.scm
  • uniq.scm
  • wc.scm
  • which.scm

Live-bootstrap

A lovingly crafted work of art by Fosslinux and Stikonas, with a good deal of help from numerous great individuals, which took stage0 and GNU Mes and ran them all the way to a complete Linux distribution without requiring any generated files of any kind.

https://github.com/fosslinux/live-bootstrap

This solid piece of art can be the basis of your distro, bootstrap yourself today.

guile psyntax bootstrapping

A very clever route to bootstrapping GNU Guile without requiring pregenerated files, by Michael Schierl. Definitely worth a read and a solid reminder that even GNU tools might have bootstrapping problems.

https://github.com/schierlm/guile-psyntax-bootstrapping

Builder-hex0

A brilliant effort by Richard R. Masters: a 384 byte bootloader (written in hex0) that builds an elegant 3.5KB POSIX kernel written in hex0, which in turn builds and spawns hex0 and kaem-optional and runs all of the steps of stage0-posix/live-bootstrap until it builds TCC and bootstraps a bigger and more standard kernel written in the C that TCC can build. https://github.com/ironmeld/builder-hex0

How to bootstrap?

See Current\ bootstrap\ map.pdf

done

current status

  • stage0/stage0-posix/stage0-uefi are able to build mes.c
  • mescc can build TCC and the path to modern software is done
  • builder-hex0 is making progress on sorting out the zero to Linux path
  • Gash is progressing and growing nicely

help

  • programmers to port the steps to more architectures
  • report bugs, issues, concerns or recommendations
  • testing and finding issues with our documentation (we are human after all)
  • We need people willing to improve documentation (art would be nice)
  • People to tell us all the ways things are broken and we can make it better

strengths of current plan

  • Every possible port of mescc-tools is buildable by every other possible mescc-tools port and thus forces any hardware/software trusting trust attack to compromise all past, present and future hardware platforms, including those that are made for fun out of TTL logic: http://cpuville.com/Projects/Original-CPU/Original-CPU-home.html or even those made out of individual transistors: https://monster6502.com/ or should someone wish http://web.cecs.pdx.edu/~harry/Relay/ using electromechanical relays.

  • Porting of stage0 and mescc-tools to alternate platforms becomes a straightforward mechanical exercise.

  • M2-Planet is trivial to modify to support alternate hardware platforms and thus function as a cross-platform, self-hosting compiler.

  • M2-Planet’s output is 100% deterministic and easily predictable; even major code changes result only in differences directly related to the changed code block.

  • no operating system is required until long after we bootstrap some

weakness of current plan

  • Low level encoding details need to be figured out for various architectures
  • Poorly thought out instruction encodings make for low density binaries (AArch64, RISC-V, etc)
  • Requires large amounts of largely mechanical effort
  • Very very few developers or contributors

Contact

#bootstrappable and #guix on libera.chat; via bootstrappable.org; via our mailing list: [email protected]; or, if issues are entirely MesCC only, [email protected]

FAQ

Why aren’t you doing more in FORTH

Why don’t you have this language?

  • Because you did not write it yet or make any useful bootstrapping programs in it either.

  • Probably because it sucks at bootstrapping or requires a great deal of work that no one wants to do.

What about backdoored hardware?

  • The good news is this is simple to port to arbitrary hardware, so the cost needed to bootstrap hardware you designed yourself is lower than ever.

  • There is nothing we can do in terms of software that eliminates the risk of Nexus Intruder program class hardware subversion, as the only solution to that risk is to have your own trusted lithography fabrication plant that is run using only hardware/software that you know is trusted and uncompromised.

  • libresilicon is honestly the only path forward currently

What about this hardware platform?

  • If you want us to support your hardware platform, you need to have a reasonable hardware target and provide the documentation and testing required.

  • We can’t port to hardware we don’t have

Why is there an ELF header?

  • stage0-posix is the operating system/hardware specific port of stage0
  • ELF is not actually required for stage0
  • BIOS level versions of stage0 are possible by simply rewriting the syscalls into BIOS calls, removing the ELF header, adjusting the base address and adding the standard PC boot signature (0xAA55)

  • The Knight implementations found in stage0, which run on bare metal, lack such trivia.
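
The last of those BIOS steps, adding the boot signature, is small enough to sketch. The example assumes the conventional 512 byte PC boot sector whose final two bytes hold 0xAA55 in little-endian order; the function name `make_boot_sector` and the two byte placeholder program are made up for illustration, and the BIOS-call rewriting and base address adjustment are not shown.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = (0xAA55).to_bytes(2, "little")  # stored on disk as 55 AA


def make_boot_sector(code: bytes) -> bytes:
    """Pad flat machine code to one sector and append the boot signature."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > room:
        raise ValueError(f"code is {len(code)} bytes; only {room} fit")
    return code.ljust(room, b"\x00") + BOOT_SIGNATURE


# Placeholder payload: EB FE is an x86 jump-to-self, i.e. an endless loop.
sector = make_boot_sector(bytes.fromhex("EBFE"))
print(len(sector), sector[-2:].hex())
```

The 510 byte limit is also why the sizes quoted for the early hex0 pieces matter: they fit in a single sector with room to spare.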

BIOS level bootstrapping isn’t enough

  • We completely agree; however writing 79,000+ custom bootstraps isn’t viable yet.
  • So that is why we are bootstrapping hardware (Knight and x86 later)

Why not use already existing C compilers written in Assembly?

  • Because Jeremiah wrote cc_x86 before we found any
  • Because BDS-C is the only other C compiler ever written in assembly, and it supports fewer features useful in bootstrapping than cc_* and is DOS/CP/M only.

  • Even the original Unix C compiler was not written in assembly.

Why not just go back and verify compilers instead?

  • It doesn’t address the problem of the trusting trust attack
  • It would take far, far longer than we are willing to spend.

What is mescc-tools-seed

It was the old name for stage0-posix as originally it was just binary blobs created by janneke to start work on MesCC’s bootstrap in Guix. Jeremiah took it over and converted it first to generated M1 programs, then handwritten pieces and expanded it until it covered all the steps from hex0 to M2-Planet+mescc-tools; effectively becoming a full POSIX port of stage0, hence the name change.

Caveats

You are depending upon Posix (Linux, *BSD, etc)

  • For the current work (stage0-posix), you are absolutely correct.
  • But not for the stage0 work running on Knight as all those pieces run on bare metal or stage0-uefi.

  • Builder-hex0 is 4KB of hex0, you can audit that.

You are depending upon special hardware that people don’t have

  • You are absolutely correct.
  • However writing the pieces without depending on the BIOS is VERY hardware specific

  • Common hardware also has firmware measured in MB which we can’t yet replace.

Those without special hardware depend upon 4 binary seeds instead of just 1

  • Well one needs a bootloader on modern hardware (I can’t avoid this)
  • A minimal posix is needed to provide syscalls enabling easier bootstrapping (don’t want to avoid this)

  • A minimal shell to execute the bootstrapping steps while running as init (Can’t avoid this on a posix)

  • A hex0 assembler (Can’t avoid this in this sort of bootstrap)
  • We can try to reduce their size and enable the generation of those files locally from source.

Hardware firmware is MB in size

  • Yep, there currently is nothing we can do about it
  • Anyone who wants to work on that is more than free to do so.