Aboriginal Linux is a shell script that builds the smallest/simplest
linux system capable of rebuilding itself
from source code. This currently requires seven packages: linux,
busybox, uClibc, binutils, gcc, make, and bash. The results are packaged into
a system image with shell scripts to boot it under
QEMU. (It works fine on real hardware too.)
The build is modular; each section can be bypassed or replaced if desired.
The build offers a number of configuration
options, but if you don't want to run the build yourself you can download
binary system images to play with, built for
each target with the default options.
Even if you plan to build your own images from source code, you should
probably start by familiarizing yourself with the (known working) binary
releases.
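For example, a first session with a binary release looks something like the
sketch below. The download URL, tarball name, and launch script name are
illustrative and vary by release and target, so check the download page for
the current ones:

  # Fetch a prebuilt system image, unpack it, and boot it under QEMU.
  # URL and file names below are examples, not exact.
  wget http://landley.net/aboriginal/downloads/binaries/system-image-i686.tar.gz
  tar xvzf system-image-i686.tar.gz
  cd system-image-i686
  ./run-emulator.sh    # boots the image under QEMU with a shell on the console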
Aboriginal Linux is a build system for creating bootable system images,
which can be configured to run either on real hardware or under emulators
(such as QEMU). It is intended to reduce or even
eliminate the need for further cross compiling, by doing all the cross
compiling necessary to bootstrap native development on a given target.
(That said, most of what the build does is create and use cross
compilers: we cross compile so you don't have to.)
The build system is implemented as a series of bash scripts which run to
create the various binary images. The "build.sh" script invokes the other
stages in the correct order, but the stages are designed to run individually.
(Nothing build.sh itself does is actually important.)
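A minimal invocation looks like this (a sketch: "i686" is just an example
target name, and the list of supported targets lives in the source tree):

  # Build all the stages for one target; output accumulates under build/.
  ./build.sh i686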
Aboriginal Linux is designed as a series of orthogonal layers (the stages
called by build.sh), to increase flexibility and minimize undocumented
dependencies. Each layer can be either omitted or replaced with something
else. The list of layers is in the source README.
Questions about Aboriginal Linux should be addressed to the project's
mailing list, or to the maintainer (rob at landley dot net) who has a
blog that often includes
notes about ongoing Aboriginal Linux development.
In addition to implementing the above, Aboriginal Linux tries to
support a number of use cases:
Eliminate the need for cross compiling
We cross compile so you don't have to: Moore's Law has
made native compiling under emulation a reasonable approach to cross-platform
support.
If you need to scale up development, Aboriginal Linux lets you throw
hardware at the scalability problem instead of engineering time, using distcc
acceleration and distributed package build clusters to compile entire
distribution repositories on racks of cheap x86 cloud servers.
But using distcc to call out of the emulator to a cross compiler still
looks like a native build to the package being compiled. It does not reintroduce the
complexities of cross compiling, such as keeping multiple
compiler/header/library combinations straight, or preventing configure from
confusing the system you build on with the system you deploy on.
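Conceptually, the trick looks like the sketch below. The real setup is
automated by the system image's launch scripts; these lines only illustrate
the mechanism and assume a distccd serving the matching cross compiler is
already listening on the host:

  # Inside the emulated system: forward compile jobs through distcc to the
  # host's cross compiler, reachable via QEMU's 10.0.2.2 alias for the host.
  export DISTCC_HOSTS=10.0.2.2
  make CC="distcc cc"    # the package still sees an ordinary native build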
Allow package developers and maintainers to reproduce and fix bugs
on architectures they don't have access to or experience with.
Bug reports can include a link to a system image and a
reproduction sequence (wget source, build, run this test). This provides
the maintainer both a way to demonstrate the issue, and a native
development environment in which to build and test their fix.
No special hardware is required for this, just an open source emulator
(generally QEMU) and a system image to run under it. Use wget to fetch your
source, configure and make your package as normal using standard tool names
(strip, ld, as, etc), even build and test on a laptop in an airplane
without internet access (10.0.2.2 is qemu's alias for the host's 127.0.0.1).
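A reproduction sequence in a bug report might look like this (the package
name and URL are placeholders):

  # Run inside the booted system image.
  wget http://example.com/somepackage-1.0.tar.gz
  tar xvzf somepackage-1.0.tar.gz
  cd somepackage-1.0
  ./configure && make
  make test    # or whatever command demonstrates the bug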
Automated cross-platform regression testing and portability auditing.
Aboriginal Linux lets you build the same package across multiple
architectures, and run the result immediately inside the emulator. You can
even set up a cron job to build and test regular repository snapshots of a
package's development version automatically, and report regressions when
they're fresh, when the developers remember what they did, and when
there are few recent changes that may have introduced the bug.
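Such a cron job can be as simple as the sketch below (the script name,
target, and log path are all illustrative; the script itself would fetch the
snapshot, build and test it inside the target's system image, and record the
result):

  # crontab entry: build and test last night's snapshot on armv5l at 2:30am.
  30 2 * * *  $HOME/bin/test-snapshot.sh armv5l >> $HOME/logs/armv5l-nightly.log 2>&1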
Use current vanilla packages, even on obscure targets.
Nonstandard hardware often receives less testing than common desktop and
server platforms, so regressions accumulate. This can lead to a vicious cycle
where everybody sticks with private forks of old versions because making the
new ones work is too much trouble, and the new ones don't work because nobody's
testing and fixing them. The farther you fall behind, the harder it is to
catch up again, but only the most recent version accepts new patches, so
even the existing fixes don't go upstream. Worst of all, working in private
forks becomes the accepted norm, and developers stop even trying to get
their patches upstream.
Aboriginal Linux uses the same (current) package versions across all
architectures, in as similar a configuration as possible, and with as few
patches as we can get away with. We (intentionally) can't upgrade a package
for one target without upgrading it for all of them, so we can't put off
dealing with less-interesting targets.
This means any supported target stays up to date with current packages in
unmodified "vanilla" form, providing an easy upgrade path to the next
version and the ability to push your own changes upstream relatively
easily.
Provide a minimal self-hosting development environment.
"Perfection is achieved, not when there is nothing more to add,
but when there is nothing left to take away." - Antoine de Saint Exupery
Most build environments provide dozens of packages, ignoring the questions
"do you actually need that?" and "what's it for?" in favor of offering
rich functionality.
Aboriginal Linux provides the smallest, simplest starting point capable
of rebuilding itself under itself, and of bootstrapping up to build arbitrarily
complex environments (such as Linux From Scratch) by building and installing
additional packages. (The one package we add which is not strictly required
for this, distcc, is installed in its own subdirectory which is only
optionally added to the $PATH.)
This minimalist approach makes it possible to regression test for
environmental dependencies. Sometimes new releases of packages simply won't
work without perl, or zlib, or some other dependency that previous versions
didn't have, not because they meant to but because they were never tested in
a build environment that didn't have them, so the dependency leaked in.
By providing a build environment that contains only the bare essentials
(relying on you to build and install whatever else you need), Aboriginal
Linux lets you document exactly what dependencies packages actually require,
figure out what functionality the additional packages provide, and measure
the costs and benefits of the extra code.
(Note: the command logging wrapper
record-commands.sh can
actually show which commands were used out of the $PATH when building any
package.)
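Usage is roughly as follows (a hedged sketch: the wrapper's invocation and
the name of the log it writes are assumptions, so see the script's own
comments for the real details):

  # Wrap the build so every command executed out of the $PATH gets logged.
  record-commands.sh ./build.sh i686
  # Then reduce the log to the list of distinct commands the build touched.
  awk '{print $1}' cmdlines.txt | sort -u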
Cleanly separate layers.
The entire build is designed to let you use only the parts of it you want,
and skip or replace the rest. The top level "build.sh" script calls other
scripts in sequence, each of which is designed to work independently.
The only place package versions are mentioned is "download.sh"; the rest
of the build is version-agnostic. All it does is populate the "packages"
directory, and if you want to provide your own you never need to run this
script.
The "host-tools.sh" script protects the build from variations in the host
system, both by building known versions of command line tools (in build/host)
and adjusting the $PATH to point only to that directory, and by unsetting
all environment variables that aren't in a whitelist. If you want to
use the host system's unfiltered environment instead, just skip running
host-tools.sh.
If you supply your own cross compilers in the $PATH (with the prefixes the
given target expects), you can skip the simple-cross-compiler.sh command.
Similarly you can provide your own simple root filesystem, your own native
compiler, or your own kernel image. You can use your own script to package
them if you like.
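Run individually (with a hypothetical target name, and with the later stage
names paraphrased since the authoritative list is in the source README), the
stages look something like this:

  ./download.sh                       # populate the packages directory
  ./host-tools.sh                     # optional: build sanitized host tools in build/host
  ./simple-cross-compiler.sh armv5l   # skippable if a suitable cross compiler
                                      # with the expected prefix is in your $PATH
  # ...followed by the stages that build the root filesystem, native compiler,
  # kernel, and final packaged system image for the same target.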
Document how to put together a development environment.
The build system is designed to be readable. That's why it's written in
Bash (rather than something more powerful like Python): so it can act as
documentation. Each shell script collects the series of commands you need
to run in order to configure, build, and install the appropriate packages,
in the order you need to install them to satisfy their dependencies.
The build is organized as a series of orthogonal stages. These are called
in order from build.sh, but may be run (and understood) independently.
Dependencies between them are kept to a minimum, and stages which depend on
the output of previous stages document this at the start of the file.
The scripts are also extensively commented to explain why they
do what they do, and there's design documentation on the website.
Move from busybox, uclibc, and gcc/binutils to toybox, musl,
and llvm (then qcc).
Now that we've got a simple development environment working, we can make
it simpler by moving to better packages. Most of this project's new
development effort is going into the upstream versions of those packages
until they're ready for use here. In the meantime we're maintaining what
works, but only really upgrading the kernel version and slowly switching
from busybox to toybox one command at a time.
uClibc: The uClibc project's chronic development problems resulted in multiple
year-long gaps between releases, and after the May 2012 release more
than three years went by without another one, during which time
musl-libc went
from "git init" to a 1.0 release. At this point it doesn't matter
whether uClibc ever gets another release out: it's over, and musl is the more
interesting project. (Musl's main limitation is its narrower target support,
but it's easy to port musl to new targets and very hard to clean up the mess
uClibc has become.)
toybox: The maintainer of Aboriginal Linux used to maintain
busybox, but left that project
and went on to create toybox for
reasons explained at length elsewhere
(video, outline,
merged into Android).
The toybox 1.0 release should include a shell capable of replacing
bash, and may include a make implementation (or one may land in qcc, below). This
would eliminate two more packages currently used by Aboriginal Linux.
llvm: When gcc and binutils went GPLv3, Aboriginal Linux froze
on the last GPLv2 releases, essentially maintaining its own fork of
those projects. Several other projects did
the same but most of those have since
switched to llvm.
Unfortunately, configuring and building llvm is
unnecessarily hard (among
other things because it's not just implemented in C++ but the 2011
C++ spec, so you need gcc 4.7 or newer to bootstrap it), and nobody seems
to have worked out how to canadian cross native compilers out of it yet.
But other alternatives like pcc
or tinycc are both less capable and less
actively developed; since the FSF fell on its sword with GPLv3,
the new emerging standard is LLVM.
qcc: In the long run, we'd like to put together a new compiler,
qcc,
but won't have development effort to spare for it before toybox's 1.0
release. Its goal is to combine tinycc and QEMU's Tiny Code Generator into
a single multicall binary toolchain (cc, ld, as, strip and so on in a single
executable replacing both the gcc and binutils packages) that supports all
the output formats QEMU can emulate. (As a single-pass compiler with
no intermediate format it wouldn't optimize well, but could bootstrap
a native compiler that would.)
Additional goals for qcc would be to absorb ccwrap.c, grow built-in
distcc-equivalent functionality, and add an updated rewrite of cfront to
compile C++ code (and thus natively bootstrap LLVM).
Finishing the full development slate would bring the total number of
Aboriginal Linux packages down to four: linux, toybox, musl, and qcc.
(Yes, reducing dependency on GPL software and avoiding GPLv3 entirely
is a common theme of the above package switches; there's a reason for
that: audio,
outline, see also
Android self-hosting below.)
Untangle distro build system hairballs into distinct layers.
The goal here is to separate what packages you can build from where and how
you can build them.
For years, Red Hat only built under Red Hat, Debian only built under Debian,
even Gentoo assumed it was building under Gentoo. Building their packages
required using their root filesystem, and the only way to get their root
filesystem was by installing their package binaries built under their root
filesystem. The circular nature of this process meant that porting an existing
distribution to a new architecture, or making it use a new C library,
was extremely difficult at best.
This led cross compiling build systems to add their own package builds
("the buildroot trap"), and wind up maintaining their own repository of
package build recipes, configurations, and dependencies. Their few hundred
packages never approached the tens of thousands in full distribution
repositories, but the effort of maintaining and upgrading packages would
come to dominate the project's development effort until developers left to
form new projects and start the cycle over again.
This massive and perpetual reinventing of wheels is wasteful. The
proliferation of build systems (buildroot, openembedded, yocto/meego/tizen,
and many more) means each has its own set of supported boards and its own half-assed
package repository, with no ability to mix and match.
The proper way to deal with this is to separate the layers so you can mix
and match. Choice of toolchain (and C library), "board support" (kernel
configuration, device tree, module selection), and package repository (which
existing distro you want to use), all must become independent. Until these are
properly separated, your choice of cross compiler limits
what boards you can boot the result on (even if the binaries you're building
would run in a chroot on that hardware), and either of those choices limits
what packages you can install into the resulting system.
This means Aboriginal Linux needs to be able to build _just_ toolchains
and provide them to other projects (done), and to accept external toolchains
(implemented but not well tested; most other projects produce cross compilers
but not native compilers).
It also needs build control images to automatically bootstrap a Debian,
Fedora, or Gentoo chroot starting from the minimal development environment
Aboriginal Linux creates (possibly through an intermediate Linux From Scratch
build, followed by fixups to make debian/fedora/gentoo happy with the chroot).
It must be able to do this on an arbitrary host, using the existing toolchain
and C library in an architecture-agnostic way. (If the existing system is
a musl libc built for a microblaze processor, the new chroot should be too.)
None of these distributions make it easy: it's not documented, and it
breaks. Some distributions didn't think things through: Gentoo hardwires the
list of supported architectures into every package in the repository, for
no apparent reason. Adding a new architecture requires touching every package's
metadata. Others are outright lazy; building an allnoconfig Red
Hat Enterprise 6.2 kernel under SLES11p2 is kind of hilariously bad: "make
clean" spits out an error because the code it added to detect compiler
version (something upstream doesn't need) gets confused by "gcc 4.3", which
has no .0 on the end so the patchlevel variable is blank. Even under Red Hat's
own filesystem, "make allnoconfig" breaks on the first C file, and requires
almost two dozen config symbols to be switched on to finish the compilation,
because they never tested anything but the config they ship. Making
something like that work on a Hexagon processor, or making their
root filesystem work with a vanilla kernel, is a daunting task.
Make Android self-hosting (musl, toybox, qcc).
Smartphones are replacing the PC, and if Android doesn't become self-hosting
we may be stuck with locked down iPhone derivatives in the next generation.
Mainframe -> minicomputer -> microcomputer (PC) -> smartphone
Mainframes were replaced by minicomputers, which were replaced by
microcomputers, which are being replaced by smartphones. (Nobody needed to
stand in line to pick up a printout when they could sign up for a timeslot at a
terminal down the hall. Nobody needed the terminal down the hall when they
had a computer on their desk. Now nobody needs the computer on their desk when
they have one in their pocket.)
Each time, the previous generation got kicked up into the "server space",
accessed only through the newer machines. (This time around kicking the PC
up into the server space is called "the cloud".)
Smartphones have USB ports, which charge the phone and transfer data.
Using a smartphone as a development workstation involves plugging it into a
USB hub, adding a USB keyboard, USB mouse, and USB to HDMI converter to plug
it into a television. The rest is software.
The smartphone needs to "grow up and become a real computer" the
same way the PC did. The PC originally booted into "ROM Basic" just like
today's Android boots into Dalvik Java: as the platform matures it must
outgrow this to run native code written in all sorts of languages.
PC software was once cross compiled from minicomputers, but as it matured
it grew to host its own development tools, powerful enough to rebuild the
entire operating system.
To grow up, Android phones need to become usable as development
workstations, meaning the OS needs a self-hosting native development
environment. This has four parts:
- Kernel (we're good)
- C library (bionic->musl, not uclibc)
- Posix command line (toolbox->toybox, not busybox)
- Compiler (qcc, llvm, open64, pcc...)
The Android kernel is a Linux derivative that adds features without removing
any, so it's already good enough for now. Convergence to vanilla linux is
important for long-term sustainability, but not time critical. (It's not part
of "beating iPhone".)
Android's "no GPL in userspace" policy precludes it from shipping
many existing Linux packages as part of the base install: no BusyBox or
GNU tools, no glibc or uClibc, and no gcc or binutils. All of those are
excluded from the Android base install, meaning they will never
come bundled with the base operating system or preinstalled on devices,
so we must find alternatives.
Android's libc is called "bionic", and is a minimal stub sufficient
to run Dalvik, and not much more. Its command line is called "toolbox" and
is also a minimal stub providing little functionality. Part of this is
intentional: Google is shipping a billion broadband-connected unix machines,
none of which are administered by a competent sysadmin. So for security
reasons, Android is locked down with minimal functionality outside the Java
VM sandbox, providing less of an attack surface for viruses and trojans.
In theory the Linux Containers
infrastructure may eventually provide a solution for sandboxing applications,
but the base OS needs to be pretty bulletproof if a billion people are going
to run code they don't deeply understand connected to broadband internet
24/7.
Thus replacement packages for the C library and posix command line
should be clean, simple code that is easy to audit for security concerns. But they
must also provide functionality that bionic and toolbox do not
attempt, and do not provide a good base for. The musl libc and toybox
packages should be able to satisfy these requirements.
The toolchain is a harder problem. The leading contender (LLVM) is sponsored
by Apple for use in Mac OSX and the iPhone's iOS. The iPhone is ahead of
Android here, and although Android can use LLVM, it has other problems
(it's implemented in C++, so it's significantly more complicated from a system
dependency standpoint, making it difficult to bootstrap and impossible
to audit).
The simplest option would be to combine the TinyCC project with QEMU's
Tiny Code Generator (TCG). The licensing of the current TinyCC is incompatible
with Android's userspace but permission has been obtained from Fabrice
Bellard to BSD-license his original TinyCC code as used in Rob's TinyCC fork.
This could be used to implement a "qcc"
capable of producing code for
every platform qemu supports. The result would be simple and auditable,
and compatibly licensed with Android userspace. Unfortunately, such a project
is understaffed, and wouldn't get properly started until after the 1.0
release of Toybox.
Other potential compiler projects include Open64 and PCC. Neither of these
has built a bootable Linux kernel, without which a self-bootstrapping
system is impossible. (This is a good smoketest for a mature compiler: if it
can't build the kernel, it probably can't build userspace packages of the
complexity people actually write.)
Why does this matter?
This is time critical due to network effects, which create positive
feedback loops benefiting the most successful entrant and creating natural
"standards" (which become self-defending monopolies if owned by a single
player). Whichever platform has the most users attracts the most
development effort, because it has the most potential customers. The platform
all the software ships on first (often only) is the one everybody wants to
have. Being biggest helps in other ways too: electronics manufacturing has
large start-up costs and much lower incremental costs, so higher unit
volume makes devices cheaper to produce. Amortizing research and development
budgets over a larger user base means the technology may actually advance
faster (more effort, anyway)...
Technological transitions produce "S curves", where a gradual increase
gives way to exponential increase (the line can go nearly vertical on a graph)
and then eventually flattens out again producing a sort of S shape.
During the steep part of the S-curve acquiring new customers dominates.
Back in the early microcomputer days a lot more people had no computer than had
an Atari 800 or Commodore 64 or Apple II or IBM PC, so each vendor focused
more on selling to the computerless than on converting customers from other
vendors. Once the pool of "people who haven't got the kind of computer
we're selling today but would like one if they did" was exhausted (even if
only temporarily, waiting for computers to get more powerful and easier
to use), the largest players starved the smaller ones of new sales, until
only the PC and Macintosh were left. (And the Macintosh switched over to
PC hardware components to survive, offering different software and more
attractive packaging of the same basic components.)
The same smartphone transition is inevitable as the pool of "people with
no smartphone, but who would like one if they had it" runs out. At that point,
the largest platform will suck users away from smaller platforms. If the
winner is android we can open up the hardware and software. If the winner
is iPhone, we're stuck with decades of microsoft-like monopoly except
this time the vendor isn't hamstrung by their own technical incompetence.
The PC lasted over 30 years from its 1981 introduction until smartphones
seriously started displacing it. Smartphones themselves will probably last
about as long. Once the new standard "clicks", we're stuck with it for
a long time. Now is when we can influence this decision. Linux's
15 consecutive "year of the linux desktop" announcements (spanning the period
of Microsoft Bob, Windows Millennium, and Windows Vista) show how hard
displacing an entrenched standard held in place by network effects actually
is.
Why not extend vanilla Linux to smartphones instead?
Several reasons.
It's probably too late for another entrant. Microsoft muscling in with
Lumia is like IBM muscling in with OS/2. And Ubuntu on the phone is like
Coherent Unix on the PC, unlikely to even register. We have two clear leaders
and the rest are noise ("Coke, Pepsi, and everybody else"). Possibly they
could still gain ground by being categorically better, but "Categorically
better than the newest iPhone/iPad" is a hard bar to clear.
During the minicomputer->PC switch, various big iron vendors tried to
shoehorn their products down into the minicomputer space. The results were
laughable. (Look up the "microvax" sometime.)
The successful tablets are big phones, not small PCs. Teaching a PC to be
a good phone is actually harder than teaching a phone to be a good PC: we
understand the old problem space much better. (It's not that it's less
demanding, but the ways in which it is demanding are old hat and long
solved. Being a good phone is still tricky.)
Deployment requires vendor partnerships which are difficult and slow.
Apple exclusively partnered with AT&T for years to build market share, and
had much less competition at the time. Google eventually wound up buying
Motorola to defend itself from the dysfunctional patent environment.
Microsoft hijacked Nokia by installing one of their own people as CEO, and
it's done them about as much good as a similar CEO-installation at SGI did
to get Microsoft into the supercomputer market. (Taking out SGI did reduce
Microsoft's competition in graphics workstations, but that was a market they
already had traction in.)
Finally, Linux has had almost 2 decades of annual "Linux on the Desktop"
pushes that universally failed, and there's a reason for this. Open source
development can't do good user interfaces for the same reason wikipedia can't
write a novel with a coherent plot. The limitations of the development model
do not allow for this. The old adage "too many cooks spoil the soup" is not
a warning about lack of nutrition, it's a warning that aesthetic issues do
not survive committees. Peer review does not produce blockbuster movies,
hit songs, or masterpiece paintings. It finds scientific facts, not beauty.
Any time "shut up and show me the code" is not the correct response to
the problem at hand, open source development melts down into one of three
distinct failure modes:
1) Endless discussion that never results in actual code, because
nobody can agree on a single course of action.
2) The project forks itself to death: everybody goes off and codes up their
preferred solution, but it's no easier to agree on a single approach after
the code exists so the forks never get merged.
3) Delegating the problem to nobody, either by A) separating engine from
interface and focusing on the engine in hopes that some glorious day somebody
will write an interface worth using, or B) making the interface so configurable
that the fact it takes hours to figure out what your options are and still has
no sane defaults is now somehow the end user's fault.
Open source development defeats Brooks' Law by leveraging empirical tests.
Integrating the results of decoupled development efforts is made possible
by the ability to unequivocally determine which approaches are best (trusted
engineers break ties, but it has to be pretty close and the arguments go back
and forth). Even changing the design and repeatedly ripping out existing
implementations is doable if everyone can at least retroactively agree that what
we have now is better than what we used to have, and we should stop fighting
to go back to the old way.
In the absence of empirical tests, this doesn't work. By their nature,
aesthetic issues do not have empirical tests for "better" or "worse".
Chinese food is not "better" than Mexican food. But if you can't
decide what you're doing (if one chef insists on adding ketchup and another
bacon and a third ice cream) the end result is an incoherent mess. (At
best you get beige and the DMV. Navigable with enough effort, but not
appealing.)
The way around this is to have a single author with a clear vision
in charge of the user interface, who can make aesthetic decisions that are
coherent rather than "correct". Unfortunately when this does happen, the
open source community pressures the developer of a successful project to
give over control of the project to a committee. So the Gecko engine was
buried in the unusable Mozilla browser, then Galleon forked off from that
and Mozilla rebased itself on the Galleon fork. Then Firefox forked off
of that and the Mozilla foundation took over Firefox...
Part of the success of Android is that its user experience is NOT
community developed. (This isn't just desktop, this is "if the whole thing
pauses for two seconds while somebody's typing in a phone number, that's
unacceptable". All the way down to the bare metal, the OS serves the task
of being a handheld interactive touch screen device running off of battery
power first, being anything else it _could_ be doing second.)