[FWL] Fighting with compiler refactoring.
Rob Landley
rob at landley.net
Fri Dec 25 18:35:14 PST 2009
Hello. I know I've been silent for a bit, here's the problem:
I've been blocked for several days on the compiler refactoring, not because
it's hard to make it _work_, but because the UI requirements are insane. I'm
hoping they're not _inherently_ insane, and it's just that I've got things
buggered up. Unfortunately, I have yet to even _explain_ the problem
concisely, despite multiple attempts.
The big divide is between the simple cross compiler, and the two (fairly
similar) types of compilers created via canadian cross.
The simple compilers have everything feasible disabled: no thread support, no
libgcc_s.so, and no c++ library, and are thus somewhat crippled, but a lot
easier to build from an arbitrary host environment. Building the simple
compiler is a four stage process: build binutils, gcc, ccwrap, and uClibc, in
that order. Each gets built once, and only depends on things built before it.
The second set of compilers aren't actually needed to build a system image,
all one of those needs is the simple cross compiler. But putting a powerful
native compiler in the system image (with thread support and uclibc++) is half
the point of the exercise, and some people insist on cross compiling lots of
stuff no matter what I say, and they want a cross compiler as full-featured as
that native one.
The second compilers are built via canadian cross, which is an evil nasty
tangled process that only comes up when you're building a compiler, not when
you're building any other type of package. They need existing (simple) cross
compilers for both host and target (yes, _two_ existing cross compilers,
unless your host and target are the same, which means you're building a native
compiler for another platform). The first thing they do is build uClibc, and
then they build binutils, gcc (beating extra c++ support out of it), and
ccwrap, and then go on to build uClibc++ (using that extra c++ library built
earlier). They have libgcc_s.so and thread support, and can be statically
linked against uClibc on the host so they're extremely portable.
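To make the two build orders described above concrete, here is a minimal sketch. The function name stage_order and the labels "simple" and "canadian" are made up for illustration; they are not taken from the actual scripts, and the real builds obviously do more than print names.

```shell
#!/bin/sh
# Hypothetical sketch: print the package build order for each compiler type.
# stage_order, "simple", and "canadian" are illustrative names only.

stage_order()
{
  case "$1" in
    simple)
      # Each stage is built once and depends only on the ones before it.
      echo "binutils gcc ccwrap uClibc"
      ;;
    canadian)
      # uClibc moves to the front, and uClibc++ is added at the end.
      echo "uClibc binutils gcc ccwrap uClibc++"
      ;;
    *)
      echo "unknown compiler type: $1" >&2
      return 1
      ;;
  esac
}

for TYPE in simple canadian
do
  echo "$TYPE: $(stage_order "$TYPE")"
done
```

Seen this way, the package list is nearly the same for both; what differs is where uClibc sits and whether uClibc++ is built at all.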
Right now, the simple cross compiler is built by its own script
(cross-compiler.sh), and all the others are built by root-filesystem.sh. This
has two problems: 1) I have two scripts largely doing the same thing, 2)
root-filesystem.sh does a lot of _other_ stuff, unrelated to creating a cross
compiler, and it's being called multiple times with lots of environment
variables to tell it _not_ to do stuff. Its organization is a mess of large
if/else statements.
I'd like to refactor this mess to fix that, and just have one compiler building
script that gets called from everywhere that needs it.
The first refactoring I did was the build/sections stuff. The actual _package_
builds for binutils, gcc, uClibc, ccwrap, and uClibc++ have now been factored
out. The rest of the code has been reduced to wrappers that call those
stages. Now that's done, in theory the rest of the code should be capable of
being factored out, untangled, and unified.
In practice, it's still a FLAMING PAIN:
1) There remains a lot of setup/teardown boilerplate: including
sources/includes.sh, reading the architecture information, checking to see if
we've got a base architecture, tarring the result up at the end... Possibly
some kind of wrapper would help here? Except what would it _wrap_? A
separate script file? A shell function? It's too much repetition for me to be
comfortable with, but not _quite_ enough to justify the extra infrastructure
to make it go away, and that infrastructure would move the plumbing out of
sight where you can't clearly see what it's actually _doing_ without knowing
where to look. (I want it to be both invisible and explicit. These are
contradictory goals; there _is_ no good solution without rethinking the
problem somehow.)
2) cross-compiler.sh is doing some optional stuff that root-filesystem.sh isn't.
It's building "hello world" programs with the new compiler (as a sanity test),
and creating a README. I can't unify stuff that's not behaving the same, and
the smoke test can't work if the canadian build's HOST isn't the current
system's (which I can't really test for automatically, because i686 runs on
x86_64 and armv4t runs on armv6, and building in a relational table for that
is not a good idea). The README might be generic, but the smoke test isn't.
Yet a simple cross compiler hasn't got an explicit HOST; it always uses the
current one, and thus the hello world build stuff should always work. (The
result might not run, but hello world should _build_.)
3) The sequence differs between simple and canadian builds. Do you build
uClibc before or after the other stuff? Do you build uClibc++ and its
prerequisite at all? There are still some largeish if[] clauses in there.
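For item 1, one shape the "shell function" option could take is sketched below. All the names (with_boilerplate, setup_build, build_stage, teardown_build) are hypothetical, and the bodies only echo what a real script would do; this is a sketch of the structure, not the actual plumbing.

```shell
#!/bin/sh
# Hypothetical wrapper: run setup, then the named stages, then teardown.
# Every function here is a placeholder that only prints what it stands for.

setup_build()
{
  echo "setup: read architecture info for $1"
}

build_stage()
{
  echo "build: $1"
}

teardown_build()
{
  echo "teardown: tar up result for $1"
}

with_boilerplate()
{
  # $1 = architecture name, remaining arguments = stages in build order
  ARCH_NAME="$1"
  shift

  setup_build "$ARCH_NAME" || return 1
  for STAGE in "$@"
  do
    build_stage "$STAGE" || return 1
  done
  teardown_build "$ARCH_NAME"
}

with_boilerplate armv4t binutils gcc ccwrap uClibc
```

The call site stays explicit about which stages run and in what order, but the setup and teardown are exactly the part that has moved out of sight, which is the tradeoff item 1 complains about.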
For the moment, I'm thinking maybe I should just leave cross-compiler.sh alone
and break build-compiler.sh out of root-filesystem.sh first. (Just cleaning up
root-filesystem.sh is still a largeish win.)
So let's assume for the moment that I'm _just_ going to break the canadian
stuff out of root-filesystem.sh and worry about unifying cross-compiler.sh with
that (or not) later. Even ignore the repeated boilerplate for now. There's
still a problem:
4) The UI to specify what I _want_ requires at least _three_ pieces of
information per invocation: what host, what target, and what compiler prefix.
That's a lot more complicated than the simple compiler.
(Why specify compiler prefix? Because if I build a cross compiler that
coincidentally outputs code for the current host, I want it prefixed. But if I
build a _native_ compiler, I want no prefix, I want just "cc" and "ld" and
"strip" and so on. Note that x86_64-cc is still producing binaries linked
against uClibc, while the host's gcc is probably glibc, and these binaries
will probably only run if statically linked. Besides, when I build every
supported architecture the current host shouldn't behave any differently from
the others; making users of this compiler special case the host to _not_ be
prefixed is just evil. Whether the new compiler will be used as a cross
compiler or a native compiler isn't a technical issue, it's a matter of
intent, and thus must be specified, not calculated.)
5) As I blogged a few days ago, I have CROSS_HOST and CROSS_TARGET and
FROM_ARCH and FROM_HOST and ARCH, and even I have to stop and look up which
does what. It's just too many variables doing _almost_ the same thing.
Unfortunately, the evil seems to largely be in binutils and uClibc, which want
some of this crap specified when there's no excuse for it. I'm trying to figure
out how to synthesize the info it thinks it needs from the info it actually
needs closer to the usage site, but binutils/gcc demand hair splitting to
avoid triggering their STUPID ASSUMPTIONS and complete lack of orthogonal
behavior.
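As a rough illustration of item 4, where the prefix is "specified, not calculated", here is a sketch of an interface that takes all three pieces of information explicitly. The name describe_compiler is made up, and the convention that the prefix carries its own trailing dash is an assumption for this example, not something the existing scripts necessarily do.

```shell
#!/bin/sh
# Hypothetical sketch: the caller states host, target, and prefix explicitly.
# describe_compiler is an illustrative name, and a prefix that includes its
# trailing dash is an assumed convention.

describe_compiler()
{
  if [ $# -ne 3 ]
  then
    echo "usage: describe_compiler HOST TARGET PREFIX" >&2
    return 1
  fi

  # An empty PREFIX means plain names like "cc". Nothing is inferred from
  # whether HOST and TARGET happen to match.
  echo "host=$1 target=$2 cc=${3}cc ld=${3}ld strip=${3}strip"
}

# Cross compiler that coincidentally targets the machine it runs on: prefixed.
describe_compiler x86_64 x86_64 x86_64-

# Native compiler for another platform: no prefix.
describe_compiler armv4t armv4t ""
```

Both calls have matching host and target, yet they produce different tool names, which is why the prefix can't be derived from the other two values.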
All of this is "biting off more than I can chew". Every time I tackle this, it
blows up into a big tangled mess where I'm trying to fix EVERYTHING AT ONCE,
which tends not to work. But the problem is that it _is_ currently a big
tangled mess, and I'm trying to _separate_ it, and it's not giving me good
fracture lines I can cleave it along.
Possibly I can deal with #4 by always producing two directories of tool names,
one full of prefixed names and the other full of unprefixed names. (In theory
I've _got_ that with "bin" and "tools". In practice tools isn't complete and
hasn't got the wrapper in it, which supplies the header and library paths.)
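The two-directory idea could look something like the sketch below, assuming symlinks are acceptable. The function name link_unprefixed, the demo directory names, and the tool list are all made up for illustration, and this only handles the naming; it says nothing about how the wrapper supplies header and library paths.

```shell
#!/bin/sh
# Hypothetical sketch: fill a directory with unprefixed names that point at
# the prefixed tools. Names and layout are invented for the demo.

link_unprefixed()
{
  # $1 = absolute path to the directory of prefixed tools
  # $2 = directory to fill with unprefixed names
  # $3 = prefix including trailing dash (for example "armv4t-")
  mkdir -p "$2" || return 1
  for TOOL in "$1/$3"*
  do
    # Skip the literal pattern if the glob matched nothing.
    [ -e "$TOOL" ] || continue
    NAME="${TOOL##*/}"
    ln -sf "$TOOL" "$2/${NAME#"$3"}" || return 1
  done
}

# Demo in a scratch directory with empty stand-in files for the tools.
DEMO="${TMPDIR:-/tmp}/toolnames.$$"
mkdir -p "$DEMO/prefixed"
for TOOL in cc ld strip
do
  : > "$DEMO/prefixed/armv4t-$TOOL"
done

link_unprefixed "$DEMO/prefixed" "$DEMO/unprefixed" armv4t-
ls "$DEMO/unprefixed"
rm -rf "$DEMO"
```

With both directories always present, whether the compiler is used as cross or native becomes a matter of which directory goes in the PATH, rather than a build-time option.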
I've been wrestling with this, on and off, for a couple weeks. "This is a
mess, I must clean it up" doesn't always immediately suggest _how_.
So yeah, that was my week. I expect to get it done eventually, just taking a
while...
Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds