[FWL] Fighting with compiler refactoring.

Rob Landley rob at landley.net
Fri Dec 25 18:35:14 PST 2009


Hello.  I know I've been silent for a bit; here's the problem:

I've been blocked for several days on the compiler refactoring, not because 
it's hard to make it _work_, but because the UI requirements are insane.  I'm 
hoping they're not _inherently_ insane, and it's just that I've got things 
buggered up.  Unfortunately, I have yet to even _explain_ the problem 
concisely, despite multiple attempts.

The big divide is between the simple cross compiler, and the two (fairly 
similar) types of compilers created via canadian cross.
  
The simple compilers have everything feasible disabled: no thread support, no 
libgcc_s.so, and no c++ library, and are thus somewhat crippled, but a lot 
easier to build from an arbitrary host environment.  Building the simple 
compiler is a four stage process: build binutils, gcc, ccwrap, and uClibc, in 
that order.  Each gets built once, and only depends on things built before it. 
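Because each stage depends only on the ones before it, the simple build reduces to a linear loop that stops at the first failure. This is a hypothetical sketch: `build_section` and `build_simple_compiler` are made-up names, and `build_section` only records the stage instead of calling the real per-package build.

```shell
#!/bin/bash
# Hypothetical sketch: build_section stands in for the real per-package
# build and only records which stage ran, in order.
STAGES_RUN=""

build_section()
{
  STAGES_RUN="${STAGES_RUN:+$STAGES_RUN }$1"
  echo "building $1"
}

build_simple_compiler()
{
  local stage
  # Each stage depends only on the stages before it, so a fixed order works
  # and there is no point continuing after a failure.
  for stage in binutils gcc ccwrap uClibc
  do
    build_section "$stage" || return 1
  done
}

build_simple_compiler
```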
  
The second set of compilers aren't actually needed to build a system image; 
all one of those needs is the simple cross compiler.  But putting a powerful 
native compiler in the system image (with thread support and uclibc++) is half 
the point of the exercise, and some people insist on cross compiling lots of 
stuff no matter what I say, and they want a cross compiler as full-featured as 
that native one.

The second compilers are built via canadian cross, which is an evil nasty 
tangled process that only comes up when you're building a compiler, not when 
you're building any other type of package.  They need existing (simple) cross 
compilers for both host and target (yes, _two_ existing cross compilers, 
unless your host and target are the same, which means you're building a native 
compiler for another platform).  The first thing they do is build uClibc, and 
then they build binutils, gcc (beating extra c++ support out of it), and 
ccwrap, and then go on to build uClibc++ (using that extra c++ library built 
earlier).  They have libgcc_s.so and thread support, and can be statically 
linked against uClibc on the host so they're extremely portable.
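The "two existing cross compilers" requirement can be made explicit up front, before any stage runs. A hypothetical sketch: `canadian_prereqs` is an invented helper, and the `ARCH-cc` naming simply follows the `x86_64-cc` example used later in this mail.

```shell
#!/bin/bash
# Hypothetical sketch: list the existing simple cross compilers a canadian
# cross needs before it can start. Not taken from the real build scripts.
canadian_prereqs()
{
  local host="$1" target="$2"
  if [ -z "$host" ] || [ -z "$target" ]
  then
    echo "usage: canadian_prereqs HOST TARGET" >&2
    return 1
  fi
  # One compiler produces code for HOST (the new compiler's own binaries),
  # one produces code for TARGET. If they match (a native compiler for
  # another platform), a single existing cross compiler covers both.
  if [ "$host" = "$target" ]
  then
    echo "$target-cc"
  else
    echo "$host-cc $target-cc"
  fi
}

canadian_prereqs i686 armv5l
canadian_prereqs armv5l armv5l
```

A real script would then check that each listed compiler exists before building anything.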

Right now, the simple cross compiler is built by its own script 
(cross-compiler.sh), and all the others are built by root-filesystem.sh.  This 
has two problems: 1) I have two scripts largely doing the same thing, 2) 
root-filesystem.sh does a lot of _other_ stuff, unrelated to creating a cross 
compiler, and it's being called multiple times with lots of environment 
variables to tell it _not_ to do stuff.  Its organization is a mess of large 
if/else statements.

I'd like to refactor this mess to fix that, and just have one compiler building 
script that gets called from everywhere that needs it.

The first refactoring I did was the build/sections stuff.  The actual _package_ 
builds for binutils, gcc, uClibc, ccwrap, and uClibc++ have now been factored 
out.  The rest of the code has been reduced to wrappers that call those 
stages.  Now that that's done, in theory the rest of the code should be capable of 
being factored out, untangled, and unified. 

In practice, it's still a FLAMING PAIN:

1) There remains a lot of setup/teardown boilerplate.  Including 
sources/includes.sh, reading the architecture information, checking to see if 
we've got a base architecture, tarring the result up at the end...  Possibly 
some kind of wrapper would help here?  Except what would it _wrap_?  (A 
separate script file?  A shell function?)  It's too much repetition for me to be 
comfortable with, but not _quite_ enough to justify the extra infrastructure 
to make it go away, which would move plumbing out of sight where you can't 
see what it's actually _doing_ without knowing where to look.  (I want it to 
be both invisible and explicit.  These are contradictory goals; there _is_ no 
good solution without rethinking the problem somehow.)
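One way to weigh the shell-function option is to sketch it. This is hypothetical: `setup_arch`, `with_boilerplate`, and `write_readme` are invented names, and `setup_arch` is only a stand-in for sourcing sources/includes.sh, reading the architecture information, and the base-architecture check.

```shell
#!/bin/bash
# Hypothetical sketch: one shell function owns the setup/teardown
# boilerplate. setup_arch is a stand-in, not the real setup code.

setup_arch()
{
  ARCH="$1"
  # Stand-in for the real "is this a valid base architecture?" check.
  [ -n "$ARCH" ] || { echo "no architecture given" >&2; return 1; }
  STAGE_DIR="$(mktemp -d)/$ARCH-compiler" || return 1
  mkdir -p "$STAGE_DIR"
}

with_boilerplate()
{
  local arch="$1" body="$2"
  setup_arch "$arch" || return 1
  # The real work is named at the call site, so what runs stays visible.
  "$body" "$STAGE_DIR" || return 1
  # Teardown: tar up the result next to the staging directory.
  tar czf "$STAGE_DIR.tar.gz" -C "$(dirname "$STAGE_DIR")" \
    "$(basename "$STAGE_DIR")"
}

# Example body: whatever the individual script actually does.
write_readme()
{
  echo "compiler for $ARCH" > "$1/README"
}

with_boilerplate armv5l write_readme && echo "created $STAGE_DIR.tar.gz"
```

The trade-off described above shows up directly: the body stays explicit at the call site, but everything inside `setup_arch` is exactly the plumbing that becomes invisible unless you know to look there.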

2) cross-compiler.sh is doing some optional stuff that root-filesystem.sh isn't.  
It's building "hello world" programs with the new compiler (as a sanity test), 
and creating a README.  I can't unify stuff that's not behaving the same, but 
if the canadian build's HOST isn't the current system, the new compiler may 
not be able to run here to build hello world at all (and I can't really test 
for that automatically, because i686 runs on x86_64 and armv4t runs on armv6, 
and building in a relational table for that is not a good idea).  The README 
might be generic, but the smoke test isn't.  Yet a simple cross compiler 
hasn't got an explicit HOST; it always uses the current one, and thus the hello 
world build stuff should always work.  (The result might not run, but hello 
world should _build_.)
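The rule above (skip the smoke test whenever an explicit HOST is given, since running on it can't be detected reliably) might look like this. `smoke_test` is a hypothetical helper. The prefix is passed without its trailing dash, an assumed convention that matches `x86_64-cc`.

```shell
#!/bin/bash
# Hypothetical sketch: only smoke-test compilers meant to run on this
# machine. HOST is empty for a simple cross compiler, which always runs on
# the current system.
smoke_test()
{
  local prefix="$1" host="$2" tmp

  if [ -n "$host" ]
  then
    # Can't reliably tell whether HOST runs here (i686 on x86_64, armv4t on
    # armv6...), so don't guess.
    echo "skipping hello world for HOST=$host"
    return 0
  fi

  tmp="$(mktemp -d)" || return 1
  cat > "$tmp/hello.c" << 'EOF'
#include <stdio.h>

int main(void)
{
  printf("Hello world\n");
  return 0;
}
EOF
  # Only check that it builds; the binary may be for another architecture.
  "${prefix:+$prefix-}cc" "$tmp/hello.c" -o "$tmp/hello"
}

smoke_test armv5l i686
```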

3) The sequence differs between simple and canadian builds.  Do you build 
uClibc before or after the other stuff?  Do you build uClibc++ and its 
prerequisite at all?  There are still some largeish if[] clauses in there.  
For the moment, I'm thinking maybe I should just leave cross-compiler.sh alone 
and break build-compiler.sh out of root-filesystem.sh first.  (Just cleaning up 
root-filesystem.sh is still a largeish win.)
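One way to shrink those if[] clauses is to keep the per-type differences in a single table-like `case` statement and drive one loop from it. A hypothetical sketch: `compiler_stages` and the "simple"/"canadian" type names are invented, while the stage orders are the ones described earlier in this mail.

```shell
#!/bin/bash
# Hypothetical sketch: all sequence differences live in one place instead
# of if/else blocks scattered through the script.
compiler_stages()
{
  case "$1" in
    simple)   echo "binutils gcc ccwrap uClibc" ;;
    canadian) echo "uClibc binutils gcc ccwrap uClibc++" ;;
    *)        echo "unknown compiler type: $1" >&2; return 1 ;;
  esac
}

for stage in $(compiler_stages canadian)
do
  echo "would build $stage"
done
```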

So let's assume for the moment that I'm _just_ going to break the canadian 
stuff out of root-filesystem.sh and worry about unifying cross-compiler.sh with 
that (or not) later.  Even ignore the repeated boilerplate for now.  There's 
still a problem:

4) The UI to specify what I _want_ requires at least _three_ pieces of 
information per invocation: what host, what target, and what compiler prefix.  
That's a lot more complicated than the simple compiler, which has no explicit HOST.

(Why specify compiler prefix?  Because if I build a cross compiler that 
coincidentally outputs code for the current host, I want it prefixed.  But if I 
build a _native_ compiler, I want no prefix, I want just "cc" and "ld" and 
"strip" and so on.  Note that x86_64-cc is still producing binaries linked 
against uClibc, while the host's gcc is probably glibc, and these binaries 
will probably only run if statically linked.  Besides, when I build every 
supported architecture the current host shouldn't behave any differently from 
the others; making users of this compiler special case the host to _not_ be 
prefixed is just evil.  Whether the new compiler will be used as a cross 
compiler or a native compiler isn't a technical issue, it's a matter of 
intent, and thus must be specified, not calculated.)
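"Specified, not calculated" translates into an interface where all three values are required arguments, and the empty prefix (native compiler) has to be passed on purpose. This is a hypothetical sketch: `tool_names` is an invented helper that only shows how the prefix maps to tool names.

```shell
#!/bin/bash
# Hypothetical sketch: HOST, TARGET and PREFIX are all required. PREFIX may
# be "" (native compiler: plain cc, ld, strip), but it is never inferred by
# comparing HOST and TARGET.
tool_names()
{
  if [ $# -ne 3 ]
  then
    echo "usage: tool_names HOST TARGET PREFIX" >&2
    return 1
  fi
  # HOST and TARGET would be handed on to the build; only PREFIX affects
  # the names of the installed tools.
  local prefix="$3" tool
  for tool in cc ld strip
  do
    echo "${prefix:+$prefix-}$tool"
  done
}

tool_names x86_64 x86_64 x86_64  # cross compiler that happens to match the host
tool_names x86_64 x86_64 ""      # native compiler
```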

5) As I blogged a few days ago, I have CROSS_HOST and CROSS_TARGET and 
FROM_ARCH and FROM_HOST and ARCH, and even I have to stop and look up which 
does what.  It's just too many variables doing _almost_ the same thing.  
Unfortunately, the evil seems to largely be in binutils and uClibc, which want 
some of this crap specified when there's no excuse for it.  I'm trying to figure 
out how to synthesize the info they think they need from the info actually 
needed, closer to the usage site, but binutils/gcc demand hair splitting to 
avoid triggering their STUPID ASSUMPTIONS and complete lack of orthogonal 
behavior.
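Synthesizing info "closer to the usage site" could mean deriving autoconf's standard `--build`/`--host`/`--target` options from just two or three inputs right before calling configure. This is a hypothetical sketch: `configure_flags` is invented, the triples in the example are made up, and it does not attempt the extra hair splitting binutils/gcc demand.

```shell
#!/bin/bash
# Hypothetical sketch: derive the configure triple flags at the point of
# use instead of carrying several near-duplicate global variables around.
configure_flags()
{
  local build="$1" host="$2" target="$3"
  # An empty HOST means a simple cross compiler: it runs where it was built.
  host="${host:-$build}"
  echo "--build=$build --host=$host --target=$target"
}

configure_flags x86_64-unknown-linux "" armv5l-unknown-linux
```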

All of this is "biting off more than I can chew".  Every time I tackle this, it 
blows up into a big tangled mess where I'm trying to fix EVERYTHING AT ONCE, 
which tends not to work.  But the problem is that it _is_ currently a big 
tangled mess, and I'm trying to _separate_ it, and it's not giving me good 
fracture lines I can cleave it along.

Possibly I can deal with #4 by always producing two directories of tool names, 
one full of prefixed names and the other full of unprefixed names.  (In theory 
I've _got_ that with "bin" and "tools".  In practice tools isn't complete and 
hasn't got the wrapper in it, which supplies the header and library paths.)
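One way to keep "tools" complete is to generate it from "bin": one unprefixed symlink per prefixed tool. A hypothetical sketch: `make_unprefixed_tools` is an invented helper, and it assumes the prefixed `cc` in bin is the wrapper, so that the unprefixed name would reach the wrapper too.

```shell
#!/bin/bash
# Hypothetical sketch: populate an unprefixed "tools" directory with
# relative symlinks to the prefixed names in "bin".
make_unprefixed_tools()
{
  local dir="$1" prefix="$2" f
  mkdir -p "$dir/tools" || return 1
  for f in "$dir/bin/$prefix"-*
  do
    # An unmatched glob stays literal; skip it.
    [ -e "$f" ] || continue
    # "$dir/bin/armv5l-cc" becomes tools/cc -> ../bin/armv5l-cc
    ln -sf "../bin/${f##*/}" "$dir/tools/${f##*/$prefix-}" || return 1
  done
}

demo="$(mktemp -d)"
mkdir -p "$demo/bin"
touch "$demo/bin/armv5l-cc" "$demo/bin/armv5l-ld"
make_unprefixed_tools "$demo" armv5l
ls -l "$demo/tools"
```

Because every unprefixed name is a link rather than a copy, the two directories can't drift apart.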

I've been wrestling with this, on and off, for a couple weeks.  "This is a 
mess, I must clean it up" doesn't always immediately suggest _how_.

So yeah, that was my week.  I expect to get it done eventually, just taking a 
while...

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds

