lfs/part3intro/toolchaintechnotes.xml
2025-04-08 12:50:25 -05:00

474 lines
24 KiB
XML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % general-entities SYSTEM "../general.ent">
%general-entities;
<!ENTITY host-triplet
"<replaceable>&lt;the host triplet&gt;</replaceable>">
]>
<sect1 id="ch-tools-toolchaintechnotes" xreflabel="Toolchain Technical Notes">
<?dbhtml filename="toolchaintechnotes.html"?>
<title>Toolchain Technical Notes</title>
<para>This section explains some of the rationale and technical details
behind the overall build method. Don't try to immediately
understand everything in this section. Most of this information will be
clearer after performing an actual build. Come back and re-read this chapter
at any time during the build process.</para>
<para>The overall goal of <xref linkend="chapter-cross-tools"/> and <xref
linkend="chapter-temporary-tools"/> is to produce a temporary area
containing a set of tools that are known to be good, and that are isolated from the host system.
By using the <command>chroot</command> command, the compilations in the remaining chapters
will be isolated within that environment, ensuring a clean, trouble-free
build of the target LFS system. The build process has been designed to
minimize the risks for new readers, and to provide the most educational value
at the same time.</para>
<para>This build process is based on
<emphasis>cross-compilation</emphasis>. Cross-compilation is normally used
to build a compiler and its associated toolchain for a machine different from
the one that is used for the build. This is not strictly necessary for LFS,
since the machine where the new system will run is the same as the one
used for the build. But cross-compilation has one great advantage:
anything that is cross-compiled cannot depend on the host environment.</para>
<sect2 id="cross-compile" xreflabel="About Cross-Compilation">
<title>About Cross-Compilation</title>
<note>
<para>
The LFS book is not (and does not contain) a general tutorial to
build a cross- (or native) toolchain. Don't use the commands in the
book for a cross-toolchain for some purpose other
than building LFS, unless you really understand what you are doing.
</para>
<para>
It's known installing GCC pass 2 will break the cross-toolchain.
We don't consider it a bug because GCC pass 2 is the last package
to be cross-compiled in the book, and we won't <quote>fix</quote>
it until we really need to cross-compile some package after GCC
pass 2 in the future.
</para>
</note>
<para>Cross-compilation involves some concepts that deserve a section of
their own. Although this section may be omitted on a first reading,
coming back to it later will help you gain a fuller understanding of
the process.</para>
<para>Let us first define some terms used in this context.</para>
<variablelist>
<varlistentry><term>The build</term><listitem>
<para>is the machine where we build programs. Note that this machine
is also referred to as the <quote>host.</quote></para></listitem>
</varlistentry>
<varlistentry><term>The host</term><listitem>
<para>is the machine/system where the built programs will run. Note
that this use of <quote>host</quote> is not the same as in other
sections.</para></listitem>
</varlistentry>
<varlistentry><term>The target</term><listitem>
<para>is only used for compilers. It is the machine the compiler
produces code for. It may be different from both the build and
the host.</para></listitem>
</varlistentry>
</variablelist>
<para>As an example, let us imagine the following scenario (sometimes
referred to as <quote>Canadian Cross</quote>). We have a
compiler on a slow machine only, let's call it machine A, and the compiler
ccA. We also have a fast machine (B), but no compiler for (B), and we
want to produce code for a third, slow machine (C). We will build a
compiler for machine C in three stages.</para>
<informaltable align="center">
<tgroup cols="5">
<colspec colnum="1" align="center"/>
<colspec colnum="2" align="center"/>
<colspec colnum="3" align="center"/>
<colspec colnum="4" align="center"/>
<colspec colnum="5" align="left"/>
<thead>
<row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
<entry>Target</entry><entry>Action</entry></row>
</thead>
<tbody>
<row>
<entry>1</entry><entry>A</entry><entry>A</entry><entry>B</entry>
<entry>Build cross-compiler cc1 using ccA on machine A.</entry>
</row>
<row>
<entry>2</entry><entry>A</entry><entry>B</entry><entry>C</entry>
<entry>Build cross-compiler cc2 using cc1 on machine A.</entry>
</row>
<row>
<entry>3</entry><entry>B</entry><entry>C</entry><entry>C</entry>
<entry>Build compiler ccC using cc2 on machine B.</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>Then, all the programs needed by machine C can be compiled
using cc2 on the fast machine B. Note that unless B can run programs
produced for C, there is no way to test the newly built programs until machine
C itself is running. For example, to run a test suite on ccC, we may want to add a
fourth stage:</para>
<informaltable align="center">
<tgroup cols="5">
<colspec colnum="1" align="center"/>
<colspec colnum="2" align="center"/>
<colspec colnum="3" align="center"/>
<colspec colnum="4" align="center"/>
<colspec colnum="5" align="left"/>
<thead>
<row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
<entry>Target</entry><entry>Action</entry></row>
</thead>
<tbody>
<row>
<entry>4</entry><entry>C</entry><entry>C</entry><entry>C</entry>
<entry>Rebuild and test ccC using ccC on machine C.</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>In the example above, only cc1 and cc2 are cross-compilers, that is,
they produce code for a machine different from the one they are run on.
The other compilers ccA and ccC produce code for the machine they are run
on. Such compilers are called <emphasis>native</emphasis> compilers.</para>
</sect2>
<sect2 id="lfs-cross">
<title>Implementation of Cross-Compilation for LFS</title>
<note>
<para>All the cross-compiled packages in this book use an
autoconf-based building system. The autoconf-based building system
accepts system types in the form cpu-vendor-kernel-os,
referred to as the system triplet. Since the vendor field is often
irrelevant, autoconf lets you omit it.</para>
<para>An astute reader may wonder
why a <quote>triplet</quote> refers to a four component name. The
kernel field and the os field began as a single
<quote>system</quote> field. Such a three-field form is still valid
today for some systems, for example,
<literal>x86_64-unknown-freebsd</literal>. But
two systems can share the same kernel and still be too different to
use the same triplet to describe them. For example, Android running on a
mobile phone is completely different from Ubuntu running on an ARM64
server, even though they are both running on the same type of CPU (ARM64) and
using the same kernel (Linux).</para>
<para>Without an emulation layer, you cannot run an
executable for a server on a mobile phone or vice versa. So the
<quote>system</quote> field has been divided into kernel and os fields, to
designate these systems unambiguously. In our example, the Android
system is designated <literal>aarch64-unknown-linux-android</literal>,
and the Ubuntu system is designated
<literal>aarch64-unknown-linux-gnu</literal>.</para>
<para>The word <quote>triplet</quote> remains embedded in the lexicon. A simple way to determine your
system triplet is to run the <command>config.guess</command>
script that comes with the source for many packages. Unpack the binutils
sources, run the script <userinput>./config.guess</userinput>, and note
the output. For example, for a 32-bit Intel processor the
output will be <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit
system it will be <emphasis>x86_64-pc-linux-gnu</emphasis>. On most
Linux systems the even simpler <command>gcc -dumpmachine</command> command
will give you similar information.</para>
<para>You should also be aware of the name of the platform's dynamic linker, often
referred to as the dynamic loader (not to be confused with the standard
linker <command>ld</command> that is part of binutils). The dynamic linker
provided by package glibc finds and loads the shared libraries needed by a
program, prepares the program to run, and then runs it. The name of the
dynamic linker for a 32-bit Intel machine is <filename
class="libraryfile">ld-linux.so.2</filename>; it's <filename
class="libraryfile">ld-linux-x86-64.so.2</filename> on 64-bit systems. A
sure-fire way to determine the name of the dynamic linker is to inspect a
random binary from the host system by running: <userinput>readelf -l
&lt;name of binary&gt; | grep interpreter</userinput> and noting the
output. The authoritative reference covering all platforms is in
<ulink url='https://sourceware.org/glibc/wiki/ABIList'>a Glibc wiki
page</ulink>.</para>
</note>
<para>
There are two key points for a cross-compilation:
</para>
<itemizedlist>
<listitem>
<para>
When producing and processing the machine code supposed to be
executed on <quote>the host,</quote> the cross-toolchain must be
used. Note that the native toolchain from <quote>the build</quote>
may be still invoked to generate machine code supposed to be
executed on <quote>the build.</quote> For example, the build system
may compile a generator with the native toolchain, then generate
a C source file with the generator, and finally compile the C
source file with the cross-toolchain so the generated code will
be able to run on <quote>the host.</quote>
</para>
<para>
With an autoconf-based build system, this requirement is ensured
by using the <parameter>--host</parameter> switch to specify
<quote>the host</quote> triplet. With this switch the build
system will use the toolchain components prefixed
with <literal>&host-triplet;</literal>
for generating and processing the machine code for
<quote>the host</quote>; e.g. the compiler will be
<command>&host-triplet;-gcc</command> and the
<command>readelf</command> tool will be
<command>&host-triplet;-readelf</command>.
</para>
</listitem>
<listitem>
<para>
The build system should not attempt to run any generated machine
code supposed to be executed on <quote>the host.</quote> For
example, when building a utility natively, its man page can be
generated by running the utility with the
<parameter>--help</parameter> switch and processing the output,
but generally it's not possible to do so for a cross-compilation
as the utility may fail
to run on <quote>the build</quote>: it's obviously impossible to
run ARM64 machine code on a x86 CPU (without an emulator).
</para>
<para>
With an autoconf-based build system, this requirement is
satisfied in <quote>the cross-compilation mode</quote> where
the optional features requiring to run machine code for
<quote>the host</quote> during the build time are disabled. When <quote>the
host</quote> triplet is explicitly specified, <quote>the
cross-compilation mode</quote> is enabled if and only if either
the <command>configure</command> script fails to run a dummy
program compiled into <quote>the host</quote> machine code, or
<quote>the build</quote> triplet is explicitly specified via the
<parameter>--build</parameter> switch and it's different from
<quote>the host</quote> triplet.
</para>
</listitem>
</itemizedlist>
<para>In order to cross-compile a package for the LFS temporary system,
the name of the system triplet is slightly adjusted by changing the
&quot;vendor&quot; field in the <envar>LFS_TGT</envar> variable so it
says &quot;lfs&quot; and <envar>LFS_TGT</envar> is then specified as
<quote>the host</quote> triplet via <parameter>--host</parameter>, so
the cross-toolchain will be used for generating and processing the
machine code running as a part of the LFS temporary system. And, we
also need to enable <quote>the cross-compilation mode</quote>: despite
<quote>the host</quote> machine code, i.e. the machine code for the LFS
temporary system, is able to execute on the current CPU, it may refer
to a library not available on the <quote>the build</quote> (the host
distro), or some code or data non-exist or defined differently in the
library even if it happens to be available. When cross-compiling a
package for the LFS temporary system, we cannot rely on the
<command>configure</command> script to detect this issue with the
dummy program: the dummy only uses a few components in
<systemitem class='library'>libc</systemitem> that the host distro
<systemitem class='library'>libc</systemitem> likely provides (unless,
maybe the host distro uses a different
<systemitem class='library'>libc</systemitem> implementation like Musl),
so it won't fail like how the really useful programs would likely.
Thus we must explicitly specify <quote>the build</quote> triplet to
enable <quote>the cross-compilation mode.</quote> The value we use is
just the default, i.e. the original system triplet from
<command>config.guess</command> output, but <quote>the cross-compilation
mode</quote> depends on an explicit specification as we've
discussed.</para>
<para>We use the <parameter>--with-sysroot</parameter> option when
building the cross-linker and cross-compiler, to tell them where to find
the needed files for <quote>the host.</quote> This nearly ensures that
none of the other programs built in
<xref linkend="chapter-temporary-tools"/> can link to libraries on
<quote>the build.</quote> The word <quote>nearly</quote> is used because
<command>libtool</command>, a <quote>compatibility</quote> wrapper of
the compiler and the linker for autoconf-based build systems,
can try to be too clever and mistakenly pass options allowing the linker
to find libraries of <quote>the build.</quote>
To prevent this fallout, we need to delete the libtool archive
(<filename class='extension'>.la</filename>) files and fix up an
outdated libtool copy shipped with the Binutils code.</para>
<informaltable align="center">
<tgroup cols="5">
<colspec colnum="1" align="center"/>
<colspec colnum="2" align="center"/>
<colspec colnum="3" align="center"/>
<colspec colnum="4" align="center"/>
<colspec colnum="5" align="left"/>
<thead>
<row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
<entry>Target</entry><entry>Action</entry></row>
</thead>
<tbody>
<row>
<entry>1</entry><entry>pc</entry><entry>pc</entry><entry>lfs</entry>
<entry>Build cross-compiler cc1 using cc-pc on pc.</entry>
</row>
<row>
<entry>2</entry><entry>pc</entry><entry>lfs</entry><entry>lfs</entry>
<entry>Build compiler cc-lfs using cc1 on pc.</entry>
</row>
<row>
<entry>3</entry><entry>lfs</entry><entry>lfs</entry><entry>lfs</entry>
<entry>Rebuild (and maybe test) cc-lfs using cc-lfs on lfs.</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>In the preceding table, <quote>on pc</quote> means the commands are run
on a machine using the already installed distribution. <quote>On
lfs</quote> means the commands are run in a chrooted environment.</para>
<para>This is not yet the end of the story. The C language is not
merely a compiler; it also defines a standard library. In this book, the
GNU C library, named glibc, is used (there is an alternative, &quot;musl&quot;). This library must
be compiled for the LFS machine; that is, using the cross-compiler cc1.
But the compiler itself uses an internal library providing complex
subroutines for functions not available in the assembler instruction set. This
internal library is named libgcc, and it must be linked to the glibc
library to be fully functional. Furthermore, the standard library for
C++ (libstdc++) must also be linked with glibc. The solution to this
chicken and egg problem is first to build a degraded cc1-based libgcc,
lacking some functionalities such as threads and exception handling, and then
to build glibc using this degraded compiler (glibc itself is not
degraded), and also to build libstdc++. This last library will lack some of the
functionality of libgcc.</para>
<para>The upshot of the preceding paragraph is that cc1 is unable to
build a fully functional libstdc++ with the degraded libgcc, but cc1
is the only compiler available for building the C/C++ libraries
during stage 2. As we've discussed, we cannot run cc-lfs on pc (the
host distro) because it may require some library, code, or data not
available on <quote>the build</quote> (the host distro).
So when we build gcc stage 2, we override the library
search path to link libstdc++ against the newly
rebuilt libgcc instead of the old, degraded build. This makes the rebuilt
libstdc++ fully functional.</para>
<para>In &ch-final; (or <quote>stage 3</quote>), all the packages needed for
the LFS system are built. Even if a package has already been installed into
the LFS system in a previous chapter, we still rebuild the package. The main reason for
rebuilding these packages is to make them stable: if we reinstall an LFS
package on a completed LFS system, the reinstalled content of the package
should be the same as the content of the same package when first installed in
&ch-final;. The temporary packages installed in &ch-tmp-cross; or
&ch-tmp-chroot; cannot satisfy this requirement, because some optional
features of them are disabled because of either the missing
dependencies or the <quote>cross-compilation mode.</quote>
Additionally, a minor reason for rebuilding the packages is to run the
test suites.</para>
</sect2>
<sect2 id="other-details">
<title>Other Procedural Details</title>
<para>The cross-compiler will be installed in a separate <filename
class="directory">$LFS/tools</filename> directory, since it will not
be part of the final system.</para>
<para>Binutils is installed first because the <command>configure</command>
runs of both gcc and glibc perform various feature tests on the assembler
and linker to determine which software features to enable or disable. This
is more important than one might realize at first. An incorrectly configured
gcc or glibc can result in a subtly broken toolchain, where the impact of
such breakage might not show up until near the end of the build of an
entire distribution. A test suite failure will usually highlight this error
before too much additional work is performed.</para>
<para>Binutils installs its assembler and linker in two locations,
<filename class="directory">$LFS/tools/bin</filename> and <filename
class="directory">$LFS/tools/$LFS_TGT/bin</filename>. The tools in one
location are hard linked to the other. An important facet of the linker is
its library search order. Detailed information can be obtained from
<command>ld</command> by passing it the <parameter>--verbose</parameter>
flag. For example, <command>$LFS_TGT-ld --verbose | grep SEARCH</command>
will illustrate the current search paths and their order. (Note that this
example can be run as shown only while logged in as user
<systemitem class="username">lfs</systemitem>. If you come back to this
page later, replace <command>$LFS_TGT-ld</command> with
<command>ld</command>).</para>
<para>The next package installed is gcc. An example of what can be
seen during its run of <command>configure</command> is:</para>
<screen><computeroutput>checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>
<para>This is important for the reasons mentioned above. It also
demonstrates that gcc's configure script does not search the PATH
directories to find which tools to use. However, during the actual
operation of <command>gcc</command> itself, the same search paths are not
necessarily used. To find out which standard linker <command>gcc</command>
will use, run: <command>$LFS_TGT-gcc -print-prog-name=ld</command>. (Again,
remove the <command>$LFS_TGT-</command> prefix if coming back to this
later.)</para>
<para>Detailed information can be obtained from <command>gcc</command> by
passing it the <parameter>-v</parameter> command line option while compiling
a program. For example, <command>$LFS_TGT-gcc -v
<replaceable>example.c</replaceable></command> (or without <command>
$LFS_TGT-</command> if coming back later) will show
detailed information about the preprocessor, compilation, and assembly
stages, including <command>gcc</command>'s search paths for included
headers and their order.</para>
<para>Next up: sanitized Linux API headers. These allow the
standard C library (glibc) to interface with features that the Linux
kernel will provide.</para>
<para>Next comes glibc. This is the first package that we cross-compile.
We use the <parameter>--host=$LFS_TGT</parameter> option to make
the build system to use those tools prefixed with
<literal>$LFS_TGT-</literal>, and the
<parameter>--build=$(../scripts/config.guess)</parameter> option to
enable <quote>the cross-compilation mode</quote> as we've discussed.
The <envar>DESTDIR</envar> variable is used to force installation into
the LFS file system.</para>
<para>As mentioned above, the standard C++ library is compiled next, followed in
<xref linkend="chapter-temporary-tools"/> by other programs that must
be cross-compiled to break circular dependencies at build time.
The steps for those packages are similar to the steps for glibc.</para>
<para>At the end of <xref linkend="chapter-temporary-tools"/> the native
LFS compiler is installed. First binutils-pass2 is built,
in the same <envar>DESTDIR</envar> directory as the other programs,
then the second pass of gcc is constructed, omitting some
non-critical libraries.</para>
<para>Upon entering the chroot environment in <xref
linkend="chapter-chroot-temporary-tools"/>,
the temporary installations of programs needed for the proper
operation of the toolchain are performed. From this point onwards, the
core toolchain is self-contained and self-hosted. In
<xref linkend="chapter-building-system"/>, final versions of all the
packages needed for a fully functional system are built, tested, and
installed.</para>
</sect2>
</sect1>