Polish up the prose in "Toolchain Technical Notes". Fix capitalization.

Rough edges remain. For instance, $LFS_TGT-ld is referenced but not
clearly defined. Will need to discuss with other editors to resolve.
David Bryant 2022-09-28 14:56:52 -05:00
parent dd7f9df19f
commit 562062295e


@@ -11,26 +11,26 @@
<title>Toolchain Technical Notes</title>
<para>This section explains some of the rationale and technical details
behind the overall build method. It is not essential to immediately
behind the overall build method. Don't try to immediately
understand everything in this section. Most of this information will be
clearer after performing an actual build. This section can be referred
to at any time during the process.</para>
clearer after performing an actual build. Come back and re-read this chapter
at any time during the build process.</para>
<para>The overall goal of <xref linkend="chapter-cross-tools"/> and <xref
linkend="chapter-temporary-tools"/> is to produce a temporary area that
contains a known-good set of tools that can be isolated from the host system.
By using <command>chroot</command>, the commands in the remaining chapters
will be contained within that environment, ensuring a clean, trouble-free
linkend="chapter-temporary-tools"/> is to produce a temporary area
containing a set of tools that are known to be good, and that are isolated from the host system.
By using the <command>chroot</command> command, the compilations in the remaining chapters
will be isolated within that environment, ensuring a clean, trouble-free
build of the target LFS system. The build process has been designed to
minimize the risks for new readers and to provide the most educational value
minimize the risks for new readers, and to provide the most educational value
at the same time.</para>
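<para>For readers who have not met <command>chroot</command> before, a minimal
sketch of such an invocation is shown below. The variables and paths are
illustrative only; the exact command used by the book appears in <xref
linkend="chapter-chroot-temporary-tools"/>.</para>
<screen><userinput># Assumes $LFS holds the mount point of the new system, as set up earlier
chroot "$LFS" /usr/bin/env -i \
    HOME=/root                \
    PATH=/usr/bin:/usr/sbin   \
    /bin/bash --login</userinput></screen>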
<para>The build process is based on the process of
<para>This build process is based on
<emphasis>cross-compilation</emphasis>. Cross-compilation is normally used
for building a compiler and its toolchain for a machine different from
the one that is used for the build. This is not strictly needed for LFS,
to build a compiler and its associated toolchain for a machine different from
the one that is used for the build. This is not strictly necessary for LFS,
since the machine where the new system will run is the same as the one
used for the build. But cross-compilation has the great advantage that
used for the build. But cross-compilation has one great advantage:
anything that is cross-compiled cannot depend on the host environment.</para>
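<para>In practice, a cross compiler is invoked under a name prefixed with its
target triplet, so it cannot be confused with the host's own
<command>gcc</command>. The triplet below is only an example:</para>
<screen><userinput>x86_64-lfs-linux-gnu-gcc -o program program.c</userinput></screen>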
<sect2 id="cross-compile" xreflabel="About Cross-Compilation">
@@ -39,47 +39,46 @@
<note>
<para>
The LFS book is not, and does not contain a general tutorial to
build a cross (or native) toolchain. Don't use the command in the
book for a cross toolchain which will be used for some purpose other
The LFS book is not (and does not contain) a general tutorial on
building a cross (or native) toolchain. Don't use the commands in the
book to build a cross toolchain for any purpose other
than building LFS, unless you really understand what you are doing.
</para>
</note>
<para>Cross-compilation involves some concepts that deserve a section on
their own. Although this section may be omitted in a first reading,
coming back to it later will be beneficial to your full understanding of
<para>Cross-compilation involves some concepts that deserve a section of
their own. Although this section may be omitted on a first reading,
coming back to it later will help you gain a fuller understanding of
the process.</para>
<para>Let us first define some terms used in this context:</para>
<para>Let us first define some terms used in this context.</para>
<variablelist>
<varlistentry><term>build</term><listitem>
<varlistentry><term>The build</term><listitem>
<para>is the machine where we build programs. Note that this machine
is referred to as the <quote>host</quote> in other
sections.</para></listitem>
is also referred to as the <quote>host</quote>.</para></listitem>
</varlistentry>
<varlistentry><term>host</term><listitem>
<varlistentry><term>The host</term><listitem>
<para>is the machine/system where the built programs will run. Note
that this use of <quote>host</quote> is not the same as in other
sections.</para></listitem>
</varlistentry>
<varlistentry><term>target</term><listitem>
<varlistentry><term>The target</term><listitem>
<para>is only used for compilers. It is the machine the compiler
produces code for. It may be different from both build and
host.</para></listitem>
produces code for. It may be different from both the build and
the host.</para></listitem>
</varlistentry>
</variablelist>
<para>As an example, let us imagine the following scenario (sometimes
referred to as <quote>Canadian Cross</quote>): we may have a
referred to as <quote>Canadian Cross</quote>): we have a
compiler on a slow machine only; let's call it machine A, and the compiler
ccA. We may have also a fast machine (B), but with no compiler, and we may
want to produce code for another slow machine (C). To build a
compiler for machine C, we would have three stages:</para>
ccA. We also have a fast machine (B), but no compiler for it, and we
want to produce code for a third, slow machine (C). We will build a
compiler for machine C in three stages.</para>
<informaltable align="center">
<tgroup cols="5">
@@ -95,24 +94,24 @@
<tbody>
<row>
<entry>1</entry><entry>A</entry><entry>A</entry><entry>B</entry>
<entry>build cross-compiler cc1 using ccA on machine A</entry>
<entry>Build cross-compiler cc1 using ccA on machine A.</entry>
</row>
<row>
<entry>2</entry><entry>A</entry><entry>B</entry><entry>C</entry>
<entry>build cross-compiler cc2 using cc1 on machine A</entry>
<entry>Build cross-compiler cc2 using cc1 on machine A.</entry>
</row>
<row>
<entry>3</entry><entry>B</entry><entry>C</entry><entry>C</entry>
<entry>build compiler ccC using cc2 on machine B</entry>
<entry>Build compiler ccC using cc2 on machine B.</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>Then, all the other programs needed by machine C can be compiled
<para>Then, all the programs needed by machine C can be compiled
using cc2 on the fast machine B. Note that unless B can run programs
produced for C, there is no way to test the built programs until machine
C itself is running. For example, for testing ccC, we may want to add a
produced for C, there is no way to test the newly built programs until machine
C itself is running. For example, to run a test suite on ccC, we may want to add a
fourth stage:</para>
<informaltable align="center">
@@ -129,7 +128,7 @@
<tbody>
<row>
<entry>4</entry><entry>C</entry><entry>C</entry><entry>C</entry>
<entry>rebuild and test ccC using itself on machine C</entry>
<entry>Rebuild and test ccC using ccC on machine C.</entry>
</row>
</tbody>
</tgroup>
@@ -147,43 +146,45 @@
<note>
<para>Almost all the build systems use names of the form
cpu-vendor-kernel-os referred to as the machine triplet. An astute
cpu-vendor-kernel-os, referred to as the machine triplet. An astute
reader may wonder why a <quote>triplet</quote> refers to a four component
name. The reason is history: initially, three component names were enough
to designate a machine unambiguously, but with new machines and systems
appearing, that proved insufficient. The word <quote>triplet</quote>
name. The reason is historical: initially, three component names were enough
to designate a machine unambiguously, but as new machines and systems
proliferated, that proved insufficient. The word <quote>triplet</quote>
remained. A simple way to determine your machine triplet is to run
the <command>config.guess</command>
script that comes with the source for many packages. Unpack the binutils
sources and run the script: <userinput>./config.guess</userinput> and note
the output. For example, for a 32-bit Intel processor the
output will be <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit
system it will be <emphasis>x86_64-pc-linux-gnu</emphasis>.</para>
system it will be <emphasis>x86_64-pc-linux-gnu</emphasis>. On most
Linux systems the even simpler <command>gcc -dumpmachine</command> command
will give you the same information.</para>
<para>Also be aware of the name of the platform's dynamic linker, often
<para>You should also be aware of the name of the platform's dynamic linker, often
referred to as the dynamic loader (not to be confused with the standard
linker <command>ld</command> that is part of binutils). The dynamic linker
provided by Glibc finds and loads the shared libraries needed by a
provided by the glibc package finds and loads the shared libraries needed by a
program, prepares the program to run, and then runs it. The name of the
dynamic linker for a 32-bit Intel machine is <filename
class="libraryfile">ld-linux.so.2</filename> and is <filename
class="libraryfile">ld-linux-x86-64.so.2</filename> for 64-bit systems. A
class="libraryfile">ld-linux.so.2</filename>; it's <filename
class="libraryfile">ld-linux-x86-64.so.2</filename> on 64-bit systems. A
sure-fire way to determine the name of the dynamic linker is to inspect a
random binary from the host system by running: <userinput>readelf -l
&lt;name of binary&gt; | grep interpreter</userinput> and noting the
output. The authoritative reference covering all platforms is in the
<filename>shlib-versions</filename> file in the root of the Glibc source
<filename>shlib-versions</filename> file in the root of the glibc source
tree. (A short example of both checks follows this note.)</para>
</note>
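<para>As a concrete illustration of the two checks described in the note above
(the binary inspected and the values printed are merely an example from a
64-bit x86 host; your output will differ):</para>
<screen><userinput># Host triplet as seen by the compiler
gcc -dumpmachine
# Dynamic linker recorded in an existing host binary
readelf -l /bin/ls | grep interpreter</userinput></screen>
<para>On such a host the first command prints
<emphasis>x86_64-pc-linux-gnu</emphasis> and the second prints an interpreter
line naming <filename class="libraryfile">ld-linux-x86-64.so.2</filename>.</para>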
<para>In order to fake a cross compilation in LFS, the name of the host triplet
is slightly adjusted by changing the &quot;vendor&quot; field in the
<envar>LFS_TGT</envar> variable. We also use the
<envar>LFS_TGT</envar> variable so it says &quot;lfs&quot;. We also use the
<parameter>--with-sysroot</parameter> option when building the cross linker and
cross compiler to tell them where to find the needed host files. This
ensures that none of the other programs built in <xref
linkend="chapter-temporary-tools"/> can link to libraries on the build
machine. Only two stages are mandatory, and one more for tests:</para>
machine. Only two stages are mandatory, plus one more for tests.</para>
<informaltable align="center">
<tgroup cols="5">
@@ -199,47 +200,47 @@
<tbody>
<row>
<entry>1</entry><entry>pc</entry><entry>pc</entry><entry>lfs</entry>
<entry>build cross-compiler cc1 using cc-pc on pc</entry>
<entry>Build cross-compiler cc1 using cc-pc on pc.</entry>
</row>
<row>
<entry>2</entry><entry>pc</entry><entry>lfs</entry><entry>lfs</entry>
<entry>build compiler cc-lfs using cc1 on pc</entry>
<entry>Build compiler cc-lfs using cc1 on pc.</entry>
</row>
<row>
<entry>3</entry><entry>lfs</entry><entry>lfs</entry><entry>lfs</entry>
<entry>rebuild and test cc-lfs using itself on lfs</entry>
<entry>Rebuild and test cc-lfs using cc-lfs on lfs.</entry>
</row>
</tbody>
</tgroup>
</informaltable>
<para>In the above table, <quote>on pc</quote> means the commands are run
<para>In the preceding table, <quote>on pc</quote> means the commands are run
on a machine using the already installed distribution. <quote>On
lfs</quote> means the commands are run in a chrooted environment.</para>
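<para>Pulling the pieces above together, here is a minimal sketch of the
vendor-field trick and the sysroot option. <envar>LFS_TGT</envar> is actually
set earlier in the book; the configure line shows only the two options
discussed here and is not a complete command.</para>
<screen><userinput># Host triplet with the vendor field set to "lfs"
# (compare x86_64-pc-linux-gnu on a typical 64-bit host)
export LFS_TGT=$(uname -m)-lfs-linux-gnu
# Make the cross linker and compiler look for headers and libraries
# under the new system's root rather than the host's
../configure --target=$LFS_TGT --with-sysroot=$LFS</userinput></screen>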
<para>Now, there is more about cross-compiling: the C language is not
just a compiler, but also defines a standard library. In this book, the
GNU C library, named glibc, is used. This library must
be compiled for the lfs machine, that is, using the cross compiler cc1.
GNU C library, named glibc, is used (there is an alternative, &quot;musl&quot;). This library must
be compiled for the LFS machine; that is, using the cross compiler cc1.
But the compiler itself uses an internal library implementing complex
instructions not available in the assembler instruction set. This
internal library is named libgcc, and must be linked to the glibc
subroutines for functions not available in the assembler instruction set. This
internal library is named libgcc, and it must be linked to the glibc
library to be fully functional! Furthermore, the standard library for
C++ (libstdc++) also needs being linked to glibc. The solution to this
chicken and egg problem is to first build a degraded cc1 based libgcc,
lacking some functionalities such as threads and exception handling, then
build glibc using this degraded compiler (glibc itself is not
degraded), then build libstdc++. But this last library will lack the
same functionalities as libgcc.</para>
C++ (libstdc++) must also be linked with glibc. The solution to this
chicken and egg problem is first to build a degraded cc1-based libgcc,
lacking some functionalities such as threads and exception handling, and then
to build glibc using this degraded compiler (glibc itself is not
degraded), and finally to build libstdc++. This last library will lack the same
functionality as the degraded libgcc.</para>
<para>This is not the end of the story: the conclusion of the preceding
<para>This is not the end of the story: the upshot of the preceding
paragraph is that cc1 is unable to build a fully functional libstdc++, but
this is the only compiler available for building the C/C++ libraries
during stage 2! Of course, the compiler built during stage 2, cc-lfs,
would be able to build those libraries, but (1) the build system of
GCC does not know that it is usable on pc, and (2) using it on pc
would be at risk of linking to the pc libraries, since cc-lfs is a native
compiler. So we have to build libstdc++ later, in chroot.</para>
gcc does not know that it is usable on pc, and (2) using it on pc
would create a risk of linking to the pc libraries, since cc-lfs is a native
compiler. So we have to re-build libstdc++ later, in the chroot environment.</para>
</sect2>
@@ -252,10 +253,10 @@
be part of the final system.</para>
<para>Binutils is installed first because the <command>configure</command>
runs of both GCC and Glibc perform various feature tests on the assembler
runs of both gcc and glibc perform various feature tests on the assembler
and linker to determine which software features to enable or disable. This
is more important than one might first realize. An incorrectly configured
GCC or Glibc can result in a subtly broken toolchain, where the impact of
is more important than one might realize at first. An incorrectly configured
gcc or glibc can result in a subtly broken toolchain, where the impact of
such breakage might not show up until near the end of the build of an
entire distribution. A test suite failure will usually highlight this error
before too much additional work is performed.</para>
@@ -274,14 +275,14 @@
<command>$LFS_TGT-gcc dummy.c -Wl,--verbose 2&gt;&amp;1 | grep succeeded</command>
will show all the files successfully opened during the linking.</para>
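<para>Each line of that output has the form shown below. The paths are only an
example; the point is that none of the reported paths should point into the
host's own library directories:</para>
<screen><computeroutput>attempt to open /mnt/lfs/usr/lib/crt1.o succeeded
attempt to open /mnt/lfs/usr/lib/libc.so succeeded</computeroutput></screen>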
<para>The next package installed is GCC. An example of what can be
<para>The next package installed is gcc. An example of what can be
seen during its run of <command>configure</command> is:</para>
<screen><computeroutput>checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>
<para>This is important for the reasons mentioned above. It also
demonstrates that GCC's configure script does not search the PATH
demonstrates that gcc's configure script does not search the PATH
directories to find which tools to use. However, during the actual
operation of <command>gcc</command> itself, the same search paths are not
necessarily used. To find out which standard linker <command>gcc</command>
@@ -295,12 +296,12 @@ checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</compute
order.</para>
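<para>A quick way to see this for yourself is to query the compiler directly;
both options below are standard GCC switches, not something specific to this
section:</para>
<screen><userinput># Which ld binary gcc will actually invoke
gcc -print-prog-name=ld
# The program and library directories gcc will search, in order
gcc -print-search-dirs</userinput></screen>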
<para>Next installed are sanitized Linux API headers. These allow the
standard C library (Glibc) to interface with features that the Linux
standard C library (glibc) to interface with features that the Linux
kernel will provide.</para>
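<para>The kernel's own mechanism for producing such sanitized headers is its
<command>headers_install</command> target; a generic sketch is shown below.
The destination is just a placeholder, and the book's actual commands may
differ.</para>
<screen><userinput>make mrproper
make headers_install INSTALL_HDR_PATH=/path/to/staging/usr</userinput></screen>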
<para>The next package installed is Glibc. The most important
considerations for building Glibc are the compiler, binary tools, and
kernel headers. The compiler is generally not an issue since Glibc will
<para>The next package installed is glibc. The most important
considerations for building glibc are the compiler, binary tools, and
kernel headers. The compiler is generally not an issue since glibc will
always use the compiler relating to the <parameter>--host</parameter>
parameter passed to its configure script; e.g. in our case, the compiler
will be <command>$LFS_TGT-gcc</command>. The binary tools and kernel
@@ -313,26 +314,26 @@ checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</compute
<envar>$LFS_TGT</envar> expanded) to control which binary tools are used
and the use of the <parameter>-nostdinc</parameter> and
<parameter>-isystem</parameter> flags to control the compiler's include
search path. These items highlight an important aspect of the Glibc
search path. These items highlight an important aspect of the glibc
package&mdash;it is very self-sufficient in terms of its build machinery
and generally does not rely on toolchain defaults.</para>
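<para>A heavily abbreviated sketch of such a configure invocation makes the
role of <parameter>--host</parameter> visible. Only the cross-related switches
are shown; the book's full command has more options.</para>
<screen><userinput># Because --host differs from --build, configure selects $LFS_TGT-gcc,
# $LFS_TGT-ar, and friends instead of the host's native tools
../configure              \
    --prefix=/usr         \
    --host=$LFS_TGT       \
    --build=$(../scripts/config.guess)</userinput></screen>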
<para>As said above, the standard C++ library is compiled next, followed in
<xref linkend="chapter-temporary-tools"/> by all the programs that need
themselves to be built. The install step of all those packages uses the
<envar>DESTDIR</envar> variable to have the
programs land into the LFS filesystem.</para>
<para>As mentioned above, the standard C++ library is compiled next, followed in
<xref linkend="chapter-temporary-tools"/> by all the remaining programs that need
to be cross compiled. The install step of all those packages uses the
<envar>DESTDIR</envar> variable to force installation
in the LFS filesystem.</para>
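<para>The <envar>DESTDIR</envar> convention itself is simple: the value is
prepended to every installation path at <command>make install</command> time,
so a package configured with <parameter>--prefix=/usr</parameter> lands under
<filename class="directory">$LFS/usr</filename> while still believing it lives
in <filename class="directory">/usr</filename>. For example (assuming, as
elsewhere in the book, that <envar>LFS</envar> holds the new system's mount
point):</para>
<screen><userinput>make DESTDIR=$LFS install</userinput></screen>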
<para>At the end of <xref linkend="chapter-temporary-tools"/> the native
lfs compiler is installed. First binutils-pass2 is built,
with the same <envar>DESTDIR</envar> install as the other programs,
then the second pass of GCC is constructed, omitting libstdc++
and other non-important libraries. Due to some weird logic in GCC's
LFS compiler is installed. First binutils-pass2 is built,
in the same <envar>DESTDIR</envar> directory as the other programs,
then the second pass of gcc is constructed, omitting libstdc++
and other non-critical libraries. Due to some weird logic in gcc's
configure script, <envar>CC_FOR_TARGET</envar> ends up as
<command>cc</command> when the host is the same as the target, but is
<command>cc</command> when the host is the same as the target, but
different from the build system. This is why
<parameter>CC_FOR_TARGET=$LFS_TGT-gcc</parameter> is put explicitly into
the configure options.</para>
<parameter>CC_FOR_TARGET=$LFS_TGT-gcc</parameter> is declared explicitly
as one of the configuration options.</para>
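<para>Put concretely, the gcc pass 2 <command>configure</command> line carries
the assignment alongside the triplet options, roughly like this (all other
options omitted):</para>
<screen><userinput># Abridged: build on the pc, run on lfs, generate code for lfs,
# and force the target compiler to be the cross compiler
../configure                   \
    --build=$(../config.guess) \
    --host=$LFS_TGT            \
    --target=$LFS_TGT          \
    CC_FOR_TARGET=$LFS_TGT-gcc</userinput></screen>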
<para>Upon entering the chroot environment in <xref
linkend="chapter-chroot-temporary-tools"/>, the first task is to install