You are viewing nickclifton

GNU Toolchain Update, October 2009

Hi Guys,

  Well, the major news this month is that a big new feature has been
  added to gcc: Link-Time Optimization.

  When this feature is enabled (via the -flto command line option) gcc
  interrupts the processing of a source file after it has converted
  it into the GIMPLE format (one of GCC's internal representations).
  Then, before carrying on with its optimizations, gcc writes the
  GIMPLE out into special sections in the output object file.
  After that gcc carries on as normal to optimize the GIMPLE and then
  convert it into machine instructions which go into the normal
  sections in the object file.

  When object files containing these special GIMPLE sections are
  linked together they can be read in and optimized before the final
  link actually takes place.  This allows for greater optimization
  opportunities, especially with inter-procedural optimizations.

  To use the link-time optimizer -flto needs to be specified at both
  compile time and during the final link.  For example,

    gcc -c -O2 -flto foo.c
    gcc -c -O2 -flto bar.c
    gcc -o myprog -flto -O2 foo.o bar.o

  Another (simpler) way to enable link-time optimization is:

    gcc -o myprog -flto -O2 foo.c bar.c

  Note that when a file is compiled with -flto, the generated object
  file will be larger than a regular object file because it contains
  both the GIMPLE bytecodes and the usual final code.  This means that
  object files with LTO information can still be linked like normal
  object files.  So, in the previous example, if the final link is
  done with:

    gcc -o myprog foo.o bar.o

  then the only difference is that no inter-procedural optimizations
  will be applied to produce "myprog".  The two object files foo.o and
  bar.o are simply passed to the regular linker.

  Additionally, the optimization flags used to compile individual
  files are not necessarily related to those used at link-time.  For
  instance:

    gcc -c -O0 -flto foo.c
    gcc -c -O0 -flto bar.c
    gcc -o myprog -flto -O3 foo.o bar.o

  This will produce individual object files with unoptimized assembler
  code, but the resulting binary "myprog" will be optimized at -O3.
  Now, if the final binary is generated without -flto, then "myprog"
  will not be optimized.

  When producing the final binary with -flto, GCC will only apply
  link-time optimizations to those files that contain bytecodes.
  Therefore, you can mix and match object files and libraries with
  GIMPLE bytecodes and final object code.  GCC will automatically
  select which files to optimize in LTO mode and which files to link
  without further processing.

  There are some code generation flags that GCC will preserve when
  generating bytecodes, as they need to be used during the final link
  stage.  Currently, the following options are saved into the GIMPLE
  bytecode files: -fPIC, -fcommon and all the -m target flags.

  At link time, these options are read-in and reapplied.  Note that
  the current implementation makes no attempt at recognizing
  conflicting values for these options.  If two or more files have a
  conflicting value (e.g., one file is compiled with -fPIC and another
  isn't), the compiler will simply use the last value read from the
  bytecode files.  It is recommended, then, that all the files
  participating in the same link be compiled with the same options.

  Another feature of LTO is that it is possible to apply
  interprocedural optimizations on files written in different
  languages.  This requires some support in the language front end.
  Currently, the C, C++ and Fortran front ends are capable of emitting
  GIMPLE bytecodes, so something like this should work:

    gcc -c -flto foo.c
    g++ -c -flto bar.cc
    gfortran -c -flto baz.f90
    g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran

  Notice that the final link is done with g++ to get the C++ runtime
  libraries and -lgfortran is added to get the Fortran runtime
  libraries.  In general, when mixing languages in LTO mode, you
  should use the same link command used when mixing languages in a
  regular (non-LTO) compilation.  This means that if your build
  process was mixing languages before, all you need to add is
  -flto to all the compile and link commands.

  If object files containing GIMPLE bytecode are stored in a library
  archive, say libfoo.a, it is possible to extract and use them
  in an LTO link if you are using gold as the linker (which, in turn,
  requires GCC to be configured with --enable-gold).  To enable this
  feature, use the command line option -use-linker-plugin at
  link-time (released versions of GCC spell this option
  -fuse-linker-plugin).  For example:

    gcc -o myprog -O2 -flto -use-linker-plugin a.o b.o -lfoo

  With the linker plugin enabled, gold will extract the needed GIMPLE
  files from libfoo.a and pass them on to the running GCC to make them
  part of the aggregated GIMPLE image to be optimized.

  If you are not using gold and/or do not specify -use-linker-plugin
  then the objects inside libfoo.a will be extracted and linked as
  usual, but they will not participate in the LTO optimization
  process.

  Link time optimizations do not require the presence of the whole
  program to operate.  If the program does not require any symbols to
  be exported, it is possible to combine -flto with -fwhole-program to
  allow the interprocedural optimizers to use more aggressive
  assumptions which may lead to improved optimization opportunities.

  Regarding portability: the current implementation of LTO makes no
  attempt at generating bytecode that can be ported between different
  types of hosts.  The bytecode files are versioned and there is a
  strict version check, so bytecode files generated in one version of
  GCC will not work with an older/newer version of GCC.


  One problem with link time optimization is that it can require a lot
  of computer resources (memory and processing time).  For large
  programs this can be a problem.  One solution is to use the new
  -fwhopr command line option.  This option is identical in
  functionality to -flto but it differs in how the final link stage is
  executed.  Instead of loading all the function bodies in memory, the
  callgraph is analyzed and optimization decisions are made (whole
  program analysis or WPA).  Once optimization decisions are made, the
  callgraph is partitioned and the different sections are compiled
  separately (local transformations or LTRANS).

Cheers
  Nick

Comments

zoltan0
Nov. 16th, 2009 01:06 pm (UTC)
Which gcc/binutils versions will have support for this feature? Which architectures are going to be supported, or is this feature platform independent?