nickclifton (nickclifton) wrote,

GNU Toolchain Update: September 2008

Hi Everyone,

  This is the first in what I hope will be a continuing series of blog posts describing monthly changes in the GNU Toolchain (gcc, binutils, newlib and possibly gdb as well).  One of my jobs at Red Hat is to take the changes in the public versions of the toolchain sources and copy them into our internal repository.  I do this on a monthly basis and I produce a short report each time detailing what has happened.  One of my friends here suggested that people outside of Red Hat might be interested in these monthly reports, which is why I have started this blog.

So here is the report for September 2008, suitably sanitized to remove any internal-only details:

  Whilst there has been very little change in the binutils this month,
  there has been a lot going on in gcc-land...

  Firstly a new register allocator has been added, with the promise
  that this will bring a performance boost to all of the ports.
  (Register allocation is one of the hardest tasks for any compiler,
  and GCC has had several different allocators during its history).

  The new allocator is also being used as a test to trim away any
  unmaintained ports.  Any port that has not been switched over to the
  new register allocator by the end of the month will be marked as
  deprecated and removed after the next release of gcc.

  The new allocator, called IRA or the Integrated Register Allocator,
  brings some new command line options with it to control its behaviour:

    -fira-algorithm=<algorithm>
    Use the specified algorithm for the integrated register allocator.
    The <algorithm> argument should be one of "regional", "CB",
    or "mixed".  The first algorithm can give the best results for
    machines with a small number of registers, the second one is
    faster and generates the smallest code, but the third
    algorithm gives the best results in most cases and for
    most architectures.  That is why it is the default.

    -fira-coalesce
    Do optimistic register coalescing.  This option might be
    profitable for architectures with big regular register sets.

 
  Secondly a new set of loop transformation optimizations has been
  added to GCC.  Supported by yet another internal representation,
  called "Graphite" this time, these transformations allow for some
  simple, but potentially very useful optimizations.  And the new
  internal representation should make it easier to add more loop
  optimizations in the future.  The new optimizations are:
   
    -floop-interchange
    Perform loop interchange transformations on loops.
    Interchanging two nested loops switches the inner and
    outer loops.  For example, given a loop like:

    DO J = 1, M
      DO I = 1, N
        A(J, I) = A(J, I) * C
      ENDDO
    ENDDO

    loop interchange will transform the loop as if the user
    had written:

    DO I = 1, N
      DO J = 1, M
        A(J, I) = A(J, I) * C
      ENDDO
    ENDDO

    which can be beneficial when N is larger than the data
    cache(s).  This example uses Fortran, where the arrays
    are stored by columns, not rows, so the first version is
    sub-optimal since it is accessing the data on a per-row
    basis.  The optimization itself is not restricted to
    Fortran however, and can be useful in any language.
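
    For readers more at home in C, the same idea can be sketched
    roughly as below.  C stores arrays by rows rather than by
    columns, so the roles of the two indices are swapped relative
    to the Fortran example: the cache-friendly order is the one
    where the last subscript varies fastest.  The function names
    and array sizes are made up for illustration, and this is a
    hand-written picture of the transformation, not GCC output.

```c
#include <stddef.h>

#define ROWS 4
#define COLS 6

/* Before: the inner loop walks down a column, so consecutive
   iterations touch elements that are COLS doubles apart.  */
void scale_column_order (double a[ROWS][COLS], double c)
{
  for (size_t j = 0; j < COLS; j++)
    for (size_t i = 0; i < ROWS; i++)
      a[i][j] = a[i][j] * c;
}

/* After interchange: the inner loop walks along a row, so
   consecutive iterations touch adjacent elements in memory.  */
void scale_row_order (double a[ROWS][COLS], double c)
{
  for (size_t i = 0; i < ROWS; i++)
    for (size_t j = 0; j < COLS; j++)
      a[i][j] = a[i][j] * c;
}
```

    Both functions compute the same result; only the order in
    which memory is visited differs.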

   
    -floop-strip-mine
    Perform loop strip mining transformations on loops.
    Strip mining splits a loop into two nested loops.
    The outer loop has strides equal to the strip size
    and the inner loop has strides of the original loop
    within a strip.  For example, given a loop like:

    DO I = 1, N
      A(I) = A(I) + C
    ENDDO

    loop strip mining will transform the loop as if the
    user had written:

    DO II = 1, N, 4
      DO I = II, min (II + 3, N)
        A(I) = A(I) + C
      ENDDO
    ENDDO
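
    A C rendering of the same transformation might look like the
    sketch below.  It uses zero-based, half-open loops, so the
    inner bound is written as a clamp rather than a min() call.
    The function names and the STRIP macro are invented for the
    example.

```c
#include <stddef.h>

#define STRIP 4   /* strip size, matching the stride of 4 above */

/* Original loop: one pass over all n elements.  */
void add_simple (double *a, size_t n, double c)
{
  for (size_t i = 0; i < n; i++)
    a[i] = a[i] + c;
}

/* Strip-mined loop: the outer loop steps by STRIP and the inner
   loop covers one strip, clamped so that the last (possibly
   short) strip does not run past n.  Each element is still
   visited exactly once.  */
void add_strip_mined (double *a, size_t n, double c)
{
  for (size_t ii = 0; ii < n; ii += STRIP)
    {
      size_t end = ii + STRIP < n ? ii + STRIP : n;
      for (size_t i = ii; i < end; i++)
        a[i] = a[i] + c;
    }
}
```

    On its own strip mining does not change the order of memory
    accesses; it is mainly a building block for transformations
    such as loop blocking.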


    -floop-block
    Perform loop blocking transformations on loops.
    Blocking strip mines each loop in the loop nest
    such that the memory accesses of the element loops
    fit inside caches.  For example, given a loop like:

    DO I = 1, N
      DO J = 1, M
        A(J, I) = B(I) + C(J)
      ENDDO
    ENDDO

    loop blocking will transform the loop as if the user
    had written:

    DO II = 1, N, 64
      DO JJ = 1, M, 64
        DO I = II, min (II + 63, N)
          DO J = JJ, min (JJ + 63, M)
            A(J, I) = B(I) + C(J)
          ENDDO
        ENDDO
      ENDDO
    ENDDO

    which can be beneficial when M is larger than the
    data cache(s), because the innermost loop will iterate
    over a smaller amount of data.
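
    The blocked loop nest can be sketched in C along the
    following lines.  The array is passed as a flat pointer in
    row-major order, so here the contiguous index is j.  The
    names fill_simple, fill_blocked, min_size and BLOCK are
    assumptions made for this illustration only.

```c
#include <stddef.h>

#define BLOCK 64   /* block size, matching the stride of 64 above */

static size_t min_size (size_t x, size_t y)
{
  return x < y ? x : y;
}

/* Original loop nest over an n-by-m array stored row by row.  */
void fill_simple (double *a, const double *b, const double *c,
                  size_t n, size_t m)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < m; j++)
      a[i * m + j] = b[i] + c[j];
}

/* Blocked loop nest: the two outer loops pick a BLOCK-by-BLOCK
   tile and the two inner loops work through that tile, so only
   a small part of a, b and c is in use at any one time.  */
void fill_blocked (double *a, const double *b, const double *c,
                   size_t n, size_t m)
{
  for (size_t ii = 0; ii < n; ii += BLOCK)
    for (size_t jj = 0; jj < m; jj += BLOCK)
      {
        size_t i_end = min_size (ii + BLOCK, n);
        size_t j_end = min_size (jj + BLOCK, m);
        for (size_t i = ii; i < i_end; i++)
          for (size_t j = jj; j < j_end; j++)
            a[i * m + j] = b[i] + c[j];
      }
}
```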

   
  But wait, there's more.  More new optimizations that is.  The
  following are not based on the Graphite representation or the new
  register allocator, but have been added separately:

    -fipa-cp-clone
    Perform function cloning to make interprocedural constant
    propagation stronger.  When enabled, externally visible
    functions that take constant arguments are cloned so that
    one version exists for each known set of possible arguments.
    This then allows more opportunities for constant propagation.
    Since this optimization can create multiple copies of
    functions, it may significantly increase code size and so it
    is only enabled by default at -O3.
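
    As a rough picture of what cloning buys, the sketch below
    writes the clones out by hand.  All of the names are
    hypothetical; GCC chooses its own names for the clones it
    creates, and whether it clones a given function depends on
    its own heuristics.

```c
/* An externally visible function.  Because other translation
   units may call it with any mode, it cannot simply be
   specialised in place.  */
int apply (int x, int mode)
{
  if (mode == 0)
    return x + 1;
  else if (mode == 1)
    return x * 2;
  return -x;
}

/* Callers in this file always pass a constant mode.  */
int caller_a (int x) { return apply (x, 0); }
int caller_b (int x) { return apply (x, 1); }

/* Hand-written equivalents of the clones: with mode known to be
   constant the tests on it fold away, leaving one small
   function per set of constant arguments.  */
static int apply_mode0 (int x) { return x + 1; }
static int apply_mode1 (int x) { return x * 2; }

int caller_a_cloned (int x) { return apply_mode0 (x); }
int caller_b_cloned (int x) { return apply_mode1 (x); }
```

    The original apply() has to stay as well, which is where the
    increase in code size comes from.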

   
    -fselective-scheduling
    Schedule instructions using the selective scheduling algorithm.
    The selective instruction scheduler is an alternative to
    GCC's default scheduler which may produce better results in
    some cases.  This option enables the selective scheduler for
    the first instruction scheduling pass.

    -fselective-scheduling2
    This option enables the selective scheduler for the second
    instruction scheduling pass.

    -fsel-sched-pipelining
    Enable software pipelining of innermost loops during
    selective scheduling.  This option has no effect until one
    of -fselective-scheduling or -fselective-scheduling2 is
    turned on.

    -fsel-sched-pipelining-outer-loops
    When pipelining loops during selective scheduling, also
    pipeline outer loops.  This option has no effect until
    -fsel-sched-pipelining is turned on.


    -fprofile-correction
    Profiles collected using an instrumented binary for
    multi-threaded programs may be inconsistent due to missed
    counter updates. When this option is specified, GCC will
    use heuristics to correct or smooth out such inconsistencies.
    By default, GCC will emit an error message when an
    inconsistent profile is detected.
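
    To see how a missed counter update makes a profile
    inconsistent, consider the toy model below.  It is a
    single-threaded replay of one unlucky interleaving, not a
    real data race, and the structure and function names are
    invented; GCC's actual profile counters are laid out
    differently.

```c
/* Counters that an instrumented binary might keep for a
   statement of the form  if (cond) A; else B;  */
struct edge_counts
{
  long entry;      /* times the `if' was reached  */
  long taken;      /* times the `then' arm ran    */
  long not_taken;  /* times the `else' arm ran    */
};

/* A non-atomic increment is really load, add, store.  This
   replays the case where two threads both load the old value
   before either stores, so one of the increments is lost.  */
void racy_double_increment (long *counter)
{
  long seen_by_thread_1 = *counter;   /* thread 1 loads   */
  long seen_by_thread_2 = *counter;   /* thread 2 loads   */
  *counter = seen_by_thread_1 + 1;    /* thread 1 stores  */
  *counter = seen_by_thread_2 + 1;    /* thread 2 stores, overwriting  */
}

/* Flow conservation: every entry must leave by exactly one
   arm, so the two arm counts should add up to the entry count.  */
int profile_is_consistent (const struct edge_counts *p)
{
  return p->entry == p->taken + p->not_taken;
}
```

    If two threads reach the `if' and both take the `then' arm,
    but one update to the arm counter is lost, then entry is 2
    and taken is 1, and the consistency check fails.  This is
    the kind of mismatch that the option asks GCC to smooth over
    rather than reject.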

   
  Finally a new port has been partially added to GCC.  The "picoChip"
  port for the CPU created by picoChip Designs Ltd.
  (http://www.picochip.com) exists in the GCC sources, but not yet in
  the binutils sources.  So you can compile code for it, but not
  assemble or link that code.