Hi Guys,
There is not very much to report this month:
* The binutils now have support for the Motorola S12X and Freescale XGate architectures.
* GCC and the binutils now have support for the VLE extension to the PowerPC architecture.
* The readelf tool has a command line option to enable extra checks on the correctness of DWARF debug information:
--dwarf-check
* GCC has a new option to turn off warnings about the questionable use of varargs macros:
-Wno-varargs
* G++ has a new library - libatomic - which contains code for implementing the __sync family of functions used by the C++11 __atomic operators. By default use of this library is enabled (for those targets that support atomic operations), but the old behaviour of using in-line code can be restored via the command line option:
-fno-sync-libcalls.
Cheers
Nick
Hi Guys,
There has been a lot of action in the GNU Toolchain over the last month. Here are the highlights:
* The linker and assembler now have support for generating Google Native Client binaries for the ARM, x86 and x86_64 architectures: See http://code.google.com/p/nativeclient/ for more information on this binary format.
* GCC's -pedantic command line option has now been renamed to -Wpedantic in line with all the other command line options that control warnings. The old -pedantic is still supported however.
* GCC diagnostic messages that display a line of source code will now also show a caret indicating the column where the problem was detected. Eg:
fred.cc:4:19: fatal error: foo: No such file or directory
#include <foo>
^
compilation terminated.
This behaviour can be turned off by using the -fno-diagnostics-show-caret command line option.
* The -fsched-pressure command line option has been extended to allow the selection of the algorithm to use when scheduling pressure sensitive instructions:
-fsched-pressure-algorithm=<weighted|model>
The default algorithm for -fsched-pressure is weighted but the new model algorithm can produce better results for some architectures (eg ARM). For full details of the new algorithm see: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html. Note - currently this option is undocumented.
* By default GCC will now produce DWARF version 4 debug information (rather than version 2) when it is producing DWARF debug output. The old behaviour can be restored by -gdwarf-2.
* G++ will now issue warnings when compiling for the 2011 ISO C++ standard when a string or character literal is followed by a user defined suffix which does not begin with an underscore. For example:
#define BAR "bar"
#define _PLUS_ONE + 1
char s[] = "foo"BAR; // Warning: "invalid suffix on literal"
char c = '3'_PLUS_ONE; // No warning
The reason that underscore prefixed suffixes are allowed is that they represent user-defined suffixes. Suffixes without an underscore are language specific suffixes (eg U or F) and these should not be found after strings or character constants. User defined suffixes that start with an underscore are actually in use already, for example in <inttypes.h>:
#include <inttypes.h>
printf ("64-bit value is: %" __PRI64_PREFIX "d\n", foo);
The warning can be disabled with -Wno-literal-suffix.
* A new version of the C++ ABI has been introduced - version 7. In this version nullptr_t is treated as a builtin type. The default ABI is still version 2 however.
* The H8300 backend has a new command line option: -mexr. This causes extended registers to be pushed onto the stack in monitor functions.
* The x86 and x86_64 backends have some new built-in functions which can be used to determine the type of CPU in use:
int __builtin_cpu_is (const char * cpuname)
int __builtin_cpu_supports (const char * feature)
The names currently recognised by __builtin_cpu_is () are as follows:
intel, atom, core2, corei7, nehalem, westmere, sandybridge, amd, amdfam10h, barcelona, shanghai, istanbul, amdfam15h, bdver1, bdver2
So for example:
if (__builtin_cpu_is ("corei7"))
do_corei7 ();
else
do_generic ();
Meanwhile __builtin_cpu_supports () recognises these strings:
cmov, mmx, popcnt, sse, sse2, sse3, ssse3, sse4.1, sse4.2, avx, avx2
Eg:
if (__builtin_cpu_supports ("popcnt"))
asm ("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc");
else
count = generic_countbits (n);
If these built-in functions are going to be used in an ifunc resolver then an init function has to be run first. For example:
static void (*resolve_memcpy (void)) (void)
{
/* ifunc resolvers fire before constructors, therefore we must explicitly call the init function. */
__builtin_cpu_init ();
if (__builtin_cpu_supports ("ssse3"))
return ssse3_memcpy;
return default_memcpy;
}
void * memcpy (void *, const void *, size_t) __attribute__ ((ifunc ("resolve_memcpy")));
Cheers
Nick
Hi Guys,
Well quite a lot has happened in the last month. First off, GCC 4.7 is now officially out. (So are GCC 4.4.7 and 4.6.3 if these versions matter to you).
Next the assembler has had a new feature added to support grouping together a sequence of instructions into a bundle:
.bundle_align_mode <abs-expr>
.bundle_align_mode enables or disables aligned instruction
bundle mode. In this mode, sequences of adjacent instructions
are grouped into fixed-sized bundles. If the argument is zero,
this mode is disabled (which is the default state). If the
argument is not zero, it gives the size of an instruction bundle
as a power of two.
For some targets, it's an ABI requirement that no instruction
may span a certain aligned boundary. A bundle is thus a
sequence of instructions that starts on an aligned boundary.
When bundle_align_mode is in effect, no single instruction may
span a boundary between bundles. If an instruction would start
too close to the end of a bundle to meet this requirement then
the space at the end of that bundle is filled with no-ops and
the instruction is placed at the start of the next bundle. As
a corollary, it's an error if any single instruction's encoding
is longer than the bundle size.
.bundle_lock
.bundle_unlock
For some targets, it's an ABI requirement that certain
instructions may appear only as part of specified permissible
sequences of multiple instructions, all within the same bundle.
A pair of .bundle_lock and .bundle_unlock directives define a
locked instruction sequence. For purposes of bundle_align_mode
a sequence starting with .bundle_lock and ending with
.bundle_unlock is treated as a single instruction.
In GCC land a new warning option has been added:
-Wuseless-cast
Warn when an expression is cast to its own type.
As well as a G++ option to limit the number of messages generated when issuing diagnostics about templates:
-ftemplate-backtrace-limit=<n>
Set the maximum number of template instantiation
notes for a single warning or error to <n>. The default
value is 10.
Also GCC has now started to add (experimental) support for the forthcoming ISO 2017 C++ standard:
-std=c++1y
Conform to the ISO 201y(7?) C++ draft standard.
-std=gnu++1y
Conform to the ISO 201y(7?) C++ draft standard with GNU extensions.
Finally some older ARM targets have now been marked as obsolete. Support for these targets will be entirely removed after the next release of GCC:
arm*-*-ecos-elf
arm*-*-elf
arm*-*-freebsd*
arm*-*-linux*
arm*-*-rtems*
arm*-*-uclinux*
arm*-wince-pe*
(Note before any ARM Linux people panic, the official target for ARM Linux toolchains is now arm-*-linux-gnueabi or arm-*-uclinux-gnueabi. These particular configurations have not been deprecated). Also the --with-fpu=fpa, --with-fpu=fpe2, --with-fpu=fpe3 and --with-fpu=maverick configuration options for ARM toolchains have now been deprecated as well.
Cheers
Nick
Hi Guys,
The GCC sources are currently frozen pending the creation of the 4.7 branch. One of the features of the 4.7 release, when it happens, will be a new (optional) C++ ABI. Controlled via the:
-fabi-version=<N>
g++ command line option, the 4.7 branch will support a new value for <N> of 6:
6: The version of the ABI that doesn't promote scoped
enums to int and changes the mangling of template
argument packs, const/static_cast, prefix ++ and --,
and a class scope function used as a template
argument.
The default value for <N> is still 2, which selects the ABI that was first introduced with G++ 3.4.
Meanwhile in binutils land the archiver program (ar) has seen a few changes recently. Most notably there is now a new configure option for the binutils package which changes the default behaviour of ar once it has been built:
--enable-deterministic-archives
This makes ar (and ranlib) default to enabling the -D command line option, which stops them from recording timestamps, user ids and group ids in the archives. This means that two archives built at different times or by different users, but containing the same elements, would then compare to be the same.
For backwards compatibility the --enable-deterministic-archives configure time option is not enabled by default. The --help run-time option will inform the user as to whether deterministic behaviour has been enabled by default, and the new -U command line option can be used to restore non-deterministic behaviour if necessary.
Other changes to ar include the fact that it can now correctly handle archives that are bigger than 2GB in size, including archives that contain individual members that are bigger than 2GB. In addition ar can now handle nested archives, ie archives which contain other archives.
Cheers
Nick
Hi Guys,
There is not very much to report this month. The GCC mainline sources are in stage 3 (bugfixes only, no new features), so nothing has happened there. The 4.7 branch will probably be created soon.
The Binutils and Newlib sources now have support for PowerPC targeted FreeBSD ports.
The Newlib sources are also preparing for their annual release snapshot, so various tidying up and bug fixing is going on there.
The GDB sources have a new 7.4 branch in preparation for a release at the start of next year. Prerelease snapshots are available at:
ftp://sourceware.org/pub/gdb/snapshots/branch/gdb.tar.bz2
That's all folks.
Cheers
Nick
Hi Guys,
There is a lot to report this month:
* GCC now has support for transactional memory. It is enabled via a new command line option: -fgnu-tm, and it is currently only supported on x86 architectures. (This may change in the future).
The support implements and tracks the Linux variant of Intel's Transactional Memory ABI specification document. Currently this is at revision 1.1, (May 6 2009). For more information see:
http://software.intel.com/en-us/articles/intel-c-stm-compiler-prototype-edition/
This is potentially a very important new feature. Transactional memory support can lead to faster and less buggy multi-threaded programs, and could especially help speed up operating system kernels.
* Two new ports have been contributed to the GCC and BINUTILS projects:
- The Adapteva Epiphany processor.
- The Renesas RL78 processor.
* The binutils 2.22 release will be happening next week. In the meantime development has continued on the mainline sources and in particular a change has been made to the ARM port of GAS:
Previously GAS would generate the (deprecated) R_ARM_PLT32 relocation for branches and function calls that use the PLT table. After 2.22 GAS will now generate either the R_ARM_CALL or R_ARM_JUMP24 relocations, if the target ARM architecture supports them.
* A couple of new options have been added to the CPP library which can be accessed from the gcc command line:
-fdebug-cpp This option is for debugging GCC and the preprocessor. It must be used with -E and it dumps debugging information about the location maps for each token in the -E output. The information dumped is as follows:
P:</file/path>;
F:</includer/path>;
L:<line_num>;
C:<col_num>;
S:<system_header?>;
M:<map_address>;
E:<macro_expansion?>,
loc:<location>
-ftrack-macro-expansion[=<level>] Tracks the locations of tokens across macro expansions. This allows the compiler to emit diagnostics about the current macro expansion stack when a compilation error occurs in a macro expansion.
Using this option makes the preprocessor and the compiler consume more memory. The <level> parameter can be used to choose the level of precision of token location tracking, thus decreasing the memory consumption if necessary. A level of 0 de-activates the option. A level of 1 tracks token locations in a degraded mode for the sake of minimal memory overhead. In this mode all tokens resulting from the expansion of an argument of a function-like macro have the same location. A level of 2 tracks token locations completely. This is the default.
* A new warning option has been added to GCC:
-Wzero-as-null-pointer-constant
This issues a warning when a literal '0' is used as a null pointer constant. This can be useful to facilitate the conversion to using nullptr in C++11.
Personally I am really glad to see this option as it really bugs me when 0 is used instead of NULL (or nullptr or whatever). It does work, but it rankles. Many years ago I worked on a compiler where the NULL pointer was not equal to 0 (and in fact address 0 contained valid memory that could be used by the program). That taught me to distinguish between 0 and NULL, and to see the two misused today just ticks me off.
* Several new target specific options have been added as well. Here are a few that might be of interest:
The ARM compiler now supports -mcpu=native, -mtune=native and -march=native. These work in a similar way to the normal -mcpu=, -mtune= and -march= options, except that the compiler will try to auto-detect the CPU of the build machine. At present, this feature is only supported on Linux, and not all architectures are recognised. If the auto-detect is unsuccessful the option has no effect.
The I386 compiler now supports "btver1" and "bdver1" as arguments to the -march= command line option. "btver1" is for AMD Family 14h cores and "bdver1" is for AMD 15h cores.
* The upcoming ISO C++0x standard has been renamed to ISO C++11, and so all of the g++ options that used to refer to c++0x now use c++11. (The old names are still supported for backwards compatibility, but will not be shown in --help output, etc).
The C++11 draft is still experimental, and may change in incompatible ways in future releases.
* Part of the support for the ISO C++11 standard includes support for its memory models. This has led to the creation of a couple of new command line options:
-Winvalid-memory-model This issues a warning when an atomic memory model parameter is known to be outside the valid range.
-finline-atomics Inline __atomic operations when a lock free instruction sequence is available.
Plus a new set of builtin functions with the __atomic prefix. These are similar to the __sync prefixed builtins which already exist in GCC, but they also take a memory model parameter.
For more information see:
http://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync (GCC wiki on atomic synchronization)
Cheers
Nick
Hi Guys,
Quite a lot of things have happened in the last month. Here are the highlights:
* Support has been added for the Tilera TILEPRO and TILE-Gx architectures to the binutils.
* Readelf can now decode Sparc hardware attributes.
* The binutils 2.22 branch has been created, so a new release should be out soon.
* GCC now supports vector comparison with the standard C comparison operators: ==, !=, <, <=, >, >=. Comparison operands can be vector expressions of integer-type or real-type. Comparison between integer-type vectors and real-type vectors is not supported. The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type.
Vectors are compared element-wise producing 0 when comparison is false and -1 (constant of the appropriate type where all bits are set) otherwise. Consider the following example:
typedef int v4si __attribute__ ((vector_size (16)));
v4si a = {1, 2, 3, 4};
v4si b = {3, 2, 1, 4};
v4si c;
c = a > b; /* The result would be {0, 0,-1, 0} */
c = a == b; /* The result would be {0,-1, 0,-1} */
* GCC now supports vector shuffling using two builtin functions:
__builtin_shuffle (vec, mask)
__builtin_shuffle (vec0, vec1, mask)
The functions construct an output vector built from selected elements from either one or two input vectors. The output vector is always of the same type as the input vector(s).
The mask is an integral vector with the same width and element count as the output vector. Each element in the mask specifies which element from the input vector(s) should be selected for the corresponding position in the output vector. Numbering starts at 0 and is computed modulo the length of the input vector(s). For example:
typedef int v4si __attribute__ ((vector_size (16)));
v4si a = {1, 2, 3, 4};
v4si b = {5, 6, 7, 8};
v4si mask1 = {0, 1, 1, 3};
v4si mask2 = {0, 4, 2, 8};
v4si res;
res = __builtin_shuffle (a, mask1); /* res is {1,2,2,4} */
res = __builtin_shuffle (a, b, mask2); /* res is {1,5,3,1} */
* GCC has a new, somewhat useless feature for the C language:
-fallow-parameterless-variadic-functions
This allows variadic functions without named parameters. Although it is possible to define such a function, it is not very useful as it is not possible to read the arguments. This is only supported for C as this construct is allowed by C++.
* A couple of new warnings have been added as well:
-Wunused-local-typedefs
Warns when a typedef locally defined in a function is not used.
-Wvector-operation-performance
Warns if a vector operation is not implemented via the SIMD capabilities of the architecture. Mainly useful for performance tuning.
* Four new optimizations have been added to GCC as well:
-fno-fat-lto-objects
Fat LTO objects are object files that contain both the intermediate language and the object code. This makes them usable for both LTO linking and normal linking, and is the default when -flto is used. -fno-fat-lto-objects improves compilation time over plain LTO by not storing the object code in the object files, but it requires the complete toolchain to be aware of LTO. This means that the linker must have plugin support as a minimum. Additionally, nm, ar and ranlib need to support linker plugins in order to allow a full-featured build environment (capable of building static libraries etc).
-foptimize-strlen
Enables string length optimizations. It attempts to track string lengths and optimize various standard C string functions like strlen(), strchr(), strcpy(), strcat(), stpcpy() into faster alternatives. This pass is enabled by default at -O2 and above, unless optimizing for size. This optimization can for example change:
char *
append_slash (const char * a)
{
  size_t l = strlen (a) + 2;
  char * p = malloc (l);
  if (p == NULL)
    return p;
  strcpy (p, a);
  strcat (p, "/");
  return p;
}
into:
char *
append_slash (const char * a)
{
  size_t tmp = strlen (a);
  char * p = malloc (tmp + 2);
  if (p == NULL)
    return p;
  memcpy (p, a, tmp);
  memcpy (p + tmp, "/", 2);
  return p;
}
The next optimization will be especially useful for improving the scores in synthetic benchmarks like Dhrystone or CoreMark:
-fshrink-wrap
This makes GCC emit function prologues only before the parts of the function that need them, rather than at the top of the function. This feature is enabled by default at -O and higher. For example in a function like this:
extern int bar (int *, int);
int
foo (int arg)
{
  if (arg)
    return arg * 2;
  else
    {
      int array[4] = {1,2,3,4};
      return bar (array, arg);
    }
}
A stack frame is only needed if arg is zero. Otherwise foo() can act just like a leaf function, and no stack space, function prologues or epilogues are needed.
Lastly there is:
-ftree-tail-merge
This looks for identical code sequences at the end of functions. When found it replaces one with a jump to the other. This optimization is enabled by default at -O2 and higher.
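To give a feel for the kind of code that tail merging is aimed at, here is a small sketch with two branches that end in the same sequence. The function and variable names are invented for illustration, and whether GCC actually merges the two tails will depend on the compiler version and the other options in use:

```c
/* Hypothetical counters, only here to make the two branches differ.  */
int a_calls;
int b_calls;

int
record (int kind, int *counter)
{
  if (kind == 0)
    {
      a_calls++;
      /* Tail, first copy.  */
      *counter += 1;
      return *counter * 2;
    }
  else
    {
      b_calls++;
      /* Tail, second copy: identical to the one above, so it is a
         candidate for being replaced with a jump to the first.  */
      *counter += 1;
      return *counter * 2;
    }
}
```

Comparing the assembler output of "gcc -O2 -S" with and without -fno-tree-tail-merge is one way to see whether the pass made a difference.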
* Some target specific GCC features have been added as well:
-mtune=generic-<arch> [For x86 targets]
This specifies that GCC should tune the performance for a blend of processors within architecture <arch>. The aim is to generate code that runs well on the current most popular processors, balancing between optimizations that benefit some CPUs in the range, and avoiding performance pitfalls of other CPUs.
-mpid [For the RX target]
This enables the generation of position independent data (but not code). When enabled any access to constant data will be done via an offset from a base address held in a register. This allows the location of constant data to be determined at run-time without requiring the executable to be relocated, which is a benefit to embedded applications with tight memory constraints. Data that can be modified is not affected by this option.
-munaligned-access [For ARM targets]
Enable unaligned word and halfword accesses to packed data. This is enabled by default for all ARMv6, ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors, and disabled for other ARM architectures.
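As a sketch of where -munaligned-access matters, consider a packed structure whose word-sized field is not naturally aligned. The structure and function names are made up for illustration. With unaligned access enabled the compiler is free to read the field with a single word load, whereas with -mno-unaligned-access it has to assemble the value from smaller loads:

```c
#include <stdint.h>

/* Packed, so VALUE sits at offset 1 rather than offset 4.  */
struct __attribute__ ((packed)) record
{
  uint8_t tag;
  uint32_t value;
};

/* Read the misaligned field.  The C code is the same either way,
   only the instructions chosen by the compiler differ.  */
uint32_t
read_value (const struct record *r)
{
  return r->value;
}
```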
Cheers
Nick
Hi Guys,
It is an early post this month, because I am off on vacation next week. Nothing of interest has happened with the GCC sources, but there have been a couple of things with the BINUTILS project:
* The linker now defaults to not copying DT_NEEDED entries from shared libraries mentioned on the command line. This matches the behaviour of other linkers and the latest Fedora and Ubuntu releases. The effect of this change is that if you are building a project you now have to specify all of the shared libraries that it needs on the linker command line. You cannot rely upon some shared libraries pulling in others for you. In general this is a good thing. It means that you have to be clear about your project's dependencies and there are no more hidden requirements for other shared libraries.
The old behaviour can be restored with the --copy-dt-needed-entries command line option.
* It was discovered that the Binutils and GDB releases for the last few years have been in violation of the GPL. This was because they did not supply all of the sources necessary to rebuild themselves. In particular some of the cpu input files to the cgen tool were missing, so the FR30, IP2K, MEP, OPENRISC and XSTORMY16 files in the opcodes directory could not be rebuilt.
RMS is working on a press release which will grant a special exception to the GPL to anyone who is using one of the affected tarballs. In the meantime new tarballs have been uploaded to the FSF FTP repository with the missing sources added. The new tarballs have an 'a' suffix to their name, but otherwise behave in exactly the same way as the tarballs they replace. So for example the latest 2.21 binutils release tarball is now:
binutils-2.21.1a.tar.bz2
Cheers
Nick
Hi Guys,
Here are the highlights of this month's merge:
* A couple of values for the OS/ABI field in ELF file headers have
been changed. ELFOSABI_LINUX has been renamed to ELFOSABI_GNU,
and ELFOSABI_HURD has been dropped completely.
* The linker now supports a "--print-output-format" command line
option to print the name of the default output format (perhaps
influenced by other command-line options).
* Support for the Texas Instruments C6X series of processors has
been contributed to GCC.
* A new command line option "-grecord-gcc-switches" has been added
to gcc. This option appends the command line that was used to
invoke gcc to the DW_AT_producer attribute in the DWARF debugging
information. This option is similar to "-frecord-gcc-switches"
except that it records the information in a debug section rather
than a note section.
* A new C++ warning option "-Wno-narrowing" has been added that
suppresses the diagnostic required by the standard for narrowing
conversions within {}, e.g.
int i = { 2.2 }; // error: narrowing from double to int
This can be useful for compiling valid C++98 code in C++0x mode.
* A new builtin function has been added:
void * __builtin_assume_aligned (const void * exp, size_t align)
This just returns its first argument, but it allows the compiler
to assume that the returned pointer is aligned to at least
"align" bytes. This can allow optimizations based on loading and
storing data through the pointer a word at a time.
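Here is a minimal sketch of how the builtin might be used. The function name sum_aligned and the choice of 16 bytes are my own assumptions for illustration. Note that it is the caller's job to make sure the promise is true:

```c
#include <stddef.h>

/* Sum N floats.  The caller promises that DATA is aligned to at
   least 16 bytes.  If that is not true the behaviour is undefined.  */
float
sum_aligned (const float *data, size_t n)
{
  /* P has the same value as DATA, but the compiler may now assume
     16-byte alignment when optimizing accesses made through it.  */
  const float *p = __builtin_assume_aligned (data, 16);
  float total = 0.0f;
  size_t i;

  for (i = 0; i < n; i++)
    total += p[i];
  return total;
}
```

A suitable buffer can be declared with __attribute__ ((aligned (16))).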
Cheers
Nick
Hi Everyone,
There are several new developments to report this month:
* Support has been added to the Binutils for the Tilera TileGx and TilePRO processors: http://www.tilera.com/products/processors
* The linker can now generate stack unwinding information for internally generated sections, such as PLT sections.
* G++ has a new warning option: -Wdelete-non-virtual-dtor. When enabled G++ will issue a warning if delete is used to destroy an instance of a class which has virtual functions and a non-virtual destructor. It is unsafe to delete an instance of a derived class through a pointer to a base class if the base class does not have a virtual destructor. This warning is enabled by default with -Wall.
* GCC has a new warning option: -Wstack-usage=<len>. Using this option makes GCC issue a warning if the stack usage of a function might be larger than <len> bytes. This includes the space allocated by invocations of alloca() as well as any variable length arrays or such like. The warning has to be specifically enabled.
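As a sketch, here are two functions that could be used to try out -Wstack-usage, for example with "gcc -Wstack-usage=256 -c". The function names and the sizes are invented for illustration. The first has a fixed stack footprint above 256 bytes, whilst the second uses a variable length array, so its usage is not known at compile time:

```c
#include <stddef.h>
#include <string.h>

/* Copy at most LIMIT characters of S into BUF and terminate it
   (hypothetical helper).  BUF must have room for LIMIT + 1 bytes.  */
static void
copy_bounded (char *buf, const char *s, size_t limit)
{
  size_t i = 0;

  while (i < limit && s[i] != '\0')
    {
      buf[i] = s[i];
      i++;
    }
  buf[i] = '\0';
}

/* Fixed 1024 byte buffer: the stack usage is known statically.  */
size_t
copy_fixed (const char *s)
{
  char buf[1024];

  copy_bounded (buf, s, sizeof buf - 1);
  return strlen (buf);
}

/* Variable length array: the stack usage depends on N.  */
size_t
copy_vla (const char *s, size_t n)
{
  char buf[n + 1];

  copy_bounded (buf, s, n);
  return strlen (buf);
}
```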
* GCC now has a set of options to enable or disable individual optimization passes:
-fdisable-ipa-<pass>
-fdisable-rtl-<pass>
-fdisable-rtl-<pass>=<names>
-fdisable-tree-<pass>
-fdisable-tree-<pass>=<names>
-fenable-ipa-<pass>
-fenable-rtl-<pass>
-fenable-rtl-<pass>=<names>
-fenable-tree-<pass>
-fenable-tree-<pass>=<names>
<pass> is the name of an optimization pass. (The pass names can be obtained via the -fdump-passes option, see below.) If <names> is specified then the pass is enabled or disabled only for those functions whose names appear in the comma-separated list provided.
Here are a couple of examples:
-fdisable-tree-ccp1
Disables the ccp1 pass for all functions.
-fenable-tree-cunroll=foo,bar
Enables the cunroll pass for the functions foo and bar.
* GCC also has a new dump option: -fdump-passes which dumps a list of the optimization passes that are turned on and off by the current selection of command line options.
* The ARM backend to GCC now has a new command line option:
-mtls-dialect=<dialect>
This specifies the dialect to use for accessing thread local storage. Two dialects are supported: "gnu" and "gnu2". The gnu dialect selects the original GNU scheme for supporting local and global dynamic TLS models. The gnu2 dialect selects the GNU descriptor scheme, which provides better performance for shared libraries. The GNU descriptor scheme is compatible with the original scheme, but does require new assembler, linker and library support. Initial and local exec TLS models are unaffected by this option and always use the original scheme.
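For reference, here is a minimal sketch of the sort of code whose accesses are affected. The variable and function names are made up for illustration. The dialect only comes into play when the dynamic TLS models are in use, which is typically when code like this is compiled as part of a shared library:

```c
/* One copy of CALLS exists per thread.  __thread is the GNU C
   keyword for thread local storage.  */
__thread int calls;

/* Each access to CALLS goes through whichever TLS access sequence
   the compiler has selected.  */
int
count_call (void)
{
  return ++calls;
}
```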
That's all for this month.
Cheers
Nick