Reducing C++ compilation time in Magnum: code optimizations
Large C++ projects often suffer from very long full and incremental compilation times, which severely hurts productivity. This series gives an overview of some techniques employed in Magnum to shorten iteration times; this article focuses on code-based optimizations.
To put things into perspective, the Magnum graphics engine has around 100k lines of templated C++ code, documentation and comments. Currently the unoptimized build with Clang, CMake and Ninja takes 2 minutes and 59 seconds with tests enabled; without tests it’s just 76 seconds. Times mentioned in the article were measured at various stages of development, thus they may not always reflect the current state.
The preprocessor issue
The C preprocessor is the predecessor of all module systems and it’s showing its age. Simply put, instead of providing the compiler with only the essential module information, it just concatenates all the required code into one big file and passes it to the compiler. This isn’t much of an issue with C-based projects where the headers are small, but with C++ templates we need to put much more code into the headers. When you include the right STL headers and proper OpenGL headers, the preprocessed source can have well over 100k lines, which then takes a significant amount of time to parse. Usually it doesn’t matter whether the code is spread over one or one hundred files, as any sane system should already have all the headers in disk cache.
To solve this and many other preprocessor issues, Clang developers are working on a module system, but it is not usable for C++ yet.
Discovering problematic includes
If you are using CMake with the Makefile generator for your project, you can use it to generate just the preprocessed file and examine the preprocessed line count for each source file: just append .i to the name of the source file. Then you can try removing some #includes to bisect the big ones.
    [build/src/Magnum/GL]$ make Framebuffer.cpp.i
    Preprocessing CXX source to CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i
    [build/src/Magnum/GL]$ wc -l CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i
    35695 CMakeFiles/MagnumGLObjects.dir/Framebuffer.cpp.i
Reducing includes in headers
The first and obvious trick is to remove #includes which are not needed anymore. This is a boring and time-consuming task if done by hand, but it helps a lot without even touching the actual code. There are also some semi-automated tools for this; the simplest and dumbest brute-force method is removing #includes for as long as the code still compiles.
If the header doesn’t need the full definition of a given type (for example because it only refers to it through a pointer or reference), you can use forward declarations and move the #includes from the header to the implementation file. Having the big #include in one *.cpp file as opposed to having it in a *.h which is included in 150 other files helps a lot. If a given type is used only in some non-templated function, you can move the function definition into the source file. The only problem remains when the type is used as a class member or in a templated function.
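To illustrate the header/source split described above, here is a minimal single-file sketch. The three "files" are marked only by comments, and the Scene class and all member names are hypothetical, not Magnum API. It merely shows that a declaration taking a reference compiles against an incomplete type, while the function body needs the full definition.

```cpp
#include <cassert>

// --- Scene.h (hypothetical header) ---
// A forward declaration replaces #include "Timeline.h"
class Timeline;

class Scene {
    public:
        // A reference to an incomplete type is fine in a declaration
        float advance(const Timeline& timeline);

    private:
        float _elapsed = 0.0f;
};

// --- Timeline.h (hypothetical), now included only from Scene.cpp ---
class Timeline {
    public:
        explicit Timeline(float frameDuration): _frameDuration{frameDuration} {
            assert(frameDuration >= 0.0f);
        }

        float frameDuration() const { return _frameDuration; }

    private:
        float _frameDuration;
};

// --- Scene.cpp: the only place that needs the full Timeline definition ---
float Scene::advance(const Timeline& timeline) {
    _elapsed += timeline.frameDuration();
    return _elapsed;
}
```

In a real project every file that includes Scene.h would then stop paying for Timeline.h and whatever it pulls in.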
Forward-declaring non-templated classes and structs is trivial (and very common, for example with Qt). It gets more complicated when namespaces and typedefs are involved; with template classes you need to repeat the whole template list and it can quickly get out of hand:
    class Timeline; // easy

    namespace Math {
        template<std::size_t, class> class Matrix;
    }
    typedef float Float;
    typedef Math::Matrix<3, Float> Matrix3x3; // ehh...
For user convenience Magnum has forward declaration headers, which are available for each namespace, so the users can just include this tiny header and don’t need to write forward declarations on their own:
    // forward-declares both Timeline and Matrix3x3
    #include <Magnum/Magnum.h>
The problem arises when you want to forward-declare a class with default template arguments. As with default arguments in functions, C++ doesn’t allow you to repeat the default argument when defining the type. As we already have a forward declaration header, we can put the default arguments in that header and omit them in the actual definition. The definition still has to see the defaults, so the forward declaration header must be included in the type definition header.
    // SceneGraph.h
    template<UnsignedInt, class T, class TranslationType = T> class TranslationTransformation;

    // TranslationTransformation.h
    #include "SceneGraph.h"

    template<UnsignedInt dimensions, class T, class TranslationType> class TranslationTransformation {
        // ...
    };
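The same pattern can be checked in a single file. The sketch below is a simplified stand-in, not the real Magnum class: it uses plain unsigned instead of UnsignedInt, and the Translation typedef and dimensionCount() are hypothetical members added only so the default argument can be observed.

```cpp
#include <type_traits>

// --- Forward declaration header (hypothetical Fwd.h) ---
// The default template argument is written here and nowhere else
template<unsigned dimensions, class T, class TranslationType = T>
class TranslationTransformation;

// --- Definition header: includes Fwd.h, so the default is already known ---
template<unsigned dimensions, class T, class TranslationType>
class TranslationTransformation {
    public:
        typedef TranslationType Translation;

        static constexpr unsigned dimensionCount() { return dimensions; }
};

// The default from the forward declaration applies to the full definition
static_assert(std::is_same<
    TranslationTransformation<3, float>::Translation, float>::value,
    "TranslationType defaults to T");
```

Repeating `= T` in the definition as well would be rejected by the compiler as a redefinition of the default argument.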
With C++11 it is also possible to forward-declare typed enums (scoped enums or enums with an explicitly specified underlying type). In Magnum some enums are used in many places (MeshPrimitive, GL::BufferUsage …) and some are very large (PixelFormat, GL::TextureFormat, …), and the enum values often depend on OpenGL headers which are also big. The compiler doesn’t care about particular named values and needs to know only the type, thus you can pass the value around without having the full definition of the enum available:
    // forward-declares PixelFormat enum
    #include <Magnum/Magnum.h>

    // Don't need the header here
    PixelFormat format = image.format();

    // Need it here
    #include <Magnum/PixelFormat.h>

    format = PixelFormat::RGBA8Unorm;
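A self-contained sketch of the same idea follows. The enum name matches the text, but the underlying type, the numeric values, the Image class and pixelSize() are assumptions made up for illustration, not Magnum’s actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// --- Forward declaration header (hypothetical) ---
// An opaque declaration: the underlying type fixes the size
enum class PixelFormat: std::uint32_t;

// --- Image.h (hypothetical): stores and returns the enum, names no value ---
class Image {
    public:
        explicit Image(PixelFormat format): _format{format} {}

        PixelFormat format() const { return _format; }

    private:
        PixelFormat _format;
};

// --- PixelFormat.h: full definition with placeholder values ---
enum class PixelFormat: std::uint32_t {
    R8Unorm = 1,
    RGBA8Unorm = 2
};

// --- A source file that names values, so it includes PixelFormat.h ---
unsigned pixelSize(PixelFormat format) {
    switch(format) {
        case PixelFormat::R8Unorm: return 1;
        case PixelFormat::RGBA8Unorm: return 4;
    }

    assert(!"unknown pixel format");
    return 0;
}
```

Only code like pixelSize() needs the header with the values; Image compiles against the one-line opaque declaration.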
Note that in C++ it is not possible to forward-declare class members. To reduce header dependencies I had to extract some widely-used enums from their classes (thus GL::Buffer::Usage is now GL::BufferUsage etc.), but the change resulted in improved compilation times of code where the enum forward declaration is enough.
STL includes
The Standard C++ library is a problem on its own. It is notorious for its heavy headers, which got even bigger with C++11. The STL types are heavily templated with default template arguments and implementation-defined template parameters, thus, as said above, it’s impossible to work around the issue and create our own forward declaration headers.
The table below lists the preprocessed line count of the largest STL headers. It was generated from GCC’s libstdc++ 4.8.2 and Clang’s libc++ 3.3 with the following command; headers which didn’t exceed 25k lines were omitted. In comparison, the whole <cmath> has just below 3k lines and <vector> is merely 11k lines in C++11 libstdc++.
echo "#include <iostream>" | g++ -std=c++11 -E -x c++ - | wc -l
Header | C++03 libstdc++ | C++11 libstdc++ | C++11 libc++ |
---|---|---|---|
<forward_list> | | 25927 | 18095 |
<queue> | 8749 | 13830 | 26309 |
<algorithm> | 9801 | 46279 | 16645 |
<complex> | 21160 | 28312 | 44507 |
<valarray> | 14671 | 49630 | 24949 |
<random> | | 36180 | 51187 |
<ios> | 15442 | 21561 | 29202 |
<*stream> | ~18000 | ~24000 | ~41000 |
<iomanip> | 11504 | 24296 | 40545 |
<streambuf> | 11839 | 17946 | 29652 |
<locale> | 17913 | 24027 | 35188 |
<codecvt> | | n/a | 28922 |
<regex> | | 70409 | 41601 |
<thread> | | 27436 | 17155 |
<future> | | 32254 | 19618 |
Note how the line count varies wildly between GCC’s libstdc++ and Clang’s libc++. The numbers aren’t exactly absolute, as many headers share common code, but the main offenders are the various <*stream> headers and various algorithm headers. Fortunately for <*stream> there is a forward-declaration header <iosfwd> which was created in some older revision of C++ for exactly this purpose, because the stream implementation was far bigger than the other headers. The situation changed with C++11, but sadly no more forward-declaration headers were added. The various container classes are around 10-20k lines and thus can be used as class members without much impact on compilation time, but the other #includes shouldn’t appear in headers at all.
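As a small sketch of the <iosfwd> approach, the listing below declares a stream output operator using only the forward-declaration header and pulls in the full stream headers in the parts that stand in for source files. Vector2, toString() and the file names in the comments are hypothetical.

```cpp
// --- Vector2.h (hypothetical header): streams are only declared ---
#include <iosfwd>

struct Vector2 {
    float x, y;
};

// std::ostream is incomplete here, which is enough for a reference
std::ostream& operator<<(std::ostream& out, const Vector2& value);

// --- Vector2.cpp: the full stream header is parsed once ---
#include <ostream>

std::ostream& operator<<(std::ostream& out, const Vector2& value) {
    return out << "Vector(" << value.x << ", " << value.y << ")";
}

// --- A user that really prints includes a full stream header itself ---
#include <cassert>
#include <sstream>
#include <string>

std::string toString(const Vector2& value) {
    std::ostringstream out;
    out << value;
    assert(out.good());
    return out.str();
}
```

Files that include Vector2.h but never print anything get only the forward declarations from <iosfwd>.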
Removing all usage of <algorithm> from Magnum header files resulted in a significant compile time reduction (4:30 before, 4:10 after); removing stream usage or replacing all <*stream> occurrences with <iosfwd> saved another 20 seconds.
Another solution is not to use STL at all and implement everything from scratch. It’s then possible to achieve very impressive compilation times, but the resources required to implement the equivalent of C++11 STL functionality are just too large.
Other heavy includes
The Boost library is also known for its header size, but it is not used in Magnum (and C++11 incorporates many useful things from this library so the need for it is even smaller). The other heavy includes are the OpenGL headers. Originally Magnum used GLEW for OpenGL extension handling, but the GLEW headers have about 18k lines and contain many functions the engine will never use. Recently I switched to glLoadGen, which generates a header with only the requested functions. The generated header has about 3k lines (which is roughly the size of the official gl.h) and compilation time was reduced from 5:00 to 4:45.
Reducing includes needed for class members
If you have some value type as a class member, you need to #include its definition, so the compiler knows its size and can generate a proper constructor, assignment operator and destructor. You can circumvent this by making the member a reference or pointer and then explicitly defining the constructor and other functions in the source file. The D-Pointer approach, which is very heavily used in Qt, is another solution for this and many other issues; however, the additional heap allocation and indirection have performance implications and thus it is not used in Magnum.
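For illustration, here is a minimal sketch of the pointer-member variant. It is not code from Magnum (which, as said above, avoids the extra allocation); Renderer, ShaderCache and all members are hypothetical names. The constructor and destructor are defined out of line, where the member type is complete.

```cpp
#include <cassert>
#include <memory>

// --- Renderer.h (hypothetical header) ---
class ShaderCache; // forward declaration, no #include "ShaderCache.h"

class Renderer {
    public:
        Renderer();
        ~Renderer(); // defined in the source file, where ShaderCache is complete

        // Returns how many shaders were compiled so far
        int compile(int shaderId);

    private:
        // A pointer has a known size even for an incomplete type
        std::unique_ptr<ShaderCache> _cache;
};

// --- ShaderCache.h (hypothetical), included only from Renderer.cpp ---
class ShaderCache {
    public:
        int compiledCount = 0;
        int lastShaderId = -1;
};

// --- Renderer.cpp ---
Renderer::Renderer(): _cache{new ShaderCache} {}

Renderer::~Renderer() = default;

int Renderer::compile(int shaderId) {
    assert(shaderId >= 0);
    _cache->lastShaderId = shaderId;
    return ++_cache->compiledCount;
}
```

Leaving the destructor implicit in the header would not compile for users of Renderer.h, because std::unique_ptr needs the complete type at the point where it deletes the object.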
Reducing templated code in headers
If the templated code is used for a limited set of types (e.g. only floats and doubles), you can move the definition into a source file and explicitly instantiate the template for each type. This approach is used in Magnum’s scene graph. Additionally, Magnum provides a special template implementation header for each class, which contains the definitions of templated functions. If the users want to use the template for e.g. ints (which isn’t provided by default), they can include this header in some source file and do the explicit instantiation themselves:
    // instantiation.cpp
    #include "SceneGraph/AbstractObject.hpp"

    template class SceneGraph::AbstractBasicObject2D<Int>;
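The library side of this split might look roughly like the following single-file sketch. Interpolator and its three "files" are hypothetical and only mirror the header / template implementation header / source file layout described above.

```cpp
#include <cassert>

// --- Interpolator.h (hypothetical): declarations only, cheap to include ---
template<class T> class Interpolator {
    public:
        T lerp(T a, T b, T t) const;
};

// --- Interpolator.hpp (hypothetical template implementation header) ---
template<class T> T Interpolator<T>::lerp(T a, T b, T t) const {
    assert(t >= T(0) && t <= T(1));
    return a + (b - a)*t;
}

// --- Interpolator.cpp: includes the .hpp and instantiates the supported types ---
template class Interpolator<float>;
template class Interpolator<double>;
```

Code that includes only Interpolator.h can call lerp() for float and double and relies on the linker to find the instantiations; any other type needs the .hpp and an explicit instantiation of its own.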
Balancing size and count of compilation units
For headers it’s often good to split them into smaller ones with fewer dependencies, but for source files it’s better to combine several of them into one, as the compiler then needs to preprocess the included headers only once instead of multiple times. Be aware that this is a double-edged sword and it will hurt iteration times: recompiling the whole huge file after a small change takes much longer than rebuilding only a small one. Also the compile time reduction is not as significant as when optimizing a widely-used header file. Magnum uses this approach for template instantiation files; the merging resulted in a 5 seconds shorter build time.
Reducing amount of generated code
The C++11 extern template declaration tells the compiler that the given instantiation is already compiled in some library, so it can skip compiling and optimizing that code and leave resolving it to the linker.
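A minimal sketch of how this might look, with a hypothetical Average template: the header carries the extern template declaration and exactly one source file carries the matching explicit instantiation definition. Both are shown in one file here, which is allowed as long as the definition follows the declaration.

```cpp
#include <cassert>

// --- Average.h (hypothetical header): template defined inline ---
template<class T> struct Average {
    T total{};
    int count = 0;

    void add(T value) {
        total += value;
        ++count;
    }

    T mean() const {
        assert(count > 0);
        return total/T(count);
    }
};

// Explicit instantiation declaration: asks including code not to
// instantiate Average<float> on its own
extern template struct Average<float>;

// --- Average.cpp (hypothetical): the single explicit instantiation definition ---
template struct Average<float>;
```

Other types such as Average<double> are still instantiated implicitly wherever they are used.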
Reducing the number of exported symbols helps the linker (and also the dynamic linker at runtime), as it doesn’t have to process a huge symbol table containing stuff that isn’t used outside the library. See GCC’s documentation about visibility.
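One common way to control this is a pair of export macros, sketched below. MYLIB_EXPORT, MYLIB_LOCAL and both functions are made-up names, not Magnum’s macros. With GCC or Clang the library would typically also be built with -fvisibility=hidden, so that only explicitly marked symbols are exported.

```cpp
#include <cassert>

// Hypothetical export macros; MYLIB_EXPORT and MYLIB_LOCAL are made-up names
#if defined(_WIN32)
    #define MYLIB_EXPORT __declspec(dllexport)
    #define MYLIB_LOCAL
#elif defined(__GNUC__)
    #define MYLIB_EXPORT __attribute__((visibility("default")))
    #define MYLIB_LOCAL __attribute__((visibility("hidden")))
#else
    #define MYLIB_EXPORT
    #define MYLIB_LOCAL
#endif

// Implementation detail, kept out of the dynamic symbol table
MYLIB_LOCAL int clampToByte(int value) {
    return value < 0 ? 0 : (value > 255 ? 255 : value);
}

// Public API, exported from the shared library
MYLIB_EXPORT int packGray(int value) {
    int gray = clampToByte(value);
    assert(gray >= 0 && gray <= 255);
    return (gray << 16) | (gray << 8) | gray;
}
```

A real library would additionally switch the Windows branch to __declspec(dllimport) when the header is consumed by users of the library; that part is omitted here.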
The next part will be about optimizing the build system.