Lightweight but still STL-compatible unique pointer

Mag­num got a new unique point­er im­ple­ment­a­tion that’s much more light­weight with bet­ter de­bug per­form­ance and com­pile times, but is still fully com­pat­ible with std::unique_ptr.

Magnum is currently undergoing an optimization pass for shorter compilation times and smaller binary sizes, further improving on what was done back in 2013. Back then I managed to reduce the number of template instantiations and remove the then-heaviest #includes such as <iostream> or <algorithm> from headers, while also banning several others such as <regex> or <random> from ever leaking there. With 2013 hardware and GCC 4.8, that brought compile times down from 5:00 to 2:59, which was already a significant improvement.

Nowadays Magnum compiles with all tests in 80 seconds. The codebase got considerably bigger during the past five years, but Moore’s law is also still in effect, so one could say that the best solution for improving compile times is to just wait a bit. (Similarly, the best way to cure insomnia is to get more sleep.) But, especially after seeing what’s possible with plain C, I’m not completely happy with the current state and I think I could do better.

~ ~ ~

The prob­lem with re­mov­ing the most sig­ni­fic­ant causes of slow­down is that some­thing else steps up to be­come the most prob­lem­at­ic. Now things like <vector> or <string> are among the top of­fend­ers and, apart from re­pla­cing std::vec­tor with Mag­num’s light­weight Con­tain­ers::Ar­ray where pos­sible, the most ef­fi­cient cure is to PIMPL the class in­tern­als to re­move them from class defin­i­tions. That works for al­most everything. Ex­cept for std::unique_ptr, be­cause that one is of­ten used to wrap the PIMPL it­self since you def­in­itely do not want to im­ple­ment copy/move con­struct­ors for each PIMPL’d class in­stead.
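As a rough illustration of the pattern (a minimal single-file sketch, not code from Magnum; Mesh and its members are hypothetical names), a PIMPL'd class wrapped in std::unique_ptr could look like the following. In a real project the first part would live in the header and the second in the source file, so <vector> and <string> never reach users of the class, while <memory> still does:

```cpp
#include <cassert>
#include <memory>   // the one include that can't be hidden this way

/* --- what would go into a (hypothetical) Mesh.h --- */
class Mesh {
    public:
        explicit Mesh(const char* name);

        /* Declared here, defined where State is a complete type */
        ~Mesh();
        Mesh(Mesh&&) noexcept;
        Mesh& operator=(Mesh&&) noexcept;

        void addVertex(float position);
        unsigned vertexCount() const;

    private:
        struct State;
        std::unique_ptr<State> _state;
};

/* --- what would go into a (hypothetical) Mesh.cpp --- */
#include <string>
#include <vector>

struct Mesh::State {
    std::string name;
    std::vector<float> vertices;
};

Mesh::Mesh(const char* name): _state{new State{name, {}}} {}

/* The smart pointer does all the work, so everything can be defaulted */
Mesh::~Mesh() = default;
Mesh::Mesh(Mesh&&) noexcept = default;
Mesh& Mesh::operator=(Mesh&&) noexcept = default;

void Mesh::addVertex(float position) {
    assert(_state && "Mesh used after being moved from");
    _state->vertices.push_back(position);
}

unsigned Mesh::vertexCount() const {
    assert(_state && "Mesh used after being moved from");
    return unsigned(_state->vertices.size());
}
```

The special member functions are declared in the class and defaulted next to the State definition because std::unique_ptr needs the complete type to delete it.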

To my great sur­prise, the <memory> head­er is quite a beast, twice as big as <vector> (which, well, has to handle all the com­plex move-aware real­loc­a­tions) and it only gets worse with new­er C++ stand­ards. It’s ac­tu­ally even slightly big­ger than <iostream> which I banned for this very reas­on!

Below is a graph of preprocessed line count for each header, generated using the following command with GCC 8.2. Note the use of -P, which removes unnecessary #line statements from the preprocessor output, so the resulting line count corresponds more closely to the amount of actual code. The last line in the plot, for comparison, is using Clang 7.0 with libc++. While preprocessed line count is not the only factor affecting compile times, it correlates with them pretty well, especially in template-heavy C++ code.

echo "#include <memory>" | gcc -std=c++11 -P -E -x c++ - | wc -l

[Figure: Preprocessed line count. <vector>: 8608 lines; <vector> + <string>: 14652 lines; <iostream>: 17839 lines; <memory>: 17863 lines; <memory> (libstdc++, C++17): 20995 lines; <memory> (libc++, C++2a): 16736 lines.]

Let’s step back a bit and try again

Im­pos­ing the bur­den of 17k lines on every user of the class would ab­so­lutely des­troy any be­ne­fits of PIMPLing away the <vector> and <string> in­cludes, as the <memory> head­er alone is big­ger than those two com­bined. The crazy part is that it’s just a move-only wrap­per over a point­er.

The new Con­tain­ers::Point­er is also just that, but in a reas­on­ably-sized pack­age. Un­like std::unique_ptr it doesn’t sup­port ar­rays (Mag­num has Con­tain­ers::Ar­ray for that) and at the mo­ment it doesn’t have cus­tom de­leters, as there was no im­me­di­ate need for this fea­ture. On the oth­er hand, it provides an equi­val­ent to std::make_unique() without for­cing you to use C++14. It’s named just Pointer, be­cause I already have an Array and I don’t ever plan on im­ple­ment­ing an al­tern­at­ive to std::shared_ptr, be­cause, in my opin­ion, the only pur­pose of that type is mak­ing cod­ing crimes easi­er to com­mit.
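To make the “just a move-only wrapper over a pointer” claim concrete, here is a minimal sketch of such a type. It is not the actual Containers::Pointer source: the names TinyPointer and tinyPointer() are made up for this illustration and the real class may well differ in details. The sketch assumes only <type_traits> for the array check, <utility> for std::forward() and <cassert> for the null checks:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

/* Hypothetical name; a sketch of the idea, not Corrade's implementation */
template<class T> class TinyPointer {
    static_assert(!std::is_array<T>::value, "arrays are not supported");

    public:
        /*implicit*/ TinyPointer(std::nullptr_t = nullptr) noexcept: _p{} {}
        explicit TinyPointer(T* p) noexcept: _p{p} {}

        /* Move-only: copying is disabled */
        TinyPointer(const TinyPointer&) = delete;
        TinyPointer& operator=(const TinyPointer&) = delete;

        TinyPointer(TinyPointer&& other) noexcept: _p{other._p} {
            other._p = nullptr;
        }
        TinyPointer& operator=(TinyPointer&& other) noexcept {
            /* The previous pointee gets deleted by other's destructor */
            std::swap(_p, other._p);
            return *this;
        }

        ~TinyPointer() { delete _p; }

        explicit operator bool() const { return _p != nullptr; }
        T& operator*() const { assert(_p); return *_p; }
        T* operator->() const { assert(_p); return _p; }
        T* get() const { return _p; }

        /* Gives up ownership, the caller is responsible for deletion */
        T* release() { T* p = _p; _p = nullptr; return p; }

    private:
        T* _p;
};

/* A std::make_unique() lookalike that works in C++11 */
template<class T, class ...Args> TinyPointer<T> tinyPointer(Args&&... args) {
    return TinyPointer<T>{new T{std::forward<Args>(args)...}};
}
```

There are no custom deleters and no array specialization here, which is a large part of why something like this can stay small.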

Let’s look at it again:

[Figure: Preprocessed line count. <Containers/Pointer.h>, C++11: 2311 lines; <Containers/Pointer.h>, C++2a: 2769 lines; <memory>, C++11: 17863 lines; <memory>, C++2a: 21014 lines.]

It could be smaller, but I needed <type_traits> to do some convenience compile-time checks (one of them is forbidding its use on T[]). And for in-place construction using Containers::pointer(), I needed std::forward() from <utility>. I could have used static_cast instead and saved myself ~700 lines of code, but the header is so essential that you’ll be including it yourself sooner or later anyway.
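The static_cast alternative works because std::forward<T>(value) is specified to return static_cast<T&&>(value). A small sketch of perfect forwarding without any #include at all; category() and relay() are hypothetical names used only for this illustration:

```cpp
/* No #include needed: static_cast<T&&> is what std::forward<T>() returns */

constexpr int category(int&) { return 1; }    // picked for lvalues
constexpr int category(int&&) { return 2; }   // picked for rvalues

template<class T> constexpr int relay(T&& value) {
    /* T is int& for an lvalue argument, so T&& collapses to int&.
       T is int for an rvalue argument, so T&& is int&&. */
    return category(static_cast<T&&>(value));
}

int lvalue = 0;
static_assert(relay(lvalue) == 1, "lvalues are passed on as lvalues");
static_assert(relay(42) == 2, "rvalues are passed on as rvalues");
```

The cast is less readable than std::forward() and easier to get wrong, which is a fair reason to prefer the <utility> version when the header is going to be included anyway.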

Com­pile times and de­bug per­form­ance

For a “microbenchmark” of compile times, I created the following two code snippets and compiled each with GCC 8.2. For a better sense of scale, there’s also a baseline time, which is from compiling just int main() {} with no #include at all.

#include <Corrade/Containers/Pointer.h>

using namespace Corrade;

int main() {
  Containers::Pointer<int> a{new int{}};
  return *a;
}

#include <memory>

int main() {
  std::unique_ptr<int> a{new int{}};
  return *a;
}

By default, Containers::Pointer has a convenience printer for Utility::Debug and also provides human-readable assertions using the same utility. To make the comparison more balanced, I opted-out of debug printing and switched to standard C assert() by defining CORRADE_NO_DEBUG and CORRADE_STANDARD_ASSERT on the compiler command line. The resulting times are below:

g++ main.cpp -DCORRADE_NO_DEBUG -DCORRADE_STANDARD_ASSERT -std=c++11 # or c++2a

[Figure: Compilation time, GCC 8.2. Baseline (int main() {}): 49.97 ± 0.54 ms; <Containers/Pointer.h>, C++11: 69.74 ± 3.04 ms; <Containers/Pointer.h>, C++2a: 71.41 ± 0.84 ms; <memory>, C++11: 205.19 ± 1.05 ms; <memory>, C++2a: 249.01 ± 4.72 ms.]

Regarding debug performance, checking on Compiler Explorer, std::unique_ptr resulted in roughly four times as many instructions as Containers::Pointer in a non-optimized build on both Clang and GCC. GCC with -O1 and higher was able to reduce the above snippet to a pair of new and delete, Clang with -O1 shortened the code to roughly half for both (but still with a 3x difference), and Clang -O2 and up managed to get rid of the allocation altogether in both cases, which is nice.

What if my lib­rary already uses std::unique_ptr?

Mag­num will be gradu­ally switch­ing to the new type in all APIs, but be­cause I don’t want to make your life miser­able, the type is able to im­pli­citly morph from and back in­to std::unique_ptr. A sim­il­ar trick is already used in the Mag­num Math lib­rary for ex­ample for the GLM math lib­rary in­teg­ra­tion. The con­ver­sion is provided in a sep­ar­ate Cor­rade/Con­tain­ers/Point­erStl.h head­er be­cause, well, do­ing it dir­ectly in the class it­self would re­quire me to #include <memory> — which I wanted to avoid in the first place. As a side-ef­fect of this, it also al­lows you to have an equi­val­ent of std::make_unique() in C++11 — Con­tain­ers::point­er():

#include <Corrade/Containers/PointerStl.h>

using namespace Corrade;

int main() {
    std::unique_ptr<int> a{new int{42}};
    Containers::Pointer<int> b = std::move(a);

    std::unique_ptr<int> c = Containers::pointer<int>(1337);
}

This con­ver­sion be­haves like any oth­er usu­al move — the ori­gin­al in­stance gets re­lease()d, be­com­ing nullptr, and the own­er­ship moves to the oth­er.
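One way such an opt-in conversion can be wired up, sketched under assumptions: the lightweight header only declares a converter template, and the separate compatibility header specializes it for std::unique_ptr. Owner and OwnerConverter are hypothetical names, and this illustrates the general technique rather than reproducing the Corrade source:

```cpp
#include <cassert>
#include <utility>

/* --- lightweight header: knows nothing about <memory> --- */

/* Hypothetical extension point, declared but not defined here */
template<class T, class External> struct OwnerConverter;

template<class T> class Owner {
    public:
        explicit Owner(T* p = nullptr) noexcept: _p{p} {}
        Owner(const Owner&) = delete;
        Owner& operator=(const Owner&) = delete;
        Owner(Owner&& other) noexcept: _p{other.release()} {}
        ~Owner() { delete _p; }

        /* Takes part in overload resolution only if a matching
           OwnerConverter specialization with from() exists */
        template<class External, class = decltype(OwnerConverter<T, External>::from(std::declval<External&&>()))>
        /*implicit*/ Owner(External&& other): Owner{OwnerConverter<T, External>::from(std::move(other))} {}

        /* Conversion back is allowed on rvalues only, ownership moves out */
        template<class External, class = decltype(OwnerConverter<T, External>::to(std::declval<Owner<T>&&>()))>
        /*implicit*/ operator External() && {
            return OwnerConverter<T, External>::to(std::move(*this));
        }

        explicit operator bool() const { return _p != nullptr; }
        T& operator*() const { assert(_p); return *_p; }
        T* release() { T* p = _p; _p = nullptr; return p; }

    private:
        T* _p;
};

/* --- opt-in compatibility header: the only place including <memory> --- */
#include <memory>

template<class T> struct OwnerConverter<T, std::unique_ptr<T>> {
    static Owner<T> from(std::unique_ptr<T>&& other) {
        return Owner<T>{other.release()};
    }
    static std::unique_ptr<T> to(Owner<T>&& other) {
        return std::unique_ptr<T>{other.release()};
    }
};
```

With both parts visible, `Owner<int> b = std::move(a);` and `std::unique_ptr<int> c = std::move(b);` compile for a `std::unique_ptr<int> a`. Without the second part, the converting constructor and operator silently drop out of overload resolution and <memory> is never needed.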

The case of std::ref­er­en­ce_wrap­per

I… I’m not even mad anymore. Just disappointed. The main use of this standard type in Magnum APIs is to allow storing references (or non-nullable pointers) in various containers. The std::reference_wrapper is even simpler than std::unique_ptr, yet it’s shoveled into the <functional> header, which was not exactly slim to begin with and has managed to gain an insane amount of weight due to (I assume) the introduction of searchers in C++17. Like, why not put these in <search> instead?! So I made my own Containers::Reference, too (and it’s also convertible to/from the STL equivalent in a similar way).
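For a sense of how little such a type needs, here is a minimal sketch of a rebindable, non-nullable reference wrapper. Ref is a hypothetical name and this is not the Containers::Reference source; for brevity it also ignores types with an overloaded operator&:

```cpp
/* Hypothetical name; a sketch only. No #include is needed at all. */
template<class T> class Ref {
    public:
        /*implicit*/ Ref(T& reference) noexcept: _p{&reference} {}

        /* Binding to a temporary would leave a dangling pointer behind */
        Ref(T&&) = delete;

        /*implicit*/ operator T&() const { return *_p; }
        T& get() const { return *_p; }
        T* operator->() const { return _p; }

    private:
        /* Never null. Copy assignment rebinds the wrapper instead of
           assigning through it, which is what makes it usable in
           containers. */
        T* _p;
};
```

Unlike a plain T&, instances of this can be put into an array or another container and reassigned to refer to something else.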

[Figure: Preprocessed line count. <Containers/Reference.h>, C++11: 1646 lines; <Containers/Reference.h>, C++2a: 2015 lines; <functional>, C++11: 14540 lines; <functional>, C++2a: 31353 lines.]

In this case I didn’t even need <utility>, so the head­er is just 1646 pre­pro­cessed lines un­der C++11. To wrap it up, here are com­pile times of the fol­low­ing snip­pets, again with the baseline com­par­is­on for bet­ter scale:

#include <Corrade/Containers/Reference.h>

using namespace Corrade;

int main() {
    int a{};
    Containers::Reference<int> b = a;
    return b;
}

#include <functional>

int main() {
    int a{};
    std::reference_wrapper<int> b = a;
    return b;
}

[Figure: Compilation time, GCC 8.2. Baseline (int main() {}): 49.97 ± 0.54 ms; <Containers/Reference.h>, C++11: 64.66 ± 3.49 ms; <Containers/Reference.h>, C++2a: 66.29 ± 4.87 ms; <functional>, C++11: 173.6 ± 7.38 ms; <functional>, C++2a: 308.8 ± 7.76 ms.]

But, but, … mod­ules?

The Modules work has been running for half a decade already and many of the header bloat concerns are being handwaved away with “modules will solve that”. I looked at the proposals back in 2016, but didn’t have a chance to check back since, so I was excited to see the progress.

TL;DR: no, we’re not there yet.

While Modules are said to be on track for C++20 (I hope that’s still possible), I was not able to find any real-world example that would work for me. After much struggling, I managed to come up with this command line:

clang++ -std=c++17 -stdlib=libc++ -fmodules-ts -fimplicit-modules \
    -fmodule-map-file=/usr/include/c++/v1/module.modulemap main.cpp

And, after in­stalling both libc++ and libc++-experimental from AUR, the fol­low­ing snip­pet com­piled cor­rectly. Vari­ous ex­amples told me that I could import std.memory;, but that only greeted me with an un­google­able er­ror.

import std;

int main() {
    std::unique_ptr<int> a{new int{}};
    return *a;
}

The measured compile times are below, but note the very first run takes almost two seconds: it’s compiling the module file, resulting in 17 megabytes of various binaries in your temp directory. And you get a different set of these for different flags, so enabling -O3 generates another set of binaries. That … feels pretty much like precompiled headers. Not sure if happy. (I didn’t like those at all.)

[Figure: Compilation time, Clang 7.0 -std=c++17. Baseline (int main() {}): 82.93 ± 0.78 ms; Containers::Pointer (<Containers/Pointer.h>): 108.01 ± 4.51 ms; std::unique_ptr (<memory>): 279.79 ± 4.15 ms; std::unique_ptr (import std): 90.86 ± 4.14 ms.]

I was look­ing for­ward to C++ mod­ules to sim­pli­fy lib­rary link­ing to the point where you just say “this is the lib­rary I want to link to” on the com­mand line and it will feed both the linker with cor­rect ob­ject code and the com­piler with cor­rect im­por­ted defin­i­tions. Wish­ful think­ing.

This is nowhere near that and the speed gains are not that sig­ni­fic­ant com­pared to re­spons­ible head­er hy­giene. People with big­ger code­bases are re­port­ing even smal­ler gains, around 10%, which makes me won­der if this is worth both­er­ing with, in the cur­rent state of things. And us­ing mod­ules will not ma­gic­ally im­prove de­bug per­form­ance of STL con­tain­ers any­way.

What’s worse is that the implementation isn’t properly documented anywhere (the Clang Modules documentation is not about Modules TS, but their own different thing) and there’s no support in tools or IDEs (not to mention build systems), so at the moment it’s very painful to work with. I think I’ll check back in another five years.

Take it, it’s just a single file!

If you are already using Magnum, simply #include these files and you’re ready to take back control over your compile times. If not, these two containers, along with Containers::Optional, are available through a freshly created magnum-singles repository. Each is a self-contained tiny header file, meant to be bundled into your project:

Library                LoC   Preprocessed LoC   Description
CorradeOptional.h      328   2742               See Containers::Optional docs
CorradePointer.h       259   2321               See Containers::Pointer docs
CorradeReference.h     115   1639               See Containers::Reference docs
CorradeScopeGuard.h    108   26                 See Containers::ScopeGuard docs

This repository will be receiving more libraries as Magnum gets gradually slimmed down. You can already look forward to a math library that fits under 10k preprocessed lines :)
