Lightweight but still STL-compatible unique pointer

Mag­num got a new unique point­er im­ple­men­ta­tion that’s much more light­weight with bet­ter de­bug per­for­mance and com­pile times, but is still ful­ly com­pat­i­ble with std::unique_p­tr.

Magnum is currently undergoing an optimization pass for shorter compilation times and smaller binary sizes, further improving on what was done back in 2013. Back then I managed to reduce the number of template instantiations and remove the then-heaviest #includes such as <iostream> or <algorithm> from headers, while also banning several others such as <regex> or <random> from ever leaking there. With 2013 hardware and GCC 4.8, that brought compile times down from 5:00 to 2:59, which was already a significant improvement.

Nowadays Magnum compiles with all tests in 80 seconds. The codebase got considerably bigger during the past five years, but Moore’s law is also still in effect, so one could say that the best solution for improving compile times is to just wait a bit. (Much like the best way to cure insomnia is to get more sleep.) But, especially after seeing what’s possible with plain C, I’m not completely happy with the current state and I think I could do better.

~ ~ ~

The prob­lem with re­mov­ing the most sig­nif­i­cant caus­es of slow­down is that some­thing else steps up to be­come the most prob­lem­at­ic. Now things like <vector> or <string> are among the top of­fend­ers and, apart from re­plac­ing std::vec­tor with Mag­num’s light­weight Con­tain­ers::Ar­ray where pos­si­ble, the most ef­fi­cient cure is to PIM­PL the class in­ter­nals to re­move them from class def­i­ni­tions. That works for al­most ev­ery­thing. Ex­cept for std::unique_p­tr, be­cause that one is of­ten used to wrap the PIM­PL it­self since you def­i­nite­ly do not want to im­ple­ment copy/move con­struc­tors for each PIM­PL’d class in­stead.
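To make the problem concrete, here is a minimal sketch of a PIMPL'd class, squeezed into a single file for brevity. The Widget class and its members are hypothetical and not taken from Magnum. The point is that <vector> and <string> can hide in the source file, while <memory> has to stay in the header because of the std::unique_ptr member.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>   // std::unique_ptr, the one include PIMPL cannot hide
#include <string>
#include <vector>

// --- Would live in Widget.h: needs <memory>, but not <vector> or <string> ---
class Widget {   // hypothetical class, for illustration only
public:
    Widget();
    ~Widget();                             // defined where State is complete
    Widget(Widget&&) noexcept;             // moves are delegated to unique_ptr
    Widget& operator=(Widget&&) noexcept;

    void addName(const char* name);
    std::size_t nameCount() const;

private:
    struct State;                          // only forward-declared here
    std::unique_ptr<State> _state;
};

// --- Would live in Widget.cpp: the heavy containers stay hidden here ---
struct Widget::State {
    std::vector<std::string> names;
};

Widget::Widget(): _state{new State} {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

void Widget::addName(const char* name) {
    assert(name);
    _state->names.emplace_back(name);
}

std::size_t Widget::nameCount() const { return _state->names.size(); }
```

Every file that includes such a header pays for <memory>, even though it never sees the std::vector or the std::string.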

To my great sur­prise, the <memory> head­er is quite a beast, twice as big as <vector> (which, well, has to han­dle all the com­plex move-aware re­al­lo­ca­tions) and it on­ly gets worse with new­er C++ stan­dards. It’s ac­tu­al­ly even slight­ly big­ger than <iostream> which I banned for this very rea­son!

Below is a graph of preprocessed line count for each header, generated using the following command with GCC 8.2. Note the use of -P, which removes unnecessary #line statements from the preprocessor output, so the resulting line count corresponds more closely to the amount of actual code. The last line in the plot, for comparison, is using Clang 7.0 with libc++. While preprocessed line count is not the only factor affecting compile times, it correlates with them pretty well, especially in template-heavy C++ code.

echo "#include <memory>" | gcc -std=c++11 -P -E -x c++ - | wc -l
Preprocessed line count:
<vector>: 8608 lines
<vector> + <string>: 14652 lines
<iostream>: 17839 lines
<memory>: 17863 lines
<memory> (libstdc++, C++17): 20995 lines
<memory> (libc++, C++2a): 16736 lines

Let’s step back a bit and try again

Im­pos­ing the bur­den of 17k lines on ev­ery us­er of the class would ab­so­lute­ly de­stroy any ben­e­fits of PIM­PLing away the <vector> and <string> in­cludes, as the <memory> head­er alone is big­ger than those two com­bined. The crazy part is that it’s just a move-on­ly wrap­per over a point­er.

The new Con­tain­ers::Point­er is al­so just that, but in a rea­son­ably-sized pack­age. Un­like std::unique_p­tr it doesn’t sup­port ar­rays (Mag­num has Con­tain­ers::Ar­ray for that) and at the mo­ment it doesn’t have cus­tom deleters, as there was no im­me­di­ate need for this fea­ture. On the oth­er hand, it pro­vides an equiv­a­lent to std::make_unique() with­out forc­ing you to use C++14. It’s named just Pointer, be­cause I al­ready have an Array and I don’t ev­er plan on im­ple­ment­ing an al­ter­na­tive to std::shared_p­tr, be­cause, in my opin­ion, the on­ly pur­pose of that type is mak­ing cod­ing crimes eas­i­er to com­mit.
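For a sense of how little code "a move-only wrapper over a pointer" actually needs, here is a rough sketch. TinyPointer is a hypothetical name and this is not the actual Containers::Pointer source. It only assumes the feature set described above: no arrays, no custom deleters.

```cpp
#include <cassert>

// Hypothetical illustration only, not the real Containers::Pointer
template<class T> class TinyPointer {
public:
    TinyPointer() noexcept = default;
    explicit TinyPointer(T* pointer) noexcept: _pointer{pointer} {}

    // Move-only: copying would mean two owners of the same object
    TinyPointer(const TinyPointer&) = delete;
    TinyPointer& operator=(const TinyPointer&) = delete;

    TinyPointer(TinyPointer&& other) noexcept: _pointer{other._pointer} {
        other._pointer = nullptr;
    }

    TinyPointer& operator=(TinyPointer&& other) noexcept {
        // Swap, so the previously owned object is deleted together with other
        T* previous = _pointer;
        _pointer = other._pointer;
        other._pointer = previous;
        return *this;
    }

    ~TinyPointer() { delete _pointer; }

    explicit operator bool() const noexcept { return _pointer != nullptr; }
    T& operator*() const { assert(_pointer); return *_pointer; }
    T* operator->() const { assert(_pointer); return _pointer; }
    T* get() const noexcept { return _pointer; }

    // Gives up ownership, the caller becomes responsible for deletion
    T* release() noexcept {
        T* released = _pointer;
        _pointer = nullptr;
        return released;
    }

private:
    T* _pointer = nullptr;
};
```

The only include is <cassert>, which is tiny compared to <memory>. The real class does more than this, which is where the <type_traits> and <utility> includes come from.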

Let’s look at it again:

Preprocessed line count:
<Containers/Pointer.h>, C++11: 2311 lines
<Containers/Pointer.h>, C++2a: 2769 lines
<memory>, C++11: 17863 lines
<memory>, C++2a: 21014 lines

It could be smaller, but I needed <type_traits> to do some convenience compile-time checks (one of them is forbidding its use on T[]). And for in-place construction using Containers::pointer(), I needed std::forward() from <utility>. I could have used static_cast instead and saved myself ~700 lines of code, but the header is so essential that you’ll be including it yourself sooner or later anyway.
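The static_cast alternative mentioned above could look roughly like this. It's a sketch with hypothetical names (Probe, allocate) and it returns a raw pointer only to keep the example free of any includes. A real helper would wrap the result in an owning type.

```cpp
// Deliberately no #include <utility>

// Records which constructor was picked, to make the forwarding observable
struct Probe {
    enum class From { Default, Copy, Move };
    From from = From::Default;

    Probe() = default;
    Probe(const Probe&): from{From::Copy} {}
    Probe(Probe&&) noexcept: from{From::Move} {}
};

// Hypothetical in-place factory. static_cast<Args&&>(args) is what
// std::forward<Args>(args) boils down to: reference collapsing keeps
// lvalues as lvalues and rvalues as rvalues.
template<class T, class... Args> T* allocate(Args&&... args) {
    return new T{static_cast<Args&&>(args)...};
}
```

Passing an lvalue Probe to allocate<Probe>() goes through the copy constructor and passing an rvalue goes through the move constructor, which is the behavior std::forward() is usually pulled in for.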

Com­pile times and de­bug per­for­mance

For a “microbenchmark” of compile times, I created the following two code snippets and compiled each with GCC 8.2. For a better sense of scale, there’s also a baseline time, which is from compiling just int main() {} with no #include at all.

#include <Corrade/Containers/Pointer.h>

using namespace Corrade;

int main() {
  Containers::Pointer<int> a{new int{}};
  return *a;
}

#include <memory>

int main() {
  std::unique_ptr<int> a{new int{}};
  return *a;
}

By default, Containers::Pointer has a convenience printer for Utility::Debug and also provides human-readable assertions using the same utility. To make the comparison more balanced, I opted out of debug printing and switched to standard C assert() by defining CORRADE_NO_DEBUG and CORRADE_STANDARD_ASSERT on the compiler command line. The resulting times are below:

g++ main.cpp -DCORRADE_NO_DEBUG -DCORRADE_STANDARD_ASSERT -std=c++11 # or c++2a
Compilation time, GCC 8.2:
baseline, int main() {}: 49.97 ± 0.54 ms
<Containers/Pointer.h>, C++11: 69.74 ± 3.04 ms
<Containers/Pointer.h>, C++2a: 71.41 ± 0.84 ms
<memory>, C++11: 205.19 ± 1.05 ms
<memory>, C++2a: 249.01 ± 4.72 ms

Regarding debug performance, checking on Compiler Explorer, std::unique_ptr resulted in roughly four times as many instructions as Containers::Pointer in a non-optimized build on both Clang and GCC. GCC with -O1 and higher was able to reduce the above snippet to a pair of new and delete, Clang with -O1 shortened the code to roughly half for both (but still with a 3x difference) and Clang -O2 and up managed to get rid of the allocation altogether in both cases, which is nice.

What if my li­brary al­ready us­es std::unique_p­tr?

Magnum will be gradually switching to the new type in all APIs, but because I don’t want to make your life miserable, the type is able to implicitly morph from and back into std::unique_ptr. A similar trick is already used in the Magnum Math library, for example for the GLM math library integration. The conversion is provided in a separate Corrade/Containers/PointerStl.h header because, well, doing it directly in the class itself would require me to #include <memory>, which I wanted to avoid in the first place. As a side effect, you also get an equivalent of std::make_unique() in C++11, Containers::pointer():

#include <Corrade/Containers/PointerStl.h>

using namespace Corrade;

int main() {
    std::unique_ptr<int> a{new int{42}};
    Containers::Pointer<int> b = std::move(a);

    std::unique_ptr<int> c = Containers::pointer<int>(1337);
}

This con­ver­sion be­haves like any oth­er usu­al move — the orig­i­nal in­stance gets re­lease()d, be­com­ing nullptr, and the own­er­ship moves to the oth­er.

The case of std::ref­er­ence_wrap­per

I… I’m not even mad anymore. Just disappointed. The main use of this standard type in Magnum APIs is to allow storing references (or non-nullable pointers) in various containers. The std::reference_wrapper is even simpler than std::unique_ptr, yet it’s shoveled into the <functional> header, which was not exactly slim to begin with and managed to gain an insane amount of weight due to (I assume) the introduction of searchers in C++17. Like, why not put these in <search> instead?! So I made my own Containers::Reference, too (and it’s also convertible to/from the STL equivalent in a similar way).
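To show how simple such a type can be, here is a rough sketch of a rebindable reference wrapper that needs no includes at all. TinyReference is a hypothetical name and this is not the actual Containers::Reference source.

```cpp
// Hypothetical illustration only, not the real Containers::Reference
template<class T> class TinyReference {
public:
    // Implicit construction from an lvalue, like std::reference_wrapper
    TinyReference(T& reference) noexcept: _reference{&reference} {}

    // A reference to a temporary would dangle right away, so forbid it
    TinyReference(T&&) = delete;

    // Implicit conversion back to a plain reference
    operator T&() const noexcept { return *_reference; }
    T& get() const noexcept { return *_reference; }
    T* operator->() const noexcept { return _reference; }

private:
    // Stored as a pointer, which keeps the wrapper copyable and lets
    // assignment rebind it instead of assigning through
    T* _reference;
};
```

Unlike a plain T&, this can be stored in containers and reassigned to point to a different object, and it can never be null because there is no way to construct it from nothing.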

Preprocessed line count:
<Containers/Reference.h>, C++11: 1646 lines
<Containers/Reference.h>, C++2a: 2015 lines
<functional>, C++11: 14540 lines
<functional>, C++2a: 31353 lines

In this case I didn’t even need <utility>, so the head­er is just 1646 pre­pro­cessed lines un­der C++11. To wrap it up, here are com­pile times of the fol­low­ing snip­pets, again with the base­line com­par­i­son for bet­ter scale:

#include <Corrade/Containers/Reference.h>

using namespace Corrade;

int main() {
    int a{};
    Containers::Reference<int> b = a;
    return b;
}

#include <functional>

int main() {
    int a{};
    std::reference_wrapper<int> b = a;
    return b;
}

Compilation time, GCC 8.2:
baseline, int main() {}: 49.97 ± 0.54 ms
<Containers/Reference.h>, C++11: 64.66 ± 3.49 ms
<Containers/Reference.h>, C++2a: 66.29 ± 4.87 ms
<functional>, C++11: 173.6 ± 7.38 ms
<functional>, C++2a: 308.8 ± 7.76 ms

But, but, … mod­ules?

The Modules work has been running for half a decade already and many of the header bloat concerns are being handwaved away with “modules will solve that”. I looked at the proposals back in 2016, but didn’t have a chance to check back since, so I was excited to see the progress.

TL;DR: no, we’re not there yet.

While Modules are said to be on track for C++20 (I hope that’s still possible), I was not able to find any real-world example that would work for me. After much struggling, I managed to come up with this command line:

clang++ -std=c++17 -stdlib=libc++ -fmodules-ts -fimplicit-modules \
    -fmodule-map-file=/usr/include/c++/v1/module.modulemap main.cpp

And, af­ter in­stalling both libc++ and libc++-experimental from AUR, the fol­low­ing snip­pet com­piled cor­rect­ly. Var­i­ous ex­am­ples told me that I could import std.memory;, but that on­ly greet­ed me with an un­googleable er­ror.

import std;

int main() {

    std::unique_ptr<int> a{new int{}};
    return *a;
}

The measured compile times are below, but note the very first run takes almost two seconds: it’s compiling the module file, resulting in 17 megabytes of various binaries in your temp directory. And you get a different set of these for different flags, so enabling -O3 generates another set of binaries. That … feels pretty much like precompiled headers. Not sure if happy. (I didn’t like those at all.)

Compilation time, Clang 7.0 -std=c++17:
baseline, int main() {}: 82.93 ± 0.78 ms
Containers::Pointer, <Containers/Pointer.h>: 108.01 ± 4.51 ms
std::unique_ptr, <memory>: 279.79 ± 4.15 ms
std::unique_ptr, import std: 90.86 ± 4.14 ms

I was look­ing for­ward to C++ mod­ules to sim­pli­fy li­brary link­ing to the point where you just say “this is the li­brary I want to link to” on the com­mand line and it will feed both the link­er with cor­rect ob­ject code and the com­pil­er with cor­rect im­port­ed def­i­ni­tions. Wish­ful think­ing.

This is nowhere near that and the speed gains are not that sig­nif­i­cant com­pared to re­spon­si­ble head­er hy­giene. Peo­ple with big­ger code­bas­es are re­port­ing even small­er gains, around 10%, which makes me won­der if this is worth both­er­ing with, in the cur­rent state of things. And us­ing mod­ules will not mag­i­cal­ly im­prove de­bug per­for­mance of STL con­tain­ers any­way.

What’s worse is that the implementation is not properly documented anywhere (the Clang Modules documentation is not about the Modules TS, but about their own different thing) and there’s no support in tools or IDEs (not to mention buildsystems), so at the moment it’s very painful to work with. I think I’ll check back in another five years.

Take it, it’s just a sin­gle file!

If you are already using Magnum, simply #include these files and you’re ready to take back control over your compile times. If not, these two containers, along with Containers::Optional, are available through a freshly created magnum-singles repository. Each is a self-contained tiny header file, meant to be bundled into your project:

Library               LoC   Preprocessed LoC   Description
CorradeOptional.h     328   2742               See Containers::Optional docs
CorradePointer.h      259   2321               See Containers::Pointer docs
CorradeReference.h    115   1639               See Containers::Reference docs
CorradeScopeGuard.h   108   26                 See Containers::ScopeGuard docs

This repository will be receiving more libraries as Magnum gets gradually slimmed down. You can already look forward to a math library that fits under 10k preprocessed lines :)
