New geometry pipeline in Magnum

Flexible and efficient mesh representation, custom attributes, new data types and a ton of new processing, visualization and analysis tools. GPU-friendly geometry storage as it should be in the 21st century.

During the past six months, Magnum has undergone a rather massive rework of its very central parts — mesh data import and storage. The original (and now deprecated) Trade::MeshData2D / Trade::MeshData3D classes stayed basically intact from the early 2010s, when Magnum was nothing more than a toy project of one bored university student, and were overdue for a replacement.

How to not do things

While GL::Mesh and GL::Attribute on the renderer side provided all imaginable options for data layout and vertex formats, the flexibility bottleneck was on the importer side. Increasingly unhappy about the limitations, I ended up suggesting that people just sidestep the Trade APIs and make their own representation when they needed to do anything non-trivial. However, working on the replacement, I discovered — the horror — that Magnum was far from the only library with such limitations embedded in its design.

explicit MeshData3D(MeshPrimitive primitive,
    std::vector<UnsignedInt> indices,
    std::vector<std::vector<Vector3>> positions,
    std::vector<std::vector<Vector3>> normals,
    std::vector<std::vector<Vector2>> textureCoords2D,
    std::vector<std::vector<Color4>> colors,
    const void* importerState = nullptr);

Source: Magnum/Trade/MeshData3D.h (deprecated)

Here is the original Magnum API. While it allowed multiple sets of all attributes (usable in mesh morphing, for example), adding a new attribute type meant adding another vector-of-vectors (and updating calls to this constructor everywhere), not to mention the lack of support for any sort of custom attributes or the ability to store different data types. The importerState is an extension point that allows accessing arbitrary additional data, but it's plugin-dependent and thus not usable in a generic way.

struct aiMesh {
    aiVector3D* mVertices;
    aiVector3D* mNormals;
    aiVector3D* mTangents;
    aiVector3D* mBitangents;
    aiColor4D* mColors[AI_MAX_NUMBER_OF_COLOR_SETS];
    aiVector3D* mTextureCoords[AI_MAX_NUMBER_OF_TEXTURECOORDS];
    aiFace* mFaces;
    /* … */
};

Source: assimp/mesh.h

Perhaps the most widely used asset import library, Assimp, is very similar. All attributes are tightly packed and in a fixed type, and while it supports a few more attribute types compared to the original Magnum API, it has no custom attributes or formats either.

Fixed index and attribute types mean that input data has to be de-interleaved and expanded to 32-bit ints and floats in order to be stored there, only to be interleaved and packed again later for an efficient representation on the GPU. Both of those representations also own the data, meaning you can't use them to reference external memory (for example a memory-mapped file or a GPU buffer).
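To put the expansion cost in numbers, here's a back-of-the-envelope sketch. The packed layout below is hypothetical (half-float positions, normalized-byte normals, 16-bit texture coordinates), purely for illustration of the size difference against the forced 32-bit-float representation:

```cpp
#include <cstddef>
#include <cstdint>

/* A hypothetical GPU-ready packed vertex layout */
struct PackedVertex {
    std::uint16_t position[3];      /* three half-floats */
    std::int8_t normal[2];          /* e.g. octahedron-encoded */
    std::uint16_t textureCoords[2]; /* normalized unsigned shorts */
};

/* The same vertex expanded to 32-bit floats, as fixed-format importer
   representations require */
struct ExpandedVertex {
    float position[3];
    float normal[3];
    float textureCoords[2];
};

/* Total vertex data size for a given mesh, in bytes */
constexpr std::size_t dataSize(std::size_t vertexCount, std::size_t vertexSize) {
    return vertexCount*vertexSize;
}
```

For the 5M-vertex mesh mentioned below, that's 60 MB of packed data versus 160 MB after expansion, before even counting the copies made along the way.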

The ultimate winner of this contest, however, is libIGL, with the following function signature. Granted, it's templated to allow you to choose a different index and scalar type, but you have to choose the type upfront and not based on what the file actually contains, which kinda defeats the purpose. What's most amazing, though, is that every position and normal is a three-component std::vector, every texture coordinate a two-component vector, and then each face is represented by another three vector instances. So if you load a 5M-vertex mesh with 10M faces (which is not that uncommon if you deal with real data), it'll be spread across 45 million allocations. Even with keeping all the flexibility it could be just a handful¹, but why keep your feet on the ground, right? The std::string passed by value is just a nice touch on top.

template <typename Scalar, typename Index>
IGL_INLINE bool readOBJ(
  const std::string obj_file_name,
  std::vector<std::vector<Scalar > > & V,
  std::vector<std::vector<Scalar > > & TC,
  std::vector<std::vector<Scalar > > & N,
  std::vector<std::vector<Index > > & F,
  std::vector<std::vector<Index > > & FTC,
  std::vector<std::vector<Index > > & FN,
  std::vector<std::tuple<std::string, Index, Index >> & FM);

Source: igl/readOBJ.h

^ To be fair, libIGL has an overload that puts the result into just six regularly-shaped Eigen matrices. However, it's implemented on top of the above (so you still need a military-grade allocator) and it requires you to know beforehand that all faces in the file have the same size.

Can we do better?

The original pipeline (and many importer libraries as well) was designed with the assumption that a file has to be parsed in order to get the geometry data out of it. It was a sensible decision for classic textual formats such as OBJ, COLLADA or OpenGEX, and there was little point in parsing those to anything else than 32-bit floats and integers. For such formats a relatively massive amount of processing was needed either way, so a few more copies and data packing at the end didn't really matter:

The new pipeline turns this assumption upside down, and instead builds on a simple design goal — being able to understand anything that the GPU can understand as well. Interleaved data or not, half-floats, packed formats, arbitrary padding and alignment, custom application-specific attributes and so on. Then, assuming a file already has the data exactly as we want it, it can simply copy the binary blob over to the GPU and only parse the metadata describing offsets, strides and formats:

For the textual formats (and rigidly-designed 3rd party importer libraries) it means the importer plugin now has to do extra work that involves packing the data into a single buffer. But that's an optimization done on the right side — with increasing model complexity it will make less and less sense to store the data in a textual format.

Enter the new MeshData

The new Trade::MeshData class accepts just two memory buffers — a typeless index buffer and a typeless vertex buffer. The rest is supplied as metadata, with Containers::StridedArrayView powering the data access (be sure to check out the original article on strided views). This, along with the ability to supply any MeshIndexType and VertexFormat, gives you almost unlimited² freedom of expression. As an example, let's say you have your positions as half-floats, normals packed in bytes and a custom per-vertex material ID attribute for deferred rendering, complete with padding to ensure vertices are aligned to four-byte addresses:

struct Vertex {
    Vector3h position;
    Vector2b normal;
    UnsignedShort objectId;
};

Containers::Array<char> indexData;
Containers::Array<char> vertexData;
// … fill the data …

Trade::MeshIndexData indices{MeshIndexType::UnsignedShort, indexData};
Trade::MeshData meshData{MeshPrimitive::Triangles,
    std::move(indexData), indices,
    std::move(vertexData), {
        Trade::MeshAttributeData{Trade::MeshAttribute::Position,
            VertexFormat::Vector3h, offsetof(Vertex, position),
            vertexCount, sizeof(Vertex)},
        Trade::MeshAttributeData{Trade::MeshAttribute::Normal,
            VertexFormat::Vector2bNormalized, offsetof(Vertex, normal),
            vertexCount, sizeof(Vertex)},
        Trade::MeshAttributeData{Trade::MeshAttribute::ObjectId,
            VertexFormat::UnsignedShort, offsetof(Vertex, objectId),
            vertexCount, sizeof(Vertex)}
    }};

The resulting meshData variable is a self-contained instance containing all vertex and index data of the mesh. You can then for example pass it directly to MeshTools::compile() — which will upload the indexData and vertexData as-is to the GPU without any processing, and configure it so the builtin shaders can transparently interpret the half-floats and normalized bytes as 32-bit floats:

GL::Mesh mesh = MeshTools::compile(meshData);

The data isn't hidden from you either — using indices() or attribute() you can directly access the indices and particular attributes in a matching concrete type …

Containers::StridedArrayView1D<const UnsignedShort> objectIds =
    meshData.attribute<UnsignedShort>(Trade::MeshAttribute::ObjectId);
for(UnsignedShort objectId: objectIds) {
    // …
}
… and because there are many possible types and not all of them are directly usable (such as the half-floats), there are indicesAsArray(), positions3DAsArray(), normalsAsArray() etc. convenience accessors that give you the attribute unpacked to a canonical type, so it can be used easily in contexts that assume 32-bit floats. For example, calculating an AABB of whatever position type is just a one-liner:

Range3D aabb = Math::minmax(meshData.positions3DAsArray());
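For illustration, the kind of unpacking such an accessor performs for half-floats can be sketched in a few lines of plain C++. This is a minimal decoder ignoring NaN and infinity, not Magnum's actual implementation:

```cpp
#include <cmath>
#include <cstdint>

/* Decodes an IEEE 754 half-float (1 sign, 5 exponent, 10 mantissa bits)
   to a 32-bit float. NaN/infinity handling omitted for brevity. */
float halfToFloat(std::uint16_t h) {
    const std::uint32_t sign = h >> 15;
    const std::uint32_t exponent = (h >> 10) & 0x1f;
    const std::uint32_t mantissa = h & 0x3ff;
    float value;
    if(exponent == 0) /* zero or subnormal */
        value = std::ldexp(float(mantissa), -24);
    else /* normalized: implicit leading 1, exponent bias of 15 */
        value = std::ldexp(float(mantissa | 0x400), int(exponent) - 25);
    return sign ? -value : value;
}
```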

Among the evolutionary things, mesh attribute support got extended with tangents and bitangents (in both representations, either a four-component tangent that glTF uses or a separate three-component bitangent that Assimp uses), and @Squareys is working on adding support for vertex weights and joint IDs in mosra/magnum#441.

^ You still need to obey the limitations given by the GPU, such as the index buffer being contiguous, all attributes having the same index buffer or all faces being triangles. Unless you go with meshlets.

Tools to help you around

Of course one doesn't always have data already packed in an ideal way, and doing so by hand is tedious and error-prone. For that, the MeshTools library got extended with various utilities operating directly on Trade::MeshData. Here's how you could use MeshTools::interleave() to create the above packed representation from a bunch of contiguous arrays, possibly together with Math::packInto(), Math::packHalfInto() and similar. Where possible, the actual VertexFormat is inferred from the passed view type:

Containers::ArrayView<const Vector3h> positions;
Containers::ArrayView<const Vector2b> normals;
Containers::ArrayView<const UnsignedShort> objectIds;
Containers::ArrayView<const UnsignedShort> indices;

Trade::MeshData meshData = MeshTools::interleave(
    Trade::MeshData{MeshPrimitive::Triangles,
        {}, indices, Trade::MeshIndexData{indices}, UnsignedInt(positions.size())},
    {Trade::MeshAttributeData{Trade::MeshAttribute::Position, positions},
     Trade::MeshAttributeData{Trade::MeshAttribute::Normal, normals},
     Trade::MeshAttributeData{Trade::MeshAttribute::ObjectId, objectIds}});

Thanks to the flexibility of Trade::MeshData, many historically quite verbose operations are now available through single-argument APIs. Taking a mesh, interleaving its attributes, removing duplicate vertices and finally packing the index buffer to the smallest type that can represent the given range can be done by chaining MeshTools::interleave(), MeshTools::removeDuplicates() and MeshTools::compressIndices():

Trade::MeshData optimized = MeshTools::compressIndices(
    MeshTools::removeDuplicates(MeshTools::interleave(mesh)));
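The index-compression step conceptually boils down to finding the largest index and picking the smallest type that can hold it. A sketch of the idea in plain C++, not Magnum's actual implementation:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

enum class IndexType { UnsignedByte = 1, UnsignedShort = 2, UnsignedInt = 4 };

/* Picks the smallest index type able to represent all values */
IndexType smallestIndexType(const std::vector<std::uint32_t>& indices) {
    const std::uint32_t max = indices.empty() ? 0 :
        *std::max_element(indices.begin(), indices.end());
    if(max <= 0xff) return IndexType::UnsignedByte;
    if(max <= 0xffff) return IndexType::UnsignedShort;
    return IndexType::UnsignedInt;
}
```

A 5M-vertex mesh thus keeps 32-bit indices, while a mesh with fewer than 65536 vertices gets its index buffer halved.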

There's also MeshTools::concatenate() for merging multiple meshes together, MeshTools::generateIndices() for converting strips, loops and fans to indexed lines and triangles, and others. Except for potential restrictions coming from a given algorithm, each of those works on an arbitrary instance, be it an indexed mesh or not, with any kind of attributes.

Apart from the high-level APIs working on Trade::MeshData instances, the existing MeshTools algorithms that work directly on data arrays were ported from std::vector to Containers::StridedArrayView, meaning they can be used on a much broader range of inputs.
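The core of such a view is just a pointer, an element count and a byte stride. A stripped-down sketch of the idea (Containers::StridedArrayView itself is multi-dimensional and range-checked, this is only to show why it works on interleaved data without copying):

```cpp
#include <cstddef>
#include <cstdint>

/* Minimal 1D strided view: element i lives at data + i*stride bytes */
template<class T> struct StridedView {
    const char* data;
    std::size_t size;
    std::ptrdiff_t stride;

    const T& operator[](std::size_t i) const {
        return *reinterpret_cast<const T*>(data + std::ptrdiff_t(i)*stride);
    }
};

struct InterleavedVertex {
    float position[3];
    std::uint8_t color[4];
};

/* View over just the x coordinate of every position in an interleaved
   vertex array */
inline StridedView<float> positionsX(const InterleavedVertex* vertices,
    std::size_t count)
{
    return {reinterpret_cast<const char*>(&vertices[0].position[0]),
        count, sizeof(InterleavedVertex)};
}
```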

Binary file formats make the computer happy

With a mesh representation matching GPU capabilities 1:1, let's look at a few examples of binary file formats that could make use of it, their flexibility and how they perform.


glTF

glTF — interleaved attributes or not, do what you want as long as indices stay contiguous

The “JPEG of 3D” and its very flexible binary mesh data representation was actually the initial trigger for this work — “what if we could simply memory-map the *.glb and render directly off it?”. In my opinion the current version is a bit too limited in the choice of vertex formats (no half-floats, no 10.10.10.2 or float 11.11.10 representations for normals and quaternions), but that's largely due to its goal of being fully compatible with unextended WebGL 1, and nothing an extension couldn't fix.

To make use of a broader range of new vertex formats, Magnum's TinyGltfImporter got extended to support the KHR_mesh_quantization glTF extension, together with KHR_texture_transform, which it depends on. Compared to more involved compression schemes, quantization has the advantage of not requiring any decompression step, as the GPU can still understand the data without a problem. A quantized mesh will have its positions, normals and texture coordinates stored in the smallest possible type that can still represent the original data within reasonable error bounds. So for example texture coordinates in a range of [0.5, 0.8] will get packed to an 8-bit range [0, 255], and the offset + scale needed to dequantize them back to the original range is then provided through the texture transformation matrix. The size gains vary from model to model and depend on the ratio between texture and vertex data. To show some numbers, here's the difference with two models from the glTF-Sample-Models repository, converted using the gltfpack utility from meshoptimizer (more on that below):

[Chart: Quantization using gltfpack — JSON, image and mesh data sizes of the original vs. quantized *.glb files for the Cesium Milk Truck and Reciprocating Saw models. Image data stays identical, while mesh and JSON data shrink considerably.]
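The dequantization math itself is trivial. A sketch of what the packing and the offset + scale recovery look like for that [0.5, 0.8] example (illustrative; the exact rounding gltfpack uses may differ):

```cpp
#include <cmath>
#include <cstdint>

/* Quantizes a value from [min, max] to an 8-bit integer */
std::uint8_t quantize(float v, float min, float max) {
    return std::uint8_t(std::lround((v - min)/(max - min)*255.0f));
}

/* Recovers an approximation of the original value. The two constants
   offset = min and scale = (max - min)/255 are exactly what ends up in
   the texture transformation matrix. */
float dequantize(std::uint8_t q, float min, float max) {
    return min + float(q)*(max - min)/255.0f;
}
```

The worst-case error is half a quantization step, here 0.3/255/2 ≈ 0.0006 — well below what a texture lookup would notice.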

While packed attributes are supported by the GPU transparently, the builtin Shaders::Phong and Shaders::Flat had to be extended to support texture transform as well.

Stanford PLY

PLY — interleaved per-vertex position, normal and color data, followed by the size and indices of each face

PLY is a very simple yet surprisingly flexible and extensible format. Magnum has had the StanfordImporter plugin for years, but following the Trade::MeshData redesign it gained quite a few new features, among them support for vertex colors, normals, texture coordinates and object IDs. PLYs also support 8- and 16-bit types for vertex data, and similarly to glTF's KHR_mesh_quantization those are now imported as-is, without expansion to floats.

Because PLYs are so simple, and because PLYs are very often used for massive scanned datasets (the Stanford Bunny being the most prominent of them), I took this as an opportunity to investigate how far Magnum can reduce the import time, given that it can have the whole chain under control. Plotted below is the import time of a 613 MB scan model³ with float positions, 24-bit vertex colors and a per-face 32-bit object ID property that is purposely ignored. Measured times start with the original state before the Trade::MeshData rework, compare AssimpImporter and StanfordImporter configured for fastest import⁴, and show the effect of additional optimizations:

Import time, 613 MB little-endian PLY, Release:
  • 7.972 seconds — AssimpImporter + MeshData3D, original code
  • 7.263 seconds — AssimpImporter + MeshData, new MeshData APIs⁵
  • 2.551 seconds — StanfordImporter + MeshData3D, original code
  • 2.231 seconds — StanfordImporter + MeshData3D, w/o iostreams⁶
  • 1.360 seconds — StanfordImporter + MeshData, new MeshData APIs⁵
  • 0.875 seconds — StanfordImporter + MeshData, w/ triangle fast path⁷
  • 0.535 seconds — StanfordImporter + MeshData, one less copy on import⁸
  • 0.262 seconds — StanfordImporter + MeshData, zerocopy branch⁹
  • 0.130 seconds — cat file.ply > /dev/null, warm SSD cache
  • 0.002 seconds — Magnum's upcoming *.blob format, meshdata-cereal-killer branch⁹
^ a b For AssimpImporter, the on-by-default JoinIdenticalVertices, Triangulate and SortByPType processing options were turned off, as those increase the import time significantly for large meshes. To have a fair comparison, in the case of StanfordImporter the perFaceToPerVertex option that converts per-face attributes to per-vertex ones was turned off, to match Assimp, which ignores per-face attributes completely.
^ a b In the case of StanfordImporter, the main speedup comes from all push_back()s being replaced with Utility::copy(), which is basically a fancier std::memcpy() that works on strided arrays as well. AssimpImporter instead assign()ed the whole range at once, which is faster, and the absolute speedup was roughly the same for both — unfortunately not enough for Assimp to become significantly faster. Commits mosra/magnum-plugins@79a185b and mosra/magnum-plugins@e67c217.
^ a b The original StanfordImporter implementation was using std::getline() to parse the textual header and std::istream::read() to read the binary contents. Loading the whole file into a giant array first and then operating on that proved to be faster. Commit mosra/magnum-plugins@7d654f1.
^ PLY allows faces to be arbitrary N-gons, which means an importer has to go through each face, check its vertex count and triangulate if needed. I realized I could detect all-triangle files solely by comparing the face count with the file size and then again use Utility::copy() to copy the sparse triangle indices into a tightly packed resulting array. Commit mosra/magnum-plugins@885ba49.
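That detection is simple arithmetic. Assuming uint8 vertex counts and uint32 indices (a common binary PLY layout; the actual plugin handles other type combinations too), an all-triangle file has an exactly predictable face-chunk size:

```cpp
#include <cstddef>
#include <cstdint>

/* A binary PLY face is a vertex-count byte followed by that many indices.
   If every face is a triangle, the face chunk is exactly faceCount*13
   bytes, so there's no need to walk the faces one by one. */
bool allFacesAreTriangles(std::size_t faceChunkSize, std::size_t faceCount) {
    constexpr std::size_t triangleFaceSize =
        sizeof(std::uint8_t) + 3*sizeof(std::uint32_t); /* 13 bytes */
    return faceChunkSize == faceCount*triangleFaceSize;
}
```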
^ a b c To make plugin implementation easier, if a plugin doesn't provide a dedicated doOpenFile(), the base implementation reads the file into an array and then passes the array to doOpenData(). Together with assumptions about data ownership, it causes an extra copy that can be avoided by providing a dedicated doOpenFile() implementation. Commit mosra/magnum-plugins@8e21c2f.
^ a b If the importer can make a few more assumptions about data ownership, the returned mesh data can actually be a view onto the memory given on input, getting rid of another copy. There's still some overhead left from deinterleaving the index buffer, so it's not faster than a plain cat. A custom file format allows the import to be done in 0.002 seconds, with the actual data reading deferred to the point where the GPU needs it — and then feeding the GPU straight from a (memory-mapped) SSD. Neither of those is integrated into master yet; see A peek into the future — Magnum's own memory-mappable mesh format below.

STL (“stereolithography”)

STL — for each triangle a normal, three corner positions and optional color data

The STL format is extremely simple — just a list of triangles, each containing a normal and the positions of its corners. It's commonly used for 3D printing, and thus the internet is also full of interesting huge files for testing. Until recently, Magnum used AssimpImporter to import STLs; to do another comparison I implemented a StlImporter from scratch. Taking a 104 MB file (source, alternative), here are the times — AssimpImporter is configured the same as above⁴ and similar optimizations⁸ as in StanfordImporter were done here as well:

Import time, 104 MB STL, Release:
  • 0.329 seconds — AssimpImporter, new MeshData APIs
  • 0.184 seconds — StlImporter, initial implementation
  • 0.144 seconds — StlImporter, one less copy on import⁸
  • 0.087 seconds — StlImporter, per-face normals ignored¹⁰
  • 0.039 seconds — cat file.stl > /dev/null, warm SSD cache
^ Because the normals are per-triangle, turning them into per-vertex ones increases the data size roughly by a half (instead of 16 floats per triangle it becomes 24). Disabling this (again with the perFaceToPerVertex option) significantly improves import time. Commit mosra/magnum-plugins@e013040.

MeshOptimizer and plugin interfaces for mesh conversion

While the MeshTools library provides a versatile set of APIs for various mesh-related tasks, it'll never be able to suit the needs of everyone. Now that there's a flexible-enough mesh representation, it made sense to extend the builtin engine capabilities with external mesh conversion plugins.

The first mesh processing plugin is MeshOptimizerSceneConverter, integrating meshoptimizer by @zeuxcg. The author of this library is also responsible for the KHR_mesh_quantization extension, and it's all-round a great piece of technology. Unleashing the plugin in its default config on a mesh will perform the non-destructive operations — vertex cache optimization, overdraw optimization and vertex fetch optimization. All those operations can be done in-place on an indexed triangle mesh using convertInPlace():

Containers::Pointer<Trade::AbstractSceneConverter> meshoptimizer =
    manager.loadAndInstantiate("MeshOptimizerSceneConverter");

meshoptimizer->convertInPlace(mesh);


Okay, now what? This may look like one of those impossible Press to render fast magic buttons, and since the operation took about a second at most and didn't make the output smaller in any way, it can't really do wonders, right? Well, let's measure, now with a 179 MB scan³ containing 7.5 million triangles with positions and vertex colors, how long it takes to render before and after meshoptimizer looked at it:

Rendering 7.5 M triangles, GPU time:
  • Intel 630 — original 62.52 ms, optimized 20.95 ms
  • AMD Vega M — original 11.98 ms, optimized 9.91 ms

Rendering 7.5 M triangles, vertex fetch ratio (vertex shader invocations / all submitted vertices):
  • Intel 630 — original 0.82, optimized 0.21
  • AMD Vega M — original 0.85, optimized 0.24

To simulate a real-world scenario, the render was deliberately done from the default camera location, with a large part of the model being out of the view. Both measurements were done using the (also recently added) DebugTools::GLFrameProfiler; while GPU time measures the time the GPU spent rendering one frame, vertex fetch ratio shows how many times a vertex shader was executed compared to how many vertices were submitted in total. For a non-indexed triangle mesh the value would be exactly 1.0; with indexed meshes, the lower the value, the better the vertex reuse from the post-transform vertex cache¹¹. The results are vastly different for the two GPUs — while meshoptimizer reduced the amount of vertex shader invocations for both equally, it mainly helped the Intel GPU. One conclusion could be that the Intel GPU is bottlenecked in ALU processing, while the AMD card is not, and thus reducing vertex shader invocations doesn't matter as much there. That said, the shader used here was a simple Shaders::Phong, and the impact could likely be much bigger for the AMD card with complex PBR shaders.

^ Unfortunately the ARB_pipeline_statistics_query extension doesn't provide a way to query the count of indices submitted, so it's not possible to know the overfetch ratio — how many times the vertex shader had to be executed for a single vertex. This would only be possible if the submitted indices were counted on the engine side.

Apart from the above, the MeshOptimizerSceneConverter plugin can also optionally decimate meshes. As that is a destructive operation, it's not enabled by default, but you can enable and configure it using plugin-specific options:

meshoptimizer->configuration().setValue("simplify", true);
meshoptimizer->configuration().setValue("simplifyTargetIndexCountThreshold", 0.5f);
Containers::Optional<Trade::MeshData> simplified = meshoptimizer->convert(mesh);

Together with the mesh processing plugins, and similarly to image converters, there's a new magnum-sceneconverter command-line tool that makes it possible to use these plugins together with various mesh tools directly on scene files. Its use is quite limited at this point, as the only supported output format is PLY (via StanfordSceneConverter), but the tool will gradually become more powerful, with more output formats. As an example, here it first prints info about the mesh, then takes just the first attribute, discarding per-face normals, removes duplicate vertices, processes the data with meshoptimizer on default settings and saves the output to a PLY:

magnum-sceneconverter dragon.stl --info
Mesh 0:
  Level 0: MeshPrimitive::Triangles, 6509526 vertices (152567.0 kB)
    Offset 0: Trade::MeshAttribute::Position @ VertexFormat::Vector3, stride 24
    Offset 12: Trade::MeshAttribute::Normal @ VertexFormat::Vector3, stride 24
magnum-sceneconverter dragon.stl dragon.ply \
    --only-attributes "0" \
    --remove-duplicates \
    --converter MeshOptimizerSceneConverter -v
Trade::AnySceneImporter::openFile(): using StlImporter
Duplicate removal: 6509526 -> 1084923 vertices
Trade::MeshOptimizerSceneConverter::convert(): processing stats:
  vertex cache:
    5096497 -> 1502463 transformed vertices
    1 -> 1 executed warps
    ACMR 2.34879 -> 0.69243
    ATVR 4.69757 -> 1.38486
  vertex fetch:
    228326592 -> 24462720 bytes fetched
    overfetch 17.5378 -> 1.87899
    107733 -> 102292 shaded pixels
    101514 -> 101514 covered pixels
    overdraw 1.06126 -> 1.00766
Trade::AnySceneConverter::convertToFile(): using StanfordSceneConverter

The -v option translates to Trade::SceneConverterFlag::Verbose, which is another new feature that enables plugins to print extended info about import or processing. In the case of MeshOptimizerSceneConverter it analyzes the mesh before and after, calculating the average cache miss ratio, overdraw and other useful metrics for mesh rendering efficiency.
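For the curious, the ACMR number in the stats above can be approximated with a simple FIFO cache simulation: transformed vertices divided by triangle count. Real GPUs and meshoptimizer's analyzer are more sophisticated; this sketch only illustrates what the metric means:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

/* Average cache miss ratio: how many vertices get transformed per
   triangle, simulated with a FIFO post-transform cache. 3.0 is the worst
   case (no reuse at all); well-optimized meshes approach 0.5. */
double acmr(const std::vector<std::uint32_t>& indices,
    std::size_t cacheSize = 32)
{
    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;
    for(std::uint32_t index: indices) {
        if(std::find(cache.begin(), cache.end(), index) == cache.end()) {
            ++misses; /* the vertex shader has to run for this index */
            cache.push_back(index);
            if(cache.size() > cacheSize) cache.pop_front();
        }
    }
    return double(misses)/double(indices.size()/3);
}
```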

Going further — custom attributes, face and edge properties, meshlets

To have the mesh data representation truly future-proofed, it isn't enough to limit its support to just the “classical” indexed meshes with attributes of predefined semantics and a (broad, but hardcoded) set of vertex formats.

Regarding vertex formats, similarly to what's done since 2018.04 for pixel formats, a mesh can contain any attribute in an implementation-specific format. One example could be normals packed into VK_FORMAT_A2R10G10B10_SNORM_PACK32 (which currently doesn't have a generic equivalent in VertexFormat) — code that consumes the Trade::MeshData instance can then unwrap the implementation-specific vertex format and pass it directly to the corresponding GPU API. Note that because the library has no way to know anything about the sizes of implementation-specific formats, such instances have only limited use in MeshTools algorithms.

Trade::MeshAttributeData normals{Trade::MeshAttribute::Normal,
    vertexFormatWrap(VK_FORMAT_A2R10G10B10_SNORM_PACK32), data};

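To illustrate what such a packed format holds, encoding a signed-normalized component into 10 bits looks roughly like this. This is a sketch of the A2R10G10B10 bit layout; real encoders also fill the 2-bit alpha channel and define rounding more carefully:

```cpp
#include <cmath>
#include <cstdint>

/* One signed-normalized 10-bit component: [-1, 1] mapped to [-511, 511],
   stored in two's complement */
std::uint32_t packSnorm10(float v) {
    return std::uint32_t(std::int32_t(std::lround(v*511.0f))) & 0x3ff;
}

/* A2R10G10B10 packed layout: B in bits 0-9, G in bits 10-19, R in bits
   20-29, the 2-bit A in bits 30-31 (left zero here) */
std::uint32_t packNormal(float x, float y, float z) {
    return (packSnorm10(x) << 20) | (packSnorm10(y) << 10) | packSnorm10(z);
}
```

Three 10-bit components plus two spare bits in a single 32-bit word, a third of what three 32-bit floats would take.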
Meshes don't stop at just points, lines or triangles anymore. Together with Trade::AbstractImporter::mesh() allowing a second parameter specifying the mesh level (similarly to image mip levels), this opens new possibilities — the STL and PLY importers already use it to retain per-face properties, as shown below on one of the pbrt-v3 sample scenes:

# Disabling the perFaceToPerVertex option to keep face properties as-is
magnum-sceneconverter dragon_remeshed.ply --info \
    --importer StanfordImporter -i perFaceToPerVertex=false
Mesh 0 (referenced by 0 objects):
  Level 0: MeshPrimitive::Triangles, 924422 vertices (10833.1 kB)
    5545806 indices @ MeshIndexType::UnsignedInt (21663.3 kB)
    Offset 0: Trade::MeshAttribute::Position @ VertexFormat::Vector3, stride 12
  Level 1: MeshPrimitive::Faces, 1848602 vertices (21663.3 kB)
    Offset 0: Trade::MeshAttribute::Normal @ VertexFormat::Vector3, stride 12

Among other possibilities is using MeshPrimitive::Edges to store meshes in a half-edge representation (the endlessly-flexible PLY format even has support for per-edge data, although the importer doesn't support that yet), MeshPrimitive::Instances to store instance data (for example to implement the proposed glTF EXT_mesh_gpu_instancing extension) or simply providing additional LOD levels (glTF has the MSFT_lod extension for this).

~ ~ ~

struct meshopt_Meshlet {
    unsigned int vertices[64];
    unsigned char indices[126][3];
    unsigned char triangle_count;
    unsigned char vertex_count;
};

Source: meshoptimizer.h

Ultimately, we're not limited to predefined primitive and attribute types either. The most prominent example of using this newly gained flexibility is mesh shaders and meshlets. Meshlets are a technique that is becoming more and more important for dealing with heavy geometry, and meshoptimizer has experimental support for those¹². For a given input it generates a sequence of statically-defined meshopt_Meshlet structures that are then meant to be fed straight to the GPU.

Describing such data in a Trade::MeshData instance is a matter of defining a set of custom attribute names and listing their offsets, types and array sizes, as shown below. While a bit verbose at first look, an advantage of being able to specify the layout dynamically is that the same attributes can work for representations from other tools as well, such as meshlete.

/* Pick any numbers that don't conflict with your other custom attributes */
constexpr auto Meshlet = meshPrimitiveWrap(0xabcd);
constexpr auto MeshletVertices = Trade::meshAttributeCustom(1);
constexpr auto MeshletIndices = Trade::meshAttributeCustom(2);
constexpr auto MeshletTriangleCount = Trade::meshAttributeCustom(3);
constexpr auto MeshletVertexCount = Trade::meshAttributeCustom(4);

Trade::MeshData meshlets{Meshlet, std::move(meshletData), {
    Trade::MeshAttributeData{MeshletVertices, VertexFormat::UnsignedInt,
        offsetof(meshopt_Meshlet, vertices), 0, sizeof(meshopt_Meshlet), 64},
    Trade::MeshAttributeData{MeshletIndices, VertexFormat::Vector3ub,
        offsetof(meshopt_Meshlet, indices), 0, sizeof(meshopt_Meshlet), 126},
    Trade::MeshAttributeData{MeshletTriangleCount, VertexFormat::UnsignedByte,
        offsetof(meshopt_Meshlet, triangle_count), 0, sizeof(meshopt_Meshlet)},
    Trade::MeshAttributeData{MeshletVertexCount, VertexFormat::UnsignedByte,
        offsetof(meshopt_Meshlet, vertex_count), 0, sizeof(meshopt_Meshlet)},
}, meshletCount};
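To see how those offsets, strides and array sizes line up with the C struct, here's the same layout mirrored in plain C++ with the numbers checked via offsetof (assuming the 64-vertex / 126-triangle meshlet size from the struct above):

```cpp
#include <cstddef>
#include <cstdint>

/* Plain C++ mirror of meshopt_Meshlet from above */
struct Meshlet {
    std::uint32_t vertices[64];   /* array attribute, array size 64 */
    std::uint8_t indices[126][3]; /* array attribute of Vector3ub, size 126 */
    std::uint8_t triangle_count;
    std::uint8_t vertex_count;
};

/* These are the offsets passed to the MeshAttributeData list, and
   sizeof(Meshlet) is the common per-meshlet stride */
static_assert(offsetof(Meshlet, vertices) == 0, "");
static_assert(offsetof(Meshlet, indices) == 64*sizeof(std::uint32_t), "");
static_assert(offsetof(Meshlet, triangle_count) == 256 + 126*3, "");
```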

One important thing to note is the array attributes — those are accessed with a special syntax and give you a 2D view instead of a 1D one:

Containers::StridedArrayView1D<const UnsignedByte> triangleCount =
    meshlets.attribute<UnsignedByte>(MeshletTriangleCount);
Containers::StridedArrayView2D<const Vector3ub> indices =
    meshlets.attribute<Vector3ub[]>(MeshletIndices);

for(std::size_t i = 0; i != meshlets.vertexCount(); ++i) {
    for(Vector3ub triangle: indices[i].prefix(triangleCount[i])) {
        // do something with each triangle of meshlet i …
    }
}
^ Not mentioned in the project README — you need to look directly in the source. At the time of writing, meshlet generation isn't integrated into the MeshOptimizerSceneConverter plugin yet, but it will be once I get hardware to test the whole mesh shader pipeline on. If you want to play with mesh shaders, there's a recent introduction on Geeks3D covering both OpenGL and Vulkan.

Vi­su­al­ize the hap­pi­ness of your da­ta

When working with mesh data of varying quality and complexity, it's often necessary to know not only how the mesh renders, but also why it renders that way. Shaders::MeshVisualizer got extended to have both a 2D and a 3D variant, and it can now visualize not just wireframe, but also tangent space13 (useful when you need to know why your lighting or a normal map is off), object ID (for semantic annotations, or for example when you have multiple meshes batched together14), and finally two simple but very important properties — primitive and vertex ID.

On the truck wheel15 above you can see a very “rainbowy” primitive ID visualization, which hints that the vertices are not rendered in an order that would make good use of the vertex cache (and which MeshOptimizerSceneConverter can help with). Vertex ID, on the other hand, can point to discontinuities in the mesh — even though the blanket3 above would look like a smooth continuous mesh to the naked eye, the visualization uncovers that almost none of the triangles share a common vertex, which will likely cause issues for example when decimating the mesh or using it for collision detection.

Supplementary to the mesh visualizer is a gallery of color maps for balanced and easily recognizable visualizations. The above images were created using the Turbo color map, and DebugTools::ColorMap provides four more to choose from.

Lastly, and as already mentioned above, you're encouraged to use DebugTools::FrameProfiler to measure various aspects of the mesh renderer, both on the CPU and GPU side, with builtin support for custom measurements and delayed queries to avoid stalls. Hooking up this profiler doesn't mean you suddenly need to deal with UI and text rendering — it can simply print its output to a terminal as well, refreshing itself every once in a while:

Last 50 frames:
  Frame time: 16.65 ms
  CPU duration: 14.72 ms
  GPU duration: 10.89 ms
  Vertex fetch ratio: 0.24
  Primitives clipped: 59.67 %
^ a b c The Mat­ter­port3D in­door en­vi­ron­ment scans were used as a source for var­i­ous tim­ings, bench­marks and vi­su­al­iza­tions
^ Mod­el source: Lantern from the glTF Sam­ple Mod­els repos­i­to­ry
^ Screen­shot from a se­man­tics-an­no­tat­ed scan from the Repli­ca dataset
^ Mod­el source: Ce­sium Milk Truck from the glTF Sam­ple Mod­els repos­i­to­ry

Ref­er­enc­ing ex­ter­nal da­ta, avoid­ing copies

One of the ubiq­ui­tous an­noy­ing prob­lems when deal­ing with STL con­tain­ers is mem­o­ry man­age­ment in­flex­i­bil­i­ty — you can’t re­al­ly16 con­vince a std::vec­tor to ref­er­ence ex­ter­nal mem­o­ry or, con­verse­ly, re­lease its stor­age and re­use it else­where. The new Trade::Mesh­Da­ta (and Trade::An­i­ma­tion­Da­ta + Trade::Im­age­Da­ta as well, for that mat­ter) learned from past mis­takes and can act as a non-own­ing ref­er­ence to ex­ter­nal in­dex and ver­tex buf­fers as well as at­tribute de­scrip­tions.

For example, it's possible to store the index and vertex buffer for a particular model in constant memory and have Trade::MeshData just reference it, without any allocations or copies. In Magnum itself this is used by certain primitives such as Primitives::cubeSolid() — since a cube is practically always the same, it doesn't make sense to build a copy of it in dynamic memory every time.

An­oth­er thing the API was ex­plic­it­ly de­signed for is shar­ing a sin­gle large buf­fer among mul­ti­ple mesh­es — imag­ine a glTF file con­tain­ing sev­er­al dif­fer­ent mesh­es, but all shar­ing a sin­gle buf­fer that you up­load just once:

/* Shared for all meshes */
Containers::ArrayView<const char> indexData;
Containers::ArrayView<const char> vertexData;
GL::Buffer indices{indexData};
GL::Buffer vertices{vertexData};

GL::Mesh chair = MeshTools::compile(chairData, indices, vertices);
GL::Mesh tree = MeshTools::compile(treeData, indices, vertices);
// …

Last­ly, noth­ing pre­vents Trade::Mesh­Da­ta from work­ing in an “in­verse” way — first use it to up­load a GPU buf­fer, and then use the same at­tribute lay­out to con­ve­nient­ly per­form mod­i­fi­ca­tions when the buf­fer gets mapped back to CPU mem­o­ry lat­er.

^ Stan­dard Li­brary de­sign ad­vo­cates would men­tion that you can use a cus­tom al­lo­ca­tor to achieve that. While that’s tech­ni­cal­ly true, it’s not a prac­ti­cal so­lu­tion, con­sid­er­ing the sheer amount of code you need to write for an al­lo­ca­tor (when all you re­al­ly need is a cus­tom deleter). Al­so, have fun con­vinc­ing 3rd par­ty ven­dors that you need all their APIs to ac­cept std::vec­tors with cus­tom al­lo­ca­tors.

A peek in­to the fu­ture — Mag­num’s own mem­o­ry-map­pable mesh for­mat

Expanding further on the above-mentioned ability to reference external data, it's now possible to have Trade::MeshData point directly to the contents of a memory-mapped file in a compatible format, achieving truly zero-copy asset loading. This is, to some extent, possible with all three file formats mentioned above — STL, PLY and glTF. A work-in-progress PR enabling this is mosra/magnum#240; what I still need to figure out is the interaction between memory ownership and custom file loading callbacks. Plus, in the case of glTF, it requires writing a new importer plugin based on cgltf, as TinyGltfImporter (and tiny_gltf in particular) can't really be convinced to work with external buffers due to its heavy reliance on std::vectors.

At some point I re­al­ized that even with all flex­i­bil­i­ty that glTF pro­vides, it’s still not ide­al due to its re­liance on JSON, which can have a large im­pact on down­load sizes of We­bAssem­bly builds.

What would a minimalist file format tailored for Magnum look like, if we removed everything that can be removed? To avoid complex parsing and data logistics, the file format should be as close to the binary representation of Trade::MeshData as possible, allowing the actual payload to be used directly without any processing, with the deserialization process being just a handful of sanity and range checks. With that, it's possible to have import times smaller than what a cp file.blob > /dev/null would take (as shown above), because we don't actually need to read through all the data up front — only when a given portion of the file is meant to be uploaded to the GPU or processed in some other way:

/* Takes basically no time */
Containers::Array<char, Utility::Directory::MapDeleter> blob =
    Utility::Directory::mapRead("file.blob");

/* Does a bunch of checks and returns views onto `blob` */
Containers::Optional<Trade::MeshData> chair = Trade::MeshData::deserialize(blob);

Another aspect of the format is easy composability and extensibility — inspired by RIFF and the design of the PNG file header, it's made of sized chunks that can be freely combined, allowing the consumer to pick just a subset and ignore the rest. Packing a bunch of meshes of diverse formats together into a single file could then look like this:

magnum-sceneconverter file.blend --mesh "chair" chair.blob
magnum-sceneconverter scene.glb --mesh "tree" tree.blob
cat chair.blob tree.blob car.blob > blobs.blob # because why not

Ini­tial work­ing im­ple­men­ta­tion of all the above to­geth­er with de­tailed for­mat spec­i­fi­ca­tion is in mosra/mag­num#427, and the end goal is to be able to de­scribe not just mesh­es but whole scenes. It’s cur­rent­ly liv­ing in a branch be­cause the last thing a file for­mat needs is com­pat­i­bil­i­ty is­sues — it still needs a few more it­er­a­tions be­fore its de­sign set­tles down. This then goes hand-in-hand with ex­tend­ing Trade::Ab­stractSceneCon­vert­er to sup­port more than just mesh­es alone, thus al­so mak­ing it pos­si­ble to out­put glTF files with mag­num-scenecon­vert­er, among oth­er things.

* * *

And that’s it for now. Thanks for read­ing and stay tuned for fur­ther ad­vances in op­ti­miz­ing the as­set pipe­line.
