Flexible and efficient mesh representation, custom attributes, new data types and a ton of new processing, visualization and analysis tools. GPU-friendly geometry storage as it should be in the 21st century.

During the past six months, Magnum has undergone a rather massive rework of its very central parts — mesh data import and storage. The original (and now deprecated) Trade::MeshData2D / Trade::MeshData3D classes stayed basically intact since the early 2010s, when Magnum was nothing more than a toy project of one bored university student, and were overdue for a replacement.

How not to do things

While the GL::Mesh and GL::Attribute on the renderer side provided all imaginable options for data layout and vertex formats, the flexibility bottleneck was on the importer side. Increasingly unhappy about the limitations, I ended up suggesting that people just sidestep the Trade APIs and make their own representation whenever they needed to do anything non-trivial. However, working on the replacement, I discovered — the horror — that Magnum was far from the only library with such limitations embedded in its design.

explicit MeshData3D(MeshPrimitive primitive,
    std::vector<UnsignedInt> indices,
    std::vector<std::vector<Vector3>> positions,
    std::vector<std::vector<Vector3>> normals,
    std::vector<std::vector<Vector2>> textureCoords2D,
    std::vector<std::vector<Color4>> colors,
    const void* importerState = nullptr);

Source: Magnum/Trade/MeshData3D.h (deprecated)

Here is the original Magnum API. While it allowed multiple sets of all attributes (usable in mesh morphing, for example), adding a new attribute type meant adding another vector-of-vectors (and updating calls to this constructor everywhere), not to mention the lack of support for any sort of custom attributes or the ability to store different data types. The importerState is an extension point that allows accessing arbitrary additional data, but it's plugin-dependent and thus not usable in a generic way.

struct aiMesh
{
    aiVector3D* mVertices;
    aiVector3D* mNormals;
    aiVector3D* mTangents;
    aiVector3D* mBitangents;
    aiColor4D* mColors[AI_MAX_NUMBER_OF_COLOR_SETS];
    aiVector3D* mTextureCoords[AI_MAX_NUMBER_OF_TEXTURECOORDS];
    aiFace* mFaces;
    // …
};

Source: assimp/mesh.h

Perhaps the most widely used asset import library, Assimp, is very similar. All attributes are tightly packed in fixed types, and while it supports a few more attribute types compared to the original Magnum API, it has no custom attributes or formats either.

Fixed index and attribute types mean that input data has to be de-interleaved and expanded to 32-bit ints and floats in order to be stored here, only to get interleaved and packed again later for an efficient representation on the GPU. Both of those representations also own the data, meaning you can't use them to reference external memory (for example a memory-mapped file or a GPU buffer).

The ultimate winner of this contest, however, is libIGL, with the following function signature. Granted, it's templated to allow you to choose a different index and scalar type, but you have to choose the type upfront and not based on what the file actually contains, which kinda defeats the purpose. What's the most amazing though is that every position and normal is a three-component std::vector, every texture coordinate a two-component vector, and then each face is represented by another three vector instances. So if you load a 5M-vertex mesh with 10M faces (which is not that uncommon if you deal with real data), it'll be spread across 45 million allocations (5M inner vectors each for V, TC and N, plus 10M each for F, FTC and FN). Even with all the flexibility kept, it could be just a handful¹, but why keep your feet on the ground, right? The std::string passed by value is just a nice touch on top.

template <typename Scalar, typename Index>
IGL_INLINE bool readOBJ(
  const std::string obj_file_name,
  std::vector<std::vector<Scalar > > & V,
  std::vector<std::vector<Scalar > > & TC,
  std::vector<std::vector<Scalar > > & N,
  std::vector<std::vector<Index > > & F,
  std::vector<std::vector<Index > > & FTC,
  std::vector<std::vector<Index > > & FN,
  std::vector<std::tuple<std::string, Index, Index >> &FM
  );

Source: igl/readOBJ.h

1.
^ To be fair, libIGL has an overload that puts the result into just six regularly-shaped Eigen matrices. However, it's implemented on top of the above (so you still need a military-grade allocator) and it requires you to know beforehand that all faces in the file have the same size.

Can we do better?

The original pipeline (and many importer libraries as well) was designed with the assumption that a file has to be parsed in order to get the geometry data out of it. That was a sensible decision for classic textual formats such as OBJ, COLLADA or OpenGEX, and there was little point in parsing those into anything other than 32-bit floats and integers. For such formats a relatively massive amount of processing was needed either way, so a bunch more copies and data packing at the end didn't really matter.

The new pipeline turns this assumption upside down, and instead builds on a simple design goal — being able to understand anything that the GPU can understand as well. Interleaved data or not, half-floats, packed formats, arbitrary padding and alignment, custom application-specific attributes and so on. Then, assuming a file already has the data exactly as we want it, it can simply copy the binary blob over to the GPU and only parse the metadata describing offsets, strides and formats.

For the textual formats (and rigidly-designed 3rd party importer libraries) it means the importer plugin now has to do extra work that involves packing the data into a single buffer. But that's an optimization done on the right side — with increasing model complexity it will make less and less sense to store the data in a textual format.

Enter the new MeshData

The new Trade::MeshData class accepts just two memory buffers — a typeless index buffer and a typeless vertex buffer. The rest is supplied as metadata, with Containers::StridedArrayView powering the data access (be sure to check out the original article on strided views). This, along with the ability to supply any MeshIndexType and VertexFormat, gives you almost unlimited² freedom of expression. As an example, let's say you have your positions as half-floats, normals packed in bytes and a custom per-vertex material ID attribute for deferred rendering, complete with padding to ensure vertices are aligned to four-byte addresses:

struct Vertex {
    Vector3h position;
    Vector2b normal;
    UnsignedShort:16;
    UnsignedShort objectId;
};

Containers::Array<char> indexData;
Containers::Array<char> vertexData;

Trade::MeshIndexData indices{MeshIndexType::UnsignedShort, indexData};
Trade::MeshData meshData{MeshPrimitive::Triangles,
    std::move(indexData), indices,
    std::move(vertexData), {
        Trade::MeshAttributeData{Trade::MeshAttribute::Position,
            VertexFormat::Vector3h, offsetof(Vertex, position),
            vertexCount, sizeof(Vertex)},
        Trade::MeshAttributeData{Trade::MeshAttribute::Normal,
            VertexFormat::Vector2bNormalized, offsetof(Vertex, normal),
            vertexCount, sizeof(Vertex)},
        Trade::MeshAttributeData{Trade::MeshAttribute::ObjectId,
            VertexFormat::UnsignedShort, offsetof(Vertex, objectId),
            vertexCount, sizeof(Vertex)}
    }
};

The resulting meshData variable is a self-contained instance containing all vertex and index data of the mesh. You can then for example pass it directly to MeshTools::compile() — which will upload the indexData and vertexData as-is to the GPU without any processing, and configure the mesh so the builtin shaders can transparently interpret the half-floats and normalized bytes as 32-bit floats:

GL::Mesh mesh = MeshTools::compile(meshData);
Shaders::Phong{}.draw(mesh);

The data isn't hidden from you either — using indices() or attribute() you can directly access the indices and particular attributes in a matching concrete type …

Containers::StridedArrayView1D<const UnsignedShort> objectIds =
    meshData.attribute<UnsignedShort>(Trade::MeshAttribute::ObjectId);
for(UnsignedShort objectId: objectIds) {
    // …
}

… and because there are many possible types and not all of them are directly usable (such as the half-floats), there are indicesAsArray(), positions3DAsArray(), normalsAsArray() etc. convenience accessors that give you the attribute unpacked to a canonical type, so it can be used easily in contexts that assume 32-bit floats. For example, calculating an AABB of whatever position type is just a one-liner:

Range3D aabb = Math::minmax(meshData.positions3DAsArray());

Among the evolutionary things, mesh attribute support got extended with tangents and bitangents (in both representations, either a four-component tangent that glTF uses or a separate three-component bitangent that Assimp uses), and @Squareys is working on adding support for vertex weights and joint IDs in mosra/magnum#441.

2.
^ You still need to obey the limitations given by the GPU, such as the index buffer being contiguous, all attributes having the same index buffer or all faces being triangles. Unless you go with meshlets.

Tools to help you around

Of course one doesn't always have data already packed in an ideal way, and doing the packing by hand is tedious and error-prone. For that, the MeshTools library got extended with various utilities operating directly on Trade::MeshData. Here's how you could use MeshTools::interleave() to create the above packed representation from a bunch of contiguous arrays, possibly together with Math::packInto(), Math::packHalfInto() and similar. Where possible, the actual VertexFormat is inferred from the passed view type:

Containers::ArrayView<const Vector3h> positions;
Containers::ArrayView<const Vector2b> normals;
Containers::ArrayView<const UnsignedShort> objectIds;
Containers::ArrayView<const UnsignedShort> indices;

Trade::MeshData meshData = MeshTools::interleave(
    Trade::MeshData{MeshPrimitive::Triangles,
        {}, indices, Trade::MeshIndexData{indices}, UnsignedInt(positions.size())},
    {Trade::MeshAttributeData{Trade::MeshAttribute::Position, positions},
     Trade::MeshAttributeData{Trade::MeshAttribute::Normal, normals},
     Trade::MeshAttributeData{Trade::MeshAttribute::ObjectId, objectIds}}
);

Thanks to the flexibility of Trade::MeshData, many historically quite verbose operations are now available through single-argument APIs. Taking a mesh, interleaving its attributes, removing duplicate vertices and finally packing the index buffer to the smallest type that can represent the given range can be done by chaining MeshTools::interleave(), MeshTools::removeDuplicates() and MeshTools::compressIndices():

Trade::MeshData optimized = MeshTools::compressIndices(
                                MeshTools::removeDuplicates(
                                    MeshTools::interleave(mesh)));

There's also MeshTools::concatenate() for merging multiple meshes together, MeshTools::generateIndices() for converting strips, loops and fans to indexed lines and triangles, and others. Except for potential restrictions coming from the given algorithm, each of those works on an arbitrary instance, be it an indexed mesh or not, with any kind of attributes.

Apart from the high-level APIs working on Trade::MeshData instances, the existing MeshTools algorithms that work directly on data arrays were ported from std::vector to Containers::StridedArrayView, meaning they can be used on a much broader range of inputs.

Binary file formats make the computer happy

With a mesh representation matching GPU capabilities 1:1, let's look at a few examples of binary file formats that could make use of it, their flexibility and how they perform.

glTF

glTF — interleaved attributes or not, do what you want as long as indices stay contiguous

The “JPEG of 3D” and its very flexible binary mesh data representation was actually the initial trigger for this work — “what if we could simply memory-map the *.glb and render directly off it?”. In my opinion the current version is a bit too limited in its choice of vertex formats (no half-floats, no 10.10.10.2 or float 11.11.10 representations for normals and quaternions), but that's largely due to its goal of being fully compatible with unextended WebGL 1, and nothing an extension couldn't fix.

To make use of a broader range of new vertex formats, Magnum's TinyGltfImporter got extended to support the KHR_mesh_quantization glTF extension, together with KHR_texture_transform, which it depends on. Compared to more involved compression schemes, quantization has the advantage of not requiring any decompression step, as the GPU can still understand the data without a problem. A quantized mesh will have its positions, normals and texture coordinates stored in the smallest possible type that can still represent the original data within reasonable error bounds. So for example texture coordinates in a range of [0.5, 0.8] will get packed to an 8-bit range [0, 255], and the offset + scale needed to dequantize them back to the original range is then provided through the texture transformation matrix. The size gains vary from model to model and depend on the ratio between texture and vertex data. To show some numbers, here's the difference for two models from the glTF-Sample-Models repository, converted using the gltfpack utility from meshoptimizer (more on that below):

Quantization using gltfpack:

                     Cesium Milk Truck       Reciprocating Saw
                     original   quantized    original    quantized
  JSON data size       5.0 kB      4.8 kB     64.0 kB       8.0 kB
  image data size    417.6 kB    417.6 kB      0.0 kB       0.0 kB
  mesh data size     106.8 kB     75.6 kB   3416.0 kB    2984.0 kB

While packed attributes are supported by the GPU transparently, the builtin Shaders::Phong and Shaders::Flat had to be extended to support texture transformation as well.

Stanford PLY

PLY — interleaved per-vertex position, normal and color data, followed by the size and indices of each face

PLY is a very simple, yet surprisingly flexible and extensible format. Magnum has had the StanfordImporter plugin for years, but following the Trade::MeshData redesign it gained quite a few new features, among them support for vertex colors, normals, texture coordinates and object IDs. PLYs also support 8- and 16-bit types for vertex data, and similarly to glTF's KHR_mesh_quantization these are now imported as-is, without expansion to floats.
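To give a concrete idea of the format's self-describing nature, here's roughly what the header of a binary PLY with float positions, 8-bit vertex colors and a per-face property could look like. This is an illustrative sample; the element counts and the object_id property name are made up:

```
ply
format binary_little_endian 1.0
comment illustrative header only
element vertex 5000000
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
element face 10000000
property list uchar uint vertex_indices
property uint object_id
end_header
```

Everything after end_header is a tightly packed binary blob laid out exactly as the header describes, which is what makes the format such a good match for the new pipeline.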

Because PLYs are so simple, and because PLYs are very often used for massive scanned datasets (the Stanford Bunny being the most prominent of them), I took this as an opportunity to investigate how far Magnum can reduce the import time, given that it can have the whole chain under control. Plotted below is the import time of a 613 MB scan model³ with float positions, 24-bit vertex colors and a per-face 32-bit object ID property that is deliberately ignored. Measured times start with the original state before the Trade::MeshData rework, compare AssimpImporter and StanfordImporter configured for fastest import⁴, and show the effect of additional optimizations:

Import time, 613 MB little-endian PLY, Release build:

  • AssimpImporter + MeshData3D (original code): 7.972 seconds
  • AssimpImporter + MeshData (new MeshData APIs⁵): 7.263 seconds
  • StanfordImporter + MeshData3D (original code): 2.551 seconds
  • StanfordImporter + MeshData3D (w/o iostreams⁶): 2.231 seconds
  • StanfordImporter + MeshData (new MeshData APIs⁵): 1.36 seconds
  • StanfordImporter + MeshData (w/ triangle fast path⁷): 0.875 seconds
  • StanfordImporter + MeshData (one less copy on import⁸): 0.535 seconds
  • StanfordImporter + MeshData (zerocopy branch⁹): 0.262 seconds
  • cat file.ply > /dev/null (warm SSD cache): 0.13 seconds
  • Magnum's upcoming *.blob format (meshdata-cereal-killer branch⁹): 0.002 seconds
4.
^ a b For AssimpImporter, the on-by-default JoinIdenticalVertices, Triangulate and SortByPType processing options were turned off, as those increase the import time significantly for large meshes. To have a fair comparison, in case of StanfordImporter the perFaceToPerVertex option that converts per-face attributes to per-vertex ones was turned off, to match Assimp, which ignores per-face attributes completely.
5.
^ In case of StanfordImporter, the main speedup comes from all push_back()s replaced with a Utility::copy(), which is basically a fancier std::memcpy() that works on strided arrays as well. AssimpImporter instead assign()ed the whole range at once, which is faster; however the absolute speedup was roughly the same for both. Unfortunately not enough for Assimp to become significantly faster. Commits mosra/magnum-plugins@79a185b and mosra/magnum-plugins@e67c217.
6.
^ a b The original StanfordImporter implementation was using std::getline() to parse the textual header and std::istream::read() to read the binary contents. Loading the whole file into a giant array first and then operating on that proved to be faster. Commit mosra/magnum-plugins@7d654f1.
7.
^ PLY allows faces to be arbitrary N-gons, which means an importer has to go through each face, check its vertex count and triangulate if needed. I realized I could detect all-triangle files solely by comparing the face count with the file size, and then again use Utility::copy() to copy the sparse triangle indices to a tightly packed resulting array. Commit mosra/magnum-plugins@885ba49.
8.
^ a b c To make plugin implementation easier, if a plugin doesn't provide a dedicated doOpenFile(), the base implementation reads the file into an array and then passes that array to doOpenData(). Together with assumptions about data ownership, this causes an extra copy that can be avoided by providing a dedicated doOpenFile() implementation. Commit mosra/magnum-plugins@8e21c2f.
9.
^ a b If the importer can make a few more assumptions about data ownership, the returned mesh data can actually be a view onto the memory given on input, getting rid of another copy. There's still some overhead left from deinterleaving the index buffer, so it's not faster than a plain cat. A custom file format allows the import to be done in 0.002 seconds, with the actual data reading deferred to the point where the GPU needs it — and then feeding the GPU straight from a (memory-mapped) SSD. Neither of those is integrated into master yet; see A peek into the future — Magnum's own memory-mappable mesh format below.

STL (“stereolithography”)

STL — for each triangle a normal, three corner positions and optional color data

The STL format is extremely simple — just a list of triangles, each containing a normal and the positions of its corners. It's commonly used for 3D printing, and thus the internet is also full of interesting huge files for testing. Until recently, Magnum used AssimpImporter to import STLs, and to do another comparison I implemented a StlImporter from scratch. Taking a 104 MB file (source, alternative), here are the times — AssimpImporter is configured the same as above⁴ and similar optimizations⁸ as in StanfordImporter were done here as well:

Import time, 104 MB STL, Release build:

  • AssimpImporter (new MeshData APIs): 0.329 seconds
  • StlImporter (initial implementation): 0.184 seconds
  • StlImporter (one less copy on import⁸): 0.144 seconds
  • StlImporter (per-face normals ignored¹⁰): 0.087 seconds
  • cat file.stl > /dev/null (warm SSD cache): 0.039 seconds
10.
^ Because the normals are per-triangle, turning them into per-vertex data increases the data size roughly by half (instead of 16 floats per triangle it becomes 24). Disabling this (again with the perFaceToPerVertex option) significantly improves import time. Commit mosra/magnum-plugins@e013040.

MeshOptimizer and plugin interfaces for mesh conversion

While the MeshTools library provides a versatile set of APIs for various mesh-related tasks, it'll never be able to suit the needs of everyone. Now that there's a flexible-enough mesh representation, it made sense to extend the builtin engine capabilities with external mesh conversion plugins.

The first mesh processing plugin is MeshOptimizerSceneConverter, integrating meshoptimizer by @zeuxcg. The author of this library is also responsible for the KHR_mesh_quantization extension, and it's all-round a great piece of technology. Unleashing the plugin in its default configuration on a mesh will perform the non-destructive operations — vertex cache optimization, overdraw optimization and vertex fetch optimization. All of those can be done in-place on an indexed triangle mesh using convertInPlace():

Containers::Pointer<Trade::AbstractSceneConverter> meshoptimizer =
    manager.loadAndInstantiate("MeshOptimizerSceneConverter");

meshoptimizer->convertInPlace(mesh);

Okay, now what? This may look like one of those impossible Press to render fast magic buttons, and since the operation took about a second at most and didn't make the output smaller in any way, it can't really do wonders, right? Well, let's measure, now with a 179 MB scan³ containing 7.5 million triangles with positions and vertex colors, how long it takes to render before and after meshoptimizer has looked at it:

Rendering 7.5 M triangles, GPU time:

  • Intel 630, original: 62.52 ms
  • Intel 630, optimized: 20.95 ms
  • AMD Vega M, original: 11.98 ms
  • AMD Vega M, optimized: 9.91 ms

Rendering 7.5 M triangles, vertex fetch ratio (vertex shader invocations / all submitted vertices):

  • Intel 630, original: 0.82
  • Intel 630, optimized: 0.21
  • AMD Vega M, original: 0.85
  • AMD Vega M, optimized: 0.24

To simulate a real-world scenario, the render was deliberately done from the default camera location, with a large part of the model being out of the view. Both measurements were done using the (also recently added) DebugTools::GLFrameProfiler; while GPU time measures the time the GPU spent rendering one frame, vertex fetch ratio shows how many times a vertex shader was executed compared to how many vertices were submitted in total. For a non-indexed triangle mesh the value would be exactly 1.0; with indexed meshes, the lower the value, the better the vertex reuse from the post-transform vertex cache¹¹. The results are vastly different for the two GPUs, and while meshoptimizer reduced the amount of vertex shader invocations for both equally, it mainly helped the Intel GPU. One conclusion could be that the Intel GPU is bottlenecked on ALU processing, while the AMD card is not so much, and thus reducing vertex shader invocations doesn't matter as much there. That said, the shader used here was a simple Shaders::Phong, and the impact would likely be much bigger for the AMD card with complex PBR shaders.

11.
^ Unfortunately the ARB_pipeline_statistics_query extension doesn't provide a way to query the count of indices submitted, so it's not possible to know the overfetch ratio — how many times the vertex shader had to be executed for a single vertex. This would only be possible if the submitted indices were counted on the engine side.

Apart from the above, the MeshOptimizerSceneConverter plugin can also optionally decimate meshes. As that is a destructive operation, it's not enabled by default, but you can enable and configure it using plugin-specific options:

meshoptimizer->configuration().setValue("simplify", true);
meshoptimizer->configuration().setValue("simplifyTargetIndexCountThreshold", 0.5f);
Containers::Optional<Trade::MeshData> simplified = meshoptimizer->convert(mesh);

Together with the mesh processing plugins, and similarly to image converters, there's a new magnum-sceneconverter command-line tool that makes it possible to use these plugins together with various mesh tools directly on scene files. Its use is quite limited at this point, as the only supported output format is PLY (via StanfordSceneConverter), but the tool will gradually become more powerful, with more output formats. As an example, here it first prints info about the mesh, then takes just the first attribute, discarding per-face normals, removes duplicate vertices, processes the data with meshoptimizer on default settings and saves the output to a PLY:

magnum-sceneconverter dragon.stl --info
Mesh 0:
  Level 0: MeshPrimitive::Triangles, 6509526 vertices (152567.0 kB)
    Offset 0: Trade::MeshAttribute::Position @ VertexFormat::Vector3, stride 24
    Offset 12: Trade::MeshAttribute::Normal @ VertexFormat::Vector3, stride 24
magnum-sceneconverter dragon.stl dragon.ply \
    --only-attributes "0" \
    --remove-duplicates \
    --converter MeshOptimizerSceneConverter -v
Trade::AnySceneImporter::openFile(): using StlImporter
Duplicate removal: 6509526 -> 1084923 vertices
Trade::MeshOptimizerSceneConverter::convert(): processing stats:
  vertex cache:
    5096497 -> 1502463 transformed vertices
    1 -> 1 executed warps
    ACMR 2.34879 -> 0.69243
    ATVR 4.69757 -> 1.38486
  vertex fetch:
    228326592 -> 24462720 bytes fetched
    overfetch 17.5378 -> 1.87899
  overdraw:
    107733 -> 102292 shaded pixels
    101514 -> 101514 covered pixels
    overdraw 1.06126 -> 1.00766
Trade::AnySceneConverter::convertToFile(): using StanfordSceneConverter

The -v option translates to Trade::SceneConverterFlag::Verbose, which is another new feature that enables plugins to print extended info about import or processing. In case of MeshOptimizerSceneConverter it analyzes the mesh before and after, calculating average cache miss ratio, overdraw and other useful metrics for mesh rendering efficiency.

Going further — custom attributes, face and edge properties, meshlets

To make the mesh data representation truly future-proof, it can't be limited to just the “classical” indexed meshes with attributes of predefined semantics and a (broad, but hardcoded) set of vertex formats.

Regarding vertex formats, similarly to what's done since 2018.04 for pixel formats, a mesh can contain any attribute in an implementation-specific format. One example could be normals packed into VK_FORMAT_A2R10G10B10_SNORM_PACK32 (which currently doesn't have a generic equivalent in VertexFormat) — code that consumes the Trade::MeshData instance can then unwrap the implementation-specific vertex format and pass it directly to the corresponding GPU API. Note that because the library has no way of knowing anything about the sizes of implementation-specific formats, such instances have only limited use in MeshTools algorithms.

Trade::MeshAttributeData normals{Trade::MeshAttribute::Normal,
    vertexFormatWrap(VK_FORMAT_A2R10G10B10_SNORM_PACK32), data};

Meshes don't stop at just points, lines or triangles anymore. Together with Trade::AbstractImporter::mesh() allowing a second parameter specifying the mesh level (similarly to image mip levels), this opens new possibilities — the STL and PLY importers already use it to retain per-face properties, as shown below on one of the pbrt-v3 sample scenes:

# Disabling the perFaceToPerVertex option to keep face properties as-is
magnum-sceneconverter dragon_remeshed.ply --info \
    --importer StanfordImporter -i perFaceToPerVertex=false
Mesh 0 (referenced by 0 objects):
  Level 0: MeshPrimitive::Triangles, 924422 vertices (10833.1 kB)
    5545806 indices @ MeshIndexType::UnsignedInt (21663.3 kB)
    Offset 0: Trade::MeshAttribute::Position @ VertexFormat::Vector3, stride 12
  Level 1: MeshPrimitive::Faces, 1848602 vertices (21663.3 kB)
    Offset 0: Trade::MeshAttribute::Normal @ VertexFormat::Vector3, stride 12

Among other possibilities is using MeshPrimitive::Edges to store meshes in a half-edge representation (the endlessly flexible PLY format even has support for per-edge data, although the importer doesn't support that yet), MeshPrimitive::Instances to store instance data (for example to implement the proposed glTF EXT_mesh_gpu_instancing extension), or simply providing additional LOD levels (glTF has the MSFT_lod extension for this).

~ ~ ~

struct meshopt_Meshlet {
    unsigned int vertices[64];
    unsigned char indices[126][3];
    unsigned char triangle_count;
    unsigned char vertex_count;
};

Source: meshoptimizer.h

Ultimately, we're not limited to predefined primitive and attribute types either. The most prominent example of using this newly gained flexibility is mesh shaders and meshlets. Meshlets are a technique that is becoming more and more important for dealing with heavy geometry, and meshoptimizer has experimental support for them¹². For a given input it generates a sequence of statically-defined meshopt_Meshlet structures that are then meant to be fed straight to the GPU.

Describing such data in a Trade::MeshData instance is a matter of defining a set of custom attribute names and listing their offsets, types and array sizes, as shown below. While a bit verbose at first glance, an advantage of being able to specify the layout dynamically is that the same attributes can work for representations from other tools as well, such as meshlete.

/* Pick any numbers that don't conflict with your other custom attributes */
constexpr auto Meshlet = meshPrimitiveWrap(0xabcd);
constexpr auto MeshletVertices = Trade::meshAttributeCustom(1);
constexpr auto MeshletIndices = Trade::meshAttributeCustom(2);
constexpr auto MeshletTriangleCount = Trade::meshAttributeCustom(3);
constexpr auto MeshletVertexCount = Trade::meshAttributeCustom(4);

Trade::MeshData meshlets{Meshlet, std::move(meshletData), {
    Trade::MeshAttributeData{MeshletVertices, VertexFormat::UnsignedInt,
        offsetof(meshopt_Meshlet, vertices), 0, sizeof(meshopt_Meshlet), 64},
    Trade::MeshAttributeData{MeshletIndices, VertexFormat::Vector3ub,
        offsetof(meshopt_Meshlet, indices), 0, sizeof(meshopt_Meshlet), 126},
    Trade::MeshAttributeData{MeshletTriangleCount, VertexFormat::UnsignedByte,
        offsetof(meshopt_Meshlet, triangle_count), 0, sizeof(meshopt_Meshlet)},
    Trade::MeshAttributeData{MeshletVertexCount, VertexFormat::UnsignedByte,
        offsetof(meshopt_Meshlet, vertex_count), 0, sizeof(meshopt_Meshlet)},
}, meshletCount};

One important thing to note is the array attributes — those are accessed with a special syntax, and give you a 2D view instead of a 1D one:

Containers::StridedArrayView1D<const UnsignedByte> triangleCount =
    meshlets.attribute<UnsignedByte>(MeshletTriangleCount);
Containers::StridedArrayView2D<const Vector3ub> indices =
    meshlets.attribute<Vector3ub[]>(MeshletIndices);

/* For every meshlet, iterate just the triangles that are actually used */
for(std::size_t i = 0; i != meshlets.vertexCount(); ++i) {
    for(Vector3ub triangle: indices[i].prefix(triangleCount[i])) {
        // do something with each triangle of meshlet i …
    }
}
12.
^ Not mentioned in the project README, you need to look directly in the source. At the time of writing, meshlet generation isn’t integrated into the MeshOptimizerSceneConverter plugin yet — but it will be, once I get hardware to test the whole mesh shader pipeline on. If you want to play with meshlets, there’s a recent introduction on Geeks3D covering both OpenGL and Vulkan.

Visu­al­ize the hap­pi­ness of your data

When working with mesh data of varying quality and complexity, it’s often necessary to know not only how the mesh renders, but also why it renders that way. The Shaders::MeshVisualizer got extended to have both a 2D and a 3D variant, and it can now visualize not just wireframe, but also tangent space13 (useful when you need to know why your lighting or a normal map is off), object ID for semantic annotations or for when you have multiple meshes batched together14, and finally two simple but very important properties: primitive and vertex ID.

On the truck wheel15 above you can see a very “rainbowy” primitive ID visualization, which hints that the vertices are not rendered in an order that would make good use of the vertex cache (something MeshOptimizerSceneConverter can help with). Vertex ID, on the other hand, can point to discontinuities in the mesh — even though the blanket3 above looks like a smooth continuous mesh to the naked eye, the visualization uncovers that almost none of the triangles share a common vertex, which will likely cause issues for example when decimating the mesh or using it for collision detection.

Supplementary to the mesh visualizer is a gallery of color maps for balanced and easily recognizable visualizations. The above images were created using the Turbo color map; DebugTools::ColorMap provides four more to choose from.

Lastly, and as already mentioned above, you’re encouraged to use DebugTools::FrameProfiler to measure various aspects of mesh rendering, both on the CPU and the GPU side, with builtin support for custom measurements and delayed queries to avoid stalls. Hooking up this profiler doesn’t mean you suddenly need to deal with UI and text rendering — it can simply print its output to a terminal, refreshing itself every once in a while:

Last 50 frames:
  Frame time: 16.65 ms
  CPU duration: 14.72 ms
  GPU duration: 10.89 ms
  Vertex fetch ratio: 0.24
  Primitives clipped: 59.67 %
3.
^ a b c The Mat­ter­port3D in­door en­vir­on­ment scans were used as a source for vari­ous tim­ings, bench­marks and visu­al­iz­a­tions
13.
^ Mod­el source: Lan­tern from the glTF Sample Mod­els re­pos­it­ory
14.
^ Screen­shot from a se­mantics-an­not­ated scan from the Rep­lica data­set
15.
^ Mod­el source: Cesi­um Milk Truck from the glTF Sample Mod­els re­pos­it­ory

Ref­er­en­cing ex­tern­al data, avoid­ing cop­ies

One of the ubi­quit­ous an­noy­ing prob­lems when deal­ing with STL con­tain­ers is memory man­age­ment in­flex­ib­il­ity — you can’t really16 con­vince a std::vec­tor to ref­er­ence ex­tern­al memory or, con­versely, re­lease its stor­age and re­use it else­where. The new Trade::Mesh­Data (and Trade::An­im­a­tionData + Trade::Im­ageData as well, for that mat­ter) learned from past mis­takes and can act as a non-own­ing ref­er­ence to ex­tern­al in­dex and ver­tex buf­fers as well as at­trib­ute de­scrip­tions.

For example, it’s possible to store the index and vertex buffer for a particular model in constant memory and have Trade::MeshData merely reference it, without any allocations or copies. In Magnum itself this is used by certain primitives such as Primitives::cubeSolid() — since a cube is practically always the same, it doesn’t make sense to build a copy of it in dynamic memory every time.

An­oth­er thing the API was ex­pli­citly de­signed for is shar­ing a single large buf­fer among mul­tiple meshes — ima­gine a glTF file con­tain­ing sev­er­al dif­fer­ent meshes, but all shar­ing a single buf­fer that you up­load just once:

/* Shared for all meshes */
Containers::ArrayView<const char> indexData;
Containers::ArrayView<const char> vertexData;
GL::Buffer indices{indexData};
GL::Buffer vertices{vertexData};

GL::Mesh chair = MeshTools::compile(chairData, indices, vertices);
GL::Mesh tree = MeshTools::compile(treeData, indices, vertices);
// …

Lastly, noth­ing pre­vents Trade::Mesh­Data from work­ing in an “in­verse” way — first use it to up­load a GPU buf­fer, and then use the same at­trib­ute lay­out to con­veni­ently per­form modi­fic­a­tions when the buf­fer gets mapped back to CPU memory later.

16.
^ Stand­ard Lib­rary design ad­voc­ates would men­tion that you can use a cus­tom al­loc­at­or to achieve that. While that’s tech­nic­ally true, it’s not a prac­tic­al solu­tion, con­sid­er­ing the sheer amount of code you need to write for an al­loc­at­or (when all you really need is a cus­tom de­leter). Also, have fun con­vin­cing 3rd party vendors that you need all their APIs to ac­cept std::vec­tors with cus­tom al­loc­at­ors.

A peek in­to the fu­ture — Mag­num’s own memory-map­pable mesh format

Expanding further on the above-mentioned ability to reference external data, it’s now possible to have Trade::MeshData pointing directly to the contents of a memory-mapped file in a compatible format, achieving truly zero-copy asset loading. This is, to some extent, possible with all three file formats mentioned above — STL, PLY and glTF. A work-in-progress PR enabling this is mosra/magnum#240; what I still need to figure out is the interaction between memory ownership and custom file loading callbacks. Plus, in the case of glTF, it requires writing a new importer plugin based on cgltf, as TinyGltfImporter (and tiny_gltf in particular) can’t really be convinced to work with external buffers due to its heavy reliance on std::vectors.

At some point I real­ized that even with all flex­ib­il­ity that glTF provides, it’s still not ideal due to its re­li­ance on JSON, which can have a large im­pact on down­load sizes of WebAssembly builds.

What would a minimalist file format tailored for Magnum look like, if we removed everything that can be removed? To avoid complex parsing and data logistics, the file format should be as close to the binary representation of Trade::MeshData as possible, allowing the actual payload to be used directly without any processing, with the deserialization process being just a handful of sanity and range checks. With that, it’s possible to get import times smaller than what a cat file.blob > /dev/null would take (as shown above), because we don’t actually need to read through all the data at first — only when a given portion of the file is meant to be uploaded to the GPU or processed in some other way:

/* Takes basically no time */
Containers::Array<char, Utility::Directory::MapDeleter> blob =
    Utility::Directory::mapRead("file.blob");

/* Does a bunch of checks and returns views onto `blob` */
Containers::Optional<Trade::MeshData> chair = Trade::MeshData::deserialize(blob);

Another aspect of the format is easy composability and extensibility — inspired by RIFF and the design of the PNG file header, it’s made of sized chunks that can be arbitrarily combined, allowing the consumer to pick just a subset and ignore the rest. Packing a bunch of meshes of diverse formats together into a single file could then look like this:

magnum-sceneconverter file.blend --mesh "chair" chair.blob
magnum-sceneconverter scene.glb --mesh "tree" tree.blob
cat chair.blob tree.blob car.blob > blobs.blob # because why not

Ini­tial work­ing im­ple­ment­a­tion of all the above to­geth­er with de­tailed format spe­cific­a­tion is in mosra/mag­num#427, and the end goal is to be able to de­scribe not just meshes but whole scenes. It’s cur­rently liv­ing in a branch be­cause the last thing a file format needs is com­pat­ib­il­ity is­sues — it still needs a few more it­er­a­tions be­fore its design settles down. This then goes hand-in-hand with ex­tend­ing Trade::Ab­stractS­ceneCon­vert­er to sup­port more than just meshes alone, thus also mak­ing it pos­sible to out­put glTF files with mag­num-scenecon­vert­er, among oth­er things.

* * *

And that’s it for now. Thanks for read­ing and stay tuned for fur­ther ad­vances in op­tim­iz­ing the as­set pipeline.