Zero-waste single-pass packing of power-of-two textures

Rect­angle pack­ing doesn’t ac­tu­ally have to be a NP-hard prob­lem if we don’t need to solve the most gen­er­al case. In this post I present a simple yet op­tim­al al­gorithm for pack­ing power-of-two tex­tures in­to a tex­ture ar­ray.

I don’t claim that I “in­ven­ted a nov­el al­gorithm” here. It’s so simple that the com­put­ing pi­on­eers would def­in­itely have it in­ven­ted back in the 1970s already, but I just couldn’t find any pri­or art. On the oth­er hand, I feel like I need to risk stat­ing the ob­vi­ous, be­cause oth­er­wise the gen­er­al con­sensus seems to be that rect­angle pack­ing is hard no mat­ter what the in­put is. Well, not al­ways.

Back­ground, or why even both­er

I have a few hun­dreds of OBJ, COL­LADA, PLY, etc. mod­el files and the goal is to pack them all in­to a single glTF file op­tim­ized for multi-draw1. Thus, con­tain­ing a single ver­tex and in­dex buf­fer, with meshes be­ing sub-ranges of this buf­fer, and a single 2D ar­ray tex­ture us­ing the pro­posed KHR_­tex­ture_k­tx ex­ten­sion2, with ma­ter­i­als ref­er­en­cing (trans­formed) lay­ers of the ar­ray.

For meshes, con­cat­en­a­tion and bak­ing re­l­at­ive trans­forms in­to them was easy enough with tools like MeshTools::con­cat­en­ate() or SceneTools::flat­t­en­Mesh­H­i­er­archy3D(). For tex­ture pack­ing, how­ever, I had only Tex­tureTools::at­las(), and that’s the most stu­pid and in­ef­fi­cient im­ple­ment­a­tion ima­gin­able3. The ob­vi­ous next step was thus to look in­to re­pla­cing it with some­thing bet­ter — but even the simplest im­ple­ment­a­tions I know of45 are still quite com­plex with a lot of toggles, heur­ist­ics and tradeoffs.

Fur­ther­more, con­sid­er­ing I have hun­dreds of in­put mod­els of which many have 2048×2048 tex­tures, I’d quickly run of 2D tex­ture size lim­its of even the beefi­est GPUs. Tex­ture ar­rays, how­ever, can have as many lay­ers as I need. Which how­ever means al­most all ex­ist­ing rect­angle pack­ing al­gorithms are out of ques­tion, as they op­er­ate only on a single 2D tex­ture.

Look­ing at the data

In­stead of dir­ectly jump­ing onto ad­apt­ing some ex­ist­ing gen­er­al al­gorithm to work with tex­ture ar­rays, I pro­cras­tin­ated away. Stared at the data. And real­ized that ba­sic­ally all mod­els in the set I have fol­low the best prac­tices, and use square power-of-two tex­tures — 64×64, 256×256 and such. Out­liers that didn’t fol­low this were less than one out of 100, thus not that im­port­ant. Those will be handled sep­ar­ately6.

The sim­pler fla­vors of pack­ing al­gorithms first make all im­ages either por­trait or land­cape, sort them by size on one ax­is, and then, start­ing with the largest, pack them in­to the out­put, some­how.

In my case, giv­en that I can as­sume power-of-two squares and can just over­flow to an­oth­er ar­ray lay­er when the pre­vi­ous gets full, the al­gorithm be­comes pretty clear. Im­port­antly, power-of-two squares also make it pos­sible to do the pack­ing in a way that makes all lay­ers ex­cept the last one com­pletely full, with zero wasted space. Such con­straint also helps avoid­ing tex­ture-leak­ing ar­ti­facts when us­ing block com­pres­sion — ex­cept for the smal­lest sizes, a 4×4 block (or 8×8 for cer­tain ASTC formats) will nev­er span two in­de­pend­ent tex­tures.

The al­gorithm

The core of this al­gorithm is main­tain­ing know­ledge of the free space left. As the space is re­curs­ively sub­diving, it looks like a prob­lem quadtrees could solve, but for­tu­nately as we go lin­early from largest to smal­lest, we don’t even need that. All info that we need to main­tain is how many im­ages of cur­rent size can still fit. Best ex­plained with a visu­al ex­ample:

0 1 20 21 22 23 30 310 311 312 313 3210 3211 3212 320
  • Let’s say the out­put size is 2048×2048, and the largest im­age we have is 1024×1024. We can fit four of them in­to the out­put be­fore the lay­er is full and we need to go to the next.
  • Put­ting aside any place­ment rules for now, we place two 1024×1024 im­ages in­to the out­put (labeled 0 and 1), which means it’s just two slots left.
  • Next im­age in the list is 512×512, which has a four times smal­ler area than the pre­vi­ous one. Thus, the leftover space quad­ruples, mean­ing we now have eight slots to fit 512×512 im­ages.
  • There are five 512×512 im­ages (labeled 20 to 23 and 30), after which there’s three slots left.
  • The next im­age size is 256×256, which has a 4× smal­ler area, and thus the three 512×512 slots be­come 12 256×256 slots.

Once an ar­ray lay­er is full, we start a new one, re­cal­cu­lat­ing the num­ber of free slots based on the ra­tio of its area and area of the next im­age to place. So, for ex­ample, we can fit 256 128×128 im­ages to the next lay­er of a 2048×2048 out­put.

Place­ment rules

Cal­cu­lat­ing re­main­ing free space is hope­fully clear, now the ac­tu­al place­ment. Here’s where I fi­nally make use of quadtree ad­dress­ing, but again without build­ing any com­plex in-memory struc­ture. As can be seen above, a par­tic­u­lar square in the quadtree can be thus ad­dressed with a base-4 co­ordin­ate, where first di­git rep­res­ents loc­a­tion in the top level, second in­side giv­en quad­rant of the top level and re­curs­ing fur­ther.

The im­port­ant as­pect of this ad­dress­ing scheme is that the base-4 co­ordin­ate of every im­age is ac­tu­ally the in­dex of the free slot when adding the im­age, and the way it works guar­an­tees no over­lap. I.e., when we ad­ded the first 256×256 im­age above, labeled 310, there were 12 slots left, and (\frac{2048}{256})^2 - 12 = 52 = 310_4 . After adding the im­age, there were just 11 slots left ( 311_4 ), and thus there’s no pos­sib­il­ity for the al­gorithm to re­turn to earli­er slots like 3100_4 to 3103_4 and cause an over­lap.

To con­vert these base-4 num­bers to ac­tu­al tex­ture co­ordin­ates, I don’t have any­thing tan­gible to refer to7, but when rep­res­en­ted as bin­ary, each odd bit is an Y co­ordin­ate (lower or up­per) and each even bit is an X co­ordin­ate (left or right). Then, every two bits the co­ordin­ate range shrinks by half in both di­men­sions, and the res­ult is a sum of these.

Full code

For brev­ity, the fol­low­ing snip­pet as­sumes an already-sor­ted in­put with largest im­ages first, every size be­ing a non-zero power-of-two square, and lay­er size be­ing at least as large as the largest im­age in the set. Out­puts 3D off­sets in the ar­ray tex­ture.

Containers::Array<Vector3i> atlasArrayPowerOfTwo(
    const Vector2i& layerSize, Containers::ArrayView<const Vector2i> sortedSizes
) {
    Containers::Array<Vector3i> output{NoInit, sortedSizes.size()};

    /* Start with the whole first layer free */
    Int layer = 0;
    UnsignedInt free = 1;
    Vector2i previousSize = layerSize;
    for(std::size_t i = 0; i != sortedSizes.size(); ++i) {
        /* No free slots left, go to the next layer. Then, what's free, is one
           whole layer. */
        if(!free) {
            ++layer;
            free = 1;
            previousSize = layerSize;
        }

        /* Multiply number of free slots based on area difference from previous
           size. If the size is the same, nothing changes. */
        free *= (previousSize/sortedSizes[i]).product();

        /* Calculate slot index and coordinates from the slot index */
        const UnsignedInt sideSlotCount = layerSize.x()/sortedSizes[i].x();
        const UnsignedInt layerDepth = Math::log2(sideSlotCount);
        const UnsignedInt slotIndex = sideSlotCount*sideSlotCount - free;
        Vector2i coordinates;
        for(UnsignedInt i = 0; i < layerDepth; ++i) {
            if(slotIndex & (1 << 2*(layerDepth - i - 1)))
                coordinates.x() += layerSize.x() >> (i + 1);
            if(slotIndex & (1 << (2*(layerDepth - i - 1) + 1)))
                coordinates.y() += layerSize.y() >> (i + 1);
        }

        /* Write the final coordinates to the output */
        output[i] = {coordinates, layer};
        previousSize = sortedSizes[i];
        --free;
    }

    return output;
}

En­vir­on­ment­al re­spons­ib­il­ity and waste man­age­ment

The only vari­able af­fect­ing out­put of this al­gorithm is the ar­ray lay­er size. For the least waste in the last lay­er it’s best to set it to size of the largest im­age and not more. Gen­er­ally the more lay­ers you have the less re­l­at­ive waste there is. With 100 lay­ers, if the last lay­er would be con­tain­ing just one 1×1 im­age (which is a rather un­likely worst-case scen­ario), the re­l­at­ive waste is still less than 1%. Prac­tic­ally speak­ing how­ever, if you’d have thou­sands of 64×64 or smal­ler im­ages — say, an icon pack — then it’s bet­ter to waste a bit and choose a big­ger lay­er size to avoid hit­ting hard­ware tex­ture size lim­its.

With a lot of in­put data there are also oth­er factors to con­sider. While in my case there were al­most no non-power-of-two tex­tures, there was quite a few uni­formly colored 2048×2048 im­ages that could be scaled down to 1×1 without any loss in qual­ity.

Con­clu­sion

I don’t claim any break­through in­ven­tion. And since the al­gorithm is neither gen­er­ic8 nor ML-en­abled910, I don’t think it’s cool enough to gain much trac­tion.

But here it is, im­ple­men­ted as Tex­tureTools::at­las­Ar­ray­Po­wer­OfT­wo() in Mag­num. Cheers!

1.
^ In an ideal­ist­ic case be­ing able to sub­mit a single draw call for (a culled sub­set of) the whole scene in­stead of draw­ing every mesh sep­ar­ately, sig­ni­fic­antly re­du­cing over­head.
2.
^ There’s ex­per­i­ment­al KHR_­tex­ture_k­tx sup­port in Glt­fIm­port­er if you are in­ter­ested, avail­able un­der an off-by-de­fault op­tion. A glTF ex­port­er sup­port­ing this ex­ten­sion is cur­rently in a WIP branch and will even­tu­ally be merged too.
3.
^ The cur­rent im­ple­ment­a­tion, dat­ing back to 2012, di­vides the out­put in­to a reg­u­lar grid of cells that are all as large as the largest in­put. Yes, it’s that bad.
4.
^ Team­Hy­per­som­nia/rect­pack­2D, or its dis­tilled ver­sion at https://observablehq.com/@mourner/simple-rectangle-packing
5.
^ juj/Rect­angleBin­Pack
6.
^ Sent back to be re­worked, ideally; if not pos­sible then pad­ded or res­ampled.
7.
^ Back in 2010, when high-speed mo­bile in­ter­net was still some­thing un­fathom­able, I was re­verse-en­gin­eer­ing satel­lite im­agery in Google Maps to down­load it for off­line view­ing on my coun­tryside bike trips. Im­age URLs in one of the lay­ers looked like /elevation/tqrsqrqqtstt.jpg and, hav­ing no pri­or know­ledge about any­thing graph­ics-re­lated, it took me quite a while to fig­ure out what the let­ters meant. Since then I have the quadtree ad­dress­ing scheme forever in­grained in my head.
8.
^ Yes, I still need to im­ple­ment a real rect­angle pack­ing al­gorithm for the gen­er­ic use case. Someday.
9.
^ Ranked Re­ward: En­abling Self-Play Re­in­force­ment Learn­ing for Com­bin­at­or­i­al Op­tim­iz­a­tion, 2018, https://arxiv.org/abs/1807.01672
10.
^ At­tend2Pack: Bin Pack­ing through Deep Re­in­force­ment Learn­ing with At­ten­tion, 2021, https://arxiv.org/abs/2107.04333