# Zero-waste single-pass packing of power-of-two textures

Rectangle packing doesn’t actually have to be a NP-hard problem if we don’t need to solve the most general case. In this post I present a simple yet optimal algorithm for packing power-of-two textures into a texture array.

I don’t claim that I “invented a novel algorithm” here. It’s so simple that the
computing pioneers would definitely have it invented back in the 1970s already,
but I just couldn’t find any prior art. On the other hand, I feel like I need
to risk stating the obvious, because otherwise the general consensus seems to
be that rectangle packing is *hard* no matter what the input is. Well, not
always.

## Background, or why even bother

I have a few hundreds of OBJ, COLLADA, PLY, etc. model files and the goal is to pack them all into a single glTF file optimized for multi-draw1. Thus, containing a single vertex and index buffer, with meshes being sub-ranges of this buffer, and a single 2D array texture using the proposed KHR_texture_ktx extension2, with materials referencing (transformed) layers of the array.

For meshes, concatenation and baking relative transforms into them was easy enough with tools like MeshTools::concatenate() or SceneTools::flattenMeshHierarchy3D(). For texture packing, however, I had only TextureTools::atlas(), and that’s the most stupid and inefficient implementation imaginable3. The obvious next step was thus to look into replacing it with something better — but even the simplest implementations I know of45 are still quite complex with a lot of toggles, heuristics and tradeoffs.

Furthermore, considering I have *hundreds* of input models of which many have
2048×2048 textures, I’d quickly run of 2D texture size limits of even the
beefiest GPUs. Texture arrays, however, can have as many layers as I need.
Which however means almost all existing rectangle packing algorithms are out of
question, as they operate only on a single 2D texture.

## Looking at the data

Instead of directly jumping onto adapting some existing general algorithm to work with texture arrays, I procrastinated away. Stared at the data. And realized that basically all models in the set I have follow the best practices, and use square power-of-two textures — 64×64, 256×256 and such. Outliers that didn’t follow this were less than one out of 100, thus not that important. Those will be handled separately6.

The simpler flavors of packing algorithms first make all images either portrait
or landcape, sort them by size on one axis, and then, starting with the
largest, pack them into the output, *somehow*.

In my case, given that I can assume power-of-two squares and can just overflow to another array layer when the previous gets full, the algorithm becomes pretty clear. Importantly, power-of-two squares also make it possible to do the packing in a way that makes all layers except the last one completely full, with zero wasted space. Such constraint also helps avoiding texture-leaking artifacts when using block compression — except for the smallest sizes, a 4×4 block (or 8×8 for certain ASTC formats) will never span two independent textures.

## The algorithm

The core of this algorithm is maintaining knowledge of the free space left. As the space is recursively subdiving, it looks like a problem quadtrees could solve, but fortunately as we go linearly from largest to smallest, we don’t even need that. All info that we need to maintain is how many images of current size can still fit. Best explained with a visual example:

- Let’s say the output size is 2048×2048, and the largest image we have is 1024×1024. We can fit four of them into the output before the layer is full and we need to go to the next.
- Putting aside any placement rules for now, we place two 1024×1024
images into the output (labeled
**0**and**1**), which means it’s just two slots left. - Next image in the list is 512×512, which has a four times smaller area than the previous one. Thus, the leftover space quadruples, meaning we now have eight slots to fit 512×512 images.
- There are five 512×512 images (labeled
**20**to**23**and**30**), after which there’s three slots left. - The next image size is 256×256, which has a 4× smaller area, and thus the three 512×512 slots become 12 256×256 slots.
- …

Once an array layer is full, we start a new one, recalculating the number of free slots based on the ratio of its area and area of the next image to place. So, for example, we can fit 256 128×128 images to the next layer of a 2048×2048 output.

### Placement rules

Calculating remaining free space is hopefully clear, now the actual placement. Here’s where I finally make use of quadtree addressing, but again without building any complex in-memory structure. As can be seen above, a particular square in the quadtree can be thus addressed with a base-4 coordinate, where first digit represents location in the top level, second inside given quadrant of the top level and recursing further.

The important aspect of this addressing scheme is that the base-4 coordinate of
every image is actually the index of the free slot when adding the image, and
the way it works guarantees no overlap. I.e., when we added the first
256×256 image above, labeled **310**, there were 12 slots left, and
. After adding the image, there
were just 11 slots left (), and thus there’s no possibility for
the algorithm to return to earlier slots like to
and cause an overlap.

To convert these base-4 numbers to actual texture coordinates, I don’t have anything tangible to refer to7, but when represented as binary, each odd bit is an Y coordinate (lower or upper) and each even bit is an X coordinate (left or right). Then, every two bits the coordinate range shrinks by half in both dimensions, and the result is a sum of these.

### Full code

For brevity, the following snippet assumes an already-sorted input with largest images first, every size being a non-zero power-of-two square, and layer size being at least as large as the largest image in the set. Outputs 3D offsets in the array texture.

Containers::Array<Vector3i> atlasArrayPowerOfTwo( const Vector2i& layerSize, Containers::ArrayView<const Vector2i> sortedSizes ) { Containers::Array<Vector3i> output{NoInit, sortedSizes.size()}; /* Start with the whole first layer free */ Int layer = 0; UnsignedInt free = 1; Vector2i previousSize = layerSize; for(std::size_t i = 0; i != sortedSizes.size(); ++i) { /* No free slots left, go to the next layer. Then, what's free, is one whole layer. */ if(!free) { ++layer; free = 1; previousSize = layerSize; } /* Multiply number of free slots based on area difference from previous size. If the size is the same, nothing changes. */ free *= (previousSize/sortedSizes[i]).product(); /* Calculate slot index and coordinates from the slot index */ const UnsignedInt sideSlotCount = layerSize.x()/sortedSizes[i].x(); const UnsignedInt layerDepth = Math::log2(sideSlotCount); const UnsignedInt slotIndex = sideSlotCount*sideSlotCount - free; Vector2i coordinates; for(UnsignedInt i = 0; i < layerDepth; ++i) { if(slotIndex & (1 << 2*(layerDepth - i - 1))) coordinates.x() += layerSize.x() >> (i + 1); if(slotIndex & (1 << (2*(layerDepth - i - 1) + 1))) coordinates.y() += layerSize.y() >> (i + 1); } /* Write the final coordinates to the output */ output[i] = {coordinates, layer}; previousSize = sortedSizes[i]; --free; } return output; }

## Environmental responsibility and waste management

The only variable affecting output of this algorithm is the array layer size. For the least waste in the last layer it’s best to set it to size of the largest image and not more. Generally the more layers you have the less relative waste there is. With 100 layers, if the last layer would be containing just one 1×1 image (which is a rather unlikely worst-case scenario), the relative waste is still less than 1%. Practically speaking however, if you’d have thousands of 64×64 or smaller images — say, an icon pack — then it’s better to waste a bit and choose a bigger layer size to avoid hitting hardware texture size limits.

With a lot of input data there are also other factors to consider. While in my
case there were almost no non-power-of-two textures, there was *quite a few*
uniformly colored 2048×2048 images that could be scaled down to 1×1
without any loss in quality.

## Conclusion

I don’t claim any breakthrough invention. And since the algorithm is neither generic8 nor ML-enabled910, I don’t think it’s cool enough to gain much traction.

But here it is, implemented as TextureTools::atlasArrayPowerOfTwo() in Magnum. Cheers!

- 1.
- ^ In an idealistic case being able to submit a single draw call for (a culled subset of) the whole scene instead of drawing every mesh separately, significantly reducing overhead.
- 2.
- ^ There’s experimental KHR_texture_ktx support in GltfImporter if you are interested, available under an off-by-default option. A glTF exporter supporting this extension is currently in a WIP branch and will eventually be merged too.
- 3.
- ^ The current implementation, dating back to 2012, divides the output into a regular grid of cells that are all as large as the largest input. Yes, it’s that bad.
- 4.
- ^ TeamHypersomnia/rectpack2D, or its distilled version at https://observablehq.com/@mourner/simple-rectangle-packing
- 5.
- ^ juj/RectangleBinPack
- 6.
- ^ Sent back to be reworked, ideally; if not possible then padded or resampled.
- 7.
- ^ Back in 2010, when high-speed mobile internet was still something
unfathomable, I was reverse-engineering satellite imagery in Google Maps to
download it for offline viewing on my countryside bike trips. Image URLs in
one of the layers looked like
`/elevation/tqrsqrqqtstt.jpg`

and, having no prior knowledge about anything graphics-related, it took me quite a while to figure out what the letters meant. Since then I have the quadtree addressing scheme forever ingrained in my head. - 8.
- ^ Yes, I still need to implement a real rectangle packing algorithm for the generic use case. Someday.
- 9.
- ^
*Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization*, 2018, https://arxiv.org/abs/1807.01672 - 10.
- ^
*Attend2Pack: Bin Packing through Deep Reinforcement Learning with Attention*, 2021, https://arxiv.org/abs/2107.04333