2025-11-07

Balanced Alphabets: Partitioning A–Z into Optimal Contiguous Buckets for Even Word-Start Distribution

I find dividing index cards alphabetically particularly frustrating, and it seems I am not alone. Giving every letter its own divider creates very large sections for some letters and almost empty ones for others. Here we take a list of English words, count how many start with each letter A–Z, and then, for a chosen number of buckets N (e.g., 20, 15, 10, 8, 6), pick the partition of the alphabet into N contiguous groups of adjacent letters that minimizes the imbalance.

The idea is straightforward: we want to split the alphabet into contiguous blocks (like "A–C", "D", "E–G") so each block contains about the same number of words that start with those letters. Doing that by eye is hard because some letters (think S or T) have lots of words, while others (Q, X) have very few, so naive splits leave some boxes overflowing and others nearly empty.
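Before any splitting can happen, the words have to be tallied by first letter. The snippet below is a minimal sketch of that step, not the script used for this post. The function name `count_initials` and the choice to skip anything that does not start with a plain A–Z letter are assumptions made for illustration.

```python
from collections import Counter
from string import ascii_uppercase


def count_initials(words):
    """Return a 26-item list: how many words start with A, B, ..., Z.

    Assumption: words that do not start with a plain A-Z letter
    (digits, accented letters, empty strings) are simply skipped.
    """
    tally = Counter()
    for word in words:
        first = word.strip()[:1].upper()
        # `first` may be empty, so check it before the membership test.
        if first and first in ascii_uppercase:
            tally[first] += 1
    return [tally[letter] for letter in ascii_uppercase]


if __name__ == "__main__":
    sample = ["apple", "Avocado", "banana", "cherry", "42"]
    counts = count_initials(sample)
    for letter, n in zip(ascii_uppercase, counts):
        if n:
            print(f"{letter}: {n}")
```

The result is always in alphabetical order, which is what a contiguous split needs: position 0 is A, position 25 is Z.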

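To make the splits fair, we score a candidate grouping by squaring each block’s total and adding those squares together. Squaring makes one big block far more costly than several small ones, which pushes the algorithm away from large imbalances. Because the overall word count is fixed, minimizing the sum of squares for a given number of blocks is equivalent to minimizing the variance, and therefore the standard deviation, of the block totals.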
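A tiny example makes the effect of squaring concrete. The numbers here are made up for illustration, and `squared_score` is a hypothetical helper name: both splits hold 100 words in total, but the lopsided one scores much worse.

```python
def squared_score(block_totals):
    """Sum of squared block totals; lower means more even blocks."""
    return sum(total * total for total in block_totals)


# Two ways to split the same 100 words into two blocks (illustrative numbers).
even = [50, 50]
lopsided = [90, 10]

print(squared_score(even))      # 50*50 + 50*50 = 5000
print(squared_score(lopsided))  # 90*90 + 10*10 = 8200
```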

The squared score has a practical advantage: it’s additive across blocks, so we can use dynamic programming to find the best way to partition the alphabet. Dynamic programming here means we extend the best solutions for the first few letters into best solutions for more letters, remembering the best place to cut at each step. Because there are only 26 letters, this process is fast and exact. For each prefix of the alphabet, the algorithm tries every possible start for the last block, reuses the stored best scores for the earlier letters, and keeps the split that gives the lowest total squared score.
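The procedure described above can be sketched roughly as follows. This is one possible implementation written for this explanation, not necessarily the original script. The names `best_partition`, `cost`, and `cut` are hypothetical, and the five letter counts in the usage example are made up.

```python
from itertools import accumulate
from string import ascii_uppercase


def best_partition(counts, k):
    """Split `counts` into k contiguous, non-empty blocks that minimize
    the sum of squared block totals.

    Returns (score, blocks), where blocks is a list of (start, end)
    index pairs with `end` exclusive.
    """
    n = len(counts)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(counts)")

    prefix = [0, *accumulate(counts)]  # prefix[i] == sum(counts[:i])
    inf = float("inf")
    # cost[j][i]: best score for the first i items split into j blocks.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[j][i]: where the last block starts in that best split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0

    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):  # last block is counts[s:i]
                candidate = cost[j - 1][s] + (prefix[i] - prefix[s]) ** 2
                if candidate < cost[j][i]:
                    cost[j][i] = candidate
                    cut[j][i] = s

    # Walk the stored cut points backwards to recover the blocks.
    blocks, i = [], n
    for j in range(k, 0, -1):
        s = cut[j][i]
        blocks.append((s, i))
        i = s
    blocks.reverse()
    return cost[k][n], blocks


if __name__ == "__main__":
    toy_counts = [3, 1, 2, 2, 4]  # made-up counts for the letters A..E
    score, blocks = best_partition(toy_counts, 2)
    for start, end in blocks:
        label = "-".join(ascii_uppercase[start:end])
        print(f"{label:<8}: {sum(toy_counts[start:end])}")
    print("score:", score)
```

With 26 letters the three nested loops do at most a few thousand steps per bucket count, so trying several values of N is essentially free.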

When it’s done, you get contiguous letter ranges that minimize this squared score. Those ranges tend to avoid one oversized group and several tiny ones, producing a more even distribution of words per group.

Our index‑card notebook, shown below, uses 8 subdivisions, or nodes. With this configuration, the contiguous letter ranges are arranged so that each node holds a comparable number of cards, which keeps both oversized and underfilled sections to a minimum.

Index Card Notebook

Using a Python script, we evaluated multiple ways of partitioning the alphabet into contiguous sections and measured imbalance using the standard deviation of card counts per node. Normalizing the scores so that the best grouping is the reference (normalized SD = 1.00) made comparisons straightforward.

The analysis shows that an 8‑node configuration provides the most balanced allocation for our word distribution.

    Best balance achieved with 8 groups (lowest SD=6432.931).
    A-B             : 43830
    C-D             : 50841
    E-F-G           : 37043
    H-I-J-K-L       : 43736
    M-N-O           : 45945
    P-Q-R           : 53436
    S               : 38764
    T-U-V-W-X-Y-Z   : 56510
    Spread: 19467, SD: 6432.931
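As a quick sanity check, the totals in the listing above can be fed back into Python's `statistics` module. This sketch assumes the reported SD is the population standard deviation (`statistics.pstdev`), which is what reproduces the printed figure. The `normalize` helper is a hypothetical illustration of dividing every score by the best one.

```python
from statistics import pstdev

# Group totals copied from the listing above.
totals = {
    "A-B": 43830,
    "C-D": 50841,
    "E-F-G": 37043,
    "H-I-J-K-L": 43736,
    "M-N-O": 45945,
    "P-Q-R": 53436,
    "S": 38764,
    "T-U-V-W-X-Y-Z": 56510,
}

values = list(totals.values())
spread = max(values) - min(values)  # largest group minus smallest group
sd = pstdev(values)                 # population standard deviation (assumed)

print(f"Spread: {spread}, SD: {sd:.3f}")


def normalize(sd_by_groups):
    """Divide every SD by the smallest one, so the best grouping scores 1.00."""
    best = min(sd_by_groups.values())
    return {n: value / best for n, value in sd_by_groups.items()}


# Only the 8-group SD is known here, so it is trivially the reference.
print(normalize({8: sd}))
```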

For practical use, the letter ranges for each node are already printed in the header of every card, so you can apply this optimal subdivision immediately and maintain an even, easy‑to‑navigate index.