What worked, ultimately, was finding operations that have instruction costs O(n**2) in the number of commits and squashing them. At this point a shout-out goes to Julien “FrnchFrgg” Rivaud, a very capable hacker trying to use reposurgeon for some work on the Blender repository. He got interested in the speed problem (the Blender repo is also quite large) and was substantially helpful with both patches and advice. Working together, we memoized some expensive operations and eliminated others, often by incrementally computing reverse-lookup pointers when linking objects together in order to avoid having to traverse the entire repository later on.
Even just finding all the O(n**2) operations isn’t necessarily easy in a language as terse and high-level as Python; they can hide in very innocuous-looking code and method calls. The biggest bad boy in this case turned out to be child-node computation. Fast import streams express “is a child of” directly; for obvious reasons, a repository analysis often has to look at all the children of a given parent. This operation blows up quite badly on very large repositories even if you memoize it; the only way to make it fast is to precompute all the reverse lookups and update them when you update the forward ones.
Another time sink (the last one to get solved) was identifing all tags and resets attached to a particular commit. The brute-force method (look through all tags for any with a from member matching the commit’s mark) is expensive mainly because to look through all tags you have to look through all the events in the stream – and that’s expensive when there are 56K of them. Again, the solution was to give each commit a list of back-pointers to the tags that reference it and make sure all the mutation operations update it properly.
It all came good in the end. In the last benchmarking run before I shipped 2.29 it processed 56424 commits in 303 seconds. That’s 186 commits per second, 11160 per minute. That’s good enough that I plan to lay off serious speed-tuning efforts; the gain probably wouldn’t be worth the increased code complexity.