Jeff Hage

How I delivered a 100x rewrite for $8.50

At my current job, we have a YAML-defined layout system that composes React components into “layouts” that are persisted into storage and can be configured/overwritten per tenant. In other words, we have a YAML UI.

The YAML is currently parsed by an internal Ruby gem and then persisted into MySQL. On my local machine, configured with just 2 tenants, this parse takes about 3 minutes every single time I need to make a change. Multiplied by the large number of changes required to build a real layout, this results in many hours wasted for a single sprint ticket.

Even worse, running this in production against our ~100 tenants, we end up with over 2 hours of time wasted per deploy, often exceeding the max timeout of our CI provider and requiring a manual exec into ECS to complete.

After a particularly frustrating day with this system, I decided to attempt a rewrite.

Getting started

Although I’ve been in this role for about a year, this system was still largely a black box to me. I needed a quick way to build a baseline understanding of the current bottlenecks.

This repository parses yaml files and persists them into our database. It currently takes almost 3 minutes for just 2 tenants. Deeply analyze the source, the tests, and the high level architecture mistakes made. Write your findings to learnings.md

Using this prompt, I learned that there were 3 large issues:

  1. Ruby 2.x
  2. To support the “overrides” feature, which lets us overwrite or modify shared sections per tenant, the codebase recalculated every single layout for every tenant. This is hugely wasteful because only 3 of our tenants actually use overrides.
  3. Every single lookup, update, and insert ran in its own transaction. Our current component tree has over 2,000 nodes and edges (layouts and the components within them), resulting in almost 4,500 database round trips per tenant.
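To make issue 3 concrete, here is a minimal sketch of batching rows into multi-row INSERT statements instead of issuing one statement per row. The `Row` shape, the `layout_nodes` table, and its columns are hypothetical stand-ins (the real schema isn't shown here), and the batch size of 500 is an arbitrary choice for illustration.

```typescript
// Hypothetical row shape and table name; the real schema is not shown in this post.
type Row = { tenantId: number; nodeId: string; parentId: string | null };
type Param = string | number | null;
type Statement = { sql: string; params: Param[] };

// Split items into fixed-size chunks so each statement stays a reasonable size.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build one parameterized multi-row INSERT per chunk instead of one per row.
function buildBatchInserts(rows: Row[], batchSize: number): Statement[] {
  return chunk(rows, batchSize).map((batch) => {
    const params: Param[] = [];
    const placeholders: string[] = [];
    for (const r of batch) {
      placeholders.push("(?, ?, ?)");
      params.push(r.tenantId, r.nodeId, r.parentId);
    }
    const sql =
      "INSERT INTO layout_nodes (tenant_id, node_id, parent_id) VALUES " +
      placeholders.join(", ");
    return { sql, params };
  });
}

// 4,500 rows at 500 rows per statement is 9 round trips instead of 4,500.
const rows: Row[] = [];
for (let i = 0; i < 4500; i++) {
  rows.push({ tenantId: 1, nodeId: `node-${i}`, parentId: i === 0 ? null : `node-${i - 1}` });
}
const statements = buildBatchInserts(rows, 500);
console.log(statements.length); // 9
```

The statements are only built here, not executed; any driver that accepts a SQL string plus a parameter array could run them inside a single transaction.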

Learning how to lead AI agents

With our anti-patterns in mind, the next step was to set up a proper feedback loop to keep my AI agents in line.

Blessed with the opportunity to use bleeding-edge tech, I landed on:

  • Bun test
  • oxlint
  • oxfmt

Giving agents the ability to frequently and rapidly test themselves and their assumptions proved to be invaluable.

We are rewriting a ruby gem in typescript using the Bun runtime. Read the plan from learnings.md, then, let’s write a script that compares your output to the desired output. I have provided you a folder of input and output mysql dumps. The script should be fault tolerant, idempotent, and fast. Spin up a docker container with compose, insert the expected output sql files into repo_expected, and then our results into repo_actual. Write a script that compares results row by row. Ensure both databases are dropped when done.

To my surprise, the output was impressively good. On the first try, the script was able to output that all 50,000 expected lines were missing from the actual.
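The comparison at the core of such a script can be sketched as a multiset diff. This is not the script the agent wrote, only an illustration under assumptions: the `DbRow` type and the `diffRows` helper are hypothetical names, and rows are assumed to already be loaded from `repo_expected` and `repo_actual` into memory.

```typescript
// Hypothetical in-memory row shape; real rows would come from the two databases.
type DbRow = Record<string, string | number | null>;

// Serialize a row with sorted keys so column order does not affect comparison.
function rowKey(row: DbRow): string {
  const keys = Object.keys(row).sort();
  return JSON.stringify(keys.map((k) => [k, row[k]]));
}

// Count occurrences so duplicate rows are compared correctly.
function countRows(rows: DbRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    const key = rowKey(row);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Report how many expected rows are missing and how many actual rows are unexpected.
function diffRows(expected: DbRow[], actual: DbRow[]): { missing: number; unexpected: number } {
  const exp = countRows(expected);
  const act = countRows(actual);
  let missing = 0;
  let unexpected = 0;
  exp.forEach((n, key) => {
    missing += Math.max(0, n - (act.get(key) ?? 0));
  });
  act.forEach((n, key) => {
    unexpected += Math.max(0, n - (exp.get(key) ?? 0));
  });
  return { missing, unexpected };
}

// Before any implementation exists, every expected row is reported as missing.
const expected: DbRow[] = [
  { id: 1, name: "header" },
  { id: 2, name: "footer" },
];
const initial = diffRows(expected, []);
console.log(initial.missing, initial.unexpected); // 2 0
```

Counting rows rather than using a plain set means a row that should appear twice but only appears once still shows up as a difference.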

Good, now follow learnings.md and write a robust suite of Bun tests. Use --timeout=10 to ensure each test takes no longer than 10ms, --retry-each=3 just in case, and --randomize to prevent any weird setup assumptions early.

If you’re coming from the world of Jasmine / Jest / Vitest, 10ms may seem insane, but Bun’s test harness runs in the Zig layer. 10ms per test is actually a bit generous.

Even cooler, Bun has a native CPU profiler that outputs Markdown for AI agents.

In the past, I’ve tried and failed many times at getting an AI agent to find something meaningful in a V8 trace dump. This new concise Markdown output containing a 50-word “this function takes 90% of the time” is so insanely valuable and magical you won’t believe me until you try it yourself.

Implementation

Now that my agent(s) were equipped with the ability to test code quality, accuracy, and e2e quickly and often, it was time to start implementing.

Use the ruby gem’s readme, learnings.md, and our suite of Bun and e2e tests to build out a multi-step project plan. Use no dependencies. Bun has a simple file api, native yaml parser, and most importantly a native database driver. Continuously think about writing the most performant code possible. Use bun run --cpu-prof-md to get a summary of the slowest functions/code paths.

After answering a few questions, reviewing the reference Ruby gem, and modifying the plan, we landed on a pretty good approach.

Phases:

  1. CLI / argv parsing / config validation
  2. Yaml reading / parsing
  3. Calculate root / common tree
  4. Apply owner overwrites
  5. Batch insert results
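Phases 3 and 4 are where the "recalculate everything per tenant" problem goes away. As a rough sketch of the idea, assuming a hypothetical flattened `Layout` shape (node id to component name) and a hypothetical `resolveLayouts` helper, the common tree is computed once and only tenants with overrides get their own copy:

```typescript
// Hypothetical shapes: a layout maps node ids to component names.
type Layout = Record<string, string>;
// Tenant id -> the nodes that tenant overrides.
type Overrides = Record<string, Layout>;

function resolveLayouts(
  common: Layout,
  tenants: string[],
  overrides: Overrides,
): Map<string, Layout> {
  const resolved = new Map<string, Layout>();
  for (const tenant of tenants) {
    const patch = overrides[tenant];
    // Tenants without overrides share the common layout object as-is;
    // only tenants with overrides pay for a merged copy.
    resolved.set(tenant, patch ? { ...common, ...patch } : common);
  }
  return resolved;
}

const common: Layout = { header: "Header", body: "Body" };
const layouts = resolveLayouts(common, ["a", "b", "c"], {
  c: { header: "CustomHeader" },
});
console.log(layouts.get("a") === common); // true
console.log(layouts.get("c")?.header); // CustomHeader
```

A real tree would need a recursive merge rather than a shallow spread, but the cost profile is the same: the work scales with the number of tenants that use overrides, not with the total tenant count.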

To my surprise, and to DeepSeek V4 Flash’s credit, each phase came out way better than I had expected. I largely left the AI alone to continually test itself. I allowed it to continue on its own until a phase was completed, gave it some comments + feedback, then did a git commit to mark our milestone.

This was easily the most success I’ve ever had with AI agents, and has solidified my faith in the strict test/lint/e2e guardrails approach.

Snags

Almost entirely by itself, the agent was able to get us down from 4000 to 56 remaining differences! However, this is where our progress stagnated and we landed in the stereotypical agent confusion loop.

Since I did not provide the agent the ruby gem source code, it tried many different approaches but could not determine why we kept coming up 56 rows short.

After manually testing myself with both the Ruby and Bun code, I realized there was a bug deep in the Ruby gem we had never encountered. If an override was present for a tenant, the components were still persisted for the original, even though its node was “overwritten”. This accounted for our 56 mismatches.

To maintain 1:1 compat, I had the agent create a config property rubyGemCompat with a code path that could reproduce the mistake, enabled by default.
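A compat flag like this can be sketched as a single branch. Only the `rubyGemCompat` name comes from the actual config; the `Component` shape and the `componentsToPersist` helper are hypothetical, and the real code path is certainly more involved.

```typescript
// Hypothetical shapes; only the rubyGemCompat property name is from the real config.
type Component = { id: string; nodeId: string };
type Config = { rubyGemCompat: boolean };

function componentsToPersist(
  original: Component[],
  override: Component[] | undefined,
  config: Config,
): Component[] {
  // No override for this tenant: persist the shared components unchanged.
  if (!override) return original;
  // Compat mode reproduces the gem's behavior: the original node's components
  // are still persisted alongside the override. Otherwise the override wins.
  return config.rubyGemCompat ? original.concat(override) : override;
}

const original: Component[] = [{ id: "c1", nodeId: "header" }];
const override: Component[] = [{ id: "c2", nodeId: "header" }];
console.log(componentsToPersist(original, override, { rubyGemCompat: true }).length); // 2
console.log(componentsToPersist(original, override, { rubyGemCompat: false }).length); // 1
```

Keeping the quirk behind one flag means the corrected behavior is a config change away once nothing depends on the extra rows.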

Performance squeezing

It took ~2 days of prompting to get us to 1:1 compat. I had instructed the AI to keep performance in mind at all times, but hadn’t checked it much myself because it was already years ahead of the Ruby gem. To its credit, we were already 50-60x faster, enough that nobody would ever complain. But… if we made it this far, why not go the extra mile?

With a bit of handholding and improved logging, we determined our actual algorithm was pretty damn fast, 100ms for 100 tenants. The real bottleneck was persisting to mysql, taking almost 14 seconds.

Although this layout data would have been best represented as JSON, we needed to match the current design of squeezing it into a relational schema. This meant inserting ~4500 rows * N tenants, which was no small feat.

The cpu-prof-md output pointed right at the argument/string parsing done by drizzle-orm. Even though this was their latest release-candidate version that has JIT compilation, you simply can’t beat native, especially at this size.

To skip the JS layer, we decided to just pass our raw objects to Bun’s SQL driver. This was actually a pretty free win, cutting down from ~14s to ~2s!

From here, I just let the agent run wild and repeat the test + profile + fix loop, to no avail. We were sacrificing code quality for improvements of less than 10ms. I decided to revert and stop here.

Any further improvements are going to require some different indexing to allow parallel inserts / transactions to the tables, which I knew wouldn’t get approved for a drop-in replacement.

Results

Tenants   Before       After
2         ~3 minutes   ~2s
100       ~2 hours     ~9s