Back again with another update on Halcyon Reach, the agentically engineered, EVE Online-style MMO I've been building. As I've said in previous posts, one of the most interesting parts of this project for me is seeing how far and how fast I can go with agents doing most of the heavy coding work.

Cartograph: a custom static analysis tool

To improve that workflow, one thing I built this past stretch is a custom static analysis tool I'm calling Cartograph. The idea started when I noticed what my most frequent comments to the AI were actually about.

Pattern recognition is one of humans' most valuable assets. Our brains can recognize an enormous number of patterns, and a lot of my comments when working with agents end up being about spotting patterns and code smells, like:

  • Can we be DRY here?
  • Can we achieve type safety here?
  • How can we build better contracts to avoid mistakes here?
  • How can we refactor this into an adapter pattern?

Most of it comes down to predicting future failure points. So I wanted a tool that could surface those areas before I even had to ask, something that guides my eye and makes sure I don't miss things hidden in the diff.

Under the hood, Cartograph parses the Rust server source and the GDScript client source separately, then builds a single unified graph of how code on one side is invoked by code on the other. As I've written in my previous two posts, one of the bigger ongoing challenges in this project is maintaining the Rust-to-Godot language boundary. Being able to see at a glance how a change in a Rust system propagates to the Godot client has been really useful.
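To make the propagation question concrete, here is a minimal sketch, in Rust, of a unified cross-language graph and a reachability query over it. Everything in it is an assumption for illustration: the `Graph`, `Side`, and `Kind` types, the edge semantics, and the example node names are hypothetical, not Cartograph's actual data model. The only idea taken from the post is that both parsers feed one graph, so a breadth-first walk from a Rust node can reach the Godot nodes it affects.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Which side of the language boundary a node was parsed from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Side {
    Rust,
    Godot,
}

/// Node categories, mirroring the ones the viewer draws (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    File,
    Signal,
    Command,
    WireMessage,
}

struct Node {
    side: Side,
    kind: Kind,
}

#[derive(Default)]
struct Graph {
    nodes: HashMap<String, Node>,
    /// Directed edges: `a -> b` means "a invokes, emits, or sends something that b handles".
    edges: HashMap<String, Vec<String>>,
}

impl Graph {
    fn add_node(&mut self, id: &str, side: Side, kind: Kind) {
        self.nodes.insert(id.to_string(), Node { side, kind });
    }

    fn add_edge(&mut self, from: &str, to: &str) {
        self.edges.entry(from.to_string()).or_default().push(to.to_string());
    }

    /// Breadth-first search from `start`; returns every reachable Godot-side node, sorted by id.
    fn godot_impact(&self, start: &str) -> Vec<(String, Kind)> {
        let mut seen = HashSet::from([start.to_string()]);
        let mut queue = VecDeque::from([start.to_string()]);
        let mut hits = Vec::new();
        while let Some(id) = queue.pop_front() {
            if let Some(node) = self.nodes.get(&id) {
                if node.side == Side::Godot {
                    hits.push((id.clone(), node.kind));
                }
            }
            for next in self.edges.get(&id).into_iter().flatten() {
                if seen.insert(next.clone()) {
                    queue.push_back(next.clone());
                }
            }
        }
        hits.sort_by(|a, b| a.0.cmp(&b.0));
        hits
    }
}

/// Hypothetical example: one server event that fans out into the client HUD.
fn example_graph() -> Graph {
    let mut g = Graph::default();
    g.add_node("server/combat.rs", Side::Rust, Kind::File);
    g.add_node("server/market.rs", Side::Rust, Kind::File);
    g.add_node("DamageApplied", Side::Rust, Kind::WireMessage);
    g.add_node("client/net_router.gd", Side::Godot, Kind::File);
    g.add_node("hud_damage_changed", Side::Godot, Kind::Signal);
    g.add_node("client/hud/health_bar.gd", Side::Godot, Kind::File);
    g.add_node("fire_weapon", Side::Godot, Kind::Command);
    g.add_edge("server/combat.rs", "DamageApplied");
    g.add_edge("DamageApplied", "client/net_router.gd");
    g.add_edge("client/net_router.gd", "hud_damage_changed");
    g.add_edge("hud_damage_changed", "client/hud/health_bar.gd");
    g.add_edge("fire_weapon", "server/combat.rs"); // client command handled by the server
    g
}

fn main() {
    let g = example_graph();
    for (id, kind) in g.godot_impact("server/combat.rs") {
        println!("{kind:?}\t{id}");
    }
}
```

In this toy graph, walking from `server/combat.rs` reaches the client router, the HUD signal, and the health bar script, while walking from the unrelated `server/market.rs` reaches nothing on the Godot side.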

The Cartograph viewer in action: a force-directed graph, colored by layer, whose nodes are commands, signals, wire messages, and the files/systems they live in. The sidebar shows orphan notes, the kind of dead-code findings that would otherwise sit in the repo for weeks.

On top of that, Cartograph surfaces missed connections. If there's a wire message defined in Rust that isn't consumed on the Godot side, or a CommandBus dispatch on the client that has no matching server handler, Cartograph flags it as an orphan. That's the kind of thing that would otherwise sit in the codebase silently for weeks, and it's exactly the kind of bug agents tend to introduce when they delete the last consumer of something without realizing it was the last one.
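At its core, orphan detection of this kind can be a set difference between what one side produces and what the other side consumes. The sketch below assumes the parsers have already extracted four name sets and that names match exactly across the boundary; the `Extracted` and `Orphan` types and the example message and command names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Symbol sets the two parsers might extract (hypothetical shape, matched by exact name).
struct Extracted {
    rust_wire_defined: BTreeSet<String>,      // wire messages the server defines
    gd_wire_consumed: BTreeSet<String>,       // wire messages the client handles
    gd_commands_dispatched: BTreeSet<String>, // CommandBus dispatches on the client
    rust_command_handlers: BTreeSet<String>,  // commands the server handles
}

#[derive(Debug, PartialEq, Eq)]
enum Orphan {
    UnconsumedWireMessage(String),
    UnhandledCommand(String),
}

/// An orphan is a producer with no consumer across the boundary: a plain set difference.
fn find_orphans(x: &Extracted) -> Vec<Orphan> {
    let wire = x
        .rust_wire_defined
        .difference(&x.gd_wire_consumed)
        .map(|m| Orphan::UnconsumedWireMessage(m.clone()));
    let cmds = x
        .gd_commands_dispatched
        .difference(&x.rust_command_handlers)
        .map(|c| Orphan::UnhandledCommand(c.clone()));
    wire.chain(cmds).collect()
}

fn set(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// Hypothetical example: the last client consumer of `CargoUpdated` was deleted,
/// and `jettison_cargo` is dispatched but has no server handler.
fn example() -> Extracted {
    Extracted {
        rust_wire_defined: set(&["CargoUpdated", "DamageApplied", "ShipDocked"]),
        gd_wire_consumed: set(&["DamageApplied", "ShipDocked"]),
        gd_commands_dispatched: set(&["fire_weapon", "jettison_cargo"]),
        rust_command_handlers: set(&["fire_weapon"]),
    }
}

fn main() {
    for orphan in find_orphans(&example()) {
        println!("{orphan:?}");
    }
}
```

Using `BTreeSet` keeps the findings in a stable, sorted order, which matters if they are compared between commits.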

I want to keep extending this and use it to move faster with agents. A representative slice of Cartograph's output lives alongside this post if you want to see the raw graph it builds.

Wire report: making bandwidth visible at review time

A related tool I worked on is the wire report. It was inspired by a tweet from Mitchell Hashimoto about the danger of trusting impressive-looking agent output without the systems understanding to back it up:

Mitchell Hashimoto @mitchellh

I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem.

As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)!

I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work.

It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results?

88ms => 1.5ms
150K allocs => ~500 allocs

Incredible right? Nope.

My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path.

This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput.

The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity.

Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.

One thing specific to an MMO, especially an EVE Online-style MMO where hundreds or even thousands of players can fight the same battle in the same zone, is the need to stay acutely aware of server load and of the data flowing over the wire. The Hashimoto quote really resonated with me. I understand the intent behind his post, but in some ways I also took it as a challenge. Not that I think anything I build will give an agent Mitchell-level intuition; rather, his post pushed me to make performance a concern on every single change I ship.

So I built the wire report. It analyzes the data sent on each replication channel and projects what that traffic would look like across realistic gameplay scenarios. For every PR, the wire report serializes a representative payload on every channel at five scale points (solo idle, roaming, brawl, hub station, big fight), then posts the per-channel sizes and the resulting per-client throughput as a sticky comment.
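The throughput projection is mostly arithmetic. As a minimal sketch, assuming each channel has a fixed payload size and send rate and that some channels fan out once per visible entity, per-client and server-total bandwidth could be computed like this. The channel names, sizes, rates, and scenario counts are made up for illustration, not measured Halcyon Reach numbers, and only two of the five scale points are shown.

```rust
/// One replication channel. Field names and every number below are illustrative only.
struct Channel {
    name: &'static str,
    bytes_per_msg: u64, // serialized size of one representative payload
    hz: u64,            // sends per second
    per_entity: bool,   // true if one message is sent per visible entity
}

/// One scale point, e.g. "solo idle" or "brawl".
struct Scenario {
    name: &'static str,
    clients: u64,          // clients connected to the zone
    visible_entities: u64, // entities each client has in view
}

/// Bytes per second the server sends to a single client in this scenario.
fn per_client_bytes_per_sec(channels: &[Channel], s: &Scenario) -> u64 {
    channels
        .iter()
        .map(|c| {
            let fanout = if c.per_entity { s.visible_entities } else { 1 };
            c.bytes_per_msg * c.hz * fanout
        })
        .sum()
}

/// Total egress: every client receives its own copy of the stream.
fn server_total_bytes_per_sec(channels: &[Channel], s: &Scenario) -> u64 {
    per_client_bytes_per_sec(channels, s) * s.clients
}

fn example_channels() -> Vec<Channel> {
    vec![
        Channel { name: "ship_transform", bytes_per_msg: 24, hz: 10, per_entity: true },
        Channel { name: "self_status", bytes_per_msg: 40, hz: 5, per_entity: false },
        Channel { name: "local_chat", bytes_per_msg: 64, hz: 1, per_entity: false },
    ]
}

fn example_scenarios() -> Vec<Scenario> {
    vec![
        Scenario { name: "solo idle", clients: 1, visible_entities: 1 },
        Scenario { name: "brawl", clients: 20, visible_entities: 20 },
    ]
}

fn main() {
    let channels = example_channels();
    for c in &channels {
        println!("{:<16} {:>4} B/msg @ {} Hz", c.name, c.bytes_per_msg, c.hz);
    }
    for s in &example_scenarios() {
        println!(
            "{:<10} per-client {:>6} B/s, server total {:>8} B/s",
            s.name,
            per_client_bytes_per_sec(&channels, s),
            server_total_bytes_per_sec(&channels, s)
        );
    }
}
```

Under this model, per-entity channels make server egress grow with clients × visible entities, which is roughly quadratic when everyone in a fight can see everyone else. That is why the big-fight scale point is the one worth watching.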

The full wire report on a PR: per-channel atomic byte sizes across five scenarios, then the per-client and server-total throughput math at each scale point. A big fight pushes 200 KB/s out of the server.

The intent is not to gate anything. It's to put the bandwidth question in front of me at review time, so that questions like "is this feature worth those bytes?" or "does this scenario fall over at scale?" are ones my agents and I have to consider before merging.

CI as the substrate

Both Cartograph and the wire report are wired into Halcyon Reach's CI. Every PR gets sticky comments with the architecture diff from Cartograph and the per-channel byte-size diff from the wire report. The result is a historical record, attached to each PR, of how the codebase has shifted at both the structural and the bandwidth level. Future me can look back at any PR and see its actual systemic cost, not just its line count.
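One common way to implement a sticky comment is as an upsert: find the tool's earlier comment by a hidden marker and edit it, otherwise post a new one. The sketch below shows only that decision, assuming a hidden HTML marker at the top of each tool's comment; the `Comment` shape, the `Action` enum, and the marker format are hypothetical, and the actual calls to the code host's API are left out.

```rust
/// A PR comment as a code-host API might return it (hypothetical shape).
struct Comment {
    id: u64,
    body: String,
}

#[derive(Debug, PartialEq)]
enum Action {
    Create(String),
    Update(u64, String),
}

/// Hidden HTML marker that tags a tool's comment; the exact format is an assumption.
fn marker(tool: &str) -> String {
    format!("<!-- sticky:{tool} -->")
}

/// Decide whether to edit this tool's existing comment or post a new one,
/// so each PR carries exactly one up-to-date comment per tool.
fn upsert(existing: &[Comment], tool: &str, report: &str) -> Action {
    let tag = marker(tool);
    let body = format!("{tag}\n{report}");
    match existing.iter().find(|c| c.body.starts_with(&tag)) {
        Some(c) => Action::Update(c.id, body),
        None => Action::Create(body),
    }
}

fn main() {
    let comments = vec![
        Comment { id: 1, body: "Looks good to me".to_string() },
        Comment { id: 7, body: format!("{}\n(previous wire report)", marker("wire_report")) },
    ];
    println!("{:?}", upsert(&comments, "wire_report", "(new wire report)"));
    println!("{:?}", upsert(&comments, "cartograph", "(architecture diff)"));
}
```

Keying on a per-tool marker is what lets two tools share one PR without overwriting each other's comments.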

The Cartograph architecture diff on a PR: nodes and links added or removed, orphan notes opened and resolved. The reviewer (me) gets a structural delta alongside the line-level diff.
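An architecture diff like this can be expressed as set differences between two graph snapshots, one at the PR's base and one at its head. This is a minimal sketch under that assumption; the `Snapshot` and `ArchDiff` types and the example PR are hypothetical.

```rust
use std::collections::BTreeSet;

/// Graph snapshot at one commit (hypothetical shape).
struct Snapshot {
    nodes: BTreeSet<String>,
    edges: BTreeSet<(String, String)>,
    orphans: BTreeSet<String>,
}

#[derive(Debug, PartialEq)]
struct ArchDiff {
    nodes_added: Vec<String>,
    nodes_removed: Vec<String>,
    edges_added: Vec<(String, String)>,
    edges_removed: Vec<(String, String)>,
    orphans_opened: Vec<String>,
    orphans_resolved: Vec<String>,
}

/// Items present in `a` but not in `b`, in sorted order.
fn only_in<T: Ord + Clone>(a: &BTreeSet<T>, b: &BTreeSet<T>) -> Vec<T> {
    a.difference(b).cloned().collect()
}

fn diff(base: &Snapshot, head: &Snapshot) -> ArchDiff {
    ArchDiff {
        nodes_added: only_in(&head.nodes, &base.nodes),
        nodes_removed: only_in(&base.nodes, &head.nodes),
        edges_added: only_in(&head.edges, &base.edges),
        edges_removed: only_in(&base.edges, &head.edges),
        orphans_opened: only_in(&head.orphans, &base.orphans),
        orphans_resolved: only_in(&base.orphans, &head.orphans),
    }
}

fn names(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

fn links(items: &[(&str, &str)]) -> BTreeSet<(String, String)> {
    items.iter().map(|(a, b)| (a.to_string(), b.to_string())).collect()
}

/// Hypothetical PR: it deletes the last client consumer of `DamageApplied`
/// (opening an orphan) and removes the unused `CargoUpdated` message (resolving one).
fn example() -> (Snapshot, Snapshot) {
    let base = Snapshot {
        nodes: names(&["CargoUpdated", "DamageApplied", "client/health_bar.gd", "server/combat.rs"]),
        edges: links(&[
            ("server/combat.rs", "DamageApplied"),
            ("DamageApplied", "client/health_bar.gd"),
        ]),
        orphans: names(&["CargoUpdated"]),
    };
    let head = Snapshot {
        nodes: names(&["DamageApplied", "server/combat.rs"]),
        edges: links(&[("server/combat.rs", "DamageApplied")]),
        orphans: names(&["DamageApplied"]),
    };
    (base, head)
}

fn main() {
    let (base, head) = example();
    let d = diff(&base, &head);
    println!(
        "nodes +{}/-{}, links +{}/-{}, orphans opened {}, resolved {}",
        d.nodes_added.len(),
        d.nodes_removed.len(),
        d.edges_added.len(),
        d.edges_removed.len(),
        d.orphans_opened.len(),
        d.orphans_resolved.len()
    );
}
```

The example PR makes exactly the agent mistake of deleting the last consumer of a message, and the diff reports it as a newly opened orphan rather than leaving it to be found weeks later.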

Reactor: bringing reactive UI to Godot

On an unrelated note, another thing I've been thinking a lot about is the UI. I'm historically a React developer. I started using React back in 2016, and before that I worked a lot with Vue.js. I've always really enjoyed building user interfaces, and I know from experience that the comfort of reactive data binding scales really well into complex UIs.

Godot doesn't have anything similar out of the box. The default Godot UI pattern is closer to jQuery: you wire up mouse_entered, mouse_exited, pressed, a 10 Hz update_state callback, and so on, and each handler imperatively writes to the widget's visual properties. That's fine for a single button. At the scale of an MMO HUD it gets noisy fast: every row reads from a half-dozen pieces of state and has to react to hover, selection, focus, fade, and live network data all at once, and the imperative handlers end up stepping on each other's writes.

I took inspiration from Spark, a Godot addon that implements Svelte-runes-style reactive primitives in pure GDScript. With a system like this, you declare reactive state, derive computed values from it, and bind that state directly to UI properties. When the state changes, the UI updates. No manual refresh calls, no per-handler choreography. I built my own Spark-inspired version, tailored to Halcyon Reach's idioms, and I'm calling it Reactor in homage to React.
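Reactor's and Spark's internals aren't shown in this post, so the following is only a generic sketch of the signal/effect idea behind this style, written in Rust because GDScript only runs inside Godot. It assumes automatic dependency tracking: while an effect runs, every state it reads subscribes that effect, and a later write re-runs it. `State`, `effect`, `run`, and `CURRENT` are hypothetical names, not Reactor's API.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Effect = Rc<dyn Fn()>;

thread_local! {
    /// The effect currently running, if any. Reads made while it runs subscribe it.
    static CURRENT: RefCell<Option<Effect>> = RefCell::new(None);
}

/// A reactive cell: reading it inside an effect subscribes that effect to future writes.
struct State<T> {
    value: RefCell<T>,
    subscribers: RefCell<Vec<Effect>>,
}

impl<T: Clone + PartialEq> State<T> {
    fn new(value: T) -> Rc<Self> {
        Rc::new(State { value: RefCell::new(value), subscribers: RefCell::new(Vec::new()) })
    }

    fn get(&self) -> T {
        CURRENT.with(|current| {
            if let Some(e) = current.borrow().as_ref() {
                let mut subs = self.subscribers.borrow_mut();
                if !subs.iter().any(|s| Rc::ptr_eq(s, e)) {
                    subs.push(Rc::clone(e));
                }
            }
        });
        self.value.borrow().clone()
    }

    fn set(&self, value: T) {
        if *self.value.borrow() == value {
            return; // unchanged: nothing to re-run
        }
        *self.value.borrow_mut() = value;
        // Clone the list first so effects can read (and subscribe) while we iterate.
        let subs: Vec<Effect> = self.subscribers.borrow().clone();
        for s in subs {
            run(s);
        }
    }
}

/// Run an effect with itself installed as CURRENT, restoring the previous one afterwards.
fn run(e: Effect) {
    let prev = CURRENT.with(|current| current.replace(Some(Rc::clone(&e))));
    (*e)();
    CURRENT.with(|current| *current.borrow_mut() = prev);
}

/// Register a side effect; it runs once now and again whenever a state it read changes.
fn effect(f: impl Fn() + 'static) {
    run(Rc::new(f));
}

/// Hypothetical hover example: record every name color the effect derives.
fn demo() -> Vec<String> {
    let hovered = State::new(false);
    let log = Rc::new(RefCell::new(Vec::<String>::new()));
    {
        let (hovered, log) = (Rc::clone(&hovered), Rc::clone(&log));
        effect(move || {
            let color = if hovered.get() { "active" } else { "white" };
            log.borrow_mut().push(color.to_string());
        });
    }
    hovered.set(true);
    hovered.set(true); // same value: the effect does not re-run
    hovered.set(false);
    let out = log.borrow().clone();
    out
}

fn main() {
    println!("{:?}", demo());
}
```

Two simplifications are worth naming: effects are never unsubscribed (so the `Rc` cycle between a state and its effects is never freed), and effects re-run synchronously on every write instead of being batched. A production system would need disposal and probably batching; how Reactor or Spark handle those isn't covered here.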

A widget written this way reads much like a React component, with reactive state at the top, a single pure _render() function that derives a visual snapshot from that state, and one _apply() that writes the result to Godot's widget properties:

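# Reactive state (Reactor): effects that read .value re-run when it changes.
var _is_hovered := R.bool(false)
var _data_name_col := R.color(Color.WHITE)

@onready var ui := R.attach(self)

func _ready() -> void:
    # A single effect: re-derive the snapshot and apply it whenever state it read changes.
    ui.effect(func(): _apply(_render()))

# Handlers connected to the Control's mouse_entered / mouse_exited signals.
func _on_mouse_entered() -> void:
    _is_hovered.value = true

func _on_mouse_exited() -> void:
    _is_hovered.value = false

# Pure: reads reactive state and returns a visual snapshot; touches no nodes.
func _render() -> Dictionary:
    return {
        "name_color": OverviewRail.COL_TEXT_ACTIVE if _is_hovered.value else _data_name_col.value,
    }

# The only place that writes Godot widget properties.
# _name_lbl is the row's name Label, declared elsewhere in the widget.
func _apply(s: Dictionary) -> void:
    _name_lbl.add_theme_color_override("font_color", s.name_color)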

As a side benefit, the class of bug where two handlers race each other over a theme color override (which Godot's defaults steer you into) stops being expressible in this style: handlers only write state, and the only visual mutation happens in _apply(), fed by the pure _render(). The hovered-row-strobe bug I used to fix by hand simply can't be written this way.

Since so much of Halcyon Reach is user interface, I'm hoping this new pattern makes those interfaces easier to build, harder to break, and easier to scale up as the game grows.

Closing thoughts

What I like about all three of these tools is that they're really responses to the same underlying problem. The language and the test suite don't enforce the things I actually care about. An agent will happily produce a diff that compiles, passes tests, and breaks an invariant I never wrote down. Cartograph and the wire report make those invariants visible at review time. Reactor takes a different tack: it changes the paradigm, so a whole class of UI bugs disappears completely.

As for next steps, I'm interested in adding more visualizations to Cartograph. I want to improve on the plain diff as a visualization of code changes, identify areas that are ripe for refactoring, and spot smells in my own rat's nest.

But even more than that, I want to spend some time working on combat and game feel. We've built out scaffolds for the core systems. I want to start seeing how to craft compelling moment-to-moment gameplay out of them.