Back again with another update on the progress of Halcyon Reach, my ("somewhat intelligently" vibe-coded) 2D EVE Online-style MMORPG.

The Halcyon Reach client as it stands today, mid-combat with a Bulwark Battlecruiser locked, showing the overview rail, sensor scope, and ability bar.

In the past two weeks, I've pushed out 52 PRs spanning both content and core infrastructure, filed 18 new issues, and closed 14 of them. I've sketched out and built prototypes of 7 new ship hulls, built the core backend logic behind zone-to-zone travel, built a placeholder for NPC-owned stations with docking, revamped the UIs, and added combat aggression timers and several different NPC ship AIs. I even set up the server in the cloud and did a quick demo with my friend -- connecting two clients together and seeing our ships flying around in space was a big dopamine hit.

While I could tell you details about everything that's going on in Halcyon Reach itself, I fully expect many of these systems to change. I'm really just laying basic foundations for systems that need to be 10-20x larger and more involved than they currently are. For example, some of the current NPC AIs can best be described as either "move towards target and shoot" or "move away from target and shoot." I have much bigger plans for the NPC AI than just these. While I am proud of myself for making this progress and the journey has been fun, I don't think it's super worthwhile to go into the nitty-gritty details of how the features themselves actually work, since they're all so early-stage.

What may be more worthwhile is again discussing the agentic development methodologies that have led me to this point.

UI Mockup Comparator agent

Even though fine-tuning designs at this stage is a complete waste, as a front-end developer, I can't help but get nerd-sniped by the aesthetics of the game world. If I'm gonna be staring at this screen for hours upon hours of my free time, it'd better be something that I can stand looking at. Plus, a key part of the game's identity is keeping the combat "feeling" of EVE Online, where the math, positioning, traversal, signature radius, etc. are the interesting parts of the combat, and keeping these visible and clear to the user is really important. So, I've been trying to sketch out a broad design language. If anything, it's also just useful for my own mind's eye as I try to imagine what exactly this game is and how it plays.

As much as I truly dislike OpenAI, I do have to admit that their latest image models are pretty cracked. Their ability to sketch out large complex concepts, to lay out basic visual hierarchies, and to keep text clear and legible is simply unmatched by any other models. I was sending it screenshots of the game in the current state and asking it to piecemeal improve certain parts of it, and while the output is sometimes closer to EVE than I'd like, it definitely lays out compelling visions.

An OpenAI-generated mockup of the Halcyon Reach UI (directional scanner, overview, drone bay, and fitting HUD): a compelling, if EVE-adjacent, vision of where the design could go.

I have been taking inspiration from a flow I saw when working in Claude Design, wherein the coding agent designs some UX and then, prior to handing off to the user, runs the design past an impartial third-party validator agent that looks at the generated image and gives it either a thumbs up or a thumbs down with some feedback. I really like this because I find that agents can sometimes get in their own heads and start to feel a little defensive about their designs, based on preconceived notions of what they think the user wants from their short-term memory context. The validator agent's minimal context keeps the coding agent a bit more grounded in first-principles design thinking. I found this necessary for my project because, while these coding agents can churn out endless impressive React/Tailwind/Shadcn apps, with Godot UIs they're mostly still in the 2024-2025 coding era, where the designs are just very bland and unimpressive.

This multi-agent comparison concept seemed really interesting to me, so the agent that I built aims to replicate that exact flow. This involved setting up a bespoke Storybook-like test-bed system in Godot. We run the current scene and leverage the GDAI-mcp to take a screenshot of the components in isolation. We then put this up against the mockup (OpenAI or otherwise), and the impartial agent reviews the coding agent's output against the mock, pointing out discrepancies and room for improvement.

Left: the live in-game overview. Right: the mockup of the same region, the exact pair the comparator agent diffs.

This flow worked great for the very first example above. It's not a complete 1:1 match, but it's close enough to give the game a feeling of polish without much effort.

Continued server authoritative work

One of the bigger refactors I did this stretch was ripping all of the actual content out of the code. For a while, adding a single hull meant editing code in a half-dozen places. There was a HullKind enum, a few exhaustive match / switch statements that each had to know about every hull, some hardcoded arrays of file paths, and a couple of dictionaries on the Godot side mapping hull ids to their display info.

Now a hull is just a JSON file sitting in a single folder on the server, data/canonical/hulls/. Same for weapons, modules, items, loot tables, anomalies, and NPC spawns; they're all just rows of data under data/canonical/. To add one, I drop in a file and it gets picked up, with no code changes scattered across disparate systems.
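To make the shape of that change concrete, here is a minimal Rust sketch of a catalog keyed by hull id, which is the kind of structure that replaces an enum plus exhaustive match statements. It is not the actual Halcyon Reach code: the HullRecord fields, the HullCatalog type, and the example ids are all hypothetical, and it assumes the JSON files have already been deserialized into records upstream.

```rust
use std::collections::HashMap;

/// One hull record, as it might look after its JSON file has been
/// deserialized. Field names are hypothetical, not the game's real schema.
#[derive(Debug, Clone, PartialEq)]
pub struct HullRecord {
    pub id: String,
    pub display_name: String,
    pub max_hull_hp: u32,
}

/// Catalog keyed by hull id. Adding a hull means adding a record,
/// not adding an enum variant and a match arm in every system.
#[derive(Debug, Default)]
pub struct HullCatalog {
    by_id: HashMap<String, HullRecord>,
}

impl HullCatalog {
    /// Builds the catalog, rejecting duplicate ids so that two files
    /// cannot silently shadow each other.
    pub fn from_records(records: Vec<HullRecord>) -> Result<Self, String> {
        let mut by_id = HashMap::new();
        for record in records {
            if by_id.contains_key(&record.id) {
                return Err(format!("duplicate hull id: {}", record.id));
            }
            by_id.insert(record.id.clone(), record);
        }
        Ok(Self { by_id })
    }

    /// Looks up a hull by id; unknown ids yield None instead of a panic.
    pub fn get(&self, id: &str) -> Option<&HullRecord> {
        self.by_id.get(id)
    }
}
```

Every system that used to match on a hull enum would instead call something like catalog.get(id) and handle the None case, so a new file in the folder never requires touching those call sites.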

There is still custom logic needed for individual special effects or unique abilities. For example, I implemented a rudimentary "hold-and-release"-style effect for the Mend Frigate's special ability: you hold the ability button, and as you hold it, the healing circle around your ship gets larger and more powerful, and then you release it for a large burst of AoE healing. This is similar in nature to how the Evoker's empowered (hold-to-charge) spells work in World of Warcraft. This still needs custom script code to function, but the hulls, the abilities, and the modules themselves can be easily created and managed via JSON, which is a huge boon.
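The hold-and-release mechanic can be sketched in a few lines of Rust. This is an illustration under assumptions rather than the game's implementation: the ChargeTuning fields, the linear scaling, and every number are hypothetical, chosen only to show charge accumulating while the button is held and being spent on release.

```rust
/// Tuning values for a charged heal. All names and numbers are hypothetical.
#[derive(Debug, Clone, Copy)]
pub struct ChargeTuning {
    pub max_hold_secs: f32,
    pub min_radius: f32,
    pub max_radius: f32,
    pub min_heal: f32,
    pub max_heal: f32,
}

/// The AoE heal produced when the button is released.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HealBurst {
    pub radius: f32,
    pub heal: f32,
}

#[derive(Debug)]
pub struct ChargedHeal {
    tuning: ChargeTuning,
    held_secs: f32,
}

impl ChargedHeal {
    pub fn new(tuning: ChargeTuning) -> Self {
        Self { tuning, held_secs: 0.0 }
    }

    /// Called once per simulation tick while the button is held.
    /// Hold time is clamped so the charge cannot exceed its maximum.
    pub fn hold(&mut self, dt_secs: f32) {
        self.held_secs = (self.held_secs + dt_secs).min(self.tuning.max_hold_secs);
    }

    /// Fraction of full charge, in the range 0.0..=1.0.
    pub fn charge(&self) -> f32 {
        if self.tuning.max_hold_secs <= 0.0 {
            return 1.0;
        }
        self.held_secs / self.tuning.max_hold_secs
    }

    /// Releases the ability: scales radius and heal linearly with the
    /// charge fraction, then resets the charge to zero.
    pub fn release(&mut self) -> HealBurst {
        let t = self.charge();
        let lerp = |a: f32, b: f32| a + (b - a) * t;
        let burst = HealBurst {
            radius: lerp(self.tuning.min_radius, self.tuning.max_radius),
            heal: lerp(self.tuning.min_heal, self.tuning.max_heal),
        };
        self.held_secs = 0.0;
        burst
    }
}
```

In this sketch the tuning numbers are plain data, so they could live in the JSON record for the ability while only the hold/release behavior stays in code.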

These same JSON files extend over to the Godot side. Because the data/canonical/ folder sits at the root of the project, the Rust server bakes those records into its binary at build time, while the Godot client reads the exact same files at boot, globbing the directories and parsing each one into a dictionary keyed by id.

Sharing the data doesn't make the client an authority on anything, though. It reads the catalog so it can draw the right ship, show the right tooltip, and let me pick a hull at the fitting screen, but when it actually matters the server runs the simulation off its own copy and its answer is the only one that counts.

Continued "Single Source of Truth"-ing through the language boundary

As I mentioned in the previous post, a key struggle with our game will be our two-language system. While I've been having a great time with Bevy and Lightyear in our Rust server and we've successfully demoed simple networking between two clients, I still worry a lot about the continued maintenance of our Rust-to-Godot seam. I now have it so that the two halves only ever talk through a single, tiny doorway. Initially, though, the Godot side just mirrored the Rust message definitions by hand. Every message kind was a string literal re-typed in GDScript, and every payload was a dictionary I hand-built on the way out and .get()'d my way through on the way in.

The connect screen, with name, host, and port fields and a Connect button: point the client at a host and port, and the whole two-language conversation begins.

To make the system more resilient, I made the Rust side the single source of truth and had it generate the GDScript half. There's now one list of wire messages in Rust, and a little gen_gd binary reflects over it so that the types, the field names, the serde renames, and what's optional and what isn't all match in the GDScript mirror.

In practice, it turns those hand-built dictionaries and stringly-typed .get()s into generated Godot function calls with type safety:

## before: a renamed Rust field silently reads as the default
var phase = payload.get("phase", 0)
## after: generated accessor, warns loudly the moment "phase" disappears
var phase = WirePayloads.AnomalyState.phase(payload)
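For a feel of what the generating side might look like, here is a heavily simplified Rust sketch. It is a guess at the general shape, not the real gen_gd: the FieldSpec and MessageSpec types, the hand-written WIRE_MESSAGES list (standing in for real reflection over the Rust types), the optional "label" field, and the exact GDScript text emitted are all assumptions made for illustration.

```rust
/// Hypothetical description of one field on a wire message.
pub struct FieldSpec {
    /// Field name as it appears on the wire (after any serde rename).
    pub wire_name: &'static str,
    pub optional: bool,
}

/// Hypothetical description of one wire message.
pub struct MessageSpec {
    pub name: &'static str,
    pub fields: &'static [FieldSpec],
}

/// The single list of wire messages. "label" is an invented example field.
pub const WIRE_MESSAGES: &[MessageSpec] = &[MessageSpec {
    name: "AnomalyState",
    fields: &[
        FieldSpec { wire_name: "phase", optional: false },
        FieldSpec { wire_name: "label", optional: true },
    ],
}];

/// Emits GDScript source with one static accessor per field.
/// Required fields warn when the key is missing; optional ones do not.
pub fn generate_gd(messages: &[MessageSpec]) -> String {
    let mut out = String::from("## GENERATED FILE. Do not edit by hand.\nclass_name WirePayloads\n");
    for msg in messages {
        out.push_str(&format!("\nclass {}:\n", msg.name));
        if msg.fields.is_empty() {
            // An empty class body is not valid GDScript.
            out.push_str("\tpass\n");
        }
        for field in msg.fields {
            out.push_str(&format!(
                "\tstatic func {}(payload: Dictionary) -> Variant:\n",
                field.wire_name
            ));
            if !field.optional {
                out.push_str(&format!(
                    "\t\tif not payload.has(\"{0}\"):\n\t\t\tpush_warning(\"{1}: missing required field '{0}'\")\n",
                    field.wire_name, msg.name
                ));
            }
            out.push_str(&format!("\t\treturn payload.get(\"{}\")\n", field.wire_name));
        }
    }
    out
}
```

The important property is that the output is a pure function of the message list, so renaming a field in Rust changes the generated accessor name and every stale GDScript call site stops resolving instead of quietly reading a default.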

And because GDScript has no compiler to catch any of this for me, I created a couple of guardrails to make it trustworthy: a drift check in my testing flow that regenerates the files and fails the build if something is stale, and a per-message test that pins each message's keys against the Rust schema so a new message is covered the second it joins the list. None of this makes the two-language problem go away, but it moves the failure from "crash during playtest" to "giant red X during CI," which is exactly the pattern that makes building with agents at this scale possible.
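The drift check itself can be very small. The sketch below assumes the committed file and the freshly regenerated output are both available as strings; the Drift enum and check_drift function are hypothetical names, and a real check would read the committed file from disk and fail the test run on Stale.

```rust
/// Result of comparing the committed generated file with a fresh regeneration.
#[derive(Debug, PartialEq)]
pub enum Drift {
    UpToDate,
    /// 1-based line number of the first difference, to make the CI
    /// failure message actionable.
    Stale { first_diff_line: usize },
}

/// Compares committed and regenerated sources byte-for-byte, and on a
/// mismatch reports the first line where they diverge.
pub fn check_drift(committed: &str, regenerated: &str) -> Drift {
    if committed == regenerated {
        return Drift::UpToDate;
    }
    let mut committed_lines = committed.lines();
    let mut regenerated_lines = regenerated.lines();
    let mut line = 1;
    loop {
        match (committed_lines.next(), regenerated_lines.next()) {
            // Lines agree so far: keep scanning.
            (Some(a), Some(b)) if a == b => line += 1,
            // Lines differ, or one file ended early, or the files differ
            // only in line endings: report the current position.
            _ => return Drift::Stale { first_diff_line: line },
        }
    }
}
```

A test built on this would call the generator, compare against the committed file, and assert that the result is Drift::UpToDate, which is what turns a forgotten regeneration into a red CI run.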

Recurring chores

Additionally, I spent a lot of time this past week doing some recurring code cleanliness chores. One of the stances I have on agentic AI is that an overreliance on CLAUDE.md, specs, or rules is probably misguided. Anecdotally, I feel like the agents learn more from observing existing patterns within the codebase and parroting what's there already. Having good inline documentation through comments is incredibly important and conveys a lot more about coding style and preferences than any rules file could ever hope to.

On this front, though, I began noticing that my agents would do things in the comments that would bother me:

  1. Write historical comments detailing the lineage of the code more than the functionality ("this used to be here, but it's not here any longer").
  2. Write overly long comments explaining every nuance of a system in exhaustive prose, as if they were narrating a YouTube tutorial about the codebase.
  3. Over-reference GitHub issues, plan step IDs, or other external artifacts.

For example, here's a comment that manages all three at once:

## This Command class used to send its own wire message and mutate
## Ship.position locally, but as of the big refactor in PR #128 (see also
## issue #117 and Step 13 of the migration plan) that no longer happens.
## To be totally clear: none of that lives here anymore. The activation
## now rides on PlayerInputs, and the position mutation is handled entirely
## by the shared /sim system, which runs the very same deterministic step
## on both the client and the server so that prediction and reconciliation
## line up frame-for-frame the way Lightyear expects them to.
## TODO: revisit all of this once #142 finally lands.

That comment is much better written like this:

## This Command class is a thin shim: the wire activation rides
## `PlayerInputs` and the `Pos` mutation runs in `/sim` on both sides.

A chore that I knew I needed to do was to take a full pass across the entire codebase to tidy up all of these comments. That PR ended up being +1669 / −3047 LOC, and I'm hoping that it continues to pay off as the codebase gets larger. The precedent for what a good comment looks like is now set in stone. I'm just hoping that future agents don't revert to their old ways!

Conclusion

It's been a really fun experience working on this project so far. But it's exactly that -- a lot of work... I'm still sitting at my PC babysitting the agents and making sure they're on track. I still have to catch smells early. I still have to make my architecture designs and game preferences known.

But that said, it's addictive being able to sketch out super high-level requirements and see them appear in real time.

A basic understanding of systems design, a sense of what good architecture looks like, and a sense of what good gameplay looks like can take you really far. I'm still driven to see just how far that is.