Ralphing -- first thoughts

"It's beck pressah, folks!"

January 24, 2026

Agentic Loops feel...significant

My LLM use patterns in very broad strokes:

1: The Oracle paradigm

I go to the ChatGPT (and later Claude) web chat interface to troubleshoot code, ask questions, etc. I schlep the context over to the LLM by typing it all out myself or copy/pasting it.

2: The IDE integration

For me this was Cursor, but there were so many. Now the conversation lives in a panel alongside the code, I can mention @{arbitrary-file}.ts with special syntax to pull that file into context, and the LLM can propose changes that I accept or reject.

2.5: MCPs

MCPs never made it into my core workflows personally (except via Cursor's tool calling), but they felt really significant as an unlock for tool calling. It felt (and I think was) transformative, and it pointed toward a different paradigm, even if I personally let the IDEs integrate MCPs for me rather than going to the trouble of configuring them myself over and over.

3: Agentic Loops

Agentic Loops

I am trying out Ralph. It's been 3-4 days.

Things I like about Ralph:

I have been intimidated by multi-agent anything for some time now, and Ralph feels more approachable because (at least the way I do it) it's serialized, not parallelized. The result is a linear chain of tasks, each either marked complete or BLOCKED, and each completed task gets its own git commit.
"Beck pressah" (back pressure) articulates something I've felt for a long time about LLM coding but never could have put so well.
Maybe my favorite thing about Ralph is the focus on the specification. Write a perfect spec, and implementation is an afterthought (almost). Write a good enough spec, and when things go wrong (things will go wrong) your job will be to find the difference between the implementation and the specification.
- More on specifications: a lot of people are thinking this way. My early experience with Ralphing has me reflecting that my human attention is a finite resource, and maybe the specification is where I should spend more of that budget. This could be really transformative, because a spec can be written, edited, or reviewed by non-engineers.
Context window as an array that we're mallocing...that's another mental model I really, really like. And then (if I may pontificate), if the context window is memory, then spec/*.md plus the codebase itself is disk. "Disk" is whatever the LLM can retrieve and pull into its memory/context window in a single run. It's massive, orders of magnitude larger than the memory/context window. And part of my job as orchestrator is to literally organize the files Ralph has access to, setting Ralph up for success by making important things intuitive to find (and avoiding two ways to do the same thing, watching out for conflicting advice, setting up signs for Ralph, as Geoffrey Huntley says).
Geoffrey Huntley's case against compaction makes sense to me. LLM stuff gets so confusing that it's hard to know what the best practices should be, but "compaction is adding non-deterministic behavior to an already non-deterministic system" is good enough for me. I will avoid compaction like the plague until convinced otherwise. (As of 1/24/26, I think of compaction as a cheap trick that makes the models seem to have more memory than they actually do. Avoiding it means you can't chat with the models as long, but it encourages the good habit of putting all context in .md files, where it persists, can be audited, etc.)
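Putting those pieces together (one task at a time, back pressure, a fresh context every iteration instead of compaction), the outer loop has roughly the shape below. This is a minimal sketch under stated assumptions, not Geoffrey Huntley's script or anyone's production loop: `LoopDeps`, `runRalph`, and its four callbacks are hypothetical names. In a real setup `runFreshAgent` would shell out to a coding-agent CLI with PROMPT_build.md, `remainingTasks` would read IMPLEMENTATION_PLAN.md, and `backPressure` would run something like `npm run build`. Writing failures to a notes file via `appendNote` is my illustrative choice, not part of the method as described.

```typescript
// A minimal sketch of a serialized "Ralph"-style loop.
// All names here are hypothetical; wire them to real files and commands yourself.

interface CheckResult {
  ok: boolean;
  output: string;
}

interface LoopDeps {
  // How many tasks are still open (not complete, not BLOCKED) in the plan file.
  remainingTasks(): number;
  // Start a brand-new agent session from the prompt file. No chat history is
  // carried over, so everything the agent needs must already be on "disk".
  runFreshAgent(promptFile: string): void;
  // Back pressure: a deterministic check such as a build or test run.
  backPressure(): CheckResult;
  // Persist feedback to a file that the next fresh session will read.
  appendNote(text: string): void;
}

// Runs iterations one after another (never in parallel) until no tasks
// remain or the iteration cap is hit. Returns the number of iterations run.
function runRalph(deps: LoopDeps, promptFile: string, maxIterations: number): number {
  let iteration = 0;
  while (iteration < maxIterations && deps.remainingTasks() > 0) {
    iteration += 1;
    deps.runFreshAgent(promptFile);
    const check = deps.backPressure();
    if (!check.ok) {
      // Rather than compacting a long conversation, write the failure down
      // where the next iteration's fresh context can find it.
      deps.appendNote(`Iteration ${iteration}: back pressure failed\n${check.output}`);
    }
  }
  return iteration;
}
```

In the log below, the agent also runs `npm run build` itself in the middle of an iteration, so back pressure can live in the prompt as well as in the outer loop. The sleep between iterations is left out of the sketch.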

What I see when the Ralphs Ralph

💤 Sleeping 15 seconds before next iteration...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔄 Iteration 11 of 20
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 Status: 10 complete, 4 remaining, 1 blocked
🔨 Running build mode...

🔧 Read: PROMPT_build.md
🔧 Read: IMPLEMENTATION_PLAN.md
🔧 Read: App.tsx
🔧 Read: AGENTS.md
🔧 Read: useContentMode.ts
🔧 Read: index.ts
🔧 TodoWrite
🔧 Write: useFullscreenMedia.ts
🔧 TodoWrite
🔧 Edit: App.tsx
🔧 Edit: App.tsx
🔧 Read: App.tsx
🔧 Edit: App.tsx
🔧 TodoWrite
🔧 Edit: index.ts
🔧 TodoWrite
🔧 Bash: npm run build 2>&1...
🔧 Read: index.ts
🔧 Edit: useFullscreenMedia.ts
🔧 Bash: npm run build 2>&1...
🔧 Bash: wc -l ...
🔧 TodoWrite
🔧 Edit: IMPLEMENTATION_PLAN.md
🔧 TodoWrite
🔧 Bash: git status...
🔧 Bash: git add src/hooks/useFullscreenMedia.ts ...
🔧 Bash: git diff --cached --stat...
🔧 Bash: git commit -m \...
🔧 Bash: git push...
🔧 TodoWrite

✅ Task complete

This could still be tuned. When I first set it up I wasn't getting any console logging, so my only window into progress was watching the list of files to commit update in VS Code.
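For the console logging, one approach is to turn each tool call the agent makes into a one-line summary like the ones above. This sketch assumes the agent CLI can emit tool calls as JSON objects with a `name` and an `input` that holds `file_path` (Read/Write/Edit) or `command` (Bash); that shape, `ToolCall`, `formatToolCall`, and the 40-character cutoff are assumptions to check against your CLI's actual output, and parsing the raw output stream is left out.

```typescript
// Hypothetical event shape; adjust it to whatever your agent CLI really emits.
interface ToolCall {
  name: string; // e.g. "Read", "Edit", "Bash", "TodoWrite"
  input: Record<string, unknown>;
}

const MAX_COMMAND_CHARS = 40; // arbitrary cutoff for long shell commands

// Last path segment, so "/repo/src/App.tsx" prints as "App.tsx".
function basename(path: string): string {
  return path.slice(path.lastIndexOf("/") + 1);
}

// Turn one tool call into a short console line such as "🔧 Read: App.tsx".
function formatToolCall(call: ToolCall): string {
  const filePath = call.input["file_path"];
  if (typeof filePath === "string") {
    return `🔧 ${call.name}: ${basename(filePath)}`;
  }
  const command = call.input["command"];
  if (typeof command === "string") {
    // Show only the first line of the command, cut to a fixed length.
    const firstLine = command.split("\n")[0];
    return `🔧 ${call.name}: ${firstLine.slice(0, MAX_COMMAND_CHARS)}...`;
  }
  // Tools without a file or command (e.g. TodoWrite) just print their name.
  return `🔧 ${call.name}`;
}
```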

More quick thoughts

My Ralphing project is rolling my own ATProto client. I think I will share it soon, and when I do it'll be on this blog (Ralph knocked out OAuth with just a little handholding and pointing to known good resources). The project is not ideal for learning Ralph, in that there's fussy UI stuff that's not very conducive to back pressure, but I am finding I can use Playwright to resolve UI issues where there are clear acceptance criteria (like getting known tricky posts to format so that you can see the whole image or read all of the text).
"Code is free now" is a thought I keep having as I Ralph. It's not literally true. It's close though.
If code is free, or close to it...what should we as developers spend our time on? What's the new chokepoint? For myself I can see quite a few:
- Convincing people to use the new thing I cooked up
- Convincing people to invest their time into testing a still-buggy product
- Testing the still-buggy product myself? (This is a big part of why I'm rolling my own ATProto client...I want something I can aggressively dogfood and improve iteratively.)
- Talking to other human developers exhaustively to coordinate (especially since every human can now touch a lot more code)
- Talking to non-engineers, collaborating with them on the specifications (and here potentially offering a very valuable service, since engineering spidey-senses are still critical to making a spec executable and not asking for things that just won't work for computer science reasons. Though LLMs can kind of do this too? I think Lovable or something like it is gonna take over the world.)
- Volunteering in the community/picking up litter
- Communing with nature
- Arguing on the internet
I hadn't taken people who say "the IDE could be dead" seriously until just now. With agentic loops, the IDE is far from ideal. I am using VS Code, but what I really care about is the diffs and the actions the agents are taking. There's gotta be a UI paradigm that makes those things more first-class.
I no longer trust that what feels productive in LLM-anything actually is productive. So while Ralphing feels kind of revolutionary, I could just be generating a lot of hot air, but in code. That said, though...it's also kind of a Pascal's wager at this point. If this is the future, cool. If it isn't...I'll happily keep using Cursor or something like it. It does feel like the future, though. It feels good to focus 90% of my attention on the specs, the tests, and keeping the Ralphs chugging along. I choo-choo-choose you, Ralph method. I'll keep choo-choo-choosing you till Anthropic cuts me off.
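On the Playwright point above: fussy UI work gets some back pressure once the acceptance criteria are checkable. Here is a sketch of two such checks ("the whole image is visible" and "no text is clipped") written as pure functions over measured sizes. `Box`, `Viewport`, `fitsInViewport`, and `isClipped` are hypothetical helpers, not Playwright APIs, and the 1px tolerance is an arbitrary choice.

```typescript
// Hypothetical helpers for UI acceptance checks; these are not Playwright APIs.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Viewport {
  width: number;
  height: number;
}

// True if the element's box lies entirely inside the viewport,
// i.e. the whole image can be seen at once.
function fitsInViewport(box: Box, viewport: Viewport): boolean {
  return (
    box.x >= 0 &&
    box.y >= 0 &&
    box.x + box.width <= viewport.width &&
    box.y + box.height <= viewport.height
  );
}

// True if a container's content is taller than the container itself,
// i.e. some text is cut off. Inputs are the DOM scrollHeight and clientHeight.
function isClipped(scrollHeight: number, clientHeight: number, tolerancePx = 1): boolean {
  return scrollHeight - clientHeight > tolerancePx;
}
```

In a Playwright test, the inputs could come from `await locator.boundingBox()` and `page.viewportSize()` (both can return null, so check that first, and call `locator.scrollIntoViewIfNeeded()` before measuring), and from `locator.evaluate(el => [el.scrollHeight, el.clientHeight])` for the text check.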

Subscribe to Jake Simonds's Blog

to get updates in Reader, RSS, or via Bluesky Feed
