June 20, 2026 Part of the SafeRE series

SafeRE: Building a production-quality regex library with agents

This is the first in a series of blog posts about SafeRE, my linear-time regular expression library for Java.

A few months ago, I was having coffee with a friend, and we were talking about how good AI agents had gotten. Recent frontier models like Opus 4.6 and GPT-5.5 felt like a step change: not just better at small coding tasks, but much more capable of working through complex, long-running tasks. I started to wonder: could I build a substantial, production-quality software project purely with agents, with no human-written code at all? How would I ensure correctness if I wasn’t writing every line myself? Could agents make a project like this feasible to attempt in my spare time?

I decided to try an experiment.

When I worked on the Java team at Google, we considered building a linear-time regular expression library in pure Java. A bit of background: many popular regular expression libraries use backtracking engines, which can take exponential time on some patterns and inputs. Attackers can exploit that behavior by sending inputs that cause a service to burn huge amounts of CPU evaluating a regex – a class of attacks known as regular expression denial of service, or ReDoS. A linear-time regex library avoids that failure mode by ensuring that matching time grows linearly with the size of the input. While this might sound like a niche concern, it was a real problem at Google.
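To make the failure mode concrete, here is a minimal sketch that times java.util.regex on a classic nested-quantifier pattern. The pattern, the class name BacktrackingDemo, and the input sizes are my own illustrative choices, not taken from SafeRE. The timings you see depend heavily on the JDK version: newer JDKs mitigate some patterns of this shape, so the growth may be much flatter than the textbook exponential curve.

```java
import java.util.regex.Pattern;

// Illustrative only: times java.util.regex on a pattern shape that is
// commonly cited as ReDoS-prone in backtracking engines.
class BacktrackingDemo {
    // Nested quantifiers: a backtracking engine can split a run of 'a's between
    // the inner and outer '+' in many different ways before giving up.
    static final Pattern NESTED = Pattern.compile("(a+)+b");

    // Builds n copies of 'a' with no trailing 'b', so the overall match must fail.
    static String input(int n) {
        return "a".repeat(n);
    }

    // Returns whether the whole input of n 'a's matches the pattern.
    static boolean matches(int n) {
        return NESTED.matcher(input(n)).matches();
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 24; n += 2) {
            long start = System.nanoTime();
            boolean matched = matches(n);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("n=%d matched=%b time=%d us%n", n, matched, micros);
        }
    }
}
```

A linear-time engine would be expected to show the time roughly doubling when the input length doubles, regardless of the pattern.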

Building a new regex library would have been a lot of work. We estimated it at roughly two engineer-years. We couldn’t justify the investment, so we never built it. But I could never fully let go of the idea. It’s the kind of project that drew me to this field: using computer science to solve a real-world problem.

Perhaps naively, I thought I could build this library in my spare time with agents doing the bulk of the work. So I decided to try it.

The outcome is SafeRE, which is open-source and available at github.com/eaftan/safere.

When I say SafeRE was built with agents, I don’t mean that I told an agent “go build a regex engine” and came back a week later to a finished project. I mean that agents wrote the code, while I directed the work: breaking down tasks, reviewing code, steering the agents when they went in the wrong direction, and shaping how I wanted them to approach the problem. My role was somewhere between tech lead and pair programmer.

Suitability for agents

I initially chose this project because it seemed well-suited to agents. It turned out to be much harder than I expected.

Why did it seem well-suited?

While it’s technically difficult to build a linear-time regular expression library, the core ideas are well understood. There are existing libraries, RE2 in particular, that SafeRE could learn from. Russ Cox, the author of RE2, also wrote an excellent series of blog posts explaining the ideas behind it. So while the work is difficult, it is not research. We don’t have to invent new techniques to do this.

SafeRE owes a huge debt to RE2. The project started as a Java port of RE2, and I intentionally kept RE2’s license and license header to make that lineage clear. As the project evolved, SafeRE diverged from RE2 because the goal shifted from “RE2 in Java” to drop-in compatibility with java.util.regex, whose semantics are often different. But RE2 was the starting point, both technically and intellectually.

Regular expression engines are also unusually testable. They are deterministic and self-contained. You don’t have to wire together a distributed system to test them. There are also extensive open-source test suites that can be reused or adapted, where licenses permit and with appropriate attribution.

Why was it hard?

This is the part where I was overconfident. Regular expressions are a type of programming language, and they are very widely used. The popular implementations are incredibly battle-tested. My stated goal was for SafeRE to be a drop-in replacement for java.util.regex. That meant SafeRE had to come close to the Java standard library’s regex implementation in correctness.

java.util.regex has been around since Java 1.4 in 2002 and has widespread usage. SafeRE was built from scratch. To be viable for production usage, I was going to have to polish it to an incredibly high standard. This turned out to be where I spent most of my time on the project.

A concrete example: SafeRE inherited support for POSIX bracket classes from RE2. In RE2, expressions like [[:lower:]] and [[:digit:]] have special meaning. Java’s regex library accepts those strings, but doesn’t treat them as POSIX bracket classes. In Java, POSIX-style character properties are written with escapes like \p{Lower}. So this was not a parser error or a missing feature. It was worse: accepted syntax with different semantics, which means SafeRE could silently return the wrong answer.
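The divergence is easy to see using java.util.regex alone. In the sketch below (the class and method names are mine, for illustration), Java parses [[:lower:]] as a character class containing a nested class of the literal characters ':', 'l', 'o', 'w', 'e', and 'r', so it matches those characters and not lowercase letters in general.

```java
import java.util.regex.Pattern;

// Illustrative only: shows how java.util.regex interprets POSIX-looking syntax.
class PosixBracketDemo {
    // Thin wrapper around the JDK: true if the whole input matches the regex.
    static boolean jdkMatches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String posixStyle = "[[:lower:]]"; // a POSIX bracket class in RE2
        String javaStyle = "\\p{Lower}";   // Java's spelling of the same idea

        for (String s : new String[] {"a", "l", ":"}) {
            System.out.printf("input=%s  %s -> %b  %s -> %b%n",
                    s, posixStyle, jdkMatches(posixStyle, s),
                    javaStyle, jdkMatches(javaStyle, s));
        }
    }
}
```

Under Java semantics, "a" fails to match [[:lower:]] while ":" matches it, which is the opposite of what the POSIX reading would give.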

That kind of issue came up repeatedly. The hard part was not implementing the core regex engine; it was matching the long tail of behavior that real Java programs may depend on. This is a textbook example of Hyrum’s Law: with enough users, every observable behavior of an API will be depended on by somebody.

Not vibe coding, but agentic engineering¹

There’s no shortage of blog posts about building demos or prototypes with agents, but there still aren’t many detailed accounts of building production-quality projects from scratch with agents.

Here is what I wanted to learn from this experiment:

Is it possible to build a production-quality, technically complex project from scratch using only agents?
How do you ensure correctness when you’re not writing the code?
How do you maintain the code when you’re not intimately familiar with every line?
How do you make sure the agent is doing what you want?
What infrastructure do you have to put in place?
What processes do you need to work effectively with agents?
What is it actually like to work with an agent on a project like this?

To preview one answer: correctness didn’t come from painstakingly reviewing the agents’ code. It came from building increasingly aggressive validation machinery.

My testing approach started by incorporating the test suites from RE2 and RE2/J and driving test failures to zero. Then I substituted SafeRE for java.util.regex in six large open-source Java projects, ran their tests, and fixed the SafeRE bugs they uncovered. Then I implemented a fuzzer, found more bugs, and fixed them. In the latest phase of the project, I’ve created sweeps that exhaustively enumerate regexes of certain forms and compare SafeRE’s output to java.util.regex. Those sweeps currently cover around 20 billion test cases and take days to run. They’re slow, expensive, and extremely useful. They find bugs that ordinary unit tests would never find.
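To give a flavor of what an exhaustive sweep looks like, here is a toy sketch of the general technique, not SafeRE's actual harness. It enumerates every string up to a small length over a tiny alphabet, treats each one as a regex, and compares two engines on every short input. The Engine interface, the alphabets, and the candidate are all hypothetical. Because I don't want to guess at SafeRE's API, the candidate is a deliberately wrong stand-in that calls find() where the reference calls matches().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Toy differential sweep: hypothetical names, not SafeRE's real test harness.
class DifferentialSweep {
    // An engine reports "true", "false", or "error" for a regex/input pair.
    interface Engine {
        String outcome(String regex, String input);
    }

    // Reference behavior: java.util.regex with full-match semantics.
    static final Engine REFERENCE = (regex, input) -> {
        try {
            return String.valueOf(Pattern.compile(regex).matcher(input).matches());
        } catch (PatternSyntaxException e) {
            return "error";
        }
    };

    // Stand-in for an engine under test, with a planted semantic bug:
    // it uses find() (substring match) instead of matches().
    static final Engine BUGGY_STAND_IN = (regex, input) -> {
        try {
            return String.valueOf(Pattern.compile(regex).matcher(input).find());
        } catch (PatternSyntaxException e) {
            return "error";
        }
    };

    // Every string over the alphabet with length 0..maxLen, shortest first.
    static List<String> allStrings(String alphabet, int maxLen) {
        List<String> out = new ArrayList<>();
        out.add("");
        int start = 0; // index of the first string of the previous length
        for (int len = 1; len <= maxLen; len++) {
            int end = out.size();
            for (int i = start; i < end; i++) {
                for (char c : alphabet.toCharArray()) {
                    out.add(out.get(i) + c);
                }
            }
            start = end;
        }
        return out;
    }

    // Counts regex/input pairs on which the two engines disagree.
    static long countMismatches(Engine reference, Engine candidate,
                                int maxRegexLen, int maxInputLen) {
        long mismatches = 0;
        for (String regex : allStrings("ab*|()", maxRegexLen)) {
            for (String input : allStrings("ab", maxInputLen)) {
                if (!reference.outcome(regex, input).equals(candidate.outcome(regex, input))) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("reference vs itself: "
                + countMismatches(REFERENCE, REFERENCE, 3, 3));
        System.out.println("reference vs buggy stand-in: "
                + countMismatches(REFERENCE, BUGGY_STAND_IN, 3, 3));
    }
}
```

Even at this scale, the sweep treats "both engines reject the pattern" as agreement and flags accepted-but-different results, which is the category of bug that is hardest to spot by reading code.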

How do I know it’s production grade?

If the goal of this project is to see if I can build a production-grade, linear-time regex engine using agents, how do I know it’s production grade?

To be honest, I don’t yet. I’ve put a tremendous amount of effort into testing and performance tuning, and I’ve tested it against six large open-source Java projects. But the only way to know for sure is to have someone actually deploy it in production.

I have some people kicking the tires and sending me feedback, which you can see from the list of issues and PRs not authored by me. But I’d love to have more. SafeRE should be a drop-in replacement for java.util.regex, barring a few features that cannot be implemented within the linear-time guarantee. So please give it a try and tell me (1) if you run into any problems, and (2) if you do end up deploying it in production.

The project is open-source and available on GitHub at github.com/eaftan/safere. I publish a Maven artifact that you can depend on; instructions are in the README.

More to come

I’ve spent a lot of time trying to start writing about SafeRE. There’s a lot to say, and I can’t say it all in one post. So there will be more to come. But I have to start somewhere, and this post is the start of the series.

I’m not sure yet what I’ll write about next. Some possibilities:

My agent workflow
The testing process, which escalated quickly!
Stats about agent usage: tokens, cost, number of sessions, etc.
A project timeline
Where I had to correct the agents
What kinds of bugs the agents introduced
What this experiment suggests, and does not suggest, about how agents may change software engineering

Discuss

Discussion is happening on LinkedIn and Hacker News.

Note: I wrote this post by hand. I used an agent for proofreading and feedback.

¹ See https://simonwillison.net/guides/agentic-engineering-patterns/what-is-agentic-engineering/