Guardrails in Practice

Keeping AI-assisted Java code consistent, safe, and shippable

Define standards once. Let machines enforce them. Debug with confidence.

Agenda

🚨 The problem — AI agents are powerful but unpredictable
🧠 What “guardrails” actually means (two layers)
📄 Layer 1 — Context: AGENTS.md as the source of truth
🔍 Layer 2 — Enforcement: Checkstyle · SpotBugs · ArchUnit
⚙️ How ArchUnit works under the hood
🔁 The agent feedback loop — generate → verify → fix → re-verify
🪝 Local guardrails — Makefile + Git hooks
☸️ Runtime guardrails & debugging on Kubernetes
🔁 Defense in depth — putting it together
✅ Takeaways

🚨 The Problem: Inconsistency

AI coding agents are powerful, but unpredictable.

🔄 Different agents (and versions) write code differently
📝 Each tool has its own idea of “good code”
⚠️ A small prompt change → a different output
🧹 No accountability for style, bugs, or architecture

Result: production code that is hard to review, maintain, and trust — whether it was written by a human or an agent.

🧠 What is a “guardrail”?

A guardrail makes the right thing easy and the wrong thing fail loudly — automatically, every time.

Two complementary layers:

Layer	Goal	Mechanism
Context	Tell the agent the rules up front	`AGENTS.md`, conventions, examples
Enforcement	Catch what slips through	Static analysis, hooks, CI, runtime

Context reduces mistakes. Enforcement guarantees they never merge.

📄 Layer 1 — Context Engineering

“Curate what the AI sees so it has to guess less.”

Brief the agent like a new teammate on day one
Give it rules + constraints + concrete examples
Better context → fewer surprises, less rework

Key insight: AGENTS.md is the agent-agnostic place to write this down once — readable by humans and by any AI tool.

🔗 https://agents.md/

What the AI actually sees

The context window, top to bottom:

System instructions    → vendor/tool base behavior
Custom instructions    → your AGENTS.md, team conventions
Conversation history   → prompts, replies, corrections
Implicit context       → open files, selection, git diff
Explicit references    → #file, pasted snippets
Tool outputs           → build / test / lint feedback

You control the custom instructions, the references you attach, and the tool outputs. You don’t control the model’s reasoning, and you can’t make its output perfectly repeatable, so make the controllable parts strong.

`AGENTS.md` — structure

A plain Markdown contract that lives in the repo:

# Project Overview
## Build and Test Commands       # make build / make test / mvn verify
## Code Quality and Style        # JavaDoc on public methods, no NPEs
## Architecture                  # constructor injection, immutable DTOs
## Security                      # no PII in logs, validate all input
## Plugins                       # Checkstyle / SpotBugs / ArchUnit configs

Keep it at the workspace root as the single source of truth; CLAUDE.md and other assistant-specific docs should point to it rather than duplicate it.

`AGENTS.md` — discovery: nearest wins

project-root/
|-- AGENTS.md          ← 1. read first  (global rules)
+-- src/main/
    |-- AGENTS.md      ← 2. read next   (module rules)
    +-- java/com/app/service/
        |-- AGENTS.md  ← 3. read last   (most specific — overrides parent)
        +-- UserService.java   ← file being generated

Rules merge top-down; the nearest AGENTS.md wins on conflict. Put broad rules at the root, exceptions close to the code.
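
A minimal Java sketch of the nearest-wins lookup, for illustration only. It assumes the simple rule described above (collect every AGENTS.md between the project root and the file's directory, root first); real agents may resolve these files differently. The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: lists the AGENTS.md files that apply to one source file. */
final class AgentsMdDiscovery {

    private AgentsMdDiscovery() {}

    /**
     * Walks from the target file's directory up to projectRoot and returns every
     * AGENTS.md found, ordered root-first so that later entries are "nearer".
     */
    static List<Path> applicableFiles(Path projectRoot, Path targetFile) {
        Path root = projectRoot.toAbsolutePath().normalize();
        Path dir = targetFile.toAbsolutePath().normalize().getParent();
        Deque<Path> found = new ArrayDeque<>();
        while (dir != null && dir.startsWith(root)) {
            Path candidate = dir.resolve("AGENTS.md");
            if (Files.isRegularFile(candidate)) {
                found.addFirst(candidate); // parents end up ahead of children
            }
            dir = dir.getParent();
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        Path target = root.resolve("src/main/java/com/app/service/UserService.java");
        // Printed in read order: global rules first, most specific last.
        applicableFiles(root, target).forEach(System.out::println);
    }
}
```

For the tree above, the root file comes first and the service-level file last, so a consumer that applies rules in list order lets the nearest file override the others.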

🔍 Layer 2 — Why static analysis?

AGENTS.md defines the standards. Static tools enforce them.

The gap if you stop at documentation:

📋 Standards live only in prose — easy to ignore
🤖 Agents (and humans) miss subtle requirements
🐌 Manual review is slow and subjective
🚨 Violations quietly slip into commits

The fix: automated, measurable, blocking checks — so review can focus on functional logic, not brace placement.

The Java guardrail trio

Tool	Catches	Config
Checkstyle	style, naming, formatting, line length	`checkstyle.xml`
SpotBugs	NPEs, resource leaks, bad casts, overflow, concurrency	`spotbugs-exclude.xml`
ArchUnit	layering, cycles, forbidden APIs, conventions	`*ArchitectureTest.java`

All three run inside one command:

mvn verify     # or: make coverage

If any gate fails, the build stops. No green build → no merge.

🎨 Checkstyle — style is not a debate

Without it: inconsistent naming, random indentation, unreadable diffs.

With it:

✅ Naming conventions validated mechanically
✅ Indentation, whitespace, line length enforced
✅ Pairs with a formatter (Spotless / google-java-format)
✅ make format auto-fixes most formatting violations

Style stops being a code-review opinion and becomes a build result.

🐛 SpotBugs — bug patterns before runtime

Bytecode-level analysis that catches what compiles but breaks:

✅ Potential null pointer dereferences
✅ Resource leaks — unclosed streams / connections
✅ Type-cast & equals/hashCode mistakes
✅ Integer overflow / underflow
✅ Common concurrency hazards

“It compiles” ≠ “it’s correct.” SpotBugs closes part of that gap for free.
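
To make “compiles but breaks” concrete, here is a small illustration of the integer-overflow bullet. It is a sketch, not SpotBugs output: whether a given SpotBugs setup reports this exact line depends on the detectors enabled and on how the value is used. The class and method names are hypothetical.

```java
/** Illustration only: code that compiles cleanly but computes the wrong midpoint. */
final class MidpointOverflow {

    private MidpointOverflow() {}

    /** Buggy: low + high can exceed Integer.MAX_VALUE and wrap to a negative number. */
    static int naiveMidpoint(int low, int high) {
        return (low + high) / 2;
    }

    /** Safe for 0 <= low <= high: the difference cannot overflow. */
    static int safeMidpoint(int low, int high) {
        return low + (high - low) / 2;
    }

    public static void main(String[] args) {
        int low = 2_000_000_000;
        int high = 2_100_000_000;
        System.out.println(naiveMidpoint(low, high)); // negative: the sum wrapped around
        System.out.println(safeMidpoint(low, high));  // 2050000000
    }
}
```

Both methods pass the compiler and most small unit tests; only large inputs expose the difference, which is why a mechanical check earns its place in the build.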

🏛️ ArchUnit — architecture as a test

Without it: circular deps, layering violations, business logic in the wrong layer, the same anti-pattern copy-pasted everywhere.

With it — architecture rules become JUnit tests:

✅ Layer separation (controller → service → repository)
✅ No circular dependencies between packages
✅ Controlled access between modules
✅ Ban specific anti-patterns and APIs
✅ The architecture is documented in executable code

⚙️ How ArchUnit works (1/2)

A bytecode analysis engine + a fluent assertion DSL. Pure static analysis on compiled .class files — no Spring context, no running app.

.class files
   ↓  1. Import   — ClassFileImporter builds a graph of
                    JavaClass / JavaMethod / JavaField
   ↓  2. Evaluate — fluent DSL walks that object graph
   ↓  3. Condition — each match checked by an ArchCondition
   ↓  4. Report   — failures listed with FQN + line + reason

Because it reads bytecode directly, it can see annotations, inheritance, field types, and even method-call relationships without executing code.

⚙️ How ArchUnit works (2/2)

The three-part DSL — what().that(predicate).should(condition):

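import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noFields;

import com.tngtech.archunit.core.importer.ImportOption;
import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;
import org.springframework.beans.factory.annotation.Autowired;

// Analyze production classes under com.example; test classes are excluded.
@AnalyzeClasses(packages = "com.example",
                importOptions = ImportOption.DoNotIncludeTests.class)
class ArchitectureTest {

  @ArchTest
  static final ArchRule noFieldInjection =
      noFields().should().beAnnotatedWith(Autowired.class)
                .because("use constructor injection");
}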

✅ Can check	❌ Cannot check
annotations, inheritance, interfaces	runtime behavior / return values
method-call & package dependencies	dynamic-proxy behavior
field types, method signatures	`application.yml` config
custom bytecode patterns	reflection targets

ArchUnit — conventions as rules

Common “ArchUnit red flags” — CI fails if present:

noFields().should().beAnnotatedWith(Autowired.class);    // no field injection
noClasses().should().callConstructor(ObjectMapper.class);// reuse shared bean
noClasses().should().callConstructor(RestTemplate.class);// use RestClient
noClasses().should().accessClassesThat()
           .haveFullyQualifiedName("java.lang.System");   // bans all System access, incl. System.out
classes().that().areAnnotatedWith(RestController.class)
         .should().haveSimpleNameEndingWith("Controller");

The team agreement and the build check are the same artifact. Custom ArchCondition<T> / DescribedPredicate<T> handle anything bespoke.

🔁 The agent feedback loop

+-----------------------------+
|   Coding Agent (any tool)   |
|   reads AGENTS.md, writes   |
+--------------+--------------+
               | generates code
               v
+-----------------------------+
|  mvn verify                 |
|  Checkstyle . SpotBugs .    |
|  ArchUnit . tests           |
+------+---------------+------+
       |               |
violations              all pass
       |               |
       v               v
feedback to agent      ready for human
(report + AGENTS.md)   functional review
       |
       +--> agent fixes -> re-runs verify -> loops until green

The tool output is the prompt for the next iteration.
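
The loop above can be sketched in Java as follows. This is an assumption-laden outline, not any agent's real API: the verifier and the agent are plain functional interfaces, the attempt cap is an added safety limit, and every name (FeedbackLoop, runUntilGreen, mvnVerify) is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch of generate -> verify -> fix -> re-verify with a bounded number of attempts. */
final class FeedbackLoop {

    private FeedbackLoop() {}

    /**
     * @param verify runs the gates and returns the violation report ("" when green)
     * @param fix    hands the report back to the agent so it can change the code
     * @return number of verify runs needed to get green, or -1 if still red
     */
    static int runUntilGreen(Supplier<String> verify, Consumer<String> fix, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String report = verify.get();
            if (report.isEmpty()) {
                return attempt;       // ready for human functional review
            }
            if (attempt < maxAttempts) {
                fix.accept(report);   // the tool output is the next prompt
            }
        }
        return -1;                    // stop looping and escalate to a human
    }

    /** Runs "mvn -q verify" in projectDir; returns "" on success, else the build output. */
    static String mvnVerify(File projectDir) {
        try {
            Process process = new ProcessBuilder("mvn", "-q", "verify")
                    .directory(projectDir)
                    .redirectErrorStream(true)
                    .start();
            String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            return process.waitFor() == 0 ? "" : "mvn verify failed:\n" + output;
        } catch (IOException e) {
            return "could not start mvn: " + e.getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted while waiting for mvn";
        }
    }

    public static void main(String[] args) {
        // Simulated gate: fails twice, then passes. In practice: () -> mvnVerify(dir).
        Deque<String> reports = new ArrayDeque<>(
                List.of("Checkstyle: line too long", "ArchUnit: field injection", ""));
        int runs = runUntilGreen(reports::poll,
                report -> System.out.println("agent fixes: " + report), 5);
        System.out.println("green after " + runs + " verify runs");
    }
}
```

Capping the attempts is a design choice: an agent that cannot get to green after a few rounds usually needs a human to look at the report, not another retry.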

🪝 Shift left — local guardrails

CI is the last line of defense, not the first. Catch violations before they leave the laptop, using Git hooks wired through a Makefile.

Self-installing — the first make sets it up:

_HOOKS_PATH := $(shell git config --get core.hooksPath 2>/dev/null)
ifneq ($(_HOOKS_PATH),.githooks)
_ := $(shell test -d .git && git config core.hooksPath .githooks \
        && chmod +x .githooks/* 2>/dev/null)
endif

Hooks live in .githooks/ (version-controlled), not the untracked .git/hooks/. Everyone gets the same gates with no manual setup.

Three hooks, three checkpoints

# pre-commit  — fast feedback
mvn -q test                       # compile + unit tests
mvn -q spotless:check             # is it formatted?  (make format to fix)
# + if a DB changelog is staged → check it applies cleanly on a fresh DB

# commit-msg  — enforce Conventional Commits
^(feat|fix|docs|refactor|perf|test|build|ci|chore|revert)(\(scope\))?!?: ...

# pre-push    — the heavier gate before sharing
mvn -q spotbugs:check
mvn -q verify                     # integration tests + coverage + ArchUnit

Escape hatch for emergencies: git push --no-verify skips the pre-push hook. CI still runs its blocking checks.
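
The commit-msg check can be sketched in Java for teams that prefer to keep hook logic testable. The type list is taken from the pattern above; the scope sub-pattern and the "non-empty description" part are assumptions, because the slide elides them. The class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Sketch: validates the first line of a commit message against Conventional Commits. */
final class CommitMessageCheck {

    private CommitMessageCheck() {}

    private static final Pattern HEADER = Pattern.compile(
            "^(feat|fix|docs|refactor|perf|test|build|ci|chore|revert)"
            + "(\\([^()\\s]+\\))?"  // optional scope, e.g. (api)  [assumed shape]
            + "!?"                   // optional breaking-change marker
            + ": \\S.*$");           // colon, space, non-empty description [assumed]

    /** Only the header (first line) is checked; body and footers are ignored. */
    static boolean isValid(String commitMessage) {
        String header = commitMessage.lines().findFirst().orElse("");
        return HEADER.matcher(header).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("feat(api)!: add paging to user search")); // true
        System.out.println(isValid("update stuff"));                          // false
    }
}
```

A hook script would pass the contents of the commit message file to isValid and exit non-zero on false.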

The Makefile as the developer interface

One vocabulary for humans and agents — AGENTS.md points here:

make build            # mvn clean package -DskipTests
make test             # unit tests
make coverage         # mvn verify + jacoco report
make format           # spotless:apply  (auto-fix style)
make spotbugs         # spotbugs:check
make liquibase/check  # fresh container → update → validate → teardown

Discoverable, repeatable commands beat tribal knowledge, and an agent can read the Makefile to learn on its own how to build and test the repo.

☸️ Guardrails at runtime — Kubernetes

Static analysis stops bad code. The cluster stops bad behavior:

Liveness / readiness / startup probes — don’t route traffic to a pod that isn’t ready; restart one that’s stuck
Resource requests & limits — cap blast radius; avoid noisy-neighbor & OOM cascades
Manifest validation — kubeconform / schema checks in CI
Admission policies — OPA/Gatekeeper or Kyverno reject non-compliant workloads at the door

Same philosophy as ArchUnit: encode the rule once, fail loudly when it’s broken.
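
On the application side, probes need something to call. A minimal sketch using only the JDK's built-in HTTP server, assuming separate liveness and readiness endpoints; the paths /livez and /readyz and the class name are illustrative choices, and most frameworks ship their own health endpoints that would be used instead.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch: the application side of Kubernetes liveness and readiness probes. */
final class ProbeEndpoints {

    private final AtomicBoolean ready = new AtomicBoolean(false);
    private final HttpServer server;

    /** Starts an HTTP server on the given port (0 picks any free port). */
    ProbeEndpoints(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        // Liveness: the process is up and able to answer at all.
        server.createContext("/livez", exchange -> respond(exchange, 200));
        // Readiness: 503 until startup work is done, so no traffic is routed yet.
        server.createContext("/readyz", exchange -> respond(exchange, ready.get() ? 200 : 503));
        server.start();
    }

    private static void respond(HttpExchange exchange, int status) throws IOException {
        exchange.sendResponseHeaders(status, -1); // -1: no response body
        exchange.close();
    }

    void markReady() { ready.set(true); }

    int port() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }

    public static void main(String[] args) throws IOException {
        ProbeEndpoints probes = new ProbeEndpoints(0);
        System.out.println("probe endpoints listening on port " + probes.port());
        probes.markReady(); // call once caches, connection pools, etc. are initialised
        probes.stop();
    }
}
```

Keeping the two endpoints separate lets the cluster hold traffic back from a pod that is still starting without restarting it.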

☸️ Debugging on Kubernetes — the toolbox

When a guardrail trips, investigate fast:

kubectl get pods -o wide                 # status, restarts, node
kubectl describe pod <pod>               # events: OOMKilled, ImagePullBackOff…
kubectl logs <pod> -c <container> -f     # stream logs
kubectl logs <pod> --previous            # logs from the crashed container
kubectl exec -it <pod> -- sh             # shell inside a running pod
kubectl port-forward <pod> 8080:8080     # reach the pod's port 8080 on localhost
# ephemeral debug container, for distroless images that have no shell
kubectl debug -it <pod> --image=busybox --target=<container>

Read the events first — CrashLoopBackOff, OOMKilled, and ImagePullBackOff each point at a different fix.

☸️ Tightening the inner loop

Don’t rebuild-push-redeploy by hand on every change:

Telepresence / mirrord — run the service locally while it’s wired into the live cluster: real dependencies, instant reload, your debugger attached — no image build per change.

Faster feedback at every layer — IDE, build, hook, CI, cluster — is the whole point of guardrails.

🔁 Defense in depth

IDE / Agent      AGENTS.md              curate context (Layer 1)
    |
git commit       pre-commit + commit-msg    format, compile, unit test, message
    |
git push         pre-push: spotbugs + verify (IT) + spec-check
    |
CI               Checkstyle . SpotBugs . ArchUnit . test . JaCoCo  (blocking)
    |
Kubernetes       probes . limits . admission policies       runtime
    |
Incident         kubectl events / logs / debug              observe

Each layer is cheap, fails fast, and catches what the previous one missed.

✅ Takeaways

Write the rules down once — AGENTS.md is agent-agnostic and human-readable
Make machines enforce them — Checkstyle + SpotBugs + ArchUnit turn standards into a build result
ArchUnit = architecture as a test — bytecode analysis, no runtime needed
Shift left — Makefile + Git hooks catch violations before CI
Guardrails extend to runtime — k8s probes, limits, and policies; know your debug toolbox
The agent gets faster, the standard stays — prompts are fragile, standards are forever

Thank you 🙏

Try it on one repo this week:

Drop an AGENTS.md at the root
Add Checkstyle + SpotBugs + one ArchUnit rule to mvn verify
Wire .githooks/ through your Makefile

Questions & discussion welcome.

📎 References: agents.md · ArchUnit docs · the project’s Makefile & .githooks/