Guardrails in Practice

Keeping AI-assisted Java code consistent, safe, and shippable

Define standards once. Let machines enforce them. Debug with confidence.

Agenda

  1. 🚨 The problem: AI agents are powerful but unpredictable
  2. 🧠 What “guardrails” actually means (two layers)
  3. 📄 Layer 1 (Context): AGENTS.md as the source of truth
  4. 🔍 Layer 2 (Enforcement): Checkstyle · SpotBugs · ArchUnit
  5. ⚙️ How ArchUnit works under the hood
  6. 🔁 The agent feedback loop: generate → verify → fix → re-verify
  7. 🪝 Local guardrails: Makefile + Git hooks
  8. ☸️ Runtime guardrails & debugging on Kubernetes
  9. 🔐 Defense in depth: putting it together
  10. ✅ Takeaways

🚨 The Problem: Inconsistency

AI coding agents are powerful, but unpredictable.

  • 🔄 Different agents (and versions) write code differently
  • 📏 Each tool has its own idea of “good code”
  • ⚠️ A small prompt change → a different output
  • 🧹 No accountability for style, bugs, or architecture

Result: production code that is hard to review, maintain, and trust, whether it was written by a human or an agent.

🧠 What is a “guardrail”?

A guardrail makes the right thing easy and the wrong thing fail loudly: automatically, every time.

Two complementary layers:

Layer        Goal                               Mechanism
Context      Tell the agent the rules up front  AGENTS.md, conventions, examples
Enforcement  Catch what slips through           Static analysis, hooks, CI, runtime

Context reduces mistakes. Enforcement guarantees they never merge.

📄 Layer 1: Context Engineering

“Curate what the AI sees so it has to guess less.”

  • Brief the agent like a new teammate on day one
  • Give it rules + constraints + concrete examples
  • Better context → fewer surprises, less rework

Key insight: AGENTS.md is the agent-agnostic place to write this down once, readable by humans and by any AI tool.

🔗 https://agents.md/

What the AI actually sees

The context window, top to bottom:

System instructions    → vendor/tool base behavior
Custom instructions    → your AGENTS.md, team conventions
Conversation history   → prompts, replies, corrections
Implicit context       → open files, selection, git diff
Explicit references    → #file, pasted snippets
Tool outputs           → build / test / lint feedback

You control the middle layers and the tool outputs. You don’t control the model’s reasoning, and you can’t get perfectly repeatable output, so make the controllable parts strong.

AGENTS.md: structure

A plain Markdown contract that lives in the repo:

# Project Overview
## Build and Test Commands       # make build / make test / mvn verify
## Code Quality and Style        # JavaDoc on public methods, no NPEs
## Architecture                  # constructor injection, immutable DTOs
## Security                      # no PII in logs, validate all input
## Plugins                       # Checkstyle / SpotBugs / ArchUnit configs

Keep it at the workspace root: the single source of truth that CLAUDE.md and other assistant docs inherit from.

AGENTS.md discovery: nearest wins

project-root/
|-- AGENTS.md          ← 1. read first  (global rules)
+-- src/main/
    |-- AGENTS.md      ← 2. read next   (module rules)
    +-- java/com/app/service/
        |-- AGENTS.md  ← 3. read last   (most specific, overrides parent)
        +-- UserService.java   ← file being generated

Rules merge top-down; the nearest AGENTS.md wins on conflict. Put broad rules at the root, exceptions close to the code.
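
The lookup order in the tree above can be sketched in plain Java. This is only an illustration: how a given agent actually discovers and merges AGENTS.md files is tool-specific, and the class and method names here (AgentsMdDiscovery, candidates, existing) are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class AgentsMdDiscovery {

    /** Candidate AGENTS.md locations, ordered from the project root down to the file's own directory. */
    static List<Path> candidates(Path projectRoot, Path targetFile) {
        Path root = projectRoot.normalize();
        Path dir = targetFile.normalize().getParent();
        if (dir == null || !dir.startsWith(root)) {
            throw new IllegalArgumentException(targetFile + " is not inside " + projectRoot);
        }
        List<Path> result = new ArrayList<>();
        // Walk upward from the file's directory; inserting at the front leaves the root first.
        for (Path d = dir; d != null && d.startsWith(root); d = d.getParent()) {
            result.add(0, d.resolve("AGENTS.md"));
        }
        return result;
    }

    /** Keeps only candidates that exist on disk; the last element is the nearest file. */
    static List<Path> existing(Path projectRoot, Path targetFile) {
        List<Path> found = new ArrayList<>();
        for (Path p : candidates(projectRoot, targetFile)) {
            if (Files.isRegularFile(p)) {
                found.add(p);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Path root = Path.of("project-root");
        Path file = root.resolve("src/main/java/com/app/service/UserService.java");
        candidates(root, file).forEach(System.out::println);
    }
}
```

Reading the result front to back gives the "broad rules first" order; on a conflict, the last existing entry is the one that wins.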

πŸ” Layer 2 β€” Why static analysis?

AGENTS.md defines the standards. Static tools enforce them.

The gap if you stop at documentation:

  • 📋 Standards live only in prose, so they are easy to ignore
  • 🤖 Agents (and humans) miss subtle requirements
  • 🐌 Manual review is slow and subjective
  • 🚨 Violations quietly slip into commits

The fix: automated, measurable, blocking checks, so review can focus on functional logic, not brace placement.

The Java guardrail trio

Tool        Catches                                                 Config
Checkstyle  style, naming, formatting, line length                  checkstyle.xml
SpotBugs    NPEs, resource leaks, bad casts, overflow, concurrency  spotbugs-exclude.xml
ArchUnit    layering, cycles, forbidden APIs, conventions           *ArchitectureTest.java

All three run inside one command:

mvn verify     # or: make coverage

If any gate fails, the build stops. No green build → no merge.

🎨 Checkstyle: style is not a debate

Without it: inconsistent naming, random indentation, unreadable diffs.

With it:

  • ✅ Naming conventions validated mechanically
  • ✅ Indentation, whitespace, line length enforced
  • ✅ Pairs with a formatter (Spotless / google-java-format)
  • ✅ make format auto-fixes most violations

Style stops being a code-review opinion and becomes a build result.

πŸ› SpotBugs β€” bug patterns before runtime

Bytecode-level analysis that catches what compiles but breaks:

  • ✅ Potential null pointer dereferences
  • ✅ Resource leaks: unclosed streams / connections
  • ✅ Type-cast & equals/hashCode mistakes
  • ✅ Integer overflow / underflow
  • ✅ Common concurrency hazards

“It compiles” ≠ “it’s correct.” SpotBugs closes part of that gap for free.
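
A small illustration of "compiles but breaks", using the integer-overflow bullet. The buggy method is the kind of code SpotBugs reports under its ICAST_INTEGER_MULTIPLY_CAST_TO_LONG pattern; whether it fires in a given build depends on the SpotBugs version and configuration. The class and method names are made up for this sketch.

```java
final class OverflowExample {

    /** Compiles cleanly, but the multiplication runs in 32-bit int and only the result is widened. */
    static long millisForDaysBuggy(int days) {
        return days * 24 * 60 * 60 * 1000;
    }

    /** Fixed: the long literal forces 64-bit arithmetic from the first multiplication on. */
    static long millisForDaysFixed(int days) {
        return days * 24L * 60 * 60 * 1000;
    }

    public static void main(String[] args) {
        System.out.println(millisForDaysBuggy(30)); // negative: the int product wrapped around
        System.out.println(millisForDaysFixed(30)); // 2592000000
    }
}
```

Both methods pass the compiler and may well pass a unit test that only tries small inputs, which is why a bytecode-level check is worth having.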

πŸ›οΈ ArchUnit β€” architecture as a test

Without it: circular deps, layering violations, business logic in the wrong layer, the same anti-pattern copy-pasted everywhere.

With it β€” architecture rules become JUnit tests:

  • βœ… Layer separation (controller β†’ service β†’ repository)
  • βœ… No circular dependencies between packages
  • βœ… Controlled access between modules
  • βœ… Ban specific anti-patterns and APIs
  • βœ… The architecture is documented in executable code

βš™οΈ How ArchUnit works (1/2)

A bytecode analysis engine + a fluent assertion DSL. Pure static analysis on compiled .class files β€” no Spring context, no running app.

.class files
   ↓  1. Import   β€” ClassFileImporter builds a graph of
                    JavaClass / JavaMethod / JavaField
   ↓  2. Evaluate β€” fluent DSL walks that object graph
   ↓  3. Condition β€” each match checked by an ArchCondition
   ↓  4. Report   β€” failures listed with FQN + line + reason

Because it reads bytecode directly, it can see annotations, inheritance, field types, and even method-call relationships without executing code.
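
To make the four steps concrete without pulling in ArchUnit, here is a toy analogue in plain Java. It is not ArchUnit and it does not read bytecode: it uses reflection on classes that are already loaded, and every name in it (MiniRuleCheck, Inject, GoodService, BadService, noFieldsShould) is hypothetical. It only mirrors the shape of import → evaluate → condition → report.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class MiniRuleCheck {

    // Hypothetical stand-in for an injection annotation such as @Autowired.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Inject {}

    static class GoodService {
        private final String repository;
        GoodService(String repository) { this.repository = repository; }
    }

    static class BadService {
        @Inject private String repository;
    }

    /** Steps 2 to 4: walk every field, apply the condition, collect readable failures. */
    static List<String> noFieldsShould(List<Class<?>> imported, Predicate<Field> forbidden, String because) {
        List<String> violations = new ArrayList<>();
        for (Class<?> type : imported) {
            for (Field field : type.getDeclaredFields()) {
                if (!field.isSynthetic() && forbidden.test(field)) {
                    violations.add(type.getName() + "." + field.getName()
                            + " violates the rule, because " + because);
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Step 1 ("import") is a hand-written list here; ArchUnit scans .class files instead.
        List<Class<?>> imported = List.of(GoodService.class, BadService.class);
        noFieldsShould(imported, f -> f.isAnnotationPresent(Inject.class), "use constructor injection")
                .forEach(System.out::println);
    }
}
```

The real library does the same walk over a graph built from class files, which is why it needs neither a running application nor loaded classes.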

βš™οΈ How ArchUnit works (2/2)

The three-part DSL β€” what().that(predicate).should(condition):

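import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noFields;

import com.tngtech.archunit.core.importer.ImportOption;
import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;
import org.springframework.beans.factory.annotation.Autowired;

@AnalyzeClasses(packages = "com.example",
                importOptions = ImportOption.DoNotIncludeTests.class)
class ArchitectureTest {

  @ArchTest
  static final ArchRule noFieldInjection =
      noFields().should().beAnnotatedWith(Autowired.class)
                .because("use constructor injection");
}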

✅ Can check                           ❌ Cannot check
annotations, inheritance, interfaces   runtime behavior / return values
method-call & package dependencies     dynamic-proxy behavior
field types, method signatures         application.yml config
custom bytecode patterns               reflection targets

ArchUnit β€” conventions as rules

Common “ArchUnit red flags” (CI fails if present):

noFields().should().beAnnotatedWith(Autowired.class);     // no field injection
noClasses().should().callConstructor(ObjectMapper.class); // reuse shared bean
noClasses().should().callConstructor(RestTemplate.class); // use RestClient
noClasses().should().accessClassesThat()
           .haveFullyQualifiedName("java.lang.System");   // no System.* access at all (covers System.out)
classes().that().areAnnotatedWith(RestController.class)
         .should().haveSimpleNameEndingWith("Controller");

The team agreement and the build check are the same artifact. Custom ArchCondition<T> / DescribedPredicate<T> handle anything bespoke.

πŸ” The agent feedback loop

+-----------------------------+
|   Coding Agent (any tool)   |
|   reads AGENTS.md, writes   |
+--------------+--------------+
               | generates code
               v
+-----------------------------+
|  mvn verify                 |
|  Checkstyle . SpotBugs .    |
|  ArchUnit . tests           |
+------+---------------+------+
       |               |
violations              all pass
       |               |
       v               v
feedback to agent      ready for human
(report + AGENTS.md)   functional review
       |
       +--> agent fixes -> re-runs verify -> loops until green

The tool output is the prompt for the next iteration.
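
The loop in the diagram can be sketched as a small driver. Everything here is an assumption made for illustration: the agent and the verifier are modelled as plain functions, the names (FeedbackLoop, untilGreen) are hypothetical, and the cap on fix attempts is an addition so that a stuck agent cannot loop forever.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

final class FeedbackLoop {

    /**
     * @param agent    takes (current code, violations from the last verify) and returns revised code
     * @param verifier returns the list of violations; an empty list means the build is green
     */
    static String untilGreen(String initialCode,
                             BiFunction<String, List<String>, String> agent,
                             Function<String, List<String>> verifier,
                             int maxFixAttempts) {
        String code = initialCode;
        for (int attempt = 0; attempt < maxFixAttempts; attempt++) {
            List<String> violations = verifier.apply(code);
            if (violations.isEmpty()) {
                return code; // green: ready for human functional review
            }
            // The tool output becomes the prompt for the next iteration.
            code = agent.apply(code, violations);
        }
        List<String> remaining = verifier.apply(code);
        if (remaining.isEmpty()) {
            return code;
        }
        throw new IllegalStateException(
                "still failing after " + maxFixAttempts + " fix attempts: " + remaining);
    }

    public static void main(String[] args) {
        // Toy verifier: flags field injection. Toy agent: "fixes" it by deleting the annotation.
        Function<String, List<String>> verifier = code -> code.contains("@Autowired")
                ? List.of("field injection: use constructor injection")
                : List.<String>of();
        BiFunction<String, List<String>, String> agent =
                (code, violations) -> code.replace("@Autowired ", "");
        System.out.println(untilGreen("@Autowired Repo repo;", agent, verifier, 3));
    }
}
```

In a real setup the verifier would shell out to mvn verify and parse its reports; the point of the sketch is only that the violations list is passed straight back to the agent.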

πŸͺ Shift left β€” local guardrails

CI is the last line of defense, not the first. Catch violations before they leave the laptop, using Git hooks wired through a Makefile.

Self-installing β€” the first make sets it up:

_HOOKS_PATH := $(shell git config --get core.hooksPath 2>/dev/null)
ifneq ($(_HOOKS_PATH),.githooks)
_ := $(shell test -d .git && git config core.hooksPath .githooks \
        && chmod +x .githooks/* 2>/dev/null)
endif

Hooks live in .githooks/ (version-controlled), not the untracked .git/hooks/. Everyone gets the same gates with zero setup.

Three hooks, three checkpoints

# pre-commit: fast feedback
mvn -q test                       # compile + unit tests
mvn -q spotless:check             # is it formatted?  (make format to fix)
# + if a DB changelog is staged → check it applies cleanly on a fresh DB

# commit-msg: enforce Conventional Commits
^(feat|fix|docs|refactor|perf|test|build|ci|chore|revert)(\(scope\))?!?: ...

# pre-push: the heavier gate before sharing
mvn -q spotbugs:check
mvn -q verify                     # integration tests + coverage + ArchUnit

Escape hatch for emergencies: git push --no-verify.
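
The commit-msg pattern above is shorthand. As a rough sketch, the same check could look like this in Java; the allowed scope characters and the "non-empty description" rule are assumptions, since the slide only shows "(scope)" and "...". CommitMessageCheck is a hypothetical name.

```java
import java.util.regex.Pattern;

final class CommitMessageCheck {

    // type, optional (scope), optional "!" for breaking changes, then ": " and a description.
    // Assumption: scopes are lowercase letters, digits, dots, underscores, or hyphens.
    private static final Pattern HEADER = Pattern.compile(
            "^(feat|fix|docs|refactor|perf|test|build|ci|chore|revert)(\\([a-z0-9._-]+\\))?!?: \\S.*$");

    static boolean isValid(String commitMessage) {
        // Only the first line (the header) is checked; body and footers are ignored.
        String header = commitMessage.lines().findFirst().orElse("");
        return HEADER.matcher(header).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("feat(api)!: drop v1 endpoints")); // true
        System.out.println(isValid("updated stuff"));                 // false
    }
}
```

A hook would read the message file Git passes as its first argument and exit non-zero when isValid returns false.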

The Makefile as the developer interface

One vocabulary for humans and agents; AGENTS.md points here:

make build            # mvn clean package -DskipTests
make test             # unit tests
make coverage         # mvn verify + jacoco report
make format           # spotless:apply  (auto-fix style)
make spotbugs         # spotbugs:check
make liquibase/check  # fresh container → update → validate → teardown

Discoverable, repeatable commands beat tribal knowledge, and an agent can read the Makefile to learn how to build and test the repo itself.

☸️ Guardrails at runtime β€” Kubernetes

Static analysis stops bad code. The cluster stops bad behavior:

  • Liveness / readiness / startup probes: don’t route traffic to a pod that isn’t ready; restart one that’s stuck
  • Resource requests & limits: cap blast radius; avoid noisy-neighbor & OOM cascades
  • Manifest validation: kubeconform / schema checks in CI
  • Admission policies: OPA/Gatekeeper or Kyverno reject non-compliant workloads at the door

Same philosophy as ArchUnit: encode the rule once, fail loudly when it’s broken.
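
On the application side, probes need something to call. A minimal JDK-only sketch follows, assuming hypothetical paths /healthz/live and /healthz/ready and a hypothetical ProbeEndpoints class; a Spring Boot service would more likely expose its existing health endpoints instead.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

final class ProbeEndpoints {

    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Call once startup work (migrations, cache warm-up, and so on) has finished. */
    void markReady() {
        ready.set(true);
    }

    /** Liveness: the process is up and able to answer at all. */
    int livenessStatus() {
        return 200;
    }

    /** Readiness: 503 until the app can actually serve traffic. */
    int readinessStatus() {
        return ready.get() ? 200 : 503;
    }

    /** Starts a tiny HTTP server; the probe paths are assumptions for this sketch. */
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/healthz/live", exchange -> {
            exchange.sendResponseHeaders(livenessStatus(), -1); // -1: no response body
            exchange.close();
        });
        server.createContext("/healthz/ready", exchange -> {
            exchange.sendResponseHeaders(readinessStatus(), -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        ProbeEndpoints probes = new ProbeEndpoints();
        probes.start(8080);
        probes.markReady(); // flip after real initialization in an actual service
    }
}
```

Keeping liveness and readiness separate matters: a pod that is alive but not ready should be taken out of rotation, not restarted.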

☸️ Debugging on Kubernetes: the toolbox

When a guardrail trips, investigate fast:

kubectl get pods -o wide                 # status, restarts, node
kubectl describe pod <pod>               # events: OOMKilled, ImagePullBackOff…
kubectl logs <pod> -c <container> -f     # stream logs
kubectl logs <pod> --previous            # logs from the crashed container
kubectl exec -it <pod> -- sh             # shell inside a running pod
kubectl port-forward <pod> 8080:8080     # hit the service locally
# ephemeral debug container, useful for distroless images
kubectl debug -it <pod> --image=busybox --target=<container>

Read the events first: CrashLoopBackOff, OOMKilled, and ImagePullBackOff each point at a different fix.

☸️ Tightening the inner loop

Don’t rebuild-push-redeploy by hand on every change:

  • Telepresence / mirrord: run the service locally while it’s wired into the live cluster, with real dependencies, instant reload, and your debugger attached; no image build per change.

Faster feedback at every layer (IDE, build, hook, CI, cluster) is the whole point of guardrails.

πŸ” Defense in depth

IDE / Agent      AGENTS.md              curate context (Layer 1)
    |
git commit       pre-commit + commit-msg    format, compile, unit test, message
    |
git push         pre-push: spotbugs + verify (IT) + spec-check
    |
CI               Checkstyle . SpotBugs . ArchUnit . test . JaCoCo  (blocking)
    |
Kubernetes       probes . limits . admission policies       runtime
    |
Incident         kubectl events / logs / debug              observe

Each layer is cheap, fails fast, and catches what the previous one missed.

✅ Takeaways

  • Write the rules down once: AGENTS.md is agent-agnostic and human-readable
  • Make machines enforce them: Checkstyle + SpotBugs + ArchUnit turn standards into a build result
  • ArchUnit = architecture as a test: bytecode analysis, no runtime needed
  • Shift left: Makefile + Git hooks catch violations before CI
  • Guardrails extend to runtime: k8s probes, limits, and policies; know your debug toolbox
  • The agent gets faster, the standard stays: prompts are fragile, standards are forever

Thank you 🙏

Try it on one repo this week:

  1. Drop an AGENTS.md at the root
  2. Add Checkstyle + SpotBugs + one ArchUnit rule to mvn verify
  3. Wire .githooks/ through your Makefile

Questions & discussion welcome.

📎 References: agents.md · ArchUnit docs · the project’s Makefile & .githooks/