Guardrails

Guardrails enforce policy outside the agent prompt. They still run when a prompt is weak, a model is confused, or a user tries to override instructions.

Stages

| Stage | Inspects | Result |
| --- | --- | --- |
| `input` | User message before dispatch | Reject the turn |
| `output` | Agent response before delivery | Block the response |
| `pre-tool` | Tool name and arguments | Deny the call |
| egress judge | Selected outbound HTTP requests | Allow or block the request |

For outbound content policy, use the egress judge.
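The first three stages can be read as a small pipeline: each stage passes its payload to the guardrails registered for it, and in this sketch the first blocking verdict stops the turn, response, or tool call. This is a minimal illustration only. The `Stage`, `Verdict`, and `Guardrail` types and the `runStage` function are hypothetical, not Lobu's actual interfaces, and the egress judge is not modeled.

```typescript
// Hypothetical types for illustration; not the framework's real API.
type Stage = "input" | "output" | "pre-tool";

type Verdict = { action: "pass" } | { action: "block"; reason: string };

interface Guardrail {
  name: string;
  stages: Stage[];
  check(payload: string): Verdict;
}

// Run every guardrail registered for the given stage; the first block wins.
function runStage(stage: Stage, payload: string, guardrails: Guardrail[]): Verdict {
  for (const g of guardrails) {
    if (!g.stages.includes(stage)) continue; // not registered for this stage
    const verdict = g.check(payload);
    if (verdict.action === "block") return verdict;
  }
  return { action: "pass" };
}

// A toy input guardrail that rejects an obvious override attempt.
const noOverride: Guardrail = {
  name: "no-override",
  stages: ["input"],
  check: (text) =>
    text.includes("ignore previous instructions")
      ? { action: "block", reason: "override attempt" }
      : { action: "pass" },
};

// Blocked at the input stage; passes at the output stage because the
// guardrail is not registered there.
console.log(runStage("input", "please ignore previous instructions", [noOverride]));
console.log(runStage("output", "please ignore previous instructions", [noOverride]));
```

Because the check runs outside the prompt, it fires regardless of what the model was told.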

Guardrails vs nearby layers

| Need | Use |
| --- | --- |
| Tell the model how to behave | `SOUL.md` or a skill |
| Hide or expose tools | Tool policy |
| Enforce policy even if the model ignores instructions | Guardrail |
| Restrict network destinations | Agent network policy |

Built-ins

| Name | Stage | Purpose |
| --- | --- | --- |
| `secret-scan` | output | Blocks credential-shaped strings |
| `pii-scan` | input, output, pre-tool | Detects common PII patterns |
| `forbidden-tools` | pre-tool | Blocks a fixed destructive-tool deny list |

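```typescript
// lobu.config.ts: enable all three built-in guardrails for this agent.
// (defineAgent import omitted; see the lobu.config.ts reference.)
const assistant = defineAgent({
  id: "assistant",
  dir: "./agents/assistant",
  guardrails: ["secret-scan", "pii-scan", "forbidden-tools"],
});
```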

See the `lobu.config.ts` reference.
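Built-ins such as `secret-scan` and `forbidden-tools` are deterministic checks. As a rough illustration of the kind of logic involved, the sketch below shows a regex-based credential scan and a fixed tool deny list. The regexes, the tool names, and the `secretScan` and `forbiddenTools` functions are all assumptions for illustration, not the built-ins' actual rules.

```typescript
// Illustrative only: patterns and tool names are assumptions,
// not the rules used by the real built-in guardrails.
type Verdict = { action: "pass" } | { action: "block"; reason: string };

// A few credential-shaped patterns: AWS access key ID, PEM private key
// header, and a generic "sk-" style API key. Real scanners use many more.
const SECRET_PATTERNS: RegExp[] = [
  /\bAKIA[0-9A-Z]{16}\b/,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
  /\bsk-[A-Za-z0-9]{20,}\b/,
];

// Output-stage check in the spirit of `secret-scan`.
function secretScan(output: string): Verdict {
  for (const pattern of SECRET_PATTERNS) {
    if (pattern.test(output)) {
      return { action: "block", reason: `matched ${pattern.source}` };
    }
  }
  return { action: "pass" };
}

// Pre-tool check in the spirit of `forbidden-tools`: a fixed deny list.
const DENY_LIST = new Set(["delete_repository", "drop_database"]);

function forbiddenTools(toolName: string): Verdict {
  return DENY_LIST.has(toolName)
    ? { action: "block", reason: `tool ${toolName} is on the deny list` }
    : { action: "pass" };
}
```

A fixed list is predictable and cheap to evaluate, which is why checks like these suit deterministic guardrails better than an LLM judge.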

Inline and skill guardrails

Operators can add inline LLM judges at the `input`, `output`, or `pre-tool` stage. A pre-tool judge can be narrowed so that it runs only for named tools.
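Narrowing a pre-tool judge means it is consulted only when one of its named tools is called. A minimal sketch of that selection rule follows; the `InlineJudge` shape, its `tools` field, and the `judgeApplies` function are hypothetical, and the real configuration fields may differ.

```typescript
// Hypothetical inline-judge shape; field names are assumptions.
type Stage = "input" | "output" | "pre-tool";

interface InlineJudge {
  name: string;
  stage: Stage;
  prompt: string;   // instructions for the judging model
  tools?: string[]; // pre-tool only: restrict the judge to these tools
}

// Decide whether a judge should run for a stage and (optional) tool name.
function judgeApplies(judge: InlineJudge, stage: Stage, toolName?: string): boolean {
  if (judge.stage !== stage) return false;
  // Judges outside pre-tool, and unnarrowed pre-tool judges, always apply.
  if (stage !== "pre-tool" || judge.tools === undefined) return true;
  return toolName !== undefined && judge.tools.includes(toolName);
}

const shellJudge: InlineJudge = {
  name: "shell-review",
  stage: "pre-tool",
  prompt: "Block shell commands that modify files outside the workspace.",
  tools: ["run_shell"],
};
```

Narrowing keeps judge latency and cost off tool calls that the policy does not cover.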

Skills may add pre-tool guardrails for the tools they introduce, but cannot:

- weaken input or output guardrails
- pre-approve their own destructive operations
- override the operator’s disabled list

The effective set is the union of the agent's built-ins, the operator's inline judges, and enabled skill guardrails. Operator exclusions are applied last, so they remove matching guardrails regardless of which source added them.
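The composition rule is a union followed by a subtraction. A minimal sketch, assuming for simplicity that guardrails are identified by plain name strings; `effectiveGuardrails` is a hypothetical helper, not part of the documented API.

```typescript
// Sketch of the composition rule; names and types are hypothetical.
function effectiveGuardrails(
  builtIns: string[],
  inlineJudges: string[],
  skillGuardrails: string[],
  disabled: string[],
): string[] {
  // 1. Union of all three sources, deduplicated in first-seen order.
  const union = new Set([...builtIns, ...inlineJudges, ...skillGuardrails]);
  // 2. Operator exclusions apply last, so nothing a skill adds survives them.
  const excluded = new Set(disabled);
  return Array.from(union).filter((name) => !excluded.has(name));
}
```

Applying exclusions after the union is what prevents a skill from re-enabling a guardrail the operator has disabled.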

Audit and improvement

Every time a guardrail trips, it writes a `guardrail-trip` event. Operators can inspect these events, and behaviors can group repeated failures into eval cases or proposed policy changes.
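One way to group repeated failures is to bucket trip events by guardrail and reason and keep the buckets that reach a threshold. This is a hypothetical sketch: the event fields (`guardrail`, `stage`, `reason`, `timestamp`), the grouping key, and the `repeatedFailures` function are assumptions, not the documented `guardrail-trip` schema or how behaviors actually work.

```typescript
// Hypothetical event shape; the real guardrail-trip fields may differ.
interface GuardrailTripEvent {
  guardrail: string;
  stage: "input" | "output" | "pre-tool";
  reason: string;
  timestamp: number;
}

// Group trips by guardrail + reason and keep groups seen at least
// `minCount` times, most frequent first: candidate eval cases.
function repeatedFailures(
  events: GuardrailTripEvent[],
  minCount: number,
): Array<{ key: string; count: number }> {
  const counts = new Map<string, number>();
  for (const e of events) {
    const key = `${e.guardrail}:${e.reason}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const result: Array<{ key: string; count: number }> = [];
  counts.forEach((count, key) => {
    if (count >= minCount) result.push({ key, count });
  });
  return result.sort((a, b) => b.count - a.count);
}
```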

Built-in names that do not resolve are logged and skipped, so a misspelled name leaves that guardrail off. Check the startup logs after changing guardrail names.
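The skip-and-log behavior means a typo such as `secrets-scan` produces a log line instead of a running guardrail. The sketch below illustrates that behavior with a hypothetical hard-coded registry and a hypothetical `resolveBuiltIns` helper; it is not the framework's actual resolver or log format.

```typescript
// Hypothetical registry of known built-in guardrail names.
const BUILT_INS = new Set(["secret-scan", "pii-scan", "forbidden-tools"]);

// Resolve configured names; unknown names are logged and skipped, not fatal.
function resolveBuiltIns(
  configured: string[],
  log: (msg: string) => void = console.warn,
): string[] {
  const resolved: string[] = [];
  for (const name of configured) {
    if (BUILT_INS.has(name)) {
      resolved.push(name);
    } else {
      log(`unknown guardrail "${name}" skipped`);
    }
  }
  return resolved;
}
```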

Failure behavior

Deterministic guardrails should return a verdict rather than throw. The common runner treats an unexpected exception as a pass (fail-open), so an infrastructure failure does not wedge every turn. LLM judges have their own caching and circuit-breaker behavior.

Use deployment-specific controls when strict fail-closed behavior is required.
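The fail-open rule amounts to a try/catch around each check. In the minimal sketch below, the `runSafely` wrapper and its `failClosed` option are hypothetical; they only show where a strict deployment-specific control would change the outcome and are not settings the runner exposes.

```typescript
type Verdict = { action: "pass" } | { action: "block"; reason: string };

// Wrap a guardrail check. By default an unexpected exception counts as a
// pass (fail-open). `failClosed` is a hypothetical stand-in for a
// deployment-specific control that blocks instead.
function runSafely(
  check: () => Verdict,
  options: { failClosed?: boolean; log?: (msg: string) => void } = {},
): Verdict {
  try {
    return check();
  } catch (err) {
    const log: (msg: string) => void = options.log ?? console.error;
    log(`guardrail threw: ${err instanceof Error ? err.message : String(err)}`);
    return options.failClosed
      ? { action: "block", reason: "guardrail error (fail-closed)" }
      : { action: "pass" };
  }
}
```

Fail-open keeps the agent available when a check's dependency is down; fail-closed trades that availability for the guarantee that no unchecked content gets through.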