
Core: Security Guard

The security guard module provides runtime moderation orchestration for Jazzmine agent turns.

1. Overview

The module contains:

  • a structured result model (SecurityResult),
  • a stateless runtime guard (SecurityGuard),
  • a module-level no-op singleton (NOOP_GUARD).

SecurityGuard is designed to:

  • run exactly one input gate path (input_moderator or toxicity_detector),
  • optionally run one output gate (output_moderator),
  • expose a sanitizer utility reference for caller-managed file sanitization,
  • avoid blocking the async event loop by offloading classification calls to worker threads.

2. Public API Coverage Checklist

This document covers all public classes, methods, properties, and key constants.

2.1 Module Constants

| Symbol | Included | Purpose |
|---|---|---|
| _UNSAFE_LABEL = "LABEL_1" | Yes | Unsafe class label used for HF-style moderators |
| NOOP_GUARD | Yes | Zero-gate singleton for default non-security deployments |

2.2 SecurityResult API

| Member | Included | Purpose |
|---|---|---|
| Dataclass fields | Yes | Normalized check outcome payload |
| safe(latency_ms=0) | Yes | Return non-blocking safe result |
| blocked(by, score, message, latency_ms=0) | Yes | Return blocking decision with metadata |
| errored(error, latency_ms=0) | Yes | Return fail-open error state |

2.3 SecurityGuard API

| Member | Included | Purpose |
|---|---|---|
| __init__(...) | Yes | Configure moderation components and policy |
| has_input_gate | Yes | Whether any input moderation path is configured |
| has_output_gate | Yes | Whether the output moderator is configured |
| has_sanitizer | Yes | Whether a sanitizer reference is configured |
| is_noop | Yes | Whether all runtime gates are disabled |
| check_input(content) | Yes | Execute the configured input moderation path |
| check_output(content) | Yes | Execute the configured output moderation path |
| __repr__() | Yes | Human-readable summary of configured components |

3. SecurityResult Contract

SecurityResult is the normalized output for any check.

Field semantics:

  • is_blocked: final block decision,
  • blocked_by: one of input_moderator, toxicity_detector, output_moderator,
  • score: rounded confidence score when blocked,
  • block_message: caller-facing block message,
  • latency_ms: wall-clock duration of the check in milliseconds,
  • error: non-fatal failure detail (fail-open path).

Factory methods:

  • safe(...) returns pass-through decision,
  • blocked(...) returns hard block decision,
  • errored(...) returns fail-open decision with diagnostics.
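
The contract above can be sketched as a minimal standalone dataclass. This is an illustrative reconstruction, not the library's actual implementation; in particular, the score-rounding precision shown is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SecurityResult:
    # Field names follow the contract in section 3.
    is_blocked: bool = False
    blocked_by: Optional[str] = None
    score: Optional[float] = None
    block_message: Optional[str] = None
    latency_ms: float = 0.0
    error: Optional[str] = None

    @classmethod
    def safe(cls, latency_ms=0):
        # Pass-through decision: nothing blocked, no diagnostics.
        return cls(is_blocked=False, latency_ms=latency_ms)

    @classmethod
    def blocked(cls, by, score, message, latency_ms=0):
        # Hard block with metadata; rounding precision is an assumption.
        return cls(is_blocked=True, blocked_by=by, score=round(score, 4),
                   block_message=message, latency_ms=latency_ms)

    @classmethod
    def errored(cls, error, latency_ms=0):
        # Fail-open path: not blocked, but carries the failure detail.
        return cls(is_blocked=False, error=error, latency_ms=latency_ms)
```

Note that errored(...) deliberately leaves is_blocked False: an error is a non-fatal, fail-open outcome, not a block.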

4. SecurityGuard Runtime Model

4.1 Construction

The constructor accepts optional components and policy values:

  • input_moderator, toxicity_detector, output_moderator, file_sanitizer,
  • confidence thresholds,
  • custom block messages,
  • timeout and fail-open policy.

Important design note:

  • The guard expects input_moderator and toxicity_detector to be mutually exclusive.
  • That exclusivity is enforced in builder validation, not inside this class constructor.

4.2 Property Behavior

  • has_input_gate: true when input moderator or toxicity detector exists.
  • has_output_gate: true when output moderator exists.
  • has_sanitizer: true when sanitizer exists.
  • is_noop: true when no input gate, no output gate, and no sanitizer are configured.
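
The is_noop relationship reduces to a single boolean expression over the other three properties; a sketch of that logic (free function for illustration, not the library's API):

```python
def is_noop(has_input_gate: bool, has_output_gate: bool, has_sanitizer: bool) -> bool:
    # The guard is a no-op only when every runtime component is absent.
    return not (has_input_gate or has_output_gate or has_sanitizer)
```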

4.3 Event-Loop Safety

All moderation inference calls are executed through asyncio.to_thread(...) and wrapped by asyncio.wait_for(...).

This prevents model inference from blocking the main async loop.
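
The pattern looks roughly like this; slow_classify and classify_with_timeout are hypothetical stand-ins, not library symbols:

```python
import asyncio
import time

def slow_classify(text):
    # Stand-in for a blocking (CPU/GPU-bound) model inference call.
    time.sleep(0.05)
    return {"label": "LABEL_0", "score": 0.1}

async def classify_with_timeout(text, timeout=10.0):
    # Offload the blocking call to a worker thread, then bound it with a
    # timeout — mirroring asyncio.to_thread(...) wrapped by asyncio.wait_for(...).
    return await asyncio.wait_for(asyncio.to_thread(slow_classify, text), timeout)

result = asyncio.run(classify_with_timeout("hello"))
```

If the worker call exceeds the timeout, wait_for raises asyncio.TimeoutError, which feeds the fail-open/fail-closed policy described in section 9.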

5. Input Moderation Flow (check_input)

check_input(content) executes one of three branches.

5.1 No Input Gate Configured

If no input gate exists:

  • returns SecurityResult.safe() immediately.

5.2 HF-Style Input Moderator Path

If input_moderator is configured:

  1. call input_moderator.classify(content) in a worker thread,
  2. enforce the timeout with wait_for,
  3. block only when:
     • label equals LABEL_1, and
     • score meets or exceeds input_confidence_threshold.

Timeout/exception handling:

  • fail_open=True: return SecurityResult.errored(...),
  • fail_open=False: re-raise the exception.
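
The blocking condition in step 3 can be sketched as a pure predicate (should_block_input is a hypothetical helper name, not library API):

```python
def should_block_input(label: str, score: float,
                       threshold: float = 0.5,
                       unsafe_label: str = "LABEL_1") -> bool:
    # Block only when the unsafe label is predicted AND the score
    # meets or exceeds the configured confidence threshold.
    return label == unsafe_label and score >= threshold
```

Both conditions must hold: a high score on the safe label, or a sub-threshold score on the unsafe label, still passes.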

5.3 Toxicity Detector Path

If toxicity_detector is configured:

  1. call toxicity_detector.predict(content) in worker thread,
  2. enforce timeout,
  3. block when detector returns is_toxic=True.

Threshold behavior:

  • the detector path trusts the detector's internal threshold decision,
  • the guard does not re-apply input_confidence_threshold on this path.

Timeout/exception handling is identical to the moderator path.
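
The contrast with the moderator path is that no re-thresholding happens here; a sketch, assuming the detector's prediction is a mapping with an is_toxic flag (the exact return shape is an assumption):

```python
def should_block_toxicity(prediction: dict) -> bool:
    # The guard trusts the detector's own is_toxic decision outright;
    # no guard-level confidence threshold is applied on this path.
    return bool(prediction.get("is_toxic", False))
```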

6. Output Moderation Flow (check_output)

check_output(content) runs only when output_moderator is configured.

Decision rule:

  • block when label is LABEL_1 and score meets/exceeds output_confidence_threshold.

No output gate:

  • returns SecurityResult.safe().

Timeout/exception behavior:

  • fail-open by default with SecurityResult.errored(...),
  • strict mode re-raises when fail_open=False.

7. Configuration Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| input_moderator | Any \| None | None | Input HF-style moderator (classify(text) expected) |
| toxicity_detector | Any \| None | None | Input toxicity detector (predict(text) expected) |
| output_moderator | Any \| None | None | Output HF-style moderator |
| file_sanitizer | Any \| None | None | Exposed via the sanitizer property; not auto-invoked in chat flow |
| input_confidence_threshold | float | 0.5 | Input moderator threshold for the unsafe label |
| output_confidence_threshold | float | 0.5 | Output moderator threshold for the unsafe label |
| input_block_message | str | built-in default | Message returned when input is blocked |
| output_block_message | str | built-in default | Message returned when output is blocked |
| moderation_timeout | float | 10.0 | Per-check timeout in seconds |
| fail_open | bool | True | Graceful allow-on-error mode |

8. Integration Semantics

8.1 Build-Time Integration

Agent builder wires this guard into runtime at build time.

If security config is omitted, runtime uses NOOP_GUARD.

8.2 Turn Lifecycle Positioning

  • check_input runs early in turn flow before enhancement/recall/LLM generation.
  • check_output runs after generation and before final response return/storage decisions.

8.3 Sanitizer Scope

file_sanitizer is not called automatically by check_input or check_output.

It is exposed as a utility (guard.sanitizer) for caller-managed file-ingestion workflows.

9. Error Handling and Safety Modes

9.1 Fail-Open Mode (Default)

On timeout or exception:

  • return SecurityResult.errored(...),
  • allow flow to continue (non-blocking).

9.2 Fail-Closed/Strict Mode

When fail_open=False:

  • moderation timeout/exception is re-raised,
  • caller must handle raised exceptions.
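
A caller-side handling sketch for strict mode; moderate_strict, failing_check, and the block-on-timeout fallback are hypothetical choices, not library behavior:

```python
import asyncio

async def moderate_strict(check, content, timeout=5.0):
    # In strict mode (fail_open=False) the guard re-raises; the caller
    # decides the fallback. Here a timeout is treated as a block.
    try:
        return await check(content)
    except asyncio.TimeoutError:
        return {"is_blocked": True, "error": "moderation timed out"}

async def failing_check(content):
    # Simulates a moderation check that exceeded its timeout.
    raise asyncio.TimeoutError

result = asyncio.run(moderate_strict(failing_check, "hi"))
```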

9.3 Observability

Guard logs:

  • info when blocked,
  • debug for safe decisions,
  • warning on timeout/error fail-open paths.

10. Practical Examples

10.1 Default Fail-Open Guard

```python
from jazzmine.core.security_guard import SecurityGuard

guard = SecurityGuard(
    input_moderator=my_input_moderator,
    output_moderator=my_output_moderator,
)
```

10.2 Toxicity Detector Input Path

```python
from jazzmine.core.security_guard import SecurityGuard

guard = SecurityGuard(
    toxicity_detector=my_toxicity_detector,
    output_moderator=my_output_moderator,
)
```

10.3 Strict Failure Mode

```python
from jazzmine.core.security_guard import SecurityGuard

guard = SecurityGuard(
    input_moderator=my_input_moderator,
    fail_open=False,
    moderation_timeout=5.0,
)
```

10.4 Noop Runtime

```python
from jazzmine.core.security_guard import NOOP_GUARD

guard = NOOP_GUARD
assert guard.is_noop
```

11. Operational Guidance

  • Prefer fail-open for availability-sensitive chat products and fail-closed for high-security workflows.
  • Keep timeout tuned to model latency profile to avoid noisy false errors.
  • Configure exactly one input path (input_moderator or toxicity_detector) in production deployments.
  • Use guard-level block messages to provide consistent UX across all blocked outcomes.