1. Overview
This module contains:
- a structured result model (SecurityResult),
- a stateless runtime guard (SecurityGuard),
- a module-level no-op singleton (NOOP_GUARD).
SecurityGuard is designed to:
- run exactly one input gate path (input_moderator or toxicity_detector),
- optionally run one output gate (output_moderator),
- expose a sanitizer utility reference for caller-managed file sanitization,
- avoid blocking the async event loop by offloading classification calls to worker threads.
2. Public API Coverage Checklist
This document covers all public classes, methods, properties, and key constants.
2.1 Module Constants
| Symbol | Included | Purpose |
|---|---|---|
| _UNSAFE_LABEL = "LABEL_1" | Yes | Unsafe class label used for HF-style moderators |
| NOOP_GUARD | Yes | Zero-gate singleton for default non-security deployments |
2.2 SecurityResult API
| Member | Included | Purpose |
|---|---|---|
| Dataclass fields | Yes | Normalized check outcome payload |
| safe(latency_ms=0) | Yes | Return non-blocking safe result |
| blocked(by, score, message, latency_ms=0) | Yes | Return blocking decision with metadata |
| errored(error, latency_ms=0) | Yes | Return fail-open error state |
2.3 SecurityGuard API
| Member | Included | Purpose |
|---|---|---|
| __init__(...) | Yes | Configure moderation components and policy |
| has_input_gate | Yes | Whether any input moderation path is configured |
| has_output_gate | Yes | Whether output moderator is configured |
| has_sanitizer | Yes | Whether sanitizer reference is configured |
| is_noop | Yes | Whether all runtime gates are disabled |
| check_input(content) | Yes | Execute configured input moderation path |
| check_output(content) | Yes | Execute configured output moderation path |
| __repr__() | Yes | Human-readable configured-components summary |
3. SecurityResult Contract
SecurityResult is the normalized output for any check.
Field semantics:
- is_blocked: final block decision,
- blocked_by: one of input_moderator, toxicity_detector, output_moderator,
- score: rounded confidence score when blocked,
- block_message: caller-facing block message,
- latency_ms: check wall-time,
- error: non-fatal failure detail (fail-open path).
Factory methods:
- safe(...) returns pass-through decision,
- blocked(...) returns hard block decision,
- errored(...) returns fail-open decision with diagnostics.
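The contract above can be pictured as a small dataclass with three factory classmethods. The sketch below is illustrative only, not the library's source: the class name `SecurityResultSketch`, the field defaults, and the rounding precision (4 decimal places) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecurityResultSketch:
    """Hypothetical stand-in for SecurityResult (field defaults are assumed)."""

    is_blocked: bool = False
    blocked_by: Optional[str] = None
    score: Optional[float] = None
    block_message: Optional[str] = None
    latency_ms: float = 0
    error: Optional[str] = None

    @classmethod
    def safe(cls, latency_ms=0):
        # Pass-through decision: nothing blocked, no error recorded.
        return cls(latency_ms=latency_ms)

    @classmethod
    def blocked(cls, by, score, message, latency_ms=0):
        # Hard block; rounding to 4 decimal places is an assumption.
        return cls(
            is_blocked=True,
            blocked_by=by,
            score=round(score, 4),
            block_message=message,
            latency_ms=latency_ms,
        )

    @classmethod
    def errored(cls, error, latency_ms=0):
        # Fail-open: the check itself failed, so the content is allowed
        # through and the failure detail is kept for diagnostics.
        return cls(is_blocked=False, latency_ms=latency_ms, error=str(error))


result = SecurityResultSketch.blocked("input_moderator", 0.91234, "Blocked.")
```

Note that an errored result is not a blocked result: callers that need to distinguish "safe" from "check failed" should inspect `error` as well as `is_blocked`.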
4. SecurityGuard Runtime Model
4.1 Construction
The constructor accepts optional components and policy values:
- input_moderator, toxicity_detector, output_moderator, file_sanitizer,
- confidence thresholds,
- custom block messages,
- timeout and fail-open policy.
Important design note:
- The guard expects input_moderator and toxicity_detector to be mutually exclusive.
- That exclusivity is enforced in builder validation, not inside this class constructor.
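Because the constructor does not enforce exclusivity, code that constructs the guard directly (bypassing the builder) may want an equivalent check. A minimal sketch, assuming a hypothetical helper named `validate_input_gates`; the real builder's validation and error type may differ:

```python
def validate_input_gates(input_moderator=None, toxicity_detector=None):
    """Hypothetical pre-construction check; not part of the documented API."""
    if input_moderator is not None and toxicity_detector is not None:
        raise ValueError(
            "Configure either input_moderator or toxicity_detector, not both."
        )


# Exactly one input path (or none) passes validation.
validate_input_gates(input_moderator=object())
validate_input_gates()
```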
4.2 Property Behavior
- has_input_gate: true when input moderator or toxicity detector exists.
- has_output_gate: true when output moderator exists.
- has_sanitizer: true when sanitizer exists.
- is_noop: true when no input gate, no output gate, and no sanitizer are configured.
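The property rules in this section reduce to a few boolean expressions. The class below is a hypothetical stand-in (`GuardPropertiesSketch`), and the use of `is not None` checks is an assumption about how "configured" is determined:

```python
class GuardPropertiesSketch:
    """Illustrative only; mirrors the property rules listed in this section."""

    def __init__(self, input_moderator=None, toxicity_detector=None,
                 output_moderator=None, file_sanitizer=None):
        self.input_moderator = input_moderator
        self.toxicity_detector = toxicity_detector
        self.output_moderator = output_moderator
        self.file_sanitizer = file_sanitizer

    @property
    def has_input_gate(self):
        # Either input path counts as an input gate.
        return (self.input_moderator is not None
                or self.toxicity_detector is not None)

    @property
    def has_output_gate(self):
        return self.output_moderator is not None

    @property
    def has_sanitizer(self):
        return self.file_sanitizer is not None

    @property
    def is_noop(self):
        # No-op only when nothing at all is configured, sanitizer included.
        return not (self.has_input_gate
                    or self.has_output_gate
                    or self.has_sanitizer)


sanitizer_only = GuardPropertiesSketch(file_sanitizer=object())
```

Under this rule a guard that carries only a sanitizer is not a no-op, even though `check_input` and `check_output` would both return safe results for it.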
4.3 Event-Loop Safety
All moderation inference calls are executed through asyncio.to_thread(...) and wrapped by asyncio.wait_for(...).
This prevents model inference from blocking the main async loop.
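The pattern can be demonstrated in isolation. In the sketch below, `slow_classify` is a hypothetical stand-in for blocking model inference, and the heartbeat task exists only to show that the event loop keeps running while the worker thread is busy:

```python
import asyncio
import time


def slow_classify(text):
    # Stand-in for blocking model inference (hypothetical).
    time.sleep(0.05)
    return {"label": "LABEL_0", "score": 0.99}


async def classify_off_loop(text, timeout=10.0):
    # to_thread runs the blocking call in a worker thread;
    # wait_for bounds how long the coroutine waits for the result.
    return await asyncio.wait_for(
        asyncio.to_thread(slow_classify, text), timeout
    )


async def main():
    ticks = 0

    async def heartbeat():
        # Counts loop iterations that happen while inference is running.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    result = await classify_off_loop("hello")
    beat.cancel()
    return result, ticks


result, ticks = asyncio.run(main())
```

One consequence of this design: when `wait_for` times out, the awaiting coroutine is released, but the worker thread cannot be interrupted and keeps running until the underlying call returns. `asyncio.to_thread` requires Python 3.9 or later.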
5. Input Moderation Flow (check_input)
check_input(content) executes one of three branches.
5.1 No Input Gate Configured
If no input gate exists:
- returns SecurityResult.safe() immediately.
5.2 HF-Style Input Moderator Path
If input_moderator is configured:
- call input_moderator.classify(content) in worker thread,
- enforce timeout with wait_for,
- block only when both conditions hold:
  - label equals LABEL_1, and
  - score meets/exceeds input_confidence_threshold.
Timeout/exception handling:
- fail_open=True: return SecurityResult.errored(...),
- fail_open=False: re-raise the exception.
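The decision rule and the two failure modes can be sketched as a single coroutine. This is an approximation under stated assumptions: `classify` is assumed to return a dict with `label` and `score` keys, the result is shown as a plain dict rather than a SecurityResult, and `FakeModerator` and `run_moderator_gate` are hypothetical names.

```python
import asyncio

UNSAFE_LABEL = "LABEL_1"


class FakeModerator:
    """Hypothetical moderator returning a fixed HF-style prediction."""

    def __init__(self, label, score):
        self.label = label
        self.score = score

    def classify(self, text):
        return {"label": self.label, "score": self.score}


async def run_moderator_gate(moderator, content, threshold=0.5,
                             timeout=10.0, fail_open=True):
    try:
        out = await asyncio.wait_for(
            asyncio.to_thread(moderator.classify, content), timeout
        )
    except Exception as exc:  # also covers asyncio.TimeoutError
        if not fail_open:
            raise  # strict mode: the caller must handle the failure
        return {"is_blocked": False, "error": repr(exc)}
    # Both conditions must hold: unsafe label AND score at/above threshold.
    blocked = out["label"] == UNSAFE_LABEL and out["score"] >= threshold
    return {"is_blocked": blocked, "error": None}


verdict = asyncio.run(run_moderator_gate(FakeModerator("LABEL_1", 0.9), "text"))
```

An unsafe label with a score below the threshold passes, as does a high-confidence prediction for any other label.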
5.3 Toxicity Detector Path
If toxicity_detector is configured:
- call toxicity_detector.predict(content) in worker thread,
- enforce timeout,
- block when detector returns is_toxic=True.
Threshold behavior:
- detector path trusts detector's internal threshold decision,
- guard does not re-apply input_confidence_threshold in this path.
Timeout/exception handling is identical to the moderator path.
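The difference from the moderator path is that the guard only reads the detector's verdict. A minimal sketch, assuming `predict` returns an object with an `is_toxic` attribute; `KeywordDetector` and `toxicity_gate` are hypothetical, and error handling is omitted because it matches the moderator path:

```python
import asyncio
from types import SimpleNamespace


class KeywordDetector:
    """Hypothetical detector that applies its own internal threshold."""

    threshold = 0.8

    def predict(self, text):
        score = 0.95 if "idiot" in text.lower() else 0.05
        return SimpleNamespace(is_toxic=score >= self.threshold, score=score)


async def toxicity_gate(detector, content, timeout=10.0):
    prediction = await asyncio.wait_for(
        asyncio.to_thread(detector.predict, content), timeout
    )
    # The detector's verdict is final: no guard-level threshold is applied.
    return bool(prediction.is_toxic)


blocked = asyncio.run(toxicity_gate(KeywordDetector(), "You idiot"))
```

To make this path stricter or more lenient, adjust the detector's own threshold; changing input_confidence_threshold on the guard has no effect here.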
6. Output Moderation Flow (check_output)
check_output(content) runs only when output_moderator is configured.
Decision rule:
- block when label is LABEL_1 and score meets/exceeds output_confidence_threshold.
No output gate:
- returns SecurityResult.safe().
Timeout/exception behavior:
- fail-open by default with SecurityResult.errored(...),
- strict mode re-raises when fail_open=False.
7. Configuration Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| input_moderator | Any \| None | None | Input HF-style moderator (classify(text) expected) |
| toxicity_detector | Any \| None | None | Input toxicity detector (predict(text) expected) |
| output_moderator | Any \| None | None | Output HF-style moderator |
| file_sanitizer | Any \| None | None | Exposed via sanitizer property; not auto-invoked in chat flow |
| input_confidence_threshold | float | 0.5 | Input moderator threshold for unsafe label |
| output_confidence_threshold | float | 0.5 | Output moderator threshold for unsafe label |
| input_block_message | str | built-in default | Message returned when input is blocked |
| output_block_message | str | built-in default | Message returned when output is blocked |
| moderation_timeout | float | 10.0 | Per-check timeout in seconds |
| fail_open | bool | True | Graceful allow-on-error mode |
8. Integration Semantics
8.1 Build-Time Integration
The agent builder wires this guard into the runtime at build time.
If the security config is omitted, the runtime uses NOOP_GUARD.
8.2 Turn Lifecycle Positioning
- check_input runs early in turn flow before enhancement/recall/LLM generation.
- check_output runs after generation and before final response return/storage decisions.
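The ordering can be sketched as a turn handler. Everything here is a stand-in: `StubGuard` mimics only the `is_blocked` and `block_message` fields of a result, `generate` is a hypothetical LLM call, and both check methods are assumed to be coroutines.

```python
import asyncio
from types import SimpleNamespace


class StubGuard:
    """Hypothetical guard exposing the two check methods as coroutines."""

    async def check_input(self, content):
        blocked = "attack" in content
        return SimpleNamespace(is_blocked=blocked,
                               block_message="Input blocked.")

    async def check_output(self, content):
        return SimpleNamespace(is_blocked=False, block_message=None)


async def generate(prompt):
    # Stand-in for enhancement, recall, and LLM generation.
    return f"echo: {prompt}"


async def handle_turn(guard, user_message):
    # 1. Input gate runs before any expensive work.
    verdict = await guard.check_input(user_message)
    if verdict.is_blocked:
        return verdict.block_message
    # 2. Generation only happens for input that passed.
    reply = await generate(user_message)
    # 3. Output gate runs before the reply is returned or stored.
    verdict = await guard.check_output(reply)
    if verdict.is_blocked:
        return verdict.block_message
    return reply


reply = asyncio.run(handle_turn(StubGuard(), "hi"))
```

Placing the input check first means blocked input never reaches the model, which saves generation cost as well as limiting exposure.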
8.3 Sanitizer Scope
file_sanitizer is not called automatically by check_input or check_output.
It is exposed as a utility (guard.sanitizer) for caller-managed file ingestion workflows.
9. Error Handling and Safety Modes
9.1 Fail-Open Mode (Default)
On timeout or exception:
- return SecurityResult.errored(...),
- allow flow to continue (non-blocking).
9.2 Fail-Closed/Strict Mode
When fail_open=False:
- moderation timeout/exception is re-raised,
- caller must handle raised exceptions.
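In strict mode the caller decides what a failed check means for the user. One option is to convert the exception into a refusal, as in the sketch below. The names `TimingOutGuard` and `check_or_refuse` are hypothetical, and the assumption that a timeout surfaces as `asyncio.TimeoutError` follows from the use of `asyncio.wait_for` rather than from a documented guarantee.

```python
import asyncio


class TimingOutGuard:
    """Hypothetical stand-in for a guard built with fail_open=False."""

    async def check_input(self, content):
        raise asyncio.TimeoutError("moderation timed out")


async def check_or_refuse(guard, content):
    # In strict mode a failed check must not let content through,
    # so the caller turns the exception into an explicit refusal.
    try:
        result = await guard.check_input(content)
    except asyncio.TimeoutError:
        return False, "Moderation timed out; please retry."
    except Exception as exc:
        return False, f"Moderation unavailable: {exc}"
    return not result.is_blocked, None


allowed, reason = asyncio.run(check_or_refuse(TimingOutGuard(), "hello"))
```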
9.3 Observability
Guard logs:
- info when blocked,
- debug for safe decisions,
- warning on timeout/error fail-open paths.
10. Practical Examples
10.1 Default Fail-Open Guard

    from jazzmine.core.security_guard import SecurityGuard

    # my_input_moderator and my_output_moderator are caller-supplied
    # objects exposing classify(text).
    guard = SecurityGuard(
        input_moderator=my_input_moderator,
        output_moderator=my_output_moderator,
    )

10.2 Toxicity Detector Input Path

    from jazzmine.core.security_guard import SecurityGuard

    # my_toxicity_detector is a caller-supplied object exposing predict(text).
    guard = SecurityGuard(
        toxicity_detector=my_toxicity_detector,
        output_moderator=my_output_moderator,
    )

10.3 Strict Failure Mode

    from jazzmine.core.security_guard import SecurityGuard

    # With fail_open=False, timeouts and exceptions are re-raised to the caller.
    guard = SecurityGuard(
        input_moderator=my_input_moderator,
        fail_open=False,
        moderation_timeout=5.0,
    )

10.4 Noop Runtime

    from jazzmine.core.security_guard import NOOP_GUARD

    # NOOP_GUARD has no gates configured, so every check passes through.
    guard = NOOP_GUARD
    assert guard.is_noop

11. Operational Guidance
- Prefer fail-open for availability-sensitive chat products and fail-closed for high-security workflows.
- Tune moderation_timeout to the model's latency profile so that slow but healthy inference calls are not reported as timeout errors.
- Configure exactly one input path (input_moderator or toxicity_detector) in production deployments.
- Use guard-level block messages to provide consistent UX across all blocked outcomes.