
Core: Security Guard

The security guard module provides runtime moderation orchestration for Jazzmine agent turns.

1. Overview

The module contains:

  • a structured result model (SecurityResult),
  • a stateless runtime guard (SecurityGuard),
  • a module-level no-op singleton (NOOP_GUARD).

SecurityGuard is designed to:

  • run exactly one input gate path (input_moderator or toxicity_detector),
  • optionally run one output gate (output_moderator),
  • expose a sanitizer utility reference for caller-managed file sanitization,
  • avoid blocking the async event loop by offloading classification calls to worker threads.

2. Public API Coverage Checklist

This document covers all public classes, methods, properties, and key constants.

2.1 Module Constants

| Symbol | Included | Purpose |
|---|---|---|
| _UNSAFE_LABEL = "LABEL_1" | Yes | Unsafe class label used for HF-style moderators |
| NOOP_GUARD | Yes | Zero-gate singleton for default non-security deployments |

2.2 SecurityResult API

| Member | Included | Purpose |
|---|---|---|
| Dataclass fields | Yes | Normalized check outcome payload |
| safe(latency_ms=0) | Yes | Return non-blocking safe result |
| blocked(by, score, message, latency_ms=0) | Yes | Return blocking decision with metadata |
| errored(error, latency_ms=0) | Yes | Return fail-open error state |

2.3 SecurityGuard API

| Member | Included | Purpose |
|---|---|---|
| __init__(...) | Yes | Configure moderation components and policy |
| has_input_gate | Yes | Whether any input moderation path is configured |
| has_output_gate | Yes | Whether the output moderator is configured |
| has_sanitizer | Yes | Whether a sanitizer reference is configured |
| is_noop | Yes | Whether all runtime gates are disabled |
| check_input(content) | Yes | Execute the configured input moderation path |
| check_output(content) | Yes | Execute the configured output moderation path |
| __repr__() | Yes | Human-readable summary of configured components |

3. SecurityResult Contract

SecurityResult is the normalized output for any check.

Field semantics:

  • is_blocked: final block decision,
  • blocked_by: one of input_moderator, toxicity_detector, output_moderator,
  • score: rounded confidence score when blocked,
  • block_message: caller-facing block message,
  • latency_ms: wall-clock duration of the check in milliseconds,
  • error: non-fatal failure detail (fail-open path).

Factory methods:

  • safe(...) returns pass-through decision,
  • blocked(...) returns hard block decision,
  • errored(...) returns fail-open decision with diagnostics.
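
The contract above can be sketched as a minimal standalone dataclass. This is an illustrative reconstruction, not the library's actual implementation; in particular, the score-rounding precision shown is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SecurityResult:
    # Field names follow the contract in section 3.
    is_blocked: bool = False
    blocked_by: Optional[str] = None
    score: Optional[float] = None
    block_message: Optional[str] = None
    latency_ms: float = 0.0
    error: Optional[str] = None

    @classmethod
    def safe(cls, latency_ms=0):
        # Pass-through decision: nothing blocked, no diagnostics.
        return cls(is_blocked=False, latency_ms=latency_ms)

    @classmethod
    def blocked(cls, by, score, message, latency_ms=0):
        # Hard block with metadata; rounding precision is an assumption.
        return cls(is_blocked=True, blocked_by=by, score=round(score, 4),
                   block_message=message, latency_ms=latency_ms)

    @classmethod
    def errored(cls, error, latency_ms=0):
        # Fail-open path: not blocked, but carries the failure detail.
        return cls(is_blocked=False, error=error, latency_ms=latency_ms)
```

Note that errored(...) deliberately leaves is_blocked False: an error is a non-fatal, fail-open outcome, not a block.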

4. SecurityGuard Runtime Model

4.1 Construction

The constructor accepts optional components and policy values:

  • input_moderator, toxicity_detector, output_moderator, file_sanitizer,
  • confidence thresholds,
  • custom block messages,
  • timeout and fail-open policy.

Important design note:

  • The guard expects input_moderator and toxicity_detector to be mutually exclusive.
  • That exclusivity is enforced in builder validation, not inside this class constructor.

4.2 Property Behavior

  • has_input_gate: true when input moderator or toxicity detector exists.
  • has_output_gate: true when output moderator exists.
  • has_sanitizer: true when sanitizer exists.
  • is_noop: true when no input gate, no output gate, and no sanitizer are configured.
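
The is_noop relationship reduces to a single boolean expression over the other three properties; a sketch of that logic (free function for illustration, not the library's API):

```python
def is_noop(has_input_gate: bool, has_output_gate: bool, has_sanitizer: bool) -> bool:
    # The guard is a no-op only when every runtime component is absent.
    return not (has_input_gate or has_output_gate or has_sanitizer)
```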

4.3 Event-Loop Safety

All moderation inference calls are executed through asyncio.to_thread(...) and wrapped by asyncio.wait_for(...).

This prevents model inference from blocking the main async loop.
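
The pattern looks roughly like this; slow_classify and classify_with_timeout are hypothetical stand-ins, not library symbols:

```python
import asyncio
import time

def slow_classify(text):
    # Stand-in for a blocking (CPU/GPU-bound) model inference call.
    time.sleep(0.05)
    return {"label": "LABEL_0", "score": 0.1}

async def classify_with_timeout(text, timeout=10.0):
    # Offload the blocking call to a worker thread, then bound it with a
    # timeout — mirroring asyncio.to_thread(...) wrapped by asyncio.wait_for(...).
    return await asyncio.wait_for(asyncio.to_thread(slow_classify, text), timeout)

result = asyncio.run(classify_with_timeout("hello"))
```

If the worker call exceeds the timeout, wait_for raises asyncio.TimeoutError, which feeds the fail-open/fail-closed policy described in section 9.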

5. Input Moderation Flow (check_input)

check_input(content) executes one of three branches.

5.1 No Input Gate Configured

If no input gate exists:

  • returns SecurityResult.safe() immediately.

5.2 HF-Style Input Moderator Path

If input_moderator is configured:

  1. call input_moderator.classify(content) in a worker thread,
  2. enforce the timeout with wait_for,
  3. block only when:
     • label equals LABEL_1, and
     • score meets or exceeds input_confidence_threshold.

Timeout/exception handling:

  • fail_open=True: return SecurityResult.errored(...),
  • fail_open=False: re-raise the exception.
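
The blocking condition in step 3 can be sketched as a pure predicate (should_block_input is a hypothetical helper name, not library API):

```python
def should_block_input(label: str, score: float,
                       threshold: float = 0.5,
                       unsafe_label: str = "LABEL_1") -> bool:
    # Block only when the unsafe label is predicted AND the score
    # meets or exceeds the configured confidence threshold.
    return label == unsafe_label and score >= threshold
```

Both conditions must hold: a high score on the safe label, or a sub-threshold score on the unsafe label, still passes.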

5.3 Toxicity Detector Path

If toxicity_detector is configured:

  1. call toxicity_detector.predict(content) in worker thread,
  2. enforce timeout,
  3. block when detector returns is_toxic=True.

Threshold behavior:

  • the detector path trusts the detector's internal threshold decision,
  • the guard does not re-apply input_confidence_threshold on this path.

Timeout/exception handling is identical to the moderator path.
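
The contrast with the moderator path is that no re-thresholding happens here; a sketch, assuming the detector's prediction is a mapping with an is_toxic flag (the exact return shape is an assumption):

```python
def should_block_toxicity(prediction: dict) -> bool:
    # The guard trusts the detector's own is_toxic decision outright;
    # no guard-level confidence threshold is applied on this path.
    return bool(prediction.get("is_toxic", False))
```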

6. Output Moderation Flow (check_output)

check_output(content) runs only when output_moderator is configured.

Decision rule:

  • block when label is LABEL_1 and score meets/exceeds output_confidence_threshold.

No output gate:

  • returns SecurityResult.safe().

Timeout/exception behavior:

  • fail-open by default with SecurityResult.errored(...),
  • strict mode re-raises when fail_open=False.

7. Configuration Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| input_moderator | Any \| None | None | Input HF-style moderator (classify(text) expected) |
| toxicity_detector | Any \| None | None | Input toxicity detector (predict(text) expected) |
| output_moderator | Any \| None | None | Output HF-style moderator |
| file_sanitizer | Any \| None | None | Exposed via the sanitizer property; not auto-invoked in chat flow |
| input_confidence_threshold | float | 0.5 | Input moderator threshold for the unsafe label |
| output_confidence_threshold | float | 0.5 | Output moderator threshold for the unsafe label |
| input_block_message | str | built-in default | Message returned when input is blocked |
| output_block_message | str | built-in default | Message returned when output is blocked |
| moderation_timeout | float | 10.0 | Per-check timeout in seconds |
| fail_open | bool | True | Graceful allow-on-error mode |

8. Integration Semantics

8.1 Build-Time Integration

Agent builder wires this guard into runtime at build time.

If security config is omitted, runtime uses NOOP_GUARD.

8.2 Turn Lifecycle Positioning

  • check_input runs early in turn flow before enhancement/recall/LLM generation.
  • check_output runs after generation and before final response return/storage decisions.

8.3 Sanitizer Scope

file_sanitizer is not called automatically by check_input or check_output.

It is exposed as a utility (guard.sanitizer) for caller-managed file-ingestion workflows.

9. Error Handling and Safety Modes

9.1 Fail-Open Mode (Default)

On timeout or exception:

  • return SecurityResult.errored(...),
  • allow flow to continue (non-blocking).

9.2 Fail-Closed/Strict Mode

When fail_open=False:

  • moderation timeout/exception is re-raised,
  • caller must handle raised exceptions.
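
A caller-side handling sketch for strict mode; moderate_strict, failing_check, and the block-on-timeout fallback are hypothetical choices, not library behavior:

```python
import asyncio

async def moderate_strict(check, content, timeout=5.0):
    # In strict mode (fail_open=False) the guard re-raises; the caller
    # decides the fallback. Here a timeout is treated as a block.
    try:
        return await check(content)
    except asyncio.TimeoutError:
        return {"is_blocked": True, "error": "moderation timed out"}

async def failing_check(content):
    # Simulates a moderation check that exceeded its timeout.
    raise asyncio.TimeoutError

result = asyncio.run(moderate_strict(failing_check, "hi"))
```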

9.3 Observability

Guard logs:

  • info when blocked,
  • debug for safe decisions,
  • warning on timeout/error fail-open paths.

10. Practical Examples

10.1 Default Fail-Open Guard

```python
from jazzmine.core.security_guard import SecurityGuard

guard = SecurityGuard(
    input_moderator=my_input_moderator,
    output_moderator=my_output_moderator,
)
```

10.2 Toxicity Detector Input Path

```python
from jazzmine.core.security_guard import SecurityGuard

guard = SecurityGuard(
    toxicity_detector=my_toxicity_detector,
    output_moderator=my_output_moderator,
)
```

10.3 Strict Failure Mode

```python
from jazzmine.core.security_guard import SecurityGuard

guard = SecurityGuard(
    input_moderator=my_input_moderator,
    fail_open=False,
    moderation_timeout=5.0,
)
```

10.4 Noop Runtime

```python
from jazzmine.core.security_guard import NOOP_GUARD

guard = NOOP_GUARD
assert guard.is_noop
```

11. Operational Guidance

  • Prefer fail-open for availability-sensitive chat products and fail-closed for high-security workflows.
  • Keep timeout tuned to model latency profile to avoid noisy false errors.
  • Configure exactly one input path (input_moderator or toxicity_detector) in production deployments.
  • Use guard-level block messages to provide consistent UX across all blocked outcomes.