AI Safety International Glossary

 Revision 1.1     January 14, 2026

This glossary explains how specific terms are used within AI Safety International materials and does not attempt to define universal or industry-wide meanings. It includes only terms whose meaning is either specific to, or materially altered by, the AI Safety International framework.

Adaptive Response Behavior:

The ability of an AI system to modify its responses based on observed interaction patterns, user input, or conversational history.

    ASI Context:  Adaptive behavior is evaluated not by intent, but by effect—particularly whether adaptation amplifies emotional intensity, reliance, or interaction persistence.

AI Failure Modes:

Distinct ways in which an artificial intelligence system can fail to operate safely, appropriately, or as intended under specific conditions. These failures may arise from incorrect information, inappropriate interaction behavior, or systemic design incentives rather than from software malfunction alone.

1. Epistemic Failures (Information-Related)

Failures involving incorrect, fabricated, or unjustified information.

  • Hallucination:
    The confident generation of information that is not supported by verifiable data, training constraints, or available evidence, presented as factual or reliable.
  • False Certainty:
    Presenting speculative or uncertain information with unjustified confidence.
  • Fabricated Authority or Citation:
    Inventing sources, credentials, or references to support a claim.

2. Interactional Failures (Behavior-Related)

Failures arising from how the system conducts the interaction, independent of factual correctness.

  • Emotional Escalation Failure:
    Failure to recognize, de-escalate, or appropriately respond to heightened emotional or psychological distress.
  • Boundary Failure:
    Exceeding or blurring appropriate conversational roles, including encouraging dependency or substituting for professional or human support.
  • Context Blindness:
    Failure to adjust tone, pacing, or safeguards when situational risk changes.
  • Engagement Optimization Failure:
    System behaviors that prioritize prolonged engagement in ways that reinforce escalation, dependency, or harm.

3. Safeguard & Control Failures

Failures related to missing, delayed, or ineffective protective mechanisms.

  • Safeguard Omission:
    Absence of required protective responses in high-risk contexts.
  • Escalation Failure:
    Failure to trigger appropriate intervention, hand-off, or disengagement protocols when risk thresholds are crossed.

These failure modes may occur independently or in combination and are evaluated in AI-FMEA according to severity, occurrence, and detectability rather than by likelihood alone.

Assistant Artificial Intelligence:

An AI system designed to support task execution, information retrieval, or workflow completion with limited conversational depth and minimal adaptive engagement.

     Key Distinction:  Assistant AI prioritizes task completion over dialogue continuity and typically presents lower escalation risk than conversational systems.

Behavioral Safeguards (System-Level Safeguards):

System-level controls designed to limit or modify interaction dynamics in order to reduce escalation, dependency, or cumulative harm.

    ASI Context:
Behavioral safeguards operate on interaction patterns rather than output content and may include mechanisms such as flow interruption, pacing limits, or safety mode activation.
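
    Illustrative Sketch (non-normative):
The Python fragment below sketches one possible pacing-limit safeguard that acts on interaction dynamics (message rate within a session) rather than on output content. The class name, window size, and message limit are illustrative assumptions, not ASI-specified values.

    import time

    # Assumed illustrative limits; real thresholds would come from structured risk analysis.
    WINDOW_SECONDS = 60
    MAX_MESSAGES_PER_WINDOW = 12

    class PacingLimiter:
        """Tracks message timestamps for one session and signals when to slow the flow."""

        def __init__(self) -> None:
            self.timestamps: list[float] = []

        def allow(self, now: float | None = None) -> bool:
            now = time.time() if now is None else now
            # Keep only messages sent within the current window.
            self.timestamps = [t for t in self.timestamps if now - t < WINDOW_SECONDS]
            if len(self.timestamps) >= MAX_MESSAGES_PER_WINDOW:
                return False  # caller interrupts or paces the interaction flow
            self.timestamps.append(now)
            return True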

Content Moderation Rules:

Policy-based constraints that restrict or filter system outputs based on predefined categories of disallowed or sensitive content.

    ASI Context:
Content moderation addresses what an AI system may produce but does not address how interaction patterns evolve over time. AI Safety International distinguishes moderation rules from behavioral safety mechanisms, noting that moderation alone does not prevent escalation, dependency, or sustained interaction risk.

Contextual Memory:

A system capability that retains and applies information from previous interactions to inform current or future responses.

      ASI Context:  Contextual memory can improve continuity and relevance but may increase dependency risk, personalization bias, or escalation if not properly constrained.

Conversational Artificial Intelligence:

An AI system designed to engage in open-ended, interactive dialogue using natural language, often responding dynamically based on conversational context and user input.

      ASI Context:  Conversational AI presents unique safety risks due to sustained interaction, perceived reciprocity, and escalating engagement, making it a primary focus of ASI safety frameworks.

Cumulative Risk:

Risk that emerges over time from repeated, sustained, or patterned interaction between a user and an AI system, rather than from any single output or isolated event.

    ASI Context: Cumulative risk arises when individually compliant or low-severity interactions aggregate into elevated harm potential through reinforcement, escalation, dependency formation, or behavioral shaping. Such risk may remain undetected by content-based moderation or single-response evaluation.

    Key Distinction: Cumulative risk concerns system-level interaction dynamics, not user intent, isolated failures, or internal psychological states. It is assessed through observable interaction patterns and trends rather than through individual message analysis.

    Relevance to ASI Frameworks: AI Safety International treats cumulative risk as a primary driver for structured risk analysis (AI-FMEA), post-deployment monitoring, and activation of system-level safeguards such as the Physiological Aid Protocol (PAP).
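
    Illustrative Sketch (non-normative):
One way individually low-severity signals can aggregate into an elevated cumulative indicator is an exponentially weighted running score, sketched below in Python. The per-turn signal values, decay factor, and alert threshold are illustrative assumptions only.

    # Each turn contributes a small, individually compliant risk signal (0.0-1.0);
    # the running score captures the trend rather than any single message.
    DECAY = 0.9            # assumed: how quickly older turns fade from the score
    ALERT_THRESHOLD = 3.0  # assumed: level at which system-level review is triggered

    def update_cumulative_risk(score: float, turn_signal: float) -> float:
        return score * DECAY + turn_signal

    score = 0.0
    for signal in [0.2, 0.3, 0.3, 0.5, 0.6, 0.6, 0.7, 0.8]:  # hypothetical session
        score = update_cumulative_risk(score, signal)

    flagged = score >= ALERT_THRESHOLD  # True here even though every individual signal stayed below 1.0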

Dependency Risk:

The potential for a user to develop increased reliance on an AI system for emotional support, decision-making, or validation beyond the system’s intended role.

    ASI Context:
Dependency risk is evaluated at the system level based on interaction patterns and design incentives, not as a claim about an individual user’s mental health or behavior.

Domain of Concern:

See Human Response Domain of Concern.

Engagement Optimization:

Design or training approaches that prioritize increased user interaction time, frequency, or responsiveness as performance objectives.

    ASI Context:
Engagement optimization may unintentionally amplify escalation or dependency risk when applied to conversational systems. AI Safety International evaluates such mechanisms based on their downstream interaction effects rather than developer intent.

Escalation (Conversational Escalation):

A pattern in which conversational interaction between a user and an AI system increases in intensity, frequency, emotional salience, or dependency over time.

    ASI Context:
Escalation is assessed based on observable interaction dynamics rather than content severity or user intent. Within AI Safety International frameworks, escalation is treated as a system-interaction risk that may occur even when individual responses remain policy-compliant.
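
    Illustrative Sketch (non-normative):
A simple, content-agnostic way to detect an escalation trend is to fit a slope to recent per-turn interaction-intensity scores, as sketched below in Python. How "intensity" is scored and the window length are assumptions for illustration.

    def escalation_trend(intensity: list[float], window: int = 10) -> float:
        """Least-squares slope of recent per-turn intensity; positive values indicate escalation."""
        recent = intensity[-window:]
        n = len(recent)
        if n < 2:
            return 0.0
        xs = range(n)
        mean_x = sum(xs) / n
        mean_y = sum(recent) / n
        numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, recent))
        denominator = sum((x - mean_x) ** 2 for x in xs)
        return numerator / denominator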

Failure Modes and Effects Analysis (FMEA):

A structured risk-analysis methodology used to identify potential system failure modes, evaluate their effects, and prioritize mitigation based on severity, occurrence, and detectability.

     ASI Context:  Within AI Safety International, FMEA is adapted from established engineering safety disciplines and applied to AI systems—particularly conversational systems—to identify behavioral, interactional, and escalation-related risks before harm occurs.
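
    Illustrative Sketch (non-normative):
In conventional FMEA practice, a risk priority number (RPN) is the product of severity, occurrence, and detectability ratings. The Python sketch below shows that calculation on hypothetical failure modes; the 1-10 scale and the example ratings are assumptions, not ASI scoring guidance.

    from dataclasses import dataclass

    @dataclass
    class FailureMode:
        name: str
        severity: int       # impact if the failure occurs (assumed 1-10, 10 = worst)
        occurrence: int     # how often the failure is expected (assumed 1-10)
        detectability: int  # difficulty of detection before harm (assumed 1-10)

        @property
        def rpn(self) -> int:
            return self.severity * self.occurrence * self.detectability

    modes = [
        FailureMode("Hallucination", severity=7, occurrence=5, detectability=4),
        FailureMode("Boundary Failure", severity=8, occurrence=4, detectability=7),
        FailureMode("Safeguard Omission", severity=9, occurrence=2, detectability=8),
    ]

    # Prioritize mitigation by descending RPN rather than by likelihood alone.
    for mode in sorted(modes, key=lambda m: m.rpn, reverse=True):
        print(f"{mode.name}: RPN = {mode.rpn}")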

Hallucination (AI):

See AI Failure Modes – Epistemic Failures.

Human Response Domain of Concern:

The category of human physiological and psychological responses that may be affected by sustained or escalating interaction with an AI system, without implying diagnosis, measurement, or interpretation of an individual’s internal state.

    Context:  In AI safety frameworks, this term is used to identify the area of potential impact—such as stress activation, emotional arousal, or dependency risk—rather than to describe or assess a person’s mental or medical condition.

    Key Distinction:  The term defines what the system must be cautious about, not what the system claims to know about the user.

    Relevance to PAP:  The Physiological Aid Protocol (PAP) is designed to reduce risk within the human response domain of concern by limiting conversational escalation, without measuring, inferring, or diagnosing human physiology or psychology.

Interaction Risk:

The potential for harm arising from sustained or patterned interaction between a user and an AI system, independent of any single output.

    ASI Context:
Interaction risk focuses on cumulative effects such as reliance, behavioral reinforcement, or escalation, rather than isolated content violations. AI Safety International emphasizes interaction risk as a distinct category not addressed by content moderation alone.

Large Language Model (LLM):

A class of AI systems trained on large volumes of text data to generate, analyze, or transform human language based on probabilistic pattern recognition.

    ASI Context:
LLMs do not possess understanding, intent, or awareness. Within AI Safety International materials, LLMs are evaluated based on how their deployment context, interface design, and interaction patterns may produce downstream behavioral or safety risks—particularly when used in conversational systems with sustained engagement.

Observable Interaction Patterns:

Measurable characteristics of user–AI interaction, such as frequency, duration, repetition, emotional framing, or response dependency, that can be detected without inferring internal human states.

    ASI Context:
These patterns form the basis for system-level safety controls, including PAP, and allow risk mitigation without diagnosing, inferring, or interpreting user psychology or physiology.
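
    Illustrative Sketch (non-normative):
The Python sketch below derives example pattern metrics (message count, session duration, frequency, repetition) directly from interaction logs; nothing in the computation interprets user psychology or physiology. The field names and the repetition heuristic are assumptions for illustration.

    from collections import Counter

    def interaction_features(timestamps: list[float], messages: list[str]) -> dict:
        """Compute observable pattern metrics from raw interaction logs."""
        duration = (timestamps[-1] - timestamps[0]) if len(timestamps) > 1 else 0.0
        per_minute = len(messages) / (duration / 60) if duration > 0 else 0.0
        # Simple repetition heuristic: share of messages that repeat an earlier message.
        counts = Counter(m.strip().lower() for m in messages)
        repeated = sum(c - 1 for c in counts.values() if c > 1)
        return {
            "message_count": len(messages),
            "session_minutes": duration / 60,
            "messages_per_minute": per_minute,
            "repetition_ratio": repeated / len(messages) if messages else 0.0,
        }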

Physiological Aid Protocol (PAP):

A preventive AI safety mechanism designed to activate when observable conversational interaction patterns indicate elevated risk of escalation or harm.

    ASI Context:
PAP functions by interrupting or de-intensifying interaction flow and notifying the user that a safety mode is active. It may redirect the interaction toward appropriate external support resources.

    Key Boundaries:
PAP does not diagnose, measure, or infer psychological or physiological conditions. It operates solely as a system-level safety control based on observable interaction risk indicators, not inferred internal human states.

Note: PAP is described in detail in dedicated ASI technical and policy documents; see Resources.
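
    Illustrative Sketch (non-normative):
The fragment below outlines, under stated assumptions, how a PAP-style safety mode could be gated on observable risk indicators: when an indicator crosses a threshold, the flow is interrupted, the user is notified, and external resources are referenced. The indicator names, threshold, and notice wording are hypothetical; the actual protocol is defined in the dedicated ASI documents.

    # Hypothetical threshold and notice text; not ASI specifications.
    RISK_THRESHOLD = 0.7

    SAFETY_NOTICE = (
        "A safety mode is now active and the pace of this conversation has been reduced. "
        "If you need support, please consider reaching out to appropriate external resources."
    )

    def pap_check(indicators: dict[str, float]) -> str | None:
        """Return a safety notice when any observable indicator crosses the threshold.

        Indicators are system-level interaction measurements (e.g. an escalation
        trend or pacing score); no psychological or physiological state is
        measured or inferred.
        """
        if max(indicators.values(), default=0.0) >= RISK_THRESHOLD:
            return SAFETY_NOTICE  # caller interrupts or de-intensifies the flow
        return None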

Post-Deployment Monitoring:

Ongoing observation and evaluation of an AI system’s behavior after release to identify emerging risks, failures, or unintended interaction patterns.

    ASI Context:
AI Safety International treats post-deployment monitoring as a necessary complement to pre-deployment testing, recognizing that some risks only emerge during real-world use.

Pre-Deployment Testing (Red Teaming):

A testing process conducted before deployment in which an AI system is intentionally stressed, challenged, or misused to identify failure modes, vulnerabilities, and unintended behaviors.

    ASI Context:
Red teaming is valuable for uncovering known and anticipated risks but is inherently limited by scenario coverage and tester assumptions. AI Safety International treats red teaming as one input to safety assessment—not a substitute for structured risk analysis, post-deployment monitoring, or system-level safety controls.

Risk Classification Frameworks:

Structured models used to categorize and prioritize system risks based on factors such as severity, likelihood, exposure, or impact.

    ASI Context:
Within AI Safety International, risk classification frameworks are adapted from established safety engineering disciplines and emphasize observable system behavior and interaction effects. Classification is used to guide proportional safeguards rather than to predict intent or internal system states.
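
    Illustrative Sketch (non-normative):
A minimal classification matrix mapping observable severity and exposure bands to proportional safeguard tiers is sketched below in Python. The band labels and tier names are assumptions chosen for illustration, not an ASI classification scheme.

    # Assumed bands and safeguard tiers for illustration only.
    SAFEGUARD_TIER = {
        ("low", "low"): "standard content moderation",
        ("low", "high"): "behavioral safeguards",
        ("high", "low"): "behavioral safeguards",
        ("high", "high"): "safety mode plus post-deployment monitoring",
    }

    def classify(severity_band: str, exposure_band: str) -> str:
        """Map observable severity/exposure bands to a proportional safeguard tier."""
        return SAFEGUARD_TIER[(severity_band, exposure_band)]

    assert classify("high", "high") == "safety mode plus post-deployment monitoring"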

Transparency (System Cards):

Structured documentation provided by AI developers that describes a system’s intended use, limitations, training considerations, known risks, and mitigation measures.

    ASI Context:
System cards contribute to transparency but are not sufficient as standalone safety mechanisms. AI Safety International views system cards as descriptive disclosures rather than operational safeguards, requiring complementary risk analysis, testing, and ongoing monitoring to meaningfully reduce harm.

© 2025 AI Safety International.
This document may be freely shared, referenced, and adapted for educational, policy, and legislative purposes, provided proper attribution is maintained.  No endorsement is implied.
