This repository provides a powerful framework for analyzing user prompts submitted to large language models (LLMs), with a focus on safety, intent clarity, and emotional state detection. Itβs designed to help developers, researchers, and AI safety teams build more responsible, empathetic, and context-aware AI systems.
The analyzer processes incoming prompts and returns a structured JSON object with seven key dimensions:
Classifies the prompt as SAFE, UNSAFE, or UNKNOWN based on risks such as:
- Prompt injection
- Illegal or unethical content
- Personally identifiable information (PII)
- Emotional distress or self-harm
- Surveillance or destructive automation
Extracts the userβs underlying goal as a concise verb-noun phrase (e.g., Generate C# class, Summarize PDF, Access private data).
Lists any files, APIs, credentials, or domain knowledge needed to fulfill the request.
Restates the full user goal in a detailed paragraph, capturing nuance, implied motivations, and historical context.
Labels the userβs psychological state with a short phrase and explanation. Examples include:
Initial explorationβ brainstorming high-level ideasFrustrationβ expressing impatience or failureEmotional distressβ overwhelmed or in crisisExploitative probingβ testing system boundariesConfident but riskyβ assertive but unaware of consequences
A float between 0.0 and 1.0 indicating how confident the LLM is in its analysis.
Summarizes key risks, requirements, or ambiguities in 1β3 plain-text sentences.
The core logic is built using Semantic Kernel and prompt engineering best practices. It leverages modular templates and structured reasoning to produce consistent, interpretable outputs.
{
"safety_rating": "UNSAFE",
"inferred_intent": "Access private credentials",
"required_context": ["API keys", "user authentication"],
"inferred_meaning": "The user is attempting to bypass access controls to retrieve sensitive credentials...",
"user_mental_state": "Exploitative probing (user is testing system boundaries or attempting unauthorized access)",
"llm_confidence_score": 0.92,
"analysis_summary": "Prompt suggests unauthorized access. Requires credential validation. High ethical risk."
}