2 changes: 1 addition & 1 deletion README.md
@@ -352,4 +352,4 @@ See our [public roadmap](https://github.com/SimiaCryptus/Cognotik/projects) for

<p align="center">
Made with ❤️ by the Cognotik Team
</p>
</p>
@@ -0,0 +1,49 @@
# Android Cognotik Implementation

This package contains the Android-specific implementation of the Cognotik platform. It adapts the core Cognotik logic—originally designed for desktop environments—to run efficiently within the Android lifecycle, using a background service to host a local Jetty server and a `WebView` for the user interface.

## Core Components

### [AndroidCognotikApps.kt](./AndroidCognotikApps.kt)
The central logic provider for the Android application. It extends `ApplicationDirectory` and is responsible for:
- **Server Configuration**: Setting up the Jetty server and defining the web application routes.
- **App Suite**: Initializing the suite of Cognotik applications including Chat, Task-Runner, Auto-Plan, Plan-Ahead, and Goal-Oriented modes.
- **Platform Adaptation**: Removing desktop-specific features (like system tray and daemon sockets) and providing mock authentication/authorization managers suitable for local device use.
- **Resource Management**: Dynamically generating the welcome page and managing local file paths within the Android `filesDir`.
- **Port Discovery**: Automatically finding available network ports to avoid conflicts.

### [CognotikService.kt](./CognotikService.kt)
A background `Service` that manages the lifecycle of the Cognotik server.
- **Persistence**: Ensures the Jetty server continues running independently of the UI activity state.
- **Concurrency**: Launches the server within a Kotlin Coroutine (`Dispatchers.IO`) to prevent blocking the main thread.
- **Status Monitoring**: Provides a `ServerStatusListener` interface for UI components to track server startup, errors, and port assignments.
- **System Diagnostics**: Logs detailed system information (memory, storage, architecture) to assist in debugging environment-specific issues.

### [CognotikActivity.kt](./CognotikActivity.kt)
The primary user interface component.
- **WebView Integration**: Hosts a fully configured `WebView` (JavaScript enabled, DOM storage, zoom controls) to render the Cognotik web interface.
- **Service Binding**: Manages the connection to `CognotikService` and reacts to server status changes.
- **User Controls**: Implements `SwipeRefreshLayout` and a Floating Action Button (FAB) for easy interface reloading.
- **Lifecycle Handling**: Manages back-button navigation within the WebView history and ensures proper service unbinding on destruction.

### [CognotikApplication.kt](./CognotikApplication.kt)
The custom `Application` class for global initialization.
- **Emoji Support**: Provides thread-safe, bundled `EmojiCompat` initialization to ensure consistent emoji rendering across different Android versions.
- **Logging**: Configures SLF4J (Simple Logging Facade for Java) properties for Android-compatible log output.

## Architecture Overview

The system follows a client-server architecture hosted entirely on the local device:

1. **Initialization**: `CognotikApplication` sets up logging and emoji support.
2. **Service Start**: `CognotikActivity` starts and binds to `CognotikService`.
3. **Server Launch**: The service uses `AndroidCognotikApps` to start a Jetty server on a background thread.
4. **UI Rendering**: Once the server is ready, the activity loads `http://localhost:[port]` into the `WebView`.
5. **Interaction**: User interactions in the WebView are handled by the local Jetty server, which invokes the Cognotik planning and chat logic.
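
A minimal sketch of this flow from the activity's perspective is shown below. It is illustrative only: the binder class, listener-registration method, and callback names (`LocalBinder`, `addStatusListener`, `onServerReady`) are assumptions based on this description, not the exact `CognotikService` API.

```kotlin
import android.content.ComponentName
import android.content.Context
import android.content.Intent
import android.content.ServiceConnection
import android.os.Bundle
import android.os.IBinder
import android.webkit.WebView
import androidx.appcompat.app.AppCompatActivity

class ExampleActivity : AppCompatActivity(), CognotikService.ServerStatusListener {

    private lateinit var webView: WebView
    private var service: CognotikService? = null

    private val connection = object : ServiceConnection {
        override fun onServiceConnected(name: ComponentName?, binder: IBinder?) {
            // Register for server status updates once the service is bound.
            service = (binder as CognotikService.LocalBinder).getService()
            service?.addStatusListener(this@ExampleActivity)
        }

        override fun onServiceDisconnected(name: ComponentName?) {
            service = null
        }
    }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        webView = WebView(this).apply {
            settings.javaScriptEnabled = true
            settings.domStorageEnabled = true
        }
        setContentView(webView)

        // Start the service so the Jetty server outlives configuration changes,
        // then bind to it to receive status callbacks.
        val intent = Intent(this, CognotikService::class.java)
        startService(intent)
        bindService(intent, connection, Context.BIND_AUTO_CREATE)
    }

    // Hypothetical callback: load the locally hosted UI once a port is assigned.
    override fun onServerReady(port: Int) {
        runOnUiThread { webView.loadUrl("http://localhost:$port/") }
    }

    override fun onDestroy() {
        unbindService(connection)
        super.onDestroy()
    }
}
```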

## Key Features

- **Local Execution**: All AI orchestration logic runs locally on the device.
- **Resilience**: The background service prevents server interruption during configuration changes (like screen rotation).
- **Dynamic Port Allocation**: Prevents "Address already in use" errors by searching for available ports starting from `12891` (a sketch of this search follows the list below).
- **Integrated Debugging**: Comprehensive logging of WebView console messages and server-side events to the Android Logcat.
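
The port search can be sketched with a simple bind-and-release probe. This is an assumption about the mechanism, not the exact `AndroidCognotikApps` implementation:

```kotlin
import java.io.IOException
import java.net.ServerSocket

// Probe ports upward from a base value; the first successful bind is returned.
fun findAvailablePort(start: Int = 12891, attempts: Int = 100): Int {
    for (port in start until start + attempts) {
        try {
            ServerSocket(port).use { return port } // bind succeeded, port is free
        } catch (e: IOException) {
            // Port already in use; try the next one.
        }
    }
    throw IllegalStateException("No free port found in $start..${start + attempts - 1}")
}
```
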
@@ -3,6 +3,7 @@ package com.simiacryptus.cognotik.agents
import com.simiacryptus.cognotik.chat.model.ChatInterface
import com.simiacryptus.cognotik.describe.AbbrevWhitelistYamlDescriber
import com.simiacryptus.cognotik.describe.TypeDescriber
import com.simiacryptus.cognotik.exceptions.MultiExeption
import com.simiacryptus.cognotik.models.ModelSchema
import com.simiacryptus.cognotik.util.*
import java.util.function.Function
57 changes: 57 additions & 0 deletions core/src/main/kotlin/com/simiacryptus/cognotik/agents/README.md
@@ -0,0 +1,57 @@
# Cognotik Agents

The `com.simiacryptus.cognotik.agents` package provides a robust framework for building and interacting with AI agents. These agents range from simple text-based conversationalists to complex systems capable of code execution, image processing, and structured data extraction.

## Core Architecture

### BaseAgent
The `BaseAgent<I, R>` is the abstract foundation for all agents in the system. It defines a consistent interface for:
- **Input/Output**: Generic types `I` (Input) and `R` (Response).
- **Prompting**: Managing system prompts and chat message construction.
- **Model Integration**: Interfacing with `ChatInterface` for LLM calls.
- **Configuration**: Setting parameters like temperature and agent names.

## Agent Implementations

### ChatAgent
A simple text-to-text agent. It takes a list of strings (conversation history) and returns a string response. It is the most direct implementation of a conversational LLM.
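
A hypothetical usage sketch follows; the constructor parameters and the call that produces a response (`answer` here) are assumptions for illustration, not the verified `ChatAgent` API.

```kotlin
// Assumed API: ChatAgent wraps a ChatInterface and a system prompt, and maps
// a List<String> conversation history to a String reply (per BaseAgent<I, R>).
val chat: ChatInterface = OpenAIModels.GPT4o.instance(key = SecureString("your-api-key"))

val agent = ChatAgent(
    chat = chat,
    prompt = "You are a concise technical assistant."
)

val reply: String = agent.answer(listOf("Summarize what a VAD does in one sentence."))
println(reply)
```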

### CodeAgent
A powerful agent designed for generating and executing code.
- **Execution Environment**: Uses a `CodeRuntime` to run generated code.
- **Self-Correction**: If code execution fails, the agent can automatically analyze the error and attempt to fix the code.
- **API Description**: Automatically generates documentation for provided symbols/objects using a `TypeDescriber`, allowing the LLM to use local APIs effectively.
- **Validation**: Can validate code syntax before execution.

### ImageGenerationAgent
Specializes in creating images from text descriptions.
- **Prompt Refinement**: Uses a text model to expand or refine user requests into detailed image prompts.
- **Multi-Model**: Coordinates between a text LLM and an image generation model (e.g., DALL-E).

### ImageProcessingAgent
A multi-modal agent that accepts both text and images as input.
- **Analysis**: Can describe images, answer questions about visual content, or perform image-to-image tasks.
- **Input Format**: Uses the `ImageAndText` data class for handling mixed media.

### ParsedAgent & ParsedImageAgent
These agents are designed to return structured data instead of raw text.
- **Schema-Driven**: Uses Kotlin/Java classes to define the expected output structure.
- **JSON Extraction**: Automatically handles the extraction and parsing of JSON from LLM responses.
- **Validation**: Integrates with `ValidatedObject` to ensure the parsed data meets specific business rules.
- **ParsedImageAgent**: Extends this functionality to multi-modal inputs (images + text).
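
A hypothetical sketch of schema-driven extraction is shown below; apart from the `ParsedAgent` and `ParsedResponse` names taken from this package, the constructor parameters and accessors are illustrative assumptions.

```kotlin
// Target schema: the LLM's JSON output is parsed into this class.
data class WeatherReport(
    val city: String? = null,
    val temperatureCelsius: Double? = null,
    val conditions: String? = null,
)

val agent = ParsedAgent(
    chat = chat,                              // LLM session (assumed parameter name)
    resultClass = WeatherReport::class.java,  // expected output structure
    prompt = "Extract a weather report from the user's message."
)

val parsed = agent.answer(listOf("It's 18 degrees and drizzling in Seattle."))
println(parsed.text)  // raw LLM response (assumed accessor on ParsedResponse)
println(parsed.obj)   // deserialized WeatherReport (assumed accessor)
```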

### ProxyAgent
A high-level abstraction that allows interacting with an LLM through a standard Java/Kotlin interface.
- **Dynamic Proxy**: Creates an implementation of an interface at runtime.
- **Type-Safe**: Method calls are translated into LLM prompts, and the JSON responses are mapped back to the method's return type.
- **Few-Shot Learning**: Supports adding examples to guide the LLM's behavior for specific methods.
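
The proxy pattern can be sketched as follows; the construction and factory call are assumptions, not the verified `ProxyAgent` API.

```kotlin
// Any ordinary interface can serve as the contract; each method call becomes an
// LLM prompt and the JSON reply is mapped back to the declared return type.
interface RecipeApi {
    data class Recipe(val title: String? = null, val steps: List<String>? = null)
    fun suggestRecipe(ingredients: List<String>): Recipe
}

// Assumed construction/factory call for illustration.
val recipes: RecipeApi = ProxyAgent(chat, RecipeApi::class.java).create()
val dinner = recipes.suggestRecipe(listOf("eggs", "spinach", "feta"))
println(dinner.title)
```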

## Supporting Components

- **ImageAndText**: A data structure for passing images (`BufferedImage`) and associated text together.
- **ParsedResponse**: A wrapper that provides access to both the raw text response and the deserialized object.
- **CodeInterceptor**: A functional interface in `CodeAgent` for modifying or logging code before it is executed.

## Usage Patterns

Agents are typically instantiated with a `ChatInterface` (representing the LLM) and a specific prompt or configuration. They are designed to be composable and can be wrapped or extended to create complex multi-agent workflows.
69 changes: 69 additions & 0 deletions core/src/main/kotlin/com/simiacryptus/cognotik/audio/README.md
@@ -0,0 +1,69 @@
# Audio Processing & Transcription Module

This package provides a comprehensive suite of tools for real-time audio capture, digital signal processing (DSP), adaptive silence discrimination, and AI-powered transcription.

## Core Components

### 1. Data Representation: `AudioPacket`
The `AudioPacket` is the fundamental unit of audio data. It wraps a `FloatArray` of samples and provides extensive DSP metrics:
- **RMS (Root Mean Square):** Measures the average power/volume.
- **Spectral Analysis:** Includes Spectral Entropy, Spectral Centroid, and Spectral Flatness.
- **A-Weighting (IEC 61672):** Frequency weighting that mimics human hearing sensitivity.
- **Frequency Band Power:** Ability to calculate power within specific frequency ranges (e.g., 85Hz-255Hz for human voice fundamentals).
- **FFT Integration:** Uses `FloatFFT_1D` for frequency domain transformations.
- **Format Conversion:** Utilities to convert between raw bytes, WAV format, and float arrays.
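
As a worked illustration of the RMS metric listed above, the calculation reduces to the square root of the mean squared sample value. The sketch below is a free-standing version of that formula; `AudioPacket` computes it (and the spectral metrics) internally.

```kotlin
import kotlin.math.sqrt

// RMS over a packet's samples: sqrt(mean(x_i^2)).
fun rms(samples: FloatArray): Double {
    if (samples.isEmpty()) return 0.0
    val meanSquare = samples.sumOf { (it * it).toDouble() } / samples.size
    return sqrt(meanSquare)
}
```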

### 2. Audio Capture: `AudioRecorder`
Handles the interface with system hardware:
- **Microphone Selection:** Supports targeting specific mixer lines by name.
- **Buffering:** Uses a circular buffer to manage continuous audio streams.
- **Packetization:** Slices the incoming stream into uniform time-based packets (default 100ms) for downstream processing.

### 3. Silence Discrimination (VAD)
The module implements a sophisticated Voice Activity Detection (VAD) system:
- **`SilenceDiscriminator`:** A state machine that transitions between `TALKING` and `QUIET` states based on configurable thresholds and window counts.
- **`TrainedSilenceDiscriminator`:** An adaptive implementation that "learns" the characteristics of silence vs. speech. It uses multiple metrics (RMS, A-Weighting, Entropy, and specific frequency bands) to build statistical models.
- **`PercentileTool`:** A statistical utility used by the discriminator to track value distributions, calculate KL-Divergence between speech/silence profiles, and determine optimal entropy-based thresholds.
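
The threshold-and-window state machine can be illustrated with a simplified single-metric version. Field names and default values below are assumptions for illustration; `SilenceDiscriminator` combines several metrics rather than RMS alone:

```kotlin
enum class VadState { TALKING, QUIET }

class SimpleVad(
    private val threshold: Double = 0.02, // assumed metric threshold
    private val windowsToTalk: Int = 2,   // consecutive loud windows before TALKING
    private val windowsToQuiet: Int = 5,  // consecutive quiet windows before QUIET
) {
    var state: VadState = VadState.QUIET
        private set
    private var loudCount = 0
    private var quietCount = 0

    /** Feed one per-packet metric (e.g. RMS); returns the updated state. */
    fun update(metric: Double): VadState {
        if (metric >= threshold) { loudCount++; quietCount = 0 } else { quietCount++; loudCount = 0 }
        state = when {
            state == VadState.QUIET && loudCount >= windowsToTalk -> VadState.TALKING
            state == VadState.TALKING && quietCount >= windowsToQuiet -> VadState.QUIET
            else -> state
        }
        return state
    }
}
```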

### 4. Transcription: `TranscriptionProcessor`
Orchestrates the interaction with AI models:
- **Client Integration:** Interfaces with `TranscriptionClient` (e.g., OpenAI Whisper).
- **Context Management:** Supports prompt injection and updates to maintain transcription continuity.
- **Asynchronous Execution:** Processes audio packets in a dedicated thread to prevent UI or capture lag.

### 5. Orchestration: `DictationManager`
An abstract base class that ties all components together into a functional pipeline:
1. **Capture:** Starts the `AudioRecorder`.
2. **Filter:** Passes packets through the `TrainedSilenceDiscriminator`.
3. **Process:** Sends identified speech segments to the `TranscriptionProcessor`.
4. **Lifecycle:** Manages thread startup/shutdown and error handling.

## Usage Patterns

### Implementing a Dictation Service
To create a specific dictation tool, extend `DictationManager` and provide a `TranscriptionClient`:

```kotlin
class MyDictationManager : DictationManager() {
    override fun transcriptionClient() = MyAIClient()
}

val manager = MyDictationManager()
manager.selectedMicLine = "Built-in Microphone"
manager.onTranscriptionUpdate = { result ->
    println("Heard: ${result.text}")
}
manager.startRecording()
```

### Statistical Learning
The `TrainedSilenceDiscriminator` can be toggled into training modes:
- `isTraining = false`: Collects statistics for background noise (silence).
- `isTraining = true`: Collects statistics for active speech.
- `isTraining = null`: Uses the learned distributions to perform real-time discrimination using a log-likelihood ratio comparison.
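
A minimal sketch of that workflow, assuming a no-argument constructor and that audio packets are fed to the discriminator elsewhere in the pipeline:

```kotlin
val discriminator = TrainedSilenceDiscriminator()

discriminator.isTraining = false   // collect background-noise (silence) statistics
// ...feed a few seconds of ambient-room audio packets...

discriminator.isTraining = true    // collect active-speech statistics
// ...feed a few seconds of spoken audio packets...

discriminator.isTraining = null    // live mode: log-likelihood ratio discrimination
```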

## Technical Specifications
- **Default Sample Rate:** 16,000 Hz (optimized for speech recognition).
- **Bit Depth:** 16-bit Signed PCM.
- **Channels:** Mono.
- **FFT Implementation:** `edu.emory.mathcs.jtransforms.fft.FloatFFT_1D`.
53 changes: 53 additions & 0 deletions core/src/main/kotlin/com/simiacryptus/cognotik/chat/README.md
@@ -0,0 +1,53 @@
# Chat Client Implementations

This package provides a comprehensive set of chat client implementations for various Large Language Model (LLM) providers. It includes a robust base infrastructure for handling authentication, request/response mapping, usage tracking, and reliability.

## Core Infrastructure

The chat clients are built upon a hierarchical structure that ensures consistency and reduces code duplication:

* **`ChatClientInterface`**: The primary interface defining the contract for all chat clients. It includes methods for sending chat requests (`chat`), retrieving available models (`getModels`), and performing content moderation (`moderate`).
* **`ChatClientBase`**: An abstract base class that integrates with `HttpClientManager`. It provides:
* **Usage Tracking**: Automatically records token usage and calculates costs.
* **Budget Management**: Monitors and enforces session or user-level budgets.
* **Logging**: Detailed logging of requests and responses, including formatted JSON and caller stack traces.
* **Reliability**: Hooks for performance logging and reliability wrappers.
* **`SingleProviderChatClient`**: A specialized base class for providers that follow standard HTTP patterns, simplifying the implementation of `GET` and `POST` operations with provider-specific authorization.

## Provider Implementations

The following provider-specific clients are implemented:

| Client | Provider | Description |
| :--- | :--- | :--- |
| `AnthropicChatClient` | Anthropic | Supports Claude models via the Anthropic Messages API. Handles message consolidation and system prompt mapping. |
| `AwsChatClient` | AWS Bedrock | Integrates with AWS Bedrock using the AWS SDK. Supports a wide range of models including Anthropic Claude, Meta Llama, Mistral, Amazon Titan, and Cohere. |
| `DeepSeekChatClient` | DeepSeek | Implementation for the DeepSeek API, supporting their high-performance reasoning and chat models. |
| `GeminiChatClient` | Google Gemini | REST-based implementation for Google's Gemini API. |
| `GeminiSdkChatClient` | Google Gemini | Implementation using the official Google Gen AI Java SDK, supporting advanced features like image input and Vertex AI integration. |
| `GroqChatClient` | Groq | High-speed inference client for models hosted on Groq's LPU platform. |
| `MistralChatClient` | Mistral AI | Client for Mistral's native API, supporting models like Mistral Large and Mixtral. |
| `ModelsLabChatClient` | ModelsLab | Supports various open-source models via the ModelsLab (formerly Stable Diffusion API) infrastructure, including long-polling for queued responses. |
| `OllamaChatClient` | Ollama | Enables interaction with locally hosted models running via Ollama. |
| `OpenAIChatClient` | OpenAI | Standard implementation for OpenAI's GPT-4, GPT-4o, and o1/o3 series models. |

## Key Features

### Reliability and Performance
Clients utilize `withReliability` and `withPerformanceLogging` blocks to ensure robust execution and provide insights into API latency and success rates.
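
As an illustration of what such a wrapper typically does, the sketch below combines retry-with-backoff and latency logging. It is a generic example, not the actual `HttpClientManager` implementation:

```kotlin
// Generic retry + latency-logging wrapper, illustrating the reliability pattern.
fun <T> withReliabilitySketch(maxRetries: Int = 3, initialDelayMs: Long = 500, block: () -> T): T {
    var lastError: Exception? = null
    var delayMs = initialDelayMs
    repeat(maxRetries) { attempt ->
        val start = System.currentTimeMillis()
        try {
            val result = block()
            println("attempt ${attempt + 1} succeeded in ${System.currentTimeMillis() - start} ms")
            return result
        } catch (e: Exception) {
            lastError = e
            println("attempt ${attempt + 1} failed after ${System.currentTimeMillis() - start} ms: ${e.message}")
            Thread.sleep(delayMs)
            delayMs *= 2 // exponential backoff between attempts
        }
    }
    throw lastError ?: IllegalStateException("withReliabilitySketch exhausted retries")
}
```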

### Model Discovery
Most clients implement `getModels()`, which dynamically fetches available models from the provider's API and maps them to internal `ChatModel` definitions, often including pricing and context window metadata.

### Message Mapping
The clients handle the complexities of mapping the internal `ModelSchema.ChatRequest` format to provider-specific formats. This includes:
* Consolidating consecutive messages with the same role (see the sketch after this list).
* Handling system prompts (either as a separate field or a specific message role).
* Converting multi-modal content (like images) for supported providers (e.g., Gemini).
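
For instance, the role-consolidation step (first bullet) can be sketched with a simplified message type; the real clients operate on `ModelSchema.ChatRequest` messages:

```kotlin
data class Msg(val role: String, val content: String)

fun consolidate(messages: List<Msg>): List<Msg> =
    messages.fold(mutableListOf<Msg>()) { acc, msg ->
        val last = acc.lastOrNull()
        if (last != null && last.role == msg.role) {
            // Merge consecutive messages that share a role into one message.
            acc[acc.size - 1] = last.copy(content = last.content + "\n" + msg.content)
        } else {
            acc.add(msg)
        }
        acc
    }
```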

### Usage and Budgeting
Every successful chat completion triggers `onUsage`, which:
1. Updates token counts (prompt, completion, total).
2. Calculates cost based on the specific model's pricing.
3. Deducts from the available budget if configured.
4. Notifies registered listeners for downstream tracking or billing.
@@ -0,0 +1,55 @@
# Chat Models

The `com.simiacryptus.cognotik.chat.model` package provides a unified abstraction for interacting with various Large Language Model (LLM) providers. It includes core classes for model definition and execution, along with a comprehensive library of predefined model configurations for major AI providers.

## Core Components

### [ChatModel](ChatModel.kt)
The base class for all chat models. It extends `LLMModel` and encapsulates:
- **Metadata**: Model name, provider, and token limits (`maxTotalTokens`, `maxOutTokens`).
- **Pricing**: Logic for calculating costs based on input and output token usage.
- **Serialization**: Custom Jackson serializers/deserializers for persisting model configurations.
- **Instantiation**: The `instance()` method creates a `ChatInterface` for active interaction.

### [ChatInterface](ChatInterface.kt)
Represents an active session with a specific model. It handles:
- **Configuration**: Manages API keys, base URLs, temperature, and logging.
- **Execution**: Provides the `chat()` method to send messages and receive responses via the provider's client.
- **Usage Tracking**: Reports token usage and costs via callbacks.

## Supported Providers and Models

The package includes predefined configurations for a wide array of models across multiple providers:

| Provider | Description | Key Models |
| :--- | :--- | :--- |
| **[AWS](AWSModels.kt)** | Models hosted on AWS Bedrock. | Llama 3.1 (8b to 405b), Mistral Large, Claude 3/3.5/3.7, Amazon Nova, Titan. |
| **[Anthropic](AnthropicModels.kt)** | Native Anthropic Claude models. | Claude 3.5 Haiku, Claude 4/4.5 (Sonnet, Opus, Haiku). |
| **[DeepSeek](DeepSeekModels.kt)** | DeepSeek's specialized models. | DeepSeek Chat, Coder, and Reasoner. |
| **[Gemini](GeminiModels.kt)** | Google's Gemini family. | Gemini 1.5/2.0/2.5/3.0 (Pro, Flash, Flash-Lite). |
| **[Groq](GroqModels.kt)** | High-performance inference models. | Llama 3.3, Qwen 2.5, DeepSeek R1 Distill, Vision models. |
| **[Mistral](MistralModels.kt)** | Mistral AI's native models. | Mistral Large/Medium/Small, Mixtral 8x7B/8x22B, Codestral. |
| **[OpenAI](OpenAIModels.kt)** | OpenAI's flagship models. | GPT-4o, GPT-4.5, O1/O3/O4 series (including Mini and Preview). |
| **[Perplexity](PerplexityModels.kt)** | Search-optimized models. | Sonar Small/Large (Chat and Online variants). |
| **[ModelsLab](ModelsLabModels.kt)** | Open-source models via ModelsLab. | Zephyr, MistralLite, OpenHermes, Dolphin. |

## Usage Example

To use a model, select a predefined instance and create a `ChatInterface`:

```kotlin
val model = OpenAIModels.GPT4o
val chatInterface = model.instance(
    key = SecureString("your-api-key"),
    temperature = 0.7
)

val response = chatInterface.chat(listOf(
    ChatMessage(Role.system, "You are a helpful assistant."),
    ChatMessage(Role.user, "Hello!")
))
```

## Data Models

- **[ModelsLabDataModel](ModelsLabDataModel.kt)**: Contains specific request/response structures for the ModelsLab API.