
[Feature] Advanced Token Usage and Management System #289

@Avtrkrb

Description

Implement advanced token usage and management capabilities to improve Nanocoder's efficiency and resource use. Nanocoder currently has a solid foundation of provider-specific tokenization and basic context monitoring, but it lacks intelligent context management, detailed usage tracking, and multi-model optimization. To match industry-leading agentic CLI coding tools, the system needs tokenization support for more providers, intelligent context compression, comprehensive usage analytics, and context-aware caching.

The feature will implement:

  • Enhanced tokenization with support for more providers (Gemini, Mistral, Qwen)
  • Intelligent context compression with automatic summarization
  • Sliding window context management with pinning capabilities
  • Comprehensive usage tracking with session-based analytics
  • Context-aware token caching with intelligent invalidation
  • Multi-model tokenizer pooling for efficient resource management

Use Case

Current Problem:

  • Limited tokenization support for providers beyond OpenAI, Anthropic, and Llama
  • No automatic context pruning or compression when approaching limits
  • Basic token estimation without intelligent context management
  • Limited usage tracking and analytics capabilities
  • No multi-model tokenizer optimization

Target Scenarios:

  1. Enhanced Tokenization: Support for Gemini, Mistral, Qwen and other providers
  2. Intelligent Context Management: Automatic compression when approaching limits
  3. Sliding Window Context: Fixed-size context window with important message pinning
  4. Usage Analytics: Comprehensive session-based tracking and reporting
  5. Multi-model Optimization: Efficient tokenizer resource management

Proposed Solution

Phase 1: Enhanced Tokenization System (2-3 weeks)

  • Implement multi-provider tokenizer support (Gemini, Mistral, Qwen)
  • Create TokenizerPool for efficient multi-model resource management
  • Enhance fallback tokenizer with better estimation algorithms
  • Update tokenizer factory with new provider detection
  • Create EnhancedFallbackTokenizer with content-aware estimation

Phase 2: Intelligent Context Management (3-4 weeks)

  • Implement ContextCompressor with LLM-powered summarization
  • Add SlidingWindowContextManager for fixed-size context windows
  • Create context compression engine with automatic summarization
  • Add message pinning capabilities to preserve important context
  • Integrate with existing context monitoring system

Phase 3: Advanced Usage Tracking and Optimization (4-5 weeks)

  • Implement TokenUsageTracker for session-based analytics
  • Create ContextAwareTokenCache with intelligent invalidation
  • Add comprehensive usage reporting and analytics
  • Implement context-aware cache optimization
  • Add usage export and visualization capabilities

Technical Implementation

Core Components

// Enhanced tokenizer factory with more providers
export function createEnhancedTokenizer(
  providerName: string,
  modelId: string
): Tokenizer {
  const provider = detectEnhancedProvider(providerName, modelId);

  switch (provider) {
    case 'openai':
      return new OpenAITokenizer(modelId);
    case 'anthropic':
      return new AnthropicTokenizer(modelId);
    case 'llama':
      return new LlamaTokenizer(modelId);
    case 'gemini':
      return new GeminiTokenizer(modelId);
    case 'mistral':
      return new MistralTokenizer(modelId);
    case 'qwen':
      return new QwenTokenizer(modelId);
    case 'fallback':
    default:
      return new EnhancedFallbackTokenizer();
  }
}
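
The factory above relies on detectEnhancedProvider, which is not shown. A minimal sketch of what that helper could look like, assuming detection prefers the explicit provider name and falls back to model-ID patterns (the patterns below are illustrative assumptions, not the final detection rules):

// Hypothetical provider detection helper (assumed shape, not existing code).
type EnhancedProvider =
  | 'openai' | 'anthropic' | 'llama'
  | 'gemini' | 'mistral' | 'qwen' | 'fallback';

export function detectEnhancedProvider(
  providerName: string,
  modelId: string
): EnhancedProvider {
  const name = providerName.toLowerCase();
  const model = modelId.toLowerCase();

  // Prefer an explicit provider name when it matches a known provider.
  const known: EnhancedProvider[] = [
    'openai', 'anthropic', 'llama', 'gemini', 'mistral', 'qwen'
  ];
  const byName = known.find(provider => name.includes(provider));
  if (byName) return byName;

  // Otherwise fall back to common model-ID prefixes (illustrative only).
  if (/^(gpt-|o\d)/.test(model)) return 'openai';
  if (model.startsWith('claude')) return 'anthropic';
  if (model.includes('llama')) return 'llama';
  if (model.startsWith('gemini')) return 'gemini';
  if (model.includes('mistral') || model.includes('mixtral')) return 'mistral';
  if (model.startsWith('qwen')) return 'qwen';

  return 'fallback';
}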

// Enhanced fallback tokenizer with better estimation
class EnhancedFallbackTokenizer implements Tokenizer {
  countTokens(text: string): number {
    const charCount = text.length;

    // Adjust the character-based estimate using content characteristics
    const adjustmentFactor = this.calculateAdjustmentFactor(text);
    return Math.round((charCount / CHARS_PER_TOKEN_ESTIMATE) * adjustmentFactor);
  }

  private calculateAdjustmentFactor(text: string): number {
    // Analyze text characteristics for better estimation
    const codeRatio = this.estimateCodeRatio(text);
    const punctuationRatio = this.estimatePunctuationRatio(text);

    // Code-heavy text typically has higher token density
    if (codeRatio > 0.7) return 1.2;
    if (codeRatio > 0.4) return 1.1;

    // High punctuation might indicate more tokens
    if (punctuationRatio > 0.3) return 1.15;

    return 1.0;
  }

  // Rough share of lines that look like code (indentation, braces, keywords)
  private estimateCodeRatio(text: string): number {
    const lines = text.split('\n');
    const codeLines = lines.filter(line =>
      /^\s{2,}|[{};()]|=>|\bfunction\b|\bconst\b/.test(line)
    ).length;
    return lines.length === 0 ? 0 : codeLines / lines.length;
  }

  // Share of characters that are punctuation or symbols
  private estimatePunctuationRatio(text: string): number {
    if (text.length === 0) return 0;
    const punctuation = text.match(/[^\w\s]/g)?.length || 0;
    return punctuation / text.length;
  }
}
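
For example, a provider without a native tokenizer would route through this fallback path (hypothetical usage; the provider and model names are made up):

// Hypothetical usage: an unrecognized provider falls back to estimation.
const tokenizer = createEnhancedTokenizer('my-local-provider', 'custom-7b');
const estimate = tokenizer.countTokens(
  'function add(a: number, b: number) { return a + b; }'
);
// Code-heavy input triggers the higher adjustment factor, so the estimate skews upward.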

// Tokenizer pool for efficient multi-model support
export class TokenizerPool {
  private pool: Map<string, Tokenizer> = new Map();
  private usageCount: Map<string, number> = new Map();

  getTokenizer(provider: string, model: string): Tokenizer {
    const key = `${provider}:${model}`;

    if (this.pool.has(key)) {
      this.usageCount.set(key, (this.usageCount.get(key) || 0) + 1);
      return this.pool.get(key)!;
    }

    const tokenizer = createEnhancedTokenizer(provider, model);
    this.pool.set(key, tokenizer);
    this.usageCount.set(key, 1);

    return tokenizer;
  }

  releaseTokenizer(provider: string, model: string): void {
    const key = `${provider}:${model}`;
    const count = this.usageCount.get(key) || 0;

    // Decrement but keep the tokenizer warm for reuse;
    // cleanupUnused() frees instances once their count reaches zero.
    this.usageCount.set(key, Math.max(0, count - 1));
  }

  cleanupUnused(): void {
    for (const [key, count] of this.usageCount) {
      if (count === 0) {
        const tokenizer = this.pool.get(key);
        if (tokenizer?.free) {
          tokenizer.free();
        }
        this.pool.delete(key);
        this.usageCount.delete(key);
      }
    }
  }
}
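
A brief usage sketch for the pool (the call sites and the qwen model ID are hypothetical):

// Hypothetical usage: share one pool across model switches to reuse tokenizers.
const pool = new TokenizerPool();

const qwenTokenizer = pool.getTokenizer('qwen', 'qwen2.5-coder'); // created
const sameInstance = pool.getTokenizer('qwen', 'qwen2.5-coder');  // reused from the pool

pool.releaseTokenizer('qwen', 'qwen2.5-coder');
pool.releaseTokenizer('qwen', 'qwen2.5-coder');
pool.cleanupUnused(); // frees tokenizers whose usage count has dropped to zero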

// Context compressor with intelligent summarization
export class ContextCompressor {
  private summarizationModel: string;
  private compressionThreshold: number;

  constructor(options: {summarizationModel?: string; threshold?: number} = {}) {
    this.summarizationModel = options.summarizationModel || 'gpt-3.5-turbo';
    this.compressionThreshold = options.threshold || 0.8;
  }

  async compressContext(
    messages: Message[],
    currentTokenCount: number,
    contextLimit: number,
    tokenizer: Tokenizer
  ): Promise<Message[]> {
    const usageRatio = currentTokenCount / contextLimit;

    if (usageRatio < this.compressionThreshold) {
      return messages; // No compression needed
    }

    const compressibleMessages = this.findCompressibleMessages(messages);

    if (compressibleMessages.length === 0) {
      return messages;
    }

    const summary = await this.summarizeMessages(compressibleMessages);
    return this.replaceWithSummary(messages, compressibleMessages, summary);
  }

  private findCompressibleMessages(messages: Message[]): Message[] {
    const compressible: Message[] = [];

    for (let i = 0; i < messages.length; i++) {
      const message = messages[i];

      // Skip system messages and very recent messages
      if (message.role === 'system' || i >= messages.length - 3) {
        continue;
      }

      // Only compress user and assistant messages
      if (message.role === 'user' || message.role === 'assistant') {
        compressible.push(message);
      }
    }

    return compressible;
  }

  private async summarizeMessages(messages: Message[]): Promise<Message> {
    const summaryPrompt = this.createSummaryPrompt(messages);

    const summary = await callSummarizationModel(
      summaryPrompt,
      this.summarizationModel
    );

    return {
      role: 'system',
      content: `[Context Summary] ${summary}`,
      contextSummary: true
    };
  }
}
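
The createSummaryPrompt and replaceWithSummary helpers referenced above are not shown. One plausible shape, written here as free functions for brevity (in the class they would be private methods; the prompt wording is an assumption):

// Hypothetical helper shapes for the compressor above (assumed, not final API).
function createSummaryPrompt(messages: Message[]): string {
  const transcript = messages
    .map(msg => `${msg.role}: ${msg.content}`)
    .join('\n');
  return (
    'Summarize the following conversation excerpt, preserving file names, ' +
    'decisions, and open tasks:\n\n' + transcript
  );
}

function replaceWithSummary(
  messages: Message[],
  compressed: Message[],
  summary: Message
): Message[] {
  const compressedSet = new Set(compressed);
  // The first compressed message marks where the summary should be spliced in.
  const firstIndex = messages.findIndex(msg => compressedSet.has(msg));
  const remaining = messages.filter(msg => !compressedSet.has(msg));
  remaining.splice(firstIndex, 0, summary);
  return remaining;
}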

// Sliding window context manager
export class SlidingWindowContextManager {
  private window: Message[] = [];
  private maxTokens: number;
  private tokenizer: Tokenizer;

  constructor(maxTokens: number, tokenizer: Tokenizer) {
    this.maxTokens = maxTokens;
    this.tokenizer = tokenizer;
  }

  addMessage(message: Message): void {
    const messageTokens = this.tokenizer.countTokens(message.content || '');

    // Evict the oldest unpinned messages until the new message fits,
    // so pinned messages survive window pressure.
    while (
      this.getTotalTokens() + messageTokens > this.maxTokens &&
      this.window.some(msg => !msg.pinned)
    ) {
      const evictIndex = this.window.findIndex(msg => !msg.pinned);
      this.window.splice(evictIndex, 1);
    }

    this.window.push(message);
  }

  getMessages(): Message[] {
    return [...this.window];
  }

  getTotalTokens(): number {
    return this.window.reduce(
      (sum, msg) => sum + this.tokenizer.countTokens(msg.content || ''),
      0
    );
  }

  // Pin important messages that shouldn't be removed
  pinMessage(index: number): void {
    if (index >= 0 && index < this.window.length) {
      const message = this.window[index];
      message.pinned = true;
    }
  }
}
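
A short usage sketch showing pinning (hypothetical messages; the tokenizer would come from the pool above):

// Hypothetical usage: keep the system prompt pinned while the window slides.
const windowManager = new SlidingWindowContextManager(8192, tokenizer);
windowManager.addMessage({role: 'system', content: 'You are Nanocoder...'});
windowManager.pinMessage(0); // never evicted, even when the window is full

windowManager.addMessage({role: 'user', content: 'Refactor source/app.tsx'});
// ...as the conversation grows, older unpinned messages are evicted first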

// Token usage tracker for session-based analytics
export class TokenUsageTracker {
  private sessionHistory: UsageSession[] = [];
  private currentSession: UsageSession;
  private maxSessions: number;

  constructor(maxSessions: number = 100) {
    this.maxSessions = maxSessions;
    this.currentSession = this.createNewSession();
  }

  private createNewSession(): UsageSession {
    return {
      id: generateSessionId(),
      startTime: Date.now(),
      endTime: null,
      tokenBreakdown: {
        system: 0,
        userMessages: 0,
        assistantMessages: 0,
        toolDefinitions: 0,
        toolResults: 0,
        total: 0
      },
      messageCount: 0,
      toolUsage: new Map<string, number>(),
      modelInfo: null
    };
  }

  startNewSession(modelInfo?: ModelInfo): void {
    if (this.currentSession) {
      this.currentSession.endTime = Date.now();
      this.sessionHistory.unshift(this.currentSession);

      if (this.sessionHistory.length > this.maxSessions) {
        this.sessionHistory.pop();
      }
    }

    this.currentSession = this.createNewSession();
    if (modelInfo) {
      this.currentSession.modelInfo = modelInfo;
    }
  }

  trackMessageTokens(message: Message, tokens: number): void {
    this.currentSession.messageCount++;

    switch (message.role) {
      case 'system':
        this.currentSession.tokenBreakdown.system += tokens;
        break;
      case 'user':
        this.currentSession.tokenBreakdown.userMessages += tokens;
        break;
      case 'assistant':
        this.currentSession.tokenBreakdown.assistantMessages += tokens;
        break;
      case 'tool':
        this.currentSession.tokenBreakdown.toolResults += tokens;
        break;
    }

    this.currentSession.tokenBreakdown.total += tokens;
  }

  trackToolUsage(toolName: string, tokenCost: number): void {
    const currentCount = this.currentSession.toolUsage.get(toolName) || 0;
    this.currentSession.toolUsage.set(toolName, currentCount + 1);
    this.currentSession.tokenBreakdown.toolDefinitions += tokenCost;
    this.currentSession.tokenBreakdown.total += tokenCost;
  }

  getCurrentUsage(): TokenBreakdown {
    return {...this.currentSession.tokenBreakdown};
  }

  getSessionHistory(): UsageSession[] {
    return [...this.sessionHistory];
  }

  generateReport(): UsageReport {
    const totalTokens = this.sessionHistory.reduce(
      (sum, session) => sum + session.tokenBreakdown.total,
      0
    );

    const avgPerSession = this.sessionHistory.length > 0
      ? totalTokens / this.sessionHistory.length
      : 0;

    return {
      totalSessions: this.sessionHistory.length,
      totalTokens,
      averagePerSession: avgPerSession,
      breakdownByCategory: this.aggregateBreakdown(),
      topTools: this.getTopTools()
    };
  }
}
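
aggregateBreakdown and getTopTools are referenced in generateReport but not shown. One possible shape, written as free functions over the session history (assumptions, not a final API):

// Hypothetical helpers for generateReport (assumed shapes, not final API).
function aggregateBreakdown(sessions: UsageSession[]): TokenBreakdown {
  return sessions.reduce(
    (acc, session) => ({
      system: acc.system + session.tokenBreakdown.system,
      userMessages: acc.userMessages + session.tokenBreakdown.userMessages,
      assistantMessages: acc.assistantMessages + session.tokenBreakdown.assistantMessages,
      toolDefinitions: acc.toolDefinitions + session.tokenBreakdown.toolDefinitions,
      toolResults: acc.toolResults + session.tokenBreakdown.toolResults,
      total: acc.total + session.tokenBreakdown.total
    }),
    {system: 0, userMessages: 0, assistantMessages: 0, toolDefinitions: 0, toolResults: 0, total: 0}
  );
}

function getTopTools(sessions: UsageSession[], limit = 5): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const session of sessions) {
    for (const [tool, count] of session.toolUsage) {
      counts.set(tool, (counts.get(tool) || 0) + count);
    }
  }
  // Sort tools by call count, descending, and keep the top entries.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}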

// Context-aware token cache with intelligent invalidation
export class ContextAwareTokenCache {
  private cache: Map<string, number>;
  private contextHash: string = '';
  private maxSize: number;

  constructor(maxSize: number = 1000) {
    this.maxSize = maxSize;
    this.cache = new Map();
  }

  getCachedTokens(
    message: Message,
    tokenizer: Tokenizer,
    context: ConversationContext
  ): number {
    const currentContextHash = this.calculateContextHash(context);
    const cacheKey = this.getCacheKey(message, tokenizer);

    if (currentContextHash !== this.contextHash) {
      this.invalidateStaleEntries(currentContextHash);
      this.contextHash = currentContextHash;
    }

    if (this.cache.has(cacheKey)) {
      return this.cache.get(cacheKey)!;
    }

    const tokens = tokenizer.countTokens(message.content || '');

    // Evict the oldest entry (Map preserves insertion order) when the cache is full.
    if (this.cache.size >= this.maxSize) {
      const oldestKey = this.cache.keys().next().value;
      if (oldestKey !== undefined) {
        this.cache.delete(oldestKey);
      }
    }

    this.cache.set(cacheKey, tokens);
    return tokens;
  }

  private calculateContextHash(context: ConversationContext): string {
    const factors = [
      context.messagesBeforeToolExecution.length,
      context.systemMessage.content?.length || 0,
      context.assistantMsg.content?.length || 0
    ];

    return factors.join('|');
  }

  private getCacheKey(message: Message, tokenizer: Tokenizer): string {
    const tokenizerType = this.getTokenizerType(tokenizer);
    return `${tokenizerType}:${message.content}:${message.role}`;
  }

  private invalidateStaleEntries(newContextHash: string): void {
    const hashDiff = this.calculateHashDifference(this.contextHash, newContextHash);

    if (hashDiff > CONTEXT_CHANGE_THRESHOLD) {
      this.cache.clear();
    } else {
      this.pruneOldEntries(0.5); // Keep 50% of cache
    }
  }
}
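
calculateHashDifference and pruneOldEntries are referenced but not shown. A minimal sketch, written as free functions for brevity (CONTEXT_CHANGE_THRESHOLD would be a new constant in source/constants.ts; the exact heuristics are assumptions):

// Hypothetical invalidation helpers (assumed shapes, not final API).
function calculateHashDifference(oldHash: string, newHash: string): number {
  const oldParts = oldHash.split('|');
  const newParts = newHash.split('|');
  const length = Math.max(oldParts.length, newParts.length);

  let changed = 0;
  for (let i = 0; i < length; i++) {
    if (oldParts[i] !== newParts[i]) changed++;
  }
  // Fraction of context factors that changed since the last lookup.
  return length === 0 ? 0 : changed / length;
}

function pruneOldEntries(cache: Map<string, number>, keepRatio: number): void {
  // Map preserves insertion order, so the earliest keys are the oldest entries.
  const keys = [...cache.keys()];
  const dropCount = Math.floor(keys.length * (1 - keepRatio));
  for (let i = 0; i < dropCount; i++) {
    cache.delete(keys[i]);
  }
}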

Integration Points

  • Tokenizer Factory: Enhance source/tokenization/tokenizer-factory.ts with new providers
  • App State: Integrate with source/hooks/useAppState.tsx for token caching
  • Context Checker: Enhance source/hooks/chat-handler/utils/context-checker.tsx with compression
  • Usage Calculator: Update source/usage/calculator.ts with enhanced tracking
  • Constants: Update source/constants.ts with new thresholds and configurations
  • Chat Handler: Integrate with source/hooks/chat-handler/conversation/conversation-loop.tsx

Files to Modify/Create

  • source/tokenization/enhanced-tokenizer-factory.ts (new) - Enhanced provider support
  • source/tokenization/tokenizer-pool.ts (new) - Multi-model resource management
  • source/tokenization/enhanced-fallback-tokenizer.ts (new) - Better estimation
  • source/context/context-compressor.ts (new) - Intelligent compression
  • source/context/sliding-window-context-manager.ts (new) - Fixed-size windows
  • source/usage/token-usage-tracker.ts (new) - Session analytics
  • source/usage/context-aware-token-cache.ts (new) - Intelligent caching
  • source/tokenization/tokenizer-factory.ts (modify) - Enhanced provider detection
  • source/hooks/useAppState.tsx (enhance) - Token caching integration
  • source/hooks/chat-handler/utils/context-checker.tsx (enhance) - Compression integration
  • source/usage/calculator.ts (enhance) - Enhanced tracking
  • source/constants.ts (enhance) - New thresholds and configs
  • source/components/usage/usage-display.tsx (enhance) - Analytics visualization

Alternatives Considered

  1. Simple Tokenization Extension: Considered but rejected for limited context management
  2. Basic Context Pruning: Rejected for lack of intelligent summarization
  3. Static Token Limits: Rejected for inability to adapt to conversation complexity
  4. Monolithic Token System: Rejected for poor maintainability and scalability

Additional Context

  • I have searched existing issues to ensure this is not a duplicate
  • This feature aligns with the project's goals (local-first AI assistance)
  • The implementation considers local LLM performance constraints
  • Memory efficiency is prioritized for local usage

Performance Considerations

  • Efficient tokenizer pooling for multi-model scenarios
  • Memory-optimized context management algorithms
  • Incremental context compression to minimize memory usage
  • Optimized token calculation performance

Local LLM Adaptations

  • Memory-efficient tokenizer instances
  • Lightweight context compression algorithms
  • Resource-aware token estimation
  • Progressive enhancement for local model capabilities

Token Management Benefits

  • Enhanced tokenization support for multiple providers
  • Intelligent context compression with automatic summarization
  • Sliding window management with message pinning
  • Comprehensive usage tracking with session analytics
  • Context-aware caching with intelligent invalidation

Implementation Notes (optional)

Key Integration Points

  • Integrate with existing tokenizer factory system
  • Connect to context monitoring and checking
  • Enhance usage calculation and display
  • Add to chat handler for compression integration
  • Connect with UI components for analytics visualization

Testing Strategy

  • Unit tests for tokenization algorithms (see the sketch after this list)
  • Integration tests for context compression
  • Performance tests for token caching
  • Memory usage monitoring for tokenizer pools
  • Context management efficiency testing
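
As an example of the first bullet, a minimal unit test for the fallback estimation heuristic (hypothetical; it assumes a Vitest/Jest-style API, that the class is exported, and the file layout from the list above):

// Hypothetical unit test sketch for the fallback estimator.
import {describe, it, expect} from 'vitest';
import {EnhancedFallbackTokenizer} from '../source/tokenization/enhanced-fallback-tokenizer';

describe('EnhancedFallbackTokenizer', () => {
  it('estimates a higher per-character token density for code than for prose', () => {
    const tokenizer = new EnhancedFallbackTokenizer();
    const prose = 'a quiet walk through the park on a warm afternoon with friends';
    const code = 'const sum = (a: number, b: number) => { return a + b; }; // add';

    const proseTokens = tokenizer.countTokens(prose);
    const codeTokens = tokenizer.countTokens(code);

    // The code-aware adjustment factor should raise the per-character estimate.
    expect(codeTokens / code.length).toBeGreaterThan(proseTokens / prose.length);
  });
});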

Migration Path

  • All new features will be optional and backward compatible
  • Existing tokenization remains as fallback
  • Gradual rollout with feature flags (see the sketch after this list)
  • User preferences for token management features
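
A possible shape for those flags (the flag names are assumptions; per the integration list they would live in source/constants.ts, with every flag defaulting to the current behavior):

// Hypothetical feature flags (names are assumptions; defaults preserve current behavior).
export const TOKEN_MANAGEMENT_FLAGS = {
  enhancedTokenizers: false,   // Gemini/Mistral/Qwen tokenizer support
  contextCompression: false,   // LLM-powered summarization of older messages
  slidingWindowContext: false, // fixed-size window with message pinning
  usageAnalytics: false,       // session-based tracking and reporting
  contextAwareCache: false     // token cache with context-hash invalidation
} as const;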

Success Metrics

  • Tokenization Accuracy: <5% error rate for supported providers
  • Cache Hit Rate: 90%+ for typical usage patterns
  • Context Compression: 30-50% reduction when needed
  • Performance Impact: <10ms overhead per message
  • Memory Usage: Keep tokenizer pool under 10MB
  • User Satisfaction: Reduced manual context management
  • Safety: 95%+ fewer context limit errors
