Description
Implement advanced token usage and management capabilities to improve Nanocoder's efficiency and resource optimization. Nanocoder currently has a solid foundation with provider-specific tokenization and basic context monitoring, but it lacks intelligent context management, advanced usage tracking, and multi-model optimization. The system needs enhanced tokenization for more providers, intelligent context compression, comprehensive usage analytics, and context-aware caching to match industry-leading agentic CLI coding tools.
The feature will implement:
- Enhanced tokenization with support for more providers (Gemini, Mistral, Qwen)
- Intelligent context compression with automatic summarization
- Sliding window context management with pinning capabilities
- Comprehensive usage tracking with session-based analytics
- Context-aware token caching with intelligent invalidation
- Multi-model tokenizer pooling for efficient resource management
Use Case
Current Problem:
- Limited tokenization support for providers beyond OpenAI, Anthropic, and Llama
- No automatic context pruning or compression when approaching limits
- Basic token estimation without intelligent context management
- Limited usage tracking and analytics capabilities
- No multi-model tokenizer optimization
Target Scenarios:
- Enhanced Tokenization: Support for Gemini, Mistral, Qwen and other providers
- Intelligent Context Management: Automatic compression when approaching limits
- Sliding Window Context: Fixed-size context window with important message pinning
- Usage Analytics: Comprehensive session-based tracking and reporting
- Multi-model Optimization: Efficient tokenizer resource management
Proposed Solution
Phase 1: Enhanced Tokenization System (2-3 weeks)
- Implement multi-provider tokenizer support (Gemini, Mistral, Qwen)
- Create TokenizerPool for efficient multi-model resource management
- Enhance fallback tokenizer with better estimation algorithms
- Update tokenizer factory with new provider detection
- Create EnhancedFallbackTokenizer with content-aware estimation
Phase 2: Intelligent Context Management (3-4 weeks)
- Implement ContextCompressor with LLM-powered summarization
- Add SlidingWindowContextManager for fixed-size context windows
- Create context compression engine with automatic summarization
- Add message pinning capabilities to preserve important context
- Integrate with existing context monitoring system
Phase 3: Advanced Usage Tracking and Optimization (4-5 weeks)
- Implement TokenUsageTracker for session-based analytics
- Create ContextAwareTokenCache with intelligent invalidation
- Add comprehensive usage reporting and analytics
- Implement context-aware cache optimization
- Add usage export and visualization capabilities
Technical Implementation
Core Components
// Enhanced tokenizer factory with more providers
export function createEnhancedTokenizer(
providerName: string,
modelId: string
): Tokenizer {
const provider = detectEnhancedProvider(providerName, modelId);
switch (provider) {
case 'openai':
return new OpenAITokenizer(modelId);
case 'anthropic':
return new AnthropicTokenizer(modelId);
case 'llama':
return new LlamaTokenizer(modelId);
case 'gemini':
return new GeminiTokenizer(modelId);
case 'mistral':
return new MistralTokenizer(modelId);
case 'qwen':
return new QwenTokenizer(modelId);
case 'fallback':
default:
return new EnhancedFallbackTokenizer();
}
}
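// Illustrative sketch (an assumption, not part of the original proposal) of the
// detectEnhancedProvider helper referenced above: map provider names and model
// id prefixes onto the supported tokenizer families, defaulting to the fallback.
function detectEnhancedProvider(providerName: string, modelId: string): string {
  const name = providerName.toLowerCase();
  const model = modelId.toLowerCase();
  if (name.includes('openai') || model.startsWith('gpt-')) return 'openai';
  if (name.includes('anthropic') || model.startsWith('claude')) return 'anthropic';
  if (name.includes('gemini') || model.startsWith('gemini')) return 'gemini';
  if (name.includes('mistral') || model.startsWith('mistral') || model.startsWith('mixtral')) return 'mistral';
  if (name.includes('qwen') || model.startsWith('qwen')) return 'qwen';
  if (name.includes('llama') || model.includes('llama')) return 'llama';
  return 'fallback';
}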
// Enhanced fallback tokenizer with better estimation
class EnhancedFallbackTokenizer implements Tokenizer {
countTokens(text: string): number {
const charCount = text.length;
// Adjust estimation based on content characteristics
const adjustmentFactor = this.calculateAdjustmentFactor(text);
return Math.round((charCount / CHARS_PER_TOKEN_ESTIMATE) * adjustmentFactor);
}
private calculateAdjustmentFactor(text: string): number {
// Analyze text characteristics for better estimation
const codeRatio = this.estimateCodeRatio(text);
const punctuationRatio = this.estimatePunctuationRatio(text);
// Code-heavy text typically has higher token density
if (codeRatio > 0.7) return 1.2;
if (codeRatio > 0.4) return 1.1;
// High punctuation might indicate more tokens
if (punctuationRatio > 0.3) return 1.15;
return 1.0;
}
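  // Illustrative sketches (assumptions, not part of the original proposal) of the
  // two heuristics referenced above: cheap character/line scans rather than real
  // language detection.
  private estimateCodeRatio(text: string): number {
    const lines = text.split('\n');
    if (lines.length === 0) return 0;
    const codeLike = lines.filter(line =>
      /[{}();=]|=>|\bfunction\b|\bconst\b|\breturn\b|^\s{4,}/.test(line)
    ).length;
    return codeLike / lines.length;
  }
  private estimatePunctuationRatio(text: string): number {
    if (text.length === 0) return 0;
    const punctuation = text.match(/[.,;:!?(){}\[\]'"`<>\/\\|@#$%^&*+=~-]/g) || [];
    return punctuation.length / text.length;
  }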
}
// Tokenizer pool for efficient multi-model support
export class TokenizerPool {
private pool: Map<string, Tokenizer> = new Map();
private usageCount: Map<string, number> = new Map();
getTokenizer(provider: string, model: string): Tokenizer {
const key = `${provider}:${model}`;
if (this.pool.has(key)) {
this.usageCount.set(key, (this.usageCount.get(key) || 0) + 1);
return this.pool.get(key)!;
}
const tokenizer = createEnhancedTokenizer(provider, model);
this.pool.set(key, tokenizer);
this.usageCount.set(key, 1);
return tokenizer;
}
releaseTokenizer(provider: string, model: string): void {
const key = `${provider}:${model}`;
const count = this.usageCount.get(key) || 0;
if (count <= 1) {
const tokenizer = this.pool.get(key);
if (tokenizer?.free) {
tokenizer.free();
}
this.pool.delete(key);
this.usageCount.delete(key);
} else {
this.usageCount.set(key, count - 1);
}
}
cleanupUnused(): void {
for (const [key, count] of this.usageCount) {
if (count === 0) {
const tokenizer = this.pool.get(key);
if (tokenizer?.free) {
tokenizer.free();
}
this.pool.delete(key);
this.usageCount.delete(key);
}
}
}
}
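// Usage sketch (illustrative): a shared pool hands out one tokenizer per
// provider/model pair, then releases the instance so tokenizers that expose
// free() can clean up.
const tokenizerPool = new TokenizerPool();
const pooledTokenizer = tokenizerPool.getTokenizer('gemini', 'gemini-1.5-pro');
const promptTokens = pooledTokenizer.countTokens('Refactor the context checker.');
tokenizerPool.releaseTokenizer('gemini', 'gemini-1.5-pro');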
// Context compressor with intelligent summarization
export class ContextCompressor {
private summarizationModel: string;
private compressionThreshold: number;
constructor(options: {summarizationModel?: string; threshold?: number} = {}) {
this.summarizationModel = options.summarizationModel || 'gpt-3.5-turbo';
this.compressionThreshold = options.threshold || 0.8;
}
async compressContext(
messages: Message[],
currentTokenCount: number,
contextLimit: number,
tokenizer: Tokenizer
): Promise<Message[]> {
const usageRatio = currentTokenCount / contextLimit;
if (usageRatio < this.compressionThreshold) {
return messages; // No compression needed
}
const compressibleMessages = this.findCompressibleMessages(messages);
if (compressibleMessages.length === 0) {
return messages;
}
const summary = await this.summarizeMessages(compressibleMessages);
return this.replaceWithSummary(messages, compressibleMessages, summary);
}
private findCompressibleMessages(messages: Message[]): Message[] {
const compressible: Message[] = [];
for (let i = 0; i < messages.length; i++) {
const message = messages[i];
// Skip system messages and very recent messages
if (message.role === 'system' || i >= messages.length - 3) {
continue;
}
// Only compress user and assistant messages
if (message.role === 'user' || message.role === 'assistant') {
compressible.push(message);
}
}
return compressible;
}
private async summarizeMessages(messages: Message[]): Promise<Message> {
const summaryPrompt = this.createSummaryPrompt(messages);
const summary = await callSummarizationModel(
summaryPrompt,
this.summarizationModel
);
return {
role: 'system',
content: `[Context Summary] ${summary}`,
contextSummary: true
};
}
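  // Illustrative sketches (assumptions, not part of the original proposal) of the
  // two referenced helpers; callSummarizationModel is assumed to be an existing
  // provider call that returns the summary text.
  private createSummaryPrompt(messages: Message[]): string {
    const transcript = messages
      .map(msg => `${msg.role}: ${msg.content}`)
      .join('\n');
    return `Summarize the following conversation, preserving decisions, file paths and open tasks:\n\n${transcript}`;
  }
  private replaceWithSummary(
    messages: Message[],
    compressed: Message[],
    summary: Message
  ): Message[] {
    // Keep everything that was not compressed and splice the summary in where
    // the first compressed message used to sit.
    const compressedSet = new Set(compressed);
    const firstIndex = messages.findIndex(msg => compressedSet.has(msg));
    const remaining = messages.filter(msg => !compressedSet.has(msg));
    remaining.splice(firstIndex, 0, summary);
    return remaining;
  }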
}
// Sliding window context manager
export class SlidingWindowContextManager {
private window: Message[] = [];
private maxTokens: number;
private tokenizer: Tokenizer;
constructor(maxTokens: number, tokenizer: Tokenizer) {
this.maxTokens = maxTokens;
this.tokenizer = tokenizer;
}
addMessage(message: Message): void {
  const messageTokens = this.tokenizer.countTokens(message.content);
  // Evict the oldest unpinned messages until the new message fits; if only
  // pinned messages remain, the window may temporarily exceed maxTokens.
  while (
    this.getTotalTokens() + messageTokens > this.maxTokens &&
    this.window.some(msg => !msg.pinned)
  ) {
    const oldestUnpinned = this.window.findIndex(msg => !msg.pinned);
    this.window.splice(oldestUnpinned, 1);
  }
  this.window.push(message);
}
getMessages(): Message[] {
return [...this.window];
}
getTotalTokens(): number {
return this.window.reduce(
(sum, msg) => sum + this.tokenizer.countTokens(msg.content),
0
);
}
// Pin important messages that shouldn't be removed
pinMessage(index: number): void {
if (index >= 0 && index < this.window.length) {
const message = this.window[index];
message.pinned = true;
}
}
}
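// Usage sketch (illustrative, reusing the pooled tokenizer from the sketch
// above): keep a bounded window, pin the system prompt so eviction never drops
// it, and read back the messages that still fit.
const windowManager = new SlidingWindowContextManager(8192, pooledTokenizer);
windowManager.addMessage({role: 'system', content: 'You are Nanocoder, a coding assistant.'});
windowManager.pinMessage(0);
windowManager.addMessage({role: 'user', content: 'Add tests for the tokenizer pool.'});
const visibleMessages = windowManager.getMessages();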
// Token usage tracker for session-based analytics
export class TokenUsageTracker {
private sessionHistory: UsageSession[] = [];
private currentSession: UsageSession;
private maxSessions: number;
constructor(maxSessions: number = 100) {
this.maxSessions = maxSessions;
this.currentSession = this.createNewSession();
}
private createNewSession(): UsageSession {
return {
id: generateSessionId(),
startTime: Date.now(),
endTime: null,
tokenBreakdown: {
system: 0,
userMessages: 0,
assistantMessages: 0,
toolDefinitions: 0,
toolResults: 0,
total: 0
},
messageCount: 0,
toolUsage: new Map<string, number>(),
modelInfo: null
};
}
startNewSession(modelInfo?: ModelInfo): void {
if (this.currentSession) {
this.currentSession.endTime = Date.now();
this.sessionHistory.unshift(this.currentSession);
if (this.sessionHistory.length > this.maxSessions) {
this.sessionHistory.pop();
}
}
this.currentSession = this.createNewSession();
if (modelInfo) {
this.currentSession.modelInfo = modelInfo;
}
}
trackMessageTokens(message: Message, tokens: number): void {
this.currentSession.messageCount++;
switch (message.role) {
case 'system':
this.currentSession.tokenBreakdown.system += tokens;
break;
case 'user':
this.currentSession.tokenBreakdown.userMessages += tokens;
break;
case 'assistant':
this.currentSession.tokenBreakdown.assistantMessages += tokens;
break;
case 'tool':
this.currentSession.tokenBreakdown.toolResults += tokens;
break;
}
this.currentSession.tokenBreakdown.total += tokens;
}
trackToolUsage(toolName: string, tokenCost: number): void {
const currentCount = this.currentSession.toolUsage.get(toolName) || 0;
this.currentSession.toolUsage.set(toolName, currentCount + 1);
this.currentSession.tokenBreakdown.toolDefinitions += tokenCost;
this.currentSession.tokenBreakdown.total += tokenCost;
}
getCurrentUsage(): TokenBreakdown {
return {...this.currentSession.tokenBreakdown};
}
getSessionHistory(): UsageSession[] {
return [...this.sessionHistory];
}
generateReport(): UsageReport {
const totalTokens = this.sessionHistory.reduce(
(sum, session) => sum + session.tokenBreakdown.total,
0
);
const avgPerSession = this.sessionHistory.length > 0
? totalTokens / this.sessionHistory.length
: 0;
return {
totalSessions: this.sessionHistory.length,
totalTokens,
averagePerSession: avgPerSession,
breakdownByCategory: this.aggregateBreakdown(),
topTools: this.getTopTools()
};
}
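  // Illustrative sketches (assumptions, not part of the original proposal) of the
  // two aggregation helpers referenced by generateReport().
  private aggregateBreakdown(): TokenBreakdown {
    const empty: TokenBreakdown = {
      system: 0, userMessages: 0, assistantMessages: 0,
      toolDefinitions: 0, toolResults: 0, total: 0
    };
    return this.sessionHistory.reduce((acc, session) => {
      const b = session.tokenBreakdown;
      return {
        system: acc.system + b.system,
        userMessages: acc.userMessages + b.userMessages,
        assistantMessages: acc.assistantMessages + b.assistantMessages,
        toolDefinitions: acc.toolDefinitions + b.toolDefinitions,
        toolResults: acc.toolResults + b.toolResults,
        total: acc.total + b.total
      };
    }, empty);
  }
  private getTopTools(limit: number = 5): Array<{tool: string; calls: number}> {
    const counts = new Map<string, number>();
    for (const session of this.sessionHistory) {
      for (const [tool, calls] of session.toolUsage) {
        counts.set(tool, (counts.get(tool) || 0) + calls);
      }
    }
    return Array.from(counts.entries())
      .map(([tool, calls]) => ({tool, calls}))
      .sort((a, b) => b.calls - a.calls)
      .slice(0, limit);
  }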
}
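// Usage sketch (illustrative): track tokens as messages flow through the chat
// handler, then archive the session and surface a report in the usage display.
const usageTracker = new TokenUsageTracker();
usageTracker.trackMessageTokens({role: 'user', content: 'Explain this diff.'}, 42);
usageTracker.trackToolUsage('read_file', 120);
const liveBreakdown = usageTracker.getCurrentUsage();
usageTracker.startNewSession(); // pushes the finished session into the history
const sessionReport = usageTracker.generateReport();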
// Context-aware token cache with intelligent invalidation
export class ContextAwareTokenCache {
private cache: Map<string, number>;
private contextHash: string = '';
private maxSize: number;
constructor(maxSize: number = 1000) {
this.maxSize = maxSize;
this.cache = new Map();
}
getCachedTokens(
message: Message,
tokenizer: Tokenizer,
context: ConversationContext
): number {
const currentContextHash = this.calculateContextHash(context);
const cacheKey = this.getCacheKey(message, tokenizer);
if (currentContextHash !== this.contextHash) {
this.invalidateStaleEntries(currentContextHash);
this.contextHash = currentContextHash;
}
if (this.cache.has(cacheKey)) {
return this.cache.get(cacheKey)!;
}
const tokens = tokenizer.countTokens(message.content);
if (this.cache.size >= this.maxSize) {
const oldestKey = this.cache.keys().next().value;
this.cache.delete(oldestKey);
}
this.cache.set(cacheKey, tokens);
return tokens;
}
private calculateContextHash(context: ConversationContext): string {
const factors = [
context.messagesBeforeToolExecution.length,
context.systemMessage.content?.length || 0,
context.assistantMsg.content?.length || 0
];
return factors.join('|');
}
private getCacheKey(message: Message, tokenizer: Tokenizer): string {
const tokenizerType = this.getTokenizerType(tokenizer);
return `${tokenizerType}:${message.content}:${message.role}`;
}
private invalidateStaleEntries(newContextHash: string): void {
const hashDiff = this.calculateHashDifference(this.contextHash, newContextHash);
if (hashDiff > CONTEXT_CHANGE_THRESHOLD) {
this.cache.clear();
} else {
this.pruneOldEntries(0.5); // Keep 50% of cache
}
}
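  // Illustrative sketches (assumptions, not part of the original proposal) of the
  // referenced private helpers; CONTEXT_CHANGE_THRESHOLD is assumed to live in
  // source/constants.ts (e.g. 0.5).
  private getTokenizerType(tokenizer: Tokenizer): string {
    // The class name is a stable-enough identifier for cache keying.
    return tokenizer.constructor.name;
  }
  private calculateHashDifference(oldHash: string, newHash: string): number {
    // Treat the hash as '|'-separated numeric factors and sum their relative drift.
    const oldParts = oldHash.split('|').map(Number);
    const newParts = newHash.split('|').map(Number);
    let diff = 0;
    for (let i = 0; i < Math.max(oldParts.length, newParts.length); i++) {
      const a = oldParts[i] || 0;
      const b = newParts[i] || 0;
      diff += Math.abs(a - b) / Math.max(a, b, 1);
    }
    return diff;
  }
  private pruneOldEntries(keepRatio: number): void {
    // Drop the oldest (earliest-inserted) entries, keeping roughly keepRatio of the cache.
    const entries = Array.from(this.cache.entries());
    const keepCount = Math.floor(entries.length * keepRatio);
    this.cache = new Map(entries.slice(entries.length - keepCount));
  }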
}
Integration Points
- Tokenizer Factory: Enhance source/tokenization/tokenizer-factory.ts with new providers
- App State: Integrate with source/hooks/useAppState.tsx for token caching
- Context Checker: Enhance source/hooks/chat-handler/utils/context-checker.tsx with compression
- Usage Calculator: Update source/usage/calculator.ts with enhanced tracking
- Constants: Update source/constants.ts with new thresholds and configurations
- Chat Handler: Integrate with source/hooks/chat-handler/conversation/conversation-loop.tsx
Files to Modify/Create
- source/tokenization/enhanced-tokenizer-factory.ts (new) - Enhanced provider support
- source/tokenization/tokenizer-pool.ts (new) - Multi-model resource management
- source/tokenization/enhanced-fallback-tokenizer.ts (new) - Better estimation
- source/context/context-compressor.ts (new) - Intelligent compression
- source/context/sliding-window-context-manager.ts (new) - Fixed-size windows
- source/usage/token-usage-tracker.ts (new) - Session analytics
- source/usage/context-aware-token-cache.ts (new) - Intelligent caching
- source/tokenization/tokenizer-factory.ts (modify) - Enhanced provider detection
- source/hooks/useAppState.tsx (enhance) - Token caching integration
- source/hooks/chat-handler/utils/context-checker.tsx (enhance) - Compression integration
- source/usage/calculator.ts (enhance) - Enhanced tracking
- source/constants.ts (enhance) - New thresholds and configs
- source/components/usage/usage-display.tsx (enhance) - Analytics visualization
Alternatives Considered
- Simple Tokenization Extension: Considered but rejected for limited context management
- Basic Context Pruning: Rejected for lack of intelligent summarization
- Static Token Limits: Rejected for inability to adapt to conversation complexity
- Monolithic Token System: Rejected for poor maintainability and scalability
Additional Context
- I have searched existing issues to ensure this is not a duplicate
- This feature aligns with the project's goals (local-first AI assistance)
- The implementation considers local LLM performance constraints
- Memory efficiency is prioritized for local usage
Performance Considerations
- Efficient tokenizer pooling for multi-model scenarios
- Memory-optimized context management algorithms
- Incremental context compression to minimize memory usage
- Optimized token calculation performance
Local LLM Adaptations
- Memory-efficient tokenizer instances
- Lightweight context compression algorithms
- Resource-aware token estimation
- Progressive enhancement for local model capabilities
Token Management Benefits
- Enhanced tokenization support for multiple providers
- Intelligent context compression with automatic summarization
- Sliding window management with message pinning
- Comprehensive usage tracking with session analytics
- Context-aware caching with intelligent invalidation
Implementation Notes (optional)
Key Integration Points
- Integrate with existing tokenizer factory system
- Connect to context monitoring and checking
- Enhance usage calculation and display
- Add to chat handler for compression integration
- Connect with UI components for analytics visualization
Testing Strategy
- Unit tests for tokenization algorithms (see the sketch after this list)
- Integration tests for context compression
- Performance tests for token caching
- Memory usage monitoring for tokenizer pools
- Context management efficiency testing
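As a starting point for the unit tests mentioned above, a minimal sketch, assuming a Vitest-style runner and that EnhancedFallbackTokenizer is imported from the new source/tokenization/enhanced-fallback-tokenizer.ts module; adjust imports if the project uses a different test framework:
import {describe, expect, it} from 'vitest';

describe('EnhancedFallbackTokenizer', () => {
  const tokenizer = new EnhancedFallbackTokenizer();

  it('returns zero for empty input', () => {
    expect(tokenizer.countTokens('')).toBe(0);
  });

  it('returns a positive count for non-empty input', () => {
    expect(tokenizer.countTokens('const pool = new TokenizerPool();')).toBeGreaterThan(0);
  });

  it('scales with input length', () => {
    const short = 'const a = 1;';
    const long = short.repeat(20);
    expect(tokenizer.countTokens(long)).toBeGreaterThan(tokenizer.countTokens(short));
  });
});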
Migration Path
- All new features will be optional and backward compatible
- Existing tokenization remains as fallback
- Gradual rollout with feature flags (see the sketch after this list)
- User preferences for token management features
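To make the flag-based rollout concrete, a hypothetical sketch of what the flags could look like; the names and defaults are assumptions, e.g. added to source/constants.ts:
export const TOKEN_MANAGEMENT_FLAGS = {
  enhancedTokenizers: true,      // new provider tokenizers (Gemini, Mistral, Qwen)
  tokenizerPooling: true,        // share tokenizer instances via TokenizerPool
  contextCompression: false,     // opt-in while summarization quality is validated
  slidingWindowContext: false,   // opt-in fixed-size window with pinning
  usageAnalytics: true,          // session-based TokenUsageTracker
  contextAwareCache: true,       // ContextAwareTokenCache for token counts
} as const;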
Success Metrics
- Tokenization Accuracy: <5% error rate for supported providers
- Cache Hit Rate: 90%+ for typical usage patterns
- Context Compression: 30-50% reduction when needed
- Performance Impact: <10ms overhead per message
- Memory Usage: Keep tokenizer pool under 10MB
- User Satisfaction: Reduced manual context management
- Safety: 95%+ fewer context limit errors