[Feature] Advanced Project Knowledge Persistence System #286

@Avtrkrb

Description

Implement comprehensive project knowledge persistence so that Nanocoder can maintain and reuse project-specific knowledge across sessions. Nanocoder currently has a robust checkpoint system for conversation state but no dedicated project knowledge management. To match industry-leading agentic CLI coding tools, it needs centralized knowledge storage, intelligent search and discovery, knowledge evolution mechanisms, and cross-project sharing.

The feature will implement:

  • Project knowledge base with centralized storage and management
  • Intelligent knowledge search and discovery with semantic search
  • Knowledge evolution and learning mechanisms with AI-powered refinement
  • Cross-project knowledge sharing and adaptation
  • Context-aware knowledge preservation and retrieval
  • Knowledge visualization and analytics

Use Case

Current Problem:

  • No explicit project knowledge base; knowledge is siloed in individual checkpoints
  • Limited knowledge sharing between projects
  • Basic search capabilities without semantic understanding
  • Static knowledge storage without evolution mechanisms
  • Context lost between sessions with no project-level knowledge persistence

Target Scenarios:

  1. Project Knowledge Base: Centralized repository for project-specific knowledge
  2. Intelligent Search: Semantic search and discovery of relevant information
  3. Knowledge Evolution: AI-powered refinement and improvement of knowledge
  4. Cross-project Sharing: Reuse knowledge across different projects
  5. Context Preservation: Maintain project context across sessions

Proposed Solution

Phase 1: Basic Knowledge Management Foundation (4-5 weeks)

  • Implement core knowledge data structures and storage
  • Create BasicKnowledgeManager for simple knowledge operations
  • Add knowledge visualization components
  • Integrate with existing checkpoint system
  • Implement knowledge history tracking

Phase 2: Enhanced Knowledge Features (5-6 weeks)

  • Create KnowledgeSearchManager with advanced search capabilities
  • Add semantic search and context-aware discovery
  • Implement knowledge categorization and tagging
  • Add knowledge versioning and history
  • Create knowledge quality assessment mechanisms
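
The versioning and history item could be modelled as an append-only revision list. The following is a minimal sketch under that assumption, not part of the proposal's code: `KnowledgeRevision`, `VersionedKnowledge`, `createVersioned`, `revise`, and `rollback` are hypothetical names.

```typescript
// Hypothetical types for illustration; not part of the proposal's code.
interface KnowledgeRevision {
  version: number; // 1-based, increases with every change
  content: string;
  changedAt: Date;
  reason?: string;
}

interface VersionedKnowledge {
  id: string;
  current: KnowledgeRevision;
  history: KnowledgeRevision[]; // older revisions, oldest first
}

function createVersioned(id: string, content: string): VersionedKnowledge {
  return {id, current: {version: 1, content, changedAt: new Date()}, history: []};
}

// Returns a new object; the previous current revision is appended to history.
function revise(
  entry: VersionedKnowledge,
  content: string,
  reason?: string
): VersionedKnowledge {
  return {
    id: entry.id,
    current: {
      version: entry.current.version + 1,
      content,
      changedAt: new Date(),
      reason
    },
    history: [...entry.history, entry.current]
  };
}

// A rollback is recorded as a new revision, so no history is lost.
function rollback(entry: VersionedKnowledge, toVersion: number): VersionedKnowledge {
  const all = [...entry.history, entry.current];
  const target = all.find(revision => revision.version === toVersion);
  if (!target) throw new Error(`Version ${toVersion} not found`);
  return revise(entry, target.content, `rollback to v${toVersion}`);
}
```

Keeping revisions immutable means the refinement metadata that `refineKnowledge` stores (`originalContent`, `feedback`) could live on the revision instead of growing inside `metadata.refinements`; whether that fits the checkpoint format is an open question.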

Phase 3: Knowledge Persistence and Integration (4-5 weeks)

  • Create KnowledgePersistenceManager for state management
  • Integrate with existing storage infrastructure
  • Add session management and cleanup
  • Implement knowledge export and import
  • Add error recovery and validation
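
Export and import need some care because `ProjectKnowledge` holds `Date` and `Set` values: `JSON.stringify` turns a `Date` into an ISO string and a `Set` into `{}`. The sketch below shows one possible round trip with basic validation. `Entry` and `Project` are simplified stand-ins for the proposal's types, and the `version` field, `exportProject`, and `importProject` are assumptions.

```typescript
// Simplified stand-ins for the proposal's KnowledgeEntry / ProjectKnowledge.
interface Entry {
  id: string;
  content: string;
  projectId: string;
  createdAt: Date;
  updatedAt: Date;
  tags?: string[];
}

interface Project {
  projectId: string;
  knowledgeEntries: Entry[];
  tags: Set<string>;
  lastUpdated: Date;
}

// Hypothetical format marker so later layouts can be detected on import.
const EXPORT_VERSION = 1;

function exportProject(project: Project): string {
  return JSON.stringify({
    version: EXPORT_VERSION,
    projectId: project.projectId,
    lastUpdated: project.lastUpdated.toISOString(),
    tags: Array.from(project.tags), // a Set would serialize as {}
    knowledgeEntries: project.knowledgeEntries.map(e => ({
      ...e,
      createdAt: e.createdAt.toISOString(),
      updatedAt: e.updatedAt.toISOString()
    }))
  });
}

function parseDate(value: unknown, field: string): Date {
  const date = typeof value === 'string' ? new Date(value) : new Date(NaN);
  if (Number.isNaN(date.getTime())) throw new Error(`Invalid date in ${field}`);
  return date;
}

function importProject(json: string): Project {
  const raw = JSON.parse(json) as Record<string, unknown>;
  const {version, projectId, knowledgeEntries, tags, lastUpdated} = raw;
  if (version !== EXPORT_VERSION) throw new Error('Unsupported export version');
  if (typeof projectId !== 'string') throw new Error('Missing projectId');
  if (!Array.isArray(knowledgeEntries)) throw new Error('Missing knowledgeEntries');

  const entries = knowledgeEntries.map((item: unknown, i: number): Entry => {
    const e = (item ?? {}) as Record<string, unknown>;
    const {id, content} = e;
    if (typeof id !== 'string' || typeof content !== 'string') {
      throw new Error(`Entry ${i} is missing id or content`);
    }
    return {
      id,
      content,
      projectId,
      createdAt: parseDate(e.createdAt, `knowledgeEntries[${i}].createdAt`),
      updatedAt: parseDate(e.updatedAt, `knowledgeEntries[${i}].updatedAt`),
      tags: Array.isArray(e.tags) ? e.tags.map(String) : undefined
    };
  });

  return {
    projectId,
    knowledgeEntries: entries,
    tags: new Set<string>(Array.isArray(tags) ? tags.map(String) : []),
    lastUpdated: parseDate(lastUpdated, 'lastUpdated')
  };
}
```

Failing fast on a bad file keeps a corrupt import from partially overwriting an existing store; a more forgiving importer could skip invalid entries and report them instead.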

Phase 4: Advanced Knowledge Features (5-6 weeks)

  • Create KnowledgeEvolutionManager for AI-powered refinement
  • Add cross-project knowledge sharing with KnowledgeSharingManager
  • Implement knowledge gap identification
  • Create knowledge digests and analytics
  • Add team collaboration features

Technical Implementation

Core Components

// Core knowledge data structures
export interface KnowledgeEntry {
  id: string;
  content: string;
  projectId: string;
  createdAt: Date;
  updatedAt: Date;
  source: 'user' | 'tool' | 'system' | 'shared' | 'adapted'; // 'adapted' is set by adaptKnowledgeToContext
  tags?: string[];
  metadata?: Record<string, any>;
  context?: KnowledgeContext;
}

export interface KnowledgeContext {
  conversationId?: string;
  toolExecutionId?: string;
  relevantFiles?: string[];
  timestamp: Date;
}

export interface ProjectKnowledge {
  projectId: string;
  knowledgeEntries: KnowledgeEntry[];
  tags: Set<string>;
  lastUpdated: Date;
  metadata?: ProjectMetadata;
}

export interface KnowledgeSearchResult {
  knowledge: KnowledgeEntry;
  score: number; // Relevance score
  projectId: string;
  contextMatch?: KnowledgeContext;
}

// Basic knowledge manager
export class BasicKnowledgeManager {
  private knowledgeStore: Map<string, KnowledgeEntry> = new Map();
  private projectIndex: Map<string, ProjectKnowledge> = new Map();
  private maxEntriesPerProject: number;

  constructor(maxEntriesPerProject: number = 1000) {
    this.maxEntriesPerProject = maxEntriesPerProject;
  }

  addKnowledge(
    knowledge: Omit<KnowledgeEntry, 'id' | 'createdAt' | 'updatedAt'>
  ): KnowledgeEntry {
    const entry: KnowledgeEntry = {
      id: this.generateKnowledgeId(),
      createdAt: new Date(),
      updatedAt: new Date(),
      ...knowledge
    };

    // Store knowledge
    this.knowledgeStore.set(entry.id, entry);

    // Add to project index
    if (!this.projectIndex.has(entry.projectId)) {
      this.projectIndex.set(entry.projectId, {
        projectId: entry.projectId,
        knowledgeEntries: [],
        tags: new Set(),
        lastUpdated: new Date()
      });
    }

    const project = this.projectIndex.get(entry.projectId);
    if (project) {
      project.knowledgeEntries.push(entry);
      project.lastUpdated = new Date();

      // Update tags
      entry.tags?.forEach(tag => project.tags.add(tag));

      // Enforce limit
      if (project.knowledgeEntries.length > this.maxEntriesPerProject) {
        this.cleanupOldKnowledge(entry.projectId);
      }
    }

    return entry;
  }

  getKnowledge(knowledgeId: string): KnowledgeEntry | undefined {
    return this.knowledgeStore.get(knowledgeId);
  }

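  getProjectKnowledge(projectId: string): KnowledgeEntry[] {
    return this.projectIndex.get(projectId)?.knowledgeEntries || [];
  }

  // Called by KnowledgeEvolutionManager.refineKnowledge. Applies a partial
  // update in place, so the project index sees the same object.
  // Assumption: a subclass that keeps a search index must re-tokenize content.
  updateKnowledge(
    knowledgeId: string,
    updates: Partial<Pick<KnowledgeEntry, 'content' | 'tags' | 'metadata' | 'context'>>
  ): KnowledgeEntry {
    const entry = this.knowledgeStore.get(knowledgeId);
    if (!entry) throw new Error('Knowledge not found');
    Object.assign(entry, updates, {updatedAt: new Date()});
    return entry;
  }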

  private generateKnowledgeId(): string {
    return `kn_${Date.now()}_${Math.random().toString(36).substring(2, 8)}`;
  }

  private cleanupOldKnowledge(projectId: string): void {
    const project = this.projectIndex.get(projectId);
    if (!project) return;

    // Sort by updated date (oldest first)
    project.knowledgeEntries.sort(
      (a, b) => a.updatedAt.getTime() - b.updatedAt.getTime()
    );

    // Remove excess entries
    while (project.knowledgeEntries.length > this.maxEntriesPerProject) {
      const oldEntry = project.knowledgeEntries.shift();
      if (oldEntry) {
        this.knowledgeStore.delete(oldEntry.id);
      }
    }
  }
}

// Enhanced knowledge search manager
export class KnowledgeSearchManager extends BasicKnowledgeManager {
  private searchIndex: Map<string, SearchableKnowledge> = new Map();

  override addKnowledge(
    knowledge: Omit<KnowledgeEntry, 'id' | 'createdAt' | 'updatedAt'>
  ): KnowledgeEntry {
    const entry = super.addKnowledge(knowledge);

    // Add to search index
    const searchable: SearchableKnowledge = {
      id: entry.id,
      projectId: entry.projectId,
      content: entry.content,
      tags: entry.tags || [],
      tokens: this.tokenizeContent(entry.content),
      timestamp: entry.createdAt
    };

    this.searchIndex.set(entry.id, searchable);

    return entry;
  }

  private tokenizeContent(content: string): string[] {
    return content
      .toLowerCase()
      .split(/\s+/)
      // Strip punctuation first so the length filter sees the cleaned token
      .map(token => token.replace(/[^\w-]/g, ''))
      .filter(token => token.length > 2);
  }

  searchKnowledge(
    query: string,
    options: {
      projectId?: string;
      tags?: string[];
      limit?: number;
    } = {}
  ): KnowledgeSearchResult[] {
    const queryTokens = this.tokenizeContent(query);
    if (queryTokens.length === 0) return [];

    const candidates = options.projectId
      ? Array.from(this.searchIndex.values())
          .filter(entry => entry.projectId === options.projectId)
      : Array.from(this.searchIndex.values());

    const tagFiltered = options.tags && options.tags.length > 0
      ? candidates.filter(entry =>
          options.tags!.some(tag => entry.tags.includes(tag))
        )
      : candidates;

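    const scoredResults: KnowledgeSearchResult[] = [];
    for (const entry of tagFiltered) {
      // knowledgeStore is private to BasicKnowledgeManager, so use the public
      // accessor. Entries evicted by cleanupOldKnowledge may still be indexed.
      const knowledge = this.getKnowledge(entry.id);
      if (!knowledge) continue;
      const score = this.calculateRelevanceScore(entry.tokens, queryTokens);
      // Skip entries that share no token with the query
      if (score > 0) {
        scoredResults.push({knowledge, score, projectId: entry.projectId});
      }
    }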

    return scoredResults
      .sort((a, b) => b.score - a.score)
      .slice(0, options.limit || 10);
  }

  private calculateRelevanceScore(
    documentTokens: string[],
    queryTokens: string[]
  ): number {
    const matchingTokens = queryTokens.filter(token =>
      documentTokens.includes(token)
    );

    return matchingTokens.length / queryTokens.length;
  }
}

// Knowledge evolution manager
export class KnowledgeEvolutionManager {
  private knowledgeManager: KnowledgeSearchManager;
  private llmClient: LLMClient;

  constructor(
    knowledgeManager: KnowledgeSearchManager,
    llmClient: LLMClient
  ) {
    this.knowledgeManager = knowledgeManager;
    this.llmClient = llmClient;
  }

  async refineKnowledge(
    knowledgeId: string,
    feedback: KnowledgeFeedback
  ): Promise<KnowledgeEntry> {
    const knowledge = this.knowledgeManager.getKnowledge(knowledgeId);
    if (!knowledge) throw new Error('Knowledge not found');

    const refinementPrompt = `
      Refine this knowledge based on user feedback:

      Original Knowledge:
      ${knowledge.content}

      User Feedback:
      ${feedback.comment}

      Feedback Type: ${feedback.type}
      Quality Rating: ${feedback.quality}/5

      Provide an improved version that:
      - Incorporates the feedback
      - Maintains accuracy
      - Improves clarity
      - Adds relevant details

      Format: "Refined Knowledge:\n[improved content]"
    `;

    const response = await this.llmClient.chat([
      {role: 'system', content: refinementPrompt}
    ], {});

    const refinedContent = this.extractRefinedContent(response.messages[0].content);

    const updatedKnowledge = await this.knowledgeManager.updateKnowledge(
      knowledgeId,
      {
        content: refinedContent,
        metadata: {
          ...knowledge.metadata,
          refinements: [
            ...(knowledge.metadata?.refinements || []),
            {
              timestamp: new Date(),
              feedback,
              originalContent: knowledge.content
            }
          ]
        }
      }
    );

    return updatedKnowledge;
  }

  private extractRefinedContent(response: string): string {
    const match = response.match(/Refined Knowledge:\n(.+)/s);
    return match?.[1]?.trim() || response;
  }
}

// Knowledge sharing manager
export class KnowledgeSharingManager {
  private knowledgeManager: KnowledgeSearchManager;
  private knowledgeEvolution: KnowledgeEvolutionManager;
  // Needed by adaptKnowledgeToContext, which calls this.llmClient.chat
  private llmClient: LLMClient;

  constructor(
    knowledgeManager: KnowledgeSearchManager,
    knowledgeEvolution: KnowledgeEvolutionManager,
    llmClient: LLMClient
  ) {
    this.knowledgeManager = knowledgeManager;
    this.knowledgeEvolution = knowledgeEvolution;
    this.llmClient = llmClient;
  }

  async shareKnowledge(
    knowledgeId: string,
    sourceProjectId: string,
    targetProjectId: string,
    options: {
      generalize?: boolean;
      adaptToContext?: KnowledgeContext;
    } = {}
  ): Promise<KnowledgeEntry> {
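    const sourceKnowledge = this.knowledgeManager.getKnowledge(knowledgeId);
    if (!sourceKnowledge) throw new Error('Knowledge not found');
    // sourceProjectId was otherwise unused; verify the entry belongs to it
    if (sourceKnowledge.projectId !== sourceProjectId) {
      throw new Error('Knowledge does not belong to the source project');
    }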

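    if (options.generalize) {
      // generalizeKnowledge is not shown in this proposal; it is expected to
      // be added to KnowledgeEvolutionManager alongside refineKnowledge.
      return this.knowledgeEvolution.generalizeKnowledge(
        knowledgeId,
        targetProjectId
      );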
    } else if (options.adaptToContext) {
      return this.adaptKnowledgeToContext(
        sourceKnowledge,
        targetProjectId,
        options.adaptToContext
      );
    } else {
      return this.directShareKnowledge(
        sourceKnowledge,
        targetProjectId
      );
    }
  }

  private async adaptKnowledgeToContext(
    knowledge: KnowledgeEntry,
    targetProjectId: string,
    targetContext: KnowledgeContext
  ): Promise<KnowledgeEntry> {
    const adaptationPrompt = `
      Adapt this knowledge to the target context:

      Original Knowledge:
      ${knowledge.content}

      Original Context:
      Project: ${knowledge.projectId}
      Files: ${knowledge.context?.relevantFiles?.join(', ') || 'None'}

      Target Context:
      Project: ${targetProjectId}
      Files: ${targetContext.relevantFiles?.join(', ') || 'None'}

      Create an adapted version that:
      - Maintains core concepts
      - Updates project-specific references
      - Adjusts file paths and names
      - Preserves accuracy

      Format: "Adapted Knowledge:\n[adapted content]"
    `;

    const response = await this.llmClient.chat([
      {role: 'system', content: adaptationPrompt}
    ], {});

    const adaptedContent = this.extractAdaptedContent(response.messages[0].content);

    return this.knowledgeManager.addKnowledge({
      content: adaptedContent,
      projectId: targetProjectId,
      source: 'adapted',
      tags: [...(knowledge.tags || []), 'adapted'],
      context: targetContext,
      metadata: {
        adaptedFrom: knowledge.id,
        originalProject: knowledge.projectId,
        adaptationTimestamp: new Date()
      }
    });
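  }

  // The two helpers below are called above but were missing from the sketch.

  // Copies the entry into the target project unchanged, recording its origin.
  private directShareKnowledge(
    knowledge: KnowledgeEntry,
    targetProjectId: string
  ): KnowledgeEntry {
    return this.knowledgeManager.addKnowledge({
      content: knowledge.content,
      projectId: targetProjectId,
      source: 'shared',
      tags: [...(knowledge.tags || []), 'shared'],
      context: knowledge.context,
      metadata: {
        ...knowledge.metadata,
        sharedFrom: knowledge.id,
        originalProject: knowledge.projectId
      }
    });
  }

  // Mirrors extractRefinedContent in KnowledgeEvolutionManager.
  private extractAdaptedContent(response: string): string {
    const match = response.match(/Adapted Knowledge:\n(.+)/s);
    return match?.[1]?.trim() || response;
  }
}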
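
The classes above reference `SearchableKnowledge`, `KnowledgeFeedback`, `ProjectMetadata`, and `LLMClient` without defining them. Below is a minimal sketch of what they might look like, with shapes inferred only from how the code above uses them. The `type` values on `KnowledgeFeedback`, the `ProjectMetadata` fields, `ChatMessage`, and `createStubClient` are assumptions, and Nanocoder's real client interface may differ.

```typescript
// Search index record; fields mirror the object built in
// KnowledgeSearchManager.addKnowledge.
interface SearchableKnowledge {
  id: string;
  projectId: string;
  content: string;
  tags: string[];
  tokens: string[];
  timestamp: Date;
}

// Feedback passed to refineKnowledge; its prompt reads comment, type, quality.
interface KnowledgeFeedback {
  comment: string;
  type: 'correction' | 'clarification' | 'addition'; // assumed values
  quality: 1 | 2 | 3 | 4 | 5; // rendered in the prompt as "N/5"
}

// Assumed fields: the proposal never shows what this holds.
interface ProjectMetadata {
  name?: string;
  rootPath?: string;
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Only the surface the managers call: chat(messages, options) resolving to
// an object whose messages[0].content holds the reply text.
interface LLMClient {
  chat(
    messages: ChatMessage[],
    options: Record<string, unknown>
  ): Promise<{messages: ChatMessage[]}>;
}

// A canned stub lets the evolution and sharing managers run without a model.
function createStubClient(reply: string): LLMClient {
  const message: ChatMessage = {role: 'assistant', content: reply};
  return {
    chat: async () => ({messages: [message]})
  };
}
```

Passing `createStubClient('Refined Knowledge:\nUse pnpm 9')` to `KnowledgeEvolutionManager` would exercise the `extractRefinedContent` parsing path in a unit test without any network or local model.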

Integration Points

  • Checkpoint System: Enhance source/services/checkpoint-manager.ts with knowledge persistence
  • Usage Tracking: Integrate with source/usage/storage.ts and source/usage/tracker.ts
  • Prompt History: Enhance source/prompt-history.ts with knowledge integration
  • Conversation Loop: Integrate with source/hooks/chat-handler/conversation/conversation-loop.tsx
  • Tool System: Connect with tool execution for knowledge extraction
  • UI Components: Add knowledge visualization to existing UI

Files to Modify/Create

  • source/knowledge/types.ts (new) - Knowledge data structures
  • source/knowledge/basic-knowledge-manager.ts (new) - Basic knowledge operations
  • source/knowledge/knowledge-search-manager.ts (new) - Advanced search capabilities
  • source/knowledge/knowledge-evolution-manager.ts (new) - AI-powered refinement
  • source/knowledge/knowledge-sharing-manager.ts (new) - Cross-project sharing
  • source/knowledge/knowledge-persistence-manager.ts (new) - State management
  • source/knowledge/knowledge-visualization.tsx (new) - UI components
  • source/services/checkpoint-manager.ts (enhance) - Knowledge integration
  • source/hooks/chat-handler/conversation/conversation-loop.tsx (enhance) - Knowledge extraction
  • source/components/knowledge/knowledge-search-display.tsx (new) - Search UI
  • source/components/knowledge/knowledge-detail-view.tsx (new) - Detail UI
  • source/components/knowledge/project-knowledge-summary.tsx (new) - Summary UI

Alternatives Considered

  1. Simple Knowledge Storage: Considered but rejected for limited search capabilities
  2. Basic Tagging Only: Rejected for lack of intelligent discovery
  3. External Knowledge Base: Rejected for complexity and dependency concerns
  4. Monolithic Knowledge System: Rejected for poor maintainability and scalability

Additional Context

  • I have searched existing issues to ensure this is not a duplicate
  • This feature aligns with the project's goals (local-first AI assistance)
  • The implementation considers local LLM performance constraints
  • Memory efficiency is prioritized for local usage

Performance Considerations

  • Efficient indexing algorithms for local LLM constraints
  • Memory-optimized data structures for knowledge storage
  • Incremental search indexing to minimize memory usage
  • Lightweight tokenization for search functionality
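
One way to read "incremental search indexing" is an inverted index that maps each token to the ids of the entries containing it and is updated on every add and remove. A query then touches only entries that share at least one token with it instead of scanning the whole store. The sketch below is illustrative: `InvertedIndex` and its methods are hypothetical names, and the score (fraction of distinct query tokens matched) follows the same idea as `calculateRelevanceScore`.

```typescript
class InvertedIndex {
  private postings = new Map<string, Set<string>>(); // token -> entry ids
  private docTokens = new Map<string, Set<string>>(); // entry id -> its tokens

  private tokenize(text: string): string[] {
    return text
      .toLowerCase()
      .split(/\s+/)
      .map(token => token.replace(/[^\w-]/g, ''))
      .filter(token => token.length > 2);
  }

  add(id: string, content: string): void {
    this.remove(id); // re-adding an id replaces its old tokens
    const tokens = new Set<string>(this.tokenize(content));
    this.docTokens.set(id, tokens);
    for (const token of tokens) {
      let ids = this.postings.get(token);
      if (!ids) {
        ids = new Set<string>();
        this.postings.set(token, ids);
      }
      ids.add(id);
    }
  }

  remove(id: string): void {
    const tokens = this.docTokens.get(id);
    if (!tokens) return;
    for (const token of tokens) {
      const ids = this.postings.get(token);
      if (!ids) continue;
      ids.delete(id);
      if (ids.size === 0) this.postings.delete(token); // keep the map compact
    }
    this.docTokens.delete(id);
  }

  search(query: string, limit = 10): Array<{id: string; score: number}> {
    const queryTokens = Array.from(new Set<string>(this.tokenize(query)));
    if (queryTokens.length === 0) return [];

    // Count how many distinct query tokens each candidate entry contains
    const hits = new Map<string, number>();
    for (const token of queryTokens) {
      for (const id of this.postings.get(token) ?? []) {
        hits.set(id, (hits.get(id) ?? 0) + 1);
      }
    }

    return Array.from(hits, ([id, count]) => ({id, score: count / queryTokens.length}))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}
```

Calling `remove` from the eviction path would also keep the index from pointing at entries that `cleanupOldKnowledge` has already deleted. The trade-off is a second map of tokens per entry, which costs memory in exchange for cheap removal.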

Local LLM Adaptations

  • Simple relevance scoring algorithms
  • Efficient search data structures
  • Lightweight knowledge refinement
  • Resource-aware knowledge operations

Knowledge Management Benefits

  • Centralized project knowledge base with search capabilities
  • Intelligent knowledge discovery with semantic understanding
  • AI-powered knowledge evolution and refinement
  • Cross-project knowledge sharing and adaptation
  • Context-aware knowledge preservation across sessions

Implementation Notes (optional)

Key Integration Points

  • Integrate with existing checkpoint system for persistence
  • Connect to conversation loop for knowledge extraction
  • Enhance tool execution with knowledge capture
  • Add to UI components for knowledge visualization
  • Connect with usage tracking for analytics

Testing Strategy

  • Unit tests for knowledge search algorithms
  • Integration tests for knowledge persistence
  • Performance tests for search operations
  • Memory usage monitoring for knowledge storage
  • Knowledge quality assessment testing
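
As an illustration of the unit tests for the search algorithm, the scoring logic can be pulled out and checked with a small table of cases. This is a sketch only: `relevanceScore` is a standalone copy of `calculateRelevanceScore` with an added empty-query guard, and `cases` and `runCases` are hypothetical helpers, not the project's test setup.

```typescript
// Standalone copy of the scoring logic so it can be tested without the managers.
function relevanceScore(documentTokens: string[], queryTokens: string[]): number {
  if (queryTokens.length === 0) return 0; // without this guard the result is 0/0 = NaN
  const doc = new Set<string>(documentTokens);
  return queryTokens.filter(token => doc.has(token)).length / queryTokens.length;
}

interface ScoreCase {
  name: string;
  doc: string[];
  query: string[];
  expected: number;
}

const cases: ScoreCase[] = [
  {name: 'all query tokens present', doc: ['run', 'pnpm', 'test'], query: ['pnpm', 'test'], expected: 1},
  {name: 'half of the query tokens present', doc: ['run', 'pnpm'], query: ['pnpm', 'docker'], expected: 0.5},
  {name: 'no overlap', doc: ['run', 'pnpm'], query: ['docker'], expected: 0},
  {name: 'empty query', doc: ['run', 'pnpm'], query: [], expected: 0},
  {name: 'empty document', doc: [], query: ['pnpm'], expected: 0}
];

// Returns one message per failing case; an empty array means all cases pass.
function runCases(): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = relevanceScore(c.doc, c.query);
    if (actual !== c.expected) {
      failures.push(`${c.name}: expected ${c.expected}, got ${actual}`);
    }
  }
  return failures;
}
```

The same table could be fed to whichever test runner the project uses. The empty-query case is worth keeping because `searchKnowledge` currently relies on an early return to avoid dividing by zero.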

Migration Path

  • All new features will be optional and backward compatible
  • Existing checkpoint system remains as fallback
  • Gradual rollout with feature flags
  • User preferences for knowledge management features
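
The feature-flag rollout could be expressed as a small flags object that defaults to everything off and is merged with user preferences. A minimal sketch, assuming hypothetical flag names (`knowledgeBase`, `semanticSearch`, `aiRefinement`, `crossProjectSharing`) and a hypothetical `resolveFlags` helper:

```typescript
// Hypothetical flag names, one per capability in the proposal's phases.
interface KnowledgeFeatureFlags {
  knowledgeBase: boolean;
  semanticSearch: boolean;
  aiRefinement: boolean;
  crossProjectSharing: boolean;
}

// Everything is off by default, so existing checkpoint behaviour is unchanged.
const DEFAULT_FLAGS: Readonly<KnowledgeFeatureFlags> = Object.freeze({
  knowledgeBase: false,
  semanticSearch: false,
  aiRefinement: false,
  crossProjectSharing: false
});

function resolveFlags(
  userPrefs: Partial<KnowledgeFeatureFlags> = {}
): KnowledgeFeatureFlags {
  const flags: KnowledgeFeatureFlags = {...DEFAULT_FLAGS, ...userPrefs};
  // The other features depend on the knowledge base, so disabling it
  // disables everything regardless of the remaining preferences.
  if (!flags.knowledgeBase) return {...DEFAULT_FLAGS};
  return flags;
}
```

Call sites would then check a single flag before touching the knowledge managers, for example `if (resolveFlags(prefs).semanticSearch) { ... }`, and fall back to the checkpoint system otherwise.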

Success Metrics

  • Knowledge Entry Rate: 50+ knowledge entries per active project
  • Search Performance: <50ms for typical knowledge searches
  • Retrieval Accuracy: 80%+ relevance for top search results
  • Memory Usage: Keep knowledge storage under 10MB per project
  • User Adoption: 60%+ of users utilizing knowledge features
  • Knowledge Reuse: 40%+ reduction in redundant information requests
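
The search latency and memory targets can be checked mechanically. The sketch below is one rough way to do so, under stated assumptions: `estimateBytes` uses serialized JSON length times two bytes per UTF-16 code unit as a proxy for size, not actual heap usage, and `averageSearchMs` relies on `Date.now()`, whose millisecond resolution makes it meaningful only over many runs. All names here are hypothetical.

```typescript
const MAX_PROJECT_BYTES = 10 * 1024 * 1024; // 10MB per-project target
const MAX_SEARCH_MS = 50; // search latency target

// Rough size proxy: JSON length times 2 bytes per UTF-16 code unit.
function estimateBytes(entries: ReadonlyArray<unknown>): number {
  return JSON.stringify(entries).length * 2;
}

// Average wall-clock time of one search call over a number of runs.
function averageSearchMs(search: () => unknown, runs = 100): number {
  const start = Date.now();
  for (let i = 0; i < runs; i++) search();
  return (Date.now() - start) / runs;
}

interface BudgetReport {
  bytes: number;
  avgSearchMs: number;
  withinMemory: boolean;
  withinLatency: boolean;
}

function checkBudgets(
  entries: ReadonlyArray<unknown>,
  search: () => unknown
): BudgetReport {
  const bytes = estimateBytes(entries);
  const avgSearchMs = averageSearchMs(search);
  return {
    bytes,
    avgSearchMs,
    withinMemory: bytes <= MAX_PROJECT_BYTES,
    withinLatency: avgSearchMs < MAX_SEARCH_MS
  };
}
```

In a performance test, `search` would be something like `() => manager.searchKnowledge('typical query', {projectId})` run against a store filled to `maxEntriesPerProject`.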
