feat: implement citations for search results (Issue #633) #2009

Open
saurabh-G07 wants to merge 1 commit into topoteretes:main from saurabh-G07:feat/citations-issue-633

saurabh-G07 commented Jan 20, 2026

Description

Addresses Issue #633, "Citations or References When Querying".
This PR adds a citations field to the SearchResult object and implements logic in prepare_search_result to extract source node metadata (file paths, IDs) from the context graph.

Changes

  • Updated SearchResult model to include citations.
  • Modified prepare_search_result to extract citation metadata from context Edges.
  • Updated search method to propagate citations to the response.

Verification

  • Verified with a reproduction script that citations are correctly extracted from node attributes.

Summary by CodeRabbit

  • New Features
    • Search results now include citations displaying source attribution information, text excerpts, and associated metadata for each retrieved result.

@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

github-actions bot left a comment

Hello @saurabh-G07, thank you for submitting a PR! We will respond as soon as possible.

coderabbitai bot (Contributor) commented Jan 20, 2026

Walkthrough

The changes extend the search results functionality by introducing a citations field throughout the search pipeline. A new optional citations field is added to the SearchResult type, extracted from context edges within the preparation utility by deduplicating nodes and building citation objects, and propagated through the search method's output dictionaries.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Citations field addition: cognee/modules/search/types/SearchResult.py, cognee/modules/search/methods/search.py, cognee/modules/search/utils/prepare_search_result.py | Introduces an optional citations field to the SearchResult model, retrieves and propagates citations through the search output dictionaries, and extracts citation objects from context edges with deduplication by node id and metadata enrichment. |
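
The extraction step summarized above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Node` and `Edge` dataclasses, the `extract_citations` helper, and the `text` / `file_path` attribute names are all hypothetical stand-ins, and the `id` / `text` / `metadata` shape of each citation is an assumption based on the summary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    # Hypothetical stand-in for a graph node; not the project's class.
    id: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    # Hypothetical stand-in for a context edge joining two nodes.
    node1: Node
    node2: Node


def extract_citations(edges: List[Edge]) -> List[Dict[str, Any]]:
    """Build one citation per distinct node id, in first-seen order."""
    seen = set()
    citations = []
    for edge in edges:
        for node in (edge.node1, edge.node2):
            if node.id in seen:
                continue  # deduplicate by node id
            seen.add(node.id)
            attrs = node.attributes
            citations.append({
                "id": node.id,
                "text": attrs.get("text", ""),
                # Everything except the excerpt is carried as metadata.
                "metadata": {k: v for k, v in attrs.items() if k != "text"},
            })
    return citations


doc = Node("n1", {"text": "Alpha", "file_path": "docs/a.md"})
chunk = Node("n2", {"text": "Beta"})
edges = [Edge(doc, chunk), Edge(chunk, doc)]
citations = extract_citations(edges)
```

Here the two edges mention the same pair of nodes, so only two citations come back, one per node id.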

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is incomplete: the Acceptance Criteria, Type of Change, Pre-submission Checklist, and DCO Affirmation sections required by the template are all missing. | Add the missing required sections: Acceptance Criteria (with verification steps), Type of Change checkbox, Pre-submission Checklist items, and DCO Affirmation statement, to comply with the repository's template. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: implement citations for search results (Issue #633)' clearly and specifically summarizes the main change: adding citations functionality to search results. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cognee/modules/search/methods/search.py (2)

143-164: Citations not propagated in use_combined_context branch.

The CombinedSearchResult is returned without the citations field that was extracted by prepare_search_result. This creates inconsistent behavior where citations are only available when use_combined_context=False and backend_access_control_enabled() is True.

To fix this, the CombinedSearchResult model needs a citations field, and the citations should be propagated here:

🐛 Proposed fix

First, update CombinedSearchResult in SearchResult.py:

class CombinedSearchResult(BaseModel):
    result: Optional[Any]
    context: Dict[str, Any]
    graphs: Optional[Dict[str, Any]] = {}
    datasets: Optional[List[SearchResultDataset]] = None
    citations: Optional[List[Dict[str, Any]]] = None

Then, propagate citations in this branch:

         prepared_search_results = await prepare_search_result(
             search_results[0] if isinstance(search_results, list) else search_results
         )
         result = prepared_search_results["result"]
         graphs = prepared_search_results["graphs"]
         context = prepared_search_results["context"]
         datasets = prepared_search_results["datasets"]
+        citations = prepared_search_results.get("citations", [])

         return CombinedSearchResult(
             result=result,
             graphs=graphs,
             context=context,
             datasets=[
                 SearchResultDataset(
                     id=dataset.id,
                     name=dataset.name,
                 )
                 for dataset in datasets
             ],
+            citations=citations,
         )

206-220: Citations not propagated when access control is disabled.

When backend_access_control_enabled() returns False, citations are not included in the return value, creating inconsistent behavior compared to the access-control-enabled path.

🐛 Proposed fix to include citations
         else:
             return_value = []
             if only_context:
                 for search_result in search_results:
                     prepared_search_results = await prepare_search_result(search_result)
-                    return_value.append(prepared_search_results["context"])
+                    return_value.append({
+                        "context": prepared_search_results["context"],
+                        "citations": prepared_search_results.get("citations", []),
+                    })
             else:
                 for search_result in search_results:
-                    result, context, datasets = search_result
-                    return_value.append(result)
+                    prepared_search_results = await prepare_search_result(search_result)
+                    return_value.append({
+                        "result": prepared_search_results["result"],
+                        "citations": prepared_search_results.get("citations", []),
+                    })

Note: This change would alter the return type for the non-access-control path. If backward compatibility is critical, consider documenting this as expected behavior or adding a flag to opt-in to citations.
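
One way the opt-in idea in the note could look is sketched below. It is an assumption-laden illustration only: `format_results` and the `include_citations` flag are hypothetical names that do not exist in the codebase, and the input is assumed to be a list of already-prepared dictionaries with `result` and optional `citations` keys.

```python
from typing import Any, Dict, List, Union


def format_results(
    prepared: List[Dict[str, Any]],
    include_citations: bool = False,  # hypothetical opt-in flag
) -> List[Union[Any, Dict[str, Any]]]:
    """Return bare results by default; wrap them with citations on request."""
    if not include_citations:
        # Legacy shape: a flat list of result values, unchanged for old callers.
        return [item["result"] for item in prepared]
    return [
        {"result": item["result"], "citations": item.get("citations", [])}
        for item in prepared
    ]


prepared = [
    {"result": "answer A", "citations": [{"id": "n1"}]},
    {"result": "answer B"},
]
legacy = format_results(prepared)
with_citations = format_results(prepared, include_citations=True)
```

With the flag defaulting to off, existing callers keep receiving a flat list, and only callers that ask for citations see the dictionary shape.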

🧹 Nitpick comments (2)
cognee/modules/search/utils/prepare_search_result.py (1)

11-12: Add a docstring to document the function's purpose and return structure.

Per coding guidelines, undocumented function definitions are assumed incomplete. A brief docstring describing the input/output structure and the new citations field would improve maintainability.

async def prepare_search_result(search_result):
    """
    Prepare search result payload with graphs, context, and citations.
    
    Args:
        search_result: Tuple of (results, context, datasets) from search operations.
        
    Returns:
        Dict containing result, graphs, context, datasets, and citations extracted
        from Edge nodes in the context.
    """
cognee/modules/search/types/SearchResult.py (1)

22-22: Consider defining a dedicated Citation model for better type safety.

Using Dict[str, Any] works but loses the benefits of Pydantic validation. A dedicated model would make the API contract explicit and enable validation:

♻️ Optional: Define a Citation model
from typing import Any, Dict, List, Optional
from uuid import UUID

from pydantic import BaseModel


class Citation(BaseModel):
    id: str
    text: str
    metadata: Dict[str, Any] = {}


class SearchResult(BaseModel):
    search_result: Any
    dataset_id: Optional[UUID]
    dataset_name: Optional[str]
    citations: Optional[List[Citation]] = None

This can be deferred if you prefer to keep the initial implementation flexible.
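
To illustrate what a typed citation buys over a loose dict, here is a small standard-library sketch. It deliberately uses a dataclass with manual checks as a stand-in for the Pydantic model, so it runs without third-party packages; `CitationRecord` and `parse_citation` are hypothetical names, and Pydantic's own validation behaves differently in detail (for example, it raises its own `ValidationError`).

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class CitationRecord:
    # Hypothetical stdlib analogue of the suggested Citation model.
    id: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Dataclasses do not check types at runtime, so check by hand.
        if not isinstance(self.id, str) or not isinstance(self.text, str):
            raise TypeError("id and text must be strings")
        if not isinstance(self.metadata, dict):
            raise TypeError("metadata must be a dict")


def parse_citation(raw: Dict[str, Any]) -> CitationRecord:
    """Convert a loose dict into a checked record; missing or unknown keys raise TypeError."""
    return CitationRecord(**raw)


ok = parse_citation(
    {"id": "n1", "text": "Alpha", "metadata": {"file_path": "docs/a.md"}}
)

try:
    parse_citation({"id": "n2"})  # "text" is missing
    missing_field_rejected = False
except TypeError:
    missing_field_rejected = True
```

A malformed citation fails at construction time instead of surfacing later in whatever code consumes the response.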

Vasilije1990 (Contributor) commented

@saurabh-G07 thank you for the contribution. Please check our CONTRIBUTING.md, correct the PR, and provide screenshots of the tests passing.

saurabh-G07 (Author) commented Jan 22, 2026 via email

Vasilije1990 requested a review from lxobr on January 31, 2026, 08:01

Vasilije1990 (Contributor) commented

@lxobr can you help me with this PR and issue so we can finally add the feature? Please check the issue and, if it is not a good fit, propose a ticket for contributors based on our work with clients.
