feat: implement citations for search results (Issue #633) #2009
saurabh-G07 wants to merge 1 commit into topoteretes:main
Please make sure all the checkboxes are checked:
Hello @saurabh-G07, thank you for submitting a PR! We will respond as soon as possible.
Walkthrough
The changes extend the search results functionality by introducing a `citations` field throughout the search pipeline. A new optional `citations` field is added to the `SearchResult` type. The preparation utility extracts it from context edges by deduplicating nodes and building citation objects, and the search method propagates it through its output dictionaries.
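The extraction step described in the walkthrough could look roughly like the sketch below. It is only an illustration: `Node`, `Edge`, `extract_citations`, and the `text` attribute key are hypothetical stand-ins, not the classes or names used in this PR or in cognee.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    # Hypothetical stand-in for a graph node; not cognee's actual class.
    id: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    # Hypothetical stand-in for a context edge joining two nodes.
    node1: Node
    node2: Node


def extract_citations(edges: List[Edge]) -> List[Dict[str, Any]]:
    """Collect one citation per unique node, preserving first-seen order."""
    seen_ids = set()
    citations: List[Dict[str, Any]] = []
    for edge in edges:
        for node in (edge.node1, edge.node2):
            if node.id in seen_ids:
                # A node shared by several edges is cited only once.
                continue
            seen_ids.add(node.id)
            citations.append(
                {
                    "id": node.id,
                    # Assumed attribute name; empty string when absent.
                    "text": node.attributes.get("text", ""),
                    # Everything else (e.g. a file path) goes to metadata.
                    "metadata": {
                        key: value
                        for key, value in node.attributes.items()
                        if key != "text"
                    },
                }
            )
    return citations
```

Tracking seen node IDs in a set keeps deduplication linear in the number of edges, and appending on first sight keeps the citation order stable across runs.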
Estimated code review effort: 2 (Simple), ~12 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cognee/modules/search/methods/search.py (2)
143-164: Citations not propagated in `use_combined_context` branch.

The `CombinedSearchResult` is returned without the `citations` field that was extracted by `prepare_search_result`. This creates inconsistent behavior where citations are only available when `use_combined_context=False` and `backend_access_control_enabled()` is `True`.

To fix this, the `CombinedSearchResult` model needs a `citations` field, and the citations should be propagated here:

🐛 Proposed fix

First, update `CombinedSearchResult` in `SearchResult.py`:

```python
class CombinedSearchResult(BaseModel):
    result: Optional[Any]
    context: Dict[str, Any]
    graphs: Optional[Dict[str, Any]] = {}
    datasets: Optional[List[SearchResultDataset]] = None
    citations: Optional[List[Dict[str, Any]]] = None
```

Then, propagate citations in this branch:

```diff
 prepared_search_results = await prepare_search_result(
     search_results[0] if isinstance(search_results, list) else search_results
 )
 result = prepared_search_results["result"]
 graphs = prepared_search_results["graphs"]
 context = prepared_search_results["context"]
 datasets = prepared_search_results["datasets"]
+citations = prepared_search_results.get("citations", [])

 return CombinedSearchResult(
     result=result,
     graphs=graphs,
     context=context,
     datasets=[
         SearchResultDataset(
             id=dataset.id,
             name=dataset.name,
         )
         for dataset in datasets
     ],
+    citations=citations,
 )
```
206-220: Citations not propagated when access control is disabled.

When `backend_access_control_enabled()` returns `False`, citations are not included in the return value, creating inconsistent behavior compared to the access-control-enabled path.

🐛 Proposed fix to include citations

```diff
 else:
     return_value = []
     if only_context:
         for search_result in search_results:
             prepared_search_results = await prepare_search_result(search_result)
-            return_value.append(prepared_search_results["context"])
+            return_value.append({
+                "context": prepared_search_results["context"],
+                "citations": prepared_search_results.get("citations", []),
+            })
     else:
         for search_result in search_results:
-            result, context, datasets = search_result
-            return_value.append(result)
+            prepared_search_results = await prepare_search_result(search_result)
+            return_value.append({
+                "result": prepared_search_results["result"],
+                "citations": prepared_search_results.get("citations", []),
+            })
```

Note: This change would alter the return type for the non-access-control path. If backward compatibility is critical, consider documenting this as expected behavior or adding a flag to opt in to citations.
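One way to keep the existing return shape while still exposing citations is an opt-in flag, as the note suggests. The sketch below is only an illustration under stated assumptions: `format_results` and `include_citations` are hypothetical names, not part of cognee's API, and the input is assumed to be a list of already prepared result dictionaries.

```python
from typing import Any, Dict, List, Union


def format_results(
    prepared_results: List[Dict[str, Any]],
    include_citations: bool = False,  # hypothetical opt-in flag
) -> List[Union[Any, Dict[str, Any]]]:
    """Return bare results by default; wrap them with citations on request."""
    if not include_citations:
        # Existing callers keep receiving the plain result values.
        return [prepared["result"] for prepared in prepared_results]
    return [
        {
            "result": prepared["result"],
            # Fall back to an empty list when no citations were extracted.
            "citations": prepared.get("citations", []),
        }
        for prepared in prepared_results
    ]
```

With the flag defaulting to `False`, callers that expect a plain list of results are unaffected, and only callers that pass `include_citations=True` see the dictionary form.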
🧹 Nitpick comments (2)
cognee/modules/search/utils/prepare_search_result.py (1)
11-12: Add a docstring to document the function's purpose and return structure.

Per coding guidelines, undocumented function definitions are assumed incomplete. A brief docstring describing the input/output structure and the new `citations` field would improve maintainability.

```python
async def prepare_search_result(search_result):
    """
    Prepare search result payload with graphs, context, and citations.

    Args:
        search_result: Tuple of (results, context, datasets) from search operations.

    Returns:
        Dict containing result, graphs, context, datasets, and citations
        extracted from Edge nodes in the context.
    """
```

cognee/modules/search/types/SearchResult.py (1)
22-22: Consider defining a dedicated `Citation` model for better type safety.

Using `Dict[str, Any]` works but loses the benefits of Pydantic validation. A dedicated model would make the API contract explicit and enable validation:

♻️ Optional: Define a Citation model

```python
class Citation(BaseModel):
    id: str
    text: str
    metadata: Dict[str, Any] = {}


class SearchResult(BaseModel):
    search_result: Any
    dataset_id: Optional[UUID]
    dataset_name: Optional[str]
    citations: Optional[List[Citation]] = None
```

This can be deferred if you prefer to keep the initial implementation flexible.
@saurabh-G07 thank you for the contribution. Please check our CONTRIBUTING.md, correct the PR, and provide screenshots of the tests passing.
Yes, I will share them by end of day tomorrow.
Best regards
Saurabh Ghundre
On Wed, 21 Jan 2026, 1:12 pm, Vasilije (Vasilije1990) wrote:
> @saurabh-G07 thank you for the contribution. Please check our CONTRIBUTING.md and correct the PR + provide screenshots of tests passing
@lxobr can you help me with this PR and issue so we can finally add the feature? Please check the issue and, if it is not a good fit, propose a ticket for contributors based on our work with clients.
Description
Addresses Issue #633 "Citations or References When Querying".
This PR adds a `citations` field to the `SearchResult` object and implements logic in `prepare_search_result` to extract source node metadata (file paths, IDs) from the context graph.
Changes
- `SearchResult` model to include `citations`.
- `prepare_search_result` to extract citation metadata from context Edges.
- `search` method to propagate citations to the response.
Verification