Skip to content

feat(website): add feature to search multiple string fields#5881

Open
chaoran-chen wants to merge 3 commits intomainfrom
multifieldsearch
Open

feat(website): add feature to search multiple string fields#5881
chaoran-chen wants to merge 3 commits intomainfrom
multifieldsearch

Conversation

@chaoran-chen
Copy link
Member

@chaoran-chen chaoran-chen commented Jan 26, 2026

resolves nothing

This PR adds a feature that allows the configuration of search fields that search multiple fields by constructing a LAPIS advanced query. In the preview, there are now two new search fields

  • identifier: searches accessionVersion, submissionId, insdcAccessionFull, bioprojectAccession, gcaAccession, biosampleAccession, insdcRawReadsAccession
  • contributor: searches authors, authorAffiliations, sequencedByOrganization, sequencedByContactName, submitter, groupName

This can be used to resolve [gap to prevent automated linking] pathoplexus/pathoplexus#766.

Screenshot

image

PR Checklist

  • [ ] All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)
    • Filter WNV (here) (also verified that refreshing the page correctly fills the form), download metadata, copy download download link, linkout to Nextclade

🚀 Preview: https://multifieldsearch.loculus.org

@chaoran-chen chaoran-chen self-assigned this Jan 26, 2026
@chaoran-chen chaoran-chen added the preview Triggers a deployment to argocd label Jan 26, 2026
@claude claude bot added website Tasks related to the web application deployment Code changes targetting the deployment infrastructure labels Jan 26, 2026
@claude
Copy link
Contributor

claude bot commented Jan 26, 2026

This PR may be related to #1985 (Search: standardised approach for multi-entry search)

@theosanderson
Copy link
Member

Very cool - but I think this actually doesn't relate to #1985 unless I misunderstand: the idea there is about the type of search where you can enter multiple accessions and you're not trying to find all of them in a single sequence entry, but each one in a different sequence entry

@chaoran-chen
Copy link
Member Author

aah, right, I misread, thanks @theosanderson!

@chaoran-chen chaoran-chen force-pushed the multifieldsearch branch 2 times, most recently from 46cd32e to 8fed763 Compare January 26, 2026 21:28
@chaoran-chen chaoran-chen marked this pull request as ready for review January 26, 2026 21:52
@claude
Copy link
Contributor

claude bot commented Jan 26, 2026

Claude finished @chaoran-chen's task in 3m 9s —— View job


Code Review Complete ✓

I've reviewed this PR and found it to be a well-implemented feature with comprehensive testing. The implementation is clean and follows existing patterns in the codebase. Below is my detailed assessment:


Summary

This PR adds multi-field search functionality that allows users to search across multiple metadata fields using a single input. Two search fields are configured:

  • identifier: searches 7 accession-related fields
  • contributor: searches 6 contributor-related fields

The implementation constructs LAPIS advanced queries using regex patterns to search across multiple fields with OR logic.


Strengths

Good test coverage: Comprehensive Playwright tests covering search, URL params, filter removal, and downloads
Schema validation: Proper Zod schema definitions and JSON schema validation
Follows existing patterns: Integrates well with existing search infrastructure
URL state management: Correctly syncs with URL query parameters
User experience: Active filters display properly, fields can be removed via UI
Accessibility: Uses DisabledUntilHydrated to prevent Playwright race conditions


Issues Found

I've left inline comments on specific areas that need attention:

Security (1 issue):

  • Potential regex injection risk in field name handling (SequenceFilters.tsx:131)

Code Quality (3 issues):

  • Code duplication in search.ts query extraction logic
  • Advanced query construction could be extracted to a separate method
  • Performance optimization opportunity for Set creation

Testing (3 issues):

  • Hard-coded timeouts in Playwright tests (multiple locations)
  • Fragile CSV assertion that could break with format changes

Documentation (2 issues):

  • Missing documentation on regex escaping logic
  • Configuration could benefit from usage examples

Accessibility (1 issue):

  • Missing aria-describedby for multi-field search context

Recommendations

Must address before merge:

  1. Review the regex injection security concern and add validation for field names
  2. Replace waitForTimeout() calls with deterministic waits in tests

Should address:
3. Extract advanced query construction to a separate method for testability
4. Add documentation to makeCaseInsensitiveLiteralSubstringRegex explaining security measures
5. Make CSV download assertion more robust

Nice to have:
6. Add aria-describedby to help screen reader users understand which fields are searched
7. Add configuration comments explaining the multi-field search feature
8. Extract duplicate query extraction logic in search.ts


Testing Notes

The PR includes thorough integration tests that verify:

  • Search functionality with both identifier and contributor fields
  • URL parameter synchronization
  • Filter removal via UI
  • Metadata download with filtered results

Manual testing was performed on the preview environment for WNV searches.


@theosanderson
Copy link
Member

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: affc28b6cb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +129 to +126
if (value && typeof value === 'string' && value.trim()) {
const regex = makeCaseInsensitiveLiteralSubstringRegex(value.trim());
const fieldQueries = mfs.fields.map((f) => `${f}.regex='${regex}'`);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Escape quotes in advancedQuery regex literals

The new multi-field search builds advancedQuery by interpolating the user’s raw input into a single-quoted string (${f}.regex='${regex}'). makeCaseInsensitiveLiteralSubstringRegex escapes regex metacharacters but does not escape single quotes, so inputs containing apostrophes (e.g., “O'Connor”, “King’s College”) will terminate the string literal and produce a malformed advancedQuery, causing the search to fail or be parsed incorrectly. Please escape ' (or switch to a safer encoding/quoting strategy) before embedding user input in the advanced query string.

Useful? React with 👍 / 👎.

Copy link
Contributor

@anna-parker anna-parker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested this out by pasting an accession into the identifier search field for CCHf and I think it is sadly broken for multi segmented organisms! The page freezes:
image
image

and then I get an axios error

{"message":"Request failed with status code 400","name":"AxiosError","stack":"AxiosError: Request failed with status code 400\n at It (https://multifieldsearch.loculus.org/_astro/createAuthorizationHeader.CbxMgGzV.js:3:1073)\n at XMLHttpRequest.w (https://multifieldsearch.loculus.org/_astro/createAuthorizationHeader.CbxMgGzV.js:3:5793)\n at _.request (https://multifieldsearch.loculus.org/_astro/createAuthorizationHeader.CbxMgGzV.js:5:2079)\n at async Gn.request (https://multifieldsearch.loculus.org/_astro/createAuthorizationHeader.CbxMgGzV.js:11:4348)","config":{"transitional":{"silentJSONParsing":true,"forcedJSONParsing":true,"clarifyTimeoutError":false},"adapter":["xhr","http","fetch"],"transformRequest":[null],"transformResponse":[null],"timeout":0,"xsrfCookieName":"XSRF-TOKEN","xsrfHeaderName":"X-XSRF-TOKEN","maxContentLength":-1,"maxBodyLength":-1,"env":{},"headers":{"Accept":"application/json, text/plain, */*","Content-Type":"application/json"},"baseURL":"https://lapis-multifieldsearch.loculus.org/cchf","method":"post","url":"/sample/details","data":"{\"versionStatus\":\"LATEST_VERSION\",\"isRevocation\":\"false\",\"nucleotideMutations\":[],\"aminoAcidMutations\":[],\"nucleotideInsertions\":[],\"aminoAcidInsertions\":[],\"advancedQuery\":\"(accessionVersion.regex='(?i)PV920624' or submissionId.regex='(?i)PV920624' or insdcAccessionFull.regex='(?i)PV920624' or bioprojectAccession.regex='(?i)PV920624' or gcaAccession.regex='(?i)PV920624' or biosampleAccession.regex='(?i)PV920624' or insdcRawReadsAccession.regex='(?i)PV920624')\",\"fields\":[\"authorAffiliations\",\"authors\",\"geoLocAdmin1\",\"geoLocCountry\",\"hostNameScientific\",\"length_L\",\"length_M\",\"length_S\",\"lineage\",\"ncbiReleaseDate\",\"sampleCollectionDate\",\"accessionVersion\"],\"limit\":100,\"offset\":0,\"orderBy\":[{\"field\":\"sampleCollectionDate\",\"type\":\"descending\"}]}","params":{},"allowAbsoluteUrls":true},"code":"ERR_BAD_REQUEST","status":400}

Update: sorry when I first copied the error I somehow copied the whole page with all the sequence metadata

@chaoran-chen
Copy link
Member Author

Thanks for identifying the bug, @anna-parker! The perSegment=true case wasn't properly handled, this has now been addressed in 472d72a and I also added a test in 0e78b38.

@anna-parker anna-parker dismissed their stale review February 4, 2026 07:46

I am dismissing my blocker :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deployment Code changes targetting the deployment infrastructure preview Triggers a deployment to argocd website Tasks related to the web application

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants