Commit d950ab8

committed
update
1 parent 41c91bb commit d950ab8

2 files changed, +7 -4 lines changed

preprocessing/nextclade/README.md

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@ However, the `preprocessing` field can be customized to take an arbitrary number
 5. `concatenate`: Take multiple metadata fields (including the accessionVersion) and concatenate them in the order specified by the `arg.order` parameter, fields will first be processed based on their `arg.type` (the order of the types should correspond to the order of fields specified by the order argument).
 6. `process_options`: Only accept input that is in `args.options`, this check is case-insensitive. If input value is not in options raise an error, or return null if the submitter is in the "insdc_ingest" group.
 7. `check_regex`: Validate that the input field matches the pattern in `args.pattern`.
-8. `extract_regex`: Extracts a substring from input field using the provided regex `args.pattern` with a `args.capture_group`. For example the pattern `^(?P<segment>[^-]+)-(?P<subtype>[^-]+)$` with capture group `subtype` would extract `HA` from the field `seg1-HA`. Returns an error if the pattern does not match (and internal error if capture group does not exist in pattern).
+8. `extract_regex`: Extracts a substring from input field using the provided regex `args.pattern` with a `args.capture_group`. For example the pattern `^(?P<segment>[^-]+)-(?P<subtype>[^-]+)$` with capture group `subtype` would extract `HA` from the field `seg1-HA`. Returns an error if the pattern does not match (and internal error if capture group does not exist in pattern). If `arg.uppercase` is added the extracted string will be capitalized.
 
 Using these functions in your `values.yaml` will look like:
 
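The `extract_regex` behaviour described in item 8 of this hunk can be sketched in isolation. The helper below is a minimal illustration, not the project's function: `extract_group` and its signature are hypothetical, and only the pattern, the `subtype` capture group and the `seg1-HA` example come from the README text. It also shows that "capitalized" here means `str.upper()` (as in the code change in this commit), not `str.capitalize()`.

```python
import re
from typing import Optional

# Pattern taken from the README example in the diff.
PATTERN = r"^(?P<segment>[^-]+)-(?P<subtype>[^-]+)$"


def extract_group(
    pattern: str, capture_group: str, value: str, uppercase: bool = False
) -> Optional[str]:
    """Hypothetical helper: return the named group, or None if there is no match."""
    match = re.match(pattern, value)
    if match is None:
        return None
    result = match.group(capture_group)
    # str.upper() upper-cases every character; str.capitalize() would
    # only upper-case the first one and lower-case the rest.
    return result.upper() if uppercase else result


print(extract_group(PATTERN, "subtype", "seg1-HA"))                  # HA
print(extract_group(PATTERN, "subtype", "seg1-ha", uppercase=True))  # HA
print("ha".capitalize())                                             # Ha
```

With `uppercase` left at its default, the extracted value keeps the case of the input field.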
preprocessing/nextclade/src/loculus_preprocessing/processing_functions.py

Lines changed: 6 additions & 3 deletions
@@ -862,7 +862,7 @@ def extract_regex(
 ) -> ProcessingResult:
     """
     Extracts a substring from the `regex_field` using the provided regex `pattern`
-    with a `capture_group`.
+    with a `capture_group`, if `uppercase` is set to true the extracted value is capitalized.
     e.g. ^(?P<segment>[^-]+)-(?P<subtype>[^-]+)$ where segment or subtype could be used
     as a capture_group to extract their respective value from the regex_field.
     """
@@ -871,8 +871,9 @@ def extract_regex(
     warnings: list[ProcessingAnnotation] = []
     errors: list[ProcessingAnnotation] = []
 
-    pattern = args["pattern"]
-    capture_group = args["capture_group"]
+    pattern = args.get("pattern")
+    capture_group = args.get("capture_group")
+    uppercase = args.get("uppercase", False)
 
     if not regex_field:
         return ProcessingResult(datum=None, warnings=warnings, errors=errors)
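The switch from `args[...]` to `args.get(...)` in this hunk changes what happens when a key is absent. The sketch below shows the difference, assuming `args` behaves like a plain `dict`; the dictionary contents and the `not_configured` key are made up for illustration.

```python
# Hypothetical args dict standing in for the function's `args` parameter.
args = {
    "pattern": r"^(?P<segment>[^-]+)-(?P<subtype>[^-]+)$",
    "capture_group": "subtype",
}

# Old style: indexing a missing key raises KeyError immediately.
try:
    args["uppercase"]
    missing_key_raises = False
except KeyError:
    missing_key_raises = True

# New style: .get() returns the supplied default (or None) for a missing key.
uppercase = args.get("uppercase", False)
pattern = args.get("pattern")
absent = args.get("not_configured")

print(missing_key_raises, uppercase, absent)  # True False None
```

One consequence is that a missing `pattern` or `capture_group` now yields `None` instead of a `KeyError`, so any validation of those two values has to happen in later code; the lines that would show this are not part of the diff.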
@@ -900,6 +901,8 @@ def extract_regex(
     if match:
         try:
             result = match.group(capture_group)
+            if uppercase:
+                result = result.upper()
             return ProcessingResult(datum=result, warnings=warnings, errors=errors)
         except IndexError:
             errors.append(
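The error path in this hunk relies on `match.group()` raising `IndexError` when the requested group is not defined in the pattern. The following is a simplified sketch of that flow under stated assumptions: `extract_or_error` is a hypothetical function, plain strings stand in for `ProcessingAnnotation`, a tuple stands in for `ProcessingResult`, and the error messages are invented. It additionally guards against an optional group that did not participate in the match, where `match.group()` returns `None` and an unguarded `.upper()` would raise `AttributeError`.

```python
import re
from typing import Optional


def extract_or_error(
    pattern: str, capture_group: str, value: str, uppercase: bool = False
) -> tuple[Optional[str], list[str]]:
    """Hypothetical stand-in returning (datum, errors) instead of ProcessingResult."""
    errors: list[str] = []
    match = re.match(pattern, value)
    if not match:
        errors.append(f"{value!r} does not match {pattern!r}")
        return None, errors
    try:
        result = match.group(capture_group)
    except IndexError:
        # re raises IndexError for a group name that is not in the pattern.
        errors.append(f"capture group {capture_group!r} not in pattern")
        return None, errors
    # An optional group that did not participate yields None, so check first.
    if uppercase and result is not None:
        result = result.upper()
    return result, errors


P = r"^(?P<segment>[^-]+)-(?P<subtype>[^-]+)$"
print(extract_or_error(P, "subtype", "seg1-ha", uppercase=True))
print(extract_or_error(P, "missing", "seg1-ha"))
```

Whether the `None` guard is needed in the real function depends on the patterns configured for it; the README example uses only mandatory groups.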