
Conversation

@LOCEANlloydizard (Collaborator):

This PR introduces the public API and output schema for single target detection! Ref #1532

It focuses on:

  • defining the function signature and dispatcher integration
  • validating required inputs and parameters
  • returning a stable xr.Dataset with a target-based schema
  • issuing warnings when optional metadata are missing

This provides a foundation for upcoming algorithm development and allows early review of the API design, data model, and test coverage!
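
For context, here is a minimal sketch of the kind of target-based schema described above: an xr.Dataset indexed by a target dimension, shown with zero targets as a stub would return it. The variable names below are illustrative placeholders, not the exact schema introduced in this PR.

import numpy as np
import xarray as xr

# Sketch only: a zero-target Dataset with a stable, target-based schema.
# Variable names are placeholders, not the PR's actual output schema.
ds_targets = xr.Dataset(
    data_vars={
        "TS_comp": ("target", np.empty(0, dtype=float)),
        "target_range": ("target", np.empty(0, dtype=float)),
    },
    coords={"target": np.empty(0, dtype=int)},
)
print(ds_targets.sizes)  # Frozen({'target': 0})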

@codecov-commenter commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 53.92405% with 182 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.29%. Comparing base (d371686) to head (a6a8282).
⚠️ Report is 24 commits behind head on main.

Files with missing lines                                  Patch %    Lines
...ype/mask/single_target_detection/detect_matecho.py     52.61%     181 Missing ⚠️
echopype/mask/api.py                                      90.90%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1588      +/-   ##
==========================================
- Coverage   84.95%   83.29%   -1.67%     
==========================================
  Files          79       80       +1     
  Lines        6998     7391     +393     
==========================================
+ Hits         5945     6156     +211     
- Misses       1053     1235     +182     
Flag         Coverage Δ
unittests    83.29% <53.92%> (-1.67%) ⬇️

Flags with carried forward coverage won't be shown.


@leewujung self-requested a review on December 20, 2025 18:04
@LOCEANlloydizard changed the title from "Single target detection API and output schema" to "Single target detection" on Dec 23, 2025
Comment on lines +2139 to +2162
expected_vars = [
"TS_comp",
"TS_uncomp",
"Target_range",
"Target_range_disp",
"Target_range_min",
"Target_range_max",
"idx_r",
"idx_target_lin",
"pulse_env_before_lin",
"pulse_env_after_lin",
"PulseLength_Normalized_PLDL",
"Transmitted_pulse_length",
"Angle_minor_axis",
"Angle_major_axis",
"StandDev_Angles_Minor_Axis",
"StandDev_Angles_Major_Axis",
"Heave",
"Roll",
"Pitch",
"Heading",
"Dist",
"TSfreq_matrix",
]
Member:
changing these to use snake_case?

@leewujung (Member) commented Dec 31, 2025:
and for consistency, how about following the pattern identifier_modifier (see the sketch after this list), so:

  • "target_range_{min, max}"
  • "angles_{minor, major}_axis"
  • "angles_{minor, major}_axis_std_dev"
  • ...
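
As an illustration of that convention, here is a hypothetical rename mapping (a sketch only; the final names are up for discussion) applied to a toy stand-in for the output dataset:

import numpy as np
import xarray as xr

# Toy stand-in for the detection output; only the variable names matter here.
ds_targets = xr.Dataset(
    {
        name: ("target", np.zeros(3))
        for name in [
            "Target_range_min",
            "Target_range_max",
            "Angle_minor_axis",
            "Angle_major_axis",
            "StandDev_Angles_Minor_Axis",
            "StandDev_Angles_Major_Axis",
        ]
    }
)

# Hypothetical identifier_modifier snake_case mapping
rename_map = {
    "Target_range_min": "target_range_min",
    "Target_range_max": "target_range_max",
    "Angle_minor_axis": "angles_minor_axis",
    "Angle_major_axis": "angles_major_axis",
    "StandDev_Angles_Minor_Axis": "angles_minor_axis_std_dev",
    "StandDev_Angles_Major_Axis": "angles_major_axis_std_dev",
}
ds_targets = ds_targets.rename(rename_map)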

Member:
I'll put my comments for the variables themselves in detect_matecho.py

REQ = {
"channel": "chan1",
"pulse_length": 1e-3, # 1 ms dummy
"timeSampleInterval_usec": 40.0, # dummy
Member:
Not sure where this would be used, but maybe also change this to use snake_case?

Comment on lines +101 to +102
Ping_number=("target", _arr("Ping_number", int)),
Time=("target", _arr("Time", np.dtype("datetime64[ns]"))),
Member:
I don't see any variables above aligned with Ping_number or Time, so I think these are redundant and can be removed?

Also, ping number and time carry the same information, unless the timestamps are not derived from ping time (though for a ping-by-ping method like this, I think they are?)

@leewujung (Member) left a comment:
Hey @LOCEANlloydizard: Thanks for putting together this PR for discussion! I have a few overarching suggestions:

  • use an xarray data container (DataArray or Dataset) instead of a dict when possible -- for example, the output of _compute_core_matrices would work well as an xr.Dataset rather than a Dict (see the sketch after this list). BUT see my comment below about that function [1].
  • add more inline code comments to enhance code readability
  • using xarray operations would help with code readability too, compared to vanilla numpy operations, since xarray is label-aware and the operations are easier to follow.
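
On the first point, here is a minimal sketch of what that could look like, assuming the core matrices share ping_time/range_sample dimensions (the function and dimension names are illustrative, not the PR's actual internals):

import xarray as xr

# Sketch only: return a labeled Dataset instead of a plain dict of arrays.
def _core_matrices_as_dataset(ts_comp, ts_uncomp, ping_time, range_sample):
    return xr.Dataset(
        data_vars={
            "TS_comp": (("ping_time", "range_sample"), ts_comp),
            "TS_uncomp": (("ping_time", "range_sample"), ts_uncomp),
        },
        coords={"ping_time": ping_time, "range_sample": range_sample},
    )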

I have some thoughts on the relationship between _cond_8_build_outputs and detect_matecho that would affect the implementation:

If I read the code right, the double loop in _cond_8_build_outputs goes through all pings (ip) that contain detected single targets, and then through all ranges (pidx) of the detected targets within each ping. The out assembly and refinement does 3 things:

  1. "flatten" the ping-range structure to just a 1-D array indexed by target
  2. refine range in meters, though it's still index-based and not on a continuous scale
  3. compute some derived variables, such as positionBeamCoord

So, at the core the single target detection output has 1 dimension target (which is also a coordinate), and 2 variables: the index number along ping_time, and the index number along range_sample (or range_meter, depending on what the detection is operating on [2]). All other information can be extracted or computed from the original dataarray the detection runs on, using these 2 index numbers for each target. isel or sel allow passing multiple indices at once, so this seems more efficient than looping (i.e., vectorizing the code).
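
A sketch of that vectorized extraction (all names are illustrative): wrapping the two per-target index arrays as DataArrays that share a "target" dimension makes a single isel return one value per target instead of an outer product, which replaces the double loop.

import numpy as np
import xarray as xr

# Sketch only: toy stand-ins for the dataset the detection runs on
# and the two per-target index arrays from the flattening step.
ds = xr.Dataset({"TS": (("ping_time", "range_sample"), np.random.rand(20, 500))})
ping_idx = xr.DataArray([3, 3, 10], dims="target")       # index along ping_time
range_idx = xr.DataArray([120, 355, 42], dims="target")  # index along range_sample

# Pointwise (vectorized) selection: both indexers share the "target" dim,
# so the result is 1-D along "target".
ts_at_targets = ds["TS"].isel(ping_time=ping_idx, range_sample=range_idx)
# ts_at_targets.dims == ("target",), length 3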

If the above is true, how about breaking things up into 2 steps? The first function focuses on getting this "core" single target detection output that's flattened (outputA), and the second function focuses on extracting and computing the required values in the full output (outputB)? Then users can decide if they want to store outputA and stop there, or proceed to compute outputB.
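
A rough sketch of that split (the function names and signatures here are hypothetical, just to show the shape of the workflow):

# Sketch only: hypothetical two-step structure.
def detect_single_targets_core(ds, params):
    """Step 1 (outputA): flattened result with a single "target" dimension,
    holding the ping_time index and range_sample index of each detected target."""
    ...


def expand_single_targets(ds, output_a):
    """Step 2 (outputB): use the two index variables in outputA to extract or
    compute TS, angles, attitude, range in meters, etc. (e.g. via vectorized isel)."""
    ...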

The reasons why I think it may be useful to separate outputA and outputB are:

  • it gives a good progression in what the code is trying to do in a workflow
  • derived values in outputB likely change more dramatically if calibration or something else changes in the source dataset, but outputA is less variable
  • I am not exactly sure about this one: this may give us a way to scale the single target detection operation in the future, along with the idea that storing an intermediate data product to disk and then reading it back for lazy computation is more scalable than trying to do everything in memory at once (a small sketch of that pattern follows below).
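
On that last point, a small sketch of the store-then-lazily-reload pattern for the assumed outputA (the path and variable names are placeholders):

import numpy as np
import xarray as xr

# Sketch only: a toy stand-in for the compact outputA
output_a = xr.Dataset(
    {
        "ping_time_idx": ("target", np.array([3, 3, 10])),
        "range_sample_idx": ("target", np.array([120, 355, 42])),
    }
)
output_a.to_zarr("single_targets_core.zarr", mode="w")    # persist the compact product
output_a_lazy = xr.open_zarr("single_targets_core.zarr")  # lazy, dask-backed reload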

I am also fine with keeping outputA just as an intermediate product within detect_matecho and not exposing it to the user, though I think it's good to have a structure that's easy to follow.


[1] _compute_core_matrices computes quantities that largely overlap with current echopype compute_Sv and compute_TS functions, and I think it's better to use those directly (and fix any errors there if any!)
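
For reference, a sketch of leaning on the existing calibration functions instead (the file path and sonar model below are placeholders; exact arguments depend on the instrument and data):

import echopype as ep

# Sketch only: reuse existing echopype calibration outputs rather than
# recomputing calibrated quantities inside _compute_core_matrices.
ed = ep.open_raw("survey.raw", sonar_model="EK60")
ds_Sv = ep.calibrate.compute_Sv(ed)   # volume backscattering strength (+ sample range)
ds_TS = ep.calibrate.compute_TS(ed)   # target strength
# The detection could then operate on these datasets directly, including the
# sample-range variable stored in the Sv dataset (see [2]).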

[2] I didn't read in detail to see if the current code includes the range decimation you mentioned earlier, so this comment may have no bearing: as part of compute_Sv/TS there's a function that computes sample range and stores it in the range_meter variable in the Sv dataset. Could we compute the refined range from there? This is so that we don't have redundant code doing the same thing scattered in different parts of the codebase, which would make maintenance challenging.

- Remove preliminary Matecho algorithm logic
- Introduce a stub implementation for single-target detection
- Define and validate the public API (inputs, params, outputs)
- Return a zero-target xr.Dataset with a stable schema
- Add parameter and dataset validation checks
- Prepare ground for future algorithmic implementation
fix tests to match updated single-target detection code
