-
Notifications
You must be signed in to change notification settings - Fork 88
Single target detection #1588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Single target detection #1588
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1588 +/- ##
==========================================
- Coverage 84.95% 83.29% -1.67%
==========================================
Files 79 80 +1
Lines 6998 7391 +393
==========================================
+ Hits 5945 6156 +211
- Misses 1053 1235 +182
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| expected_vars = [ | ||
| "TS_comp", | ||
| "TS_uncomp", | ||
| "Target_range", | ||
| "Target_range_disp", | ||
| "Target_range_min", | ||
| "Target_range_max", | ||
| "idx_r", | ||
| "idx_target_lin", | ||
| "pulse_env_before_lin", | ||
| "pulse_env_after_lin", | ||
| "PulseLength_Normalized_PLDL", | ||
| "Transmitted_pulse_length", | ||
| "Angle_minor_axis", | ||
| "Angle_major_axis", | ||
| "StandDev_Angles_Minor_Axis", | ||
| "StandDev_Angles_Major_Axis", | ||
| "Heave", | ||
| "Roll", | ||
| "Pitch", | ||
| "Heading", | ||
| "Dist", | ||
| "TSfreq_matrix", | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing these to using snake_case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and for consistency, how about following the pattern identifier_modifier, so:
- "target_range_{min, max}"
- "angles_{minor, major}_axis"
- "angles_{minor, major}_axis_std_dev"
- ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put my comments for the variables themselves in detect_matecho.py
| REQ = { | ||
| "channel": "chan1", | ||
| "pulse_length": 1e-3, # 1 ms dummy | ||
| "timeSampleInterval_usec": 40.0, # dummy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure where this would be used, but maybe also change this to using snake_case?
| Ping_number=("target", _arr("Ping_number", int)), | ||
| Time=("target", _arr("Time", np.dtype("datetime64[ns]"))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any variables above aligned with Ping_number or Time, so think these are redundant and can be removed?
Also, ping number and time are themselves the same thing, unless the timestamps are not derived from ping time (though I think for ping-by-ping methods like this it is derived from ping time?)
leewujung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @LOCEANlloydizard : Thanks for putting together this PR for discussion! I have a few overarching suggestion:
- use xarray data container (dataarray or dataset) instead of dict when possible -- for example at the output of
_compute_core_matrices, instead of a Dict, a xr.Dataset would work well. BUT see my comment below about that function [1]. - add more inline code comments to enhance code readability
- using xarray operation would help with code readability too, compared to when doing vanilla numpy operations, since it's xarray is label-aware and it's easier to follow the operations.
I have some thoughts between _cond_8_build_outputs and detect_matecho that would affect the implementation:
If I read the code right, what you have in the double loop in _cond_8_build_outputs is to go through all pings (ip) that contain detected single targets, and then all ranges (pidx) of the detected targets within that ping. The out assembly and refinement does 3 things:
- "flatten" the ping-range structure to just a 1-D array indexed by
target - refine range in meters, though it's still index-based and not on a continuous scale
- compute some derived variables, such as
positionBeamCoord
So, at the core the single target detection output has 1 dimension target (which is also a coordinate), and 2 variables: the index number along ping_time, and the index number along range_sample (or range_meter, depending on what the detection is operating on [2]). All other information can be extracted or computed from the original dataarray the detection runs on using these 2 index numbers for each target. isel or sel allow passing multiple indices at once, so seems more efficient than looping (i.e. vectorizing the code).
If the above is true, how about breaking things up into 2 steps? The first function focuses on getting this "core" single target detection output that's flattened (outputA), and the second function focuses on extracting and computing the required values in the full output (outputB)? Then users can decide if they want to store outputA and stop there, or proceed to compute outputB.
The reasons why I think it may be useful to separate outputA and outputB are:
- it gives a good progression in what the code is trying to do in a workflow
- derived values in outputB likely change more dramatically if calibration or something else change in the source dataset, but outputA is less variable
- I am not exactly sure about this one: this may provide us a way for scaling the single target detection operation in the future, along with the idea that storing an intermediate data product to disk and then read it back for lazy computation is more scalable than trying to do everything in memory at once.
I am also fine with keeping outputA just as an intermediate produce within detect_matecho and not exposing it to the user, though thinking it's good to have a structure that's easy to follow.
[1] _compute_core_matrices computes quantities that largely overlap with current echopype compute_Sv and compute_TS functions, and I think it's better to use those directly (and fix any errors there if any!)
[2] I didn't read in detail to see if the current code includes the range decimation you mentioned earlier, so this comment may have no bearing: as part of compute_Sv/TS there's a function that computes sample range and store it into the range_meter variable in the Sv dataset. Could we compute the refined range from there? This is so that we don't have redundancy in code that does the same thing scattered in different parts of the codebase, which would make maintenance challenging.
- Remove preliminary Matecho algorithm logic - Introduce a stub implementation for single-target detection - Define and validate the public API (inputs, params, outputs) - Return a zero-target xr.Dataset with a stable schema - Add parameter and dataset validation checks - Prepare ground for future algorithmic implementation
fix tests to match updated single-target detection code
fc6e8da to
a6a8282
Compare
This PR introduces the public API and output schema for single target detection! Ref #1532
It focuses on:
This provides a foundation for upcoming algorithm development and allows early review of the API design, data model, and test coverage!