Traceroute failures don't indicate where failures occur (which hop(s))


When a traceroute check runs, it does the equivalent of `mtr -c 5 <target>`, where `-c` sets the number of traces to run.
So we're running 5 traces and collecting statistics such as latency and loss, averaged over those 5 samples. The traces run sequentially.

Here is the [core of the implementation](https://github.com/grafana/mtr/blob/a9806fdda16646e637ff216c23a24698dcaf2844/pkg/mtr/mtr.go#L165-L218).

If the number of consecutive hop failures ("unknown hops") reaches "max unknown hops" in any trace, the test exits (the in-progress trace is stopped and no further traces are run) and the test returns the data collected so far with the error "max unknown hops exceeded".
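As a rough illustration of the behaviour described above, here is a minimal sketch of the abort logic. It is not the code from the linked file: `runTraces`, `probeFunc`, and `ErrMaxUnknownHops` are hypothetical names, and the probe is a stand-in for sending a real packet. It also assumes that "reaches" means the counter is compared with `>=`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrMaxUnknownHops carries the error text quoted in this issue.
// The variable name is hypothetical.
var ErrMaxUnknownHops = errors.New("max unknown hops exceeded")

// probeFunc is a hypothetical stand-in for sending one probe at a given TTL.
// It reports whether the hop replied and whether the reply came from the target.
type probeFunc func(trace, ttl int) (replied, reachedTarget bool)

// runTraces runs up to count traces sequentially. If any trace sees
// maxUnknownHops consecutive unanswered hops, it stops immediately and
// returns the number of traces completed so far along with the error.
func runTraces(count, maxHops, maxUnknownHops int, probe probeFunc) (int, error) {
	completed := 0
	for trace := 0; trace < count; trace++ {
		unknown := 0 // consecutive unanswered hops in this trace
		for ttl := 1; ttl <= maxHops; ttl++ {
			replied, reached := probe(trace, ttl)
			if !replied {
				unknown++
				if unknown >= maxUnknownHops {
					// The in-progress trace is abandoned and no further traces run.
					return completed, ErrMaxUnknownHops
				}
				continue
			}
			unknown = 0 // any reply resets the consecutive counter
			if reached {
				break
			}
		}
		completed++
	}
	return completed, nil
}

func main() {
	// Simulated path: the target answers at TTL 8, but during the third
	// trace (index 2) every hop from TTL 4 onward stays silent.
	probe := func(trace, ttl int) (bool, bool) {
		replied := !(trace == 2 && ttl >= 4)
		return replied, replied && ttl == 8
	}
	completed, err := runTraces(5, 30, 3, probe)
	fmt.Printf("completed=%d err=%v\n", completed, err)
}
```

In this simulation the first two traces finish and the third aborts at TTL 6, but the returned error alone carries neither the trace index nor the TTL, which is the gap this issue is about.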

The problem is that when the "max unknown hops exceeded" error is received, it's often not clear which hop(s) caused the problem.

Sometimes the `sent` value can be used to narrow it down (`sent` is the number of times that hop was visited), but that assumes every trace follows the same path.

Some refinements to the trace logging might help with troubleshooting:
- Indicating where in the trace hop failures occurred. 
- Providing a summary of each trace, in addition to the summary statistics across all traces (I'm not sure the current implementation supports this)
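
To make the two suggestions concrete, here is a small sketch of what a per-trace summary line could look like. Everything in it is hypothetical: the `hopResult` type, the `summarizeTrace` function, and the log format are illustrations only and do not reflect the current data structures in the mtr package.

```go
package main

import (
	"fmt"
	"strings"
)

// hopResult is a hypothetical record of one probe within a single trace.
// Addr is empty when no reply was received for that TTL.
type hopResult struct {
	TTL  int
	Addr string
}

// summarizeTrace builds one log line for a single trace, listing the TTLs
// at which no reply was received.
func summarizeTrace(index int, hops []hopResult) string {
	var unknown []string
	for _, h := range hops {
		if h.Addr == "" {
			unknown = append(unknown, fmt.Sprint(h.TTL))
		}
	}
	if len(unknown) == 0 {
		return fmt.Sprintf("trace %d: %d hops, all replied", index, len(hops))
	}
	return fmt.Sprintf("trace %d: %d hops, no reply at TTL %s",
		index, len(hops), strings.Join(unknown, ","))
}

func main() {
	// Example data using documentation-range addresses.
	hops := []hopResult{
		{TTL: 1, Addr: "192.0.2.1"},
		{TTL: 2, Addr: ""},
		{TTL: 3, Addr: ""},
	}
	fmt.Println(summarizeTrace(3, hops))
}
```

Logging one such line per trace would show where failures occurred without relying on `sent` counts, and it would still make sense when the path changes between traces.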

Traceroute failures don't indicate where failures occur (which hop(s)) #1643
