
Traceroute failures don't indicate where failures occur (which hop(s)) #1643

@peterschretlen

When a traceroute check runs, it does the equivalent of mtr -c 5 <target>, where -c sets the number of traces that will run.
So we're running 5 traces, collecting statistics like latency and loss that are averaged over those 5 samples. The traces are done sequentially.

Here is the core of the implementation:

If the number of consecutive hop failures ("unknown hops") reaches "max unknown hops" in any trace, the test exits: the in-progress trace is stopped, no further traces are run, and the test returns the data collected so far with the error "max unknown hops exceeded".
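
To make the behaviour above concrete, here is a rough Python sketch of the early-exit logic as described in this issue. It is not the actual implementation: the names run_traces, HopStats and the idea of modelling a trace as a list of replies (with None for a hop that did not answer) are assumptions made purely for illustration.

```python
from dataclasses import dataclass

MAX_UNKNOWN_HOPS_ERROR = "max unknown hops exceeded"


@dataclass
class HopStats:
    sent: int = 0  # number of times this hop (TTL) was probed
    lost: int = 0  # number of probes that got no reply


def run_traces(traces, max_unknown_hops):
    """Aggregate per-hop stats over sequential traces.

    Each trace is a list of replies indexed by TTL; None models an
    unknown hop. Returns (stats, error), where error is None on success.
    """
    stats = {}
    for trace in traces:
        consecutive_unknown = 0
        for ttl, reply in enumerate(trace, start=1):
            hop = stats.setdefault(ttl, HopStats())
            hop.sent += 1
            if reply is None:
                hop.lost += 1
                consecutive_unknown += 1
                if consecutive_unknown >= max_unknown_hops:
                    # Stop the in-progress trace and skip the remaining ones.
                    return stats, MAX_UNKNOWN_HOPS_ERROR
            else:
                consecutive_unknown = 0
    return stats, None


# Second trace fails at TTL 2 and 3; the result has no record of that.
stats, err = run_traces([["a", "b", "c"], ["a", None, None, "d"]], 2)
```

In this sketch the caller only gets the aggregated sent and lost counters plus the error string, which is why the failing hops are hard to identify afterwards.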

The problem is that when the "max unknown hops exceeded" error is received, it's often not clear which hop(s) caused the problem.

Sometimes the sent value can be used to narrow it down (sent = the number of times that hop was visited), but that assumes the path of the traces is the same each time. Some refinements to the trace logging might help with troubleshooting:

  • Indicating where in the trace hop failures occurred.
  • Providing a summary of each trace, in addition to the summary statistics across all traces (I'm not sure the current implementation supports this)
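
As a possible shape for both suggestions, here is a small Python sketch that keeps one summary per trace and records the TTLs at which unknown hops were seen. Everything in it (TraceSummary, summarize_traces, format_summary, and modelling an unknown hop as None) is hypothetical and only meant to show what the extra logging could contain, not how the check is actually written.

```python
from dataclasses import dataclass, field


@dataclass
class TraceSummary:
    trace_index: int
    hops_probed: int = 0
    unknown_ttls: list = field(default_factory=list)  # where hops failed
    aborted: bool = False  # True if max unknown hops was reached


def summarize_traces(traces, max_unknown_hops):
    """Return one TraceSummary per trace that was started."""
    summaries = []
    for index, trace in enumerate(traces, start=1):
        summary = TraceSummary(trace_index=index)
        consecutive_unknown = 0
        for ttl, reply in enumerate(trace, start=1):
            summary.hops_probed = ttl
            if reply is None:
                summary.unknown_ttls.append(ttl)
                consecutive_unknown += 1
                if consecutive_unknown >= max_unknown_hops:
                    summary.aborted = True
                    break
            else:
                consecutive_unknown = 0
        summaries.append(summary)
        if summary.aborted:
            break  # no further traces are run
    return summaries


def format_summary(s):
    """Render a summary as a single log line."""
    status = "aborted: max unknown hops exceeded" if s.aborted else "completed"
    ttls = ",".join(str(t) for t in s.unknown_ttls) or "none"
    return (f"trace {s.trace_index}: {status}, "
            f"hops probed={s.hops_probed}, unknown hops at ttl={ttls}")


summaries = summarize_traces(
    [["a", "b", "c"], ["a", None, "c", None, None, "f"]], 2)
lines = [format_summary(s) for s in summaries]
```

With a log line per trace like this, the failing hops could be read directly from the output instead of being inferred from the sent counters.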
