
AS 4.1.16+ expects the end time to be always available, but SIGKILL/poweroff invalidate that #2815

@kinow

Description

From #2807 (comment)

  • Autosubmit version: 4.1.16+ (any dev version)

Reproducible Example

Run any experiment whose jobs take a while, then kill the remote jobs (e.g., with SIGKILL). The _STAT file is not generated.

AS and the API expect the end time of the job to be available, but it is only created when the user runs the workflow again. At that point the log retrieval process sets the end date to the time the log retrieval was launched, which can be hours, days, weeks, or months after the job actually ended.

Expected Behaviour

This is open for discussion for now. cc @f-macchia, @dbeltrankyl , @LuiggiTenorioK

We have a few options, I think, e.g.:

  1. Get the API or one of its workers to update the end time when it's missing
  2. Get AS to query the scheduler (Slurm, etc.) for the time the remote job stopped, or, on platforms without Slurm, use the modification time of the last log
  3. Add a command for that
  4. Add a cron job for that
  5. Get a warning in the API so the user is aware it's missing an end date (maybe show that in the metrics page?)
  6. etc.
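
To make options 2 and 5 more concrete, here is a rough sketch of how an end time could be estimated when the _STAT file is missing. It is not Autosubmit code: the function names (`parse_sacct_end`, `slurm_end_time`, `log_mtime`, `estimate_end_time`) and the returned source tags are hypothetical. It assumes `sacct` is reachable from where the code runs and prints `End` in its default `YYYY-MM-DDTHH:MM:SS` format, and that the job logs sit in a directory readable from the same place (in practice this would probably have to run on, or through a connection to, the remote platform). Both timestamps are naive local times.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def parse_sacct_end(output: str) -> Optional[datetime]:
    """Parse the output of `sacct -X -n -P --format=End`.

    Returns None when there is no usable end time (e.g. "Unknown").
    """
    for line in output.splitlines():
        value = line.strip()
        if not value or value in ("Unknown", "None"):
            continue
        try:
            # Default sacct timestamp format; naive, in the cluster's local time.
            return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue
    return None


def slurm_end_time(job_id: str) -> Optional[datetime]:
    """Ask Slurm accounting for the end time of a job; None if unavailable."""
    try:
        result = subprocess.run(
            ["sacct", "-j", job_id, "-X", "-n", "-P", "--format=End"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # sacct missing, failing, or timing out: let the caller fall back.
        return None
    return parse_sacct_end(result.stdout)


def log_mtime(log_dir: Path) -> Optional[datetime]:
    """Modification time of the most recently written file in log_dir."""
    files = [p for p in log_dir.glob("*") if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    return datetime.fromtimestamp(newest)


def estimate_end_time(
    job_id: Optional[str], log_dir: Path
) -> Tuple[Optional[datetime], str]:
    """Return (end_time, source); source is 'scheduler', 'log_mtime' or 'missing'."""
    end = slurm_end_time(job_id) if job_id else None
    if end is not None:
        return end, "scheduler"
    end = log_mtime(log_dir)
    if end is not None:
        return end, "log_mtime"
    return None, "missing"
```

Returning the source next to the timestamp would let the API show a warning (option 5) whenever the value is only an estimate or is missing altogether.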

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), discussion (The issue is created to keep track a discussion)
