
Retrospective research opportunities and forward-thinking testing/engagement using ClearML data #915

@Enkidu93

Description

Some time ago, John developed a script to scrape build job data from ClearML (research and production runs), which I've since refined and expanded. Currently, we only use this data (to my knowledge) for some high-level reporting (e.g., how many projects are using Serval, how many new projects there are each month, etc.), but I think there's a lot more we could glean from it. Here are some ideas:

  • Using the ClearML data, we can identify which production projects are long-time, consistent users of our tools. Knowing this, we could:
    • Update/expand our list of standard NMT testing projects to include some of these projects so we can see how updates to the pipeline would affect our most consistent users - not only in terms of scores like BLEU but also in terms of newer features like marker placement or quotation denormalization.
    • Attempt to identify patterns in these projects: Are they from certain language families? Certain regions? Certain partner organizations? If we run a test against a random sample of 250 projects, do the long-time projects tend to score higher than the rest?
    • Reach out to the project owners and develop some kind of inner circle where we could explore feature ideas and hear concerns.
  • Using the production data, we could also do the reverse: Which projects tried our tools once a long time ago and haven't come back since? We could take the same steps for these projects as in the bullet points above, as well as:
    • Reach out to these projects to ask whether they've encountered difficulties with their drafts, or to encourage them to retry (if it's been long enough).
  • Since we can scrape the complete config from all research ClearML runs through the API, there's an opportunity to analyze the effect of different configuration options retrospectively (see the sketch after this list). This could include:
    • Language or script codes (which could be mapped to families, regions, etc.)
    • Hyperparameters
    • Or simply establishing baselines across many runs and tracking long-term trends (are our drafts getting better?)
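
As a rough illustration of the kind of scrape involved, here is a minimal sketch using the ClearML Python SDK. It is not the actual script; the project name and status filter are placeholders, and the fields pulled per run are just examples of what's available.

```python
# Minimal sketch (not the actual scrape script) using the ClearML Python SDK.
# "Research" and the status filter are placeholders for the real project setup.
from clearml import Task

# Fetch completed tasks for a (hypothetical) research project.
tasks = Task.get_tasks(
    project_name="Research",                # placeholder project name
    task_filter={"status": ["completed"]},
)

rows = []
for task in tasks:
    rows.append(
        {
            "task_id": task.id,
            "name": task.name,
            # Flattened config/hyperparameters for the run
            "params": task.get_parameters_as_dict(),
            # Last reported scalars, e.g. evaluation scores
            "metrics": task.get_last_scalar_metrics(),
        }
    )

# rows can then be dumped to CSV / a DataFrame for retrospective analysis:
# per-language baselines, hyperparameter effects, long-term score trends, etc.
```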

And I'm sure there are more opportunities than these!
