Skip to content

Conversation

@henry-r18
Copy link
Contributor

@henry-r18 henry-r18 commented Nov 21, 2025

Implements SYS-2103

Acceptance criteria

  • Use Excel data attached to ticket to update Digital Data records
  • Consider ways the DD application might be enhanced to allow self-service functionality like this for authorized users

Description

This PR adds a management command to load sheets of data from an Excel spreadsheet and use that data to apply updates to targeted records in the Digital Data application. There's some complexity to it, as the input spreadsheet that FTVA staff provided doesn't perfectly align with the database schema.

  1. First off, the values of the fields which are either ForeignKey or ManyToMany fields internally are given as resolved string values, rather than integer references to other models. This requires some special handling in the batch_update function;
  2. Secondly, some of the values themselves in these fields don't exactly match the values of the Django objects, so I use a __istartswith query in a couple places to add some fuzziness to looking up the reference intended by the input value;
  3. Thirdly, I added a guard against any changes to id, pk, or uuid fields, as these should be immutable and could cause problems if updates were attempted;
  4. And lastly, empty strings in file_name specifically are filled with "NO FILE NAME", as this is something we did manually via the production REPL a while back.

The script has a dry_run mode, and should report if no changes exist between an input record and its corresponding record in the database.

Testing

I did not add any tests to our test suite, but I can if need be (though I'll have to figure out how). For manual testing, the script can be run in dry_run mode without effecting the database as follows:

(within dev container)
python manage.py batch_update -i Tom_Reed_Ingest_List_MAMs_20251118.xlsx --dry_run

@henry-r18 henry-r18 requested a review from akohler November 21, 2025 22:29
Copy link
Member

@akohler akohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henry-r18 I like this. The defensive programming is appropriate and friendlier than forcing 100% strictness on the editor. I don't see any cases where it's too forgiving, though I suppose there could be such cases.

To defend further against this, please add logging / output at the 3 points in batch_update() where changes are actually made, showing the old & new values for each field & record, something like 'Changing record 123 field status from foo to bar'.

Other than that, there's an unhandled exception possible when a SheetImport record in the Excel sheet does not exist in the database.

Thanks --Andy

- Properly handle potential `SheetImport.DoesNotExist` exception
- Improve comments
@henry-r18
Copy link
Contributor Author

@akohler, this is ready for another look. The command now prints updates more granularly, as below.

It does look like all the fields, including status, are applying updates, at least against my local database, which is out of date. Are you seeing the same?

Example output

Record 19732 updated: file_type changed from None to MP4
Record 19732 updated: asset_type changed from None to Derivative
Record 19732 updated: media_type changed from None to video
Record 19732 updated: audio_class changed from None to Unknown
Record 19732 updated: added Curator reviewed - Ingest to status

@henry-r18 henry-r18 requested a review from akohler November 25, 2025 22:19
Copy link
Member

@akohler akohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henry-r18 All the changes look good. I ran into problems reloading my local data, incompatibility between migrations and database backups, so I'm running a fresh full production JSON export now (which is very slow). Once that finishes and I get my local db refreshed, I'll re-run this to confirm all works, then (I expect) will merge.
Thanks --Andy

Copy link
Member

@akohler akohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henry-r18 Bug found: status values are set immediately, even when it's a dry run. The program appears to work correctly otherwise: non-status values are not updated during a dry run, and all values are updated when it's not a dry run.
Thanks --Andy

@henry-r18
Copy link
Contributor Author

Thanks for the catch, @akohler! I added an additional dry_run check, before calling add(). I then:

  1. reset my database;
  2. confirmed record 19732 was in the initial state I expected;
  3. ran the command with the dry_run flag;
  4. confirmed record 19732 was unchanged;
  5. re-ran the command without the dry_run flag; and
  6. confirmed all the expected changes were applied from the spreadsheet, as below.

Should be ready for another (hopefully final) look!

image

@henry-r18 henry-r18 requested a review from akohler November 26, 2025 19:21
Copy link
Member

@akohler akohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed, the dry run status bug is fixed.

@henry-r18 Please add release notes and version change for deployment. We'll probably need to run the command via tunnel, so deployment is not strictly needed, but this a change to the codebase so let's make it official. Then I'll merge.
Thanks --Andy

@henry-r18
Copy link
Contributor Author

henry-r18 commented Nov 26, 2025

@akohler, just pushed up the version bump and release notes. Thanks!

Edit: Not sure I needed to re-request review, as the PR was already approved, but I did it anyway. Sorry for any spam!

@henry-r18 henry-r18 requested a review from akohler November 26, 2025 22:02
@akohler akohler merged commit 04636f8 into main Nov 26, 2025
1 check passed
@akohler akohler deleted the SYS-2103/batch_update branch November 26, 2025 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants