-
Notifications
You must be signed in to change notification settings - Fork 0
SYS-2103: Add mgmt command for applying batch updates using XLSX input #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
akohler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@henry-r18 I like this. The defensive programming is appropriate and friendlier than forcing 100% strictness on the editor. I don't see any cases where it's too forgiving, though I suppose there could be such cases.
To defend further against this, please add logging / output at the 3 points in batch_update() where changes are actually made, showing the old & new values for each field & record, something like 'Changing record 123 field status from foo to bar'.
Other than that, there's an unhandled exception possible when a SheetImport record in the Excel sheet does not exist in the database.
Thanks --Andy
- Properly handle potential `SheetImport.DoesNotExist` exception - Improve comments
|
@akohler, this is ready for another look. The command now prints updates more granularly, as below. It does look like all the fields, including Example output |
akohler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@henry-r18 All the changes look good. I ran into problems reloading my local data, incompatibility between migrations and database backups, so I'm running a fresh full production JSON export now (which is very slow). Once that finishes and I get my local db refreshed, I'll re-run this to confirm all works, then (I expect) will merge.
Thanks --Andy
akohler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@henry-r18 Bug found: status values are set immediately, even when it's a dry run. The program appears to work correctly otherwise: non-status values are not updated during a dry run, and all values are updated when it's not a dry run.
Thanks --Andy
|
Thanks for the catch, @akohler! I added an additional
Should be ready for another (hopefully final) look!
|
akohler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed, the dry run status bug is fixed.
@henry-r18 Please add release notes and version change for deployment. We'll probably need to run the command via tunnel, so deployment is not strictly needed, but this a change to the codebase so let's make it official. Then I'll merge.
Thanks --Andy
* Add release note
|
@akohler, just pushed up the version bump and release notes. Thanks! Edit: Not sure I needed to re-request review, as the PR was already approved, but I did it anyway. Sorry for any spam! |

Implements SYS-2103
Acceptance criteria
Description
This PR adds a management command to load sheets of data from an Excel spreadsheet and use that data to apply updates to targeted records in the Digital Data application. There's some complexity to it, as the input spreadsheet that FTVA staff provided doesn't perfectly align with the database schema.
ForeignKeyorManyToManyfields internally are given as resolved string values, rather than integer references to other models. This requires some special handling in thebatch_updatefunction;__istartswithquery in a couple places to add some fuzziness to looking up the reference intended by the input value;id,pk, oruuidfields, as these should be immutable and could cause problems if updates were attempted;file_namespecifically are filled with "NO FILE NAME", as this is something we did manually via the production REPL a while back.The script has a
dry_runmode, and should report if no changes exist between an input record and its corresponding record in the database.Testing
I did not add any tests to our test suite, but I can if need be (though I'll have to figure out how). For manual testing, the script can be run in
dry_runmode without effecting the database as follows:(within dev container)
python manage.py batch_update -i Tom_Reed_Ingest_List_MAMs_20251118.xlsx --dry_run