8 changes: 8 additions & 0 deletions doc/release-notes/12122-archiving updates.md
@@ -0,0 +1,8 @@
## Archiving Updates

This release includes multiple updates to the process of creating archival bags, including:
- performance/scaling improvements for large datasets (multiple changes)
- bug fixes for when superusers see the "Submit" button that launches archiving from the version table on the dataset page
- new functionality to optionally suppress the archiving workflow when the Update Current Version functionality is used, marking the current archival copy as out of date instead
- new functionality to support recreating an archival bag after Update Current Version has been used, available for archivers that can delete existing files
10 changes: 10 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -2263,6 +2263,9 @@ At present, archiving classes include the DuraCloudSubmitToArchiveCommand, Local

All current options support the :ref:`Archival Status API` calls and the same status is available in the dataset page version table (for contributors/those who could view the unpublished dataset, with more detail available to superusers).

Archival Bags are created per dataset version. By default, if a version is republished (via the superuser-only 'Update Current Version' publication option in the UI/API), a new archival bag is not created for the version.
If the archiver in use is capable of deleting existing bags (the Google, S3, and File archivers), superusers can trigger a manual update of the archival bag and, if the :ref:`dataverse.feature.archive-on-version-update` flag is set to true, the update happens automatically when 'Update Current Version' is used.

.. _Duracloud Configuration:

Duracloud Configuration
@@ -4031,6 +4034,13 @@ dataverse.feature.only-update-datacite-when-needed

Only contact DataCite to update a DOI after checking to see if DataCite has outdated information (for efficiency, lighter load on DataCite, especially when using file DOIs).

.. _dataverse.feature.archive-on-version-update:

dataverse.feature.archive-on-version-update
+++++++++++++++++++++++++++++++++++++++++++

Indicates whether archival bag creation should be triggered (if configured) when a version that was already successfully archived is updated,
i.e., via the Update-Current-Version publication option. Setting the flag to true only works if the archiver being used supports deleting existing archival bags.
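
A minimal sketch of the gating this flag controls, mirroring the ``FeatureFlags.ARCHIVE_ON_VERSION_UPDATE`` and ``canDelete()`` checks in this PR's ``DatasetPage`` changes (the helper method itself is illustrative, not part of the Dataverse API):

.. code-block:: java

    // Re-archive automatically only when no usable copy exists, or when the
    // flag is enabled and the archiver can delete the now-outdated bag.
    static boolean shouldReArchive(String archivalStatus, boolean flagEnabled,
                                   boolean archiverCanDelete) {
        if (archivalStatus == null || "failure".equals(archivalStatus)) {
            return true; // no successful copy to protect, safe to (re)run
        }
        return flagEnabled && archiverCanDelete;
    }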



102 changes: 63 additions & 39 deletions src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
@@ -42,6 +42,7 @@
import edu.harvard.iq.dataverse.engine.command.impl.UpdateDatasetVersionCommand;
import edu.harvard.iq.dataverse.export.ExportService;
import edu.harvard.iq.dataverse.util.cache.CacheFactoryBean;
import edu.harvard.iq.dataverse.util.json.JsonUtil;
import io.gdcc.spi.export.ExportException;
import io.gdcc.spi.export.Exporter;
import edu.harvard.iq.dataverse.ingest.IngestRequest;
@@ -105,6 +106,9 @@
import jakarta.faces.view.ViewScoped;
import jakarta.inject.Inject;
import jakarta.inject.Named;
import jakarta.json.Json;
import jakarta.json.JsonObject;
import jakarta.json.JsonObjectBuilder;
import jakarta.persistence.OptimisticLockException;

import org.apache.commons.lang3.StringUtils;
@@ -160,6 +164,7 @@
import edu.harvard.iq.dataverse.search.SearchFields;
import edu.harvard.iq.dataverse.search.SearchUtil;
import edu.harvard.iq.dataverse.search.SolrClientService;
import edu.harvard.iq.dataverse.settings.FeatureFlags;
import edu.harvard.iq.dataverse.settings.JvmSettings;
import edu.harvard.iq.dataverse.util.SignpostingResources;
import edu.harvard.iq.dataverse.util.FileMetadataUtil;
@@ -2992,27 +2997,38 @@ public String updateCurrentVersion() {
String className = settingsService.get(SettingsServiceBean.Key.ArchiverClassName.toString());
AbstractSubmitToArchiveCommand archiveCommand = ArchiverUtil.createSubmitToArchiveCommand(className, dvRequestService.getDataverseRequest(), updateVersion);
if (archiveCommand != null) {
// Delete the record of any existing copy since it is now out of date/incorrect
updateVersion.setArchivalCopyLocation(null);
/*
* Then try to generate and submit an archival copy. Note that running this
* command within the CuratePublishedDatasetVersionCommand was causing an error:
* "The attribute [id] of class
* [edu.harvard.iq.dataverse.DatasetFieldCompoundValue] is mapped to a primary
* key column in the database. Updates are not allowed." To avoid that, and to
* simplify reporting back to the GUI whether this optional step succeeded, I've
* pulled this out as a separate submit().
*/
try {
updateVersion = commandEngine.submit(archiveCommand);
if (!updateVersion.getArchivalCopyLocationStatus().equals(DatasetVersion.ARCHIVAL_STATUS_FAILURE)) {
successMsg = BundleUtil.getStringFromBundle("datasetversion.update.archive.success");
} else {
errorMsg = BundleUtil.getStringFromBundle("datasetversion.update.archive.failure");
//There is an archiver configured, so now decide what to do:
// - If a successful copy exists, don't automatically update; just note that the old copy is obsolete (and enable the superuser button in the display to allow a manual update if desired)
// - If a pending or obsolete copy exists, do nothing (nominally, if a pending run succeeds while we're updating the current version here, it should be marked as obsolete; ignored for now since updates within the time an archiving run is pending should be rare)
// - If the status is failure or null, rerun archiving now. If a failure was due to an existing copy in the repo, we'll fail again
String status = updateVersion.getArchivalCopyLocationStatus();
if((status==null) || status.equals(DatasetVersion.ARCHIVAL_STATUS_FAILURE) || (FeatureFlags.ARCHIVE_ON_VERSION_UPDATE.enabled() && archiveCommand.canDelete())){
// Replace the record of any existing copy (now out of date/incorrect) with a 'pending' status
JsonObjectBuilder job = Json.createObjectBuilder();
job.add(DatasetVersion.ARCHIVAL_STATUS, DatasetVersion.ARCHIVAL_STATUS_PENDING);
updateVersion.setArchivalCopyLocation(JsonUtil.prettyPrint(job.build()));
//Persist to db now
datasetVersionService.persistArchivalCopyLocation(updateVersion);
/*
* Then try to generate and submit an archival copy. Note that running this
* command within the CuratePublishedDatasetVersionCommand was causing an error:
* "The attribute [id] of class
* [edu.harvard.iq.dataverse.DatasetFieldCompoundValue] is mapped to a primary
* key column in the database. Updates are not allowed." To avoid that, and to
* simplify reporting back to the GUI whether this optional step succeeded, I've
* pulled this out as a separate submit().
*/
try {
commandEngine.submitAsync(archiveCommand);
JsfHelper.addSuccessMessage(BundleUtil.getStringFromBundle("datasetversion.archive.inprogress"));
} catch (CommandException ex) {
errorMsg = BundleUtil.getStringFromBundle("datasetversion.update.archive.failure") + " - " + ex.toString();
logger.severe(ex.getMessage());
}
} catch (CommandException ex) {
errorMsg = BundleUtil.getStringFromBundle("datasetversion.update.archive.failure") + " - " + ex.toString();
logger.severe(ex.getMessage());
} else if(status.equals(DatasetVersion.ARCHIVAL_STATUS_SUCCESS)) {
//Not automatically replacing the old archival copy, since creating one is expensive; just mark it obsolete
updateVersion.setArchivalStatusOnly(DatasetVersion.ARCHIVAL_STATUS_OBSOLETE);
datasetVersionService.persistArchivalCopyLocation(updateVersion);
}
}
}
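// Editor's note (illustrative, not part of this diff): the archival status
// stored in archivalCopyLocation is a small JSON document built with the same
// jakarta.json calls used above, e.g.
//   JsonObjectBuilder job = Json.createObjectBuilder();
//   job.add(DatasetVersion.ARCHIVAL_STATUS, DatasetVersion.ARCHIVAL_STATUS_PENDING);
//   updateVersion.setArchivalCopyLocation(JsonUtil.prettyPrint(job.build()));
// yielding roughly {"status": "pending"}, which persistArchivalCopyLocation()
// then writes in its own transaction before submitAsync() runs the archiver.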
@@ -6087,33 +6103,33 @@ public void refreshPaginator() {

/**
* This method can be called from *.xhtml files to allow archiving of a dataset
* version from the user interface. It is not currently (11/18) used in the IQSS/develop
* branch, but is used by QDR and is kept here in anticipation of including a
* GUI option to archive (already published) versions after other dataset page
* changes have been completed.
* version from the user interface.
*
* @param id - the id of the dataset version to archive.
* @param force - whether to replace an existing archival copy, which requires an archiver that supports deletion.
*/
public void archiveVersion(Long id) {
public void archiveVersion(Long id, boolean force) {
if (session.getUser() instanceof AuthenticatedUser) {
DatasetVersion dv = datasetVersionService.retrieveDatasetVersionByVersionId(id).getDatasetVersion();
String className = settingsWrapper.getValueForKey(SettingsServiceBean.Key.ArchiverClassName, null);
AbstractSubmitToArchiveCommand cmd = ArchiverUtil.createSubmitToArchiveCommand(className, dvRequestService.getDataverseRequest(), dv);
if (cmd != null) {
try {
DatasetVersion version = commandEngine.submit(cmd);
if (!version.getArchivalCopyLocationStatus().equals(DatasetVersion.ARCHIVAL_STATUS_FAILURE)) {
String status = dv.getArchivalCopyLocationStatus();
if (status == null || (force && cmd.canDelete())) {

// Set initial pending status
JsonObjectBuilder job = Json.createObjectBuilder();
job.add(DatasetVersion.ARCHIVAL_STATUS, DatasetVersion.ARCHIVAL_STATUS_PENDING);
dv.setArchivalCopyLocation(JsonUtil.prettyPrint(job.build()));
//Persist now
datasetVersionService.persistArchivalCopyLocation(dv);
commandEngine.submitAsync(cmd);

logger.info(
"DatasetVersion id=" + version.getId() + " submitted to Archive, status: " + dv.getArchivalCopyLocationStatus());
} else {
logger.severe("Error submitting version " + version.getId() + " due to conflict/error at Archive");
}
if (version.getArchivalCopyLocation() != null) {
"DatasetVersion id=" + dv.getId() + " submitted to Archive, status: " + dv.getArchivalCopyLocationStatus());
setVersionTabList(resetVersionTabList());
this.setVersionTabListForPostLoad(getVersionTabList());
JsfHelper.addSuccessMessage(BundleUtil.getStringFromBundle("datasetversion.archive.success"));
} else {
JsfHelper.addErrorMessage(BundleUtil.getStringFromBundle("datasetversion.archive.failure"));
JsfHelper.addSuccessMessage(BundleUtil.getStringFromBundle("datasetversion.archive.inprogress"));
}
} catch (CommandException ex) {
logger.log(Level.SEVERE, "Unexpected Exception calling submit archive command", ex);
@@ -6147,31 +6163,39 @@ public boolean isArchivable() {
return archivable;
}

/** Method to decide if a 'Submit' button should be enabled for archiving a dataset version. */
public boolean isVersionArchivable() {
if (versionArchivable == null) {
// If this dataset isn't in an archivable collection return false
versionArchivable = false;
if (isArchivable()) {
boolean checkForArchivalCopy = false;

// Otherwise, we need to know if the archiver is single-version-only
// If it is, we have to check for an existing archived version to answer the
// question
String className = settingsWrapper.getValueForKey(SettingsServiceBean.Key.ArchiverClassName, null);
if (className != null) {
try {
boolean checkForArchivalCopy = false;
Class<?> clazz = Class.forName(className);
Method m = clazz.getMethod("isSingleVersion", SettingsWrapper.class);
Method m2 = clazz.getMethod("supportsDelete");

Object[] params = { settingsWrapper };
boolean supportsDelete = (Boolean) m2.invoke(null);
checkForArchivalCopy = (Boolean) m.invoke(null, params);

if (checkForArchivalCopy) {
// If we have to check (single version archiving), we can't allow archiving if
// one version is already archived (or attempted - any non-null status)
versionArchivable = !isSomeVersionArchived();
} else {
// If we allow multiple versions or didn't find one that has had archiving run
// on it, we can archive, so return true
versionArchivable = true;
// Otherwise (per-version archiving is supported), we can archive if this
// version has no archival status yet, or if the archiver can delete prior
// runs and the existing status is neither success nor pending.
String status = workingVersion.getArchivalCopyLocationStatus();
versionArchivable = (status == null) || (!status.equals(DatasetVersion.ARCHIVAL_STATUS_SUCCESS) && !status.equals(DatasetVersion.ARCHIVAL_STATUS_PENDING) && supportsDelete);
}
} catch (ClassNotFoundException | IllegalAccessException | IllegalArgumentException
| InvocationTargetException | NoSuchMethodException | SecurityException e) {
25 changes: 18 additions & 7 deletions src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java
@@ -132,6 +132,7 @@ public enum VersionState {
public static final String ARCHIVAL_STATUS_PENDING = "pending";
public static final String ARCHIVAL_STATUS_SUCCESS = "success";
public static final String ARCHIVAL_STATUS_FAILURE = "failure";
public static final String ARCHIVAL_STATUS_OBSOLETE = "obsolete";

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@@ -231,8 +232,9 @@ public enum VersionState {
@Transient
private DatasetVersionDifference dvd;

//The JSON version of the archivalCopyLocation string
@Transient
private JsonObject archivalStatus;
private JsonObject archivalCopyLocationJson;

public Long getId() {
return this.id;
@@ -383,24 +385,24 @@ public String getArchivalCopyLocation() {
public String getArchivalCopyLocationStatus() {
populateArchivalStatus(false);

if(archivalStatus!=null) {
return archivalStatus.getString(ARCHIVAL_STATUS);
if(archivalCopyLocationJson!=null) {
return archivalCopyLocationJson.getString(ARCHIVAL_STATUS);
}
return null;
}
public String getArchivalCopyLocationMessage() {
populateArchivalStatus(false);
if(archivalStatus!=null) {
return archivalStatus.getString(ARCHIVAL_STATUS_MESSAGE);
if(archivalCopyLocationJson!=null && archivalCopyLocationJson.containsKey(ARCHIVAL_STATUS_MESSAGE)) {
return archivalCopyLocationJson.getString(ARCHIVAL_STATUS_MESSAGE);
}
return null;
}

private void populateArchivalStatus(boolean force) {
if(archivalStatus ==null || force) {
if(archivalCopyLocationJson ==null || force) {
if(archivalCopyLocation!=null) {
try {
archivalStatus = JsonUtil.getJsonObject(archivalCopyLocation);
archivalCopyLocationJson = JsonUtil.getJsonObject(archivalCopyLocation);
} catch(Exception e) {
logger.warning("DatasetVersion id: " + id + "has a non-JsonObject value, parsing error: " + e.getMessage());
logger.fine(archivalCopyLocation);
@@ -414,6 +416,15 @@ public void setArchivalCopyLocation(String location) {
populateArchivalStatus(true);
}

// Convenience method to just change the status without changing the location
public void setArchivalStatusOnly(String status) {
populateArchivalStatus(false);
// Start from the existing JSON if present; guard against a version with no archival copy recorded yet
JsonObjectBuilder job = (archivalCopyLocationJson != null) ? Json.createObjectBuilder(archivalCopyLocationJson) : Json.createObjectBuilder();
job.add(DatasetVersion.ARCHIVAL_STATUS, status);
archivalCopyLocationJson = job.build();
archivalCopyLocation = JsonUtil.prettyPrint(archivalCopyLocationJson);
}
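// Usage sketch (editor's illustration, mirroring the DatasetPage changes in
// this PR): flip a successfully archived version to obsolete without touching
// the stored bag location, then persist just that change:
//   version.setArchivalStatusOnly(DatasetVersion.ARCHIVAL_STATUS_OBSOLETE);
//   datasetVersionService.persistArchivalCopyLocation(version);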

public String getDeaccessionLink() {
return deaccessionLink;
}
@@ -28,11 +28,14 @@
import jakarta.ejb.EJB;
import jakarta.ejb.EJBException;
import jakarta.ejb.Stateless;
import jakarta.ejb.TransactionAttribute;
import jakarta.ejb.TransactionAttributeType;
import jakarta.inject.Named;
import jakarta.json.Json;
import jakarta.json.JsonObjectBuilder;
import jakarta.persistence.EntityManager;
import jakarta.persistence.NoResultException;
import jakarta.persistence.OptimisticLockException;
import jakarta.persistence.PersistenceContext;
import jakarta.persistence.Query;
import jakarta.persistence.TypedQuery;
@@ -1333,4 +1336,24 @@ public Long getDatasetVersionCount(Long datasetId, boolean canViewUnpublishedVer

return em.createQuery(cq).getSingleResult();
}


/**
* Update the archival copy location for a specific version of a dataset.
* Archiving can be long-running, and other parallel updates to the dataset version have likely occurred,
* so this method simply re-finds the version rather than risking an OptimisticLockException
* and then having to retry in yet another transaction (since the OLE rolls this one back).
*
* @param dv
* The dataset version whose archival copy location we want to update. Must not be {@code null}.
*/
@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
public void persistArchivalCopyLocation(DatasetVersion dv) {
DatasetVersion currentVersion = find(dv.getId());
if (currentVersion != null) {
currentVersion.setArchivalCopyLocation(dv.getArchivalCopyLocation());
} else {
logger.log(Level.SEVERE, "Could not find DatasetVersion with id={0} when persisting the archival copy location.", dv.getId());
}
}
}
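
Taken together, the caller-side pattern this PR introduces is: record a 'pending' archival status, persist it in its own transaction via persistArchivalCopyLocation(), and only then run the archive command asynchronously. A condensed sketch, assuming the injected beans shown in the diff (the wrapper method itself is illustrative and will not run outside the EJB container that injects commandEngine and datasetVersionService):

```java
import edu.harvard.iq.dataverse.DatasetVersion;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
import edu.harvard.iq.dataverse.engine.command.impl.AbstractSubmitToArchiveCommand;
import edu.harvard.iq.dataverse.util.json.JsonUtil;
import jakarta.json.Json;
import jakarta.json.JsonObjectBuilder;

// Condensed caller-side pattern from this PR. commandEngine and
// datasetVersionService stand in for the beans injected into DatasetPage.
void submitForArchiving(DatasetVersion dv, AbstractSubmitToArchiveCommand cmd)
        throws CommandException {
    // 1. Record a 'pending' status so the version table can report progress.
    JsonObjectBuilder job = Json.createObjectBuilder();
    job.add(DatasetVersion.ARCHIVAL_STATUS, DatasetVersion.ARCHIVAL_STATUS_PENDING);
    dv.setArchivalCopyLocation(JsonUtil.prettyPrint(job.build()));
    // 2. Persist it in its own REQUIRES_NEW transaction so the long-running
    //    archive job cannot collide with other updates to the version.
    datasetVersionService.persistArchivalCopyLocation(dv);
    // 3. Fire the archiver asynchronously; it updates the status when done.
    commandEngine.submitAsync(cmd);
}
```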