DSpace / DS-466

Add ability to export/import entire Community/Collection/Item structure (for easier backups, migrations, etc.)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.7.0
    • Component/s: DSpace API
    • Labels: None
    • Attachments: 0
    • Comments: 4
    • Documentation Status: In Comments

      Description

      This comes out of a requirement for DSpace integration with DuraCloud (http://www.duraspace.org/duracloud.php). One of these requirements is to be able to "back up" local DSpace contents to the cloud (as a type of offsite backup) and "restore" those contents at a later time.

      Essentially, we'd like a way to export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a relatively standard format (e.g. METS or a similar structured packaging format). It should also be possible to re-import this entire hierarchy into DSpace from the same format, to allow for "roundtripping" of that content (essentially a restore of that content in the same or a different DSpace installation).
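To make the "roundtripping" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Node` class, the JSON packaging, and every function name are hypothetical, and none of them are part of DSpace, METS, or the AIP prototype. Bitstreams are left out for brevity. The sketch shows the property the feature is after: exporting a Community/Collection/Item hierarchy and importing the result should yield an equivalent hierarchy.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Community, Collection, or Item."""
    kind: str                      # "community", "collection", or "item"
    name: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def export_node(node):
    """Turn a node and all of its descendants into a plain nested dict."""
    return {
        "kind": node.kind,
        "name": node.name,
        "metadata": dict(node.metadata),
        "children": [export_node(child) for child in node.children],
    }


def import_node(package):
    """Rebuild a node and its descendants from an exported dict."""
    return Node(
        kind=package["kind"],
        name=package["name"],
        metadata=dict(package["metadata"]),
        children=[import_node(child) for child in package["children"]],
    )


def build_example():
    """Build a tiny Community -> Collection -> Item hierarchy."""
    item = Node("item", "Thesis 1", {"dc.title": "Thesis 1"})
    collection = Node("collection", "Theses", children=[item])
    return Node("community", "Library", children=[collection])


def roundtrip(node):
    """Export to a JSON string (assumed packaging format), then import it."""
    serialized = json.dumps(export_node(node))
    return import_node(json.loads(serialized))
```

A roundtrip is successful when `roundtrip(build_example()) == build_example()` holds; a real package format would also need to carry bitstreams and persistent identifiers.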

      Perceived benefits to DSpace community:

      • Would allow folks to more easily move entire Communities or Collections between DSpace instances.
      • Would allow for a potentially more consistent backup of this hierarchy (e.g. to DuraCloud, or just to your own local backup system), rather than relying on synchronizing a backup of your DB (metadata/relationships) and assetstore (bitstreams).
      • Would provide a way for people to more easily get their data out of DSpace (whatever the purpose may be).
      • Would provide a relatively standard format for people to migrate entire hierarchies (Communities/Collections) into DSpace (from another system).

      Known Issues:

      • Exporting/Importing the Community/Collection/Item hierarchy technically doesn't cover all the "content" held in DSpace. There are also Groups, EPeople and permissions/rights (which would get you closer to a full export/import of all DSpace content). However, concentrating on just the hierarchy of Community/Collection/Item seems like a good first step.

      This is related to (and a partial subset of) MIT's AIP Prototype: http://jira.dspace.org/jira/browse/DS-465. However, the AIP prototype currently does not make it very easy to re-import the exported AIPs for Communities or Collections. So, this feature would extend the AIP prototype's current packagers/crosswalks to allow for a full export and import of an entire DSpace hierarchy, or just a set of Communities, Collections or Items.

      My current plan is to build off of the subset of the AIP prototype (essentially the packagers, crosswalks and related changes) which begins to allow for this roundtripping of Communities and Collections. I'll be adding a new SVN sandbox area for this work (so that others can help out, if it interests them). If anyone has comments, suggestions or feedback on this idea, or would like to be involved in this project, definitely let me know (or add comments to this issue).

      This work is being prototyped in the SVN Sandbox at:
      http://scm.dspace.org/svn/repo/sandbox/aip-external-1_6-prototype/

      More details on this project are available on the Wiki at:
      http://wiki.dspace.org/confluence/display/DSPACE/AipBackupRestorePrototype

            Activity

            tdonohue Tim Donohue created issue -
            tdonohue Tim Donohue made changes -
            Field Original Value New Value
            Link This issue is related to DS-465 [ DS-465 ]
            tdonohue Tim Donohue added a comment -

            Added reference to the prototype's SVN Sandbox location in description above:
            http://scm.dspace.org/svn/repo/sandbox/aip-external-1_6-prototype/

            tdonohue Tim Donohue added a comment -

            Added link to wiki page describing project.

            tdonohue Tim Donohue made changes -
            Workflow jira [ 10777 ] DSpace Workflow [ 11469 ]
            tdonohue Tim Donohue added a comment -

            Added Trunk Patches and descriptions of all changes to a new wiki page:

            https://wiki.duraspace.org/display/DSPACE/AipCoreAPIChanges

            This wiki page details all changes necessary to support this AIP Backup/Restore functionality. It also includes all patches to existing code (the patches are attached to the wiki page). The majority of the changes involve refactoring the Crosswalks and Packagers to support AIPs (and normal backup/restore tasks). Other minor changes were necessary to the core API, as well as to SWORD and LNI, but hopefully none of those will be controversial.

            I'm now looking for others to begin reviewing these changes (or to check out the code itself from the Sandbox SVN to review it).
            I think this code is ready to begin to migrate/merge into Trunk in the very near future (assuming others agree).

            More information on running/using this new AIP Backup/Restore functionality is available on the wiki page:

            https://wiki.duraspace.org/display/DSPACE/AipBackupRestorePrototype

            tdonohue Tim Donohue made changes -
            Link This issue is related to DS-647 [ DS-647 ]
            tdonohue Tim Donohue added a comment -

            Resolving this issue, as primary code is now committed to Trunk (as of rev 5265).

            Documentation on how to use & configure this new feature is available on the Wiki at:
            https://wiki.duraspace.org/display/DSPACE/AipBackupRestore

            However, I still need help fully testing the LNI refactoring, as detailed in issue DS-647.

            tdonohue Tim Donohue made changes -
            Status Open [ 1 ] Resolved [ 5 ]
            Resolution Fixed [ 1 ]
            Documentation Status Needed In Comments
            tdonohue Tim Donohue made changes -
            Project Import Thu Oct 14 14:40:23 UTC 2010 [ 1287067223065 ]
            tdonohue Tim Donohue made changes -
            Link This issue is related to DS-772 [ DS-772 ]
            tdonohue Tim Donohue made changes -
            Workflow DSpace Workflow [ 15379 ] DSpace JIRA Workflow [ 23405 ]
            Status Resolved [ 5 ] Closed [ 6 ]

              People

              • Assignee: tdonohue Tim Donohue
              • Reporter: tdonohue Tim Donohue
              • Votes: 1
              • Watchers: 1
