320 changes: 320 additions & 0 deletions docs/openedx_content/backup_restore.rst
@@ -0,0 +1,320 @@
.. _backup-restore-format:

Backup / Restore Format
=======================

The ``backup_restore`` applet lets you export a learning package (V2 content

Contributor

Suggested change
The ``backup_restore`` applet lets you export a learning package (V2 content
The ``backup_restore`` applet lets you back up a learning package (V2 content

We're intentionally using "backup/restore" to distinguish this from the incremental import/export functionality that we plan to add in the future.

library) to a portable ZIP archive and restore it on the same or a different
Open edX instance.

.. contents:: Contents
   :local:
   :depth: 2

Overview
--------

A backup ZIP is a self-contained snapshot of one learning package. It captures

Contributor

We should clarify the difference between a Learning Package and a Library. Namely, that a Library has one and only one Learning Package where it stores its content, but Learning Packages can also stand alone. The restore process creates a temporary Learning Package that can be reviewed by the user, and then later associates that Learning Package with a newly created Library.

every component, collection, container (sections / subsections / units), and

Contributor

Suggested change
every component, collection, container (sections / subsections / units), and
every component, collection, container (section / subsection / unit), and

static asset. For each component and container, only the current draft and
published versions are exported — the full version history is not preserved.

The archive uses `TOML <https://toml.io>`_ for all metadata files and keeps the
actual XBlock content as XML (the same ``block.xml`` format Studio has always

Contributor

Suggested change
actual XBlock content as XML (the same ``block.xml`` format Studio has always
component XBlock content as XML (the same OLX format Studio has always

In modulestore, the XML files are not named block.xml. Also, the old XML format is being kept for components (e.g. problems, videos), but not for structural container types like units and subsections.

Contributor

Also, it's probably worth noting that the naming is different: in courses, each component would be exported with its block_id as the name of the file. That's usually a machine-generated ID (since that's the default in Split), but sometimes it's a meaningful identifier when authored by hand. For our export format, the OLX is always block.xml, and it's the metadata in the parent TOML file that gives the identifier.

used). This makes backups both machine-readable and human-inspectable.

.. note::

   The current archive ``format_version`` is **1**. Future incompatible changes
   to the schema will increment this number so that tooling can detect them
   before attempting a restore.
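
As an illustration of how tooling might detect the version before a restore,
here is a minimal sketch that reads ``format_version`` straight from
``package.toml`` with the standard library. The helper names
(``read_format_version``, ``is_supported``) are hypothetical and are not part
of ``openedx_content``; the sketch assumes Python 3.11+ for ``tomllib``::

    import io
    import tomllib  # standard library since Python 3.11
    import zipfile

    SUPPORTED_FORMAT_VERSION = 1  # the version documented in the note above


    def read_format_version(archive) -> int:
        """Return ``format_version`` from the ``[meta]`` table of package.toml."""
        with zipfile.ZipFile(archive) as zf:
            package = tomllib.loads(zf.read("package.toml").decode("utf-8"))
        return package["meta"]["format_version"]


    def is_supported(archive) -> bool:
        """True when the archive declares the version this sketch understands."""
        return read_format_version(archive) == SUPPORTED_FORMAT_VERSION


    # Build a tiny in-memory archive to demonstrate the check.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("package.toml", "[meta]\nformat_version = 1\n")
    print(is_supported(buffer))

Checking the version before anything else lets a tool refuse an archive it
does not understand without touching the database.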

Exporting a Package
-------------------

Management command (recommended for operators)::

    python manage.py lp_dump <package_ref> output.zip
    python manage.py lp_dump <package_ref> output.zip --username admin --origin_server cms.example.com

Python API::

    from openedx_content.api import create_zip_file

    create_zip_file(
        package_ref="lib:MyOrg:MyLibrary",
        path="/tmp/my_library.zip",
        user=request.user,                 # optional – recorded in package.toml
        origin_server="cms.example.com",   # optional
    )

Restoring a Package
-------------------

Management command::

    python manage.py lp_load output.zip <username>

Python API::

    from openedx_content.api import load_learning_package

    result = load_learning_package(path="/tmp/my_library.zip")
    if result["status"] == "error":
        print(result["log_file_error"].getvalue())

.. note::

   ``load_learning_package`` accepts an optional ``package_ref`` argument.
   When provided it overrides the ``key`` stored in ``package.toml``, which
   is useful when importing a library under a new reference.
Comment on lines +69 to +70

Contributor

We should use stronger language here. It's really dangerous to trust the archive for either the package_ref or the user, and callers should explicitly pass those to load_learning_package unless they really, really know what they're doing.


Archive Structure
-----------------

::

    <package>.zip
    ├── package.toml                      # library metadata + archive metadata
    ├── collections/
    │   └── <collection-key>.toml         # one file per collection
    └── entities/
        ├── <container-slug>.toml         # sections, subsections, units
        └── xblock.v1/
            └── <block-type>/             # e.g. html, problem, video
                ├── <uuid>.toml           # entity metadata + version list
                └── <uuid>/
                    └── component_versions/
                        └── v<N>/
                            ├── block.xml # XBlock content (XML)
                            └── static/   # media assets referenced by block.xml
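
To make the layout concrete, the sketch below classifies archive member paths
by the role they play in the tree above. The ``classify`` function and its
labels are hypothetical, written only for this illustration, and the rules
are inferred from the tree rather than taken from the applet's code::

    from pathlib import PurePosixPath


    def classify(member: str) -> str:
        """Map an archive member path to its role in the documented layout."""
        parts = PurePosixPath(member).parts
        if parts == ("package.toml",):
            return "package metadata"
        if len(parts) == 2 and parts[0] == "collections" and member.endswith(".toml"):
            return "collection"
        if parts and parts[0] == "entities":
            if "static" in parts[:-1]:
                return "static asset"
            if parts[-1] == "block.xml":
                return "component content"
            if member.endswith(".toml"):
                # Directly under entities/ is a container; deeper is a component.
                return "container" if len(parts) == 2 else "component metadata"
        return "unknown"


    for path in (
        "package.toml",
        "collections/collection-test.toml",
        "entities/section1-8ca126.toml",
        "entities/xblock.v1/html/abc.toml",
        "entities/xblock.v1/html/abc/component_versions/v5/block.xml",
        "entities/xblock.v1/html/abc/component_versions/v5/static/me.png",
    ):
        print(f"{path} -> {classify(path)}")
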

File Format Reference
---------------------

package.toml
~~~~~~~~~~~~

Located at the root of the archive. Contains two sections:

``[meta]`` — archive metadata (not restored to the database, for inspection only):

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Required
     - Description
   * - ``format_version``
     - yes
     - Integer schema version; currently ``1``
   * - ``created_by``
     - no
     - Username of the operator who ran the export
   * - ``created_by_email``
     - no
     - Email address of the exporting user
   * - ``created_at``
     - yes
     - UTC timestamp when the archive was created
   * - ``origin_server``
     - no
     - Free-form string identifying the origin CMS instance (typically a
       hostname or URL; stored as-is with no format validation)

``[learning_package]`` — library data (restored to the database, with caveats: ``key`` may be overridden by the caller and ``updated`` is not applied during restore):

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Required
     - Description
   * - ``title``
     - yes
     - Human-readable name of the library
   * - ``key``
     - yes
     - Package reference string, e.g. ``lib:MyOrg:MyLib``
   * - ``description``
     - yes
     - Free-text description (may be blank)
   * - ``created``
     - yes
     - UTC timestamp when the library was originally created
   * - ``updated``
     - yes
     - UTC timestamp of the library's last modification (written to the
       archive for reference; **not** applied during restore)

Example::

    [meta]
    format_version = 1
    created_by = "lp_user"
    created_by_email = "lp_user@example.com"
    created_at = 2025-10-05T18:23:45.180535Z
    origin_server = "cms.test"

    [learning_package]
    title = "Library test"
    key = "lib:WGU:LIB_C001"
    description = ""
    created = 2025-08-19T04:25:10.988166Z
    updated = 2025-08-19T04:25:10.988166Z

Component entity TOML (``entities/xblock.v1/<type>/<uuid>.toml``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each XBlock component gets one TOML file.

``[entity]``:

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Required
     - Description
   * - ``can_stand_alone``
     - yes
     - Whether this component can be used independently (almost always ``true``)
   * - ``key``
     - yes
     - Entity reference in the form ``xblock.v1:<type>:<uuid>``
   * - ``created``
     - yes
     - UTC creation timestamp

``[entity.draft]`` / ``[entity.published]`` — each contains ``version_num``
pointing at the current draft or published ``[[version]]`` entry respectively.
``[entity.draft]`` is absent when the entity has no draft.
``[entity.published]`` is **always present** — when the entity has no
published version it is written as an empty table with an explanatory comment
(see the container example below).

``[[version]]`` — at most two entries: the current draft version first, then
the current published version if it differs from draft. The full version
history is not stored.

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Required
     - Description
   * - ``title``
     - yes
     - Display name of the component at this version
   * - ``version_num``
     - yes
     - Monotonically increasing integer starting at 1

Example::

    [entity]
    can_stand_alone = true
    key = "xblock.v1:html:e32d5479-9492-41f6-9222-550a7346bc37"
    created = 2025-08-19T04:25:43.685529Z

    [entity.draft]
    version_num = 5

    [entity.published]
    version_num = 4

    # ### Versions

    [[version]]
    title = "Text"
    version_num = 5

    [[version]]
    title = "Text"
    version_num = 4
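
The relationship between ``[entity.draft]``, ``[entity.published]`` and the
``[[version]]`` entries can be sketched as a lookup by ``version_num``. The
``resolve_versions`` helper below is hypothetical (it is not part of
``openedx_content``) and assumes Python 3.11+ for ``tomllib``. It treats a
missing or empty table as "no such version"::

    import tomllib

    ENTITY_TOML = """
    [entity]
    can_stand_alone = true
    key = "xblock.v1:html:e32d5479-9492-41f6-9222-550a7346bc37"
    created = 2025-08-19T04:25:43.685529Z

    [entity.draft]
    version_num = 5

    [entity.published]
    version_num = 4

    [[version]]
    title = "Text"
    version_num = 5

    [[version]]
    title = "Text"
    version_num = 4
    """


    def resolve_versions(doc: dict) -> dict:
        """Return the draft and published ``[[version]]`` entries, or None."""
        by_num = {v["version_num"]: v for v in doc.get("version", [])}
        entity = doc["entity"]
        resolved = {}
        for state in ("draft", "published"):
            # The table may be absent (no draft) or empty (never published).
            num = entity.get(state, {}).get("version_num")
            resolved[state] = by_num.get(num)
        return resolved


    versions = resolve_versions(tomllib.loads(ENTITY_TOML))
    print(versions["draft"], versions["published"])
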

Container entity TOML (``entities/<slug>.toml``)

Contributor

We should explain what a <slug> is: This is the last part of the entity_ref if there is no collision, but if the last parts of the entity_ref collide (e.g. a Unit and an HTMLBlock that are both "intro"), then a short hash gets appended.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sections, subsections, and units share the same base structure with an
additional ``[entity.container.<type>]`` marker (``section``, ``subsection``,
or ``unit``) and a ``[version.container]`` table that lists child keys.

Example (section)::

    [entity]
    can_stand_alone = true
    key = "section1-8ca126"
    created = 2025-09-04T22:51:40.919872Z

    [entity.draft]
    version_num = 2

    [entity.published]
    # unpublished: no published_version_num

    [entity.container.section]

    # ### Versions

    [[version]]
    title = "Section1"
    version_num = 2

    [version.container]
    children = ["subsection1-48afa3"]
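
One detail worth noting when reading this file: in TOML, a
``[version.container]`` header that follows ``[[version]]`` attaches to the
most recent ``[[version]]`` entry. The sketch below shows how the container
type and the draft's children might be read back. ``describe_container`` is a
hypothetical helper written for this illustration, assuming Python 3.11+ for
``tomllib``::

    import tomllib

    SECTION_TOML = """
    [entity]
    can_stand_alone = true
    key = "section1-8ca126"
    created = 2025-09-04T22:51:40.919872Z

    [entity.draft]
    version_num = 2

    [entity.published]
    # unpublished: no published_version_num

    [entity.container.section]

    [[version]]
    title = "Section1"
    version_num = 2

    [version.container]
    children = ["subsection1-48afa3"]
    """


    def describe_container(doc: dict) -> tuple:
        """Return (container type, child keys of the draft version)."""
        entity = doc["entity"]
        # The marker table name (section, subsection, or unit) is the type.
        container_type = next(iter(entity.get("container", {})), None)
        draft_num = entity.get("draft", {}).get("version_num")
        children = []
        for version in doc.get("version", []):
            if version["version_num"] == draft_num:
                children = version.get("container", {}).get("children", [])
        return container_type, children


    print(describe_container(tomllib.loads(SECTION_TOML)))
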

Collection TOML (``collections/<key>.toml``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 25 15 60

   * - Field
     - Required
     - Description
   * - ``title``
     - yes
     - Collection display name
   * - ``key``
     - yes
     - Unique key within the library
   * - ``description``
     - yes
     - Free-text description (may be blank)
   * - ``created``
     - yes
     - UTC creation timestamp
   * - ``entities``
     - yes
     - List of entity reference strings (``xblock.v1:<type>:<uuid>``)

Example::

    [collection]
    title = "Collection test1"
    key = "collection-test"
    description = ""
    created = 2025-08-19T04:25:27.754968Z
    entities = [
        "xblock.v1:html:e32d5479-9492-41f6-9222-550a7346bc37",
        "xblock.v1:problem:256739e8-c2df-4ced-bd10-8156f6cfa90b",
    ]
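
Assuming component references map onto the directory layout shown under
Archive Structure (``entities/xblock.v1/<block-type>/<uuid>.toml``), a
collection's ``entities`` list could be turned into archive paths as sketched
below. Both the mapping and the ``component_toml_path`` helper are
assumptions made for illustration and are not confirmed by the applet's
code::

    def component_toml_path(entity_key: str) -> str:
        """Guess the archive path of a component's TOML file from its key."""
        # Keys look like "xblock.v1:<type>:<uuid>"; split into three parts.
        namespace, block_type, local_id = entity_key.split(":", 2)
        return f"entities/{namespace}/{block_type}/{local_id}.toml"


    entities = [
        "xblock.v1:html:e32d5479-9492-41f6-9222-550a7346bc37",
        "xblock.v1:problem:256739e8-c2df-4ced-bd10-8156f6cfa90b",
    ]
    for key in entities:
        print(component_toml_path(key))
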

XBlock content (``component_versions/v<N>/block.xml``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Standard XBlock XML, identical to what Studio stores internally. Static assets

Contributor

There is a difference in HTMLBlock storage. Namely, we don't currently support storing a separate HTML file, so we inline the HTML with CDATA. In courses, we'd have a tiny XML file for the HTMLBlock that pointed to the HTML file.

This is a limitation of our XBlock serialization, but one I hope we can fix before Willow.

(images, PDFs, etc.) referenced with ``/static/<filename>`` in the XML are
stored alongside the XML under ``component_versions/v<N>/static/``.

Example ``block.xml``::

    <html display_name="Text">
    <![CDATA[<p>Hello <img src="/static/me.png" alt="Me" /></p>]]>
    </html>
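
As a rough illustration of how ``/static/<filename>`` references relate to
the ``static/`` directory, the sketch below collects the referenced filenames
from a ``block.xml`` string. ``static_references`` and the regular expression
are hypothetical; this is not how the applet itself resolves assets::

    import re
    import xml.etree.ElementTree as ET

    BLOCK_XML = """<html display_name="Text">
    <![CDATA[<p>Hello <img src="/static/me.png" alt="Me" /></p>]]>
    </html>"""

    # Capture everything after /static/ up to a quote, bracket, or whitespace.
    STATIC_REF = re.compile(r"""/static/([^"'\s<>)]+)""")


    def static_references(block_xml: str) -> list:
        """Return the sorted, de-duplicated filenames referenced via /static/."""
        root = ET.fromstring(block_xml)
        haystack = []
        for element in root.iter():
            # Attribute values and text nodes; CDATA content arrives as text.
            haystack.extend(element.attrib.values())
            haystack.append(element.text or "")
            haystack.append(element.tail or "")
        return sorted(set(STATIC_REF.findall("\n".join(haystack))))


    print(static_references(BLOCK_XML))

Each returned name would be expected to exist under
``component_versions/v<N>/static/`` next to the XML file.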
1 change: 1 addition & 0 deletions docs/openedx_content/index.rst
@@ -10,3 +10,4 @@ Django app for modeling and authoring course content structures.

decisions/index
api_reference
backup_restore