From 132f2c6675050dcb25793328026b2a9bd959d552 Mon Sep 17 00:00:00 2001
From: Irfan Ahmad
Date: Mon, 8 Jun 2026 20:17:49 +0500
Subject: [PATCH 1/3] docs: document the backup/restore ZIP archive format

Adds a reference page describing the TOML-based ZIP format produced by
`create_zip_file` / `lp_dump` and consumed by `load_learning_package` /
`lp_load`. Covers the full archive layout, every TOML file schema with
field-level descriptions and annotated examples drawn from the test
fixtures, the XBlock XML placement convention, and quick-start usage
snippets for both the management commands and the Python API.

Closes https://github.com/openedx/openedx-core/issues/492

Co-Authored-By: Claude Sonnet 4.6
---
 docs/openedx_content/backup_restore.rst | 312 ++++++++++++++++++++++++
 docs/openedx_content/index.rst          |   1 +
 2 files changed, 313 insertions(+)
 create mode 100644 docs/openedx_content/backup_restore.rst

diff --git a/docs/openedx_content/backup_restore.rst b/docs/openedx_content/backup_restore.rst
new file mode 100644
index 000000000..a064c4e4d
--- /dev/null
+++ b/docs/openedx_content/backup_restore.rst
@@ -0,0 +1,312 @@
+.. _backup-restore-format:
+
+Backup / Restore Format
+=======================
+
+The ``backup_restore`` applet lets you export a learning package (V2 content
+library) to a portable ZIP archive and restore it on the same or a different
+Open edX instance.
+
+.. contents:: Contents
+   :local:
+   :depth: 2
+
+Overview
+--------
+
+A backup ZIP is a self-contained snapshot of one learning package. It captures
+every component, collection, container (sections / subsections / units), static
+asset, and version history that existed at export time.
+
+The archive uses `TOML `_ for all metadata files and keeps the
+actual XBlock content as XML (the same ``block.xml`` format Studio has always
+used). This makes backups both machine-readable and human-inspectable.
+
+.. note::
+
+   The current archive ``format_version`` is **1**. Future incompatible changes
+   to the schema will increment this number so that tooling can detect them
+   before attempting a restore.
+
+Exporting a Package
+-------------------
+
+Management command (recommended for operators)::
+
+    python manage.py lp_dump output.zip
+    python manage.py lp_dump output.zip --username admin --origin_server cms.example.com
+
+Python API::
+
+    from openedx_content.api import create_zip_file
+
+    create_zip_file(
+        package_ref="lib:MyOrg:MyLibrary",
+        path="/tmp/my_library.zip",
+        user=request.user,               # optional – recorded in package.toml
+        origin_server="cms.example.com", # optional
+    )
+
+Restoring a Package
+-------------------
+
+Management command::
+
+    python manage.py lp_load output.zip
+
+Python API::
+
+    from openedx_content.api import load_learning_package
+
+    result = load_learning_package(path="/tmp/my_library.zip")
+    if result["status"] == "error":
+        print(result["log_file_error"].getvalue())
+
+.. note::
+
+   ``load_learning_package`` accepts an optional ``package_ref`` argument.
+   When provided it overrides the ``key`` stored in ``package.toml``, which
+   is useful when importing a library under a new reference.
+
+Archive Structure
+-----------------
+
+::
+
+    .zip
+    ├── package.toml                  # library metadata + archive metadata
+    ├── collections/
+    │   └── .toml                     # one file per collection
+    └── entities/
+        ├── .toml                     # sections, subsections, units
+        └── xblock.v1/
+            └── /                     # e.g. html, problem, video
+                ├── .toml             # entity metadata + version list
+                └── /
+                    └── component_versions/
+                        └── v/
+                            ├── block.xml   # XBlock content (XML)
+                            └── static/     # media assets referenced by block.xml
+
+File Format Reference
+---------------------
+
+package.toml
+~~~~~~~~~~~~
+
+Located at the root of the archive. Contains two sections:
+
+``[meta]`` — archive metadata (not restored to the database, for inspection only):
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Field
+     - Required
+     - Description
+   * - ``format_version``
+     - yes
+     - Integer schema version; currently ``1``
+   * - ``created_by``
+     - no
+     - Username of the operator who ran the export
+   * - ``created_by_email``
+     - no
+     - Email address of the exporting user
+   * - ``created_at``
+     - yes
+     - UTC timestamp when the archive was created
+   * - ``origin_server``
+     - no
+     - Hostname of the CMS instance that produced the archive
+
+``[learning_package]`` — library data (restored to the database):
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Field
+     - Required
+     - Description
+   * - ``title``
+     - yes
+     - Human-readable name of the library
+   * - ``key``
+     - yes
+     - Package reference string, e.g. ``lib:MyOrg:MyLib``
+   * - ``description``
+     - yes
+     - Free-text description (may be blank)
+   * - ``created``
+     - yes
+     - UTC timestamp when the library was originally created
+   * - ``updated``
+     - yes
+     - UTC timestamp of the library's last modification
+
+Example::
+
+    [meta]
+    format_version = 1
+    created_by = "lp_user"
+    created_by_email = "lp_user@example.com"
+    created_at = 2025-10-05T18:23:45.180535Z
+    origin_server = "cms.test"
+
+    [learning_package]
+    title = "Library test"
+    key = "lib:WGU:LIB_C001"
+    description = ""
+    created = 2025-08-19T04:25:10.988166Z
+    updated = 2025-08-19T04:25:10.988166Z
+
+Component entity TOML (``entities/xblock.v1//.toml``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Each XBlock component gets one TOML file.
+
+``[entity]``:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Field
+     - Required
+     - Description
+   * - ``can_stand_alone``
+     - yes
+     - Whether this component can be used independently (almost always ``true``)
+   * - ``key``
+     - yes
+     - Entity reference in the form ``xblock.v1::``
+   * - ``created``
+     - yes
+     - UTC creation timestamp
+
+``[entity.draft]`` / ``[entity.published]`` — each contains ``version_num``
+pointing at the current draft or published ``[[version]]`` entry respectively.
+If a section is absent the entity has no draft or published version.
+
+``[[version]]`` — one entry per saved version, in ascending ``version_num`` order:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Field
+     - Required
+     - Description
+   * - ``title``
+     - yes
+     - Display name of the component at this version
+   * - ``version_num``
+     - yes
+     - Monotonically increasing integer starting at 1
+
+Example::
+
+    [entity]
+    can_stand_alone = true
+    key = "xblock.v1:html:e32d5479-9492-41f6-9222-550a7346bc37"
+    created = 2025-08-19T04:25:43.685529Z
+
+    [entity.draft]
+    version_num = 5
+
+    [entity.published]
+    version_num = 4
+
+    # ### Versions
+
+    [[version]]
+    title = "Text"
+    version_num = 4
+
+    [[version]]
+    title = "Text"
+    version_num = 5
+
+Container entity TOML (``entities/.toml``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Sections, subsections, and units share the same base structure with an
+additional ``[entity.container.]`` marker (``section``, ``subsection``,
+or ``unit``) and a ``[version.container]`` table that lists child keys.
+
+Example (section)::
+
+    [entity]
+    can_stand_alone = true
+    key = "section1-8ca126"
+    created = 2025-09-04T22:51:40.919872Z
+
+    [entity.draft]
+    version_num = 2
+
+    [entity.published]
+    # unpublished: no published_version_num
+
+    [entity.container.section]
+
+    # ### Versions
+
+    [[version]]
+    title = "Section1"
+    version_num = 2
+
+    [version.container]
+    children = ["subsection1-48afa3"]
+
+Collection TOML (``collections/.toml``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 15 60
+
+   * - Field
+     - Required
+     - Description
+   * - ``title``
+     - yes
+     - Collection display name
+   * - ``key``
+     - yes
+     - Unique key within the library
+   * - ``description``
+     - yes
+     - Free-text description (may be blank)
+   * - ``created``
+     - yes
+     - UTC creation timestamp
+   * - ``entities``
+     - yes
+     - List of entity reference strings (``xblock.v1::``)
+
+Example::
+
+    [collection]
+    title = "Collection test1"
+    key = "collection-test"
+    description = ""
+    created = 2025-08-19T04:25:27.754968Z
+    entities = [
+        "xblock.v1:html:e32d5479-9492-41f6-9222-550a7346bc37",
+        "xblock.v1:problem:256739e8-c2df-4ced-bd10-8156f6cfa90b",
+    ]
+
+XBlock content (``component_versions/v/block.xml``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Standard XBlock XML, identical to what Studio stores internally. Static assets
+(images, PDFs, etc.) referenced with ``/static/`` in the XML are
+stored alongside the XML under ``component_versions/v/static/``.
+
+Example ``block.xml``::
+
+    Hello Me
+
+    ]]>
+
diff --git a/docs/openedx_content/index.rst b/docs/openedx_content/index.rst
index f68f625aa..80f97f781 100644
--- a/docs/openedx_content/index.rst
+++ b/docs/openedx_content/index.rst
@@ -10,3 +10,4 @@ Django app for modeling and authoring course content structures.
 
    decisions/index
    api_reference
+   backup_restore

From ef91e39c985a259b5030e2255f65ee0558f56df9 Mon Sep 17 00:00:00 2001
From: Irfan Ahmad
Date: Tue, 9 Jun 2026 14:36:17 +0500
Subject: [PATCH 2/3] docs: fix inaccuracies in backup_restore format reference
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Overview: clarify only draft+published versions exported, not full history
- origin_server: free-form string, not validated hostname
- [learning_package] heading: note key may be overridden, updated not restored
- updated field: mark as reference-only, not applied during restore
- [entity.published]: always present (empty table with comment when unpublished)
- [[version]]: at most 2 entries — draft first, then published if different
- Example: fix version order to draft (v5) first, then published (v4)

Co-Authored-By: Claude Sonnet 4.6
---
 docs/openedx_content/backup_restore.rst | 26 ++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/docs/openedx_content/backup_restore.rst b/docs/openedx_content/backup_restore.rst
index a064c4e4d..7f78d0102 100644
--- a/docs/openedx_content/backup_restore.rst
+++ b/docs/openedx_content/backup_restore.rst
@@ -15,8 +15,9 @@ Overview
 --------
 
 A backup ZIP is a self-contained snapshot of one learning package. It captures
-every component, collection, container (sections / subsections / units), static
-asset, and version history that existed at export time.
+every component, collection, container (sections / subsections / units), and
+static asset. For each component and container, only the current draft and
+published versions are exported — the full version history is not preserved.
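Since only the current draft and published versions are exported, each component should have at most two ``component_versions`` subdirectories. The sketch below checks that with the standard ``zipfile`` module. The path pattern is an assumption inferred from the archive tree on this page (its placeholder names are illustrative), and ``exported_versions`` is a hypothetical helper, not part of ``openedx_content.api``.

```python
import io
import re
import zipfile
from collections import defaultdict

# Assumed path shape (names in angle brackets are illustrative):
#   entities/xblock.v1/<block_type>/<entity>/component_versions/v<num>/block.xml
BLOCK_XML = re.compile(
    r"^entities/xblock\.v1/(?P<block_type>[^/]+)/(?P<entity>[^/]+)"
    r"/component_versions/v(?P<num>\d+)/block\.xml$"
)


def exported_versions(zip_file) -> dict:
    """Map (block_type, entity) to the sorted version numbers found in the ZIP."""
    found = defaultdict(list)
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            match = BLOCK_XML.match(name)
            if match:
                key = (match["block_type"], match["entity"])
                found[key].append(int(match["num"]))
    return {key: sorted(nums) for key, nums in found.items()}


# Throwaway archive: one HTML component with a draft (v5) and a published (v4) version.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    prefix = "entities/xblock.v1/html/e32d5479/component_versions"
    zf.writestr(f"{prefix}/v5/block.xml", "<html/>")
    zf.writestr(f"{prefix}/v4/block.xml", "<html/>")
    zf.writestr(f"{prefix}/v4/static/diagram.png", b"\x89PNG")

versions = exported_versions(demo)
# More than two versions for one component would contradict the format description.
too_many = {key: nums for key, nums in versions.items() if len(nums) > 2}
print(versions)  # {('html', 'e32d5479'): [4, 5]}
print(too_many)  # {}
```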
 
 The archive uses `TOML `_ for all metadata files and keeps the
 actual XBlock content as XML (the same ``block.xml`` format Studio has always
@@ -119,9 +120,10 @@ Located at the root of the archive. Contains two sections:
      - UTC timestamp when the archive was created
    * - ``origin_server``
      - no
-     - Hostname of the CMS instance that produced the archive
+     - Free-form string identifying the origin CMS instance (typically a
+       hostname or URL; stored as-is with no format validation)
 
-``[learning_package]`` — library data (restored to the database):
+``[learning_package]`` — library data (restored to the database, with caveats: ``key`` may be overridden by the caller and ``updated`` is not applied during restore):
 
 .. list-table::
    :header-rows: 1
@@ -144,7 +146,8 @@ Located at the root of the archive. Contains two sections:
      - UTC timestamp when the library was originally created
    * - ``updated``
      - yes
-     - UTC timestamp of the library's last modification
+     - UTC timestamp of the library's last modification (written to the
+       archive for reference; **not** applied during restore)
 
 Example::
 
@@ -188,9 +191,14 @@ Each XBlock component gets one TOML file.
 
 ``[entity.draft]`` / ``[entity.published]`` — each contains ``version_num``
 pointing at the current draft or published ``[[version]]`` entry respectively.
-If a section is absent the entity has no draft or published version.
+``[entity.draft]`` is absent when the entity has no draft.
+``[entity.published]`` is **always present** — when the entity has no
+published version it is written as an empty table with an explanatory comment
+(see the container example below).
 
-``[[version]]`` — one entry per saved version, in ascending ``version_num`` order:
+``[[version]]`` — at most two entries: the current draft version first, then
+the current published version if it differs from draft. The full version
+history is not stored.
 
 .. list-table::
    :header-rows: 1
@@ -223,11 +231,11 @@ Example::
 
     [[version]]
     title = "Text"
-    version_num = 4
+    version_num = 5
 
     [[version]]
     title = "Text"
-    version_num = 5
+    version_num = 4
 
 Container entity TOML (``entities/.toml``)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 2073d8d0943b998611cd86110bfac9d497988e7d Mon Sep 17 00:00:00 2001
From: Irfan Ahmad
Date: Wed, 1 Jul 2026 18:35:50 +0500
Subject: [PATCH 3/3] docs: address ormsbee review on backup_restore format doc

- Use "back up" consistently to distinguish from future import/export
- Fix "OLX format" and "component" qualifier (containers don't use OLX)
- Clarify Library vs Learning Package relationship in Overview
- Add security warning: always pass package_ref explicitly, don't trust archive
- Explain derivation and hash-collision disambiguation
- Note modulestore naming difference (block_id vs block.xml + parent TOML)
- Note HTMLBlock CDATA limitation vs separate .html file in old course OLX
- Fix singular: section / subsection / unit

Co-Authored-By: Claude Sonnet 4.6
---
 docs/openedx_content/backup_restore.rst | 49 ++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/docs/openedx_content/backup_restore.rst b/docs/openedx_content/backup_restore.rst
index 7f78d0102..d7dc7754a 100644
--- a/docs/openedx_content/backup_restore.rst
+++ b/docs/openedx_content/backup_restore.rst
@@ -3,7 +3,7 @@
 Backup / Restore Format
 =======================
 
-The ``backup_restore`` applet lets you export a learning package (V2 content
+The ``backup_restore`` applet lets you back up a learning package (V2 content
 library) to a portable ZIP archive and restore it on the same or a different
 Open edX instance.
 
@@ -14,13 +14,21 @@ Open edX instance.
 Overview
 --------
 
+.. note::
+
+   A **Library** (the user-facing V2 content library) has exactly one
+   **Learning Package** where it stores its content, but Learning Packages can
+   also exist independently. During a restore, the system first creates a
+   standalone Learning Package for inspection; once the operator confirms the
+   content, that Learning Package is associated with a newly created Library.
+
 A backup ZIP is a self-contained snapshot of one learning package. It captures
-every component, collection, container (sections / subsections / units), and
+every component, collection, container (section / subsection / unit), and
 static asset. For each component and container, only the current draft and
 published versions are exported — the full version history is not preserved.
 
 The archive uses `TOML `_ for all metadata files and keeps the
-actual XBlock content as XML (the same ``block.xml`` format Studio has always
+actual component XBlock content as XML (the same OLX format Studio has always
 used). This makes backups both machine-readable and human-inspectable.
 
 .. note::
@@ -63,6 +71,15 @@ Python API::
     if result["status"] == "error":
         print(result["log_file_error"].getvalue())
 
+.. warning::
+
+   Do **not** rely on the ``key`` stored in ``package.toml`` to determine
+   where the content is restored. Always pass ``package_ref`` explicitly to
+   ``load_learning_package``; trusting the archive's own key is a security
+   risk and can lead to content being restored under an unintended identifier.
+   Similarly, never pass ``user`` from the archive — always supply the
+   authenticated operator making the restore request.
+
 .. note::
 
    ``load_learning_package`` accepts an optional ``package_ref`` argument.
@@ -240,6 +257,11 @@ Example::
 Container entity TOML (``entities/.toml``)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+The ```` is derived from the last segment of the container's
+``entity_ref``. If two containers share the same last segment (e.g. a Unit
+and a Subsection both named "intro"), a short hash is appended to the
+second to avoid filename collisions (e.g. ``intro-48afa3.toml``).
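A minimal sketch of the filename rule described above. The ``:`` segment separator and the hash (first six hex digits of SHA-1 over the full ``entity_ref``) are assumptions chosen for illustration, not the exporter's documented choices, and ``container_filenames`` is a hypothetical helper. Refs are processed in order, so only the second and later holders of a segment receive a suffix.

```python
import hashlib


def container_filenames(entity_refs: list[str]) -> dict[str, str]:
    """Assign a TOML filename to each container entity_ref (hypothetical sketch)."""
    used: set[str] = set()
    names: dict[str, str] = {}
    for ref in entity_refs:
        stem = ref.rsplit(":", 1)[-1]  # assumed: last ":"-separated segment
        if stem in used:
            # Assumed disambiguation: short SHA-1 digest of the full ref.
            stem = f"{stem}-{hashlib.sha1(ref.encode('utf-8')).hexdigest()[:6]}"
        used.add(stem)
        names[ref] = f"{stem}.toml"
    return names


# Hypothetical refs: a unit and a subsection whose last segment is "intro".
print(container_filenames(["unit:intro", "subsection:intro", "section:welcome"]))
```

Because the suffix is derived from the full ref rather than a counter, the same container keeps the same filename regardless of how many other collisions exist; whether the real exporter behaves this way is not stated here.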
+
 Sections, subsections, and units share the same base structure with an
 additional ``[entity.container.]`` marker (``section``, ``subsection``,
 or ``unit``) and a ``[version.container]`` table that lists child keys.
@@ -309,9 +331,24 @@ Example::
 XBlock content (``component_versions/v/block.xml``)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Standard XBlock XML, identical to what Studio stores internally. Static assets
-(images, PDFs, etc.) referenced with ``/static/`` in the XML are
-stored alongside the XML under ``component_versions/v/static/``.
+OLX (Open Learning XML) for the component, in the same format Studio uses
+internally. Static assets (images, PDFs, etc.) referenced with
+``/static/`` in the XML are stored alongside under
+``component_versions/v/static/``.
+
+.. note::
+
+   Unlike the old modulestore OLX export — where each component's file was
+   named after its ``block_id`` (often a machine-generated UUID) — this format
+   always names the file ``block.xml``. The component's identifier lives in
+   the parent TOML file, not the filename.
+
+.. note::
+
+   **HTMLBlock limitation:** HTML content is currently serialized inline using
+   a CDATA section rather than stored in a separate ``.html`` file. This
+   differs from old course OLX exports and is a known limitation of the current
+   XBlock serialization layer.
 
 Example ``block.xml``::