Translation migration and more

I’m running a bit behind, but I want to make it a practice to blog about each migration project I’m involved with (with each client’s permission, of course). Constructed examples are all well and good, but there’s nothing like real-world scenarios to give the flavor of using the migration framework in practice. Where possible (again with the client’s permission), I will share the full migration implementation on GitHub, so people can see not only code snippets illustrating particular points, but their full context.

This spring I contracted with North Studio, a web and mobile development shop based in British Columbia, to assist with the migration of their customer The Carlyle Group’s multilingual site from Drupal 6 to Drupal 8. As happens more often than not when moving from one major version to another of Drupal, the opportunity was taken to refactor the site, rethinking some of the original site structure as well as taking advantage of the new Drupal 8 architecture. One aspect of restructuring was consolidating 26 content types into 8. At the time I joined the project, some of the new content types, and all of the taxonomy, had already been manually recreated in Drupal 8, and my responsibility was to develop automated migration of five remaining content types (four of them with translations), plus editorial user accounts.

User migration - mapping roles

The user migration makes use of the core d6_user migration almost verbatim - the crucial difference is that, as with content types, roles are being consolidated in the move to Drupal 8. In d6_user, roles are mapped 1-to-1 from Drupal 6:

roles:
  plugin: migration
  migration: d6_user_role
  source: roles

In this project, the desired roles were already set up in the Drupal 8 site, and we used a static map to translate the Drupal 6 numeric role IDs to the corresponding Drupal 8 role configuration IDs, consolidating three different D6 roles into the D8 “manager” role.

roles:
  plugin: static_map
  source: roles
  bypass: true
  map:
    4: manager             # content author
    3: manager             # developer
    6: manager             # html content author
    16: media_room_manager # media room manager
    11: translator         # translator

Files

As with users, the migration is mostly identical to the core d6_file migration. The key difference in the file migration itself is that, because much of the content (including files) has already been moved to D8, we want to be sure that the automated file migration we’re implementing here only pulls the files that are referenced by the specific content types we’re migrating. We do this by creating a source plugin extending the core d6_file source plugin, and overriding prepareRow() to ignore any file ID (fid) which is not in the field tables for those content types (by returning FALSE). Note that we could just as easily have done this by implementing hook_migrate_prepare_row(), or a subscriber to the PREPARE_ROW event provided by migrate_plus, and throwing MigrateSkipRowException. In a later post (for a later project) I will demonstrate the event subscription approach.

The other side to this is populating file fields on nodes. We would be able to do this using the core d6_cck_file process plugin - but that has a hardcoded reference to the core d6_file migration. Therefore, we needed to define our own carlyle_cck_file process plugin, extending d6_cck_file to override create() and reference our own file migration (named, imaginatively enough, ‘file’).
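As a rough sketch of how these two pieces fit together in configuration: the migration ID file and the carlyle_cck_file process plugin are as described above, while the source plugin ID carlyle_file and the field name field_attachment are hypothetical stand-ins, not names taken from the project.

```yaml
# File migration (ID "file"), otherwise modeled on core's d6_file.
id: file
source:
  # Hypothetical ID for the custom source plugin that extends d6_file and
  # skips any fid not referenced by the content types being migrated.
  plugin: carlyle_file
destination:
  plugin: 'entity:file'
---
# Fragment of a node migration's process section.
process:
  field_attachment:            # hypothetical file field
    plugin: carlyle_cck_file   # extends d6_cck_file, looks up the "file" migration
    source: field_attachment
```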

Nodes

Conversion of node references to taxonomy references

One aspect of the site refactoring was the conversion of some node types to vocabularies (and thus node references become taxonomy term references). For example, the former “industry” node type became an “industry” vocabulary. This vocabulary was already populated with all necessary terms when this migration project started - the challenge was to convert the node references. There were two elements to this:

A getTargetTitles() method was added to a common source plugin class, CarlyleNode. Given a node reference field name, it gathers the node IDs (nids) referenced in that source property, looks up the title of each referenced node, and replaces the source property with the list of titles.
The migration configuration file mapped the field using the entity_lookup process plugin (provided in migrate_plus). This plugin populates a reference field by looking up the incoming source value against a given property of the destination field’s target type, and setting the referenced entity ID.

So, given this field mapping:

field_industry:
  plugin: entity_lookup
  source: field_industry
  ignore_case: true

The source property is initially populated by the core migration framework with the referenced node IDs on Drupal 6. In prepareRow(), getTargetTitles() is invoked to convert those nids to node titles (which in the node->vocabulary conversion were used as the term names). Then, in the processing stage, the entity_lookup plugin queries the field_industry target vocabulary for terms with those names, returning the term IDs to populate the term references. The ignore_case setting, of course, makes sure case differences between the manually-created terms in D8 and the incoming values don't prevent us from making the match.
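The short mapping above leaves it to entity_lookup to work out the target entity type and bundle from the destination field. For comparison, a more explicit form of the same mapping might look like the sketch below. The key names are those documented for the migrate_plus entity_lookup plugin; the values assume the terms live in a vocabulary whose machine name is industry and are matched on the term name.

```yaml
field_industry:
  plugin: entity_lookup
  source: field_industry
  entity_type: taxonomy_term   # type of entity to search
  bundle_key: vid              # property that holds the bundle (vocabulary)
  bundle: industry             # assumed vocabulary machine name
  value_key: name              # match incoming titles against term names
  ignore_case: true
```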

Translations

The biggest challenge of this project was handling translated nodes. At the time of this project, migration of translations had not yet been implemented in core. The issue to implement this for Drupal 6 had become an epic, because the mechanism for storing node translations had changed completely. Before Drupal 8, each translation was a separate node, referencing the parent node via a ‘tnid’ column in the node table. With Drupal 8, all versions of a piece of content are stored as a single node, with each revision containing all translations of the node as of the time of that revision (and a new translation introducing a new revision).

Substantial progress had been made by others (particularly vasi) on that patch at the time I needed the functionality here, but they had run into the problem of properly setting the “default” language for the node (the language of the parent node in D6). For this to work, the parent node needs to be migrated before any of its translations are, and we simply could not come up with a clean way to do this within a single migration. The solution I came up with for the Carlyle project, on which the core solution was ultimately based, was to have separate migrations for the parent node and for its translations. By making the translation migration dependent on the parent node migration, we guaranteed things were created in the right order.

Note that the following description, as well as the code on GitHub, is not the code exactly as implemented for this project (which was basically the proof-of-concept for the core solution), but reflects the final solution committed to Drupal core as of 8.1.8 and later (and thus what you would use to implement your own translation migrations).

Considering bio nodes, the node_bio migration imports the parent nodes for each logical piece of content. Nothing special is needed here to support translation; it’s a straightforward node migration. The migration of bio node translations (node_bio_translation) is nearly identical to node_bio, with the following differences:

The source plugin is flagged for translations:

source:
  plugin: bio_node
  node_type: bio
  translations: true

This tells the node source plugin to import only translated nodes - i.e., those which have a tnid which is neither 0 nor identical to nid. When this is omitted, only untranslated nodes (tnid == 0) and parent nodes for translation (tnid == nid) are imported.
The nid is explicitly set to the nid of the D8 version of the original translation (which was migrated by node_bio):

nid:
  plugin: migration
  migration: node_bio
  source: tnid

Note that the node_bio migration does not map nid - we cannot preserve the original nid in this case, because the destination site has manually-created content which would conflict with migrated nids.
The destination plugin is flagged for translations:

destination:
  plugin: 'entity:node'
  translations: true

This triggers some code to make sure that a translation is created on the node if the incoming language does not already exist on the node, and to add language as a destination key field to the map table for this migration.
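Putting the three differences together, the translation migration might be sketched as follows. The migration_dependencies entry expresses the ordering requirement described earlier (parents before translations). The langcode and title mappings are illustrative placeholders for whatever field mappings node_bio_translation shares with node_bio, not the project's actual mappings.

```yaml
id: node_bio_translation
source:
  plugin: bio_node
  node_type: bio
  translations: true      # import only translations (tnid not 0 and not equal to nid)
process:
  # Attach each translation to the D8 node created from its D6 parent.
  nid:
    plugin: migration
    migration: node_bio
    source: tnid
  langcode: language      # illustrative: language of the incoming translation
  title: title            # illustrative placeholder for the shared field mappings
destination:
  plugin: 'entity:node'
  translations: true      # add a translation to the existing node
migration_dependencies:
  required:
    - node_bio            # parent nodes must exist before their translations
```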

Community contributions

The following community contributions were made in the course of this project.

Migrate Drupal 6 core node translation to Drupal 8 - as noted above, this project was the proof-of-concept for the separate-translation-migration approach which became part of the solution in core. Several people contributed the code which made this solution a reality.

default_value: null in static map skips empty rows - For this project (specifically in the podcast node migration), we needed to have a lack of a match in a static_map plugin return null, preventing the subsequent entity_lookup plugin from running. There was a core bug preventing this from working properly, which I fixed.
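A sketch of the kind of pipeline involved, with a hypothetical field name and map values (the actual podcast mapping is not reproduced here). An unmatched source value makes static_map return the null default rather than skipping the row, and the intent described above is that the entity_lookup step then has nothing to look up.

```yaml
field_series:               # hypothetical destination field
  -
    plugin: static_map
    source: field_series    # hypothetical source property
    map:
      'Old Series Name': 'New Series Name'   # hypothetical values
    default_value: null     # no match: return null instead of skipping the row
  -
    plugin: entity_lookup
    ignore_case: true
```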

config-devel-import-one does not work with a .yml argument - A question frequently asked in #drupal-migrate is how to get changes to a migration configuration entity in config/install to take effect without uninstalling and re-enabling your module. Starting with this project I’ve been using the cdi1 command in the config_devel module to reload changes - well, once I fixed this little bug.

Route collision reported when alias matches a different language - The translation migration in this project triggered this pathauto bug. My diagnosis assisted in getting this fixed, and I also contributed some work on the tests.