Drupal 8 migration from a SOAP API

Returning from my sabbatical, as promised I’m catching up on blogging about previous projects. For one such project, I was contracted by Acquia to provide migration assistance to a client of theirs [redacted, but let’s call them Acme]. This project involved some straightforward node migrations from CSV files, but more interestingly required implementing two ongoing feeds to synchronize external data periodically - one a SOAP feed, and the other a JSON feed protected by OAuth-based authentication. A number of other interesting techniques were employed on this project which I think may be broadly useful and which I haven’t previously blogged about. All in all, there was enough to write about that, rather than compose one big epic post, I’m going to break things down into a series of posts, spread out over several days so as not to spam Planet Drupal. In this first post of the series, I’ll cover migration from SOAP. The full custom migration module for this project is on GitLab.

A key requirement of the Acme project was to implement an ongoing feed, representing classes (the kind people attend in person, not the PHP kind), from a SOAP API to “event” nodes in Drupal. The first step, of course, was to develop (in migrate_plus) a parser plugin to handle SOAP feeds, based on PHP’s SoapClient class. This class exposes functions of the web service as class methods which may be directly invoked. In WSDL mode (the default, and the only mode this plugin currently supports), it can also report the signatures of the methods it supports (via __getFunctions()) and the data structures passed as parameters and returned as results (via __getTypes()). WSDL allows our plugin to do introspection and saves the need for some explicit configuration (in particular, it can automatically determine the property to be returned from within the response).
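To make the introspection idea concrete, here is a minimal sketch of how the strings reported by __getFunctions() and __getTypes() could be used to work out which property of a response holds the result. This is not the migrate_plus implementation. The helper names are hypothetical, and the sample strings (including the ArrayOfSessionBOLExternal and GetClientSessionsByClientIdResult names) are assumptions standing in for what a live WSDL would report, so the sketch runs without a network connection.

```php
<?php

/**
 * Finds the response type of a SOAP function, given signature strings in
 * the form SoapClient::__getFunctions() produces in WSDL mode.
 */
function find_response_type(array $signatures, string $function): ?string {
  foreach ($signatures as $signature) {
    // Signature shape: "<ReturnType> <functionName>(<ParamType> $param, ...)".
    if (preg_match('/^(\S+)\s+(\w+)\s*\(/', $signature, $matches) && $matches[2] === $function) {
      return $matches[1];
    }
  }
  return NULL;
}

/**
 * Finds the first member of a struct, given type strings in the form
 * SoapClient::__getTypes() produces.
 */
function find_first_property(array $types, string $type_name): ?string {
  $pattern = '/^struct\s+' . preg_quote($type_name, '/') . '\s*\{\s*\S+\s+(\w+);/';
  foreach ($types as $type) {
    // Type shape: "struct <Name> {\n <memberType> <memberName>;\n ...}".
    if (preg_match($pattern, $type, $matches)) {
      return $matches[1];
    }
  }
  return NULL;
}

// Assumed sample data; a real client would supply these via
// $client->__getFunctions() and $client->__getTypes().
$signatures = [
  'GetClientSessionsByClientIdResponse GetClientSessionsByClientId(GetClientSessionsByClientId $parameters)',
];
$types = [
  "struct GetClientSessionsByClientIdResponse {\n ArrayOfSessionBOLExternal GetClientSessionsByClientIdResult;\n}",
];

$response_type = find_response_type($signatures, 'GetClientSessionsByClientId');
$property = find_first_property($types, $response_type);
```

With a real service, the property found this way is what a parser could read from the object returned by the SOAP call, which is why no explicit configuration is needed for it.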

migrate_example_advanced (a submodule of migrate_plus) demonstrates a simple example of how to use the SOAP parser plugin - the .yml is well-documented, so please review that for a general introduction to the configuration. Here’s the basic source configuration for this specific project:

source:
  plugin: url
  # To remigrate any changed events.
  track_changes: true
  data_fetcher_plugin: http # Ignored - SoapClient does the fetching itself.
  data_parser_plugin: soap
  # The method to invoke via the SOAP API.
  function: GetClientSessionsByClientId
  # Within the response, the object property containing the list of events.
  item_selector: SessionBOLExternal
  # Indicates that the response will be in the form of a PHP object.
  response_type: object
  # You won’t find ‘urls’ and ‘parameters’ in the source .yml file (they are inserted
  # by a web UI - the subject of a future post), but for demonstration purposes
  # this is what they might look like.
  urls: http://services.example.com/CFService.asmx?wsdl
  parameters:
    clientId: 1234
    clientCredential:
      ClientID: 1234
      Password: service_password
    startDate: 08-31-2016
  # Unique identifier for each event (section) to be imported, composed of 3 columns.
  ids:
    ClassID:
      type: integer
    SessionID:
      type: integer
    SectionID:
      type: integer
  fields:
    -
      name: ClientSessionID
      label: Session ID for the client
      selector: ClientSessionID
    ...

Of particular note is the three-part source ID defined here. The way this data is structured, a “class” contains multiple “sessions”, which each have multiple “sections” - the sections are the instances that have specific dates and times, which we need to import into event nodes, and we need all three IDs to uniquely identify each section.

Not all of the data we need for our event nodes is in the session feed, unfortunately - we want to capture some of the class-level data as well. So, while the base migration uses the SOAP parser plugin to get the session rows to migrate, we need to fetch the related data at run time by making direct SOAP calls ourselves. We do this in our subscriber to the PREPARE_ROW event - this event is dispatched after the source plugin has obtained the basic data per its configuration, and gives us an opportunity to retrieve further data to add to the canonical source row before it enters the processing pipeline. I won’t go into detail on how that data is retrieved since it isn’t relevant to general migration principles, but the idea is that, since the class data is not prohibitively large and multiple sessions may reference the same class, we fetch it all on the first source row processed and cache it for reference by subsequent rows.
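The fetch-once-and-cache idea can be sketched in plain PHP, independent of Drupal. This is not the project’s actual subscriber: the ClassDataCache class, the fetcher callable, and the record fields (Title, ClassTitle) are all hypothetical, and the closure stands in for the real SOAP call.

```php
<?php

/**
 * Lazily fetches all class-level data once and serves later lookups from memory.
 */
class ClassDataCache {

  /** @var array|null Class records keyed by class ID; NULL until first use. */
  private $classes = NULL;

  /** @var callable Returns an array of class records keyed by class ID. */
  private $fetcher;

  /** @var int Number of times the fetcher has been invoked. */
  public $fetchCount = 0;

  public function __construct(callable $fetcher) {
    $this->fetcher = $fetcher;
  }

  /**
   * Returns the record for one class, fetching everything on the first call.
   */
  public function get(int $class_id): ?array {
    if ($this->classes === NULL) {
      $this->classes = ($this->fetcher)();
      $this->fetchCount++;
    }
    return $this->classes[$class_id] ?? NULL;
  }

}

// Stand-in for the real SOAP call; the record shape is invented.
$cache = new ClassDataCache(function () {
  return [
    101 => ['ClassID' => 101, 'Title' => 'Intro to Pottery'],
    102 => ['ClassID' => 102, 'Title' => 'Advanced Pottery'],
  ];
});

// Three source rows, two of which reference the same class.
$rows = [
  ['ClassID' => 101, 'SessionID' => 1, 'SectionID' => 1],
  ['ClassID' => 101, 'SessionID' => 2, 'SectionID' => 1],
  ['ClassID' => 102, 'SessionID' => 1, 'SectionID' => 1],
];
foreach ($rows as &$row) {
  // Roughly what a PREPARE_ROW subscriber would do for each row.
  $class = $cache->get($row['ClassID']);
  $row['ClassTitle'] = $class['Title'] ?? NULL;
}
unset($row);
```

However many rows are processed, the fetcher runs only once; in a real subscriber the enrichment step would set a source property on the row instead of writing to an array.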

Community contributions

SOAP Source plugin - Despite the title (from the original feature request), it was implemented as a parser plugin.

Altering migration configuration at import time - the PRE_IMPORT event

Our event feed permits filtering by the event start date - by passing a ‘startDate’ parameter in the format 12-31-2016 to the SOAP method, the feed will only return events starting on or after that date. At any given point in time we are only interested in future events, and don’t want to waste time retrieving and processing past events. To optimize this, we want the startDate parameter in our source configuration to be today’s date each time we run the migration. We can do this by subscribing to the PRE_IMPORT event.

In acme_migrate.services.yml:

services:
 ...
 acme_migrate.update_event_filter:
   class: Drupal\acme_migrate\EventSubscriber\UpdateEventFilter
   tags:
     - { name: event_subscriber }

In UpdateEventFilter.php:

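namespace Drupal\acme_migrate\EventSubscriber;

use Drupal\migrate\Event\MigrateEvents;
use Drupal\migrate\Event\MigrateImportEvent;
use Drupal\migrate_plus\Entity\Migration;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class UpdateEventFilter implements EventSubscriberInterface {

 /**
  * {@inheritdoc}
  */
 public static function getSubscribedEvents() {
   $events[MigrateEvents::PRE_IMPORT] = 'onMigrationPreImport';
   return $events;
 }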

The migration system dispatches the PRE_IMPORT event before the actual import begins executing. At that point, we can insert the desired date filter into the migration configuration entity and save it:

 /**
  * Set the event start date filter to today.
  *
  * @param \Drupal\migrate\Event\MigrateImportEvent $event
  * The import event.
  */
 public function onMigrationPreImport(MigrateImportEvent $event) {
   // $event->getMigration() returns the migration *plugin*.
   if ($event->getMigration()->id() == 'event') {
     // Migration::load() returns the migration *entity*.
     $event_migration = Migration::load('event');
     $source = $event_migration->get('source');
     $source['parameters']['startDate'] = date('m-d-Y');
     $event_migration->set('source', $source);
     $event_migration->save();
   }
 }

}

Note that the entity get() and set() functions only operate directly on top-level configuration properties - we can’t get and set, for example, ‘source.parameters.startDate’ directly. We need to retrieve the entire source configuration, modify our one value within it, and set the entire source configuration back on the migration.
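If several nested values needed changing, the retrieve-modify-set pattern could be wrapped in a small helper. The sketch below works on a plain array standing in for the source configuration. The set_nested() function is hypothetical, and the fixed date is there only to make the result predictable. Inside Drupal, core’s NestedArray::setValue() utility covers similar ground.

```php
<?php

/**
 * Returns a copy of $config with the value at a dotted path replaced.
 */
function set_nested(array $config, string $path, $value): array {
  $ref = &$config;
  foreach (explode('.', $path) as $key) {
    // Create intermediate levels as needed so the path can always be followed.
    if (!isset($ref[$key]) || !is_array($ref[$key])) {
      $ref[$key] = [];
    }
    $ref = &$ref[$key];
  }
  $ref = $value;
  return $config;
}

// Plain array standing in for $event_migration->get('source').
$source = [
  'plugin' => 'url',
  'parameters' => [
    'clientId' => 1234,
    'startDate' => '08-31-2016',
  ],
];

// A fixed date for demonstration; the subscriber uses date('m-d-Y') for today.
$new_date = date('m-d-Y', mktime(0, 0, 0, 12, 31, 2016));
$source = set_nested($source, 'parameters.startDate', $new_date);
```

The whole modified array would then go back to the entity via set('source', $source), exactly as in the subscriber above.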
