Soong 0.5.2 released

The 0.5.2 release of the Soong ETL library is now available on Packagist. tl;dr:

  • It's still too early to use in real-world applications - APIs continue to evolve.
  • It's an excellent time to get involved in development, and help determine the direction we take!

Key changes include:

  1. Tests have been fleshed out - pretty much everything (other than the console commands) has unit tests, plus there's a smoke test that runs all the provided examples to make sure they work. There are base test classes associated with the major interfaces that will be a big help when you implement components.
  2. Implementing the tests exposed the smell (stench?) of static methods (primarily factory methods), so those have been eliminated. For now, we're using straight-up public function __construct(array $configuration) for our key components - we'll address later how best to factoryize them (probably when integrating into other frameworks).
  3. The console commands now use hassankhan/config to read configuration from disk - this means any format supported by the Config library (YAML, JSON, XML, …) can be used for Soong configuration. The primary format for the examples remains YAML, since that's the most human-readable, but there are JSON and XML samples as well.
  4. All our configurable component interfaces (i.e., all except the Data components and TaskPipeline) now inherit from ConfigurableComponent. The default implementations provided extend a base class which uses Symfony OptionsResolver to define and validate the configuration options they support.
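
To make items 2 and 4 concrete, here is a minimal sketch of a component configured through its constructor and validated against a declared set of options. It is not Soong's actual code: the class names SketchConfigurableComponent and SketchCsvExtractor, the option() accessor, and the path/delimiter options are all hypothetical, and a few lines of plain PHP stand in for Symfony OptionsResolver so the example runs without Composer dependencies.

```php
<?php

declare(strict_types=1);

// Hypothetical base class (not Soong's). It merges defaults, rejects unknown
// keys, and requires mandatory keys. The real base class delegates this work
// to Symfony OptionsResolver; plain PHP is used here as a stand-in.
abstract class SketchConfigurableComponent
{
    /** @var array<string, mixed> */
    private array $configuration;

    public function __construct(array $configuration)
    {
        $defaults = $this->defaultOptions();
        $required = $this->requiredOptions();
        $allowed = array_merge(array_keys($defaults), $required);

        $unknown = array_diff(array_keys($configuration), $allowed);
        if ($unknown !== []) {
            throw new InvalidArgumentException('Unknown option(s): ' . implode(', ', $unknown));
        }

        $missing = array_diff($required, array_keys($configuration));
        if ($missing !== []) {
            throw new InvalidArgumentException('Missing option(s): ' . implode(', ', $missing));
        }

        // Explicit values override defaults.
        $this->configuration = array_merge($defaults, $configuration);
    }

    /** Returns the resolved value of an option, or null if it is not set. */
    public function option(string $name)
    {
        return $this->configuration[$name] ?? null;
    }

    /** Option name => default value. */
    abstract protected function defaultOptions(): array;

    /** Names of options that must be supplied by the caller. */
    abstract protected function requiredOptions(): array;
}

// Hypothetical extractor with one required and one defaulted option.
final class SketchCsvExtractor extends SketchConfigurableComponent
{
    protected function defaultOptions(): array
    {
        return ['delimiter' => ','];
    }

    protected function requiredOptions(): array
    {
        return ['path'];
    }
}

$extractor = new SketchCsvExtractor(['path' => 'people.csv']);
```

Because there is no static factory, a test can build the component directly from an array, and a bad configuration fails at construction time rather than partway through a run.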

So, where are we now and where are we going?

I have been envisioning a very pure architecture where the interfaces (and abstract base classes expected to be used by most implementations) were totally free of external dependencies, every distinct area of functionality in its own interface, etc. I'm seeing now that I may be guilty of premature optimization of architecture - I've been spending too much time trying to purify the architecture and not enough building useful tools.

So, I'd like to turn attention now to basic applications, and let them drive where to put work in on the architecture. We have basic console commands now - let's build those out to where they're really useful for more than just the most basic real-world scenarios. The other application is the one that motivated making this all configuration-driven - an interactive application for creating and modifying ETL processes, supporting a workflow like:

  1. Present the user with the supported types of data sources for extraction (i.e., let them select an Extractor class, such as Csv or DBAL).
  2. Present the configuration options exposed by the Extractor. Let the user provide the specific source data (enter database credentials, upload a CSV file, etc.) and set any other appropriate options.
  3. Present the supported types of destination (i.e., let them select a Loader class) and configure it.
  4. Present the properties offered by the destination and allow a sequence of Transformers (with supported properties from the Extractor available as inputs) to be configured for each.
  5. Generate the full task configuration.
  6. Tasks can then be managed/run either with the command-line tools or through the UI.
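
As a rough illustration of step 5, the choices collected in steps 1-4 could be folded into a single configuration array and serialized. This is only a sketch: the function name buildTaskConfiguration, every array key, the Copy transformer, and the sample option names are hypothetical and do not reflect Soong's actual configuration schema.

```php
<?php

declare(strict_types=1);

/**
 * Assemble a task configuration from the user's selections.
 * All keys below are hypothetical, chosen only for illustration.
 *
 * @param array<string, mixed> $extractorOptions        Options gathered in step 2.
 * @param array<string, mixed> $loaderOptions           Options gathered in step 3.
 * @param array<string, array> $transformersByProperty  Destination property => ordered transformer list (step 4).
 */
function buildTaskConfiguration(
    string $extractorClass,
    array $extractorOptions,
    string $loaderClass,
    array $loaderOptions,
    array $transformersByProperty
): array {
    return [
        'extract' => ['class' => $extractorClass, 'configuration' => $extractorOptions],
        'transform' => $transformersByProperty,
        'load' => ['class' => $loaderClass, 'configuration' => $loaderOptions],
    ];
}

$task = buildTaskConfiguration(
    'Csv',
    ['path' => 'people.csv'],
    'DBAL',
    ['table' => 'people'],
    // One destination property fed by a single (hypothetical) transformer.
    ['full_name' => [['class' => 'Copy', 'source_property' => 'name']]]
);

// Serialize for storage; YAML or XML would work equally well.
echo json_encode($task, JSON_PRETTY_PRINT), PHP_EOL;
```

Since the result is plain data, the same generated task could be run by the console commands or loaded back into the UI for editing.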

Thus, I want to define the MVP (Minimum Viable Product) for those two applications; implementing those MVPs (probably at just a POC level for the UI at first) will in turn define the MVP of the underlying framework. Ambitiously, I had wanted to get everything I want in an ETL framework (everything I had in Drupal plus more) into Soong V1.0 - but more realistically, I'm thinking let's get an MVP for V1.0 and flesh it out in V2.0.

But I can't do it all alone! While I've had some feedback from fellow Drupal developers (including at the recent MidCamp), I'm looking for more eyes on this work - especially from non-Drupal PHP developers who can challenge the assumptions we have from several years of working with the Drupal migration architecture. Now that we have a test suite, there are just a couple more DX issues to address - I think you'll find it quite easy to jump in, understand the architecture, and contribute.

Obligatory self-promotion - I'm available for data migration projects.

Thanks!
