<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:og="http://ogp.me/ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:schema="http://schema.org/" xmlns:sioc="http://rdfs.org/sioc/ns#" xmlns:sioct="http://rdfs.org/sioc/types#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" version="2.0" xml:base="https://virtuoso-performance.com/tags/soong">
  <channel>
    <title>Soong</title>
    <link>https://virtuoso-performance.com/tags/soong</link>
    <description/>
    <language>en</language>
    
    <item>
  <title>Soong 0.7.0 released</title>
  <link>https://virtuoso-performance.com/blog/mikeryan/soong-070-released</link>
  <description>&lt;span property="schema:name" class="field field-name-title field-formatter-string field-type-string field-label-hidden"&gt;Soong 0.7.0 released&lt;/span&gt;
&lt;span rel="schema:author" class="field field-name-uid field-formatter-author field-type-entity-reference field-label-hidden"&gt;&lt;span lang="" about="https://virtuoso-performance.com/user/6" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;mikeryan&lt;/span&gt;&lt;/span&gt;
&lt;span property="schema:dateCreated" content="2019-07-05T21:05:10+00:00" class="field field-name-created field-formatter-timestamp field-type-created field-label-hidden"&gt;Friday, July 5, 2019 - 04:05pm&lt;/span&gt;
&lt;div property="schema:text" class="clearfix text-formatted field field-node--body field-formatter-text-default field-name-body field-type-text-with-summary field-label-hidden has-single"&gt;&lt;div class="field__items"&gt;&lt;div property="schema:text" class="field__item"&gt;&lt;p&gt;The 0.7.0 release of the &lt;a href="https://gitlab.com/soongetl/soong"&gt;Soong ETL library&lt;/a&gt; is now available on &lt;a href="https://packagist.org/packages/soong/soong"&gt;Packagist&lt;/a&gt;. &lt;a href="https://gitlab.com/soongetl/soong/blob/0.7.0/docs/CHANGELOG.md"&gt;Key changes&lt;/a&gt; since 0.6.0 include:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;The property abstractions (&lt;code&gt;PropertyInterface&lt;/code&gt;, &lt;code&gt;PropertyFactory&lt;/code&gt;, and implementations) have been removed - it seems like the abstractions will only get in the way of leveraging PHP 7.x's improved native type-checking.&lt;/li&gt;
	&lt;li&gt;The Record Transformer concept has been introduced. While our experience with Drupal emphasized the mapping of properties within each record, the logical purpose of the transformation segment of an ETL pipeline is to transform a record at a time (stipulating that, yes, 90+% of the time you're individually transforming each property within the record)...&lt;/li&gt;
	&lt;li&gt;...thus one of the two &lt;code&gt;RecordTransformer&lt;/code&gt; implementations provided out-of-the-box is the &lt;code&gt;PropertyMapper&lt;/code&gt;. The field mapping configuration we previously provided within the &lt;code&gt;transform&lt;/code&gt; key is now the &lt;code&gt;PropertyMapper&lt;/code&gt;'s configuration, and the classes are now implementations of &lt;code&gt;PropertyTransformer&lt;/code&gt; rather than &lt;code&gt;Transformer&lt;/code&gt;.&lt;/li&gt;
	&lt;li&gt;The other provided &lt;code&gt;RecordTransformer&lt;/code&gt; is &lt;code&gt;Copy&lt;/code&gt;, for bulk-copying properties directly from the source record to the destination record. In many instances, most if not all properties being migrated retain exactly the same name and content, so it can help clarify the "interesting" property transformations if you don't have to individually express each and every trivial property copy.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;This release was meant to be more ambitious (including refactoring of task pipelines) and to arrive sooner, but I've been busy working on a couple of contracts since May. So, for the foreseeable future, progress will be slower (unless/until other folks start contributing, of course!). For now, the &lt;a href="https://gitlab.com/soongetl/soong/issues?label_name%5B%5D=Task&amp;label_name%5B%5D=0.8.0&amp;scope=all&amp;sort=priority&amp;state=opened&amp;utf8=%E2%9C%93"&gt;emphasis for 0.8.0 is task refactoring&lt;/a&gt;, starting with seeing if we can leverage other libraries like &lt;a href="https://gitlab.com/soongetl/soong/issues/12"&gt;league/pipeline&lt;/a&gt;.&lt;/p&gt;
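&lt;p&gt;To make the division of labor between &lt;code&gt;Copy&lt;/code&gt; and &lt;code&gt;PropertyMapper&lt;/code&gt; described in the list above more concrete, here is a minimal sketch in plain PHP. It does not use Soong's actual classes or signatures: &lt;code&gt;copyRecord()&lt;/code&gt;, &lt;code&gt;mapProperties()&lt;/code&gt;, and the sample data are hypothetical stand-ins, with plain arrays playing the role of records.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;?php
declare(strict_types=1);

// Hypothetical stand-ins for illustration only; these are NOT Soong's API.

// Bulk copy: every source property passes through under the same name.
function copyRecord(array $source, array $destination): array
{
    // With string keys, values from $source overwrite any same-named
    // values already present in $destination.
    return array_merge($destination, $source);
}

// Property mapping: run a callable only for the properties that need one.
function mapProperties(array $source, array $destination, array $mapping): array
{
    foreach ($mapping as $name =&gt; $transform) {
        $destination[$name] = $transform($source);
    }
    return $destination;
}

$source = ['id' =&gt; 5, 'name' =&gt; 'widget', 'price' =&gt; '9.50'];

// Copy everything first, then spell out only the "interesting" property.
$record = copyRecord($source, []);
$record = mapProperties($source, $record, [
    'price' =&gt; function (array $s): float {
        return (float) $s['price'];
    },
]);

var_dump($record);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point of the sketch is the ordering: the trivial copies happen in one step, so the mapping configuration only has to mention the one property that actually changes.&lt;/p&gt;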
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="field field-node-field-tags field-entity-reference-type-taxonomy-term field-formatter-entity-reference-label field-name-field-tags field-type-entity-reference field-label-above"&gt;&lt;h3 class="field__label"&gt;Tags&lt;/h3&gt;&lt;div class="field__items"&gt;&lt;div class="field__item field__item--migration"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/migration" property="schema:about" hreflang="en"&gt;Migration&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--soong"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/soong" property="schema:about" hreflang="en"&gt;Soong&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--php"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/php" property="schema:about" hreflang="en"&gt;PHP&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="clearfix text-formatted field field-node--field-twitter-comments field-formatter-text-default field-name-field-twitter-comments field-type-text-long field-label-above has-single"&gt;&lt;h3 class="field__label"&gt;Use the Twitter thread below to comment on this post:&lt;/h3&gt;&lt;div class="field__items"&gt;&lt;div class="field__item"&gt;&lt;blockquote class="twitter-tweet" data-lang="en"&gt;
&lt;p dir="ltr" lang="en" xml:lang="en"&gt;A new release of Soong ETL is out! &lt;a href="https://t.co/9CgXWl5vHx"&gt;https://t.co/9CgXWl5vHx&lt;/a&gt;&lt;/p&gt;
— Virtuoso Performance (@VirtPerformance) &lt;a href="https://twitter.com/VirtPerformance/status/1147256303208845312?ref_src=twsrc%5Etfw"&gt;July 5, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"&gt;&lt;/script&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;section rel="schema:comment" class="field field-node--comment field-formatter-comment-default field-name-comment field-type-comment field-label-above display-mode-threaded comment-bundle-comment comment-wrapper"&gt;&lt;a name="comments" id="comments"&gt;&lt;/a&gt;&lt;/section&gt;</description>
  <pubDate>Fri, 05 Jul 2019 21:05:10 +0000</pubDate>
    <dc:creator>mikeryan</dc:creator>
    <guid isPermaLink="false">154 at https://virtuoso-performance.com</guid>
    </item>
<item>
  <title>Soong 0.6.0 released</title>
  <link>https://virtuoso-performance.com/blog/mikeryan/soong-060-released</link>
  <description>&lt;span property="schema:name" class="field field-name-title field-formatter-string field-type-string field-label-hidden"&gt;Soong 0.6.0 released&lt;/span&gt;
&lt;span rel="schema:author" class="field field-name-uid field-formatter-author field-type-entity-reference field-label-hidden"&gt;&lt;span lang="" about="https://virtuoso-performance.com/user/6" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;mikeryan&lt;/span&gt;&lt;/span&gt;
&lt;span property="schema:dateCreated" content="2019-05-01T20:40:18+00:00" class="field field-name-created field-formatter-timestamp field-type-created field-label-hidden"&gt;Wednesday, May 1, 2019 - 03:40pm&lt;/span&gt;
&lt;div property="schema:text" class="clearfix text-formatted field field-node--body field-formatter-text-default field-name-body field-type-text-with-summary field-label-hidden has-single"&gt;&lt;div class="field__items"&gt;&lt;div property="schema:text" class="field__item"&gt;&lt;p&gt;The 0.6.0 release of the &lt;a href="https://gitlab.com/soongetl/soong"&gt;Soong ETL library&lt;/a&gt; is now available on &lt;a href="https://packagist.org/packages/soong/soong"&gt;Packagist&lt;/a&gt;. &lt;a href="https://gitlab.com/soongetl/soong/blob/0.6.0/docs/CHANGELOG.md"&gt;Key changes&lt;/a&gt; since 0.5.3 include:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;The &lt;code&gt;Filter&lt;/code&gt; component has been introduced, which examines a &lt;code&gt;Record&lt;/code&gt; and "approves" it for further processing, along with a &lt;code&gt;filters&lt;/code&gt; configuration option added to extractors to limit what gets extracted. The &lt;code&gt;Select&lt;/code&gt; filter and &lt;code&gt;migrate&lt;/code&gt; command option &lt;code&gt;--select&lt;/code&gt; have been added - extractor results may thus be filtered either in the base configuration for the extractor (representing the canonical set of data to be processed) or at runtime (for testing). An example of the latter, restricting input to records that have both a low id and a high foo value:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;bin/soong migrate arraytosql --select='id&lt;8' --select='foo&gt;g'&lt;/code&gt;&lt;br /&gt;
	 &lt;/li&gt;
	&lt;li&gt;Dependency injection has been improved - where previously class names were passed through to be instantiated by the components that needed them, responsibility for constructing all (almost, see below) components now belongs to the application (for now, the Symfony console commands being the single example) which will inject the class instances.&lt;br /&gt;
	 &lt;/li&gt;
	&lt;li&gt;Since components do need to dynamically generate &lt;code&gt;Property&lt;/code&gt; and &lt;code&gt;Record&lt;/code&gt; instances during migration, in those cases a &lt;code&gt;PropertyFactory&lt;/code&gt; or &lt;code&gt;RecordFactory&lt;/code&gt; instance is injected.&lt;br /&gt;
	 &lt;/li&gt;
	&lt;li&gt;The &lt;code&gt;DataProperty&lt;/code&gt; and &lt;code&gt;DataRecord&lt;/code&gt; classes have been renamed to &lt;code&gt;Property&lt;/code&gt; and &lt;code&gt;Record&lt;/code&gt; respectively.&lt;br /&gt;
	 &lt;/li&gt;
	&lt;li&gt;Error handling has been beefed up, with several exception classes added.&lt;br /&gt;
	 &lt;/li&gt;
	&lt;li&gt;We're now integrated with &lt;a href="https://scrutinizer-ci.com/gl/soong/soongetl/soong/"&gt;Scrutinizer&lt;/a&gt; for coverage and quality analysis. Unfortunately, Scrutinizer does not yet allow public access to analysis for Gitlab-based projects, so you can't look for yourself - but here's where we are now (having built out the tests further):&lt;img alt="Code quality/coverage of Soong" data-entity-type="file" data-entity-uuid="fc842583-d2ef-45d7-b03a-6b9dccacae3d" src="https://virtuoso-performance.com/sites/default/files/inline-images/Code_Quality_Summary_-_soongetl_soong_-_Scrutinizer.png" class="align-center" /&gt;&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;At the moment, the main emphases for 0.7.0 are:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Build out the console commands to be more-or-less feature-complete.&lt;/li&gt;
	&lt;li&gt;&lt;a href="https://gitlab.com/soongetl/soong/issues/76"&gt;Refactor the transformation pipeline&lt;/a&gt;.&lt;/li&gt;
	&lt;li&gt;&lt;a href="https://gitlab.com/soongetl/soong/issues?label_name%5B%5D=Task"&gt;Refactor our approach to tasks&lt;/a&gt;, in particular looking at using or gaining inspiration from other tools such as &lt;a href="https://gitlab.com/soongetl/soong/issues/64"&gt;Robo&lt;/a&gt; or &lt;a href="https://gitlab.com/soongetl/soong/issues/8"&gt;PortPHP&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Obligatory self-promotion - &lt;a href="https://virtuoso-performance.com/contact"&gt;I'm available for data migration projects&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="field field-node-field-tags field-entity-reference-type-taxonomy-term field-formatter-entity-reference-label field-name-field-tags field-type-entity-reference field-label-above"&gt;&lt;h3 class="field__label"&gt;Tags&lt;/h3&gt;&lt;div class="field__items"&gt;&lt;div class="field__item field__item--migration"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/migration" property="schema:about" hreflang="en"&gt;Migration&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--soong"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/soong" property="schema:about" hreflang="en"&gt;Soong&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--php"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/php" property="schema:about" hreflang="en"&gt;PHP&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="clearfix text-formatted field field-node--field-twitter-comments field-formatter-text-default field-name-field-twitter-comments field-type-text-long field-label-above has-single"&gt;&lt;h3 class="field__label"&gt;Use the Twitter thread below to comment on this post:&lt;/h3&gt;&lt;div class="field__items"&gt;&lt;div class="field__item"&gt;&lt;div data-oembed-url="https://twitter.com/VirtPerformance/status/1123688585637650432"&gt;
&lt;div style="max-width:480px;margin:auto;"&gt;
&lt;blockquote align="center" class="twitter-tweet" data-dnt="true"&gt;
&lt;p dir="ltr" lang="und" xml:lang="und"&gt;&lt;a href="https://t.co/Lk9zVdhwQM"&gt;https://t.co/Lk9zVdhwQM&lt;/a&gt;&lt;/p&gt;
— Virtuoso Performance (@VirtPerformance) &lt;a href="https://twitter.com/VirtPerformance/status/1123688585637650432"&gt;May 1, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async="" charset="utf-8" src="https://platform.twitter.com/widgets.js"&gt;&lt;/script&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;section rel="schema:comment" class="field field-node--comment field-formatter-comment-default field-name-comment field-type-comment field-label-above display-mode-threaded comment-bundle-comment comment-wrapper"&gt;&lt;a name="comments" id="comments"&gt;&lt;/a&gt;&lt;/section&gt;</description>
  <pubDate>Wed, 01 May 2019 20:40:18 +0000</pubDate>
    <dc:creator>mikeryan</dc:creator>
    <guid isPermaLink="false">153 at https://virtuoso-performance.com</guid>
    </item>
<item>
  <title>Update on Soong ETL</title>
  <link>https://virtuoso-performance.com/blog/mikeryan/update-soong-etl</link>
  <description>&lt;span property="schema:name" class="field field-name-title field-formatter-string field-type-string field-label-hidden"&gt;Update on Soong ETL&lt;/span&gt;
&lt;span rel="schema:author" class="field field-name-uid field-formatter-author field-type-entity-reference field-label-hidden"&gt;&lt;span lang="" about="https://virtuoso-performance.com/user/6" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;mikeryan&lt;/span&gt;&lt;/span&gt;
&lt;span property="schema:dateCreated" content="2019-02-26T17:26:01+00:00" class="field field-name-created field-formatter-timestamp field-type-created field-label-hidden"&gt;Tuesday, February 26, 2019 - 11:26am&lt;/span&gt;
&lt;div property="schema:text" class="clearfix text-formatted field field-node--body field-formatter-text-default field-name-body field-type-text-with-summary field-label-hidden has-single"&gt;&lt;div class="field__items"&gt;&lt;div property="schema:text" class="field__item"&gt;&lt;p&gt;It's been over a month now since I made the &lt;a href="https://packagist.org/packages/soong/soong"&gt;&lt;u&gt;Soong ETL library&lt;/u&gt;&lt;/a&gt; &lt;a href="https://virtuoso-performance.com/blog/mikeryan/announcing-soong-project-developing-general-purpose-etl-framework"&gt;&lt;u&gt;publicly available&lt;/u&gt;&lt;/a&gt; - about time for some updates.&lt;/p&gt;

&lt;p&gt;One focus has been fleshing out areas that will aid in contribution. These include:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;
	&lt;p&gt;Early on (too early), I split things out into a myriad of repositories, a case of premature optimization. I have merged things back into a monorepo for now - as APIs are still fluid, it's much easier to make API changes in one repo and keep all the components in sync. Once the APIs are reasonably stable (say, at "beta" level), specialized integrations like Csv and DBAL, at the very least, will move to their own repos. Although I initially imagined soong/soong including only the interfaces (and maybe some base classes), I'm now thinking it should also hold basic implementations of at least the Data, KeyMap, and Task interfaces.&lt;/p&gt;
	&lt;/li&gt;
	&lt;li&gt;
	&lt;p&gt;Adding tests for existing code (still in progress), in particular adding base classes corresponding to the component interfaces to ease testing that implementations of those interfaces behave consistently and in accordance with the contracts. No new functionality will be added without tests.&lt;/p&gt;
	&lt;/li&gt;
	&lt;li&gt;
	&lt;p&gt;Putting documentation up on &lt;a href="https://soong-etl.readthedocs.io/en/latest/"&gt;&lt;u&gt;Read the Docs&lt;/u&gt;&lt;/a&gt; - in particular, fleshing out the code documentation and generating it with &lt;a href="https://soong-etl.readthedocs.io/en/latest/api/html/"&gt;&lt;u&gt;Doxygen&lt;/u&gt;&lt;/a&gt;, and providing more information on &lt;a href="https://soong-etl.readthedocs.io/en/latest/CONTRIBUTING/"&gt;&lt;u&gt;contributing&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;
	&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Priorities now are:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;
	&lt;p&gt;&lt;a href="https://gitlab.com/soongetl/soong/issues/22"&gt;&lt;u&gt;Seeking more participation from other developers&lt;/u&gt;&lt;/a&gt; (hi out there!). And, by the way, I'll be at &lt;a href="https://2019.midwestphp.org/"&gt;&lt;u&gt;Midwest PHP&lt;/u&gt;&lt;/a&gt; next week, my first non-Drupal PHP conference - any Midwesterners interested in data migration, hit me up and we'll talk!&lt;/p&gt;
	&lt;/li&gt;
	&lt;li&gt;
	&lt;p&gt;&lt;a href="https://gitlab.com/soongetl/soong/issues/20"&gt;&lt;u&gt;Looking at other ETL systems&lt;/u&gt;&lt;/a&gt; for ideas - we have &lt;a href="https://gitlab.com/soongetl/architecture/issues/15"&gt;&lt;u&gt;looked at&lt;/u&gt;&lt;/a&gt; &lt;a href="https://github.com/portphp/portphp"&gt;&lt;u&gt;PortPHP&lt;/u&gt;&lt;/a&gt; so far, which has some ideas we might borrow (and perhaps we can even integrate its readers).&lt;/p&gt;
	&lt;/li&gt;
	&lt;li&gt;
	&lt;p&gt;Addressing any proposed changes that would &lt;a href="https://gitlab.com/soongetl/soong/issues?label_name%5B%5D=API+break"&gt;&lt;u&gt;break the existing API&lt;/u&gt;&lt;/a&gt;. Note that in the current 0.x.x stream, minor versions (e.g., 0.4.0) are API breakers, so be sure to pin any applications using Soong to the minor version ("~0.4.0" constraint in composer.json).&lt;/p&gt;
	&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Coalescing those three priorities - if you have any interest in data migration in PHP, please &lt;a href="https://gitlab.com/soongetl/soong/issues"&gt;&lt;u&gt;stop by&lt;/u&gt;&lt;/a&gt; and offer your thoughts on the architecture!&lt;/p&gt;
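
&lt;p&gt;As a sketch of the version-pinning advice in the list above, an application's &lt;code&gt;composer.json&lt;/code&gt; might contain a fragment like the following. The 0.4.0 version is simply the example used above, and the rest of the file is assumed; the tilde constraint accepts patch releases such as 0.4.1 but stops short of 0.5.0, which is where an API break could land.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;{
    "require": {
        "soong/soong": "~0.4.0"
    }
}
&lt;/code&gt;&lt;/pre&gt;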

&lt;p&gt;Oh, by the way - I'm currently between projects, so if you need a data migration process implemented please &lt;a href="https://virtuoso-performance.com/contact"&gt;&lt;u&gt;contact me&lt;/u&gt;&lt;/a&gt;. I will (for now) take a reduced rate for a project using Soong, as there's nothing like a real-world application to take a general-purpose library to the next level.&lt;/p&gt;

&lt;p&gt;Thanks!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="field field-node-field-tags field-entity-reference-type-taxonomy-term field-formatter-entity-reference-label field-name-field-tags field-type-entity-reference field-label-above"&gt;&lt;h3 class="field__label"&gt;Tags&lt;/h3&gt;&lt;div class="field__items"&gt;&lt;div class="field__item field__item--php"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/php" property="schema:about" hreflang="en"&gt;PHP&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--migration"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/migration" property="schema:about" hreflang="en"&gt;Migration&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--soong"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/soong" property="schema:about" hreflang="en"&gt;Soong&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="clearfix text-formatted field field-node--field-twitter-comments field-formatter-text-default field-name-field-twitter-comments field-type-text-long field-label-above has-single"&gt;&lt;h3 class="field__label"&gt;Use the Twitter thread below to comment on this post:&lt;/h3&gt;&lt;div class="field__items"&gt;&lt;div class="field__item"&gt;&lt;div data-oembed-url="https://twitter.com/VirtPerformance/status/1087805171621679104"&gt;
&lt;div style="max-width:480px;margin:auto;"&gt;
&lt;blockquote align="center" class="twitter-tweet" data-dnt="true"&gt;
&lt;p dir="ltr" lang="und" xml:lang="und"&gt;&lt;a href="https://t.co/zNhHXmbO8P"&gt;https://t.co/zNhHXmbO8P&lt;/a&gt;&lt;/p&gt;
— Virtuoso Performance (@VirtPerformance) &lt;a href="https://twitter.com/VirtPerformance/status/1100448667067183107"&gt;February 26, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async="" charset="utf-8" src="https://platform.twitter.com/widgets.js"&gt;&lt;/script&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;section rel="schema:comment" class="field field-node--comment field-formatter-comment-default field-name-comment field-type-comment field-label-above display-mode-threaded comment-bundle-comment comment-wrapper"&gt;&lt;a name="comments" id="comments"&gt;&lt;/a&gt;&lt;/section&gt;</description>
  <pubDate>Tue, 26 Feb 2019 17:26:01 +0000</pubDate>
    <dc:creator>mikeryan</dc:creator>
    <guid isPermaLink="false">151 at https://virtuoso-performance.com</guid>
    </item>
<item>
  <title>Announcing the Soong project - developing a general-purpose ETL framework</title>
  <link>https://virtuoso-performance.com/blog/mikeryan/announcing-soong-project-developing-general-purpose-etl-framework</link>
  <description>&lt;span property="schema:name" class="field field-name-title field-formatter-string field-type-string field-label-hidden"&gt;Announcing the Soong project - developing a general-purpose ETL framework&lt;/span&gt;
&lt;span rel="schema:author" class="field field-name-uid field-formatter-author field-type-entity-reference field-label-hidden"&gt;&lt;span lang="" about="https://virtuoso-performance.com/user/6" typeof="schema:Person" property="schema:name" datatype="" xml:lang=""&gt;mikeryan&lt;/span&gt;&lt;/span&gt;
&lt;span property="schema:dateCreated" content="2019-01-22T20:10:33+00:00" class="field field-name-created field-formatter-timestamp field-type-created field-label-hidden"&gt;Tuesday, January 22, 2019 - 02:10pm&lt;/span&gt;
&lt;div property="schema:text" class="clearfix text-formatted field field-node--body field-formatter-text-default field-name-body field-type-text-with-summary field-label-hidden has-single"&gt;&lt;div class="field__items"&gt;&lt;div property="schema:text" class="field__item"&gt;&lt;p&gt;I'd like to invite members of the open-source community, particularly (but not exclusively) those involved with PHP, to join in designing and developing a general-purpose &lt;a href="https://en.wikipedia.org/wiki/Extract,_transform,_load"&gt;ETL&lt;/a&gt; framework for data migration. The vendor name for packaging components of this project is &lt;a href="https://packagist.org/?tags=soong"&gt;soong&lt;/a&gt;, and git repos for existing components are under the &lt;a href="https://gitlab.com/soongetl"&gt;GitLab account "soongetl"&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Note: Finally having finished composition of this lengthy monologue, it's clear to me that it's very ambitious (some might say arrogant) of me to write with the expectation that this will grow into a large and robust open-source ecosystem. Very well - it is ambitious, and the effort may very well fall flat on its face. C'est la vie...&lt;/p&gt;

&lt;h2&gt;Who am I?&lt;/h2&gt;

&lt;p&gt;I'm &lt;a href="https://www.drupal.org/u/mikeryan"&gt;Mike Ryan&lt;/a&gt; - a lot of people in the Drupal community know me, but not so much the wider open-source community. &lt;a href="https://virtuoso-performance.com/blog/mikeryan/boston-drupal-meetup"&gt;Almost eleven years ago&lt;/a&gt; at a Drupal meetup in Boston, amongst general agreement that everyone hates to do data migration, &lt;a href="https://www.drupal.org/u/moshe-weitzman"&gt;Moshe Weitzman&lt;/a&gt; looked across the table at me and said "there's an opportunity here." Since then data migration into Drupal has been the primary focus of my professional life, first in partnership with Moshe, then as an &lt;a href="https://www.acquia.com/"&gt;Acquia&lt;/a&gt; employee and finally as a &lt;a href="https://virtuoso-performance.com"&gt;solo consultant&lt;/a&gt;. Over the years I've created several migration-related contrib modules for Drupal, was part of the team integrating migration into Drupal core for D8, and have been involved in dozens of real-world migration projects.&lt;/p&gt;

&lt;h2&gt;Why am I doing this?&lt;/h2&gt;

&lt;h3&gt;I think we can do better&lt;/h3&gt;

&lt;p&gt;Just within Drupal, the migration framework can be improved:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Each step of Drupal migration support has been a port of the previous one - from the hook-based Drupal 6 version, to the inheritance-and-composition model in Drupal 7, to the plugin-based system in Drupal 8 - and technical debt has accumulated. I've wanted for a while to start over with a clean slate - given my experience (and others'), what would we do differently starting from scratch? Can we step back and re-examine the assumptions we've been carrying forward?&lt;/li&gt;
	&lt;li&gt;At a specific technical level, the biggest itch I've wanted to scratch is decoupling the components. Within the migration system as it is in D8 today, pretty much every component knows everything about every other component. At one point we had a destination plugin which was using some of the migration's source plugin configuration - that one made my eye twitch!&lt;/li&gt;
	&lt;li&gt;There's also the coupling of the migration system with Drupal - in particular, migration classes &lt;em&gt;are&lt;/em&gt; Drupal plugins (i.e., their interfaces extend &lt;code&gt;PluginInspectionInterface&lt;/code&gt;) rather than being &lt;em&gt;managed by&lt;/em&gt; Drupal plugins. I would like to see migration classes be all about migration, rather than worry about being plugins as well. And once the basic migration classes are no longer Drupal plugins, then it's a small step to them being entirely independent of Drupal...&lt;/li&gt;
&lt;/ol&gt;&lt;h3&gt;The larger PHP community&lt;/h3&gt;

&lt;p&gt;With Drupal 8, we’ve often talked about “getting off the island” in terms of benefiting from much fine PHP work done outside of the Drupal community. We haven’t talked so much about going in the opposite direction - making our own fine work available for use beyond Drupal. To my knowledge, the only published example of this so far is Kris Vanderwater (&lt;a href="https://www.drupal.org/u/eclipsegc"&gt;EclipseGC&lt;/a&gt;) with the &lt;a href="https://github.com/EclipseGc/Plugins"&gt;plugin library&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Likewise, we Drupal developers don’t have a monopoly on good migration ideas - by moving the general-purpose aspects of migration into a separate open source project, we have the opportunity to benefit from new ideas and new talent.&lt;/p&gt;

&lt;h3&gt;The community we build&lt;/h3&gt;

&lt;p&gt;The major key to success for any large open-source project like this is a thriving community. After seeing open-source projects like Drupal grow organically - and face growing pains as they find themselves dealing with community problems reactively rather than proactively - if a community does form around this project, I would like to establish a supportive and welcoming tone from the beginning.&lt;/p&gt;

&lt;p&gt;Diversity in particular remains an issue in the tech industry in general, and open-source especially - and a lack of diversity is difficult to correct after the fact. In building a community around this framework, my hope is that we draw a diverse set of developers in the beginning, in the hopes that seeding the garden well will be, if not self-sustaining, at least more sustainable. How to do that, I'm not certain - a concerted outreach effort could easily end up looking like Pokemon Go, searching for unique creatures to collect. Apart from starting with a good &lt;a href="https://www.contributor-covenant.org/version/1/4/code-of-conduct"&gt;Code of Conduct&lt;/a&gt;, I'm open to suggestions!&lt;/p&gt;

&lt;p&gt;Another aspect of community-building is providing opportunities for relative novices (whether new to open-source development, new to PHP, or new to migration). The proposed architecture involves myriad small, well-focused packages - an extractor here, a set of related transformers there, integrations for specific frameworks and APIs... Individual transformers, in particular, will generally be very simple. This ecosystem thus will provide ample opportunities for novices to gain experience with mentorship and also establish an online presence.&lt;/p&gt;

&lt;p&gt;Now, all that being said, what about &lt;a href="https://www.ashedryden.com/blog/the-ethics-of-unpaid-labor-and-the-oss-community"&gt;The Ethics of Unpaid Labor and the OSS Community&lt;/a&gt; (also see the &lt;a href="https://twitter.com/drnikki/status/1084831226081402880"&gt;recent Twitter discussion&lt;/a&gt; in the Drupal community)? In reaching out to underrepresented groups and to novices, we are reaching out to the people who have the least ability to work on open source for free. One way to ameliorate this effect may be to explicitly try to draw in students - whether in formal programs or teaching themselves software development - who will benefit from some free practical education and mentorship. Down the road, if this framework does start being adopted in real-world applications, we can look at ways to get sponsorships for people who maintain projects within the ecosystem. At any rate, as the community here grows I expect &lt;a href="https://gitlab.com/soongetl/architecture/issues/9"&gt;this will be an ongoing conversation&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Selfishness&lt;/h3&gt;

&lt;p&gt;Yes, I'm willing to cop to selfish reasons to pursue this.&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Simple ego: I'm proud of the work I've done on migration in Drupal, and think it can be useful on a larger stage. Being old enough to see retirement on the horizon, I admit I'm thinking of this as my magnum opus - the last major contribution I make to open source. I would love to leave behind a significant piece of quality software with a vital community behind it.&lt;/li&gt;
	&lt;li&gt;Money: I've done fine as a Drupal data migration specialist. I hope to do better by expanding my market beyond Drupal, working on a wider variety of migration projects. Yes, retirement is on the horizon but, given earlier attempts at consulting which went less well than my "migration period" has, my funds put that horizon farther out than I'd like...&lt;/li&gt;
&lt;/ol&gt;&lt;h2&gt;What's done so far?&lt;/h2&gt;

&lt;p&gt;Early last year I started playing around with a proof-of-concept in a single repo, getting a single basic ETL migration scenario running with a decoupled class structure based on the basic architecture of the Drupal migration system. Much of the work after getting the initial POC running was figuring out appropriate boundaries between components, and gradually introducing features beyond the most basic ones I started with. And then breaking pieces out into separate source repos, and figuring out those boundaries.&lt;/p&gt;

&lt;h2&gt;My role&lt;/h2&gt;

&lt;p&gt;This will certainly change according to the number and skills of contributors who join this effort (assuming there are some!), but here is what I'm aiming for in terms of my own role:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Primary architect of version 1 of Soong. This would mean being the primary maintainer of architecture documentation and the &lt;a href="https://gitlab.com/soongetl/soong"&gt;repository of central interfaces/base classes&lt;/a&gt;. Per "selfishness" above - I have an architectural vision I want to see brought to fruition. Others may take it in different directions after that, but V1 is mine! tl;dr - I don't want to be &lt;a href="https://en.wikipedia.org/wiki/Benevolent_dictator_for_life"&gt;BDFL&lt;/a&gt;; I do want to be BDF1.&lt;/li&gt;
	&lt;li&gt;Community leader. Per "community" above, I have a vision for building a diverse and vibrant open-source community from the ground up. Unlike the technical architecture, however, this plays less to my strengths, so I will be happy to defer as better-suited people show leadership in the community.&lt;/li&gt;
	&lt;li&gt;Mentorship. I'd like to help people up their development skills, their open-source involvement, and their understanding of the pits and perils of data migration.&lt;/li&gt;
&lt;/ol&gt;&lt;h2&gt;Why did it take me so long?&lt;/h2&gt;

&lt;p&gt;After having it in the back of my head for a few years, I finally started creating repos and putting my thoughts into actual interfaces and classes several months ago. Why did I wait until now to share my work with the larger community? I certainly felt seen when I read this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/jessfraz/status/1063425181509652481"&gt;https://twitter.com/jessfraz/status/1063425181509652481&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Frankly, there's an element of imposter syndrome here - I wanted to be sure I wasn't exposing any dumb ideas! Well, enough of that - instead, I now stipulate that you will find dumb things I did here, and ask that you help smartify them.&lt;/p&gt;

&lt;h2&gt;The architecture itself&lt;/h2&gt;

&lt;p&gt;There's a ways to go &lt;a href="https://gitlab.com/soongetl/architecture/blob/master/architecture/index.md"&gt;documenting the architecture&lt;/a&gt; as it is currently implemented in &lt;a href="https://gitlab.com/soongetl/soong"&gt;soong/soong&lt;/a&gt;, but right now it broadly looks like this:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;A Task accepts configuration defining a migration process, and implements operations - most notably &lt;strong&gt;migrate&lt;/strong&gt;, but it may also support other operations like &lt;strong&gt;rollback&lt;/strong&gt;, &lt;strong&gt;status&lt;/strong&gt;, &lt;strong&gt;analyze&lt;/strong&gt;, … The following steps describe the &lt;strong&gt;migrate&lt;/strong&gt; operation.&lt;/li&gt;
	&lt;li&gt;The task constructs the configured Extractor, which obtains data from a source such as a SQL query, a CSV file, an XML/JSON API, etc.&lt;/li&gt;
	&lt;li&gt;Iterating over the extractor returns one DataRecord at a time, each a collection of named DataProperty instances containing source data. For each source record, the task creates an empty DataRecord representing the destination data.&lt;/li&gt;
	&lt;li&gt;The task configuration defines a transform pipeline keyed by destination property names. For each of these properties, a sequence of one or more Transformer classes with corresponding configuration is invoked to determine the destination property value. Usually, the first transformer is configured to accept one or more source property names, its result is fed to the subsequent transformers, and the final result is assigned to the named property in the destination DataRecord.&lt;/li&gt;
	&lt;li&gt;The destination DataRecord is passed to the configured Loader to be loaded into the destination store - a SQL database, a CSV file, etc.&lt;/li&gt;
	&lt;li&gt;If an optional KeyMap is configured within the task, it is used to store the mapping from the source record's unique key to the destination record's unique key. This enables keyed relationships to be maintained even if keys change when migrating, as well as enabling rollback.&lt;/li&gt;
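&lt;/ol&gt;&lt;p&gt;To try out a couple of working demos, &lt;code&gt;git clone git@gitlab.com:soongetl/soong.git&lt;/code&gt; (or, if you don't have an SSH key set up on GitLab, &lt;code&gt;git clone https://gitlab.com/soongetl/soong.git&lt;/code&gt;) and follow the README.&lt;/p&gt;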
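
&lt;p&gt;As a rough illustration of the migrate flow described above, here is a minimal sketch. It is written in Python only to keep it short, and it is not Soong's actual PHP API: the class names CsvExtractor, Copy, UpperCase and ListLoader and the migrate function are hypothetical stand-ins, records are plain dictionaries rather than DataRecord instances, and the key map is a plain dictionary.&lt;/p&gt;

&lt;pre&gt;
```python
# Hypothetical sketch of the migrate steps; names are illustrative only
# and do not correspond to Soong's real PHP interfaces.
import csv
import io


class CsvExtractor:
    """Yields one source record (a dict of named properties) at a time."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        return iter(csv.DictReader(io.StringIO(self.text)))


class Copy:
    """Transformer that starts a pipeline by reading one source property."""

    def __init__(self, source_property):
        self.source_property = source_property

    def transform(self, value, source_record):
        return source_record[self.source_property]


class UpperCase:
    """Transformer that upper-cases the value from the previous step."""

    def transform(self, value, source_record):
        return value.upper()


class ListLoader:
    """Loader that appends records to a list and returns the new key."""

    def __init__(self):
        self.rows = []

    def load(self, record):
        self.rows.append(record)
        return len(self.rows)  # destination key assigned on load


def migrate(extractor, pipeline, loader, key_map=None, source_key="id"):
    for source in extractor:
        destination = {}  # empty destination record
        # The pipeline is keyed by destination property name.
        for prop, transformers in pipeline.items():
            value = None
            for transformer in transformers:
                value = transformer.transform(value, source)
            destination[prop] = value
        destination_key = loader.load(destination)
        # The optional key map records source key to destination key.
        if key_map is not None:
            key_map[source[source_key]] = destination_key


SOURCE = "id,name\n17,ada\n42,grace\n"
loader = ListLoader()
key_map = {}
migrate(
    CsvExtractor(SOURCE),
    {"title": [Copy("name"), UpperCase()]},
    loader,
    key_map,
)
print(loader.rows)
print(key_map)
```
&lt;/pre&gt;

&lt;p&gt;Running the sketch prints the two loaded records with upper-cased titles, followed by the mapping from source ids to the keys the loader assigned. A rollback operation could, under the same assumptions, walk that mapping to find and remove what was loaded.&lt;/p&gt;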

&lt;h2&gt;Initial technical priorities&lt;/h2&gt;

&lt;ol&gt;&lt;li&gt;One of those infamous hard problems in computer science is naming things. Before we go too far, &lt;a href="https://gitlab.com/soongetl/architecture/issues?label_name%5B%5D=Naming"&gt;let's figure out how best to name things&lt;/a&gt; - I think Extractor/Transformer/Loader are pretty solid, but let's discuss whether other components (like &lt;a href="https://gitlab.com/soongetl/architecture/issues/1"&gt;Task&lt;/a&gt;) could use better names. Also, let's decide what naming conventions for implementations should look like - e.g., should CSV extractor and loader classes both be named CSV (or for that matter, Csv) with namespaces alone distinguishing them, or should they be CSVExtractor and CSVLoader?&lt;/li&gt;
	&lt;li&gt;The initial architecture, as I've said before, comes from my narrow experience in Drupal. I'm sure there are plenty of other good migration ideas out there - maybe there's even a package I've missed that's good enough that this effort would better be directed towards improving it rather than starting from scratch. I did do some research last year and did not find any PHP ETL packages that appeared to have wide adoption or as much flexibility, but with more eyes on it (eyes that have seen more beyond Drupal than I have) &lt;a href="https://gitlab.com/soongetl/architecture/issues/7"&gt;let's see if we can do a thorough review of prior art and see if there are some good ideas which may influence this effort&lt;/a&gt;. And let's look beyond PHP as well - are there ETL frameworks written in other object-oriented languages which may provide some architectural inspiration?&lt;/li&gt;
	&lt;li&gt;Review the &lt;a href="https://gitlab.com/soongetl/skeleton"&gt;boilerplate for Soong code repos&lt;/a&gt; (based on &lt;a href="https://github.com/thephpleague/skeleton"&gt;https://github.com/thephpleague/skeleton&lt;/a&gt;) - let's go over what we've got there (especially the code of conduct and contributing guidelines).&lt;/li&gt;
	&lt;li&gt;Test all the things! Before adding new stuff, we need to add tests for the existing components, and set up automated testing on GitLab.&lt;/li&gt;
&lt;/ol&gt;&lt;h2&gt;Technical goals&lt;/h2&gt;

&lt;ol&gt;&lt;li&gt;For V1, require PHP 7.1 and leverage strict type checking. I expect future versions to require PHP 7.4 and leverage typed object properties.&lt;/li&gt;
	&lt;li&gt;The central interface package &lt;a href="https://gitlab.com/soongetl/soong"&gt;soong/soong&lt;/a&gt; ideally should not depend on anything other than &lt;a href="https://www.php-fig.org/psr/"&gt;PSR interfaces&lt;/a&gt;. It should be approached as if it were a PSR itself - a completely general interface for ETL functionality not dependent on any non-standard interfaces.&lt;/li&gt;
&lt;/ol&gt;&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Again, I know I am getting way ahead of myself by imagining that an active open-source community will quickly spring up here. I have talked to Drupal people about my ideas on occasion, and I expect there will be some interest there, but I very much hope other open-source developers can join this effort and provide different perspectives. I do believe strongly that a standard ETL library with a core of simple standard interfaces (making a simple move-my-stuff-from-here-to-there application a breeze) plus the flexibility to build complex systems to handle many types of data will be extremely valuable across many domains.&lt;/p&gt;

&lt;p&gt;If I may try your patience a bit longer - I've spent a substantial portion of my time since my last contract pulling these thoughts together, and I am now in need of paid work (&lt;a href="https://virtuoso-performance.com/contact"&gt;contact me&lt;/a&gt; if you need some data migration done!). I may fantasize about being sponsored to work full-time on Soong, or be hopeful there's someone with a project that they think will benefit from Soong and thus I can make progress here in the course of solving their migration problem. Realistically, my next contract (or employment) most likely will not involve Soong development, so once I'm working I won't have as much time to manage this project - let's hope plenty of people join in to pick up my slack!&lt;/p&gt;

&lt;p&gt;If you've made it this far, thank you for your time and I look forward to your merge requests!&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="field field-node-field-tags field-entity-reference-type-taxonomy-term field-formatter-entity-reference-label field-name-field-tags field-type-entity-reference field-label-above"&gt;&lt;h3 class="field__label"&gt;Tags&lt;/h3&gt;&lt;div class="field__items"&gt;&lt;div class="field__item field__item--drupal"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/drupal" property="schema:about" hreflang="en"&gt;Drupal&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--planet-drupal"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/planet-drupal" property="schema:about" hreflang="en"&gt;Planet Drupal&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--php"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/php" property="schema:about" hreflang="en"&gt;PHP&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--migration"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/migration" property="schema:about" hreflang="en"&gt;Migration&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;div class="field__item field__item--soong"&gt;
        &lt;span class="field__item-wrapper"&gt;&lt;a href="https://virtuoso-performance.com/tags/soong" property="schema:about" hreflang="en"&gt;Soong&lt;/a&gt;&lt;/span&gt;
      &lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="clearfix text-formatted field field-node--field-twitter-comments field-formatter-text-default field-name-field-twitter-comments field-type-text-long field-label-above has-single"&gt;&lt;h3 class="field__label"&gt;Use the Twitter thread below to comment on this post:&lt;/h3&gt;&lt;div class="field__items"&gt;&lt;div class="field__item"&gt;&lt;div data-oembed-url="https://twitter.com/VirtPerformance/status/1087805171621679104"&gt;
&lt;div style="max-width:480px;margin:auto;"&gt;&lt;!-- You're using demo endpoint of Iframely API commercially. Max-width is limited to 320px. Please get your own API key at https://iframely.com. --&gt;
&lt;blockquote align="center" class="twitter-tweet" data-dnt="true"&gt;
&lt;p dir="ltr" lang="und" xml:lang="und" xml:lang="und"&gt;&lt;a href="https://t.co/0uVJ13t8md"&gt;https://t.co/0uVJ13t8md&lt;/a&gt;&lt;/p&gt;
— Virtuoso Performance (@VirtPerformance) &lt;a href="https://twitter.com/VirtPerformance/status/1087805171621679104?ref_src=twsrc%5Etfw"&gt;January 22, 2019&lt;/a&gt;&lt;/blockquote&gt;
&lt;script async="" charset="utf-8" src="https://platform.twitter.com/widgets.js"&gt;&lt;/script&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;section rel="schema:comment" class="field field-node--comment field-formatter-comment-default field-name-comment field-type-comment field-label-above display-mode-threaded comment-bundle-comment comment-wrapper"&gt;&lt;a name="comments" id="comments"&gt;&lt;/a&gt;&lt;/section&gt;</description>
  <pubDate>Tue, 22 Jan 2019 20:10:33 +0000</pubDate>
    <dc:creator>mikeryan</dc:creator>
    <guid isPermaLink="false">150 at https://virtuoso-performance.com</guid>
    </item>

  </channel>
</rss>
