Everytime there is web revamp, we often need to do database migration. This process is usually frustating and time consuming. Most people don't know much about this part, they just want to see the old data are migrated properly.
In company's recent task, the old and new website are both built with Wordpress (I know this doesn't sound cool), with very different custom fields. The images are stored in S3.
For previous revamp project, I used custom command line interface ( CLI ), written in Node.js. It extracts data to XML format and use WP All Import plugin to import it.
The import process was slow, which can be understood as it is web process. It involved a lot of downloads, thumbnails generation, and image uploads.
Last time, the actual migration took roughly a week, including overtime. It was very tiring and cumbersome as it still involved manual process of uploading xml files to avoid timeout.
Direct data migration between database is definitely better as it is much faster. This requires us to know fully how data is being stored in the new system.
Things to Consider
There are 2 main things:
- App should be long running.
- The existence of library for reading and creating PHP serialized data.
It is unfortunate that Wordpress stores some data in PHP's serialized format, which is more cumbersome than JSON. I guess it is part of legacy decision, and Wordpress needs to maintain backward data compatibility
For the first point, CLI app is suitable. We can run the app and leave it to run without intervention.
Nowadays almost any programming language can be used to build CLI. In fact, PHP itself is suitable. Node.js is also a good option, it has rich community and there's code snippet for PHP (de)serialization. I also like the syntax of modern Js.
However I was looking for something different, which is Go(lang). I tried Go before, but I have never developed anything, and especially under deadline.
Doing something new in limited time can be thrilling. Especially if the tool I use has a lot of community support and modern enough.
How It Works
Generally the steps are:
CLI app builds model for each post types from YAML definition files.
App extracts the data and image urls from source DB based on model, and stores it in intermediate format. I use Sqlite database for this purpose.
App downloads, generate sub images/thumbnails and uploads image data. Golang's goroutine is good for this part. For image processing, I just use pure Go library, imaging
App modifies the target DB based on Sqlite.
The app itself is not that complex. However, checking how the data being stored takes time. Some ACF field types like Repeater and WP's attachment data are relatively complicated.
The app can be run in normal CLI, or with web UI for status report.
As it is just status report from server to browser, I use server-sent events.
I am quite impressed with Go.
Based on this task, the advantages are:
- Single binary
- Fast enough
Things I don't like:
- Error handling pattern
- Built in regex doesn't support backreference, which is needed to find enclosing shortcodes.
For this task, I don't really need generics, which is not supported yet in Go.