I’m going to give Ryan McCue a big hug next time I see him.
As Ryan writes: “The WordPress Importer is key to a tonne of different workflows, and is one of the most used plugins on the repo.
Unfortunately, the Importer is also a bit unloved. After getting immensely frustrated at the Importer, I figured it was probably time we throw some attention at it. I’ve been working on fixing this with a new and improved Importer!”
The new Importer, available now on GitHub, aims to fix a bunch of issues, including being a lot faster.
As Ryan explains, the key to these improvements is rewriting the core processing, drawing on experience with the current Importer to fix its specific problems. That means a whole raft of fixes and improvements:
- Way less memory usage: Testing shows the memory needed to import a 41MB WXR file drops from 132MB to 19MB (less than half the size of the file itself!). This means no more splitting files just to get them to import!
- Faster parser: By using a streaming XML parser, we process data as we go, which is much more scalable than the current approach. Content can be imported while the file is still being read, rather than only after the whole file has been pre-processed.
- Resumable parsing: By storing more state in the database instead of in variables, we can quit and resume imports on the go.
- Partial imports: Rethinking the deduplication approach allows better partial imports, such as when you’re updating a production site from staging.
- Better CLI: Treating the CLI as a first-class citizen means a better experience for those doing imports on a daily basis, and better code quality and reusability.
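The streaming and resumable ideas above can be illustrated outside WordPress. What follows is a minimal Python sketch, not the new Importer's own code (which is PHP). It assumes a WXR 1.2 export (the `http://wordpress.org/export/1.2/` namespace for `wp:*` tags) and uses the standard library's `xml.etree.ElementTree.iterparse` to handle one `<item>` at a time, discarding each once it is done. The `done_ids` parameter is a hypothetical stand-in for progress that a real importer would keep in the database between runs.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: WXR 1.2 namespace for wp:* tags; other export versions differ.
WP_NS = "{http://wordpress.org/export/1.2/}"


def stream_items(source, done_ids=frozenset()):
    """Yield (post_id, title) pairs from a WXR file one <item> at a time.

    done_ids is a hypothetical stand-in for progress saved between runs
    (for example in the database), letting an interrupted import resume
    by skipping items that were already handled.
    """
    channel = None
    # iterparse reports elements while the file is still being read,
    # so each item can be handled without building the whole tree first.
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "channel":
            channel = elem
        elif event == "end" and elem.tag == "item":
            post_id = elem.findtext(WP_NS + "post_id")
            if post_id not in done_ids:
                yield post_id, elem.findtext("title")
            # Empty the finished item and detach it, so processed items
            # don't accumulate in memory.
            elem.clear()
            if channel is not None:
                channel.remove(elem)


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
<title>Example</title>
<item><title>Hello</title><wp:post_id>1</wp:post_id></item>
<item><title>World</title><wp:post_id>2</wp:post_id></item>
</channel>
</rss>"""

if __name__ == "__main__":
    print(list(stream_items(io.BytesIO(SAMPLE))))  # [('1', 'Hello'), ('2', 'World')]
    # Resuming after post 1 was imported in an earlier run:
    print(list(stream_items(io.BytesIO(SAMPLE), done_ids={"1"})))  # [('2', 'World')]
```

Because only the current item is held in memory, peak usage depends on the size of the largest item rather than the size of the whole export, which is the general property a streaming parser trades on.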
One of the best things about it is the media handling. As Ryan notes, the changes mean “we can also handle downloads using a different tool, or even use the existing local files if we already have a copy of the uploads directory (!!!).” Like I said: a big ol’ hug.
There’s a lot more technical background and information in Ryan’s post, so check it out.