Paul Chris Jones's coding blog
PHP
JS

Converting a Wordpress site to a normal site – the masochist way

26th March 2020

I don't like Wordpress anymore. There are too many options and the whole thing is too convoluted. I yearn for a simpler time. A happier time. A time when we designed websites by hand, by scrawling things like <p> and <table> in the dirt with a stick, not by using 'plugins' and 'themes'.

So recently I took some Wordpress posts and made them into a normal HTML website. This essentially consisted of copying the Wordpress posts into a new database and then displaying them using PHP.

As it turned out, the task was a lot more difficult than I first anticipated. I don't recommend it and I believe most people would be better off using a plugin like Simply Static or Really Static.

In this article, I'll discuss some of the issues I faced.

Moving posts from a Wordpress database to another database

The first step was easy. I had to copy the blog posts from the Wordpress database to the new database.

I used this query to move the most recent version of all the posts to new database I had just created:

INSERT INTO database2.Posts(
    WordpressID,
    Date_written,
    Slug,
    Title,
    Excerpt,
    Content
)
SELECT
    ID,
    post_date,
    post_name,
    post_title, 
    post_excerpt,
    post_content
FROM
    wordpress_database.wp_123_posts
WHERE
    post_status = 'publish' AND post_content != ""
ORDER BY
    post_date ASC

Featured images

You can get the featured images of Wordpress posts with this complicated query:

SELECT
    p1.ID AS ID,
    wm2.meta_value AS Image
FROM
    wp_123_posts p1
LEFT JOIN wp_123_postmeta wm1 ON
    (
        wm1.post_id = p1.id AND wm1.meta_value IS NOT NULL AND wm1.meta_key = "_thumbnail_id"
    )
LEFT JOIN wp_123_postmeta wm2 ON
    (
        wm1.meta_value = wm2.post_id AND wm2.meta_key = "_wp_attached_file" AND wm2.meta_value IS NOT NULL
    )
ORDER BY
    p1.post_date
DESC

You can use this query to move the featured image names to your new database:

UPDATE
    new_database.Entries
INNER JOIN(
    SELECT
        p1.ID AS ID,
        wm2.meta_value AS Image
    FROM
        wp_123_posts p1
    LEFT JOIN wp_123_postmeta wm1 ON
        (
            wm1.post_id = p1.id AND wm1.meta_value IS NOT NULL AND wm1.meta_key = "_thumbnail_id"
        )
    LEFT JOIN wp_123_postmeta wm2 ON
        (
            wm1.meta_value = wm2.post_id AND wm2.meta_key = "_wp_attached_file" AND wm2.meta_value IS NOT NULL
        )
    ORDER BY
        p1.post_date
    DESC
) featured_images
ON
    featured_images.ID = new_database.Entries.Wordpress_ID
SET
    Featured_image = featured_images.Image

Comments

I don't give a shit about categories and tags so I didn't include these in the query. However, I do want the comments. This query gets the comments and moves them to a Comments table:

INSERT INTO database2.Comments (Entry_ID, Wordpress_ID, Timestamp, Name, IP, Comment, Email, Website, Wordpress_parent_comment_ID)
SELECT
    Posts.ID,
    wp_123_comments.comment_ID,
    wp_123_comments.comment_date,
    wp_123_comments.comment_author,
    wp_123_comments.comment_author_ip,
    wp_123_comments.comment_content,
    wp_123_comments.comment_author_email,
    wp_123_comments.comment_author_url,
    wp_123_comments.comment_parent
FROM
    wp_123_comments
JOIN wp_123_posts ON ID = comment_post_ID
JOIN database2.Posts ON database2.Posts.ID = wp_123_posts.ID
WHERE
    comment_approved = '1' AND post_status = 'publish' 
ORDER BY
    comment_date
DESC

Every comment has a comment_parent in the Wordpress database. You can preserve this structure with a query like:

UPDATE
    Comments
JOIN Comments C2 ON
    Comments.Wordpress_parent_comment_ID = C2.Wordpress_ID
SET
    Comments.Parent_comment_ID = C2.ID

Images

The links to the images still pointed to the Wordpress site. For example:

<a href="http://www.pauljonesblog.com/wp-content/uploads/2014/01/2.jpg"><img%20class="size-medium%20wp-image-471%20alignright"%20src="http://www.pauljonesblog.com/wp-content/uploads/2014/01/2-300x213.jpg" alt="2" width="300" height="213" /></a>

In theory this should be fine. The new site should be able to get the images from the Wordpress site without problem.

However, in practice, I found that my new site was unable to load images from the Wordpress site. I suspect this was an anti-hotlinking measure.

I couldn't seem to disable the hotlink protection, so I had to find another solution.

Solution #1: Copy the entire Wordpress uploads folder to your new site

I decided to try copying the entire Wordpress images folder to a new folder:

cp -r uploads ../../new_site/public_images

However, this solution was like using a sledgehammer to crack a nut because I only really needed a small number of the total number of images.

If you still want to go this route, you can use regex to update the URLs in the new database to make them point to the new directory:

UPDATE
    Posts
SET Post
    =
REPLACE
    (
        Post,
        "http://www.pauljonesblog.com/wp-content/uploads/",
        "https://www.new_site.com/public_images/"
    )

And voila, the images were working.

By the way, Wordpress likes to create lots of small versions of every image. But while makes pages load faster, all those image files can get a bit annoying if you're no longer using Wordpress.

If all the images are in one folder, you can delete the small versions with this command:

rm *150x* *250x* *270x* *300x* *570x* *768x* *1024x* *1080x* *1170x* *x100* *x300* *x350* *x350* *x500* *x1024*

Solution #2: Transfer only the images you need

Then I had a brainwave. What if I didn't need to copy the entire 1-gigabyte uploads folder? What if instead I only got only the images I needed?

First I had to find out what images my new website actually needed.

So first I exported all the text of all the posts. To do this, I ran this query on my new database:

SELECT Text FROM Entries

Then I made a tool that extracts all the images from a Wordpress post. I call it the Wordpress images extractor. You just paste your posts in there and it gives you a list of images like this:

2014/11/poster.jpg
2015/01/update-box.png
2015/01/choose-updates-but-let-me-decide-whether-to-install-them.jpg
2015/01/failure-configuring-your-updates.jpg
[...]

Then I used regex in Sublime Text to turn them into cp commands:

Find: \n(.*)/(.*)/(.*)
Replace: \ncp $1/$2/$3 ~/public_html/new_site/img/originals/$1-$2_$3

Now I had a list of cp commands like this:

cp 2014/11/poster.jpg ~/public_html/new_site/img/originals/2014-11_poster.jpg
cp 2015/01/update-box.png ~/public_html/new_site/img/originals/2015-01_update-box.png
[...]

I put these commands into my server and hey presto, it copied the images to my new site.

Formatting the text

Wordpress often adds ugly and unnecessary HTML to posts. I built a tool to get rid of it. I call it the Wordpress to HTML converter. Among other things, it adds paragraph tags (something that Wordpress doesn't use).

Youtube links

Wordpress has a feature where you can easily embed YouTube videos just by putting a link in the post, like https://youtu.be/SJjk-RdzuuM. This isn't HTML however so a normal webpage would just display it literally.

You can fix this by running this regex on the text of your blog posts:

Find: https://youtu.be/([A-z0-9\?\=\-]+)
Replace: <div class=\'iframe-container\'><iframe src=\'https://youtube.com/embed/$1\'></iframe></div>

You can then use this CSS to make the width of the video responsive:

.iframe-container{
  position: relative;
  width: 100%;
  padding-bottom: 56.25%; 
  height: 0;
}
.iframe-container iframe{
  position: absolute;
  top:0;
  left: 0;
  width: 100%;
  height: 100%;
}

Captions

Captioning images in HTML is easy. The HTML standard is to caption images like this:

<figure>
 <img src="http://kittens.com/kitten.jpg" alt="Small picture of a kitten" />
 <figcaption>
 Small picture of a kitten, graciously shared by <a href="http://placekitten.com">placekitten.com</a>
 </figcaption>
</figure>

So then why does Wordpress have to be different? In Wordpress, images are captioned like this;

[caption id="attachment_13683" align="alignnone" width="300"]<img src="http://kittens.com/kitten.jpg" alt="" width="300" height="154" class="size-medium wp-image-13683" /> caption[/caption]

What a load of bollocks. Why doesn't Wordpress just stick to the HTML standard instead?

To change Wordpress captions to HTML captions, you can use this regex in Sublime Text:

Find: \[caption.*?].*?(<img.*?>)<\/a>(.*?)\[/caption]
Replace: <figure>$1\\r\\n<figcaption><p>$2<\/p></figcaption></figure>

Find: \[caption.*?](<img.*?>)(.*?)\[/caption]
Replace: <figure>$1\\r\\n<figcaption><p>$2<\/p></figcaption></figure>

Leave a comment