Converting a Wordpress site to a normal site – the masochist way
I don't like Wordpress anymore. There are too many options and the whole thing is too convoluted. I yearn for a simpler time. A happier time. A time when we designed websites by hand, by scrawling things like <p> and <table> in the dirt with a stick, not by using 'plugins' and 'themes'.
So recently I took some Wordpress posts and made them into a normal HTML website. This essentially consisted of copying the Wordpress posts into a new database and then displaying them using PHP.
As it turned out, the task was a lot more difficult than I first anticipated. I don't recommend it and I believe most people would be better off using a plugin like Simply Static or Really Static.
In this article, I'll discuss some of the issues I faced.
Moving posts from a Wordpress database to another database
The first step was easy. I had to copy the blog posts from the Wordpress database to the new database.
I used this query to move the most recent version of all the posts to new database I had just created:
INSERT INTO database2.Posts(
WordpressID,
Date_written,
Slug,
Title,
Excerpt,
Content
)
SELECT
ID,
post_date,
post_name,
post_title,
post_excerpt,
post_content
FROM
wordpress_database.wp_123_posts
WHERE
post_status = 'publish' AND post_content != ""
ORDER BY
post_date ASC
Featured images
You can get the featured images of Wordpress posts with this complicated query:
SELECT
p1.ID AS ID,
wm2.meta_value AS Image
FROM
wp_123_posts p1
LEFT JOIN wp_123_postmeta wm1 ON
(
wm1.post_id = p1.id AND wm1.meta_value IS NOT NULL AND wm1.meta_key = "_thumbnail_id"
)
LEFT JOIN wp_123_postmeta wm2 ON
(
wm1.meta_value = wm2.post_id AND wm2.meta_key = "_wp_attached_file" AND wm2.meta_value IS NOT NULL
)
ORDER BY
p1.post_date
DESC
You can use this query to move the featured image names to your new database:
UPDATE
new_database.Entries
INNER JOIN(
SELECT
p1.ID AS ID,
wm2.meta_value AS Image
FROM
wp_123_posts p1
LEFT JOIN wp_123_postmeta wm1 ON
(
wm1.post_id = p1.id AND wm1.meta_value IS NOT NULL AND wm1.meta_key = "_thumbnail_id"
)
LEFT JOIN wp_123_postmeta wm2 ON
(
wm1.meta_value = wm2.post_id AND wm2.meta_key = "_wp_attached_file" AND wm2.meta_value IS NOT NULL
)
ORDER BY
p1.post_date
DESC
) featured_images
ON
featured_images.ID = new_database.Entries.Wordpress_ID
SET
Featured_image = featured_images.Image
Comments
I don't give a shit about categories and tags so I didn't include these in the query. However, I do want the comments. This query gets the comments and moves them to a Comments table:
INSERT INTO database2.Comments (Entry_ID, Wordpress_ID, Timestamp, Name, IP, Comment, Email, Website, Wordpress_parent_comment_ID)
SELECT
Posts.ID,
wp_123_comments.comment_ID,
wp_123_comments.comment_date,
wp_123_comments.comment_author,
wp_123_comments.comment_author_ip,
wp_123_comments.comment_content,
wp_123_comments.comment_author_email,
wp_123_comments.comment_author_url,
wp_123_comments.comment_parent
FROM
wp_123_comments
JOIN wp_123_posts ON ID = comment_post_ID
JOIN database2.Posts ON database2.Posts.ID = wp_123_posts.ID
WHERE
comment_approved = '1' AND post_status = 'publish'
ORDER BY
comment_date
DESC
Every comment has a comment_parent in the Wordpress database. You can preserve this structure with a query like:
UPDATE
Comments
JOIN Comments C2 ON
Comments.Wordpress_parent_comment_ID = C2.Wordpress_ID
SET
Comments.Parent_comment_ID = C2.ID
Images
The links to the images still pointed to the Wordpress site. For example:
<a href="http://www.pauljonesblog.com/wp-content/uploads/2014/01/2.jpg"><img%20class="size-medium%20wp-image-471%20alignright"%20src="http://www.pauljonesblog.com/wp-content/uploads/2014/01/2-300x213.jpg" alt="2" width="300" height="213" /></a>
In theory this should be fine. The new site should be able to get the images from the Wordpress site without problem.
However, in practice, I found that my new site was unable to load images from the Wordpress site. I suspect this was an anti-hotlinking measure.
I couldn't seem to disable the hotlink protection, so I had to find another solution.
Solution #1: Copy the entire Wordpress uploads folder to your new site
I decided to try copying the entire Wordpress images folder to a new folder:
cp -r uploads ../../new_site/public_images
However, this solution was like using a sledgehammer to crack a nut because I only really needed a small number of the total number of images.
If you still want to go this route, you can use regex to update the URLs in the new database to make them point to the new directory:
UPDATE
Posts
SET Post
=
REPLACE
(
Post,
"http://www.pauljonesblog.com/wp-content/uploads/",
"https://www.new_site.com/public_images/"
)
And voila, the images were working.
By the way, Wordpress likes to create lots of small versions of every image. But while makes pages load faster, all those image files can get a bit annoying if you're no longer using Wordpress.
If all the images are in one folder, you can delete the small versions with this command:
rm *150x* *250x* *270x* *300x* *570x* *768x* *1024x* *1080x* *1170x* *x100* *x300* *x350* *x350* *x500* *x1024*
Solution #2: Transfer only the images you need
Then I had a brainwave. What if I didn't need to copy the entire 1-gigabyte uploads folder? What if instead I only got only the images I needed?
First I had to find out what images my new website actually needed.
So first I exported all the text of all the posts. To do this, I ran this query on my new database:
SELECT Text FROM Entries
Then I made a tool that extracts all the images from a Wordpress post. I call it the Wordpress images extractor. You just paste your posts in there and it gives you a list of images like this:
2014/11/poster.jpg
2015/01/update-box.png
2015/01/choose-updates-but-let-me-decide-whether-to-install-them.jpg
2015/01/failure-configuring-your-updates.jpg
[...]
Then I used regex in Sublime Text to turn them into cp commands:
Find: \n(.*)/(.*)/(.*)
Replace: \ncp $1/$2/$3 ~/public_html/new_site/img/originals/$1-$2_$3
Now I had a list of cp commands like this:
cp 2014/11/poster.jpg ~/public_html/new_site/img/originals/2014-11_poster.jpg
cp 2015/01/update-box.png ~/public_html/new_site/img/originals/2015-01_update-box.png
[...]
I put these commands into my server and hey presto, it copied the images to my new site.
Formatting the text
Wordpress often adds ugly and unnecessary HTML to posts. I built a tool to get rid of it. I call it the Wordpress to HTML converter. Among other things, it adds paragraph tags (something that Wordpress doesn't use).
Youtube links
Wordpress has a feature where you can easily embed YouTube videos just by putting a link in the post, like https://youtu.be/SJjk-RdzuuM. This isn't HTML however so a normal webpage would just display it literally.
You can fix this by running this regex on the text of your blog posts:
Find: https://youtu.be/([A-z0-9\?\=\-]+)
Replace: <div class=\'iframe-container\'><iframe src=\'https://youtube.com/embed/$1\'></iframe></div>
You can then use this CSS to make the width of the video responsive:
.iframe-container{
position: relative;
width: 100%;
padding-bottom: 56.25%;
height: 0;
}
.iframe-container iframe{
position: absolute;
top:0;
left: 0;
width: 100%;
height: 100%;
}
Captions
Captioning images in HTML is easy. The HTML standard is to caption images like this:
<figure>
<img src="http://kittens.com/kitten.jpg" alt="Small picture of a kitten" />
<figcaption>
Small picture of a kitten, graciously shared by <a href="http://placekitten.com">placekitten.com</a>
</figcaption>
</figure>
So then why does Wordpress have to be different? In Wordpress, images are captioned like this;
[caption id="attachment_13683" align="alignnone" width="300"]<img src="http://kittens.com/kitten.jpg" alt="" width="300" height="154" class="size-medium wp-image-13683" /> caption[/caption]
What a load of bollocks. Why doesn't Wordpress just stick to the HTML standard instead?
To change Wordpress captions to HTML captions, you can use this regex in Sublime Text:
Find: \[caption.*?].*?(<img.*?>)<\/a>(.*?)\[/caption]
Replace: <figure>$1\\r\\n<figcaption><p>$2<\/p></figcaption></figure>
Find: \[caption.*?](<img.*?>)(.*?)\[/caption]
Replace: <figure>$1\\r\\n<figcaption><p>$2<\/p></figcaption></figure>
Leave a comment