Categories
Personal

MediaWiki database compression

Recently I’ve been taking daily backups of a 4.2GB mediawiki (v1.5.6) database, about 3.9GB of which was in the ‘text’ table. It was all the previous versions of pages that were taking up the space so having looked at http://meta.wikimedia.org/wiki/Help:Reduce_size_of_the_database I decided to give deleteOldRevisions.php a go.


To get the deleteOldRevisions.php script, you have to go to the CVS repository (direct link) and download the following files:
deleteOldRevisions.php
deleteOldRevisions.inc
purgeOldText.php
purgeOldText.inc
Save them into the maintenance directory within the your mediawiki-1.5.6 tree, and ensure that you have an AdminSettings.php script in the parent directory with the following lines specifying the database user/password:
<?php
$wgDBadminuser = “username”;
$wgDBadminpassword = “password”;
?>
Assuming you want to remove all previous revisions, simply change to the maintenance directory and run:
php deleteOldRevisions.php –delete
If you want some of the previous revisions say the last week of history, just in case something needs to be reverted later, you can edit the SQL in line 33 of deleteOldRevisions.inc from:
SELECT rev_id FROM $tbl_rev WHERE rev_id NOT IN ( $set )
to read:
SELECT rev_id FROM $tbl_rev WHERE rev_id NOT IN ( $set ) AND rev_timestamp < DATE_SUB(NOW(),INTERVAL 7 DAY) Obviously the interval can be changed to suit requirements but 7 days seems about right for a relatively busy wiki. You may possibly find that the purgeOldText function fails with a SQL error. I found this was caused by NULL values in the archive.ar_text_id column and the SQL contained two commas without a value between them. To get around this, I changed the SQL in line 31 of purgeOldText.inc from SELECT DISTINCTROW ar_text_id FROM $tbl_arc to read: SELECT DISTINCTROW ar_text_id FROM $tbl_arc WHERE ar_text_id IS NOT NULL Once the deleteOldRevisions script has finished, you should take a few minutes to 'OPTIMIZE TABLE text;' from the MySQL prompt. This will reduce the actual file size potentially by quite a large amount. In my case the database size went from 4.2GB to 1.9GB.