Automatically downloading/backing up/dumping/exporting databases from remote hosts via the web

The problem

You operate a database-backed website (e.g. Drupal) where you can’t access cron jobs, cgi, perl and outgoing connections. So one idea to back up your database on a regular basis (which is always a good idea) is to download SQL dumps via a web-based administration tool (such as the backup and migrate plugin for drupal). Unfortunately, these kinds of downloads cannot simply be automated on the shell by using curl or wget, because they require a bit of javascript, for example to outsmart the php timeout.

The solution

Use a headless browser (that is, a browser without graphical user interface) to automate the task. It fetches the desired page, logs in, (virtually) clicks the download button and downloads the dump file.

It should be a command line tool, in order to run it as cron job from a some server (e.g. a NAS).

Personally, I liked the idea of PhantomJS, but it was not available for my Synology DS213+ PowerPC and I didn’t like the idea of building it from source.

So my plan B was to write a small Java program (remoteDbDumper)  that uses the HtmlUnit framework (our headless browser).

How to use

  1. Install drupal plugin backup and migrate.
  2. Download and extract remoteDbDumper.
  3. Start it from the shell.
    remoteDbDumper -u <username> -p <password> -o <output dir> <url to backup and migrate>

    Note that output dir must be an existing directory

    1. Linux example:
      ./ -u user -p topsecret -o ~/backup
    2. Windows example
      remoteDbDumper.bat -u user -p topsecret -o "%HOMEPATH%\backup"
  4. Use the scheduling mechanism of your choice to call remoteDbDumper regularly, creating backups.

Example (Synology)

Just a short exemplary scenario on how to use remoteDbDumper on a Synology Diskstation (running DSM 4.2) to regularly back up a drupal database.

  1. (if Java is not installed) install Java:
    If available for your Diskstation, use the Java Manager package. Otherwise, you could use a third party Java package (that’s what I had to do).
  2. Download, extract and copy remoteDbDumper to the NAS (e.g. to \\diskstation\public\, which corresponds to /volume1/public/)
  3. SSH to the NAS and check if it works
    /volume1/public/remoteDbDumper-1.0/ -u user -p topsecret -o /volume1/someUser/
  4. (optional) Wrap the command line call in a shell script, e.g.
    BASEDIR=$(dirname $0)
    $BASEDIR/remoteDbDumper-1.0/ -u user -p topsecret -o $1
  5. Either use the web frontend  or the crontab to schedule the back up.
    1. Web frontend:
      Go to http://diskstation:5000, (or whatever combination of host name and port you’re using)
      login as admin,
      click on Control Panel | Task Scheduler.
      Then click on Create | User-defined Script.
      Enter a task name, choose a user (preferably not root), set up a schedule (e.g. every sunday at 8 p.m.).
      Finally enter the path to remoteDbDumpe or the script (4.) respectively. For the example above, the path would look like this:

      /volume1/public/ /volume1/public/
    2. If you insist to do it on foot, here’s what to enter in crontab:
      vi /etc/crontab
      #minute hour    mday    month   wday    who              command
      0       20      *       *       0       enterUserHere    /volume1/public/ /volume1/public/
    3. Set a maker in your calender for the next scheduled run, to check if it worked.

Future tasks

At the current state remoteDbDumper can only backup drupal databases. Fair enough.

However, with just a little more effort it would be possible to extend remoteDbDumper to support addition web-based database administration tools, such as  mysqldumper, phpMyBackupPro, phpMyAdmin or phpPhAdmin.

In order to do so, just fork the repo on github and implement the interface DbDump.