Automatically downloading/backing up/dumping/exporting databases from remote hosts via the web

The problem

You operate a database-backed website (e.g. Drupal) on a host where you have no access to cron jobs, CGI, Perl or outgoing connections. So one idea for backing up your database on a regular basis (which is always a good idea) is to download SQL dumps via a web-based administration tool (such as the Backup and Migrate plugin for Drupal). Unfortunately, these kinds of downloads cannot simply be automated on the shell using curl or wget, because they require a bit of JavaScript, for example to outsmart the PHP timeout.

The solution

Use a headless browser (that is, a browser without graphical user interface) to automate the task. It fetches the desired page, logs in, (virtually) clicks the download button and downloads the dump file.

It should be a command-line tool, so that it can run as a cron job on some server (e.g. a NAS).

Personally, I liked the idea of PhantomJS, but it was not available for my Synology DS213+ PowerPC and I didn’t like the idea of building it from source.

So my plan B was to write a small Java program (remoteDbDumper) that uses the HtmlUnit framework (our headless browser).

How to use

  1. Install the Drupal plugin Backup and Migrate.
  2. Download and extract remoteDbDumper.
  3. Start it from the shell.
    remoteDbDumper -u <username> -p <password> -o <output dir> <url to backup and migrate>

    Note that the output dir must be an existing directory.

    1. Linux example:
      ./ -u user -p topsecret -o ~/backup
    2. Windows example:
      remoteDbDumper.bat -u user -p topsecret -o "%HOMEPATH%\backup"
  4. Use the scheduling mechanism of your choice to call remoteDbDumper regularly, creating backups.

Example (Synology)

Just a short example scenario of how to use remoteDbDumper on a Synology Diskstation (running DSM 4.2) to regularly back up a Drupal database.

  1. (if Java is not installed) install Java:
    If available for your Diskstation, use the Java Manager package. Otherwise, you could use a third-party Java package (that’s what I had to do).
  2. Download, extract and copy remoteDbDumper to the NAS (e.g. to \\diskstation\public\, which corresponds to /volume1/public/)
  3. SSH to the NAS and check if it works
    /volume1/public/remoteDbDumper-1.0/ -u user -p topsecret -o /volume1/someUser/
  4. (optional) Wrap the command line call in a shell script, e.g.
    BASEDIR=$(dirname "$0")
    "$BASEDIR"/remoteDbDumper-1.0/ -u user -p topsecret -o "$1"
  5. Use either the web frontend or the crontab to schedule the backup.
    1. Web frontend:
      Go to http://diskstation:5000 (or whatever combination of host name and port you’re using),
      log in as admin,
      click on Control Panel | Task Scheduler.
      Then click on Create | User-defined Script.
      Enter a task name, choose a user (preferably not root), and set up a schedule (e.g. every Sunday at 8 p.m.).
      Finally, enter the path to remoteDbDumper or to the script from step 4, respectively. For the example above, the path would look like this:

      /volume1/public/ /volume1/public/
    2. If you insist on doing it by hand, here’s what to enter in the crontab:
      vi /etc/crontab
      #minute hour    mday    month   wday    who              command
      0       20      *       *       0       enterUserHere    /volume1/public/ /volume1/public/
    3. Set a marker in your calendar for the next scheduled run, to check if it worked.
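
The wrapper script from step 4 could also be made a little more defensive. The following is only a sketch under a few assumptions: the function name run_dump and its three parameters are placeholders for illustration and not part of remoteDbDumper, and the first argument stands for the path of the start script inside remoteDbDumper-1.0/. It checks that the output directory exists (which remoteDbDumper requires) before calling the tool with the same options as in the examples above.

```shell
#!/bin/sh
# Sketch of a defensive wrapper. run_dump and its arguments are hypothetical
# names for illustration; they are not part of remoteDbDumper.

# Usage: run_dump <start script> <output dir> <url to backup and migrate>
run_dump() {
    dumper=$1
    outdir=$2
    url=$3

    # remoteDbDumper expects the output directory to exist already
    if [ ! -d "$outdir" ]; then
        echo "Output directory does not exist: $outdir" >&2
        return 1
    fi

    # Same options as in the examples above; replace user/topsecret
    "$dumper" -u user -p topsecret -o "$outdir" "$url"
}
```

When called from a scheduled task, the exit status of run_dump can be used to spot failed runs, assuming remoteDbDumper itself exits with a non-zero status on failure (which is not confirmed here).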

Future tasks

In its current state, remoteDbDumper can only back up Drupal databases. Fair enough.

However, with just a little more effort it would be possible to extend remoteDbDumper to support additional web-based database administration tools, such as MySQLDumper, phpMyBackupPro, phpMyAdmin or phpPgAdmin.

In order to do so, just fork the repo on GitHub and implement the interface DbDump.

Using Unix command-line tools in the Win32 console

Every time you use (or have to use) the command line in Windows, it takes time until your eyes adjust to the darkness. There’s one thing, however, I’ll never get accustomed to: working without the Unix/Linux/GNU (whatever you wanna call it) command-line tools. Fortunately, I don’t have to get accustomed to that: there are plenty of solutions to this problem out there. In this article I’m going to elaborate on these three:

  1. The “classic” solutions – Cygwin and virtual machines
  2. The lightweight alternative – UnxUtils
  3. A surprising alternative – Git

When referring to the Unix/Linux/GNU command-line tools, I’ll stick to the term Unix tools, as the heading suggests.

  • The “classic” solutions

The most popular ways of “getting that Linux feeling on Windows” are most likely Cygwin (a Linux-like console that even provides an X server) or a virtual machine such as VirtualBox or VMware running a (small) Linux distribution such as Damn Small Linux.
Of course, not using Windows at all would be a decent solution as well 😉

  • The lightweight alternative

There are scenarios where you might not want to, or even can’t, install Cygwin or a virtual machine. Maybe you’re just looking for a quick way to access these Unix tools from the Windows console and have no use for an X server. For this purpose, I use a collection of tools called UnxUtils. Actually, these tools have been around for a long time. The latest version is five and a half years old! Still, it’s downloaded several hundred times a day. Impressive!

Now, the nice thing about these tools is that they are ports to Windows. That is, they are native Windows applications that can run directly from the Windows command prompt (cmd.exe).
Even better, you don’t need to install anything. Just download, extract to some location on your hard drive (in fact, I even carry the utilities around on my flash drive) and you’re almost there. So that you don’t have to type the whole path to the UnxUtils binaries every time you intend to use one of them, this path should be added to the beginning of the PATH environment variable. You can either add it permanently (I did this on my Windows computer) or add it temporarily to a specific instance of cmd.exe. To make the UnxUtils portable, I put this small batch script in the UnxUtils folder on my flash drive:

@set PATH=%~dp0\usr\local\wbin;%PATH%

This script opens a console window where you can execute statements like this:

egrep -in "error|exception" c:\parser.log --context=3 > parserLogErrors

Finally, ending the awful task of analyzing logs with Windows “on-board equipment” 🙂

Note that after adding the path to the UnxUtils binaries to the beginning of PATH, Windows tools that share a name with one of the UnxUtils, such as find and sort, are no longer found by their plain name. So if you prefer the Windows-style search tool, check the contents of usr\local\wbin in the UnxUtils folder first and delete the tools you don’t need.

Unfortunately, I ran into a disadvantage of UnxUtils. A rather memory-intensive operation like this:

find d:\ | xargs grep "someExpression"

yields an error “xargs cannot fork”. The description for this error at GNU says that the system has run out of process slots, “because the system is very busy and the system has reached its maximum process limit, or because you have a resource limit in place and you’ve reached it”. Not enough memory for cmd.exe? Any ideas?
Someone even filed a bug for that problem, but it obviously has never been fixed (as said, the latest version is more than five years old). So, no solution here, unfortunately 😦
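
One way to sidestep xargs altogether is to let grep walk the directory tree itself. The following is a minimal sketch, written as a POSIX shell function (so it applies to a bash-like console rather than plain cmd.exe). It assumes a grep that supports the -r option, as current GNU grep does; whether the grep shipped with UnxUtils supports it is an assumption that is not confirmed here. The function name search_tree is a placeholder.

```shell
#!/bin/sh
# search_tree <directory> <expression>
# Recursively prints every matching line as file:line-number:text.
# Hypothetical helper name; relies on grep's -r (recursive) option.
search_tree() {
    grep -rn -- "$2" "$1"
}
```

Called as search_tree . "someExpression", this searches the current directory tree with a single grep process, so xargs does not have to start additional processes for batches of files.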

  • A surprising alternative

More recently I stumbled upon an alternative to UnxUtils: Git for Windows. Isn’t that a source code management system? It is, but on Windows it ships with a console application. When installing Git, you can choose either to use Git’s console application (Git Bash) or to add the Git console’s binaries to the Windows PATH variable, just as described in the solution above.

Note that neither Git nor UnxUtils contains a native version of vi(m). One way to use it would be the Git Bash. The Git Bash feels very much like Cygwin, so it is not as lightweight. You can read more about Git Bash and how it differs from Cygwin here: A Windows console that does not suck.

To use Git’s Windows ports of the Unix command-line tools from your flash drive (as described for UnxUtils above, without permanently changing the PATH variable), just copy Git’s bin directory and a batch file like the following to the drive:

@set PATH=%~dp0\bin;%PATH%

Executing this file will start a console window with the proper PATH set, so you can start finding and grepping right away.

Fortunately, the resource error described above doesn’t occur when using the tools provided by Git.
Case closed! 😀