Setting Up A Recovery Controller

This section only applies to SwiftStack Controller On-Premises installations and is not relevant for SwiftStack customers using SwiftStack Controller As-a-Service.

High-Level Overview

Setting up a recovery controller is similar to setting up a periodic backup process, except that you also have separate hardware onto which a backup can be restored if the primary controller dies. You will install the SwiftStack Controller On-Premises software on a second machine, just as you did for the primary controller. When you configure the recovery controller, you will use the same SwiftStack Controller On-Premises license as the primary controller, but you will select a checkbox indicating that this controller is a recovery (standby) controller. During setup, you will be prompted to load a URL on the primary controller; the recovery controller setup continues automatically once that URL has been loaded. The primary controller then copies its time-series data to the recovery controller. The primary controller should be configured to save backups to a Swift cluster that both controllers can reach. These backups are scheduled automatically via a cron job, but you can also trigger them on demand. If the primary controller fails, you can then promote the standby controller by restoring a backup onto it.

How It Works

The main backup file contains everything a standby controller needs to take over for a failed primary, except for time-series data.

Because time-series data is large, and every file changes constantly but only by a small amount, it is much more efficient to use rsync to keep the time-series data on the recovery controller in sync with the primary. This sync runs every 5 minutes by default.
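To see why a delta-based copy pays off when every file changes only slightly, here is a minimal sketch of the idea. It is a simplified, fixed-offset block comparison, not rsync's actual algorithm (rsync also uses a rolling checksum so it can match blocks that have shifted position), and the tiny block size is chosen purely for readability. In this sketch both versions live in one process; in a real sync the receiver would send its block hashes to the sender rather than its data.

```python
import hashlib

# Hypothetical block size, kept tiny so the example is easy to follow.
BLOCK_SIZE = 4


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks of `new` that differ from `old`."""
    old_digests = block_digests(old, block_size)
    delta = {}
    for index, digest in enumerate(block_digests(new, block_size)):
        if index >= len(old_digests) or old_digests[index] != digest:
            start = index * block_size
            delta[index] = new[start:start + block_size]
    return delta


def apply_delta(old: bytes, new_length: int, delta: dict[int, bytes],
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new file from the receiver's old copy plus the changed blocks."""
    # Truncate or pad the old copy to the new length, then patch changed blocks.
    result = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        result[start:start + len(block)] = block
    return bytes(result)
```

When one block of a 16-byte file changes, only that 4-byte block is sent; for large files that change only slightly, the saving is proportionally much larger.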

During the restore process on the recovery controller, the most recent main backup file is retrieved and restored, whether it comes from local disk (where it is copied automatically and periodically from the primary controller) or from Swift. Then the continuously synchronized time-series data is copied from a holding bin into its real location.
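The selection and ordering described above can be sketched as follows. The `BackupCandidate` type, the `"local"`/`"swift"` labels, and the step descriptions are hypothetical illustrations, not SwiftStack's internal code; the sketch only shows choosing the newer of the two available backups and restoring it before the time-series data is moved into place.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BackupCandidate:
    source: str        # hypothetical label, e.g. "local" or "swift"
    name: str          # hypothetical backup file or object name
    created: datetime  # when the backup was taken


def pick_backup(local: Optional[BackupCandidate],
                swift: Optional[BackupCandidate]) -> BackupCandidate:
    """Choose the most recent main backup among the copies that exist."""
    candidates = [c for c in (local, swift) if c is not None]
    if not candidates:
        raise LookupError("no main backup file available to restore")
    return max(candidates, key=lambda c: c.created)


def restore_plan(local: Optional[BackupCandidate],
                 swift: Optional[BackupCandidate]) -> list[str]:
    """Order the restore steps: main backup first, then time-series data."""
    chosen = pick_backup(local, swift)
    return [
        f"restore main backup {chosen.name} from {chosen.source}",
        "copy synchronized time-series data from the holding bin into place",
    ]
```

If only the local copy exists (for example, Swift backups were never configured), the local copy is used, which matches the fallback behavior described for backups without Swift.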

The main backup file is always saved to local disk and automatically copied to the recovery controller. However, we strongly recommend that you also configure the primary controller to save backups to a Swift cluster (a very safe place with a standard API for storing and retrieving data).
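Swift's standard object API stores an object with an HTTP `PUT` to `<storage URL>/<container>/<object>`, authenticated by an `X-Auth-Token` header, and retrieves it with a `GET` on the same path. The sketch below only builds those requests with Python's standard library. The storage URL, token, container, and object names are hypothetical placeholders, and it assumes the container already exists and that a token has already been obtained from the cluster's auth service.

```python
import urllib.parse
import urllib.request


def object_url(storage_url: str, container: str, object_name: str) -> str:
    """Join a Swift storage URL, container, and object name into an object URL."""
    # Object names may contain "/" (pseudo-directories), so leave it unescaped there.
    return "/".join([
        storage_url.rstrip("/"),
        urllib.parse.quote(container, safe=""),
        urllib.parse.quote(object_name, safe="/"),
    ])


def build_upload(storage_url: str, token: str, container: str,
                 object_name: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores `data` as an object."""
    return urllib.request.Request(
        object_url(storage_url, container, object_name),
        data=data,
        method="PUT",
        headers={"X-Auth-Token": token,
                 "Content-Type": "application/octet-stream"},
    )


def build_download(storage_url: str, token: str, container: str,
                   object_name: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that retrieves the object."""
    return urllib.request.Request(
        object_url(storage_url, container, object_name),
        method="GET",
        headers={"X-Auth-Token": token},
    )
```

Passing either request to `urllib.request.urlopen` would send it; keeping request construction separate makes the URL and headers easy to inspect first.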

If you do not have a recovery controller, you can still restore from a backup, but you will lose all historical time-series data.

Installation

Set Up DNS CNAMEs

Since the standby controller may become the primary controller at some point, you need to have a DNS CNAME which can point to either machine. For example, if the actual hostnames of your two SwiftStack Controller On-Premises machines are ssc1 and ssc2, then you might have a CNAME sscontroller which initially points to ssc1, but which can be changed to point to ssc2 when a failover occurs. (If you have a planned failover, you'll want to lower your DNS TTL in advance of the event.)
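A quick way to confirm where the CNAME currently points is to resolve it and compare the canonical name the resolver reports. This minimal sketch uses `socket.gethostbyname_ex`, whose first return value is the canonical host name. The `example.com` FQDNs are hypothetical, and the resolver is injectable so the check can be exercised without real DNS.

```python
import socket
from typing import Callable, List, Tuple

# gethostbyname_ex returns (canonical_name, alias_list, address_list).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def cname_target(alias: str, resolver: Resolver = socket.gethostbyname_ex) -> str:
    """Return the canonical host name that `alias` currently resolves to."""
    canonical, _aliases, _addresses = resolver(alias)
    return canonical


def _normalize(name: str) -> str:
    # DNS names are case-insensitive; ignore a trailing root dot.
    return name.rstrip(".").lower()


def points_to(alias: str, expected_host: str,
              resolver: Resolver = socket.gethostbyname_ex) -> bool:
    """True if `alias` currently resolves to `expected_host` (compare FQDNs)."""
    return _normalize(cname_target(alias, resolver)) == _normalize(expected_host)


# Usage (performs a real DNS lookup):
# points_to("sscontroller.example.com", "ssc1.example.com")
```

Resolvers may keep returning the old target until the cached record's TTL expires, which is why lowering the TTL before a planned failover shortens the switchover.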

Note

This CNAME is the hostname value that must be in your SwiftStack Controller On-Premises license. You will use the same license file for the primary and recovery controllers. When you set up the primary controller, its local hostname will be changed to match the CNAME in the license, but on the recovery controller, the server's hostname will be left unchanged.

Set Up Primary Controller

The first step in creating a recovery installation is to install and set up the primary controller. The post-install setup process must be completed on the primary controller prior to setting up the recovery controller.

Certificates

Since cluster administrators will access the controller by the CNAME (e.g. sscontroller rather than ssc1 or ssc2), you will need the SSL certificate on the primary controller to have a CN (Common Name) that matches the CNAME value.

Controller post-install setup creates a self-signed certificate for you, but you should replace it with one signed by a Certificate Authority that your recovery controller and all your nodes trust. You can upload your certificate in the SSL Certificate section of the https://platform.swiftstack.com/admin/network/ page.

After you upload your new certificate, it replaces the self-signed certificate and the web server is restarted. Existing backups do not yet include the new SSL certificate, so we recommend that you create a fresh backup immediately.

The recovery controller must trust the SSL certificate used by the primary controller.
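One way to check both requirements from the recovery controller (the certificate is trusted, and it matches the CNAME) is to open a TLS connection to the primary by its CNAME with hostname verification enabled. This is a sketch under assumptions: `sscontroller.example.com` and the CA bundle path are hypothetical, and passing `ca_file` makes the context trust only that bundle rather than the system defaults. Python matches the name against the certificate's subjectAltName entries and falls back to the CN only when the certificate has none. The port defaults to 443, the default HTTPS port for the primary controller.

```python
import socket
import ssl
from typing import Optional


def make_trust_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client context that verifies the certificate chain and the host name.

    If `ca_file` is given, only that CA bundle is trusted; otherwise the
    system's default CA certificates are used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # create_default_context already enables both checks; set them explicitly
    # so the intent is visible.
    context.verify_mode = ssl.CERT_REQUIRED
    context.check_hostname = True
    return context


def check_primary_certificate(cname: str, port: int = 443,
                              ca_file: Optional[str] = None,
                              timeout: float = 10.0) -> dict:
    """Connect to the primary controller by its CNAME and return its certificate.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted
    or does not match `cname`.
    """
    context = make_trust_context(ca_file)
    with socket.create_connection((cname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=cname) as tls:
            return tls.getpeercert()


# Usage (performs a real TLS handshake; names and paths are hypothetical):
# cert = check_primary_certificate("sscontroller.example.com",
#                                  ca_file="/etc/ssl/certs/internal-ca.pem")
# print(cert["subject"])
```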

Set Up Recovery Controller

Install the SwiftStack Controller On-Premises software on the recovery controller. During post-install setup, use the same license as you did for the primary controller, but select the checkbox labeled "This is a standby controller." You can also enter the HTTPS port of the primary controller if it is not the default value of 443.

During the setup process, you will be given a URL to visit to establish trust between the two servers and to allow the recovery controller's setup process to continue. You may also visit the primary controller's https://platform.swiftstack.com/recovery/standbys/ page to accept the recovery controller's application.

[Figure: accepting the standby controller on the primary controller]

Converting A Standalone Installation

If you already have an existing SwiftStack Controller On-Premises installation which you want to convert to a recovery installation (with a primary controller and at least one recovery standby controller), the process is a bit more complicated. You will want to convert your existing installation to use a DNS CNAME which can be flipped over to the standby when necessary. In order to make this possible, your certificates must be issued in the name of the CNAME rather than in the name of the primary controller's FQDN. (You may find it easier to rename the host machine, and turn the old name into the CNAME.)

Then, each of the nodes in your installation must be manually adjusted to communicate with the CNAME rather than with the primary controller's FQDN, so that they'll behave correctly if the recovery standby host is promoted to primary status. This is a complicated operation. Consult with SwiftStack Technical Support for assistance with this process.

Once these steps have been completed, you can perform the step above labeled "Set Up Recovery Controller".

Backup Settings

On the recovery settings page on the primary controller (at https://platform.swiftstack.com/recovery/backups/), there are several settings that can be modified:

  • Changing Backup period hours will change the frequency of local and Swift backups. This can be any value between 1 and 24.
  • Changing Metrics sync period changes how often metrics and backup files are synced to the standby controller. By default, an alert is triggered when twice the period set here has elapsed since the last successful metrics sync. If syncs regularly take longer than the default five minutes, you may wish to increase the sync period.
  • You should enable Save backups to Swift and enter the information for your Swift cluster. If no Swift backup is available, the standby controller can still restore from a backup file copied over from the primary controller along with the time-series metric data. However, backups stored in Swift offer superior durability and accessibility.
[Figure: backup settings]
  • Before you can save the settings, you must click the Verify Swift credentials button and receive a positive response.
[Figure: verifying Swift credentials]
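The constraints described in these settings can be expressed as a small sketch. The 1 to 24 hour range and the twice-the-period alert rule come from the settings above; the function names, the use of whole hours, and treating exactly twice the period as overdue are assumptions for illustration, not the controller's implementation.

```python
from datetime import datetime, timedelta

MIN_BACKUP_PERIOD_HOURS = 1
MAX_BACKUP_PERIOD_HOURS = 24
DEFAULT_METRICS_SYNC_PERIOD = timedelta(minutes=5)


def validate_backup_period(hours: int) -> int:
    """Accept a backup period only in the documented 1-24 hour range."""
    if not MIN_BACKUP_PERIOD_HOURS <= hours <= MAX_BACKUP_PERIOD_HOURS:
        raise ValueError(
            f"backup period must be between {MIN_BACKUP_PERIOD_HOURS} and "
            f"{MAX_BACKUP_PERIOD_HOURS} hours, got {hours}"
        )
    return hours


def metrics_sync_overdue(last_success: datetime, now: datetime,
                         period: timedelta = DEFAULT_METRICS_SYNC_PERIOD) -> bool:
    """True once twice the sync period has elapsed since the last good sync."""
    return now - last_success >= 2 * period
```

With the default five-minute period, the alert threshold is ten minutes; raising the period to fifteen minutes moves the threshold to thirty.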

Run Your First Backups

By default, your standby host is now configured to have the statistical data synced to it every five minutes, and your primary is now configured to run configuration backups once every six hours. However, you may want to run the first backups manually for two reasons. First, it gives you the opportunity to confirm that everything is working and that an initial backup exists. Second, the first stats sync will take significantly longer than subsequent ones, since initially all the data needs to be copied. (Subsequent stats syncs will only copy new or changed data.)
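The reason only the first stats sync is slow can be illustrated with a change-detection sketch in the spirit of rsync's default quick check, which treats a file as unchanged when its size and modification time match on both sides. The `FileInfo` type, the manifest dictionaries, and the file names are hypothetical. On the first run the destination is empty, so every file is selected; afterwards only new or changed files are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def files_to_copy(source: dict[str, FileInfo],
                  destination: dict[str, FileInfo]) -> list[str]:
    """Paths that are missing on the destination or differ in size or mtime."""
    return sorted(
        path for path, info in source.items()
        if destination.get(path) != info
    )
```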

Back Up Your Configs

To start a backup job manually, open the main recovery page, click the Backup tab on the left, and then click the button labelled "Queue backup job". A success message should appear shortly afterwards. Run time depends primarily on the size of your database (for example, the number of nodes, the number of controller users, and the amount of utilization data).

[Figure: queueing a backup job]

Back Up Your Cluster Stats

On the main recovery page, click the button labelled "Sync metrics data". A success message should appear shortly afterwards. Sync time depends primarily on the number of nodes and how long your cluster has existed. For a small, new cluster, the sync takes about 10 seconds; for a larger, older cluster, the initial run could take substantially longer, though subsequent runs will be faster.

[Figure: syncing metrics data]