rostrvm may be installed in a resilient configuration such that the loss of one server due to hardware failure will not cause clients to suffer more than a few minutes interruption in service.

Although configuration is automatic, it is strongly recommended that the whole section be read and understood before undertaking any of the following tasks.

The database is fully mirrored in such a system and procedures are detailed below such that changes made when the Master server is down will be kept when the Master server is recovered.

Basically replication works as follows:

The slave server connects to a master server using a dedicated user account named replica8;
The master server puts all queries that change data in the database into a transaction log;
The slave server retrieves these queries from a position in that log and applies them to its copy of the database.

 

For more detail you may wish to read the MySQL Replication Documentation - https://dev.mysql.com/doc/refman/5.7/en/replication-gtids.html.  Specifically rostrvm uses GTID replication in order to simplify configuration.

Note that in a resilient system, the fully qualified domain name (FQDN) by which the client applications connect to rostrvm specifies two host names so that the client can find a backup system should the Master system fail.

In addition to resilience, the system is configured to make use of the dual hosts for load-sharing. This allows key components of a rostrvm system to be spread across the different hosts thus maximizing system resources.

The following sections describe:

how to install / upgrade to a resilient system;
how to deal with a slave host failure;
how to recover database mirroring following a failure;
how to deal with a master host failure.

 


Frequently Asked Questions

How long does a mysqldump take?

A number of factors come in to play so it’s not easy to determine exactly how long a database dump (snapshot) takes.  However, if a weekly backup exists in the rostrvm\backups directory it is possible to work out how long it took for that file to be created based on when the backup was scheduled to start and when the file was last modified/the backup completed.

How long does it take to copy a mysqldump snapshot from one server to another?

This is dependent on the IT infrastructure in place and the size of the snapshot.

How long does it take to restore a database snapshot?

Approximately 2 to 2.5 times as long as it takes to complete a database backup or dump; again this is by and large dictated by the IT infrastructure in place and the size of the snapshot being restored.  So if the initial backup took an hour (60 minutes) one could reasonably expect the import/restore to take 2.5 hours (150 minutes).


Installing/Upgrading to a resilient system

To achieve a rostrvm resilient architecture it is imperative that the server hardware operating system, configuration and install paths etc. are identical for both the master and slave servers.  Client applications should not be running during the installation/upgrade procedure.

DNS Configuration

To install a dual host resilient domain you first need to configure a number of A (or AAAA) DNS records for both servers using the same FQDN (i.e. two addresses for the domain name).

So for example, suppose you have a single rostrvm server named bart with IP address 192.168.120.10 and you are the owner of the domain rostrvm.com.  For a customer simpsons you would configure a DNS A record as follows:

    simpsons.rostrvm.com    192.168.120.10

 

The rostrvm domain string would then be simpsons.rostrvm.com.

Now suppose you now want to make this a resilient system by adding a server named lisa on IP address 192.168.120.11.  You would now configure the following DNS A records (with the first one defined as above):

    simpsons.rostrvm.com         192.168.120.10

    simpsons.rostrvm.com         192.168.120.11

    bart.simpsons.rostrvm.com    192.168.120.10

    lisa.simpsons.rostrvm.com    192.168.120.11

 

The rostrvm domain string remains simpsons.rostrvm.com and rostrvm and its clients automatically pick up the fact that there are now two hosts in the rostrvm system.  Note that bart.simpsons.rostrvm.com and lisa.simpsons.rostrvm.com are used by rostrvm so it can direct traffic to specific servers in the system (e.g. the one running the simulator) but these names should not be used by the clients.

If you wish to add a new customer to those servers, for example duff, then the following DNS A records could be added:

    duff.rostrvm.com         192.168.120.10

    duff.rostrvm.com         192.168.120.11

    bart.duff.rostrvm.com    192.168.120.10

    lisa.duff.rostrvm.com    192.168.120.11

 

This new rostrvm domain would then be identified as duff.rostrvm.com.  Note that the bart and lisa records must be duplicated for each sub-domain - it is not sufficient just to have a record for bart.rostrvm.com or lisa.rostrvm.com.

Server Installation

The following operations then need to be performed:

Install/upgrade the master rostrvm system following the instructions from the rostrvm installer, leaving the master field name blank;
Install the slave rostrvm system following the instructions from the rostrvm installer, identifying the master server during the installation using just the host name.  You MUST use the same root database password on the slave as the master as it is this that allows the two servers to communicate.

 

If a slave rostrvm system is installed within 14 days of the master installation then the mirroring will start automatically.  The system will also automatically load share any suitable server components between the master and slave.

If the slave is installed more than 14 days after the master server you will need to synchronise the databases using the procedure described in the synchronise database mirroring section below.  The replication logs are purged every 14 days hence the time restriction – this could be extended by editing the my.ini file in \rostrvm\mysql8 and restarting MySQL.  Beware that extending the time will increase the disk space used by the system.


Dealing with a Slave Host Failure

In the event of a Slave server failure (assuming load sharing configuration), rostrvm will continue to operate on the Master host and clients will automatically switch their connections to the Master host with minimal disruption.

When the Slave server is recovered it may be necessary to re-run the procedure described in the synchronise database mirroring section below if the slave has been offline for too long or if database corruption caused by the failure is too bad.


Synchronise Slave Database Mirroring

The system is able to automatically re-synchronise a slave database if it is disconnected from the master server for no more than 14 days.  The instructions herein however, detail the recovery of MySQL replication (aka rostrvm mirroring) between a Master rostrvm Server and a backup Slave Server when replication is not working.  They focus primarily on the scenario whereby replication between the Master and Slave Servers has become "out-of-sync" i.e. the databases on the respective servers are no longer identical, perhaps after a long unplanned server shutdown.

It is strongly recommended that the whole section be read and understood before undertaking any of the following tasks.

In order to perform the steps described in this section you must be familiar with the command line and the MySQL tools.  These tools are found in the \rostrvm\mysql8 directory to which you must navigate before using the tools.

You also need to use the root user of the database with the password that was supplied in the rostrvm installation to gain access to the database.

Where the following sections provide SQL for entry, you must initially run the following command at the Windows command line to allow SQL entry:

    mysql -u root -ppassword --port=3308

 

When prompted enter the root password.

Note: the --port option is a change from previous versions of rostrvm and ensures you connect to the correct database.

Status of MySQL replication

rostrvm will raise an alarm if replication is not working; however, it is recommended that you check that replication really has stopped before performing any reparative steps.

On the slave server use the following SQL command to check replication:

    show slave status\G

 

If replication is working you should see the word ‘Yes’ against both the following fields in the command output:

    Slave_IO_Running   Yes

    Slave_SQL_Running  Yes

 

If Slave_IO_Running is No, there is a communication error between the slave and the master and it should be possible to restart replication using the following command:

  start slave;

If Slave_SQL_Running returns a value of No, or an error message is displayed, then replication has failed.  In some circumstances, depending on the error message, it may be possible to restore synchronisation, otherwise follow the recovery steps detailed below.

Re-synchronisation Process

WARNING: The following steps may take a long time to run depending upon the size of the database in question.

Stop rostrvm and replication on the slave

Stop rostrvm.
Use Task Manager to check that there are no rostrvm processes left in memory.
Disable the rostrvm Windows service.
Stop the MySQL8 service.
Start the MySQL8 service.
Issue the following command to stop the replication slave:
 
   stop slave;

 

Get a dump of the current database from the master

Stop rostrvm.
Use Task Manager to check that there are no rostrvm processes left in memory.
Disable the rostrvm Windows service.
Stop the MySQL8 service.
Start the MySQL8 service.
Run the following command:
 
   mysqldump -u root -ppassword --port=3308 --master-data=2 --all-databases –-set-gtid-purged=AUTO > dbdump.db
 
password is the MySQL root password that was used when setting up rostrvm on the Master server.

 

Note: A file called dbdump.db will be created in \rostrvm\mysql8.  This file is required for loading a database snapshot into the Slave Server (see next steps).

Slave Server replication recovery

Copy the dbdump.db file created earlier on the Master server to the Slave server to the \rostrvm\mysql8 directory.
Reset the master as follows:
 

       mysql -u root –ppassword --port=3308
       reset master

This step should only be carried out if the "Recover the Old Master" method has been tried previously and it did not work.  If the slave server failed and you are just recovering the slave then this step should be skipped.
 
Issue the following command with master_host replaced by the name of the new master server and replica8_password replaced by the replica8 user password, which may be found in the C:\WINDOWS\MY.INI file in the clientreplica section:
 
   change master to

         MASTER_HOST=’master_host’,

         MASTER_PORT=3308,

         MASTER_USER=’replica8’,

         MASTER_PASSWORD=’replica8_password’,

         MASTER_AUTO_POSITION=1;

 

Run the following commands from the \rostrvm\mysql8 directory:
 
   mysql -u root –ppassword --port=3308 < dbdump.db
 
Log in to MySQL via a Command Prompt:
 
   mysql -u root –ppassword --port=3308
 
Continuing the MySQL command line session put MySQL into slave mode:
 
   start slave;
 
Check the mysql slave is running use the following command:
 
   show slave status\G
 
You should see Yes and Yes against the Slave_IO_Running and Slave_SQL_Running fields.  If either of the above values show No refer to rostrvm support.
On the Master Server start the rostrvm Windows service.
On the Slave Server start the rostrvm Windows service.
Check that the replication alarms that were showing in rostrvm have been cleared.

 


Dealing with Master host failure

In the event of a Master server failure (assuming load sharing configuration), rostrvm will continue to operate on the Slave host and clients will automatically switch their connections to the Slave host with minimal disruption.  This may take 2 to 3 minutes to complete.

Following a switch to the Slave host, if the Master is still running, processes will be re-distributed across both hosts but MySQL will only be running on the Slave and will not be mirrored.  To fully recover the old master and to reset mirroring follow the steps in Re-synchronisation Process above.