Performing disaster recovery for distributed VDFS repositories

The VisualSVN Distributed File System (VDFS) follows the classic master/slave replication architecture, which ensures the following:

  • a master repository is always readable and writable, even if some of the corresponding slave repositories are unreachable,
  • a slave repository is writable if there is connectivity to the corresponding master repository,
  • replication is resilient to temporary connectivity issues,
  • read access to the slave repository does not require connectivity to the corresponding master repository.

However, if a disaster renders the master server and its repositories permanently inoperable, replication halts and the slave repositories become read-only. After such a hardware or software failure, you can perform the disaster recovery procedure described below.

The main steps required for the disaster recovery are the following:

  1. Promote the most up-to-date slave repository to act as a temporary master. This is a quick operation, and all the available slave repositories become writable after this step.
  2. Recover the lost repository from the temporary master. This operation can take significant time, depending on the size of the repository.
  3. Promote the recovered repository back to its master role. After this step, the replication system returns to its original state.

Prerequisites and assumptions

For explanatory reasons, we assume the following:

  • Three instances of VisualSVN Server are installed at geographically distributed sites:
    • The ‘Berlin-SVN’ server hosts a single master VDFS repository named ‘MyRepo’.
    • The ‘Rome-SVN’ and ‘NY-SVN’ servers each host a slave VDFS repository, also named ‘MyRepo’.
  • The ‘Berlin-SVN’ server is completely lost due to a disaster and the replacement server is named ‘Berlin-SVN2’.
  • All VisualSVN Server instances are in the same Active Directory domain named ‘example.com’.
  • End users on all three sites contact the VisualSVN Server using the ‘svn.example.com’ hostname and are routed to the closest VisualSVN Server instance using the DNS settings.
  • All VisualSVN Server instances are at the same server version of 3.4 or newer.
  • Windows Firewall and security settings on the Berlin-SVN2, Rome-SVN and NY-SVN servers allow remote VisualSVN Server administration using PowerShell and the VisualSVN Server Manager console. For more information, refer to the article KB99: Configuring remote administration with VisualSVN Server PowerShell.

We also assume that all PowerShell cmdlets required for disaster recovery are executed remotely and the target server for all cmdlets is identified by the standard CimSession parameter.
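Before starting, it can help to confirm that each server is reachable for remote administration. The sketch below is not part of the documented procedure. It assumes that the servers accept CIM connections under your current credentials, and it uses only the standard New-CimSession and Remove-CimSession cmdlets. The server names are those from the scenario above.

```powershell
# Assumed pre-flight check, not part of the official procedure:
# try to open a CIM session to each server that takes part in the recovery.
$servers = 'Berlin-SVN2', 'Rome-SVN', 'NY-SVN'

foreach ($server in $servers) {
    try {
        $session = New-CimSession -ComputerName $server -ErrorAction Stop
        Write-Output "$server is reachable for remote administration."
        Remove-CimSession -CimSession $session
    }
    catch {
        Write-Warning "$server is not reachable: $($_.Exception.Message)"
    }
}
```

A warning for any server usually points to the firewall or remote administration settings described in KB99.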

Promote the most up-to-date slave repository to act as a temporary master

Because the master repository on the Berlin-SVN server is completely lost, you should choose the most up-to-date slave repository and promote it to be a temporary master. You can determine which of the remaining slave repositories is the most up to date by comparing their youngest revision numbers using VisualSVN Server Manager or the Get-SvnRepository PowerShell cmdlet.

Let’s assume that the slave repository on the Rome-SVN server is the most up-to-date. This means that you can safely promote it to act as a temporary master. After this promotion, repositories on both Rome-SVN and NY-SVN become writable again.
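The comparison of youngest revisions can be sketched as follows. This is an illustration only. It assumes that Get-SvnRepository accepts the -Name parameter and that the object it returns has a Revisions property holding the youngest revision number. Check the actual property name with Get-Member on your server version before relying on it.

```powershell
# Assumption: the object returned by Get-SvnRepository exposes a 'Revisions'
# property with the youngest revision number. Verify this with Get-Member.
$servers = 'Rome-SVN', 'NY-SVN'

$report = foreach ($server in $servers) {
    $repo = Get-SvnRepository -Name MyRepo -CimSession $server
    [pscustomobject]@{
        Server    = $server
        Revisions = $repo.Revisions
    }
}

# The server listed first holds the most up-to-date copy.
$report | Sort-Object -Property Revisions -Descending
```

If both servers report the same number, either repository is an equally good candidate for promotion.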

The following steps are required to promote a repository on the Rome-SVN server to act as a temporary master:

  1. Make sure that the VDFS firewall rule is enabled on the Rome-SVN server to allow incoming connections to the VDFS service. Follow the steps specified in the article KB73: Enabling the inbound firewall rule for a master VDFS service.
  2. Make sure that the list of authorized replication partners on the Rome-SVN server includes the NY-SVN server. You may edit this list on the Replication tab in the VisualSVN Server Properties dialog.
  3. Suspend both remaining slave repositories to prevent accidental desynchronization until the procedure is completed and all the configuration checks are done:
    Suspend-SvnRepository -Name MyRepo -CimSession Rome-SVN
    Suspend-SvnRepository -Name MyRepo -CimSession NY-SVN
    This step provides additional safety and prevents your repositories from becoming out of sync due to accidental commits or configuration mistakes: both repositories remain read-only until the Resume-SvnRepository command is executed for them.
  4. Promote the slave VDFS repository located on the Rome-SVN server to act as a temporary master:
    Switch-SvnRepository -Name MyRepo -Role Master -CimSession Rome-SVN
  5. Allow the NY-SVN server to replicate the temporary master repository located on the Rome-SVN server:
    Set-SvnRepository MyRepo -ReplicatorsAuthenticatedByActiveDirectory NY-SVN$ -CimSession Rome-SVN
  6. Enable repository replication for the temporary master repository (replication is initially disabled in the temporary master repository, hence you need to enable it manually):
    Set-SvnRepository MyRepo -ReplicationEnabled $true -CimSession Rome-SVN
  7. Configure a slave VDFS repository located on the NY-SVN server to begin replication from the Rome-SVN server:
    Set-SvnRepository MyRepo -MasterServer Rome-SVN -CimSession NY-SVN
  8. Resume the repositories on the Rome-SVN and NY-SVN servers to make them writable:
    Resume-SvnRepository -Name MyRepo -CimSession Rome-SVN
    Resume-SvnRepository -Name MyRepo -CimSession NY-SVN
  9. Reconfigure DNS settings in Berlin to route all local users to the closest site.

On completion of the above steps, the replication system will be partially recovered and the repositories on the Rome-SVN and NY-SVN servers will be writable again. Users in Berlin will also be able to work with one of the servers at a remote site, although data transfer will be slower because of WAN latency.

Recover the lost repository from the temporary master

Because the original master repository is completely lost, you need to create a new slave VDFS repository on the replacement Berlin-SVN2 server and perform the initial replication from the temporary master. During the initial replication phase, end users in Berlin should still be redirected to one of the operable repositories (on the Rome-SVN or NY-SVN servers).

The following steps are required to create a new slave VDFS repository on the replacement Berlin-SVN2 server:

  1. Make sure that the list of authorized replication partners on the Rome-SVN server includes the Berlin-SVN2 server. You may edit this list on the Replication tab in the VisualSVN Server Properties dialog.
  2. Allow the Berlin-SVN2 server to replicate the temporary master repository located on the Rome-SVN server:
    Set-SvnRepository MyRepo -ReplicatorsAuthenticatedByActiveDirectory Berlin-SVN2$ -CimSession Rome-SVN
  3. Begin recovering the lost repository from the temporary master by creating a new slave VDFS repository on the Berlin-SVN2 server:
    New-SvnRepository MyRepo -Type VdfsSlave -MasterServer Rome-SVN -CimSession Berlin-SVN2

The new slave repository will immediately start replicating data from the Rome-SVN server. Depending on the size of the repository, the initial replication may take some time to complete. When the initial replication of the slave repository in Berlin is finished, you can alter your DNS settings and reroute users in Berlin to work with the local slave repository on ‘Berlin-SVN2’.
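One rough way to watch the progress of the initial replication is to compare the youngest revisions on the two servers. The loop below is a sketch under assumptions: the Revisions property name and the -Name parameter of Get-SvnRepository should be verified on your server version, and the 60-second polling interval is arbitrary. Matching revision numbers are only an indicator; the Sync-SvnRepository step in the next section remains the check that the repositories are in sync.

```powershell
# Assumption: Get-SvnRepository returns an object with a 'Revisions' property
# (youngest revision). The 60-second interval is an arbitrary choice.
do {
    $master = Get-SvnRepository -Name MyRepo -CimSession Rome-SVN
    $slave  = Get-SvnRepository -Name MyRepo -CimSession Berlin-SVN2
    Write-Output "Rome-SVN: r$($master.Revisions)  Berlin-SVN2: r$($slave.Revisions)"

    # Stop polling once the new slave has reached the temporary master's revision.
    $caughtUp = $slave.Revisions -ge $master.Revisions
    if (-not $caughtUp) { Start-Sleep -Seconds 60 }
} until ($caughtUp)
```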

Promote the recovered repository back to its master role

Once the lost repository has been recovered from the temporary master, you need to promote the recovered repository on the Berlin-SVN2 server back to its master role and demote the temporary master on Rome-SVN to a slave. Following these steps fully restores the multisite replication system to the state it was in before the disaster.

  1. Make sure that the VDFS firewall rule is enabled on the Berlin-SVN2 server to allow incoming connections to the VDFS service. Follow the steps specified in the article KB73: Enabling the inbound firewall rule for a master VDFS service.
  2. Make sure that the list of authorized replication partners on the Berlin-SVN2 server includes the Rome-SVN and NY-SVN servers. You may edit this list on the Replication tab in the VisualSVN Server Properties dialog.
  3. Suspend the temporary master repository on the Rome-SVN server:
    Suspend-SvnRepository -Name MyRepo -CimSession Rome-SVN
    This step is required to make sure that your repositories will be in sync after switching the temporary and recovered master repositories.
  4. Make sure that the slave repository located on Berlin-SVN2 is up to date and the initial replication process has actually completed:
    Sync-SvnRepository MyRepo -PassThru -CimSession Berlin-SVN2
    This step ensures that the slave repository on Berlin-SVN2 is in sync with the temporary master on the Rome-SVN server (which is now suspended).
  5. Suspend both available slave repositories to prevent accidental commits until the procedure is completed and all the configuration checks are done:
    Suspend-SvnRepository -Name MyRepo -CimSession Berlin-SVN2
    Suspend-SvnRepository -Name MyRepo -CimSession NY-SVN
  6. Promote the recovered repository on Berlin-SVN2 to its original master role:
    Switch-SvnRepository MyRepo -Role Master -CimSession Berlin-SVN2
  7. Grant the Rome-SVN and NY-SVN servers permission to replicate the recovered master repository located on the Berlin-SVN2 server:
    Set-SvnRepository MyRepo -ReplicatorsAuthenticatedByActiveDirectory Rome-SVN$, NY-SVN$ -CimSession Berlin-SVN2
  8. Enable repository replication in the recovered master repository on the Berlin-SVN2 server (replication is initially disabled after switching to the master role, hence you need to enable it manually):
    Set-SvnRepository MyRepo -ReplicationEnabled $true -CimSession Berlin-SVN2
  9. Demote the temporary master repository on the Rome-SVN server back to its original slave role:
    Switch-SvnRepository MyRepo -Role Slave -MasterServer Berlin-SVN2 -CimSession Rome-SVN
  10. Resume replication for all three repositories:
    Resume-SvnRepository MyRepo -CimSession Berlin-SVN2
    Resume-SvnRepository MyRepo -CimSession Rome-SVN
    Resume-SvnRepository MyRepo -CimSession NY-SVN

Your replication system will be fully recovered after this step and all three repositories will be both readable and writable.

Please make sure that your DNS settings are reverted to their original state and that users located in Berlin are working with the local repository (which has now been restored to its original master role).
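A quick way to check the DNS routing is to resolve the shared hostname from a client machine in Berlin and compare the result with the address of the local server. The sketch below uses the standard Resolve-DnsName cmdlet. The fully qualified name Berlin-SVN2.example.com is an assumption derived from the domain name used in this article, so substitute the real name of your server.

```powershell
# Run on a client machine in Berlin. Resolve-DnsName is part of the DnsClient
# module that ships with Windows 8 / Windows Server 2012 and later.
Resolve-DnsName -Name svn.example.com -Type A |
    Select-Object -Property Name, IPAddress

# Address of the local server, for comparison.
# 'Berlin-SVN2.example.com' is an assumed FQDN; replace it with the real one.
Resolve-DnsName -Name Berlin-SVN2.example.com -Type A |
    Select-Object -Property Name, IPAddress
```

If the two addresses differ, users in Berlin are still being routed to a remote site, possibly because of a cached DNS record.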

See also

KB84: Comparing VDFS with master-master replication solutions
KB68: Getting started with VDFS replication in an Active Directory environment