One of the best features in SQL Server 2016 is the distributed availability group (AG).
This new feature gives us the option to scale the synchronization across forwarder nodes instead of on the primary replica only (see Figure 1 & 2 for distributed versus non-distributed AGs).
As you can imagine, the introduction of the forwarder node can not only reduce congestion on the primary node but offers a clear savings on network bandwidth. This is especially important over wide area networks, where bandwidth cost is at a premium.
Another important feature of the distributed availability group is the decoupling Windows Server Failover Cluster (WSFC). In previous versions, we had to set up a stretched WSFC to span all the nodes located in both production and DR datacenters to leverage AGs. In SQL Server 2016 and above, as Figure 2 shows, we now have a choice to build separate WSFCs in any network and then configure distributed availability groups between the SQL engines. This has the effect of simplifying deployments, administration, and build times for the DBAs.
Here is a high-level overview of the implementation:
- Create availability groups along with listener name staged in AD
- Create the endpoint on all replicas and grant service account connect permission to endpoint
- Create distributed availability group on the primary replica of primary availability group
- Join the secondary availability group to the distributed availability group on the primary replica of secondary availability group (so-called forwarder)
- Add databases to the primary availability group and join the databases on the secondary replicas in the same availability group
- Join the databases on both the forwarder and the secondary replicas on the secondary availability group
More detail on AG configuration can be found here.
Considerations:
- Firewall needs to be opened for mirroring endpoint across all replicas
- For better performance and separation, we can even configure dedicated network for data synchronization traffic
- This is not specific to distributed availability group, but it’s not recommended to add large number of databases in the group, say it above 100 databases.
For monitoring, here is a T-SQL script we use to monitor the latency and throughput of distributed availability group (Run on primary replica of primary availability group):
Distributed availability groups are an excellent option to leverage when we want to do database upgrades, refresh hardware, or add disaster recovery environments; either on-premise or in a cloud infrastructure.