From e8121c4afb3680f532b450872b5a3ffcb3766a97 Mon Sep 17 00:00:00 2001
From: Kaleb S KEITHLEY
Date: Mon, 14 Dec 2015 09:24:57 -0500
Subject: common-ha: reliable grace using pacemaker notify actions

Using *-dead_ip-1 resources to track on which nodes the ganesha.nfsd
had died was found to be unreliable.

Running `pcs status` in the ganesha_grace monitor action was seen to
time out during failover; the HA devs opined that it is generally not
a good idea to run `pcs status` in a monitor action in any event. They
suggested using the notify feature, where the resources on all the
nodes are notified when a clone resource agent dies.

This change adds a notify action to the ganesha_grace RA.

The ganesha_mon RA monitors its ganesha.nfsd daemon. While the
daemon is running, it creates two attributes: ganesha-active and
grace-active. When the daemon stops for any reason, the attributes
are deleted. Deleting the ganesha-active attribute triggers the
failover of the virtual IP (the IPaddr RA) to another node where
ganesha.nfsd is still running.

The ganesha_grace RA monitors the grace-active attribute. When
the grace-active attribute is deleted, the ganesha_grace RA stops,
and will not restart. This causes pacemaker to invoke the notify
action in the ganesha_grace RAs on the other nodes in the cluster,
which send a DBUS message to their ganesha.nfsd.

(N.B. grace-active is a bit of a misnomer. While the grace-active
attribute exists, everything is normal and healthy. Deleting the
attribute triggers putting the surviving ganesha.nfsds into GRACE.)

To ensure that the remaining/surviving ganesha.nfsds are put into
NFS-GRACE before the IPaddr (virtual IP) fails over, there is a
short delay (sleep) between deleting the grace-active attribute
and the ganesha-active attribute. To summarize:

1. on node 2 ganesha_mon:monitor notices that ganesha.nfsd has died

2. on node 2 ganesha_mon:monitor deletes its grace-active attribute

3. on node 2 ganesha_grace:monitor notices that grace-active is gone
and returns OCF_ERR_GENERIC, a.k.a. new error. When pacemaker tries
to (re)start ganesha_grace, its start action will return
OCF_NOT_RUNNING, a.k.a. known error, don't attempt further restarts.

4. on nodes 1, 3, etc., ganesha_grace:notify receives a post-stop
notification indicating that node 2 is gone, and sends a DBUS message
to its ganesha.nfsd putting it into NFS-GRACE.

5. on node 2 ganesha_mon:monitor waits a short period, then deletes
its ganesha-active attribute. This triggers the IPaddr (virt IP)
failover according to constraint location rules.

ganesha_nfsd is modified to run for the duration: its start action is
invoked to set up the /var/lib/nfs symlink, and its stop action is
invoked to restore it. ganesha-ha.sh is modified accordingly to create
it as a clone resource.

Change-Id: Iad60b0c5222bbd55ef95c8b8f955e791caa3ffd0
BUG: 1290865
Signed-off-by: Kaleb S KEITHLEY
Reviewed-on: http://review.gluster.org/12964
Smoke: Gluster Build System
NetBSD-regression: NetBSD Build System
CentOS-regression: Gluster Build System
---
 doc/features/ganesha-ha.md | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 doc/features/ganesha-ha.md

diff --git a/doc/features/ganesha-ha.md b/doc/features/ganesha-ha.md
new file mode 100644
index 00000000000..4b226a22ccf
--- /dev/null
+++ b/doc/features/ganesha-ha.md
@@ -0,0 +1,43 @@
+# Overview of Ganesha HA Resource Agents in GlusterFS 3.7
+
+The ganesha_mon RA monitors its ganesha.nfsd daemon. While the
+daemon is running, it creates two attributes: ganesha-active and
+grace-active. When the daemon stops for any reason, the attributes
+are deleted. Deleting the ganesha-active attribute triggers the
+failover of the virtual IP (the IPaddr RA) to another node,
+according to constraint location rules, where ganesha.nfsd is
+still running.
+
+The ganesha_grace RA monitors the grace-active attribute. When
+the grace-active attribute is deleted, the ganesha_grace RA stops,
+and will not restart. This triggers pacemaker to invoke the notify
+action in the ganesha_grace RAs on the other nodes in the cluster,
+which send a DBUS message to their respective ganesha.nfsd.
+
+(N.B. grace-active is a bit of a misnomer. While the grace-active
+attribute exists, everything is normal and healthy. Deleting the
+attribute triggers putting the surviving ganesha.nfsds into GRACE.)
+
+To ensure that the remaining/surviving ganesha.nfsds are put into
+NFS-GRACE before the IPaddr (virtual IP) fails over, there is a
+short delay (sleep) between deleting the grace-active attribute
+and the ganesha-active attribute. To summarize, e.g. in a four
+node cluster:
+
+1. on node 2 ganesha_mon::monitor notices that ganesha.nfsd has died
+
+2. on node 2 ganesha_mon::monitor deletes its grace-active attribute
+
+3. on node 2 ganesha_grace::monitor notices that grace-active is gone
+and returns OCF_ERR_GENERIC, a.k.a. new error. When pacemaker tries
+to (re)start ganesha_grace, its start action will return
+OCF_NOT_RUNNING, a.k.a. known error, don't attempt further restarts.
+
+4. on nodes 1, 3, and 4, ganesha_grace::notify receives a post-stop
+notification indicating that node 2 is gone, and sends a DBUS message
+to its ganesha.nfsd, putting it into NFS-GRACE.
+
+5. on node 2 ganesha_mon::monitor waits a short period, then deletes
+its ganesha-active attribute. This triggers the IPaddr (virt IP)
+failover according to constraint location rules.
+
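The five-step sequence in the patch can be illustrated with a toy model. This is a sketch for reasoning about the ordering only, not the resource agents themselves (which are shell scripts driven by pacemaker). The `Node` and `Cluster` classes, their method names, and the event strings are hypothetical. The sleep between the two attribute deletions is represented by a comment, and the D-Bus call is replaced by setting a flag. The only values taken from the OCF standard are the exit codes `OCF_ERR_GENERIC` (1) and `OCF_NOT_RUNNING` (7).

```python
from dataclasses import dataclass, field

# Standard OCF exit codes referenced in the patch description.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1   # "new error": pacemaker attempts recovery
OCF_NOT_RUNNING = 7   # "known error": resource is cleanly not running


@dataclass
class Node:
    """Hypothetical stand-in for one cluster node."""
    name: str
    nfsd_alive: bool = True
    in_grace: bool = False
    attrs: set = field(
        default_factory=lambda: {"ganesha-active", "grace-active"}
    )


class Cluster:
    """Hypothetical model of the ordering; pacemaker is not involved."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        self.events = []  # ordered log of what happened

    def grace_monitor(self, node):
        # ganesha_grace monitor: healthy only while grace-active exists.
        if "grace-active" in node.attrs:
            return OCF_SUCCESS
        return OCF_ERR_GENERIC

    def grace_start(self, node):
        # ganesha_grace start: refuses to come back without grace-active.
        if "grace-active" in node.attrs:
            return OCF_SUCCESS
        return OCF_NOT_RUNNING

    def notify_post_stop(self, dead):
        # Step 4: every surviving peer puts its ganesha.nfsd into grace.
        # The real agent sends a D-Bus message; here we only set a flag.
        for peer in self.nodes.values():
            if peer is not dead and peer.nfsd_alive:
                peer.in_grace = True
                self.events.append(f"{peer.name}: NFS-GRACE for {dead.name}")

    def fail_over_ip(self, dead):
        # Step 5 consequence: the virtual IP moves to a node that still
        # has ganesha-active (the real choice follows constraint rules).
        target = next(
            (n for n in self.nodes.values() if "ganesha-active" in n.attrs),
            None,
        )
        if target is not None:
            self.events.append(f"virtual IP of {dead.name} -> {target.name}")
        return target

    def mon_monitor(self, node):
        # Step 1: ganesha_mon monitor notices whether the daemon has died.
        if node.nfsd_alive:
            return
        node.attrs.discard("grace-active")                  # step 2
        if self.grace_monitor(node) == OCF_ERR_GENERIC:     # step 3
            if self.grace_start(node) == OCF_NOT_RUNNING:
                self.notify_post_stop(node)                 # step 4
        # The real agent waits a short period here before continuing.
        node.attrs.discard("ganesha-active")                # step 5
        self.fail_over_ip(node)


# Usage: a four-node cluster in which ganesha.nfsd dies on node2.
cluster = Cluster(["node1", "node2", "node3", "node4"])
cluster.nodes["node2"].nfsd_alive = False
cluster.mon_monitor(cluster.nodes["node2"])
for event in cluster.events:
    print(event)
```

The point of the model is the ordering in `events`: all grace entries are logged before the virtual IP entry, mirroring why grace-active is deleted first and ganesha-active only after the delay.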