| Commit message (Collapse) | Author | Age | Files | Lines |
| |
The ganesha_grace resource agent can start before the ganesha_mon
resource agent, with the result that the crm_attribute that
ganesha_grace expects to find has not been created yet.
This has never been seen with four nodes (or so rarely that it was
never observed during development), but with just two nodes it is
very repeatable.
Note that when long (FQDN) names are used it is not unexpected to
see Failed Actions in the output of `pcs status`, e.g.:
* nfs-grace_monitor_5000 on node1.fully.qualified.domain.name.com
'unknown error' (1): call=20, status=complete, exitreason='none',
last-rc-change='Wed Jun 1 12:32:32 2016', queued=0ms, exec=0ms
* nfs-grace_monitor_5000 on node2.fully.qualified.domain.name.com
'unknown error' (1): call=18, status=complete, exitreason='none',
last-rc-change='Wed Jun 1 12:32:42 2016', queued=0ms, exec=0ms
and as long as all the ganesha_grace_clone and cluster_ip-1
resource agents are in Started state then this is okay.
backport master:
> http://review.gluster.org/14607
> BUG: 1341768
release-3.8
> http://review.gluster.org/14609
> BUG: 1341770
Change-Id: I726c9946ceb1ca92872b321612eb0f4c3cc039d8
BUG: 1341772
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/14610
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
| |
A little-known, rarely used feature of pacemaker called
"notification" is used to follow the status of the ganesha.nfsds
in the cluster. This is done with location constraints and other
Black Magick.
When an nfsd dies, the ganesha-active attribute is cleared, the
associated floating IP (VIP) fails over to another node, and the
ganesha_grace notify method is invoked with post-stop on all the
nodes where the ganesha.nfsd is still running. The notify methods
send dbus msgs to put their nfsds into NFS-GRACE, and the nfsds
perform their grace processing, e.g. taking over locks from the
failed nfsd.
N.B. Fail-back was originally not planned to be a feature for
glusterfs-3.7, but we sorta got it for free.
For fail-back, the opposite occurs. The ganesha-active attribute
is recreated, the floating IP fails back, and the notify method is
invoked with pre-start on all the nodes where the surviving
ganesha.nfsds continue to run. The notify methods send dbus msgs
again to put their nfsds into NFS-GRACE, and the nfsds clean
up their locks.
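
The fail-over and fail-back handling described above can be sketched as
a notify action that reacts to two notification types. This is only a
minimal sketch, not the ganesha_grace agent itself: the function names
grace_wanted and ganesha_grace_notify and the GRACE_CMD variable are
hypothetical, and the dbus call is replaced by a stand-in because the
exact message is not given here. The OCF_RESKEY_CRM_meta_notify_type
and OCF_RESKEY_CRM_meta_notify_operation variables are the ones
pacemaker exports to a notify action.

```shell
#!/bin/sh
# Sketch of a notify handler for a cloned OCF resource agent.

# Hypothetical helper: return 0 when this notification should put the
# local ganesha.nfsd into NFS-GRACE.
grace_wanted()
{
    mode="${OCF_RESKEY_CRM_meta_notify_type:-}-${OCF_RESKEY_CRM_meta_notify_operation:-}"
    case "${mode}" in
    post-stop|pre-start)
        # post-stop: a peer nfsd died (fail-over)
        # pre-start: a peer nfsd is coming back (fail-back)
        return 0
        ;;
    *)
        return 1
        ;;
    esac
}

# Hypothetical notify action. GRACE_CMD is a stand-in for whatever
# sends the dbus message; it defaults to echo so the sketch can be
# tried on a machine without ganesha.nfsd.
ganesha_grace_notify()
{
    if grace_wanted; then
        ${GRACE_CMD:-echo} "enter NFS-GRACE"
    fi
    # A notify action should not fail on notifications it ignores.
    return 0
}
```

All other notifications (e.g. post-start) fall through and are ignored.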
backport mainline
> http://review.gluster.org/14506
> BUG: 1338967
release-3.8
> http://review.gluster.org/14507
> BUG: 1338968
Change-Id: I3fc64afa20ae3a928143d69aa533a8df68dd680e
BUG: 1338969
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/14508
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: soumya k <skoduri@redhat.com>
| |
When the cluster is configured with long (FQDN) cluster member names,
the log is flooded with "Could not map name=$shortname to a UUID"
notices, and setting/getting the attribute fails.
Change-Id: I954d8cef7115659cc9c8b23dae75a5a247dc5db7
BUG: 1337653
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/14437
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
| |
messages are seen on RHEL6.x and RHEL7.1 and earlier versions of
pacemaker. (And RHEL7.2 with RHEL7.1 pacemaker packages.)
It's not possible to query attrd attributes in the older version,
only set/update/clear them. The messages come from invalid attempts
to query the attributes.
However it is possible to query crm attributes. The fix here is to
create a "shadow" crm attribute for the attrd attribute. Changes are
made to both, queries are made on the crm attribute.
(Resource Agents "follow" the attrd attribute using constraint locations,
so we must keep the attrd attribute.)
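
The "shadow" attribute idea can be sketched as a pair of helpers that
write to both stores and read only from the crm attribute. This is an
illustration under assumptions, not the code in the change: the helper
names (run, set_shadowed_attr, clear_shadowed_attr,
query_shadowed_attr) and the DRY_RUN switch are hypothetical. Only the
long options of attrd_updater and crm_attribute are used.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run()
{
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set the attrd attribute and its crm "shadow".  Args: name value node
set_shadowed_attr()
{
    run attrd_updater --name "$1" --update "$2"
    run crm_attribute --node "$3" --name "$1" --update "$2"
}

# Delete both copies.  Args: name node
clear_shadowed_attr()
{
    run attrd_updater --name "$1" --delete
    run crm_attribute --node "$2" --name "$1" --delete
}

# Query only the crm attribute, since older attrd cannot be queried.
# Args: name node
query_shadowed_attr()
{
    run crm_attribute --node "$2" --name "$1" --query
}
```

With DRY_RUN=1 the helpers print the commands instead of running them,
e.g. `DRY_RUN=1 set_shadowed_attr grace-active 1 node1`.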
Backport of
> Change-Id: I84ac1a80673e528d98b67b7d5062e21dcf744d4a
> BUG: 1324509
> http://review.gluster.org/#/c/13919/
Change-Id: I7301c48849496be026ef598c588e78c68f273a8a
BUG: 1324510
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/13920
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: soumya k <skoduri@redhat.com>
| |
Using *-dead_ip-1 resources to track on which nodes the ganesha.nfsd
had died was found to be unreliable.
Running `pcs status` in the ganesha_grace monitor action was seen to
time out during failover; the HA devs opined that it was, generally,
not a good idea to run `pcs status` in a monitor action in any event.
They suggested using the notify feature, where the resources on all
the nodes are notified when a clone resource agent dies.
This change adds a notify action to the ganesha_grace RA. The ganesha_mon
RA monitors its ganesha.nfsd daemon. While the daemon is running, it
creates two attributes: ganesha-active and grace-active. When the daemon
stops for any reason, the attributes are deleted. Deleting the
ganesha-active attribute triggers the failover of the virtual IP (the
IPaddr RA) to another node where ganesha.nfsd is still running. The
ganesha_grace RA monitors the grace-active attribute. When the
grace-active attribute is deleted, the ganesha_grace RA stops, and will
not restart. This triggers pacemaker to trigger the notify action in
the ganesha_grace RAs on the other nodes in the cluster; which send a
DBUS message to their ganesha.nfsd.
(N.B. grace-active is a bit of a misnomer. While the grace-active
attribute exists, everything is normal and healthy. Deleting the
attribute triggers putting the surviving ganesha.nfsds into GRACE.)
To ensure that the remaining/surviving ganesha.nfsds are put into
NFS-GRACE before the IPaddr (virtual IP) fails over there is a short
delay (sleep) between deleting the grace-active attribute and the
ganesha-active attribute. To summarize:
1. on node 2 ganesha_mon:monitor notices that ganesha.nfsd has died
2. on node 2 ganesha_mon:monitor deletes its grace-active attribute
3. on node 2 ganesha_grace:monitor notices that grace-active is gone
and returns OCF_ERR_GENERIC, a.k.a. new error. When pacemaker
tries to (re)start ganesha_grace, its start action will return
OCF_NOT_RUNNING, a.k.a. known error, don't attempt further
restarts.
4. on nodes 1, 3, etc., ganesha_grace:notify receives a post-stop
notification indicating that node 2 is gone, and sends a DBUS
message to its ganesha.nfsd putting it into NFS-GRACE.
5. on node 2 ganesha_mon:monitor waits a short period, then deletes
its ganesha-active attribute. This triggers the IPaddr (virt IP)
failover according to constraint location rules.
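
The ordering in steps 2 and 5 (on the node where ganesha.nfsd died) can
be sketched as follows. This is a sketch under assumptions rather than
the ganesha_mon code: on_nfsd_death, delete_attr and GRACE_DELAY are
hypothetical names, the 5 second default is an assumed value for the
"short period", and delete_attr only prints what it would delete so
the ordering can be tried without a cluster.

```shell
#!/bin/sh
# Stand-in for the real attribute deletion; a real agent would call
# the cluster tools here instead of printing.
delete_attr()
{
    echo "delete $1"
}

# Hypothetical handler, called from the monitor action once it has
# noticed that ganesha.nfsd is gone (step 1).
on_nfsd_death()
{
    # Step 2: dropping grace-active leads to the surviving nfsds being
    # put into NFS-GRACE via the notify action on the other nodes.
    delete_attr grace-active

    # Give the other nodes time to enter NFS-GRACE before the virtual
    # IP moves. The length of the delay is an assumption.
    sleep "${GRACE_DELAY:-5}"

    # Step 5: dropping ganesha-active triggers the IPaddr failover.
    delete_attr ganesha-active
}
```

The point of the sleep is only ordering: grace-active must be gone, and
the notifications delivered, before ganesha-active is removed.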
ganesha_nfsd is modified to run for the duration: its start action is
invoked to set up the /var/lib/nfs symlink, and its stop action is invoked
to restore it. ganesha-ha.sh is modified accordingly to create it as a
clone resource.
Change-Id: Iad60b0c5222bbd55ef95c8b8f955e791caa3ffd0
BUG: 1290865
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12964
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
| |
* use --name on RHEL7 (later versions of pcs drop --name); we guessed
wrong and did not get the version that dropped the --name option
* more robust parsing of config file name/value params with double
quotes in the value, now that the config file is no longer sourced
* pid file fix. RHEL6 init.d adds -p /var/run/ganesha.nfsd.pid to
cmdline options. RHEL7 systemd does not, so defaults to
/var/run/ganesha.pid.
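
Reading a name/value parameter without sourcing the config file can be
sketched like this. It is one possible approach, not necessarily what
the change does: get_conf_value is a hypothetical helper, and the
parameter names in the usage note are only examples.

```shell
#!/bin/sh
# Hypothetical helper: print the value of NAME from a file containing
# NAME=value or NAME="value" lines, without sourcing the file.
# Args: file name
get_conf_value()
{
    # 1. keep only lines that assign NAME and drop the 'NAME=' part
    # 2. if NAME is assigned more than once, the last one wins
    # 3. trim trailing blanks and one pair of surrounding double quotes
    sed -n -e "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" |
        tail -n 1 |
        sed -e 's/[[:space:]]*$//' -e 's/^"\(.*\)"$/\1/'
}
```

For a file containing `HA_CLUSTER_NODES="node1,node2"`, calling
`get_conf_value FILE HA_CLUSTER_NODES` prints `node1,node2`. Nothing in
the file is executed, which is the reason for not sourcing it.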
Change-Id: I2236d41c8a87e4ead082274dddec19307d1f4db9
BUG: 1232333
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/11258
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: soumya k <skoduri@redhat.com>
Reviewed-by: Meghana M <mmadhusu@redhat.com>
| |
omnibus patch consisting of:
+ completed implementation of delete-node (BZ 1213934 (master 1213933))
+ teardown leaves /var/lib/nfs symlink (BZ 1213927 (master 1210712))
+ setup copy config, teardown clean /etc/cluster (BZ 1214888 (master 1212823))
setup for copy config, teardown clean /etc/cluster:
1. on one (primary) node in the cluster, run:
`ssh-keygen -f /var/lib/glusterd/nfs/secret.pem`
Press Enter twice to avoid passphrase.
2. deploy the pubkey to ~root/.ssh/authorized_keys on _all_ nodes, run:
`ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@$node`
3. copy the keys to _all_ nodes in the cluster, run:
`scp /var/lib/glusterd/nfs/secret.* $node:/var/lib/glusterd/nfs/`
N.B. this allows setup, teardown, etc., to be run on any node
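
Steps 2 and 3 have to be repeated for every node, so they can be
wrapped in a loop. This is a convenience sketch only: run,
distribute_keys and DRY_RUN are hypothetical names, and the two key
files are listed explicitly instead of using the secret.* glob.

```shell
#!/bin/sh
KEY=/var/lib/glusterd/nfs/secret.pem

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run()
{
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run steps 2 and 3 for each node given on the command line.
# Stops at the first command that fails.
distribute_keys()
{
    for node in "$@"; do
        # step 2: install the public key for root on the node
        run ssh-copy-id -i "${KEY}.pub" "root@${node}" || return 1
        # step 3: copy the key pair so that any node can run setup,
        # teardown, etc.
        run scp "${KEY}" "${KEY}.pub" "${node}:/var/lib/glusterd/nfs/" || return 1
    done
}
```

`DRY_RUN=1 distribute_keys node1 node2` prints the commands without
contacting any node.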
Change-Id: I66e947538769c3c531cfdb89854997130ca5c05b
BUG: 1213934
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/10318
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
| |
fix bug with reading pid file to determine if ganesha.nfsd is running
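
For illustration, a pid file check of this kind might look like the
sketch below. The message does not say what the bug was, so this is a
generic pattern and not the actual fix; nfsd_running is a hypothetical
name.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the pid recorded in the given pid
# file belongs to a live process, non-zero otherwise.
# Args: pidfile
nfsd_running()
{
    # no readable pid file: treat as not running
    [ -r "$1" ] || return 1

    pid=$(cat "$1")

    # reject an empty file or anything that is not a plain number
    case "${pid}" in
    ''|*[!0-9]*)
        return 1
        ;;
    esac

    # signal 0 only tests whether the process can be signalled; when
    # not run as root it also fails for other users' processes
    kill -0 "${pid}" 2>/dev/null
}
```

Usage: `nfsd_running /var/run/ganesha.pid && echo running`.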
Change-Id: I4050a119e2be93578045a221b67f616e152546d9
BUG: 1188184
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/10163
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
|
Resubmitting after a gerrit bug bungled the merge of
http://review.gluster.org/9621 (was it really a gerrit bug?)
Scripts related to NFS-Ganesha are in extras/ganesha/scripts.
Config files are in extras/ganesha/config.
Resource Agent files are in extras/ganesha/ocf.
Files are copied to appropriate locations.
Change-Id: I137169f4d653ee2b7d6df14d41e2babd0ae8d10c
BUG: 1188184
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/9912
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|