glusterfs.git/xlators/cluster/afr/src, branch v3.4.0alpha

cluster/afr: added logging of changelog for split-brain in glustershd.log file

2013-02-03T19:48:01+00:00

Change-Id: Iaf119f839cb2113b8f8efb7bf7636d471b6541bf
BUG: 866440
Signed-off-by: Venkatesh Somyajula 
Reviewed-on: http://review.gluster.org/4385
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Jeff Darcy 
Tested-by: Gluster Build System

cluster/afr: if a subvolume is down wind the lock request to next

2013-01-29T20:50:55+00:00

When one of the subvolumes is down, the lock request is not attempted
on that subvolume and is wound to the next subvolume instead.

        /* skip over children that are down */
        while ((child_index < priv->child_count)
               && !local->child_up[child_index])
                child_index++;

In the above case, if there are 2 subvolumes and the 2nd subvolume is down
(subvolume 1 from afr's view), then after attempting the lock on the 1st child
(i.e. subvolume 0), child_index is calculated to be 1. But since the 2nd child
is down, child_index is incremented to 2 as per the above logic, and the lock
request is STACK_WINDed to the child with child_index 2. Since afr has only 2
children, the child (i.e. the xlator_t pointer) for that child_index will be
NULL. The process crashes when it dereferences the NULL xlator object.
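
The overrun described above can be reproduced outside afr. The following is a
minimal sketch, not the actual patch: next_up_child() is a hypothetical helper,
and the plain bool array stands in for local->child_up. It runs the same skip
loop and then reports "no child left" instead of handing back an index equal
to child_count.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of afr: return the index of the first
 * child at or after `start` that is up, or -1 if no such child remains.
 */
int
next_up_child (const bool *child_up, int child_count, int start)
{
        int child_index = start;

        /* skip over children that are down */
        while ((child_index < child_count) && !child_up[child_index])
                child_index++;

        /* Ran off the end of the array: there is nothing to wind to. */
        if (child_index >= child_count)
                return -1;

        return child_index;
}
```

With child_up = {true, false} and child_count = 2, a call starting at index 1
returns -1, so a caller can stop instead of winding to a NULL child.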

Change-Id: Icd9b5ad28bac1b805e6e80d53c12d296526bedf5
BUG: 765564
Signed-off-by: Raghavendra Bhat 
Reviewed-on: http://review.gluster.org/4438
Reviewed-by: Krishnan Parthasarathi 
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: wakeup delayed post op on fsync

2013-01-29T20:33:05+00:00

Change-Id: I5d84ef72615f9d71b4af210976e2449de6e02326
BUG: 888174
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4446
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: Change order of unwind, resume for writev

2013-01-29T20:29:51+00:00

Generally, inode-write fops do transaction.unwind and then
transaction.resume, but writev needs to make sure that its
delayed post-op frame is placed in fdctx before the unwind
happens. This prevents a race in which flush does the
changelog wakeup first in the fuse thread and only then does
this writev place its delayed post-op frame in fdctx.
This helps flush make sure all the delayed post-ops are
completed.

Change-Id: Ia78ca556f69cab3073c21172bb15f34ff8c3f4be
BUG: 888174
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4428
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: before checking lock_count of internal lock make sure it's not entrylk

2013-01-28T08:20:46+00:00

When the expected lock count is equal to the attempted lock count, make sure
the lock type is checked properly before deciding that the lock has failed on
all the nodes.

Change-Id: I1f362d54320cb6ec5654c5c69915c0f61c91d8c7
BUG: 765564
Signed-off-by: Raghavendra Bhat 
Reviewed-on: http://review.gluster.org/4436
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

replicate: fix lock counting in blocking lock path

2013-01-26T08:20:03+00:00

As of http://review.gluster.org/2828, the blocking lock code
path's condition for checking completion of a locking attempt is
broken. The condition -

	if ((child_index == priv->child_count) || ...)

and

	if ((child_index == priv->child_count) && ...)

which is retained to check completion of blocking lock attempts
for DATA/METADATA transaction will _always_ fail because a few
lines above we have -

      child_index = cookie % priv->child_count;

So child_index will never equal priv->child_count. This leaves
the correctness at the mercy of the next part of the
conditional -

	.. (int_lock->lock_count == int_lock->lk_expected_count) ..

This "works" as long as no server went down during the transaction.

If a server goes down in the middle of the transaction, then this
condition also fails, and the code wraps around and starts a
blocking lock attempt loop all the way again from the first server.
This results in double locks getting acquired on those servers, and
eventually the second condition gets hit (first condition is _never_
hit) and we come out of locking phase.

During unlock phase we perform only one unlock per server leaving the
other lock "leaked" forever.
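
The arithmetic behind "will _always_ fail" can be checked in isolation. This is
a minimal sketch under stated assumptions: child_index_from_cookie() and
reached_child_count() are hypothetical names used only for illustration, and
plain ints stand in for the cookie and priv->child_count.

```c
#include <stdbool.h>

/* child_index as derived in the commit message: cookie % child_count. */
int
child_index_from_cookie (int cookie, int child_count)
{
        return cookie % child_count;
}

/*
 * The first half of the completion check. For a non-negative cookie the
 * remainder is always in [0, child_count), so this can never be true.
 */
bool
reached_child_count (int cookie, int child_count)
{
        return child_index_from_cookie (cookie, child_count) == child_count;
}
```

For any non-negative cookie, reached_child_count() returns false, which leaves
the lock_count comparison as the only way out of the locking loop.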

Change-Id: I7189cdf3f70901b04647516fe1d1e189f36cc8dd
BUG: 765564
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/4433
Reviewed-by: Krishnan Parthasarathi 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System

afr: Modified book-keeping structures for entrylks

2013-01-23T17:17:00+00:00

* There are up to 3 entry lockees that may be needed to perform
  entrylk'ing in posix dir-write operations.

* For example, rmdir ("/a/b") needs to acquire locks on two entities,
  - entrylk ("/a", "b")
  - entrylk ("/a/b", null)

* Changed existing entrylk/rename/selfheal (entrylk) transactions
  to use the new book-keeping structures

* Fixed a few issues in afr_trace_entry_lk{in,out} functions. Tracing is now
  aware of the new entry lockee structure.

Implementation notes:
* Changed 'cookie' sent in stack_wind to encode lockee_entity_no
  and subvol_no.

  cookie is a non-negative integer such that 0 <= cookie < replica_count.
  When more than one lock is being acquired across the subvolumes,
  cookie % replica_count gives the subvol_no and
  cookie / replica_count gives the lockee_entity_no.
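
  The decoding above implies a matching encoding. The sketch below is an
  illustration only: the function names are hypothetical, and the encoding
  formula is an assumption, chosen because it is the inverse of the % and /
  decoding given in this message.

```c
/* Assumed encoding: pack lockee_entity_no and subvol_no into one integer. */
int
cookie_encode (int lockee_entity_no, int subvol_no, int replica_count)
{
        return (lockee_entity_no * replica_count) + subvol_no;
}

/* cookie % replica_count gives the subvol_no */
int
cookie_subvol_no (int cookie, int replica_count)
{
        return cookie % replica_count;
}

/* cookie / replica_count gives the lockee_entity_no */
int
cookie_lockee_entity_no (int cookie, int replica_count)
{
        return cookie / replica_count;
}
```

  For example, with replica_count = 2, lockee 1 on subvolume 1 encodes to
  cookie 3, which decodes back to subvol_no 1 and lockee_entity_no 1.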

Change-Id: Idbf41803387a7d59a0f7fcb1453d91cea74da153
BUG: 765564
Signed-off-by: Krishnan Parthasarathi 
Reviewed-on: http://review.gluster.org/2828
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

cluster/afr: Remove strict-readdir implementation

2013-01-23T09:26:09+00:00

Leaving the option framework unchanged for backward compatibility.

Change-Id: I40bce1ec360801307e67f09e53b0721f64efab37
BUG: 886998
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4309
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

self-heald: Remove stale index even in heal info

2013-01-23T03:43:50+00:00

Change-Id: Ic1c9559aec59c1fb9dfede4aba8895f3b86f32f1
BUG: 861015
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4098
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy

cluster/afr: Link inode only on lookup

2013-01-22T05:48:29+00:00

Problem:
When "gluster volume heal  info" is executed, the crawl's
process_entry does not populate the iatt structure, so the
iatt's gfid is empty and the inode_link calls fail.

Fix:
inode_link should be done only after lookup i.e. when heal is
performed. So moved the inode_link related code to just after
the lookup which is triggered when self-heal is done.

Tests:
The testcase that gives this issue does not give the inode-link
failures anymore. glustershd heal, info commands are working as
expected.
Wrote basic automation tests for proactive-self-heal-daemon
https://github.com/pranithk/gluster-tests/blob/master/afr/proactive-self-heal.sh

Change-Id: Ic112bf104a4d553a64d3d8559f681a25ae1a5362
BUG: 861015
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/4090
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati