glusterfs.git/xlators/cluster/ec/src, branch v3.6.4beta1

ec: Special handling of anonymous fd

2015-03-30T07:22:32+00:00

Anonymous file descriptors need special handling because they can
be used in non-standard ways (e.g. an anonymous fd can be used
without having been opened).

This caused NFS to fail on some operations because ec always
expected to have a previous successful opendir call (from patch
http://review.gluster.org/9098/).

This patch treats all anonymous fds as opened on all subvolumes.
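
A minimal sketch of that rule, not the actual ec code: sketch_fd_t,
its fields and sketch_fd_usable_mask() are hypothetical names used
only for illustration. A regular fd is usable only where a previous
open or opendir succeeded, while an anonymous fd is considered usable
on every subvolume.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for an fd context (not the real ec type). */
typedef struct {
    int      anonymous;  /* non-zero if the fd was never explicitly opened */
    uint64_t open_mask;  /* bit i set: open/opendir succeeded on subvolume i */
} sketch_fd_t;

/* Return the set of subvolumes on which the fd may be used. */
static uint64_t sketch_fd_usable_mask(const sketch_fd_t *fd, unsigned int nodes)
{
    /* Mask with one bit per subvolume; avoid an undefined 64-bit shift. */
    uint64_t all = (nodes >= 64) ? UINT64_MAX : ((UINT64_C(1) << nodes) - 1);

    if (fd->anonymous) {
        /* Anonymous fds are treated as opened everywhere. */
        return all;
    }
    return fd->open_mask & all;
}
```

With 6 subvolumes, an anonymous fd yields the mask 0x3F regardless of
its open_mask, while a regular fd opened on subvolumes 0 and 2 yields
0x5.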

This is a backport of http://review.gluster.org/9513/

Change-Id: I09dbbce2ffc1ae3a5bcbb328bed55b84f4f0b9f8
BUG: 1187526
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9596
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
Reviewed-by: Dan Lambright 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat

cluster/ec: Wait for all bricks to notify before notifying parent

2015-03-30T07:20:56+00:00

        Backport of http://review.gluster.org/9523

This prevents the parent from being notified while only some bricks
have reported, which could trigger spurious self-heals.

BUG: 1188471
Change-Id: Iaea335d59431d8d85a236963a365f5c791fc7c49
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/9552
Reviewed-by: Xavier Hernandez 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat

cluster/ec: Handle CHILD UP/DOWN in all cases

2015-03-30T07:20:38+00:00

        Backport of http://review.gluster.org/9396

Problem:
When all the bricks are down at the time of mounting the volume, the mount
command hangs.

Fix:
1. Ignore all CHILD_CONNECTING events coming from subvolumes.
2. On timer expiration (without enough up or down children) send
   CHILD_DOWN.
3. Once enough up or down subvolumes are detected, send the appropriate event.
   When the rest of the subvolumes go up/down without changing the overall
   ec-up/ec-down state, send CHILD_MODIFIED to the parent.
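
The three steps above can be sketched as a small state machine. This
is an illustration under assumptions, not the code of the patch: the
sketch_* names, the event enum and the use of bit masks are all
hypothetical, and the volume is assumed to be usable when at least
'fragments' subvolumes are up.

```c
#include <stdint.h>

/* Hypothetical event codes; the real GF_EVENT_* values are not reproduced. */
enum sketch_event {
    SKETCH_NONE,              /* nothing to send to the parent yet */
    SKETCH_CHILD_UP,
    SKETCH_CHILD_DOWN,
    SKETCH_CHILD_MODIFIED,
    SKETCH_CHILD_CONNECTING
};

typedef struct {
    unsigned int nodes;       /* total number of subvolumes (at most 32 here) */
    unsigned int fragments;   /* subvolumes needed for the volume to be usable */
    uint32_t     up_mask;     /* subvolumes currently known to be up */
    uint32_t     known_mask;  /* subvolumes that have reported up or down */
    int          notified;    /* non-zero once UP or DOWN has been sent */
    int          is_up;       /* overall state last sent to the parent */
} sketch_ec_t;

static unsigned int sketch_count_bits(uint32_t mask)
{
    unsigned int count = 0;

    while (mask != 0) {
        mask &= mask - 1;     /* clear the lowest set bit */
        count++;
    }
    return count;
}

static enum sketch_event sketch_decide(sketch_ec_t *ec, int timer_expired)
{
    unsigned int up = sketch_count_bits(ec->up_mask);
    unsigned int down = sketch_count_bits(ec->known_mask & ~ec->up_mask);
    int now_up = (up >= ec->fragments);

    if (!ec->notified) {
        /* Step 3: wait until enough subvolumes are up or down... */
        if (!now_up && !timer_expired && down <= ec->nodes - ec->fragments) {
            return SKETCH_NONE;
        }
        /* ...or step 2: the timer expired, which ends up as CHILD_DOWN. */
        ec->notified = 1;
        ec->is_up = now_up;
        return now_up ? SKETCH_CHILD_UP : SKETCH_CHILD_DOWN;
    }
    if (now_up != ec->is_up) {
        ec->is_up = now_up;
        return now_up ? SKETCH_CHILD_UP : SKETCH_CHILD_DOWN;
    }
    /* Overall state unchanged: only report that something changed. */
    return SKETCH_CHILD_MODIFIED;
}

/* Handle an event coming from subvolume 'idx'. */
static enum sketch_event sketch_child_event(sketch_ec_t *ec, unsigned int idx,
                                            enum sketch_event ev)
{
    uint32_t bit = UINT32_C(1) << idx;

    if (ev == SKETCH_CHILD_CONNECTING) {
        return SKETCH_NONE;   /* Step 1: ignored */
    }
    ec->known_mask |= bit;
    if (ev == SKETCH_CHILD_UP) {
        ec->up_mask |= bit;
    } else {
        ec->up_mask &= ~bit;
    }
    return sketch_decide(ec, 0);
}

/* Handle expiration of the startup timer. */
static enum sketch_event sketch_timer_expired(sketch_ec_t *ec)
{
    return ec->notified ? SKETCH_NONE : sketch_decide(ec, 1);
}
```

In this sketch a mount with every brick down no longer waits forever:
no child ever reports, the timer expires and CHILD_DOWN is sent.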

BUG: 1188471
Change-Id: If92bd84107d49495cd104deb34601afe7f9b155c
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/9551
Reviewed-by: Xavier Hernandez 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat

ec: Don't use inodelk on getxattr when clearing locks

2015-02-11T09:40:11+00:00

When the 'clear-locks' command is executed from the cli, ec receives
a getxattr request. This request was handled as usual, first locking
the inode. Once this request was processed by the bricks, all locks
were removed, including the lock used by ec.

When ec tried to unlock the previously acquired lock (which had
already been released), glusterfsd crashed.

This fix executes the getxattr request without any lock acquired
for the clear-locks command.
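
A hedged sketch of that decision. It assumes that clear-locks requests
can be recognized by the prefix of the xattr name; both the prefix
value used here and the function name are assumptions made for
illustration, not taken from the patch.

```c
#include <string.h>

/* Assumption: clear-locks arrives as a getxattr whose name starts with this. */
#define SKETCH_CLRLK_PREFIX "glusterfs.clrlk"

/* Return 1 if the getxattr should take an inodelk first, 0 otherwise. */
static int sketch_getxattr_needs_inodelk(const char *name)
{
    if (name != NULL &&
        strncmp(name, SKETCH_CLRLK_PREFIX, strlen(SKETCH_CLRLK_PREFIX)) == 0) {
        /* The bricks will drop every lock, so do not hold one ourselves. */
        return 0;
    }
    return 1;
}
```

Any other getxattr, including one without a name, keeps the usual
locking in this sketch.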

This is a backport of http://review.gluster.org/9440/

Change-Id: I77e550d13c4673d2468a1e13fe6e2fed20e233c6
BUG: 1181977
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9444
Reviewed-by: Dan Lambright 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat

ec: Fix posix compliance failures

2015-02-11T09:37:55+00:00

This patch solves some problems that caused dispersed volumes to not
pass posix smoke tests:

* Problems in open/create with O_WRONLY
    Opening files with -w- permissions using O_WRONLY returned an EACCES
    error because internally O_WRONLY was replaced with O_RDWR.

* Problems with entrylk on renames.
    When source and destination were the same, ec tried to acquire
    the same entrylk twice, causing a deadlock.

* Overwrite of a variable when reordering locks.
    On a rename, if the second lock needed to be placed at the beginning
    of the list, the 'lock' variable was overwritten and later its timer
    was cancelled, cancelling the incorrect one.

* Handle O_TRUNC in open.
    When O_TRUNC was received in an open call, it was blindly propagated
    to child subvolumes. This caused a discrepancy between real file
    size and the size stored into trusted.ec.size xattr. This has been
    solved by removing O_TRUNC from open and later calling ftruncate.
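
The O_TRUNC handling described in the last item could look roughly
like this. It is a sketch only: sketch_split_open_flags() is a
hypothetical helper, and the access mode (O_RDONLY, O_WRONLY, O_RDWR)
is simply passed through unchanged here.

```c
#include <fcntl.h>

/*
 * Split the flags received by open into the flags forwarded to the
 * subvolumes and an indication that an ftruncate to size 0 must
 * follow a successful open.
 */
static int sketch_split_open_flags(int flags, int *truncate_after)
{
    *truncate_after = (flags & O_TRUNC) != 0;

    /* Never let the bricks truncate on their own. */
    return flags & ~O_TRUNC;
}
```

Doing the truncation as a separate ftruncate lets it go through the
same path as any other size change, so the stored size and the real
size are updated together.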

This is a backport of http://review.gluster.org/9420

Change-Id: I20c3d6e1c11be314be86879be54b728e01013798
BUG: 1159471
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9501
Tested-by: Gluster Build System 
Reviewed-by: Dan Lambright 
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Raghavendra Bhat

ec: Fix failures with missing files

2015-02-11T09:20:24+00:00

When a file does not exist on a brick but does on others, there
could be problems accessing it because some loc_t structures had a
null 'pargfid' while 'name' was set. This forced inode resolution
based on <pargfid>/name instead of <gfid>, which would be the
correct one. To solve this problem, 'name' is always set to NULL
when 'pargfid' is not present.
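
As an illustration, with a simplified stand-in for loc_t (the
sketch_loc_t type and both helpers are hypothetical, and gfids are
modelled as plain 16-byte arrays):

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for loc_t; only the fields discussed above. */
typedef struct {
    unsigned char pargfid[16];  /* gfid of the parent directory */
    unsigned char gfid[16];     /* gfid of the entry itself */
    const char   *name;         /* entry name inside the parent */
} sketch_loc_t;

static int sketch_gfid_is_null(const unsigned char gfid[16])
{
    static const unsigned char zero[16] = { 0 };

    return memcmp(gfid, zero, sizeof(zero)) == 0;
}

/* Drop 'name' when there is no parent gfid to resolve it against. */
static void sketch_loc_fixup(sketch_loc_t *loc)
{
    if (sketch_gfid_is_null(loc->pargfid)) {
        loc->name = NULL;
    }
}
```

After the fixup, a loc without 'pargfid' can only be resolved through
its own gfid.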

Another problem was caused by incorrect error handling during
incremental locking. The only error allowed during incremental
locking was ENOTCONN, but missing files on a brick can be reported
as ESTALE. This caused the operation to fail with EIO.

This patch ignores errors during incremental locking. At the end of
the operation it checks whether enough bricks were successfully
locked to continue.
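
A sketch of the final check, assuming that "enough" means at least
'fragments' bricks; the function name and the per-brick errno array
are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/*
 * lock_errno[i] is 0 if brick i was locked, or the errno it returned.
 * Return 1 if the operation can continue, 0 if it must fail.
 */
static int sketch_locks_sufficient(const int *lock_errno, size_t nodes,
                                   size_t fragments)
{
    size_t locked = 0;
    size_t i;

    for (i = 0; i < nodes; i++) {
        /* Any error (ENOTCONN, ESTALE, ...) only excludes that brick. */
        if (lock_errno[i] == 0) {
            locked++;
        }
    }
    return locked >= fragments;
}
```

With 6 bricks and 4 required, one ESTALE and one ENOTCONN still leave
4 locked bricks, so the operation continues instead of failing.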

This is a backport of http://review.gluster.org/9407/

Change-Id: I9360ebf8d819d219cea2d173c09bd37679a6f15a
BUG: 1183716
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9560
Tested-by: Gluster Build System 
Reviewed-by: Dan Lambright 
Reviewed-by: Raghavendra Bhat

cluster/ec: Do not modify quota, selinux xattrs in healing

2015-02-04T11:54:47+00:00

        Backport of http://review.gluster.org/9401

Problem:
EC heal also tries to heal the quota-size and selinux xattrs. quota-size
is private to each brick, but since quotad accesses it through the standard
interface as well, it cannot be filtered in the fops.

Fix:
Ignore QUOTA_SIZE_KEY and SELINUX xattrs during heal.
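
A sketch of such a filter. The two key strings are assumptions about
what QUOTA_SIZE_KEY and the selinux key expand to, and the function
name is hypothetical.

```c
#include <string.h>

/* Assumed key values; check the real definitions before relying on them. */
static const char *const sketch_heal_ignored[] = {
    "trusted.glusterfs.quota.size",  /* assumed value of QUOTA_SIZE_KEY */
    "security.selinux",              /* assumed selinux xattr key */
};

/* Return 1 if heal must leave this xattr alone on the brick. */
static int sketch_heal_skips_xattr(const char *key)
{
    size_t count = sizeof(sketch_heal_ignored) / sizeof(sketch_heal_ignored[0]);
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(key, sketch_heal_ignored[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Heal would call this for each xattr it compares or copies and skip
the ones that match.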

BUG: 1178590
Change-Id: Id569a49ef996e5507f4474c99b6cdc22781ad82d
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/9454
Reviewed-by: Xavier Hernandez 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat

cluster/ec: Handle internal xattr get/set

2015-02-04T11:53:33+00:00

        Backport of http://review.gluster.org/9385

Problem:
Internal xattrs of EC like trusted.ec.size/config/version
can be modified by users and that can lead to misbehavior
in EC.

Fix:
Don't let the user modify the xattrs. Hide these xattrs
in getfattr outputs.
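
A minimal sketch of both checks, assuming that every internal xattr
shares the 'trusted.ec.' prefix and that modifications are rejected
with EPERM. The names and the error code are assumptions, not taken
from the patch.

```c
#include <errno.h>
#include <string.h>

#define SKETCH_EC_PREFIX "trusted.ec."

/* Return 1 if the key belongs to ec and must stay hidden from users. */
static int sketch_is_internal_xattr(const char *key)
{
    return key != NULL &&
           strncmp(key, SKETCH_EC_PREFIX, strlen(SKETCH_EC_PREFIX)) == 0;
}

/* Return 0 if a user setxattr/removexattr on 'key' may proceed. */
static int sketch_check_user_xattr_change(const char *key)
{
    return sketch_is_internal_xattr(key) ? -EPERM : 0;
}
```

The same predicate can be used to drop matching keys from the list
returned to the user.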

BUG: 1182490
Change-Id: Ie32ebb95ee67cabbb9488951097a517172b45bcf
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/9455
Reviewed-by: Xavier Hernandez 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat

ec: Fix mutex related coverity scan issues

2015-01-05T06:58:30+00:00

This patch solves 4 issues detected by coverity scan:

    CID1241484 Data race condition
    CID1241486 Data race condition
    CID1256173 Thread deadlock
    CID1257622 Thread deadlock

With this patch, inode lock is never acquired inside a region locked
with fop->lock.
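
The rule can be illustrated with plain pthread mutexes. This is only
a sketch of the locking pattern: the types, fields and function are
hypothetical and much simpler than the real fop and inode structures.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;          /* stand-in for fop->lock */
    int             needs_update;  /* protected by 'lock' */
} sketch_fop_t;

typedef struct {
    pthread_mutex_t lock;          /* stand-in for the inode lock */
    int             version;       /* protected by 'lock' */
} sketch_inode_t;

static void sketch_fop_complete(sketch_fop_t *fop, sketch_inode_t *inode)
{
    int update;

    /* Take the decision under fop->lock... */
    pthread_mutex_lock(&fop->lock);
    update = fop->needs_update;
    fop->needs_update = 0;
    pthread_mutex_unlock(&fop->lock);

    /* ...and touch the inode only after fop->lock has been released. */
    if (update) {
        pthread_mutex_lock(&inode->lock);
        inode->version++;
        pthread_mutex_unlock(&inode->lock);
    }
}
```

Since the two locks are never held at the same time on this path, it
cannot take part in a lock-order inversion between them.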

This is a backport of http://review.gluster.org/9230/ and
http://review.gluster.org/9263/

Change-Id: I35c4633efd1b68b9f72b42661fa7c728b1f52c6a
BUG: 1170954
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9244
Tested-by: Gluster Build System 
Reviewed-by: Dan Lambright 
Reviewed-by: Raghavendra Bhat

ec: Fix self-healing issues.

2014-12-21T16:04:21+00:00

Three problems have been detected:

1. Self healing is executed in background, allowing the fop that
   detected the problem to continue without blocking or delays.

   While this is useful to avoid unnecessary delays,
   it can cause spurious failures of self-heal because it may
   try to recover a file inside a directory that a previous
   self-heal has not recovered yet, causing the file self-heal
   to fail.

2. When a partial self-heal is being executed on a directory,
   if a full self-heal is attempted, it won't be executed
   because another self-heal is already in progress, so the
   directory won't be fully repaired.

3. Information contained in loc's of some fop's is not enough
   to do a complete self-heal.

To solve these problems, I've made some changes:

* Improved ec_loc_from_loc() to add all available information
  to a loc.

* Before healing an entry, its parent is checked and partially
  healed if necessary to avoid failures.

* All heal requests received for the same inode while another
  self-heal is being processed are queued. When the first heal
  completes, all pending requests are answered using the results
  of the first heal (without full execution), unless the first
  heal was a partial heal. In this case all partial heals are
  answered, and the first full heal is processed normally.

* A special virtual xattr (not physically stored on bricks)
  named 'trusted.ec.heal' has been created to allow synchronous
  self-heal of files.

  Now, the recommended way to heal an entire volume is this:

    find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \;
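
The queueing rule for heal requests on the same inode can be sketched
as follows. This is an interpretation of the description above, not
the real implementation: the enum, the function and the assumption
that any further full heals stay queued behind the one that is started
are all hypothetical.

```c
#include <stddef.h>

enum sketch_action {
    SKETCH_ANSWER,       /* reply using the result of the finished heal */
    SKETCH_RUN,          /* execute this heal now */
    SKETCH_KEEP_QUEUED   /* wait for the heal that is about to run */
};

/*
 * Called when a heal completes. pending_partial[i] is non-zero if the
 * queued request i is a partial heal. action[i] receives the decision.
 */
static void sketch_heal_completed(int first_was_partial,
                                  const int *pending_partial,
                                  enum sketch_action *action, size_t count)
{
    int started = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (!first_was_partial || pending_partial[i]) {
            /* The finished heal already covers this request. */
            action[i] = SKETCH_ANSWER;
        } else if (!started) {
            /* First full heal waiting behind a partial one. */
            action[i] = SKETCH_RUN;
            started = 1;
        } else {
            action[i] = SKETCH_KEEP_QUEUED;
        }
    }
}
```

For a queue of (partial, full, partial, full) behind a partial heal,
this answers both partial requests, runs the first full heal and keeps
the second full heal waiting for it.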

Some minor changes:

* ec_loc_prepare() has been renamed to ec_loc_update().

* All loc management functions return 0 on success and -1 on
  error.

* Do not delay fop unlocks if heal is needed.

* Added basic ec xattrs initially on create, mkdir and mknod
  fops.

* Some coding style changes

This is a backport of http://review.gluster.org/9072/

Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985
BUG: 1159484
Signed-off-by: Xavier Hernandez 
Reviewed-on: http://review.gluster.org/9073
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat