summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr: set loc->gfid for building root locv3.3.0qa20Shylesh Kumar2012-01-231-1/+1
| | | | | | | | | Change-Id: Icb902846d243df0502f664bfd187280cecd4397c BUG: 784176 Signed-off-by: Shylesh Kumar <shylesh@gluster.com> Reviewed-on: http://review.gluster.com/2681 Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* pump: move internal pump xattrs out of trusted domainRajesh Amaravathi2012-01-232-14/+9
| | | | | | | | | | | | | | | | | | | * the trusted.glusterfs.pump.{start|pause|commit|status|abort} xattrs have been moved out of trusted domain. This enables separation of xattrs used as gluster-internal commands (handled by pump) for replace-brick, which are not set in the back-end, from xattrs set on the replace-brick source and destinations bricks. * macros definitions from pump.h and glusterd.h, #defining these xattrs have been merged and put into libglusterfs/src/glusterfs.h Change-Id: I87b8bfbf045aa140f5d3f0c9baa9b2e79f87b67b BUG: 783049 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/2663 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: GFID filehandle based backend and anonymous FDsAnand Avati2012-01-206-73/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. What -------- This change introduces an infrastructure change in the filesystem which lets filesystem operation address objects (inodes) just by its GFID. Thus far GFID has been a unique identifier of a user-visible inode. But in terms of addressability the only mechanism thus far has been the backend filesystem path, which could be derived from the GFID only if it was cached in the inode table along with the entire set of dentry ancestry leading up to the root. This change essentially decouples addressability from the namespace. It is no more necessary to be aware of the parent directory to address a file or directory. 2. Why ------- The biggest use case for such a feature is NFS for generating persistent filehandles. So far the technique for generating filehandles in NFS has been to encode path components so that the appropriate inode_t can be repopulated into the inode table by means of a recursive lookup of each component top-down. Another use case is the ability to perform more intelligent self-healing and rebalancing of inodes with hardlinks and also to detect renames. A derived feature from GFID filehandles is anonymous FDs. An anonymous FD is an internal USABLE "fd_t" which does not map to a user opened file descriptor or to an internal ->open()'d fd. The ability to address a file by the GFID eliminates the need to have a persistent ->open()'d fd for the purpose of avoiding the namespace. This improves NFS read/write performance significantly eliminating open/close calls and also fixes some of today's limitations (like keeping an FD open longer than necessary resulting in disk space leakage) 3. How ------- At each storage/posix translator level, every file is hardlinked inside a hidden .glusterfs directory (under the top level export) with the name as the ascii-encoded standard UUID format string. For reasons of performance and scalability there is a two-tier classification of those hardlinks under directories with the initial parts of the UUID string as the directory names. For directories (which cannot be hardlinked), the approach is to use a symlink which dereferences the parent GFID path along with basename of the directory. The parent GFID dereference will in turn be a dereference of the grandparent with the parent's basename, and so on recursively up to the root export. 4. Development --------------- 4a. To leverage the ability to address an inode by its GFID, the technique is to perform a "nameless lookup". This means, to populate a loc_t structure as: loc_t { pargfid: NULL parent: NULL name: NULL path: NULL gfid: GFID to be looked up [out parameter] inode: inode_new () result [in parameter] } and performing such lookup will return in its callback an inode_t populated with the right contexts and a struct iatt which can be used to perform an inode_link () on the inode (without a parent and basename). The inode will now be hashed and linked in the inode table and findable via inode_find(). A fundamental change moving forward is that the primary fields in a loc_t structure are now going to be (pargfid, name) and (gfid) depending on the kind of FOP. So far path had been the primary field for operations. The remaining fields only serve as hints/helpers. 4b. If read/write is to be performed on an inode_t, the approach so far has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost in performance in the inbuilt NFS server. 5. Misc ------- The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent with the rest of the codebase. Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f BUG: 781318 Reviewed-on: http://review.gluster.com/669 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: do not unlock without holding the lock on the fdRaghavendra Bhat2012-01-191-1/+1
| | | | | | | | | | | | | | | In afr_open_fd_fix we were unlocking the local->fd->lock, without holding the lock on it if we were not able to get the fd context. Now we are directly going to out and returning, instead of going to unlock without holding the lock. Change-Id: I0da638bbd2c269127cf111b3aac707e4a95d20c6 BUG: 783036 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/2658 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core/setxattr: prevent users from setting glusterfs xattrsRajesh Amaravathi2012-01-146-18/+233
| | | | | | | | | | | | | | | | | | | | | | | | * Each xlator prevents the user from setting glusterfs-internal xattrs like trusted.gfid by handling it in respective setxattr functions. The speacial case of trusted.gfid is handled in fuse (Not in posix because posix_setxattr is used to set gfid). * For xlators which did not define setxattr and/or fsetxattr, the functions have been implemented with appropriate checks. xlator | fops-added _______________|__________________________ | 1. afr | fsetxattr 2. stripe | setxatrr and fsetxattr 3. quota | setxattr and fsetxattr Change-Id: Ib62abb7067415b23a708002f884d30e8866fbf48 BUG: 765487 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/685 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* cluster/afr: Remove dead codePranith Kumar K2012-01-103-55/+0
| | | | | | | | | | Change-Id: I239128c51b728fbb7814fd6a41020b76c88fbd93 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> BUG: 772876 Reviewed-on: http://review.gluster.com/2623 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle fini for afr,pumpv3.3.0qa19Pranith Kumar K2011-12-285-63/+89
| | | | | | | | | Change-Id: Idc0a05a8a25f278a7ab05e242263e0a5001bde18 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> BUG: 767862 Reviewed-on: http://review.gluster.com/800 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: EIO should overwrite ENOENT in lookupPranith Kumar K2011-12-281-2/+3
| | | | | | | | | | | | | | | In case if lookup decides there is a gfid-mismatch, some enoents and self-heal cant remove the stale entry, it tells lookup to unwind with EIO but since ENOENT has more priority it is not over-written, this patch fixes that case. Change-Id: Icd68c4a5cf05dd97c568964ab647a34fdb6e26f4 BUG: 765528 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2541 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle error cases in local initPranith Kumar K2011-12-289-516/+384
| | | | | | | | | | | | - Fop should unwind with appropriate errno - Local is de-allocated on errors Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Change-Id: I4db40342ae184fe1cc29e51072e8fea72ef2cb15 BUG: 770513 Reviewed-on: http://review.gluster.com/2539 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle split-brain/all-fool xattrs for directoryPranith Kumar K2011-12-276-147/+152
| | | | | | | | | | | | In case of split-brain/all-fool xattrs perform conservative merge. Don't treat ignorant subvol as fool. Change-Id: I6ddf89949cd5793c2abbead7c47f091e8461f1d4 BUG: 765528 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2521 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Set pargfid when missingv3.3.0qa18Pranith Kumar K2011-12-223-4/+14
| | | | | | | | | | | | client asserts for missing pargfid in case of unlink. So Afr needs to make sure it is present in that fop. Change-Id: Iea0ad65e1e7254c8df412942c52d5870e853aa51 BUG: 769055 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2495 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Fix meta data lock rangePranith Kumar K2011-12-221-1/+1
| | | | | | | | | | Change-Id: I7615f31309c6c8f5373e1ff0535d84396dfa1455 BUG: 765430 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/807 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Double the call count if transaction is for renamePranith Kumar K2011-12-131-4/+18
| | | | | | | | | | | | | In rename the changelog modification needs to happen both on old parent-dir and new parent-dir, so 2 stack winds are done per brick. Change-Id: I43f34661e397c4288162213944529e18b7724b1d BUG: 766603 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/783 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Add command-line support (but no doc) for enforce-quorum option.Jeff Darcy2011-11-286-69/+118
| | | | | | | | Change-Id: Ia52ddb551e24c27969f7f5fa0f94c1044789731f BUG: 3823 Reviewed-on: http://review.gluster.com/743 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Update read-child if it becomes stalePranith Kumar K2011-11-283-36/+30
| | | | | | | | Change-Id: I00c714a89575023f6dbdd3430dcbf191e5d08019 BUG: 3650 Reviewed-on: http://review.gluster.com/740 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* afr: Fixed backgroundness detection in self-heal algo.Krishnan Parthasarathi2011-11-231-0/+1
| | | | | | | | | | Change-Id: I9888d8a0b86fdaf6589885766f2de7222d8c8ba2 BUG: 3802 Reviewed-on: http://review.gluster.com/705 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/745 Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: Remove treating sh_frame as special loop_framePranith Kumar K2011-11-234-61/+129
| | | | | | | | Change-Id: I0d87f06f989b2d4b971967c52d4898331693a801 BUG: 3675 Reviewed-on: http://review.gluster.com/735 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Open fd fix should perform opendir for dirsPranith Kumar K2011-11-231-7/+22
| | | | | | | | Change-Id: Iee12828ca515d44ed71d9cf97dcb8627c85f0593 BUG: 3740 Reviewed-on: http://review.gluster.com/725 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Fix memory leaksPranith Kumar K2011-11-232-16/+13
| | | | | | | | Change-Id: I79a1c70c47649fbcf236191f174d766d5806545c BUG: 3805 Reviewed-on: http://review.gluster.com/719 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle absence of gfid in lookupPranith Kumar K2011-11-231-6/+16
| | | | | | | | Change-Id: I6295245a7f40ba4f786f1f9f35b337f3f711128d BUG: 3783 Reviewed-on: http://review.gluster.com/739 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Add quorum checks to avoid split-brain.Jeff Darcy2011-11-206-1/+90
| | | | | | | | Change-Id: I2f123ef93989862aa796903a45682981d5d7fc3c BUG: 3533 Reviewed-on: http://review.gluster.com/473 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: lookup should honor gfid present in locPranith Kumar K2011-11-181-2/+7
| | | | | | | | Change-Id: I2319258743e478cc3a932d8ff0b2204a97cd4f8e BUG: 3760 Reviewed-on: http://review.gluster.com/680 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* dht,afr: Fixed gfid problemsPranith Kumar K2011-11-187-82/+65
| | | | | | | | | | | *) removed uuid_generate usage in pump and afr, self-heald *) filled the gfids for the fops which were sending no gfid in loc Change-Id: I85da3c10f5ee2006248b0123155a60867870d202 BUG: 3760 Reviewed-on: http://review.gluster.com/679 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* glusterfs: An effort to fix all the spell mistakes and typoHarshavardhana2011-11-165-10/+10
| | | | | | | | | | | | | | | in the entire glusterfs codebase. This patch fixes many of spell mistakes and typo in the entire glusterfs codebase and all supported modules. Change-Id: I83238a41aa08118df3cf4d1d605505dd3cda35a1 BUG: 3809 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/731 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: remove 'ino' variable from 'loc_t' structureAmar Tumballi2011-11-161-3/+2
| | | | | | | | Change-Id: I53b007fbdb42313d207d5d63fbfaaa6aaf033f95 BUG: 3518 Reviewed-on: http://review.gluster.com/523 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: remove 'ino' variable from 'inode_t' structureAmar Tumballi2011-11-163-4/+2
| | | | | | | | Change-Id: I0f078d1753db65d2f2e0380d1b0450c114cf40dd BUG: 3518 Reviewed-on: http://review.gluster.com/522 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr : Fix self-heal of special filesKaushal M2011-10-313-12/+44
| | | | | | | | | | | | | | | | Fixes self-heal of special files like device files, fifo files, socket files etc. Does it by doing the following: * Prevent setting of pending data xattr on a special file during entry self-heal when a new fils is created. * Allow data self-heal to be started on all file types other than directories. During data self-heal, for special files just erase pending xattrs, if those xattrs were set by previous releases of glusterfs. Change-Id: I34d8121e23ad00e85371ae2a36ef30cf3bd5db7a BUG: 3525 Reviewed-on: http://review.gluster.com/618 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
* cluster/afr: Remove unused 'ino' codePranith Kumar K2011-10-285-78/+0
| | | | | | | | Change-Id: I425e2d23e9e45f10ddeff2eacf918dd90f8baee7 BUG: 3744 Reviewed-on: http://review.gluster.com/639 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* pump: Change crawl to accommodate afr_lookup changesPranith Kumar K2011-10-283-51/+66
| | | | | | | | Change-Id: I600120252445c06d9cc3e7aa24022c2559b6abe2 BUG: 3747 Reviewed-on: http://review.gluster.com/638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Take gfid-req from xattr_req only if inode->gfid is nullPranith Kumar K2011-10-281-29/+64
| | | | | | | | Change-Id: Iddf5b59d3534c517dcd3c0d7b819e3768f6e915a BUG: 3747 Reviewed-on: http://review.gluster.com/637 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* build: warning suppression (round n)Amar Tumballi2011-10-203-14/+17
| | | | | | | | | | with this patch, there are no more warnings with gcc (GCC) 4.6.1 20110908 Change-Id: Ice0d52d304b9846395f8a4a191c98eb53125f792 BUG: 2550 Reviewed-on: http://review.gluster.com/607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* pump: status shouldn't be cleared until migration is complete during commit.Krishnan Parthasarathi2011-10-191-8/+0
| | | | | | | | | | | Clearing pump status on migration complete is futile because we would be 'replacing' src brick with destination brick in the volume anyway. Change-Id: Ib12fee84bd5445c4a20dac1cf10555331d7b8ebd BUG: 3653 Reviewed-on: http://review.gluster.com/585 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Prevent truncation of mask uint64_t->gf_boolean_tPranith Kumar K2011-10-142-5/+9
| | | | | | | | Change-Id: If67f726f21b713fa9312dc499a1aca4cb00f71de BUG: 3682 Reviewed-on: http://review.gluster.com/589 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: set the read child ctx only if the self-heal source is validv3.3.0qa14Raghavendra Bhat2011-10-031-4/+4
| | | | | | | | Change-Id: Ifdf0db71594ce526ad85c21103726798d9aceef4 BUG: 3639 Reviewed-on: http://review.gluster.com/556 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* statedump: do not print the inode number in the statedumpRaghavendra Bhat2011-10-011-31/+18
| | | | | | | | | | | | | Since gfid is used to uniquely identify a inode, in the statedump printing inode number is not necessary. Its suffecient if the gfid of the inode is printed. And do not print the the inodelks, entrylks and posixlks if the lock count is 0. Change-Id: Idac115fbce3a5684a0f02f8f5f20b194df8fb27f BUG: 3476 Reviewed-on: http://review.gluster.com/530 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* cluster/afr: Don't unlock sh_frame on failure in algoPranith Kumar K2011-09-302-8/+15
| | | | | | | | Change-Id: I0ef541c1f387c397c345e3f2bc9a57f1eff282a1 BUG: 3647 Reviewed-on: http://review.gluster.com/527 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: set fd ctx on opendirPranith Kumar K2011-09-294-39/+59
| | | | | | | | Change-Id: Ica845035781f47de990e9dcfefdeb37bed99d515 BUG: 3637 Reviewed-on: http://review.gluster.com/536 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Second round of warning suppression.Jeff Darcy2011-09-298-39/+10
| | | | | | | | | | | | | | | | | | | | | | Used a #pragma to kill ~170 in rpcgen code. Added GF_UNUSED to deal with a few more from macros elsewhere. The remainder are function return values (mostly context and dict calls) that really should be checked. Those would be harder to fix without real understanding of the code where they occur, so they remain as reminders. (Patchset 2: deal with older gcc that doesn't handle #pragma GCC diagnostic) (Patchset 3: fix include paths in generated files) (Patchset 4: keep up with trunk, squash 9 new warnings) (Patchset 5: six more, all in AFR) Change-Id: I29760c8c81be4d7e6489312c5d0e92cc24814b7b BUG: 2550 Reviewed-on: http://review.gluster.com/378 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle files without gfid in self-healPranith Kumar K2011-09-299-654/+642
| | | | | | | | Change-Id: Ibcaaa9c928195939ff1e31b28b592e524e63a423 BUG: 3557 Reviewed-on: http://review.gluster.com/519 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* glusterd: Implemented cmd to trigger self-heal on a replicate volume.v3.3.0qa10Krishnan Parthasarathi2011-09-221-0/+8
| | | | | | | | | | | | | This cmd is used in the context of proactive self-heal for replicated volumes. User invokes the following cmd when (s)he suspects that self-heal needs to be done on a particular volume, gluster volume heal <VOLNAME>. Change-Id: I3954353b53488c28b70406e261808239b44997f3 BUG: 3602 Reviewed-on: http://review.gluster.com/454 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Make local->child_up immutablePranith Kumar K2011-09-216-164/+113
| | | | | | | | | | | | | | | | | | | | | | | Afr transaction performs lock, pre-op, op, post-op and unlock steps in that order. The child_up[] is overloaded with the information of where all the first two steps succeeded. This works perfectly fine for Transaction, but the locking/unlocking part of the code is re-used by data self-heal. In that each loop_frame does lock, rchecksum, read-from-source and write-to-sinks, unlock steps. Rchecksum fop assumes that the fop needs to happen on one source + all sinks and sets the call_count to that number. But if the lock step fails on any of the sinks it will mark the child_up of that child to 0, which will result in call_count mismatch and the frame will hang thinking that some more cbks need to come. When this happens loop_frame will never go to unlock step leading to hangs on that file. Change-Id: I3dd0449cc6193a980bacf637d935881f4b22210a BUG: 3597 Reviewed-on: http://review.gluster.com/474 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Proactive self heal process implementationPranith Kumar K2011-09-1410-121/+631
| | | | | | | | Change-Id: I96db0d94566ceabf1649f890318363f738c06553 BUG: 2458 Reviewed-on: http://review.gluster.com/403 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: perform self-heal with least priorityPranith Kumar K2011-09-091-0/+7
| | | | | | | | Change-Id: Id8a1dffa3c3200234ad154d1749278a2d7c7021b BUG: 3502 Reviewed-on: http://review.gluster.com/336 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: Make data selfheal trigger to be configurable.Pranith Kumar K2011-09-0811-112/+217
| | | | | | | | | | | | | | | | | | | | By default, lookup triggers data self-heal but that is not the preferred way of operating replicated volumes. We would like the data self heals to be triggered in open instead. Number of back-ground self-heals allowed is 16 and lookups block until self-heal is completed. We want to prevent blocking in fops. We can not make lookups independent of self-heal frames because when there are gfid conflicts the decision of which file is correct is determined in self-heal phase. So in afr, lookup self-heal is going to guarantee name space consistency and open/fd fops will take responsibility for data consistency, these are non blocking. The user needs to set the option cluster.data-self-heal "open" for this behavior. Change-Id: If9463cdb9ebac114708558ec13bbca0270acd659 BUG: 3503 Reviewed-on: http://review.gluster.com/334 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: eager locking of FD writesAnand Avati2011-09-084-58/+182
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a change in the way write transactions hold a lock which optimizes the case of sequential writes from a single writer. Lock phase of a transaction has two sub-phases. First is an attempt to acquire locks in parallel by broadcasting non-blocking lock requests. If lock aquistion fails on any server, then the held locks are unlocked and revert to a blocking locked mode sequentially on one server after another. The change in this patch is to make the initial broadcasting lock request attempt to acquire lock on the entire file. If this fails, we revert back to the sequential "regional" blocking lock as before. In the case where such an "eager" lock is granted in the non-blocking phase, it gives rise to an opportunity for optimization. i.e, if the next write transaction on the same FD arrives before the unlock phase of the first transaction, it "takes over" the full file lock. Similarly if yet another transaction arrives before the unlock phase of the "optimized" transaction, that in turn "takes over" the lock as well. The actual unlock now happens at the end of the last "optimzed" transaction. Any operation which arrives before the unlock phase of the previous transaction is a potential candidate to become an "optimized" transaction. In cases where the previous transaction had aquired lock as a "regional" blocking lock, and the next transaction comes in before its unlock phase, then it would not be an "optimized" transaction. Implied assumption ------------------ Since two or more transactions can now operate within the same large lock, there is a possibility that overlapping transactions can arrive at oppoosite orders on the servers. However in the larger picture this is not possible as write-behind already ensures that no two overlapping writes on an inode are in transit at the same time. Overlapping writes across clients are not a problem as they compete at locks anyways. Theoretical benefits and potential harms ---------------------------------------- In case of a single writer: The benefits are large for sequential writes. In the best case the entire file write can happen with just one lock and unlock per server, provided writes are coming in fast enough and getting pipelined by write-behind soon enough (which is usually the case). If the writes are not coming in fast enough, then the optimization "kicks in" for only those subsets of writes which are close enough to get "piggybacked". For random writes the benefits are the same as well. In any case the overall performance is better than or equal to the performance without this optimization for a single writer. In case of multiple writers: When multiple writers are not writing concurrently, there is no negative performance impact. When multiple writers are writing concurrently to the same region, there is no negative impact either, as they were previously getting arbitrated at the locks translator too. In the case of multiple writers writing to different regions concurrently, there will be an increased number of "failovers" from failed parallel non-blocking to sequential blocking regional locks. This above "worst case" has a simple workaround that as soon as we detect > 1 open-fd-count in lookup xattr, we can disable this optimization on those fds. Beneficial side-effects ----------------------- There is another similar optimization in AFR for changelogs which goes by the name of "changelog-piggybacking". That works in a similar way where pending flags get 'taken over' or 'piggybacked' by the next transaction if its 'pre-op' phase kicks in before the 'post-op' phase of the previous transaction. It has been observed that this changelog-piggybacking optimization gives a saving of about ~55% savings of xattr calls hitting the wire, measured across various types of network interfaces. The side effect of this eager-lock optimization is that it gives an almost 100% saving of xattr calls by making the optimistic-changelog work much more efficiently as it gives a wider overlap of the xattr phases of two consecutive transactions. Change-Id: I41c02eb3b64c14c68ef66a344610ec3f024cd59d BUG: 3409 Reviewed-on: http://review.gluster.com/240 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* Eliminate many "var set but not used" warnings with newer gcc.Jeff Darcy2011-09-0711-123/+0
| | | | | | | | | | | | | | | | This fixes ~200 such warnings, but leaves three categories untouched. (1) Rpcgen code. (2) Macros which set variables in the outer (calling function) scope. (3) Variables which are set via function calls which may have side effects. Change-Id: I6554555f78ed26134251504b038da7e94adacbcd BUG: 2550 Reviewed-on: http://review.gluster.com/371 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: Prevent double big lock when data self-heal loops are not spawnedPranith Kumar K2011-09-062-7/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The steps in normal data self heal: 1) take big lock by self-heal frame. Get the xattrs/stat to decide source, sink information. 2) spawn loop frames which perform self-heal by taking small locks on the file. Every time a new lock is taken and the old lock is released. 3) Before releasing the final small lock a big lock is taken by the self-heal frame, and unlock on small-lock. Erasing of the pending xattrs happen then the big unlock happen and that is the end of the data self-heal. When a data self-heal is needed for a file and the fop that triggers the self-heal is open with O_TRUNC. Fuse sends open then an explicit truncate for this. Open triggers the self-heal but by the time it tries to spawn the loops the file size is truncated to 0, so no loops are formed. These are the steps: 1) Take big lock by self-heal frame. Get the xattrs/stat to decide source, sink information. 2) loop frames are not spawned. The big lock is not released. 3) One more big lock is taken by the same self-heal frame, Erasing of the pending xattrs etc happen, now it does two big unlocks, but after the first unlock, the information on which the locks were performed is forgotten, so the next unlock becomes a no-op. So there is a stale big lock on that file preventing further writes. As a fix, if the loops are not spawned, use the previous big lock to perform the rest of the operations needed in completing the data self-heal. No need to have one more big lock. Change-Id: Id03171269594e447b2b6d1331e362d83bd1e3430 BUG: 3506 Reviewed-on: http://review.gluster.com/339 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: Bring down the self-heal window size to 1Pranith Kumar K2011-09-061-1/+1
| | | | | | | | | | | | This is brought in an effort to be nice to the system resources when self-heal is in progress. Change-Id: I123f1eb4d8000613a35c0117f0aa27f926f3a921 BUG: 3503 Reviewed-on: http://review.gluster.com/333 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: Perform flush on all the children involved in self-healPranith Kumar K2011-08-221-19/+6
| | | | | | | | Change-Id: I66362a3087a635fb7b759d7836a1f6564a6a7fc9 BUG: 3456 Reviewed-on: http://review.gluster.com/294 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Change definition of stale childPranith Kumar K2011-08-221-1/+1
| | | | | | | | | | | | The code is checking for priv->child_up[i], which can change while the fop is in progress. Since pending[child][id-of-transaction] alone is enough to tell if the child became stale or not, use just that. Change-Id: I494bf02cca66f4fd41526195fafce86a202c6bd1 BUG: 3455 Reviewed-on: http://review.gluster.com/293 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>