summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr: handle NULL check before strlen/strcmp in fgetxattrAnand Avati2013-12-041-1/+1
| | | | | | | | | | | xattr name can legally be NULL. Handle that case without crashing. Change-Id: Ie214cb05ccd52565dc247a9234ad83ae799d3866 BUG: 1036879 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6420 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: fix errno for non-existent GFIDAnand Avati2013-11-261-1/+1
| | | | | | | | | | | | | | | | When clients refer to a GFID which does not exist, the errno to be returned in ESTALE (and not ENOENT). Even though ENOENT might look "proper" most of the time, as the application eventually expects ENOENT even if a parent directory does not exist, not returning ESTALE results in resolvers (FUSE and GFAPI) to not retry resolution in uncached mode. This can result in spurious ENOENTs during concurrent path modification operations. Change-Id: I7a06ea6d6a191739f2e9c6e333a1969615e05936 BUG: 1032894 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6322 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Disable eager-locks on NetBSD for 3.4 branchEmmanuel Dreyfus2013-10-241-0/+10
| | | | | | | | | | | | | | As described in https://bugzilla.redhat.com/show_bug.cgi?id=1005526 eager-locks are broken on release-3.4, at least for NetBSD. This change disable them by default, leaving the admin the possibility to explicitely enable the feature if needed. BUG: 1005526 Change-Id: I6f1b393865b103ec56ad5eb5143f59bb8672f19c Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6020 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: unlock before aborting transactionAnand Avati2013-09-101-0/+2
| | | | | | | | | | | Else this results in a missing frame causing a hang Change-Id: Ib5f3dc6a3999449faa2853cee2944af2fb065a20 BUG: 1002399 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5878 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Perform delayed changelog wakeups for anon fdv3.4.0beta3Pranith Kumar K2013-06-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Nfs xlator never does open on a file for performing writes, afr does not perform changelog wakeup for this fd so operations which do metadata operations as soon as the data operations are completed perceive a delay od 'post-op-delay-secs'. Fix: Perform changelog wakeup on anon-fd if the fd with same pid is not present in inode-list. Note: This approach is a short-term fix. A proper fix needs a new domain for taking metadata locks so that data/metadata locks don't compete with each other. BUG: 966018 Change-Id: Ia9188a253e7943801b665e1b9205e2f551952d87 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5067 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Avoid order mismatch in blocking entrylksPranith Kumar K2013-06-061-6/+9
| | | | | | | | | | | | | | | | | | | | | | Problem: When taking blocking entrylks, afr orders the entrylks based on uuid_compare of gfids of parent dirs, if they are equal then it orders them based on the basenames. While this approach works fine, the implementation assumes loc->gfids to be populated at the time of the comparison, but loc may have gfid in loc->inode->gfid instead of loc->gfid which was leading to order mismatches and dead-locks. Fix: Implemented loc_gfid which gives gfid by checking both loc->gfid, loc->inode->gfid. Used this for ordering the blocking entrylks. Change-Id: I2743fcaff3d670fbeb6b8e0a496f106a3585dde1 BUG: 965987 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5063 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Don't queue transactions during open-fd fixPranith Kumar K2013-05-086-385/+167
| | | | | | | | | | | | | | | | | | | | | Before Anonymous fds are available, afr had to queue up transactions if the file is not opened on one of its subvolumes. This happens until the attempt to open the file either succeeds or fails. These attempts happen until the file is successfully opened on the subvolume. Now client xlator uses anonymous fds to perform the fops if the fd used for the fop is not 'opened'. Fops will be successful even when the file is not opened so there is no need to queue up the transactions anymore in afr. Open is attempted on the subvolume where it is not opened independent of the fop. Change-Id: I6d59293023e2de41c606395028c8980b83faca3f BUG: 953887 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4868 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Turn on eager-lock for fd DATA transactionsEmmanuel Dreyfus2013-05-072-21/+7
| | | | | | | | | | | | | | | | | | | | | | | Problem: With the present implementation, eager-lock is issued for any fd fop. eager-lock is being transferred to metadata transactions. But the lk-owner is set to local->fd address only for DATA transactions, but for METADATA transactions it is frame->root. Because of this unlock on the eager-lock fails and rebalance hangs. Fix: Enable eager-lock for fd DATA transactions This is a backport of change If30df7486a0b2f5e4150d3259d1261f81473ce8a http://review.gluster.org/#/c/4588/ BUG: 916226 Change-Id: Id41ac17f467c37e7fd8863e0c19932d7b16344f8 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/4899 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Preserve mtime in self-healPranith Kumar K2013-04-121-14/+25
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Data self-heal may choose sink iatt to set mtimes. This happens because after syncing of data is done self-heal does one more xattrops/fstat to determine sources sinks to set the inode-ctx. Since this is done after data syncing and erase of xattrs, old source and old sink are now sources, but the mtimes of them differ. Old code just takes the first source from the list and update mtimes, which could be sink before the self-heal started. Fix: Set mtime from 'sources before syncing'. Change-Id: Id769e1b99aa4f041eaee775f64cbf2c57b799723 BUG: 918437 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4658 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/4663
* cluster/afr: do complete split-brain check in all the fd based fopsRaghavendra Bhat2013-03-054-19/+33
| | | | | | | | | | | | | | | | | fd based operations such as readv checked only for data split brain instead of complete split-brain (i.e both data + metadata) assuming that open would have done the complete split-brain check. However open-behind would have unwound open, without winding to afr thus preventing the complete split-brain check and some appliations will be able to read the contents of the file even though the file has metadata split-brain. So let all the fd based fops do a defensive check of complete split-brain. Change-Id: I0ea52f782b371ce73e8e1c61f9def438fce1bd28 BUG: 846240 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4620 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Use proper libtool option -avoid-version instead of bogus -avoidversionAnand Avati2013-02-071-2/+2
| | | | | | | | | | Change-Id: I1c9541058c7d07786539a3266ca125a6a15287d8 BUG: 859835 Signed-off-by: Anand Avati <avati@redhat.com> Original-author: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com> Signed-off-by: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com> Reviewed-on: http://review.gluster.org/3967 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: serialize modification of {entrylk,inodelk}_lock_countAnand Avati2013-02-071-53/+54
| | | | | | | | | | | | | | Typically this lock was not needed in practice, but with http://review.gluster.org/3842, this code gets executed in multiple threads for different servers and we lose a count. This results in leaked lock and a hang for a future transaction. Change-Id: I377ed20e44f2a45cff522289dfef181f0653eca2 BUG: 765564 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4480 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Avoid priv->eager_lock value update racePranith Kumar K2013-02-064-4/+6
| | | | | | | | | | Change-Id: I7049c0c64e36a9dfa4cc0e0b34de7ec111d2f6c1 BUG: 908302 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4076 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Perform wakeup just before fopPranith Kumar K2013-02-062-13/+21
| | | | | | | | | | | | | | | There is no necessity for the delayed-post-op to wait until the next fop phase on the fd completes. Change-log, locks are inherited by the time next fop phase is attempted so the wakeup can happen just before the fop phase is started. Change-Id: I0b8e591f591b0f7565eb55265ab51f476ed2b165 BUG: 908302 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4073 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: added logging of changelog for split-brain in glustershd.log fileVenkatesh Somyajula2013-02-034-10/+68
| | | | | | | | | | Change-Id: Iaf119f839cb2113b8f8efb7bf7636d471b6541bf BUG: 866440 Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4385 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: if a subvolume is down wind the lock request to nextRaghavendra Bhat2013-01-291-15/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | When one of the subvolume is down, then lock request is not attempted on that subvolume and move on to the next subvolume. /* skip over children that are down */ while ((child_index < priv->child_count) && !local->child_up[child_index]) child_index++; In the above case if there are 2 subvolumes and 2nd subvolume is down (subvolume 1 from afr's view), then after attempting lock on 1st child (i.e subvolume 0) child index is calculated to be 1. But since the 2nd child is down child_index is incremented to 2 as per the above logic and lock request is STACK_WINDed to the child with child_index 2. Since there are only 2 children for afr the child (i.e the xlator_t pointer) for child_index will be NULL. The process crashes when it dereference the NULL xlator object. Change-Id: Icd9b5ad28bac1b805e6e80d53c12d296526bedf5 BUG: 765564 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4438 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: wakeup delayed post op on fsyncPranith Kumar K2013-01-291-5/+3
| | | | | | | | | Change-Id: I5d84ef72615f9d71b4af210976e2449de6e02326 BUG: 888174 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4446 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Change order of unwind, resume for writevPranith Kumar K2013-01-291-31/+87
| | | | | | | | | | | | | | | | | | Generally inode-write fops do transaction.unwind then transaction.resume, but writev needs to make sure that delayed post-op frame is placed in fdctx before unwind happens. This prevents the race of flush doing the changelog wakeup first in fuse thread and then this writev placing its delayed post-op frame in fdctx. This helps flush make sure all the delayed post-ops are completed. Change-Id: Ia78ca556f69cab3073c21172bb15f34ff8c3f4be BUG: 888174 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4428 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: before checking lock_count of internal lock make sure its notRaghavendra Bhat2013-01-281-12/+13
| | | | | | | | | | | | | | | entrylk when the expected lock count is equal to the attempted lock count, then before deciding that lock is failed on all the nodes, make sure the lock type is checked properly. Change-Id: I1f362d54320cb6ec5654c5c69915c0f61c91d8c7 BUG: 765564 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4436 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: fix lock counting in blocking lock pathAnand Avati2013-01-262-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of http://review.gluster.org/2828, the blocking lock code path's condition for checking completion of locking atempt is broken. The condition - if ((child_index == priv->child_count) || ...) and if ((child_index == priv->child_count) && ...) which is retained to check completion of blocking lock attempts for DATA/METADATA transaction will _always_ fail because a few lines above we have - child_index = cookie % priv->child_count; So child_index will never equal priv->child_count. This leaves the correctness at the mercy of the next part of the conditional - .. (int_lock->lock_count == int_lock->lk_expected_count) .. This "works" as long as no server went down during the transaction. If a server goes down in the middle of the transaction, then this condition also fails, and the code wraps around and starts a blocking lock attempt loop all the way again from from the first server. This results in double locks getting acquired on those servers, and eventually the second condition gets hit (first condition is _never_ hit) and we come out of locking phase. During unlock phase we perform only one unlock per server leaving the other lock "leaked" forever. Change-Id: I7189cdf3f70901b04647516fe1d1e189f36cc8dd BUG: 765564 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4433 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: Modified book-keeping structures for entrylksKrishnan Parthasarathi2013-01-236-460/+512
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * There are upto 3 entry lockees that may be needed to perform entrylk'ing in posix dir-write operations. * For eg, rmdir ("/a/b") needs to acquire locks on two entities, - entrylk ("/a", "b") - entrylk ("/a/b", null) * Changed existing entrylk/rename/selfheal (entrylk) transactions to use the new book-keeping structures * Fixed few issues in afr_trace_entry_lk{in,out} functions. Tracing is now aware of the new entry lockee structure. Implementation notes: * Changed 'cookie' sent in stack_wind to encode lockee_entity_no and subvol_no. cookie is a non-negative integer such that 0 <= cookie < replica_count, When more than one lock is being acquired across the subvolumes, cookie % replica_count gives the subvol_no cookie / replica_count gives the lockee_entity_no. Change-Id: Idbf41803387a7d59a0f7fcb1453d91cea74da153 BUG: 765564 Signed-off-by: Krishnan Parthasarathi <kp@gluster.com> Reviewed-on: http://review.gluster.org/2828 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Remove strict-readdir implementationPranith Kumar K2013-01-231-201/+0
| | | | | | | | | | | Leaving option frame-work un-changed for backward compatibility. Change-Id: I40bce1ec360801307e67f09e53b0721f64efab37 BUG: 886998 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4309 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* self-heald: Remove stale index even in heal infoPranith Kumar K2013-01-221-35/+45
| | | | | | | | | Change-Id: Ic1c9559aec59c1fb9dfede4aba8895f3b86f32f1 BUG: 861015 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4098 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr: Link inode only on lookupPranith Kumar K2013-01-211-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When "gluster volume heal <volname> info is executed, crawl's process_entry is not going to populate iatt structure so the iatt's gfid will be empty. So inode_links are failing. Fix: inode_link should be done only after lookup i.e. when heal is performed. So moved the inode_link related code to just after the lookup which is triggered when self-heal is done. Tests: The testcase that gives this issue does not give the inode-link failures anymore. glustershd heal, info commands are working as expected. Wrote basic automation tests for proactive-self-heal-daemon https://github.com/pranithk/gluster-tests/blob/master/afr/proactive-self-heal.sh Change-Id: Ic112bf104a4d553a64d3d8559f681a25ae1a5362 BUG: 861015 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4090 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Disable delayed post op when eager-lock is offPranith Kumar K2013-01-181-0/+3
| | | | | | | | | | | | | | | | | | | | | Problem: When eager-lock is disabled, inodelks for write-fops on same fd conflict with each other. If eager-lock is disabled but delayed post-op is enabled then each write fop's inodelk unlock waits for post-op-delay-secs. So the conflicting write fop acquires inodelk after post-op-delay-secs. This results in post-op-delay-secs delay for every write fop on the fd for sequential writes (Ex: dd). Fix: Disable delayed-post-op when eager-lock is off. Change-Id: I87ea4c8d1c7bb269b9b174388ae50f37e82629b7 BUG: 895235 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4391 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Fail readv on data-split-brainPranith Kumar K2013-01-183-0/+23
| | | | | | | | | | | | | | | | | | Problem: Afr prevents opens on a file in split-brian but the fd that is already open still has the capability to perform both reads and writes to the file. Fix: Fail readvs on a file with EIO. Change-Id: I8e07f24c36fab800499b36ab374f984b743332cd BUG: 873962 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4199 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: conditionally prioritize EIO errors over ENOENTBrian Foster2013-01-183-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | The most important errno logic historically only prioritized ESTALE over ENOENT. Commit c8c0942d added EIO prioritization over ENOENT to ensure that split-brain was reported when it occurs in conjunction with bricks missing the file entry. The unintended side effect of this change is that (non split-brain) EIO errors reported from the bricks themselves are now reported to the client when the expectation is that afr should squash said errors in favor of marking the file inconsistent. The high-level problem is that EIO is overloaded with different meanings from different contexts. This commit adds an eio parameter to the errno priority logic to conditionally flag when EIO is of higher priority and should be propagated to the client. BUG: 892730 Change-Id: Ib692a8a1f1737ef190d57894f392ec53ffb33aab Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4376 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: replace afr_more_important_error with afr_most_important_errorBrian Foster2013-01-173-26/+18
| | | | | | | | | | | | | | | | | | | | | afr_more_important_error() is written to return whether a new errno should override an existing errno for high-level operations that could span multiple sub-operations. It specifically prioritizes ESTALE over EIO over ENOENT, and otherwise defaults to the latest error passed having priority. This change preserves current behavior, but rewrites the logic to return the higher priority error of the existing and new errno. The purpose of the change is to make the logic a bit more clear and set the stage for future changes to make the logic flexible based on context. BUG: 892730 Change-Id: Id1aa48855dfb0507abc9d1ef22f2259b30472576 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4375 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Pre-op should be undone for non-piggyback post-opPranith Kumar K2013-01-161-2/+6
| | | | | | | | | | | | | | | | | | | | | | Problem: When fop fails post-op is always performed over the network irrespective of whether pre-op is piggybacked or not. Decrementing Pre-op-done count even for the piggybacked ones is wrong. I have added an assert for pre_op_done to be non-zero and when dd of=a if=/dev/urandom bs=5M count=1000 is executed and a brick is taken down, the mount is crashing. Fix: Decrement pre-op-done count only when the post-op is not piggybacked. Change-Id: Ie837251a43bfb437f0fada191302eeee60be1601 BUG: 863939 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4310 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: don't clear changelog for un-healed replicasJeff Darcy2013-01-161-6/+44
| | | | | | | | | | Change-Id: Iebfa6770a688e89c051666b46977862188061738 BUG: 802417 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4034 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: check for the -ve values returned from dict_serialized_lengthRaghavendra Bhat2012-12-171-1/+1
| | | | | | | | | Change-Id: I9fa7744b02791180ccb93adef10c363a1b38aa31 BUG: 838204 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4319 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Remember type of split-brain in inode-ctxPranith Kumar K2012-12-115-117/+114
| | | | | | | | | | | | | Along with this change, fixed the race of setting the split-brain status in inode-ctx after unwinding the fop from self-heal in case of back-ground self-heal. Change-Id: Ifc829300df485f50f139443802e8b6dc7038b4ad BUG: 873962 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4198 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Empty string should not be default option valPranith Kumar K2012-12-053-5/+6
| | | | | | | | | | | | Glusterd does not allow empty string as default value. Changed afr option values to disallow empty string as value. Change-Id: I92a2d658907dbc6101e1139dd91f548acb5506f5 BUG: 859927 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4271 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: mark new entry changelog for create/mknod failuresPranith Kumar K2012-12-045-67/+228
| | | | | | | | | | | | | | | | | | | | | | Problem: When create/mknod fails on some of the nodes, appropriate pending data/metadata changelogs are not assigned. This was not considered to be an issue because entry self-heal would do the assigning of appropriate changelog after creating new entries. But using the combination of rebalance and remove brick we can construct a case where a file with same name and gfid can be created in a dir with different data and link-to xattr without any changelog. Fix: When a create/mknod failure is observed mark the appropriate changelog on the new file created. Change-Id: I4c32cbf5594a13fb14deaf97ff30b2fff11cbfd6 BUG: 858212 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4207 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: use data trylock mode in read/write self-heal trigger pathsBrian Foster2012-12-041-1/+8
| | | | | | | | | | | | | | | | | | | | | | Self-heal data lock contention between clients and glustershd instances can lead to long wait and user response times if the client ends up pending its lock on glustershd self-heal of a large file. We have reports of guest vm instances going completely unresponsive during self-heal of virtual disk images. Optimize the read/write self-heal trigger codepath (i.e., afr_open_fd_fix()) to trylock for self-heal and skip the self-heal otherwise to minimize the likelihood of a running/active guest of competing with glustershd on arrival of a brick. Note that lock contention is still possible from the client (e.g., via lookup). BUG: 874045 Change-Id: I406443c061ff6acd2a851179626b78352caa5c03 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4258 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: support self-heal data trylock mechanismBrian Foster2012-12-044-8/+15
| | | | | | | | | | | | | | | Introduce a block flag to support an optional blocking or non-blocking mode in the self-heal data locking mechanism. All callers are modified to use blocking mode, which is the current default behavior (no change in behavior is introduced by this commit). BUG: 874045 Change-Id: Ib7ff9984578fa11de4e3b6981508100cdddd37cd Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4257 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: make flush non-transactionalBrian Foster2012-12-043-139/+38
| | | | | | | | | | | | | | | | | | Flush is historically a transaction to ensure all previous writes were complete. This is no longer required as write-behind has learned to make flush a barrier operation (re: conversation w/ Avati). Flush taking a full file lock causes VMs running on afr volumes to stall when a migration occurs and self-heal is in progress. Make afr_flush() a non-transactional operation. BUG: 874045 Change-Id: If2db83823e280c86b1b29b41361eed7081601632 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4261 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide option to disable readdir failoverPranith Kumar K2012-12-034-25/+41
| | | | | | | | | | | | | | | | | In a replica pair unlike files, directories may not have their content in same order, so readdir for same (offset, size) may not give same entries on both the sobvolumes of replica pair. Switching over from one subvolume to another may not be a good idea sometimes. It may lead to duplicate entries or fewer entries or both. This patch provides a way to disable readdir-failover so that applications like rebalance can retry if they want to. Change-Id: I2b23eb224a2e84016a561362932613ac824c11a0 BUG: 859387 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4159 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Added descriptions to afr optionsPranith Kumar K2012-11-291-11/+86
| | | | | | | | | Change-Id: I4aef1c79743ee08b62e04d7b709f3e8c6b9dc56a BUG: 881517 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4244 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* afr: send unique dict_t instances to replicas in self-heal fxattropBrian Foster2012-11-291-28/+42
| | | | | | | | | | | | | | | | | | | | | afr_sh_data_fxattrop() currently allocates and sends a single xattr dict_t instance to each replica. The callback codepath references the returned object in the self-heal in-memory state for the particular replica. If storage/posix is in the same address-space (i.e., running a single glusterfs client with a fuse->afr->posix graph), the same object is modified and returned for each child, causing corrupted in-memory state and afr xattrs. Allocate and send independent xattr dict_t's for each replica. This allows self-heal to work correctly in a single address-space graph. BUG: 868478 Change-Id: I42832e85b5d1abb6098c28944c717e129300109e Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4149 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: handle short writes in afr_writev_wind and self-heal to avoid corruptionBrian Foster2012-11-295-16/+75
| | | | | | | | | | | | | | | | | | | | | The current failure to handle short writes on writev fops leaves us open to file corruption. A short write on a user request is ignored and leaves replicas in an inconsistent state. A short write during a self-heal is ignored and incorrectly marks the files as consistent if the heal completes. Modify user writev handling to return the best case return value from each of the replicas. Short writes that occur relative to this value are marked as failed and will require a heal. Modify self-heal to set an error on a short write and abort the heal. BUG: 853690 Change-Id: I18b30f58702326249230eeebb361b29e40b535f5 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4150 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: handle GF_XATTR_LOCKINFO_KEY appropriately.Raghavendra G2012-11-272-26/+521
| | | | | | | | | | | | | | values from all children need to be aggregated into a dictionary and serialized buffer of this aggregated dictionary has to be the value of GF_XATTR_LOCKINFO_KEY in the dict sent as a result of fgetxattr. Change-Id: Ie877f7c637c07feaee4c44d7ef86aa967a17b7e7 BUG: 808400 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4121 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* replicate: don't stop checking xattrs because one was absentJeff Darcy2012-11-263-114/+41
| | | | | | | | | | | | | | | | | | | | | The functional issue is described by the subject line. This patch also addresses several efficiency/structure issues, such as... * Calling dict_set_ptr once for each txn type, instead of once overall. * Calling afr_index_for_transaction_type once per iteration instead of once per call (or better yet zero since the conversion is unnecessary). * Implementation of inner functions in a different file than their one caller, creating a spurious header-file dependency. Change-Id: I29e0df906a820533b66b9ced73e015dfe77267d2 BUG: 865825 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4070 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Cluster/afr: Fix output for gluster volume heal vn info healedVenkatesh Somyajula2012-11-267-17/+53
| | | | | | | | | | | | | | | | | | | | | | Problem: Whenever gluster volume heal vol full command is executed, the entries stored in the circual buffer for sh->healed are added in the dictionary in the _crawl_post_sh_action function irrespective of whether actual self heal (due to non-zero values in chage log) takes place or not. Fix: Value of key (actual-sh-done) will be set to 1 whenever self heal takes place due to non-zero change log values and if for some FOP self heal daemon finds that no self heal required after examining the pending matrix, the value will be 0. Change-Id: I11fd0b9ee76759af17c5bca6bfafbaf66bcaacbc BUG: 863068 Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4181 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: check transaction type for eager-lock after it is setPranith Kumar K2012-11-211-6/+6
| | | | | | | | | | | | | | | | | | | | | | | Problem: Eager locking lk-owner decision is taken before transaction type is set. Default transaction type is DATA so all transactions are treated as DATA transactions at the time of eager-locking decision. Fix: Move the code that takes lk-owner decision after the transaction type is set. Test: Checked that the transaction type is set properly in gdb at the time of the lk-owner decision. Change-Id: I7607c7ff4f88c7ced5416a1cddb6586cf45d88f9 BUG: 861335 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4220 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr : Edited log message in afr_sh_entry_expunge_entry_cbkVenkatesh Somyajulu2012-10-121-1/+2
| | | | | | | | | Change-Id: I9f7562d28c8bc798552c403164397f929a7bd1e7 BUG: 860246 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4052 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Preventing client crashing as the callings of GF_CALLOC has been failed.linbaiye2012-10-114-31/+107
| | | | | | | | | | | | | As the callings of GF_CALLOC can seldom come to a failure, glusterfs client will crash due to segment fault. We should have returned once the variables of transaction's local can't be alloced. Change-Id: Ia3798b8349d832b23c7825e64dbad93ebe29cd1b BUG: 861335 Signed-off-by: linbaiye <linbaiye@gmail.com> Reviewed-on: http://review.gluster.org/4005 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: don't use synctask_new from within a synctaskJeff Darcy2012-10-111-3/+14
| | | | | | | | | | Change-Id: Iebf821ff720c63ab6da4b219d82c7f1d00769992 BUG: 862838 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4032 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: Changes and enhancements to XML outputKaushal M2012-10-111-4/+8
| | | | | | | | | | | | | | | | | | | | This patch contains several xml related changes which fix some bugs and introduce xml output for commands which were missing it. These include, * XML output for rebalance & remove-brick status * XML output for replace-brick * XML output for 'volume status all' in on xml document * proper XML output for "volume {create|start|stop|delete}" * type & status of a volume in 'volume info' is now given as a string as well This patch also cleans up the '#if (HAVE_LIB_XML)' sections from the code-base, so that it is not littered around. Change-Id: I5bb022adf0fedf7e3ead92b4b79bfa02b0b5fef5 BUG: 828131 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/3869 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr Changed the message's log level from Error to DebugVenkatesh Somyajulu2012-10-101-3/+3
| | | | | | | | | Change-Id: Ic2506561367bfec9022dc53e9b17b03dc343df95 BUG: 859411 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4055 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>