summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr: Do not attempt entry self-heal if the last lookup on entry ↵Krutika Dhananjay2015-06-181-1/+8
| | | | | | | | | | | | | | | | | | | | | failed on src Backport of: http://review.gluster.org/11119 Test bug-948686.t was causing shd to dump core due to gfid being NULL. This was due to the volume being stopped while index heal's in progress, causing afr_selfheal_unlocked_lookup_on() to fail sometimes on the src brick with ENOTCONN. And when afr_selfheal_newentry_mark() copies the gfid off the src iatt, it essentially copies null gfid. This was causing the assertion as part of xattrop in protocol/client to fail. Change-Id: I81723567af824ce4a9fa37e309eeeab8404ac71e BUG: 1233036 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11309 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* afr: allow readdir to proceed for directories in split-brainv3.6.4beta2Ravishankar N2015-06-151-18/+22
| | | | | | | | | | | | | | | | | | | | | | | Problem: afr_read_txn() bails out if read_subvol==-1. This meant that for directories that were in entry split-brain, FOPS like readdir, access, stat etc were not allowed. Fix: Except for getxattr, all other FOPS are wound on the first up child of afr. Change-Id: Iacec8fbb1e75c4d2094baa304f62331c81a6f670 BUG: 1230242 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/10776 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> (cherry picked from commit 49b428433a03fcf709fdc8c08603b4cf02198e0a) Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11162 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* afr: honour selfheal enable/disable volume set optionsRavishankar N2015-06-152-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/11012 Note: http://review.gluster.org/9459 is not backported to 3.6 but the change it makes to afr_get_heal_info() (i.e. handling ret values) is needed for heal info to work correctly and tests/basic/afr/client-side-heal.t to pass. -------------------------- afr-v1 had the following volume set options that are used to enable/ disable self-heals from happening in AFR xlator when loaded in the client graph: cluster.metadata-self-heal cluster.data-self-heal cluster.entry-self-heal In afr-v2, these 3 heals can happen from the client if there is an inode refresh. This patch allows such heals to proceed only if the corresponding volume set options are set to true. -------------------------- Change-Id: Iebf863758d902fd2f95be320c6791d4e15f634e7 BUG: 1230259 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11170 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* core: fix Ubuntu code audit (cppcheck) resultsKaleb S. KEITHLEY2015-06-102-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See also http://review.gluster.org/#/c/8064/, BZ 1109180, and http://review.gluster.org/#/c/7693/, BZ 1091677 AFAICT these are false positives: [geo-replication/src/gsyncd.c:100]: (error) Memory leak: str [geo-replication/src/gsyncd.c:403]: (error) Memory leak: argv [xlators/nfs/server/src/nlm4.c:1201]: (error) Possible null pointer dereference: fde [xlators/cluster/afr/src/afr-self-heal-common.c:138]: (error) Possible null pointer dereference: __ptr [xlators/cluster/afr/src/afr-self-heal-common.c:140]: (error) Possible null pointer dereference: __ptr [xlators/cluster/afr/src/afr-self-heal-common.c:331]: (error) Possible null pointer dereference: __ptr Test program: [extras/test/test-ffop.c:27]: (error) Buffer overrun possible for long command line arguments. [tests/basic/fops-sanity.c:55]: (error) Buffer overrun possible for long command line arguments. the remainder are fixed with this change-set: [cli/src/cli-rpc-ops.c:8883]: (error) Possible null pointer dereference: local [cli/src/cli-rpc-ops.c:8886]: (error) Possible null pointer dereference: local [contrib/uuid/gen_uuid.c:369]: (warning) %ld in format string (no. 2) requires 'long *' but the argument type is 'unsigned long *'. [contrib/uuid/gen_uuid.c:369]: (warning) %ld in format string (no. 3) requires 'long *' but the argument type is 'unsigned long *'. [xlators/cluster/dht/src/dht-rebalance.c:1734]: (error) Possible null pointer dereference: ctx [xlators/cluster/stripe/src/stripe.c:4940]: (error) Possible null pointer dereference: local [xlators/mgmt/glusterd/src/glusterd-geo-rep.c:1718]: (error) Possible null pointer dereference: command [xlators/mgmt/glusterd/src/glusterd-replace-brick.c:942]: (error) Resource leak: file [xlators/mgmt/glusterd/src/glusterd-replace-brick.c:1026]: (error) Resource leak: file [xlators/mgmt/glusterd/src/glusterd-sm.c:249]: (error) Possible null pointer dereference: new_ev_ctx [xlators/mgmt/glusterd/src/glusterd-snapshot.c:6917]: (error) Possible null pointer dereference: volinfo [xlators/mgmt/glusterd/src/glusterd-utils.c:4517]: (error) Possible null pointer dereference: this [xlators/mgmt/glusterd/src/glusterd-utils.c:6662]: (error) Possible null pointer dereference: this [xlators/mgmt/glusterd/src/glusterd-utils.c:7708]: (error) Possible null pointer dereference: this [xlators/mount/fuse/src/fuse-bridge.c:4687]: (error) Uninitialized variable: finh [xlators/mount/fuse/src/fuse-bridge.c:3080]: (error) Possible null pointer dereference: state [xlators/nfs/server/src/nfs-common.c:89]: (error) Dangerous usage of 'volname' (strncpy doesn't always null-terminate it). [xlators/performance/quick-read/src/quick-read.c:586]: (error) Possible null pointer dereference: iobuf Rerunning cppcheck after fixing the above: As before, test program: [extras/test/test-ffop.c:27]: (error) Buffer overrun possible for long command line arguments. [tests/basic/fops-sanity.c:55]: (error) Buffer overrun possible for long command line arguments. As before, false positive: [geo-replication/src/gsyncd.c:100]: (error) Memory leak: str [geo-replication/src/gsyncd.c:403]: (error) Memory leak: argv [xlators/nfs/server/src/nlm4.c:1201]: (error) Possible null pointer dereference: fde [xlators/cluster/afr/src/afr-self-heal-common.c:138]: (error) Possible null pointer dereference: __ptr [xlators/cluster/afr/src/afr-self-heal-common.c:140]: (error) Possible null pointer dereference: __ptr [xlators/cluster/afr/src/afr-self-heal-common.c:331]: (error) Possible null pointer dereference: __ptr False positive after fix: [xlators/performance/quick-read/src/quick-read.c:584]: (error) Possible null pointer dereference: iobuf Change-Id: Ia5a256281156dd1df2fa900caea7694d9d0a7077 BUG: 1122290 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/8351 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Treat op_ret >= 0 as success in afr_final_errno()Krutika Dhananjay2015-06-031-1/+1
| | | | | | | | | | | | Backport of: http://review.gluster.org/10946 Change-Id: I6ca36fee5dc46f2a2afe04f3359c4102f45c3ae0 BUG: 1225745 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10961 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* Tests: fix spurious failure in read-subvol-entry.tEmmanuel Dreyfus2015-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | read-subvol-entry.t tests that if a brick has pending operations, it is not used for readdir operations. On NetBSD this test exhibits spurious failures, with the wrong brick being used to perform readdir. It happens because when afr_replies_interpret() looks at xattr for pending attributes, it uses alternative bahvior whether it is working on a directory or another object. The decision is based on inode->ia_type, which may be IA_INVAL at that time if we come there from: afr_replies_interpret.() afr_xattrs_are_equal() afr_lookup_metadata_heal_chec() afr_lookup_entry_heal() afr_lookup_cbk() Using replies[i].poststat.ia_type, which is correctly set, works around the problem. Resubmitted as is after rebase. This is a backport of Id9ccdd8604f79a69db5f1902697f8913acac50ad BUG: 1138897 Change-Id: I73f5e04dec86e648a28363f417559b0cbf80324d Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10178 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/dht: Unwind with proper op_retRaghavendra Talur2015-05-041-12/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Expected behavior of get_real_filename feature. A getxattr on a existing dir with glusterfs.get_real_filename:<filename> as key should result in one of the following things. a. A value returned for that key having the real filename (a file whose match is a case insensitive match to the filename passed in key). b. op_ret = -1 and errno set to ENOENT meaning that no such file exists under the specified dir in any case. c. op_ret = -1 and errno set to ENODATA. This is a case assuming no xlator interprets the glusterfs.get_real_filename key and it get passed down to the posix xlator. Naturally, posix xlator would not find any xattr with this key and would return ENODATA. This will be interpreted specially by the caller as the feature not being supported by underlying glusterfs. 2. What assumptions are wrong? Initially the key used to be user.glusterfs.get_real_filename. In that case, when posix xlator did a getxattr call it would have received ENODATA as error. However, the key has now changed to glusterfs.get_real_filename. This leads to a EOPNOTSUPP error instead. Considering the above information, this is a rewrite of get_real_filename logic in dht. Change-Id: I012e9150047fc8563be91b0d112a368ac1cbf598 BUG: 1204140 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9956 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> (cherry picked from commit 331e705b6a458600c0b5cbcf2b0f7b9e1167bdc2) Reviewed-on: http://review.gluster.org/10403
* ec: Special handling of anonymous fdv3.6.3beta2Xavier Hernandez2015-03-301-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | Anonymous file descriptors need to be handled specially because they can be used in some non standard ways (i.e. an anonymous fd can be used without having been opened). This caused NFS to fail on some operations because ec always expected to have a previous successful opendir call (from patch http://review.gluster.org/9098/). This patch treats all anonymous fd as opened on all subvolumes. This is a backport of http://review.gluster.org/9513/ Change-Id: I09dbbce2ffc1ae3a5bcbb328bed55b84f4f0b9f8 BUG: 1187526 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9596 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/ec: Wait for all bricks to notify before notifying parentPranith Kumar K2015-03-301-14/+34
| | | | | | | | | | | | | | Backport of http://review.gluster.org/9523 This is to prevent spurious heals that can result in self-heal. BUG: 1188471 Change-Id: Iaea335d59431d8d85a236963a365f5c791fc7c49 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9552 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/ec: Handle CHILD UP/DOWN in all casesPranith Kumar K2015-03-302-104/+134
| | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9396 Problem: When all the bricks are down at the time of mounting the volume, then mount command hangs. Fix: 1. Ignore all CHILD_CONNECTING events comming from subvolumes. 2. On timer expiration (without enough up or down childs) send CHILD_DOWN. 3. Once enough up or down subvolumes are detected, send the appropriate event. When rest of the subvols go up/down without changing the overall ec-up/ec-down send CHILD_MODIFIED to parent subvols. BUG: 1188471 Change-Id: If92bd84107d49495cd104deb34601afe7f9b155c Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9551 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: serialize inode locksPranith Kumar K2015-03-252-60/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9372 Problem: Afr winds inodelk calls without any order, so blocking inodelks from two different mounts can lead to dead lock when mount1 gets the lock on brick-1 and blocked on brick-2 where as mount2 gets lock on brick-2 and blocked on brick-1 Fix: Serialize the inodelks whether they are blocking inodelks or non-blocking inodelks. Non-blocking locks also need to be serialized. Otherwise there is a chance that both the mounts which issued same non-blocking inodelk may endup not acquiring the lock on any-brick. Ex: Mount1 and Mount2 request for full length lock on file f1. Mount1 afr may acquire the partial lock on brick-1 and may not acquire the lock on brick-2 because Mount2 already got the lock on brick-2, vice versa. Since both the mounts only got partial locks, afr treats them as failure in gaining the locks and unwinds with EAGAIN errno. BUG: 1189023 Change-Id: If5dd502d9d25d12425749a8efcf08a1423b29255 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9576 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Make read child match check in afr optionalKrutika Dhananjay2015-03-253-0/+21
| | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/9917 Having this particular check which was introduced by commit c57c455347a72ebf0085add49ff59aae26c7a70d causes a drop in performance in readdirp. So the behavior is made configurable with this patch. Change-Id: I4a19813cfc786504340264a5a5533a0c43a1d4a4 BUG: 1202673 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9929 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* afr: remove stale index entriesRavishankar N2015-03-252-3/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9714 Problem: During pre-op phase, the index xlator 1. Creates the entry inside .glusterfs/indices/xattrop 2. Winds the xattrop fop to posix to mark dirty/pending changelogs. If the brick crashes after 1, the xattrop entry becomes stale and never gets removed by shd during subsequent crawls because there is nothing to heal (changelogs are zero). Though the stale entry does not get displayed in the output of 'heal info' command, it nevertheless stays there forever unless a new write tansaction is performed on the file. Fix: During index self-heal if afr xattrs are found to be clean (indicated by ret value of 2 on a call to afr_shd_selfheal(), send a dummy post-op with all 0s for the xattr values, which makes the index xlator to unlink the stale entry. Change-Id: Iffb171e40490abd8d44df09ccc058b5da67baafe BUG: 1203081 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9920 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Convert quota size from n/w to host order before useKrutika Dhananjay2015-03-161-0/+4
| | | | | | | | | | | | | | Backport of: http://review.gluster.org/9853 Change-Id: I83f1ab16a2dc54841e7beff3033333fba009b3a4 BUG: 1201622 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9884 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Handle getxattr of quota-size keyPranith Kumar K2015-03-143-61/+103
| | | | | | | | | | | | | | | Backport of http://review.gluster.org/9820 Afr needs to query QUOTA_SIZE_KEY from all the subvolumes and return the value which is maximum of the readable bricks. BUG: 1201624 Change-Id: I41725a7323020c1480c38560dc5ae2c2e82d6d47 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9873 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Do not increment healed_count if no healing was performedKrutika Dhananjay2015-03-146-26/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/9713 PROBLEM: When file modifications are happening while index heal is launched, index healer could pick up entries which appeared in indices/xattrop transiently during the course of the operations on the mount point, and do not really need any heal. This will cause index healer to keep doing index-heal in a loop as long as it finds this entry, by believing that it did successfully heal some gfids even when it didn't. FIX: afr_selfheal() now returns a 1 to indicate that it did not (need to) heal a given gfid. afr_shd_selfheal() will not increment healed_count whenever afr_selfheal() returns a 1. Change-Id: I9158c814419b635fac3dfe2fe40c94d1548ea4e8 BUG: 1194306 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9852 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* afr: stop encoding subvolume id in readdir d_offAnand Avati2015-03-033-131/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9332 The purpose of encoding d_off in AFR is to indicate the selected subvolume for the first readdir, and continue all further readdirs of the session on the same subvolume. This is required because, unlike files, dir d_offs are specific to the backend and cannot be re-used on another subvolume. The d_off transformation encodes the subvolume id and prevents such invalid use of d_offs on other servers. However, this approach could be quite wasteful of precious d_off bit-space. Unlike DHT, where server id can change from entry to entry and thus encoding the server id in the transformed d_off is necessary, we could take a slightly relaxed approach in AFR. The approach is to save the subvolume where the last readdir request was sent in the fd_ctx. This consumes constant space (i.e no per-entry cache), and serves the purpose of avoiding d_off "misuse" (i.e using d_off from one server on another). The compromise here is NFS resuming readdir from a non-0 cookie after an extended delay (either anonymous FD has been reclaimed, or server has restarted). In such cases a subvolume is picked freshly. To make this fresh picking more deterministic (i.e, to pick the same subvolume whenever possible, even after reboots), the function afr_hash_child (used by afr_read_subvol_select_by_policy) is modified to skip all dynamic inputs (i.e PID) for the case of directories. BUG: 1191537 Change-Id: I7e3bd8dfe346a9a8e428d7ddeada6cfb66e64e54 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/9638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Crawl should continue on self-heal failuresPranith Kumar K2015-02-221-3/+2
| | | | | | | | | | | | | | | | | | | | Problem: If self-heal fails on the last entry of readdir response list, index heal stops. Fix: Continue crawl even with partial failures. PS: Bug seems to exist only on 3.6.2, on master the code is fine after http://review.gluster.com/9485 Change-Id: Ie39b0d424297e3c95a05cbe72438dfd9a4d5696d BUG: 1192522 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9653 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Don't use inodelk on getxattr when clearing locksXavier Hernandez2015-02-112-9/+12
| | | | | | | | | | | | | | | | | | | | | | | When command 'clear-locks' from cli is executed, a getxattr request is received by ec. This request was handled as usual, first locking the inode. Once this request was processed by the bricks, all locks were removed, including the lock used by ec. When ec tried to unlock the previously acquired lock (which was already released), caused a crash in glusterfsd. This fix executes the getxattr request without any lock acquired for the clear-locks command. This is a backport of http://review.gluster.org/9440/ Change-Id: I77e550d13c4673d2468a1e13fe6e2fed20e233c6 BUG: 1181977 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9444 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix posix compliance failuresXavier Hernandez2015-02-116-87/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch solves some problems that caused dispersed volumes to not pass posix smoke tests: * Problems in open/create with O_WRONLY Opening files with -w- permissions using O_WRONLY returned an EACCES error because internally O_WRONLY was replaced with O_RDWR. * Problems with entrylk on renames. When source and destination were the same, ec tried to acquire the same entrylk twice, causing a deadlock. * Overwrite of a variable when reordering locks. On a rename, if the second lock needed to be placed at the beggining of the list, the 'lock' variable was overwritten and later its timer was cancelled, cancelling the incorrect one. * Handle O_TRUNC in open. When O_TRUNC was received in an open call, it was blindly propagated to child subvolumes. This caused a discrepancy between real file size and the size stored into trusted.ec.size xattr. This has been solved by removing O_TRUNC from open and later calling ftruncate. This is a backport of http://review.gluster.org/9420 Change-Id: I20c3d6e1c11be314be86879be54b728e01013798 BUG: 1159471 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9501 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* afr : Fixes to 59ba78ae1461651e290ce72013786d828545d4c1Anuradha2015-02-111-0/+4
| | | | | | | | | | | | | | | | | | | A bug was found while reviewing http://review.gluster.org/9377/ (inode was not being unreffed after use). It is fixed on master as a part of the following change - http://review.gluster.org/#/c/9377/7/xlators/cluster/afr/src/afr-common.c . Fixing the same issue in 3.6 with this patch. Change-Id: Ibdc4be49d88613ccb1f80349eb4d368710c0c24b BUG: 1173528 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9450 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: When parent and entry read subvols are different, set ↵Krutika Dhananjay2015-02-112-23/+94
| | | | | | | | | | | | | | | | | entry->inode to NULL Backport of: http://review.gluster.org/#/c/9477 That way a lookup would be forced on the entry, and its attributes will always be selected from its read subvol. Change-Id: Iaba25e2cd5f83e983fc8b1a1f48da3850808e6b8 BUG: 1186119 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9562 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix failures with missing filesXavier Hernandez2015-02-112-102/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When a file does not exist on a brick but it does on others, there could be problems trying to access it because there was some loc_t structures with null 'pargfid' but 'name' was set. This forced inode resolution based on <pargfid>/name instead of <gfid> which would be the correct one. To solve this problem, 'name' is always set to NULL when 'pargfid' is not present. Another problem was caused by an incorrect management of errors while doing incremental locking. The only allowed error during an incremental locking was ENOTCONN, but missing files on a brick can be returned as ESTALE. This caused an EIO on the operation. This patch doesn't care of errors during an incremental locking. At the end of the operation it will check if there are enough successfully locked bricks to continue or not. This is a backport of http://review.gluster.org/9407/ Change-Id: I9360ebf8d819d219cea2d173c09bd37679a6f15a BUG: 1183716 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9560 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/ec: Do not modify quota, selinux xattrs in healingPranith Kumar K2015-02-041-14/+38
| | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9401 Problem: EC heal tries to heal quota-size, selinux xattrs as well. quota-size is private to the brick but since quotad accesses them using the standard interface as well, they can not be filtered in the fops. Fix: Ignore QUOTA_SIZE_KEY and SELINUX xattrs during heal. BUG: 1178590 Change-Id: Id569a49ef996e5507f4474c99b6cdc22781ad82d Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9454 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/ec: Handle internal xattr get/setPranith Kumar K2015-02-046-115/+188
| | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9385 Problem: Internal xattrs of EC like trusted.ec.size/config/version can be modified by users and that can lead to misbehavior in EC. Fix: Don't let the user modify the xattrs. Hide these xattrs in getfattr outputs. BUG: 1182490 Change-Id: Ie32ebb95ee67cabbb9488951097a517172b45bcf Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9455 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* afr: Don't write to sparse regions of sink.Ravishankar N2015-02-031-2/+39
| | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9480 Problem: When data-self-heal-algorithm is set to 'full', shd just reads from source and writes to sink. If source file happened to be sparse (VM workloads), we end up actually writing 0s to the corresponding regions of the sink causing it to lose its sparseness. Fix: If the source file is sparse, and the data read from source and sink are both zeros for that range, skip writing that range to the sink. Change-Id: Id23d953fe2c8c64cde5ce3530b52ef91a7583891 BUG: 1187547 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9515 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/dht: Fix incorrect updates to parent timesKrutika Dhananjay2015-02-031-6/+5
| | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/9457 In directory write FOPs, as far as updates to timestamps associated with parent by DHT is concerned, there are three possibilities: a) time (in sec) gotten from child of DHT < time (in sec) in inode ctx b) time (in sec) gotten from child of DHT = time (in sec) in inode ctx c) time (in sec) gotten from child of DHT > time (in sec) in inode ctx In case (c), for time in nsecs, it is the value returned by DHT's child that must be selected. But what DHT_UPDATE_TIME ends up doing is to choose the maximum of (time in nsec gotten from DHT's child, time in nsec in inode ctx). Change-Id: I1046bb7147ab48eca5c477be03ef2d4630e7752d BUG: 1186119 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9490 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* Cluster/DHT : Fixed crash due to null derefv3.6.2Nithya Balachandran2015-01-211-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | A lookup on a linkto file whose trusted.glusterfs.dht.linkto xattr points to a subvol that is not part of the volume can cause the brick process to segfault due to a null dereference. Modified to check for a non-null value before attempting to access the variable. > Change-Id: Ie8f9df058f842cfc0c2b52a8f147e557677386fa > BUG: 1159571 > Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> > Reviewed-on: http://review.gluster.org/9034 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: venkatesh somyajulu <vsomyaju@redhat.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> > Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Change-Id: I53b086289d2386d269648653629a0750baae07a4 BUG: 1184191 Reviewed-on: http://review.gluster.org/9467 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* afr : glfs-heal implementationAnuradha2015-01-0610-38/+400
| | | | | | | | | | | | | Backport of http://review.gluster.org/6529 and http://review.gluster.org/9119 Change-Id: Ie420efcb399b5119c61f448b421979c228b27b15 BUG: 1173528 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9335 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/dht: Change log level to avoid annoying logsVijay Bellur2015-01-051-1/+1
| | | | | | | | | | | [2015-01-04 08:03:23.820376] I [dht-common.c:1822:dht_lookup_cbk] 0-patchy-dht: Entry /tls missing on subvol patchy-replicate-0 Change-Id: Id9eae47213ed39e8bf969e82cc7b935dfada4598 BUG: 1138385 Reviewed-on: http://review.gluster.org/9382 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Make entry-self-heal in afr-v2 compatible with afr-v1Pranith Kumar K2015-01-041-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/9227 Problem: entry self-heal in 3.6 and above, takes full lock on the directory only for the duration of figuring out the xattrs of the directories where as 3.5 takes locks through out the entry-self-heal. If the cluster is heterogeneous then there is a chance that 3.6 self-heal is triggered and then 3.5 self-heal will also triggered and both the self-heal daemons of 3.5 and 3.6 do self-heal. Fix: In 3.6.x and above get an entry lock on a very long name before entry self-heal begins so that 3.5 entry self-heal will not get locks until 3.6.x entry self-heal completes. BUG: 1177418 Change-Id: Iecf49d794c6b480e38563e39599a40067b3a21cb Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9355 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix mutex related coverity scan issuesXavier Hernandez2015-01-042-11/+19
| | | | | | | | | | | | | | | | | | | | | | | This patch solves 3 issues detected by coverity scan: CID1241484 Data race condition CID1241486 Data race condition CID1256173 Thread deadlock CID1257622 Thread deadlock With this patch, inode lock is never acquired inside a region locked with fop->lock. This is a backport of http://review.gluster.org/9230/ and http://review.gluster.org/9263/ Change-Id: I35c4633efd1b68b9f72b42661fa7c728b1f52c6a BUG: 1170954 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9244 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* Avoid spurious directory metedata split brainEmmanuel Dreyfus2014-12-231-1/+115
| | | | | | | | | | | | | | | | | | | | | When directory content is modified, [mc]time is updated. On Linux, the filesystem does it, while at least on NetBSD, the kernel file-system independant code does it. This means that when entries are added while bricks are down, the kernel sends a SETATTR [mc]time which will cause metadata split brain for the directory. In this case, clear the split brain by finding the source with the most recent modification date. Backport of: Ic0177e0df753a4748624d0b906834ed54593adb9 BUG: 1138897 Change-Id: Ic2e697be4f0074e2bbd3f23d6ad40a2d2d126a2c Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9319 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix self-healing issues.Xavier Hernandez2014-12-2110-312/+549
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Three problems have been detected: 1. Self healing is executed in background, allowing the fop that detected the problem to continue without blocks nor delays. While this is quite interesting to avoid unnecessary delays, it can cause spurious failures of self-heal because it may try to recover a file inside a directory that a previous self-heal has not recovered yet, causing the file self-heal to fail. 2. When a partial self-heal is being executed on a directory, if a full self-heal is attempted, it won't be executed because another self-heal is already in process, so the directory won't be fully repaired. 3. Information contained in loc's of some fop's is not enough to do a complete self-heal. To solve these problems, I've made some changes: * Improved ec_loc_from_loc() to add all available information to a loc. * Before healing an entry, it's parent is checked and partially healed if necessary to avoid failures. * All heal requests received for the same inode while another self-heal is being processed are queued. When the first heal completes, all pending requests are answered using the results of the first heal (without full execution), unless the first heal was a partial heal. In this case all partial heals are answered, and the first full heal is processed normally. * An special virtual xattr (not physically stored on bricks) named 'trusted.ec.heal' has been created to allow synchronous self-heal of files. Now, the recommended way to heal an entire volume is this: find <mount> -d -exec getfattr -h -n trusted.ec.heal {} \; Some minor changes: * ec_loc_prepare() has been renamed to ec_loc_update(). * All loc management functions return 0 on success and -1 on error. * Do not delay fop unlocks if heal is needed. * Added basic ec xattrs initially on create, mkdir and mknod fops. * Some coding style changes This is a backport of http://review.gluster.org/9072/ Change-Id: I2a5fd9c57349a153710880d6ac4b1fa0c1475985 BUG: 1159484 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9073 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* telldir()/seekdir() portability fixesEmmanuel Dreyfus2014-12-201-6/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POSIX says that an offset obtained from telldir() can only be used on the same DIR *. Linux is abls to reuse the offset accross closedir()/opendir() for a given directory, but this is not portable and such a behavior should be fixed. An incomplete fix for the posix xlator was merged in http://review.gluster.org/8933 This change set completes it. - Perform the same fix index xlator. - Use appropriate casts and variable types so that 32 bit signed offsets obtained by telldir() do not get clobbered when copied into 64 bit signed types. - modify afr-self-heald.c so that it does not use anonymous fd, since this will cause closedir()/opendir() between each syncop_readdir(). On failure we fallback to anonymous fs only for Linux so that we can cope with updated client vs not updated brick. - Avoid sending an EINVAL when the client request for the EOF offset. Here we fix an error in previous fix for posix xlator: since we fill each directory entry with the offset of the next entry, we must consider as EOF the offset of the last entry, and not the value of telldir() after we read it. This is a backport of I59fb7f06a872c4f98987105792d648141c258c6a BUG: 1138897 Change-Id: I1e9f3e4a7d780b98adf6d9f197ee2198d43ef94d Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/9084 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Remove O_APPEND from flags on create and open.Xavier Hernandez2014-12-184-53/+60
| | | | | | | | | | | | | | | | | | | | | | Allowing O_APPEND flag to pass through to the brick files corrupts fragment contents because writes are not stored on the desired place. Write fop has been modified so that it uses current file size as its write offset. This guarantees that all writes, even those comming from different file descriptors and clients, will write to the end of the file. This is backport of http://review.gluster.org/9079/ Change-Id: I9f721f12217a98231fe52e344166d1c94172c272 BUG: 1161885 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9080 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix incorrect value of EC_MAX_NODESXavier Hernandez2014-12-182-1/+4
| | | | | | | | | | | | | | | | | EC_MAX_NODES was incorrectly calculated. Now the value if computed as the minimum between the theoretical maximum and the limit imposed by the Galois Field. This is a backport of http://review.gluster.org/9193/ Change-Id: I75a8345147f344f051923d66be2c10d405370c7b BUG: 1170959 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9245 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Fix return errors when not enough bricksXavier Hernandez2014-12-128-16/+80
| | | | | | | | | | | | | | | | | | | | | | | | | Changes introduced by this patch: * Fix an incorrect error propagation when the state of the life cycle of a fop returns an error. * Fix incorrect unlocking of failed locks. * Return ENOTCONN if there aren't enough bricks online. * In readdir(p) check that the fd has been successfully open by a previous opendir. This is a backport of http://review.gluster.org/9098/ Change-Id: Ib44f25a1297849ebcbab839332f3b6359f275ebe BUG: 1161066 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9107 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* ec: Change licenseXavier Hernandez2014-12-1223-369/+138
| | | | | | | | | | | | | This is a backport of http://review.gluster.org/9201/ Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b BUG: 1170515 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9233 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Eliminate locking in sh domain in metadata self-healKrutika Dhananjay2014-12-091-35/+2
| | | | | | | | | | | | | Backport of: http://review.gluster.org/9240 Change-Id: I2ab2ad9a02d88c299cfb32e0cf6baa44d1c2ee12 BUG: 1171077 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9246 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Raghavendra Bhat <raghavendra@redhat.com>
* features/marker: Filter internal xattrs in lookupPranith Kumar K2014-11-163-7/+21
| | | | | | | | | | | | | | Backport of http://review.gluster.org/9061 Afr should ignore quota-size-key as part of self-heal but should heal quota-limit key. BUG: 1163569 Change-Id: I93d203002eac4fe20b70730c27c852d783c16d7f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9110 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Preserve errno in case of failures on all subvolsPranith Kumar K2014-11-161-5/+16
| | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/8984 Problem: When quorum is enabled and the fop fails on all the subvolumes, op_errno is set to EROFS which overrides the actual errno returned from bricks. Fix: Don't override the errno when fop fails on all subvols. Change-Id: I61e57bbf1a69407230ec172a983de18d1c624fd2 BUG: 1162122 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9087 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Avoid self-heal on directories on (f)stat callsXavier Hernandez2014-11-151-1/+2
| | | | | | | | | | | | | | | | | | | | | To avoid inconsistent directory listings, a full self-heal cannot happen on a directory until all its contents have been healed. This is controlled by a manual command using getfattr recursively and in post-order. While navigating the directories, sometimes an (f)stat fop can be sent. This fop caused a full self-heal of the directory. This patch makes that (f)stat only initiates a partial self-heal. This is a backport of http://review.gluster.org/9117/ Change-Id: I0a92bda8f4f9e43c1acbceab2d7926944a8a4d9a BUG: 1159498 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9118 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Correctly handle quota xattrsXavier Hernandez2014-11-151-0/+53
| | | | | | | | | | | This is a backport of http://review.gluster.org/8990/ Change-Id: I35e11d83c318210d44b918e847cf13db35b01510 BUG: 1158088 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8992 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* cluster/afr: Perform post-op in entry selfheal inside locksKrutika Dhananjay2014-10-311-3/+31
| | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/9020 Take entrylks in xlator domain before doing post-op (undo-pending) in entry self-heal. This is to prevent a parallel name self-heal on an entry under @fd->inode from reading pending xattrs while it is being modified by SHD after entry sh below, given that name self-heal takes locks ONLY in xlator domain and is free to read pending changelog in the absence of the following locking. Change-Id: I0bc92978efc0741d6e3f2439540d008e31472313 BUG: 1136821 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9030 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* ec: Correctly handle xtime extended attributeXavier Hernandez2014-10-281-2/+39
| | | | | | | | | Change-Id: I2bd34f063d6bf1835d5ae57a8e9aa03f3ec3deb3 BUG: 1156405 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8976 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Fix rebalance issuesXavier Hernandez2014-10-275-113/+218
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some issues in ec xlator made that rebalance didn't complete successfully and generated some warnings and errors in the log. The most critical error was a race condition that caused false corruption detection when two specific operations were executed sequentially and they shared the same lock. This explains the problem: 1. A setxattr is issued. 2. setxattr: ec locks the inode before updating the xattr. 3. setxattr: The xattr is updated. 4. setxattr: Upper xlator is notified that the operation completed. 5. setxattr: A background task is initiated to update the version of the file. 6. A stat is issued on the same file. 7. stat: Since the lock is already acquired, it's reused. 8. stat: A lookup is issued to determine version and size information of the file. At this point, operations 5 and 8 can interfere. This can make that lookup sees different information on each brick, determining that some bricks are corrupted and incorrectly excluding them from the operation and initiating a self-heal. In some cases this false detection combined with self-heal could lead to invalid updates of the trusted.ec.size xattr, leaving the file smaller than it should be. This only happens if the first operation does not perform a lookup, because chained operations reuse the information returned by the previous one, avoiding this kind of problems. To solve this, now the background update is executed atomically with the posterior unlock. This avoids some reuses of the lock while updating. However this reduces performance because the window in which new requests can reuse the lock is much smaller now. This has been alleviated by using the same technique implemented in AFR (i.e. waiting some time before releasing the lock). Some minor changes also introduced in this patch: * Bug in management of 'trusted.glusterfs.pathinfo' that was writing beyond the allocated space. * Uninitialized variable. * trusted.ec.config was not created for regular files created with mknod. * An invalid state was used in access fop. This is a backport of http://review.gluster.org/8947/ Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395 BUG: 1152903 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8948 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* logs: Do selective logging for errnosPranith Kumar K2014-10-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/8918 http://review.gluster.org/8955 Problem: Just after replace-brick the mount logs are filled with ENOENT/ESTALE warning logs because the file is yet to be self-healed now that the brick is new. Fix: Do conditional logging for the logs. ENOENT/ESTALE will be logged at lower log level. Only when debug logs are enabled, these logs will be written to the logfile. Change-Id: If203d09e2479e8c2415ebc14fb79d4fbb81dfc95 BUG: 1155027 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8957 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Add afr-v1 xattr compatibilityPranith Kumar K2014-10-226-83/+330
| | | | | | | | | | | | | | | Backport of http://review.gluster.org/8536 All the special cases v1 handles and also self-accusing pending changelog from v1 pre-op also is handled in this patch. BUG: 1155017 Change-Id: I86cf6b80492be5c1f240c74f91a0e1b0dd9b58b2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/8956 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Fix self-heal issuesXavier Hernandez2014-10-2213-302/+391
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Doing an 'ls' of a directory that has been modified while one of the bricks was down, sometimes returns the old directory contents. Cause: Directories are not marked when they are modified as files are. The ec xlator balances requests amongst available and healthy bricks. Since there is no way to detect that a directory is out of date in one of the bricks, it is used from time to time to return the directory contents. Solution: Basically the solution consists in use versioning information also for directories, however some additional changes have been necessary. Changes: * Use directory versioning: This required to lock full directory instead of a single entry for all requests that add or remove entries from it. This is needed to allow atomic version update. This affects the following fops: create, mkdir, mknod, link, symlink, rename, unlink, rmdir Another side effect is that opendir requires to do a previous lookup to get versioning information and discard out of date bricks for subsequent readdir(p) calls. * Restrict directory self-heal: Till now, when one discrepancy was found in lookup, a self-heal was automatically started. This caused the versioning information of a bad directory to be healed instantly, making the original problem to reapear again. To solve this, when a missing directory is detected in one or more bricks on lookup or opendir fops, only a partial self-heal is performed on it. A partial self-heal basically creates the directory but does not restore any additional information. This avoids that an 'ls' could repair the directory and cause the problem to happen again. With this change, output of 'ls' is always consistent. However, since the directory has been created in the brick, this allows any other operation on it (create new files, for example) to succeed on all bricks and not add additional work to the self-heal process. To force a self-heal of a directory, any other operation must be done on it. For example a getxattr. With these changes, the correct healing procedure that would avoid inconsistent directory browsing consists on a post-order traversal of directoriesi being healed. This way, the directory contents will be healed before healing the directory itslef. * Additional changes to fix self-heal errors - Don't use fop->fd to decide between fd/loc. open, opendir and create have an fd, but the correct data is in loc. - Fix incorrect management of bad bricks per inode/fd. - Fix incorrect selection of fop's target bricks when there are bad bricks involved. - Improved ec_loc_parent() to always return a parent loc as complete as possible. This is a backport of http://review.gluster.org/8916/ Change-Id: Iaf3df174d7857da57d4a87b4a8740a7048b366ad BUG: 1149727 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8946 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>