summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr/src/afr.h
Commit message (Collapse)AuthorAgeFilesLines
* afr: support self-heal data trylock mechanismBrian Foster2012-12-041-0/+1
| | | | | | | | | | | | | | | Introduce a block flag to support an optional blocking or non-blocking mode in the self-heal data locking mechanism. All callers are modified to use blocking mode, which is the current default behavior (no change in behavior is introduced by this commit). BUG: 874045 Change-Id: Ib7ff9984578fa11de4e3b6981508100cdddd37cd Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4257 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide option to disable readdir failoverPranith Kumar K2012-12-031-0/+2
| | | | | | | | | | | | | | | | | In a replica pair unlike files, directories may not have their content in same order, so readdir for same (offset, size) may not give same entries on both the sobvolumes of replica pair. Switching over from one subvolume to another may not be a good idea sometimes. It may lead to duplicate entries or fewer entries or both. This patch provides a way to disable readdir-failover so that applications like rebalance can retry if they want to. Change-Id: I2b23eb224a2e84016a561362932613ac824c11a0 BUG: 859387 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4159 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: handle short writes in afr_writev_wind and self-heal to avoid corruptionBrian Foster2012-11-291-0/+7
| | | | | | | | | | | | | | | | | | | | | The current failure to handle short writes on writev fops leaves us open to file corruption. A short write on a user request is ignored and leaves replicas in an inconsistent state. A short write during a self-heal is ignored and incorrectly marks the files as consistent if the heal completes. Modify user writev handling to return the best case return value from each of the replicas. Short writes that occur relative to this value are marked as failed and will require a heal. Modify self-heal to set an error on a short write and abort the heal. BUG: 853690 Change-Id: I18b30f58702326249230eeebb361b29e40b535f5 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4150 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Cluster/afr: Fix output for gluster volume heal vn info healedVenkatesh Somyajula2012-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | Problem: Whenever gluster volume heal vol full command is executed, the entries stored in the circual buffer for sh->healed are added in the dictionary in the _crawl_post_sh_action function irrespective of whether actual self heal (due to non-zero values in chage log) takes place or not. Fix: Value of key (actual-sh-done) will be set to 1 whenever self heal takes place due to non-zero change log values and if for some FOP self heal daemon finds that no self heal required after examining the pending matrix, the value will be 0. Change-Id: I11fd0b9ee76759af17c5bca6bfafbaf66bcaacbc BUG: 863068 Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4181 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Provide option to set readdir-size in entry-self-healPranith Kumar K2012-10-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Entry self-heal does lookups on all the entries that are read in readdir. More the size of readdir more number of lookups happen in parallel. It is observed that it leads to HUGE cpu spikes rendering everything else on the system unusable. Fix: Provided the option self-heal-readdir-size to configure the size. Default value is at 1KB. Tests: Checked that the readdirs are happening with the configured value in entry-self-heal. Change-Id: Icaa937ad88857e6f9a12375b1e7f6a49192bc8b1 BUG: 860895 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/4002 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Unwind with correct pre/post parent bufsPranith Kumar K2012-08-021-55/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA: In case of dir fops create, mknod, mkdir, link, symlink, rename if the fop fails on read-child then unwinds are happening with all-zero pre/post iatt-bufs. The bug occurs because the parent bufs are not saved if the response is not from read-child. Fix: Save the pre/post-bufs for the first response. If the response comes from read-child, overwrite whatever we have cached. Tests: Attached the mount process to gdb. Tested that the unwinds happen with proper pre/post iatt bufs in the following cases: 1) All success case 2) Failure on read-child 3) Failure on non-read-child 4) Failure on all children. Tested soft-link self-heal to test the change made in that. Tested errno ENOTEMPTY for rmdir, rename fops. Change-Id: I82882423d2d766b4f4a3044203bcb5dbcaee1755 BUG: 845242 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3775 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Modified split-brain handlingPranith Kumar K2012-07-261-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA The bug is observed because the decision to mark a file in split-brain is taken outside appropriate locks. Lookup gathers xattrs outside any lock. The xattrs being in split-brain in lookup should only be taken as a hint. Appropriate inodelks should be taken before confirming a split-brain. Self-heal confirms this at the moment. If data/metadata self-heal is turned off, inspecting of xattrs could not be performed so split-brain behavior does not work correctly if the self-heal options are turned off. Fix Self-heals are launched to inspect xattrs even when the data/metadata self-heal options are turned off. The decision to heal data/metadata after the xattrs are inspected is based on whether the options are turned on/off. So decision to set/reset split-brain flag is taken inside appropriate locks. Testcases: tests 33-36 in https://github.com/pranithk/gluster-tests/blob/master/afr/self-heal.sh Change-Id: Ia8aeab08208b50c06609ad35a9d72f3d553ee343 BUG: 833727 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: post-op-delay supportAnand Avati2012-07-041-0/+7
| | | | | | | | | | | | | | | | post-op-delay introduces an artificial delay between the OP and POST-OP-CHANGELOG phases of a write transaction to increase the probability of changelog-piggyback and eager-locking to work more efficiently. Also enable eager-locking by default. Change-Id: I865ca4b68512c44818719c7e388952f15d53e6c2 BUG: 836033 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.com/3621 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
* cluster/afr: cleanup lk_owner and PID messAnand Avati2012-07-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically PID (frame->root->pid) was used by the locks translator to identify a locker (and make decisions about which locks contend or cooperate/merge). Since the introduction of lock_owner parameter the usage of PID (for locks) was deprecated and is now unused. This patch nukes the usage of PID in AFR The usage of lk_owner has also ended up being a mess, because of the differentiation required between ->lk() and ->inodelk(), (->lk() needs to be identified by the process (roughly) and ->inodelk() needs to be identified by the transaction) and also because of optimizations like eager locking (locks are no more identified by the transaction as they now get inherited by the next transaction). The scheme (and technique) now is: - All FOPs (the third phase of the transaction) happen with the lk_owner which is set by the topmost layer (FUSE, NFS etc.) - All entrylks are issued with lk_owner set to the frame->root address. - Inodelks which will not be subject to eager locking are issued with lk_owner set to frame->root. - Inodelks which are subject to eager locking are issued with lk_owner set to the address of fd_t (which are the only type of frames which get subject to the eager locking optimization) - At the start of the transaction, the transaction frame's lk_owner is set to the either frame->root or fd_t (and never unmodified) depending on the type of transaction. - Just before the third phase (FOP phase) the set lk_owner is "saved" away and overwritten by the lk_owner submitted by the top layer (FUSE or NFS) - Right after the third phase, the saved lk_owner is "restored" to resume the transaction into the POST-OP and eventually UNLOCK using the same lk_owner which was used during the LOCK phase. Change-Id: I6ab8e4d6b65ae4185fa85ad3fded8e9188b2f929 BUG: 836033 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.com/3620 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com>
* replicate: default read_child to a local brick if there is one.Jeff Darcy2012-06-051-0/+3
| | | | | | | | | | | Controlled by the "choose-local" option (on by default). Change-Id: I560f27c81703f2c9c62fdb51532c8eb763826df7 BUG: 806462 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.com/3005 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: add hashed read-child method.Jeff Darcy2012-05-311-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both the first-to-respond method and the round-robin method are susceptible to clients repeatedly choosing the same servers across a series of opens, creating hot spots. Also, the code to handle a replica being down will ignore both methods and just choose the first remaining (which is not an issue for two-way but can be otherwise). The hashed method more reliably avoids such hot spots. There are three values/modes. 0: use the old (broken) methods. 1: select a read-child based on a hash of the file's GFID, so all clients will choose the same subvolume for a file (ensuring maximum consistency) but will distribute load for a set of files. 2: select a read-child based on a hash of the file's GFID plus the client's PID, so different children will distribute load even for one file. Mode 2 will probably be optimal for most cases. Using response time when we open the file is problematic, both because a single sample might not have been representative even then and because load might have shifted in the hours or days since (for long-lived files). Trying to use more current load information can lead to "herd following" behavior which is just as bad. Pseudo-random distribution is likely to be the best we can reasonably do, just as it is for DHT. Change-Id: I798c2760411eacf32e82a85f03bb7b08a4a49461 BUG: 802513 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.com/2926 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Enforce order in pre/post opPranith Kumar K2012-05-181-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xattrop order in pre/post op on all the subvols is client-0, client-1... client-n where n is (replica-count - 1). This order can lead to invalid split-brains if the brick dies in the middle of xattrops. Example: transaction completed pre-op, so on all the subvolumes xattrs have '1' changelog. Now post-op is sent to both the subvols. On subvol-0 change-log of client-0 is decremented to 0, before decrementing change-log of client-1 to 0 the brick dies. This change-log status on subvol-0 gives the meaning that a change is done on subvol-0 successfully but on subvol-1 it failed. Which is not what happened. Changes done when the subvol-0 was down will lead to pending change-log on subvol-1 for subvol-0. Which is correct. When the subvol-0 is brought back up, the change-log will be in split-brain state even when it is not a legitimate split-brain. If the brick dies in the middle of xattrops it should remain fool. Pre-op should perform xattrop of the local change-log first and post-op should perform xattrop of the local change-log last. In case of optimistic changelogs txn_changelog should be done last on local if it succeeds, first if it fails. Change-Id: Ib6eeb20cdc49b0b1fd2f454f25a9c8e08388c6e7 BUG: 765194 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3226 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Determining sources should do both fxattrop, fstatPranith Kumar K2012-05-181-0/+1
| | | | | | | | | Change-Id: Ifab37db2af8d489cd516e992b7423c765dcabc4f BUG: 765587 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3088 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Self-heald: Enable configuring of heal poll timeoutPranith Kumar K2012-05-181-0/+1
| | | | | | | | | Change-Id: I631e5bf4b3615b553b72e7ac7f490714b3b995f9 BUG: 821395 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3329 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* license: dual license under GPLV2 and LGPLV3+Kaleb KEITHLEY2012-05-101-14/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that the license was not changed in any of the following: .../argp-standalone/... .../booster/... .../cli/... .../contrib/... .../extras/... .../glusterfsd/... .../glusterfs-hadoop/... .../mod_clusterfs/... .../scheduler/... .../swift/... The license was not changed in any of the non-building xlators. The license was not changed in any of the xlators that seemed — to me — to be clearly server-side only, e.g. protocol/server Note too that copyright was changed along with the license; I did not change the copyright in files where the license did not change. If you find any errors or ommissions please don't hesitate to let me know. The complete list of files with the license change is: libglusterfs/src/byte-order.h libglusterfs/src/call-stub.c libglusterfs/src/call-stub.h libglusterfs/src/checksum.c libglusterfs/src/checksum.h libglusterfs/src/circ-buff.c libglusterfs/src/circ-buff.h libglusterfs/src/common-utils.c libglusterfs/src/common-utils.h libglusterfs/src/compat-errno.c libglusterfs/src/compat-errno.h libglusterfs/src/compat.c libglusterfs/src/compat.h libglusterfs/src/daemon.c libglusterfs/src/daemon.h libglusterfs/src/defaults.c libglusterfs/src/defaults.h libglusterfs/src/dict.c libglusterfs/src/dict.h libglusterfs/src/event-history.c libglusterfs/src/event-history.h libglusterfs/src/event.c libglusterfs/src/event.h libglusterfs/src/fd-lk.c libglusterfs/src/fd-lk.h libglusterfs/src/fd.c libglusterfs/src/fd.h libglusterfs/src/gf-dirent.c libglusterfs/src/gf-dirent.h libglusterfs/src/globals.c libglusterfs/src/globals.h libglusterfs/src/glusterfs.h libglusterfs/src/graph-print.c libglusterfs/src/graph-utils.h libglusterfs/src/graph.c libglusterfs/src/hashfn.c libglusterfs/src/hashfn.h libglusterfs/src/iatt.h libglusterfs/src/inode.c libglusterfs/src/inode.h libglusterfs/src/iobuf.c libglusterfs/src/iobuf.h libglusterfs/src/latency.c libglusterfs/src/latency.h libglusterfs/src/list.h libglusterfs/src/lkowner.h libglusterfs/src/locking.h libglusterfs/src/logging.c libglusterfs/src/logging.h libglusterfs/src/mem-pool.c libglusterfs/src/mem-pool.h libglusterfs/src/mem-types.h libglusterfs/src/options.c libglusterfs/src/options.h libglusterfs/src/rbthash.c libglusterfs/src/rbthash.h libglusterfs/src/run.c libglusterfs/src/run.h libglusterfs/src/scheduler.c libglusterfs/src/scheduler.h libglusterfs/src/stack.c libglusterfs/src/stack.h libglusterfs/src/statedump.c libglusterfs/src/statedump.h libglusterfs/src/syncop.c libglusterfs/src/syncop.h libglusterfs/src/syscall.c libglusterfs/src/syscall.h libglusterfs/src/timer.c libglusterfs/src/timer.h libglusterfs/src/trie.c libglusterfs/src/trie.h libglusterfs/src/xlator.c libglusterfs/src/xlator.h libglusterfsclient/src/libglusterfsclient-dentry.c libglusterfsclient/src/libglusterfsclient-internals.h libglusterfsclient/src/libglusterfsclient.c libglusterfsclient/src/libglusterfsclient.h rpc/rpc-lib/src/auth-glusterfs.c rpc/rpc-lib/src/auth-null.c rpc/rpc-lib/src/auth-unix.c rpc/rpc-lib/src/protocol-common.h rpc/rpc-lib/src/rpc-clnt.c rpc/rpc-lib/src/rpc-clnt.h rpc/rpc-lib/src/rpc-transport.c rpc/rpc-lib/src/rpc-transport.h rpc/rpc-lib/src/rpcsvc-auth.c rpc/rpc-lib/src/rpcsvc-common.h rpc/rpc-lib/src/rpcsvc.c rpc/rpc-lib/src/rpcsvc.h rpc/rpc-lib/src/xdr-common.h rpc/rpc-lib/src/xdr-rpc.c rpc/rpc-lib/src/xdr-rpc.h rpc/rpc-lib/src/xdr-rpcclnt.c rpc/rpc-lib/src/xdr-rpcclnt.h rpc/rpc-transport/rdma/src/name.c rpc/rpc-transport/rdma/src/name.h rpc/rpc-transport/rdma/src/rdma.c rpc/rpc-transport/rdma/src/rdma.h rpc/rpc-transport/socket/src/name.c rpc/rpc-transport/socket/src/name.h rpc/rpc-transport/socket/src/socket.c rpc/rpc-transport/socket/src/socket.h xlators/cluster/afr/src/afr-common.c xlators/cluster/afr/src/afr-dir-read.c xlators/cluster/afr/src/afr-dir-read.h xlators/cluster/afr/src/afr-dir-write.c xlators/cluster/afr/src/afr-dir-write.h xlators/cluster/afr/src/afr-inode-read.c xlators/cluster/afr/src/afr-inode-read.h xlators/cluster/afr/src/afr-inode-write.c xlators/cluster/afr/src/afr-inode-write.h xlators/cluster/afr/src/afr-lk-common.c xlators/cluster/afr/src/afr-mem-types.h xlators/cluster/afr/src/afr-open.c xlators/cluster/afr/src/afr-self-heal-algorithm.c xlators/cluster/afr/src/afr-self-heal-algorithm.h xlators/cluster/afr/src/afr-self-heal-common.c xlators/cluster/afr/src/afr-self-heal-common.h xlators/cluster/afr/src/afr-self-heal-data.c xlators/cluster/afr/src/afr-self-heal-entry.c xlators/cluster/afr/src/afr-self-heal-metadata.c xlators/cluster/afr/src/afr-self-heal.h xlators/cluster/afr/src/afr-self-heald.c xlators/cluster/afr/src/afr-self-heald.h xlators/cluster/afr/src/afr-transaction.c xlators/cluster/afr/src/afr-transaction.h xlators/cluster/afr/src/afr.c xlators/cluster/afr/src/afr.h xlators/cluster/afr/src/pump.c xlators/cluster/afr/src/pump.h xlators/cluster/dht/src/dht-common.c xlators/cluster/dht/src/dht-common.h xlators/cluster/dht/src/dht-diskusage.c xlators/cluster/dht/src/dht-hashfn.c xlators/cluster/dht/src/dht-helper.c xlators/cluster/dht/src/dht-inode-read.c xlators/cluster/dht/src/dht-inode-write.c xlators/cluster/dht/src/dht-layout.c xlators/cluster/dht/src/dht-linkfile.c xlators/cluster/dht/src/dht-mem-types.h xlators/cluster/dht/src/dht-rebalance.c xlators/cluster/dht/src/dht-rename.c xlators/cluster/dht/src/dht-selfheal.c xlators/cluster/dht/src/dht.c xlators/cluster/dht/src/nufa.c xlators/cluster/dht/src/switch.c xlators/cluster/stripe/src/stripe-helpers.c xlators/cluster/stripe/src/stripe-mem-types.h xlators/cluster/stripe/src/stripe.c xlators/cluster/stripe/src/stripe.h xlators/features/index/src/index-mem-types.h ¹ xlators/features/index/src/index.c ¹ xlators/features/index/src/index.h ¹ xlators/performance/io-cache/src/io-cache.c xlators/performance/io-cache/src/io-cache.h xlators/performance/io-cache/src/ioc-inode.c xlators/performance/io-cache/src/ioc-mem-types.h xlators/performance/io-cache/src/page.c xlators/performance/io-threads/src/io-threads.c xlators/performance/io-threads/src/io-threads.h xlators/performance/io-threads/src/iot-mem-types.h xlators/performance/md-cache/src/md-cache-mem-types.h xlators/performance/md-cache/src/md-cache.c xlators/performance/quick-read/src/quick-read-mem-types.h xlators/performance/quick-read/src/quick-read.c xlators/performance/quick-read/src/quick-read.h xlators/performance/read-ahead/src/page.c xlators/performance/read-ahead/src/read-ahead-mem-types.h xlators/performance/read-ahead/src/read-ahead.c xlators/performance/read-ahead/src/read-ahead.h xlators/performance/symlink-cache/src/symlink-cache.c xlators/performance/write-behind/src/write-behind-mem-types.h xlators/performance/write-behind/src/write-behind.c xlators/protocol/auth/addr/src/addr.c ¹ xlators/protocol/auth/login/src/login.c ¹ xlators/protocol/client/src/client-callback.c xlators/protocol/client/src/client-handshake.c xlators/protocol/client/src/client-helpers.c xlators/protocol/client/src/client-lk.c xlators/protocol/client/src/client-mem-types.h xlators/protocol/client/src/client.c xlators/protocol/client/src/client.h xlators/protocol/client/src/client3_1-fops.c ¹ Copyright only, license reverted to original Change-Id: If560e826c61b6b26f8b9af7bed6e4bcbaeba31a8 BUG: 820551 Signed-off-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.com/3304 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Self-heald: Dump the event history completelyPranith Kumar K2012-05-071-1/+0
| | | | | | | | | Change-Id: Icf08ef1752795276f88c343d1d74af104095c6cb BUG: 796579 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3276 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle transient parent-entry xactions in lookupPranith Kumar K2012-04-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This patch addresses the case when the lookup on an entry is performed while it is being renamed. The lookup can possibly return 2 different gfids when lookup on one subvol reached before rename and on other after rename. In such cases the conflicting entry self-heal is triggered to resolve the issue, but if there are lot entry transactions going on the parent directory of the entry then the non-blocking locks could fail resulting in EIO. To avoid this, lookup queries locks xlator if there are any parent-entrylk on entry's basename. If afr finds that there are such locks and gfids are differing then it chooses the file with latest ctime as the iatt of the entry. This solution is not foolproof, but it decreases the probability of hitting the EIO. The correct solution is to take blocking locks on the parent-entry to find out the correct source. Taking blocking locks in lookup is not good. One stale entry lock can hang the whole filesystem. So we chose to go with this for now. Change-Id: Ibebb6c3074f56f80a96893b6bf5b77941e30d400 BUG: 765551 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3179 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* self-heald: Add node-uuid option for determining brick positionPranith Kumar K2012-04-051-0/+1
| | | | | | | | | Change-Id: Ia60981da7473d74682d86286e4d540568c8de25b BUG: 807556 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3074 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* self-heald: Find self-heal failures, split-brainPranith Kumar K2012-04-051-2/+3
| | | | | | | | | | Change-Id: Ib967f0fe0b537fe60e51d7d05462b58a7f16596e BUG: 806745 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3077 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle self-heal of files with holesPranith Kumar K2012-04-041-0/+1
| | | | | | | | | Change-Id: I6c04fe3022f234455d52620f42b9add80fc6abe4 BUG: 765424 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3065 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: adding extra data for fopsAmar Tumballi2012-03-221-4/+9
| | | | | | | | | | | | | with this change, the xlator APIs will have a dictionary as extra argument, which is passed between all the layers. This can be utilized for overloading in some of the operations. Change-Id: I58a8186b3ef647650280e63f3e5e9b9de7827b40 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 782265 Reviewed-on: http://review.gluster.com/2960 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: set_read_child when xactions in progress in fresh lookupPranith Kumar K2012-03-181-0/+1
| | | | | | | | | Change-Id: I33e0268635ae7a1f247b0052994e027f990083da BUG: 800755 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2963 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Enable eager-lockPranith Kumar K2012-03-171-1/+1
| | | | | | | | | | | | | | | | | | Eager-lock is disabled by default. Use cluster.eager-lock on/off to change the config. write-behind on and eager-lock off is not supported configuration. In afr, when eager-lock is enabled the inode lock on fd is taken using the fd address as the lk-owner. So the lock is interchangableale between the inode-locks on the same fd. Change-Id: I7eef1ecd510f8028f5395dee882782da53c0de3f BUG: 802515 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2925 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: save the xattr obtained in the {f}xattrop_cbk in localRaghavendra Bhat2012-03-121-0/+8
| | | | | | | | | | | | | | | | | | | If the {f}xattrop operation succeeds on one of the subvolumes and fails on another (thus the xattr dict obtained from the failed subvolume in the callback will be NULL), then afr would be unwinding with op_ret = 0 (since the operation was successful on one subvolume), but the xattr dict would be NULL (afr is not saving the xattr it has received in the callback in its local structure and will send the xattr it has received in the last callback). xlators above afr might segfault when they access the xattr since they would have assumed that xattr would be present as op_ret is 0. Change-Id: I50761a302150285f31dfdaa397f890c9370a989a BUG: 797119 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/2813 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Revert "afr: [Un]Set the 'right' lkowner for [f]{inode|entry}_lk and the ↵Vijay Bellur2012-03-031-9/+1
| | | | | | | | | | | 'enclosed' fop." This reverts commit 2e80fdbeb6abbb23ff6789c2b98c82704883af0a. Change-Id: I417fd43e4195d63e5b8b83dd3beb712887130e1e Reviewed-on: http://review.gluster.com/2860 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Add new option to know which process it is inPranith Kumar K2012-03-011-0/+1
| | | | | | | | | | | | | | | Afr xl needs to maintain inode-table inside the xl if it is in self-heal-daemon. The code was depending on the option self-heal-daemon to do this. This is wrong as the option can be reconfigured to on/off. Added a new option which can't be reconfigured for this purpose. Change-Id: Idc42c403c4bd9b73d1f328427ae4158ff1420b3a BUG: 795741 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2787 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle errors in build_parent_locPranith Kumar K2012-03-011-2/+2
| | | | | | | | | BUG: 787671 Change-Id: I0b01b0f9e14a26d757748413dd71909e915c7573 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2826 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* afr: [Un]Set the 'right' lkowner for [f]{inode|entry}_lk and the 'enclosed' fop.Krishnan Parthasarathi2012-03-011-1/+9
| | | | | | | | | | | | | | afr 'mangles' the lkowner inorder to ensure [f]inodelk/[f]entrylk fops from the same application contend. But other fops that are 'visible' to the application should operate with the lkowner provided by fuse for correct functioning of posix-locks xlator. Change-Id: I7e71f35ae7df2a070f1f46d4fc77eed26a717673 BUG: 790743 Signed-off-by: Krishnan Parthasarathi <kp@gluster.com> Reviewed-on: http://review.gluster.com/2752 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Introduce new extended attribute: node-uuidVenky Shankar2012-02-221-1/+1
| | | | | | | | | | | | | | | | Request for trusted.glusterfs.node-uuid returns pathinfo like string but containing the UUID of glusterd instead of the backend path for the requested file. This info is benificial for tasks like parallel rebalance that will make use of the UUID for data locality. Change-Id: I766a09cc4a5f63aebd11c73107924a1b29242dcf BUG: 772610 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.com/2614 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <shishirng@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: utilize mempool for frame->local allocationsAmar Tumballi2012-02-211-21/+24
| | | | | | | | | | | | | | | in each translator, which uses 'frame->local', we are using GF_CALLOC/GF_FREE, which would be costly considering the number of allocation happening in a lifetime of 'fop'. It would be good to utilize the mem pool framework for xlator's local structures, so there is no allocation overhead. Change-Id: Ida6e65039a24d9c219b380aa1c3559f36046dc94 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 765336 Reviewed-on: http://review.gluster.com/2772 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Add commands to see self-heald opsPranith Kumar K2012-02-201-7/+16
| | | | | | | | | Change-Id: Id92d3276e65a6c0fe61ab328b58b3954ae116c74 BUG: 763820 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2775 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Self-heald, Index integrationPranith Kumar K2012-02-201-2/+4
| | | | | | | | | Change-Id: Ic68eb00b356a6ee3cb88fe2bde50374be7a64ba3 BUG: 763820 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2749 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: add an extra flag to readv()/writev() APIAmar Tumballi2012-02-141-0/+2
| | | | | | | | | | | | needed to implement a proper handling of open flag alterations using fcntl() on fd. Change-Id: Ic280d5db6f1dc0418d5c439abb8db1d3ac21ced0 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 782265 Reviewed-on: http://review.gluster.com/2723 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Perform xattrop with all afr-keysPranith Kumar K2012-01-271-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | Self-heal does not happen if the file has change log xattr only for one of the subvol keys. This patch makes sure that xattrop is done for all the afr subvol keys after a new entry is created in entry-self-heal. 1) Added matrix create/cleanup functions 2) Impunging a new file does multiple xattrops on the source subvol, one per sink. The code can do a single xattrop after the entry is created on all the sinks. 3) Missing entry self-heal uses one frame per sink to heal the file. This leads to multiple xattrops on the source subvol. That code is changed now to use one frame which will create the file on all subvols. Change-Id: I65a42f9779b03f7efae283479f8653fb2cb8046b BUG: 762680 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2503 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Krishnan Parthasarathi <kp@gluster.com>
* complete the implementation of missing 'f**xattr()' fopsAmar Tumballi2012-01-251-1/+1
| | | | | | | | | | | | | | in debug/* and cluster/* translators and a syncop_fsetxattr() added a test case for testing the working of 'f-fop()' on fuse mount. Change-Id: I0c2aeeb30a0fb382ef2495cca1e66b00abaffd35 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 766571 Reviewed-on: http://review.gluster.com/802 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: get xattrs also as part of readdirpAmar Tumballi2012-01-251-1/+1
| | | | | | | | | | | | | readdirp_req() call sends a dict_t * as an argument, which contains all the xattr keys for which the entries got in readdirp_rsp() are having xattr value filled dictionary. Change-Id: I8b7e1290740ea3e884e67d19156ce849227167c0 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 765785 Reviewed-on: http://review.gluster.com/771 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: GFID filehandle based backend and anonymous FDsAnand Avati2012-01-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. What -------- This change introduces an infrastructure change in the filesystem which lets filesystem operation address objects (inodes) just by its GFID. Thus far GFID has been a unique identifier of a user-visible inode. But in terms of addressability the only mechanism thus far has been the backend filesystem path, which could be derived from the GFID only if it was cached in the inode table along with the entire set of dentry ancestry leading up to the root. This change essentially decouples addressability from the namespace. It is no more necessary to be aware of the parent directory to address a file or directory. 2. Why ------- The biggest use case for such a feature is NFS for generating persistent filehandles. So far the technique for generating filehandles in NFS has been to encode path components so that the appropriate inode_t can be repopulated into the inode table by means of a recursive lookup of each component top-down. Another use case is the ability to perform more intelligent self-healing and rebalancing of inodes with hardlinks and also to detect renames. A derived feature from GFID filehandles is anonymous FDs. An anonymous FD is an internal USABLE "fd_t" which does not map to a user opened file descriptor or to an internal ->open()'d fd. The ability to address a file by the GFID eliminates the need to have a persistent ->open()'d fd for the purpose of avoiding the namespace. This improves NFS read/write performance significantly eliminating open/close calls and also fixes some of today's limitations (like keeping an FD open longer than necessary resulting in disk space leakage) 3. How ------- At each storage/posix translator level, every file is hardlinked inside a hidden .glusterfs directory (under the top level export) with the name as the ascii-encoded standard UUID format string. For reasons of performance and scalability there is a two-tier classification of those hardlinks under directories with the initial parts of the UUID string as the directory names. For directories (which cannot be hardlinked), the approach is to use a symlink which dereferences the parent GFID path along with basename of the directory. The parent GFID dereference will in turn be a dereference of the grandparent with the parent's basename, and so on recursively up to the root export. 4. Development --------------- 4a. To leverage the ability to address an inode by its GFID, the technique is to perform a "nameless lookup". This means, to populate a loc_t structure as: loc_t { pargfid: NULL parent: NULL name: NULL path: NULL gfid: GFID to be looked up [out parameter] inode: inode_new () result [in parameter] } and performing such lookup will return in its callback an inode_t populated with the right contexts and a struct iatt which can be used to perform an inode_link () on the inode (without a parent and basename). The inode will now be hashed and linked in the inode table and findable via inode_find(). A fundamental change moving forward is that the primary fields in a loc_t structure are now going to be (pargfid, name) and (gfid) depending on the kind of FOP. So far path had been the primary field for operations. The remaining fields only serve as hints/helpers. 4b. If read/write is to be performed on an inode_t, the approach so far has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost in performance in the inbuilt NFS server. 5. Misc ------- The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent with the rest of the codebase. Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f BUG: 781318 Reviewed-on: http://review.gluster.com/669 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core/setxattr: prevent users from setting glusterfs xattrsRajesh Amaravathi2012-01-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | * Each xlator prevents the user from setting glusterfs-internal xattrs like trusted.gfid by handling it in respective setxattr functions. The speacial case of trusted.gfid is handled in fuse (Not in posix because posix_setxattr is used to set gfid). * For xlators which did not define setxattr and/or fsetxattr, the functions have been implemented with appropriate checks. xlator | fops-added _______________|__________________________ | 1. afr | fsetxattr 2. stripe | setxatrr and fsetxattr 3. quota | setxattr and fsetxattr Change-Id: Ib62abb7067415b23a708002f884d30e8866fbf48 BUG: 765487 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/685 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* cluster/afr: Handle fini for afr,pumpv3.3.0qa19Pranith Kumar K2011-12-281-5/+0
| | | | | | | | | Change-Id: Idc0a05a8a25f278a7ab05e242263e0a5001bde18 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> BUG: 767862 Reviewed-on: http://review.gluster.com/800 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle error cases in local initPranith Kumar K2011-12-281-1/+1
| | | | | | | | | | | | - Fop should unwind with appropriate errno - Local is de-allocated on errors Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Change-Id: I4db40342ae184fe1cc29e51072e8fea72ef2cb15 BUG: 770513 Reviewed-on: http://review.gluster.com/2539 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle split-brain/all-fool xattrs for directoryPranith Kumar K2011-12-271-0/+5
| | | | | | | | | | | | In case of split-brain/all-fool xattrs perform conservative merge. Don't treat ignorant subvol as fool. Change-Id: I6ddf89949cd5793c2abbead7c47f091e8461f1d4 BUG: 765528 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2521 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Add command-line support (but no doc) for enforce-quorum option.Jeff Darcy2011-11-281-9/+15
| | | | | | | | Change-Id: Ia52ddb551e24c27969f7f5fa0f94c1044789731f BUG: 3823 Reviewed-on: http://review.gluster.com/743 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Update read-child if it becomes stalePranith Kumar K2011-11-281-1/+1
| | | | | | | | Change-Id: I00c714a89575023f6dbdd3430dcbf191e5d08019 BUG: 3650 Reviewed-on: http://review.gluster.com/740 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Remove treating sh_frame as special loop_framePranith Kumar K2011-11-231-1/+3
| | | | | | | | Change-Id: I0d87f06f989b2d4b971967c52d4898331693a801 BUG: 3675 Reviewed-on: http://review.gluster.com/735 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Add quorum checks to avoid split-brain.Jeff Darcy2011-11-201-0/+18
| | | | | | | | Change-Id: I2f123ef93989862aa796903a45682981d5d7fc3c BUG: 3533 Reviewed-on: http://review.gluster.com/473 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* dht,afr: Fixed gfid problemsPranith Kumar K2011-11-181-0/+1
| | | | | | | | | | | *) removed uuid_generate usage in pump and afr, self-heald *) filled the gfids for the fops which were sending no gfid in loc Change-Id: I85da3c10f5ee2006248b0123155a60867870d202 BUG: 3760 Reviewed-on: http://review.gluster.com/679 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Remove unused 'ino' codePranith Kumar K2011-10-281-33/+0
| | | | | | | | Change-Id: I425e2d23e9e45f10ddeff2eacf918dd90f8baee7 BUG: 3744 Reviewed-on: http://review.gluster.com/639 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Prevent truncation of mask uint64_t->gf_boolean_tPranith Kumar K2011-10-141-1/+1
| | | | | | | | Change-Id: If67f726f21b713fa9312dc499a1aca4cb00f71de BUG: 3682 Reviewed-on: http://review.gluster.com/589 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: set fd ctx on opendirPranith Kumar K2011-09-291-0/+3
| | | | | | | | Change-Id: Ica845035781f47de990e9dcfefdeb37bed99d515 BUG: 3637 Reviewed-on: http://review.gluster.com/536 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Handle files without gfid in self-healPranith Kumar K2011-09-291-1/+7
| | | | | | | | Change-Id: Ibcaaa9c928195939ff1e31b28b592e524e63a423 BUG: 3557 Reviewed-on: http://review.gluster.com/519 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>