summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/afr
Commit message (Collapse)AuthorAgeFilesLines
* afr: Modified book-keeping structures for entrylksKrishnan Parthasarathi2013-01-236-460/+512
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * There are upto 3 entry lockees that may be needed to perform entrylk'ing in posix dir-write operations. * For eg, rmdir ("/a/b") needs to acquire locks on two entities, - entrylk ("/a", "b") - entrylk ("/a/b", null) * Changed existing entrylk/rename/selfheal (entrylk) transactions to use the new book-keeping structures * Fixed few issues in afr_trace_entry_lk{in,out} functions. Tracing is now aware of the new entry lockee structure. Implementation notes: * Changed 'cookie' sent in stack_wind to encode lockee_entity_no and subvol_no. cookie is a non-negative integer such that 0 <= cookie < replica_count, When more than one lock is being acquired across the subvolumes, cookie % replica_count gives the subvol_no cookie / replica_count gives the lockee_entity_no. Change-Id: Idbf41803387a7d59a0f7fcb1453d91cea74da153 BUG: 765564 Signed-off-by: Krishnan Parthasarathi <kp@gluster.com> Reviewed-on: http://review.gluster.org/2828 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Remove strict-readdir implementationPranith Kumar K2013-01-231-201/+0
| | | | | | | | | | | Leaving option frame-work un-changed for backward compatibility. Change-Id: I40bce1ec360801307e67f09e53b0721f64efab37 BUG: 886998 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4309 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* self-heald: Remove stale index even in heal infoPranith Kumar K2013-01-221-35/+45
| | | | | | | | | Change-Id: Ic1c9559aec59c1fb9dfede4aba8895f3b86f32f1 BUG: 861015 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4098 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr: Link inode only on lookupPranith Kumar K2013-01-211-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When "gluster volume heal <volname> info is executed, crawl's process_entry is not going to populate iatt structure so the iatt's gfid will be empty. So inode_links are failing. Fix: inode_link should be done only after lookup i.e. when heal is performed. So moved the inode_link related code to just after the lookup which is triggered when self-heal is done. Tests: The testcase that gives this issue does not give the inode-link failures anymore. glustershd heal, info commands are working as expected. Wrote basic automation tests for proactive-self-heal-daemon https://github.com/pranithk/gluster-tests/blob/master/afr/proactive-self-heal.sh Change-Id: Ic112bf104a4d553a64d3d8559f681a25ae1a5362 BUG: 861015 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4090 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Disable delayed post op when eager-lock is offPranith Kumar K2013-01-181-0/+3
| | | | | | | | | | | | | | | | | | | | | Problem: When eager-lock is disabled, inodelks for write-fops on same fd conflict with each other. If eager-lock is disabled but delayed post-op is enabled then each write fop's inodelk unlock waits for post-op-delay-secs. So the conflicting write fop acquires inodelk after post-op-delay-secs. This results in post-op-delay-secs delay for every write fop on the fd for sequential writes (Ex: dd). Fix: Disable delayed-post-op when eager-lock is off. Change-Id: I87ea4c8d1c7bb269b9b174388ae50f37e82629b7 BUG: 895235 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4391 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Fail readv on data-split-brainPranith Kumar K2013-01-183-0/+23
| | | | | | | | | | | | | | | | | | Problem: Afr prevents opens on a file in split-brian but the fd that is already open still has the capability to perform both reads and writes to the file. Fix: Fail readvs on a file with EIO. Change-Id: I8e07f24c36fab800499b36ab374f984b743332cd BUG: 873962 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4199 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: conditionally prioritize EIO errors over ENOENTBrian Foster2013-01-183-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | The most important errno logic historically only prioritized ESTALE over ENOENT. Commit c8c0942d added EIO prioritization over ENOENT to ensure that split-brain was reported when it occurs in conjunction with bricks missing the file entry. The unintended side effect of this change is that (non split-brain) EIO errors reported from the bricks themselves are now reported to the client when the expectation is that afr should squash said errors in favor of marking the file inconsistent. The high-level problem is that EIO is overloaded with different meanings from different contexts. This commit adds an eio parameter to the errno priority logic to conditionally flag when EIO is of higher priority and should be propagated to the client. BUG: 892730 Change-Id: Ib692a8a1f1737ef190d57894f392ec53ffb33aab Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4376 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: replace afr_more_important_error with afr_most_important_errorBrian Foster2013-01-173-26/+18
| | | | | | | | | | | | | | | | | | | | | afr_more_important_error() is written to return whether a new errno should override an existing errno for high-level operations that could span multiple sub-operations. It specifically prioritizes ESTALE over EIO over ENOENT, and otherwise defaults to the latest error passed having priority. This change preserves current behavior, but rewrites the logic to return the higher priority error of the existing and new errno. The purpose of the change is to make the logic a bit more clear and set the stage for future changes to make the logic flexible based on context. BUG: 892730 Change-Id: Id1aa48855dfb0507abc9d1ef22f2259b30472576 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4375 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Pre-op should be undone for non-piggyback post-opPranith Kumar K2013-01-161-2/+6
| | | | | | | | | | | | | | | | | | | | | | Problem: When fop fails post-op is always performed over the network irrespective of whether pre-op is piggybacked or not. Decrementing Pre-op-done count even for the piggybacked ones is wrong. I have added an assert for pre_op_done to be non-zero and when dd of=a if=/dev/urandom bs=5M count=1000 is executed and a brick is taken down, the mount is crashing. Fix: Decrement pre-op-done count only when the post-op is not piggybacked. Change-Id: Ie837251a43bfb437f0fada191302eeee60be1601 BUG: 863939 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4310 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: don't clear changelog for un-healed replicasJeff Darcy2013-01-161-6/+44
| | | | | | | | | | Change-Id: Iebfa6770a688e89c051666b46977862188061738 BUG: 802417 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4034 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: check for the -ve values returned from dict_serialized_lengthRaghavendra Bhat2012-12-171-1/+1
| | | | | | | | | Change-Id: I9fa7744b02791180ccb93adef10c363a1b38aa31 BUG: 838204 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4319 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Remember type of split-brain in inode-ctxPranith Kumar K2012-12-115-117/+114
| | | | | | | | | | | | | Along with this change, fixed the race of setting the split-brain status in inode-ctx after unwinding the fop from self-heal in case of back-ground self-heal. Change-Id: Ifc829300df485f50f139443802e8b6dc7038b4ad BUG: 873962 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4198 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Empty string should not be default option valPranith Kumar K2012-12-053-5/+6
| | | | | | | | | | | | Glusterd does not allow empty string as default value. Changed afr option values to disallow empty string as value. Change-Id: I92a2d658907dbc6101e1139dd91f548acb5506f5 BUG: 859927 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4271 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: mark new entry changelog for create/mknod failuresPranith Kumar K2012-12-045-67/+228
| | | | | | | | | | | | | | | | | | | | | | Problem: When create/mknod fails on some of the nodes, appropriate pending data/metadata changelogs are not assigned. This was not considered to be an issue because entry self-heal would do the assigning of appropriate changelog after creating new entries. But using the combination of rebalance and remove brick we can construct a case where a file with same name and gfid can be created in a dir with different data and link-to xattr without any changelog. Fix: When a create/mknod failure is observed mark the appropriate changelog on the new file created. Change-Id: I4c32cbf5594a13fb14deaf97ff30b2fff11cbfd6 BUG: 858212 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4207 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: use data trylock mode in read/write self-heal trigger pathsBrian Foster2012-12-041-1/+8
| | | | | | | | | | | | | | | | | | | | | | Self-heal data lock contention between clients and glustershd instances can lead to long wait and user response times if the client ends up pending its lock on glustershd self-heal of a large file. We have reports of guest vm instances going completely unresponsive during self-heal of virtual disk images. Optimize the read/write self-heal trigger codepath (i.e., afr_open_fd_fix()) to trylock for self-heal and skip the self-heal otherwise to minimize the likelihood of a running/active guest of competing with glustershd on arrival of a brick. Note that lock contention is still possible from the client (e.g., via lookup). BUG: 874045 Change-Id: I406443c061ff6acd2a851179626b78352caa5c03 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4258 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: support self-heal data trylock mechanismBrian Foster2012-12-044-8/+15
| | | | | | | | | | | | | | | Introduce a block flag to support an optional blocking or non-blocking mode in the self-heal data locking mechanism. All callers are modified to use blocking mode, which is the current default behavior (no change in behavior is introduced by this commit). BUG: 874045 Change-Id: Ib7ff9984578fa11de4e3b6981508100cdddd37cd Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4257 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: make flush non-transactionalBrian Foster2012-12-043-139/+38
| | | | | | | | | | | | | | | | | | Flush is historically a transaction to ensure all previous writes were complete. This is no longer required as write-behind has learned to make flush a barrier operation (re: conversation w/ Avati). Flush taking a full file lock causes VMs running on afr volumes to stall when a migration occurs and self-heal is in progress. Make afr_flush() a non-transactional operation. BUG: 874045 Change-Id: If2db83823e280c86b1b29b41361eed7081601632 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4261 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide option to disable readdir failoverPranith Kumar K2012-12-034-25/+41
| | | | | | | | | | | | | | | | | In a replica pair unlike files, directories may not have their content in same order, so readdir for same (offset, size) may not give same entries on both the sobvolumes of replica pair. Switching over from one subvolume to another may not be a good idea sometimes. It may lead to duplicate entries or fewer entries or both. This patch provides a way to disable readdir-failover so that applications like rebalance can retry if they want to. Change-Id: I2b23eb224a2e84016a561362932613ac824c11a0 BUG: 859387 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4159 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Added descriptions to afr optionsPranith Kumar K2012-11-291-11/+86
| | | | | | | | | Change-Id: I4aef1c79743ee08b62e04d7b709f3e8c6b9dc56a BUG: 881517 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4244 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* afr: send unique dict_t instances to replicas in self-heal fxattropBrian Foster2012-11-291-28/+42
| | | | | | | | | | | | | | | | | | | | | afr_sh_data_fxattrop() currently allocates and sends a single xattr dict_t instance to each replica. The callback codepath references the returned object in the self-heal in-memory state for the particular replica. If storage/posix is in the same address-space (i.e., running a single glusterfs client with a fuse->afr->posix graph), the same object is modified and returned for each child, causing corrupted in-memory state and afr xattrs. Allocate and send independent xattr dict_t's for each replica. This allows self-heal to work correctly in a single address-space graph. BUG: 868478 Change-Id: I42832e85b5d1abb6098c28944c717e129300109e Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4149 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: handle short writes in afr_writev_wind and self-heal to avoid corruptionBrian Foster2012-11-295-16/+75
| | | | | | | | | | | | | | | | | | | | | The current failure to handle short writes on writev fops leaves us open to file corruption. A short write on a user request is ignored and leaves replicas in an inconsistent state. A short write during a self-heal is ignored and incorrectly marks the files as consistent if the heal completes. Modify user writev handling to return the best case return value from each of the replicas. Short writes that occur relative to this value are marked as failed and will require a heal. Modify self-heal to set an error on a short write and abort the heal. BUG: 853690 Change-Id: I18b30f58702326249230eeebb361b29e40b535f5 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4150 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: handle GF_XATTR_LOCKINFO_KEY appropriately.Raghavendra G2012-11-272-26/+521
| | | | | | | | | | | | | | values from all children need to be aggregated into a dictionary and serialized buffer of this aggregated dictionary has to be the value of GF_XATTR_LOCKINFO_KEY in the dict sent as a result of fgetxattr. Change-Id: Ie877f7c637c07feaee4c44d7ef86aa967a17b7e7 BUG: 808400 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4121 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* replicate: don't stop checking xattrs because one was absentJeff Darcy2012-11-263-114/+41
| | | | | | | | | | | | | | | | | | | | | The functional issue is described by the subject line. This patch also addresses several efficiency/structure issues, such as... * Calling dict_set_ptr once for each txn type, instead of once overall. * Calling afr_index_for_transaction_type once per iteration instead of once per call (or better yet zero since the conversion is unnecessary). * Implementation of inner functions in a different file than their one caller, creating a spurious header-file dependency. Change-Id: I29e0df906a820533b66b9ced73e015dfe77267d2 BUG: 865825 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4070 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Cluster/afr: Fix output for gluster volume heal vn info healedVenkatesh Somyajula2012-11-267-17/+53
| | | | | | | | | | | | | | | | | | | | | | Problem: Whenever gluster volume heal vol full command is executed, the entries stored in the circual buffer for sh->healed are added in the dictionary in the _crawl_post_sh_action function irrespective of whether actual self heal (due to non-zero values in chage log) takes place or not. Fix: Value of key (actual-sh-done) will be set to 1 whenever self heal takes place due to non-zero change log values and if for some FOP self heal daemon finds that no self heal required after examining the pending matrix, the value will be 0. Change-Id: I11fd0b9ee76759af17c5bca6bfafbaf66bcaacbc BUG: 863068 Signed-off-by: Venkatesh Somyajula <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4181 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: check transaction type for eager-lock after it is setPranith Kumar K2012-11-211-6/+6
| | | | | | | | | | | | | | | | | | | | | | | Problem: Eager locking lk-owner decision is taken before transaction type is set. Default transaction type is DATA so all transactions are treated as DATA transactions at the time of eager-locking decision. Fix: Move the code that takes lk-owner decision after the transaction type is set. Test: Checked that the transaction type is set properly in gdb at the time of the lk-owner decision. Change-Id: I7607c7ff4f88c7ced5416a1cddb6586cf45d88f9 BUG: 861335 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4220 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr : Edited log message in afr_sh_entry_expunge_entry_cbkVenkatesh Somyajulu2012-10-121-1/+2
| | | | | | | | | Change-Id: I9f7562d28c8bc798552c403164397f929a7bd1e7 BUG: 860246 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4052 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Preventing client crashing as the callings of GF_CALLOC has been failed.linbaiye2012-10-114-31/+107
| | | | | | | | | | | | | As the callings of GF_CALLOC can seldom come to a failure, glusterfs client will crash due to segment fault. We should have returned once the variables of transaction's local can't be alloced. Change-Id: Ia3798b8349d832b23c7825e64dbad93ebe29cd1b BUG: 861335 Signed-off-by: linbaiye <linbaiye@gmail.com> Reviewed-on: http://review.gluster.org/4005 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* replicate: don't use synctask_new from within a synctaskJeff Darcy2012-10-111-3/+14
| | | | | | | | | | Change-Id: Iebf821ff720c63ab6da4b219d82c7f1d00769992 BUG: 862838 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4032 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: Changes and enhancements to XML outputKaushal M2012-10-111-4/+8
| | | | | | | | | | | | | | | | | | | | This patch contains several xml related changes which fix some bugs and introduce xml output for commands which were missing it. These include, * XML output for rebalance & remove-brick status * XML output for replace-brick * XML output for 'volume status all' in on xml document * proper XML output for "volume {create|start|stop|delete}" * type & status of a volume in 'volume info' is now given as a string as well This patch also cleans up the '#if (HAVE_LIB_XML)' sections from the code-base, so that it is not littered around. Change-Id: I5bb022adf0fedf7e3ead92b4b79bfa02b0b5fef5 BUG: 828131 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/3869 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr Changed the message's log level from Error to DebugVenkatesh Somyajulu2012-10-101-3/+3
| | | | | | | | | Change-Id: Ic2506561367bfec9022dc53e9b17b03dc343df95 BUG: 859411 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4055 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: check transaction type for eager-lock after it is setPranith Kumar K2012-10-101-6/+6
| | | | | | | | | | | | | | | | | | | | | | | Problem: Eager locking lk-owner decision is taken before transaction type is set. Default transaction type is DATA so all transactions are treated as DATA transactions at the time of eager-locking decision. Fix: Move the code that takes lk-owner decision after the transaction type is set. Test: Checked that the transaction type is set properly in gdb at the time of the lk-owner decision. Change-Id: Ib1c886866f28788aed67622982e86d667b2cdb80 BUG: 864786 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/4053 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: split CPPFLAGS from CFLAGSJeff Darcy2012-10-031-2/+4
| | | | | | | | | | | | | | | | | Automake provides a separate variable for preprocessor flags (*_CPPFLAGS). They are already uses in a few places, so make it consistent and use it everywhere. Note that cflags obtained from pkg-config often are cppflags, which is why LIBXML2_CFLAGS moves with into AM_CPPFLAGS, for example. Change-Id: I15feed1d18b2ca497371271c4b5876d5ec6289dd BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4029 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: remove useless explicit -fPIC -shared fromJeff Darcy2012-10-031-2/+2
| | | | | | | | | | | | | | | | | | | | CFLAGS libtool will automatically add "-fPIC" to the compiler command line as needed, so there is no need to specify it separately. "-shared" is normally a linker flag and has an odd effect when used with libtool --mode=compile, namely that it inhibits production of static objects. For that however, using AC_DISABLE_STATIC is a lot simpler. Change-Id: Ic4cba0fad18ffd985cf07f8d6951a976ae59a48f BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4027 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: remove -nostartfiles flagJeff Darcy2012-10-021-1/+1
| | | | | | | | | | | | | | | The "-nostartfiles" is a discouraged option and is documented to potentially result in undesired behavior. Since I see no reason why it should be in glusterfs, remove it. Change-Id: I56f2b08874516ebad91447b2583ca2fb776bb7ab BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4018 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* build: consolidate common compilation flags into one variableJeff Darcy2012-10-011-1/+1
| | | | | | | | | | | | | | | Some -D flags are present in all files, so collect them. This adds -D${GF_HOST_OS} to some compiler command lines, but this should not be a problem. Change-Id: I1aeb346143d4984c9cc4f2750c465ce09af1e6ca BUG: 862082 Original-author: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4013 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Provide option to set readdir-size in entry-self-healPranith Kumar K2012-10-013-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Entry self-heal does lookups on all the entries that are read in readdir. More the size of readdir more number of lookups happen in parallel. It is observed that it leads to HUGE cpu spikes rendering everything else on the system unusable. Fix: Provided the option self-heal-readdir-size to configure the size. Default value is at 1KB. Tests: Checked that the readdirs are happening with the configured value in entry-self-heal. Change-Id: Icaa937ad88857e6f9a12375b1e7f6a49192bc8b1 BUG: 860895 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/4002 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Trigger heal on local subvols on any child_upPranith Kumar K2012-09-251-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The index in the child that comes online is generally empty because the changes would have happened on the other child which has been up. So the sync begins when the other child's poll time-out happens (i.e. 10 minutes). The expectation is that the sync must be triggered as soon as the connection with any brick is established. Fix: Whenever any child_up happens trigger the index self-heal on all local children in the replicate subvolume. Tests: 1) Checked that the self-heal is triggered on all local children whenever any child comes online. 2) Checked that the volume heal commands are working fine. Change-Id: I4f64737866470a2f989349a889ea52782930e11d BUG: 852741 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/3972 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Wake up post-op on non-co-operative transactionPranith Kumar K2012-09-251-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The problem is observed when kernel untar is done. One file untar happens every second. The reason for this is, setattr lock is blocked on the prev fd data-transaction full-lock (because of eager-lock). Because of post-op-delay the post-op (xattrop + unlock) of the prev data-transaction happens after 1 sec. Until this the setattr is blocked resulting in performance problems in untar. Fix: Whenever an loc data, meta-data transaction comes, it should wakeup the prev-post-op on the same process' fd. Tests: The performance problem in untar went away. I put a breakpoint in client_finodelk for a 2G file dd and the inodelk is hit only 4 times. This confirms that the change does not affect post-op-delay in a -ve way. Change-Id: Ice3c2a1211f4dca6520a19bc4ba6cb9efb2902ad BUG: 845754 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/3975 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Clean up of typepunning errors ( Strict aliasing warnings )Varun Shastry2012-09-171-1/+3
| | | | | | | | | | | Change-Id: I48733967facc526fb523a8dc9bd068f8c5cc5971 BUG: 764282 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3950 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: add option description of 'open'.Jules.Wang2012-09-061-0/+2
| | | | | | | | | Signed-off-by: Jules Wang <lancelotds@163.com> Change-Id: I6c7dd337c758e82e9d58d4d65f53b5aa72ac5dfb BUG: 764890 Reviewed-on: http://review.gluster.org/3895 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs/dict: make 'dict_t' a opaque objectAmar Tumballi2012-09-063-12/+11
| | | | | | | | | | | | | | | * ie, don't dereference dict_t pointer, instead use APIs everywhere * other than dict_t only 'data_t' should be the valid export from dict.h * added 'dict_foreach_fnmatch()' API * changed dict_lookup() to use data_t, instead of data_pair_t Change-Id: I400bb0dd55519a7c5d2a107e67c8e7a7207228dc Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 850917 Reviewed-on: http://review.gluster.org/3829 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Don't stop entry/data self-heal on metadata split-brainPranith Kumar K2012-08-292-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: Entry/Data self-heal is orthogonal to meta-data self-heal. meta-data split-brain should not affect entry/data self-heal. Fix: Prevented aborting rest of the self-heals when metadata split-brain happens. Tests: 1) Simulated meta-data split-brain then checked data-self-heal succeed on regular file, entry-self-heal succeed on dir. 2) Reset meta-data change-log on one of the subvols and checked that meta-data self-heal also completes. 3) Executed self-heal sanity script. Change-Id: I05ca222d855d3a6000703e3775471d0f874d35d6 BUG: 851451 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/3853 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <obdurodon@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com>
* afr: Avoid excessive logging in self-heal.Krishnan Parthasarathi2012-08-236-22/+25
| | | | | | | | | | | | | | - (Excessive) Logging has been very useful as 'bread-crumbs' in many a root-cause analyses. This patch aims at avoiding logging when the information could be reconstructed using the xattrs, statedump, and/or "volume heal" CLI commands. Change-Id: Iebc6b10ae18f0dd9704bdc6dd03bcfe0f2a09abd BUG: 844804 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/3805 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Self-heald: Prevent logging of errno ENOENTVenkatesh Somyajulu2012-08-201-4/+4
| | | | | | | | | Change-Id: Ie56228dfbdc7e519a344681487164a835488a470 BUG: 835423 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/3826 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Unwind with correct pre/post parent bufsPranith Kumar K2012-08-023-403/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA: In case of dir fops create, mknod, mkdir, link, symlink, rename if the fop fails on read-child then unwinds are happening with all-zero pre/post iatt-bufs. The bug occurs because the parent bufs are not saved if the response is not from read-child. Fix: Save the pre/post-bufs for the first response. If the response comes from read-child, overwrite whatever we have cached. Tests: Attached the mount process to gdb. Tested that the unwinds happen with proper pre/post iatt bufs in the following cases: 1) All success case 2) Failure on read-child 3) Failure on non-read-child 4) Failure on all children. Tested soft-link self-heal to test the change made in that. Tested errno ENOTEMPTY for rmdir, rename fops. Change-Id: I82882423d2d766b4f4a3044203bcb5dbcaee1755 BUG: 845242 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3775 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Handle child_up & fd not opened case in xactionPranith Kumar K2012-08-011-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA: When an fd is opened while a brick is down, after the brick comes back up afr issues open on the other brick. It can fail for a number of reasons (enoent etc). While the system is in that state, inode/entrylks pre-op happen only on the brick that is up and fd is opened for fd-fops. post-op should consider only the bricks where both pre-op and fop succeeded as success, rest of them as failures. Code now marks only the children that are down as failures as opposed to child_down & fd-not-opened. This makes change-log appear as success on the subvolume where we did not do any fop leading to no change-log but differences in data/metadata for reg-files. Fix: Mark non-participants of fop as failure. This is tracked in transaction.pre_op[]. Tests: Simulated the scenario using err-gen on top of one of the client xlator which fails all fops always. Performed fops and the changelog represented pending fops on the brick with err-gen loaded. Tested the case of brick down and perform entry/metadata/data operations to confirm they still work as expected. Change-Id: I41905936126b19abba56ca581c0301a894507e1a BUG: 844987 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3765 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Handle failures in fop_cbk gracefullyPranith Kumar K2012-07-311-31/+46
| | | | | | | | | | | | | | | | | | | | | | | | | RCA: Afr crashes when a last fop response fails and 'fop output' arguments are NULL. Afr does not handle these gracefully. Fix: Changed the fops to not access the 'fop output' arguments in case of failures. Tests: Changed afr wind_cbk code to fail the last response by setting op_ret as -1 and op_errno as ENOMEM and setting all other output variables as NULL to test the change. Removed the code to verify success cases. No crashes or errors seen. Change-Id: Iad9bc54db093a162f85bfb8dbeeda5b95acd21d8 BUG: 844689 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3760 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: update loc inode after inode_linkPranith Kumar K2012-07-311-12/+24
| | | | | | | | | | | | | | | | | | | | | | | | RCA: inode passed to inode_link is not assigned any gfid if the inode with that gfid is already linked, so loc for opendir does not have a valid inode Fix: Use the linked_inode returned by inode_link in the loc to perform further operations on the entry. Tests: Checked that opendir comes with an loc with valid inode. Checked that re-opendir happens successfully. Tested index, full self-heal work fine with the fix. Change-Id: Idf4ced4cc2320133744962059d363e373af0e5ec BUG: 826580 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3748 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Modified split-brain handlingPranith Kumar K2012-07-265-57/+146
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA The bug is observed because the decision to mark a file in split-brain is taken outside appropriate locks. Lookup gathers xattrs outside any lock. The xattrs being in split-brain in lookup should only be taken as a hint. Appropriate inodelks should be taken before confirming a split-brain. Self-heal confirms this at the moment. If data/metadata self-heal is turned off, inspecting of xattrs could not be performed so split-brain behavior does not work correctly if the self-heal options are turned off. Fix Self-heals are launched to inspect xattrs even when the data/metadata self-heal options are turned off. The decision to heal data/metadata after the xattrs are inspected is based on whether the options are turned on/off. So decision to set/reset split-brain flag is taken inside appropriate locks. Testcases: tests 33-36 in https://github.com/pranithk/gluster-tests/blob/master/afr/self-heal.sh Change-Id: Ia8aeab08208b50c06609ad35a9d72f3d553ee343 BUG: 833727 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Filter O_TRUNC in afr-fix-openPranith Kumar K2012-07-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | RCA: When open was done while a brick is down, afr opens the file after the brick comes backup. If this happens after the self-heal on the file is completed by self-heald etc, the file will end up in truncated state. Fix: Filter O_TRUNC while afr-fix-open because afr_open turns O_TRUNC into truncate transaction, so there will be pending changelog for the subvolume on which open fails. Testing: Had to simulate the race by stopping fix-open until self-heald completes self-heal on the file after brick online. Change-Id: I32759cc37f4bb34f206d01606a279f17b246dba4 BUG: 841840 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3705 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>