summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
* mount/fuse: enable fuse real async dio when availableBrian Foster2013-05-152-0/+8
| | | | | | | | | | | | | | | | | | | | | fuse has support for optimized async. direct I/O handling via the FUSE_ASYNC_DIO init flag. Enable FUSE_ASYNC_DIO when advertised by fuse. performance/write-behind: fix dio hang Also fix a hang observed during aio-stress testing due to conflicting request handling in write-behind. Overlapping requests are skipped in pick_winds and may never continue when the conflicting write in progress returns. Add a wb_process_queue() call after a non-wb request completes to keep the queue moving. BUG: 963258 Change-Id: Ifba6e8aba7a7790b288a32067706b75f263105d4 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5014 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glupy patch by Ram, Justin: Add/Modify fops, structure types, utility fnsRam Raja2013-05-138-162/+3773
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the following fops with Python: * open * readv * writev * opendir * readdir * readdirp * stat * fstat * statfs * setxattr * getxattr * fsetxattr * fgetxattr * removexattr * fremovexattr * link * unlink * readlink * symlink * mkdir * rmdir Add fd_t, inode_t and iatt_t structure types. Modify loc_t structure type; Alter the data types of the following attributes - inode, parent, gfid, pargfid. Modify uuid2str function, which returns a string equivalent for a ctype object representing a gfid, to make use of python's 'uuid' module for accurate representation of uuids. by Justin Clift: Adjust debug-trace.py to work with Python 2.6 Work around 'zero length field name in format' bug in negative.py's uuid2str function Fix indentation errors in negative.py, glupy.h, glupy.c, gluster.py Change-Id: If0fcfb2866e21c0380a973f8ffab9ea7b6a4cd5d BUG: 961856 Signed-off-by: Ram Raja <rraja@redhat.com> Reviewed-on: http://review.gluster.org/4907 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Justin Clift <jclift@redhat.com> Tested-by: Justin Clift <jclift@redhat.com>
* glusterd: Perform NULL check on rsp.op_errstr before using itKrutika Dhananjay2013-05-131-1/+1
| | | | | | | | | | Change-Id: Id18b215a91cf016964ea98d2f414293b82167d24 BUG: 962362 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4992 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: remove-brick: prevent removal from a replicate volume.Ravishankar N2013-05-131-2/+13
| | | | | | | | | | | | Prevent the removal of brick(s) from a plain replicate volume and display the error message at the CLI. Change-Id: I8e182404564147329d8cd364b7c7931d19f14570 BUG: 961669 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/4975 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Fix uuid to hostname conversion for 'volume status'Kaushal M2013-05-121-3/+4
| | | | | | | | | | Change-Id: I46c41c29c2d11652f6d8ccd5637be0ac9774fc1d BUG: 927648 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4873 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Set op-version on startup based on install statusKaushal M2013-05-121-5/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | If the current installation of glusterfs doesn't have a stored op-version and is, a. a fresh install, then set op-version to maximum b. an upgrade from release which didn't have op-version support, set it to minimum. The install status is detected using the peer-uuid. If both peer-uuid and op-version are not present in the store, the installation is fresh. If peer-uuid is present, but op-version is absent in the store, the installation has been upgraded from a version which didn't support op-versions. By setting the initial op-version as above, we can ensure that a. features are not enabled accidentally during upgrades b. a fresh install starts with all possible features enabled. Change-Id: I52aed9ebeb5d3823c81e65f739a78a924ecf7e12 BUG: 954256 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4867 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glupy: Importing Jeff's glupy project into glusterfsRam Raja2013-05-109-1/+750
| | | | | | | | | | Change-Id: I3891ef6eaf6ede7c8cbedc3298ce2501a69b2b05 BUG: 961856 Original-author: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: Ram Raja <rraja@redhat.com> Reviewed-on: http://review.gluster.org/4906 Reviewed-by: Justin Clift <jclift@redhat.com> Tested-by: Justin Clift <jclift@redhat.com>
* object-storage: final removal of ufo codePeter Portante2013-05-101-54/+0
| | | | | | | | | | | | | | | | | | | See https://git.gluster.org/gluster-swift.git for the new location of the Gluster-Swift code. With this patch, no OpenStack Swift related RPMs are constructed. This patch also removes the unused code that references the user.ufo-test xattr key in the DHT translator. Change-Id: I2da32642cbd777737a41c5f9f6d33f059c85a2c1 BUG: 961902 (https://bugzilla.redhat.com/show_bug.cgi?id=961902) Signed-off-by: Peter Portante <peter.portante@redhat.com> Reviewed-on: http://review.gluster.org/4970 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Luis Pabon <lpabon@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Give up big lock before performing any RPCKrishnan Parthasarathi2013-05-071-1/+13
| | | | | | | | | Change-Id: Ib0a772dc1cb9afc8adccd8f7092f480d2b525ea0 BUG: 960580 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4964 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* nfs3-server: call truncate fop only if necessaryMichael Brown2013-05-071-1/+2
| | | | | | | | | | | * nfs3svc_setattr_cbk: only truncate if requested size != current size Change-Id: I3d89e4d2b0710118f90cf5bf9cfdea61d8877491 BUG: 960725 Signed-off-by: Michael Brown <michael@netdirect.ca> Reviewed-on: http://review.gluster.org/4917 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Club missing entry, missing gfid self-healsPranith Kumar K2013-05-071-45/+8
| | | | | | | | | | | | | | | | | | | | | Problem: gfid-self-heal always assigns the gfid(GFID-1) it gets from lookup. Between the time of lookup to triggering the gfid-self-heal the entry could have changed. Now lets say there is a case where one of the files of the replica subolumes already has a gfid (GFID-2) and the other does not. In that case healing should happen with GFID-2 instead of GFID-1. Fix: Missing-entry-self-heal already handles all these cases. So removed separate handling of gfid-self-heal. Change-Id: Ie96261e9036c8f3cb4cad89347f9bf7b681cdc1a BUG: 767585 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/2670 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* nfs: avoid double fd unref in opendirRajesh Amaravathi2013-05-061-3/+1
| | | | | | | | | | | | | | | Noticed that the fd_unref was called on the fd regardless of the return value at nfs3svc_opendir_readdir_cbk(), hence removing an extra unref in the negative case in nfs_inode_opendir_cbk, which fixes the spurious fd_unref(). Change-Id: I2bf68410dd86cdf9cfe8a3d43adc27497d8bb36f BUG: 959190 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.org/4943 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Revert "glusterd: Fix spurious wakeups in glusterd syncops"Krishnan Parthasarathi2013-05-042-32/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit efa154bb0a4cac34d5a9610ec25d38eebe495f22. -- Following is Avati's analysis (edited) from gerrit -- The claim of the patch (being reverted) is that it in some cases cbkfn is missed. This is wrong analysis. cbk_fn is _always_ called. The patch treats ret > 0 as a "missed cbk". ret > 0 only means socket submission was not complete, and is queued to submit asynchronously when POLLOUT is raised. This is sufficient to guarantee that cbkfn is going to be called (either the socket errors or submission succeeds and reply eventually arrives). This commit also removes spurious barrier_wake(s). call backs are guaranteed to be called even if the transport is disconnected. This means, a 'wake' would be called if rpc_clnt_submit is called. Also, we count both successful and failed operations in a particular batch of operations for the synctask_barrier_wait. So, calling synctask_barrier_wake on failure of rpc_clnt_submit (say, due to network failure) would result in a spurious wake. Change-Id: I7d508c2a54b74a65b82f097742206bc777afc53a BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4922 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* synctask: implement barriers around yield, not the other wayAnand Avati2013-05-041-0/+2
| | | | | | | | | | | | | | | | | | | | In the current implementation, barriers are in the core of the syncprocessors. Wake()s are treated as syncbarrier wake. This is however delicate, as spurious wake()s of the synctask can mess up the accounting of the barrier and waking it prematurely. The fix is to keep yield() and wake() as the basic primitives, and implement barriers as an object impelemented on top of these primitives. This way, only an explicit barrier_wake() gets counted towards the barrier accounting, and spurious wakes will be truly safe. Change-Id: I8087f0f446113e5b2d0853431c0354335ccda076 BUG: 948686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4921 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* posix-acl: fetch ACLs in readdirplusAnand Avati2013-05-021-0/+6
| | | | | | | | | | | | | Not fetching ACLs in readdirplus can potentially result in spurious wrong ACL decisions (which magically go away on a lookup() which populates the ACLs) Change-Id: Ided38b4d868fab482b477ce51b4878289ef9eed0 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4926 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Syncop callbks should take big lock tooKrishnan Parthasarathi2013-05-022-13/+53
| | | | | | | | | | Change-Id: I5ae71ab98f9a336dc9bbf0e7b2ec50a6ed42b0f5 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4938 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: delete "volume-name" from dict before processing the next optionKrutika Dhananjay2013-05-021-0/+2
| | | | | | | | | | Change-Id: Ib78963c1f43a66dab50b443742979c7c4e4cbc23 BUG: 958790 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4940 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Log hostname of the peer where there is cksum/version mismatchKrutika Dhananjay2013-05-023-9/+12
| | | | | | | | | Change-Id: I08065aaa3c140d4b02af4ca38f5f4d00d7f0c2bb BUG: 958739 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4937 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/io-cache: check the inode context to be NULL before accessingRaghavendra Bhat2013-05-011-0/+7
| | | | | | | | | | Change-Id: I475af7f8ffd5e5d8adbd2a74af20e56ad7751f69 BUG: 958108 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4916 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Expand gluster's NFS FD header to 4 bytesMichael Brown2013-05-012-3/+15
| | | | | | | | | | | | | | * https://bugzilla.redhat.com/show_bug.cgi?id=950121 * Oracle's DNFS does not properly XDR encoding on NFS FDs that are not congruent to 0mod4 bytes long * This patch is a workaround to support Oracle's buggy code Change-Id: Ic621e2cd679a86aa9a06ed9ca684925e1e0ec43f BUG: 950121 Signed-off-by: Michael Brown <michael@netdirect.ca> Reviewed-on: http://review.gluster.org/4918 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Avoid self-healing extended attribute used by SELinux.Vijay Bellur2013-04-302-1/+9
| | | | | | | | | | | | | | | | | | Since removexattr() fails to remove "security.selinux" in a system where SELinux is enforcing, xattr self-healing fails. As a consequence of this, user extended attributes are not being healed. Added a check in afr to prune SELinux xattr from the dictionary used for removing xattrs from the sink. Minor changes in tests and md-cache as well. Signed-off-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I854bfc0098dde812ce2afe64b125ee40c04bdeb1 BUG: 957877 Reviewed-on: http://review.gluster.org/4905 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Removed 'proactive' failing of volume opKrishnan Parthasarathi2013-04-301-53/+3
| | | | | | | | | | | | | | | | | | | Volume operations were failed 'proactively', on the first disconnect of a peer that was participating in the transaction. The reason behind having this kludgey code in the first place was to 'abort' an ongoing volume operation as soon as we perceive the first disconnect. But the rpc call backs themselves are capable of injecting appropriate state machine events, which would set things in motion for an eventual abort of the transaction. Change-Id: Iad7cb2bd076f22d89a793dfcd08c2d208b39c4be BUG: 847214 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4869 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: add a command 'gluster pool list [--xml]'Niels de Vos2013-04-261-19/+32
| | | | | | | | | | | | | | | | | * unlike 'gluster peer status', which lists only info about peers, this command lists localhost also in the list, so the sorted output from all the nodes should match. * made the output script friendly by keeping it one output per line. Change-Id: I853656753b35c617debbcceecbb71c8d6dd3c334 BUG: 764638 Original-review: http://review.gluster.org/4221 Original-author: Amar Tumballi <amarts@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/4862 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Introduce volume op-versionsKaushal M2013-04-2610-418/+600
| | | | | | | | | | | | | | | | | | | | | | | Each volume is now associated with two op-versions, * op_version - the op-version of the highest op-versioned feature enabled * client_op_version - the op-version of the highest op-versioned feature enabled which affects the clients only. These two op-versions are generated dynamically and kept updated during runtime. Glusterd now uses the respective volumes' client-op-version during getspec requests. To achieve the above a new field in the vme table is introduced, client_option, this boolean field tells if the option is a client side option. Change-Id: I12c83b1dd29ab506026efd50d448cebbcee53c27 BUG: 907311 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4584 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/io-cache: Avoid double mem_put in ioc_readvPranith Kumar K2013-04-261-2/+3
| | | | | | | | | | | | | On readv error io-cache frame->local is not set to NULL so the local is mem_put in STACK_DESTROY as well. This patch sets frame->local to NULL in all cases. Change-Id: I00013df1377475aa5f3c0c681dcb58b32e1e8063 BUG: 955751 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4884 Reviewed-by: Raghavendra G <raghavendra@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Added documentation for eager-lock checkPranith Kumar K2013-04-221-0/+17
| | | | | | | | | Change-Id: Ifa42762adde8b55ef1e2b51a59c93cebd983343f BUG: 912581 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4792 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: volume-sync needs to work with rejected peersKrishnan Parthasarathi2013-04-221-3/+5
| | | | | | | | | Change-Id: I970a51d3f62bcf414eb9552a68d1068430b93216 BUG: 950048 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4815 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: validate performance.nfs.* option values during volume set stageKrutika Dhananjay2013-04-181-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: performance.nfs.* option values (which are of type boolean) are not validated during the stage phase of 'volume set'. The result - nfs graph generation fails during commit phase, AFTER the option and its (invalid) value have been placed in volinfo->dict. CAUSE: nfsperfxl_option_handler() - the function that validates the values of performance.nfs.* options - never receives the (key,value) pair that needs to be set, for validation during 'volume set' stage. FIX: In build_nfs_graph(), copy the (mod_)dict containing the (option,value) parameters into set_dict before attempting to build the client graph for the volume on which the operation is being performed. Of course, an easier way out would be to simply do a 'volume reset' and pretend nothing wrong happened! Change-Id: I56b17d0239d58a9e0b7798933a3c8451e2675b69 BUG: 949930 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4814 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Avoided deadlock in single node cluster, glusterd restartKrishnan Parthasarathi2013-04-161-0/+3
| | | | | | | | | | | | | | | | | | | In a single node cluster, it is possible to deadlock on the "big lock", while restarting bricks. In glusterd_restart_bricks, we perform a glusterd_brick_connect, where we release the big lock in anticipation that glusterd_brick_rpc_notify could run in the same C stack (and deadlocking). So, in the restart code path, we could unlock before we have performed a lock on the big lock. To fix this, we need to take the big lock in the glusterd_launch_synctask 'thread' as well. Change-Id: I1abea1ca82b55c784b8a810a8194f254b32b1dcc BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4837 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: big lock - a coarse-grained locking to prevent racesKrishnan Parthasarathi2013-04-1216-106/+656
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are primarily three lists that are part of glusterd process, that are concurrently accessed. Namely, priv->volumes, priv->peers and volinfo->bricks_list. Big-lock approach ----------------- WHAT IS IT? Big lock is a coarse-grained lock which protects all three lists, mentioned above, from racy access. HOW DOES IT WORK? At any given point in time, glusterd's thread(s) are in execution _iff_ there is a preceding, inbound network event. Of course, the sigwaiter thread and timer thread are exceptions. A network event is an external trigger to glusterd, via the epoll thread, in the form of POLLIN and POLLERR. As long as we take the big-lock at all such entry points and yield it when we are done, we are guaranteed that all the network events, accessing the global lists, are serialised. This amounts to holding the big lock at - all the handlers of all the actors in glusterd. (POLLIN) - all the cbks in glusterd. (POLLIN) - rpc_notify (DISCONNECT event), if we access/modify one of the three lists. (POLLERR) In the case of synctask'ized volume operations, we must remember that, if we held the big lock for the entire duration of the handler, we may block other non-synctask rpc actors from executing. For eg, volume-start would block in PMAP SIGNIN, if done incorrectly. To prevent this, we need to yield the big lock, when we yield the synctask, and reacquire on waking up of the synctask. Change-Id: Ib929f9905b55fb6c3fc27fefb497a26dba058e4f BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4784 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* license: xlators/protocol/server dual license GPLv2 and LGPLv3+Kaleb S. KEITHLEY2013-04-1210-145/+57
| | | | | | | | | BUG: 951549 Change-Id: I3de5bd86d4238a60a0a85ba2e15d9c131969b210 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/4816 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Fixed spurious wakeups in glusterd syncopsKrishnan Parthasarathi2013-04-122-15/+26
| | | | | | | | | | | | | glusterd syncops perform a barrier_wake whenever rpc_clnt_submit returned -1. This is based on the wrong assumption that the cbkfn wasn't called. This would result in one more wakeup than there ought to be. Change-Id: I591e67c267f0e26d1145bf8fb5feeb2c13a751a1 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4802 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: changes in 'volume create' behaviourKrutika Dhananjay2013-04-095-20/+137
| | | | | | | | | | | | | This patch incorporates all the changes suggested on the behaviour of 'volume create' command in http://review.gluster.org/#change,4214 (comment #14, to be precise). Change-Id: Iaac524a59738b177415595b18aa8a136090d3d25 BUG: 948729 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4740 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: enable valgrind usage even in non DEBUG buildRaghavendra Bhat2013-04-094-24/+13
| | | | | | | | | | | | * Till now running glusterfs processes were allowed to run in valgrind mode only when built with debug mode enabled. Change-Id: I11e07ea2a4da4f82f70cdded6258a22d65d6db64 BUG: 922877 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4688 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* cluster/distribute: Ignore non-participating subvols for layout checksshishir gowda2013-04-092-20/+88
| | | | | | | | | | | | | | | | | | | | | | | When subvols-per-directory is < available subvols, then there are layouts which are not populated. This leads to incorrect identification of holes or overlaps. We need to ignore layouts, which have err == 0, and start == stop. In the current scenario (start == stop == 0). Additionally, in layout-merge, treat missing xattrs as err = 0. In case of missing layouts, anomalies will reset them. For any other valid subvoles, err != 0 in case of layouts being zeroed out. Also reverted back dht_selfheal_dir_xattr, which does layout calculation only on subvols which have errors. Change-Id: I9f57062722c9e8a26285e10675c31a78921115a1 BUG: 921408 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4668 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* afr: let eager-locking do its own overlap checksAnand Avati2013-04-054-56/+87
| | | | | | | | | | | | | | | | | | | | | | | | Today there is a non-obvious dependence of eager-locking on write-behind. The reason is that eager-locking works as long as the inheriting transaction has no overlaps with any of the transactions already in progress. While write-behind provides non-overlapping writes as a side-effect most of times (and only guarantees it when strict-write-ordering option is enabled, which is not on by default) eager-lock needs the behavior as a guarantee. This is leading to complex and unwanted checks for the presence of write-behind in the graph, for the simple task of checking for overlaps. This patch removes the interdependence between eager-locking and write-behind by making eager-locking do its own overlap checks with in-progress writes. Change-Id: Iccba1185aeb5f1e7f060089c895a62840787133f BUG: 912581 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4782 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* storage/posix: introduce node-uuid-pathinfoVenky Shankar2013-04-053-2/+39
| | | | | | | | | | | | | | | enabling this option has an effect on pathinfo xattr request returning <node-uuid>:<path> instead of the default - which is <hostname>:<path>. Change-Id: Ice1b38abf8e5df1568bab6d79ec0d53dfa520332 BUG: 765380 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/4567 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gsync: Display additional information in status commandsarvotham s pai2013-04-042-4/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | Added code to display extra information when status command is executed. Information shown now are 1 Number of files synced 2 crawl time 3 total sync time 4 bytes synced bytes synced is taken from rsync output . --stats option of rsync gives extra infor mation about the sync.In stats output there is a field called Total transferred file size which states the ammount of bytes synced . This information is parsed from stdout output using regular expressions.Bytes synced information can be used to calculate throughput. Change-Id: Id9bba9fff45ee7049bb8257c6fd918e5237e05b1 BUG: 947774 Signed-off-by: sarvotham s pai <spai@redhat.com> Reviewed-on: http://review.gluster.org/4749 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Treat all dir fop failure as success in changelogPranith Kumar K2013-04-033-2/+35
| | | | | | | | | | | | | | | For example: If a new entry creation fop fails with EEXIST or a delete entry fop fails with ENOENT, on all the subvols the fop is wound, then no change took place to the directory. So we can treat that case as no change happened to the directory. Change-Id: I3b3a7931954da2166a9cba19ff9f76f37739d751 BUG: 860210 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* posix: fix dangerous "sharing" of fd in readdir between two requestsAnand Avati2013-04-031-2/+17
| | | | | | | | | | | | | | | | | | | | | | | posix_fill_readdir() is a multi-step function which performs many readdir() calls, and expects the directory cursor to have not "seeked away" elsewhere between two successive iterations. Usually this is not a problem as each opendir() from an application has its own backend fd, and there is nobody else to "seek away" the directory cursor. However in case of NFS's use of anonymous fd, the same fd_t is shared between all NFS readdir requests, and two readdir loops can be executing in parallel on the same dir dragging away the cursor in a chaotic manner. The fix in this patch is to lock on the fd around the loop. Another approach could be to reimplement posix_fill_readdir() with a single getdents() call, but that's for another day. Change-Id: Ia42e9c7fbcde43af4c0d08c20cc0f7419b98bd3f BUG: 948086 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4774 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Made afr_sh_purge_entry_common message log more clear.Venkatesh Somyajulu2013-04-031-3/+1
| | | | | | | | | | | | | | | FIX: In missing entry self heal, once the source directories are determined after the lookup and if file is not present on any of the brick which contains the souce directory, the entry is removed from the directory. So log message should give information of "Purging of entry". Change-Id: I4d3deb602e0812dc1c9c8ba0a466716d81dede7e BUG: 947312 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4753 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* pump: Set self-heal readdir size in pumpPranith Kumar K2013-04-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In Pump entry self-heal happens for each directory during the first opendir using conservative merge. But in entry-self-heal readdir is issued with '0' size. So entry self-heal is not creating any files. After pump thinks entry self-heal is complete it proceeds to heal each of the file in the directory it just healed. Fortunately most of the times it chooses source-brick in pump as read-child for readdir. This happens because readchild is the subvolume on which lookup succeeds first. In pump lookup succeeds faster in local process than on the destination brick process most of the times. For all the entries pump finds in readdir it does a lookup. During this lookup the entry on the destination brick is created and healed. This is the reason why replace-brick succeeds whenever read-child for the directory is chosen as the source-brick. Which is most of the times. When read-child is chosen as the destination brick, readdir returns no entries so replace-brick completes without syncing the whole data. Fix: Set readdir-size in pump so that entry self-heal happens with 64k size. This ensures that entry self-heal triggered from opendir actually creates the files on the destination brick. Change-Id: I65ea45d3c2735a9578f3aa34eff771b6563241ca BUG: 909800 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4712 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/client: Print valid loc identifiersPranith Kumar K2013-04-031-35/+27
| | | | | | | | | | Change-Id: I45f91105862a2484b8906a7a63b98ab4aaf80d05 BUG: 924643 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4683 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: make nufa/switch call dht's init/finiJeff Darcy2013-04-036-1032/+786
| | | | | | | | | | | | | These functions keep changing as new functionality is added, so copying and pasting the code is not a good solution. This way ensures that all fields get initialized properly no matter how much new stuff we throw in. Change-Id: I9e9b043d2d305d31e80cf5689465555b70312756 BUG: 924488 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4710 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: Start rebalance with option readdir-optimize onshishir gowda2013-04-031-0/+1
| | | | | | | | | | | | | | With readdir-optimize set to on, we instruct the posix layer to ignore directory entries from not first subvolume. DHT discards directories returned from non first subvolume. By making posix itself ignore it, we are making directory crawls faster Change-Id: Ia1faf2dedec0c615c0632c3c063e846f5742ede6 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4613 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: add more specific log messagesBala.FA2013-04-021-8/+8
| | | | | | | | | Change-Id: I57fbdd83f3098e64886c3dd690a1ae04fc37442d BUG: 928648 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/4739 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: detect in-progress creation in lookup and return ENOENTPranith Kumar K2013-04-021-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if any subvol returned ENOENT while parent entrylk lock was held, yield and return ENOENT for the entire lookup. This is how the issue happens: Multiple clients A, B and C are attempting 'mkdir -p /mnt/a/b/c' 1 Client A is in the middle of mkdir(/a). It has acquired lock. It has performed mkdir(/a) on one subvol, and second one is still in progress 2 Client B performs a lookup, sees directory /a on one, ENOENT on the other, succeeds lookup. 3 Client B performs lookup on /a/b on both subvols, both return ENOENT (one subvol because /a/b does not exist, another because /a itself does not exist) 4 Client B proceeds to mkdir /a/b. It obtains entrylk on inode=/a with basename=b on one subvol, but fails on other subvol as /a is yet to be created by Client A. 5 Client A finishes mkdir of /a on other subvol 6 Client C also attempts to create /a/b, lookup returns ENOENT on both subvols. 7 Client C tries to obtain entrylk on on inode=/a with basename=b, obtains on one subvol (where B had failed), and waits for B to unlock on other subvol. 8 Client B finishes mkdir() on one subvol with GFID-1 and completes transaction and unlocks 9 Client C gets the lock on the second subvol, At this stage second subvol already has /a/b created from Client B, but Client C does not check that in the middle of mkdir transaction 10 Client C attempts mkdir /a/b on both subvols. It succeeds on ONLY ONE (where Client B could not get lock because of missing parent /a dir) with GFID-2, and gets EEXIST from ONE subvol. This way we have /a/b in GFID mismatch. One subvol got GFID-1 because Client B performed transaction on only one subvol (because entrylk() could not be obtained on second subvol because of missing parent dir -- caused by premature/speculative succeeding of lookup() on /a when locks are detected). Other subvol gets GFID-2 from Client C because while it was waiting for entrylk() on both subvols, Client B was in the middle of creating mkdir() on only one subvol, and Client C does not "expect" this when it is between lock() and pre-op()/op() phase of the transaction. Original-author: Anand Avati <avati@redhat.com> Change-Id: Idca475dbbc2a51e09da6fa0f9e1e37148caef208 BUG: 860210 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4625 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* rpc/nfs: cleanup legacy code of general optionsRajesh Amaravathi2013-04-021-2/+1
| | | | | | | | | | | | | | | | | | Removing the code which handles "general" options. Since it is no longer possible to set general options which apply for all volumes by default, this was redundant. This cleanup of general options code also solves a bug wherein with nfs.addr-namelookup on, nfs.rpc-auth-reject wouldn't work on ip addresses Change-Id: Iba066e32f9a0255287c322ef85ad1d04b325d739 BUG: 921072 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.org/4691 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr: sync xattrs removed on source to sink(s)Venky Shankar2013-04-023-12/+125
| | | | | | | | | | | | | xattrs are first removed from sink followed by setting source xattrs. Change-Id: I181cb5b785b667bbfc6e40787a2183a8f45de06b BUG: 906646 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/4656 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr: prevent piggyback on stale pre_opPranith Kumar K2013-04-021-33/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here are the logs of a file on which we saw EIO because of size mismatch: [root@lizzie ~]# grep 38f18204 /var/log/glusterfs/mnt-x-.log Reporting Unstable write for 38f18204-2840-408e-ae65-c01f4106b8c4 for offset: 0, len: 7680 Cleared unstable write flag for 38f18204-2840-408e-ae65-c01f4106b8c4: offset 0 length 7680 Reporting Unstable write for 38f18204-2840-408e-ae65-c01f4106b8c4 for offset: 7680, len: 71680 Reporting Unstable write for 38f18204-2840-408e-ae65-c01f4106b8c4 for offset: 79360, len: 15716 fsync completed on 38f18204-2840-408e-ae65-c01f4106b8c4 for offset 0 length 7680 with changelog status: -1 -1 According to these logs fsync did not happen after writev with offset: 79360, len: 15716. Which is the reason for this problem. In total 3 writes came. lets call them w1, w2, w3 w1 does pre_op so pre_op_done[0], pre_op_done[1] counts become 1 and 1 then is_piggyback_post_op() is called for w1 and it returns *false* w1's fsync is fired Now w2 and w3 come and see that pre_op_done[0], pre_op_done[1] are both 1, so pre_op_piggyback[0] and pre_op_piggyback[1] are both incremented twice, once by w2, one more time by w3 and become 2, 2 ------- Step-A Now fsync of w1 is complete and it goes ahead with post op and decrements pre_op_done[0], pre_op_done[1] to 0, 0 Now w2, w3 writevs complete and is_piggyback_post_op will return *true* for both w2, w3. So fsync is not fired for both w2, w3 this patch prevents Step-A from happening. Change-Id: I8b6af1f1875b2cf5f718caa3c16ee7ff3dc96b5c BUG: 927146 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4752 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>