path: root/xlators
Commit message | Author | Age | Files | Lines
* glusterd: big lock - a coarse-grained locking to prevent races (v3.4.0alpha3) | Krishnan Parthasarathi | 2013-04-17 | 16 | -106/+657

  There are primarily three lists in the glusterd process that are accessed concurrently: priv->volumes, priv->peers and volinfo->bricks_list.

  Big-lock approach
  -----------------
  WHAT IS IT?
  Big lock is a coarse-grained lock which protects all three lists mentioned above from racy access.

  HOW DOES IT WORK?
  At any given point in time, glusterd's thread(s) are in execution _iff_ there is a preceding, inbound network event. Of course, the sigwaiter thread and timer thread are exceptions.

  A network event is an external trigger to glusterd, via the epoll thread, in the form of POLLIN and POLLERR.

  As long as we take the big-lock at all such entry points and yield it when we are done, we are guaranteed that all the network events accessing the global lists are serialised.

  This amounts to holding the big lock at
  - all the handlers of all the actors in glusterd. (POLLIN)
  - all the cbks in glusterd. (POLLIN)
  - rpc_notify (DISCONNECT event), if we access/modify one of the three lists. (POLLERR)

  In the case of synctask'ized volume operations, we must remember that, if we held the big lock for the entire duration of the handler, we may block other non-synctask rpc actors from executing. For example, volume-start would block in PMAP SIGNIN, if done incorrectly. To prevent this, we need to yield the big lock when we yield the synctask, and reacquire it when the synctask wakes up.

  BUG: 948686
  Change-Id: I429832f1fed67bcac0813403d58346558a403ce9
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4835
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
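  The locking discipline described in this commit message can be illustrated with a small Python sketch. It is not glusterd code: `big_lock`, `with_big_lock`, `yield_big_lock` and the handler names are hypothetical, and a plain `threading.Lock` stands in for whatever primitive glusterd actually uses. The point is only that a handler which waits on another event has to drop the lock while it waits, otherwise the handler for that event can never run.

```python
import threading

# Hypothetical stand-ins; none of these names come from glusterd.
big_lock = threading.Lock()
volumes = []        # plays the role of priv->volumes
signed_in = []      # records that the "PMAP SIGNIN" handler ran


def with_big_lock(handler):
    """Run an entry point (handler, callback, notify) under the big lock."""
    def wrapper(*args, **kwargs):
        with big_lock:
            return handler(*args, **kwargs)
    return wrapper


def yield_big_lock(blocking_call, *args):
    """Drop the big lock around a blocking wait, then reacquire it."""
    big_lock.release()
    try:
        return blocking_call(*args)
    finally:
        big_lock.acquire()


@with_big_lock
def handle_volume_create(name):
    volumes.append(name)


@with_big_lock
def handle_pmap_signin():
    # Another entry point that also needs the big lock.
    signed_in.append(True)


@with_big_lock
def handle_volume_start(name, wait_for_brick):
    # Waiting while still holding the lock would starve handle_pmap_signin.
    yield_big_lock(wait_for_brick)
    return name in volumes


def wait_for_brick():
    # Simulates the brick signing in on another thread while we wait.
    t = threading.Thread(target=handle_pmap_signin)
    t.start()
    t.join(timeout=5)


handle_volume_create("vol0")
started = handle_volume_start("vol0", wait_for_brick)
```

  If `handle_volume_start` called `wait_for_brick()` directly instead of going through `yield_big_lock`, the sign-in thread would wait on the lock until the join timed out, which is the blocking behaviour the message warns about.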
* glusterd: Fixed spurious wakeups in glusterd syncops | Krishnan Parthasarathi | 2013-04-17 | 2 | -15/+26

  glusterd syncops perform a barrier_wake whenever rpc_clnt_submit returned -1. This is based on the wrong assumption that the cbkfn wasn't called. This would result in one more wakeup than there ought to be.

  BUG: 948686
  Change-Id: I839fd218a81255fe50c2047d67461d45360e894d
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4834
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: fix segfault on volume status detail | Lars Ellenberg | 2013-04-16 | 1 | -9/+11

  If for some reason glusterd_get_brick_root() fails, it frees the gf_strdup'ed *mount_point in its own error path, and returns -1.

  Unfortunately it had already assigned that pointer value to the output argument, so the caller function glusterd_add_brick_detail() sees a non-NULL pointer and calls free() again: segfault.

  Could be fixed with a one-liner (*mount_point = NULL) in the error path, but I think glusterd_get_brick_root() should only assign to the output argument once all checks have passed, so I use a local temporary pointer, which increases the patch a bit.

  Change-Id: I3f3035f01e80a5e9bdf2da895e4cf7baa3dfbd2f
  BUG: 919352
  Signed-off-by: Lars Ellenberg <lars@linbit.com>
  Reviewed-on: http://review.gluster.org/4646
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-on: http://review.gluster.org/4841
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: allow multiple instances of glusterd on one machine | Krishnan Parthasarathi | 2013-04-16 | 3 | -1/+76

  This is needed to support automated testing of cluster-communication features such as probing and quorum. In order to use this, you need to do the following preparatory steps.

  * Copy /var/lib/glusterd to another directory for each virtual host
  * Ensure that each virtual host has a different UUID in its glusterd.info

  Now you can start each copy of glusterd with the following xlator-options.

  * management.transport.socket.bind-address=$ip_address
  * management.working-directory=$unique_working_directory

  You can use 127.x.y.z addresses for binding without needing to assign them to interfaces explicitly. Note that you must use addresses, not names, because of some stuff in the socket code that's not worth fixing just for this usage, but after that you can use names in /etc/hosts instead.

  At this point you can issue CLI commands to a specific glusterd using the --remote-host option. So far probe, volume create/start/stop, mount, and basic I/O all seem to work as expected with multiple instances.

  Change-Id: I1beabb44cff8763d2774bc208b2ffcda27c1a550
  BUG: 913555
  Original-author: Jeff Darcy <jdarcy@redhat.com>
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4838
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* license: xlators/protocol/server dual license GPLv2 and LGPLv3+ | Kaleb S. KEITHLEY | 2013-04-12 | 10 | -145/+57

  cherry-pick from: refs/changes/16/4816/1; http://review.gluster.org/#/c/4816/

  BUG: 951551
  Change-Id: I3de5bd86d4238a60a0a85ba2e15d9c131969b210
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/4817
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Fixed volume-sync in synctask codepath. | Krishnan Parthasarathi | 2013-04-12 | 1 | -12/+18

  Change-Id: I2911d3ac80825310f84c5ba6bd7890e65e1ee219
  BUG: 950048
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4643
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Preserve mtime in self-heal | Pranith Kumar K | 2013-04-12 | 1 | -14/+25

  Problem:
  Data self-heal may choose the sink iatt to set mtimes. This happens because, after the data has been synced, self-heal does one more xattrop/fstat to determine sources and sinks and set the inode-ctx. Since this is done after data syncing and erasing of xattrs, the old source and the old sink are both sources now, but their mtimes differ. The old code just takes the first source from the list and updates mtimes from it, and that could be the brick that was the sink before the self-heal started.

  Fix:
  Set mtime from 'sources before syncing'.

  Change-Id: Id769e1b99aa4f041eaee775f64cbf2c57b799723
  BUG: 918437
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4658
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-on: http://review.gluster.org/4663
* cluster/distribute: Ignore non-participating subvols for layout checks | shishir gowda | 2013-04-11 | 2 | -20/+88

  Backporting fix http://review.gluster.org/#/c/4668/

  When subvols-per-directory is < available subvols, then there are layouts which are not populated. This leads to incorrect identification of holes or overlaps. We need to ignore layouts which have err == 0 and start == stop. In the current scenario, start == stop == 0.

  Additionally, in layout-merge, treat missing xattrs as err = 0. In case of missing layouts, anomalies will reset them. For any other valid subvols, err != 0 in case of layouts being zeroed out.

  Also reverted back dht_selfheal_dir_xattr, which does layout calculation only on subvols which have errors.

  BUG: 921408
  Change-Id: I75a8edcb92af5b53b3253c9addd7a812e9242836
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4800
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: improve transform/detransform of d_off (and be ext4 safe) | shishir gowda | 2013-04-11 | 1 | -5/+82

  Backporting Avati's fix http://review.gluster.org/4711

  The scheme to encode brick d_off and brick id into global d_off has two approaches. Since both brick d_off and global d_off are 64 bits wide, we need to be careful about how the brick id is encoded.

  Filesystems like XFS always give a d_off which fits within 32 bits. So we have another 32 bits (actually 31, in this scheme, as seen ahead) to encode the brick id - which is typically plenty.

  Filesystems like the recent EXT4 utilize up to 63 low bits in d_off, as the d_off is calculated based on a hash function value. This leaves us no "unused" bits to encode the brick id.

  However both these filesystems (EXT4 more importantly) are "tolerant" in terms of the accuracy of the value presented back in seekdir(), i.e. a seekdir(val) actually seeks to the entry which has the "closest" true offset.

  This "two-prong" scheme exploits this behavior - which seems to be the best middle ground amongst various approaches and has all the advantages of the old approach:

  - Works against XFS and EXT4, the two most common filesystems out there. (which wasn't an "advantage" of the old approach, as it is broken against EXT4)
  - Probably works against most of the others as well. The ones which would NOT work are those which return HUGE d_offs _and_ are NOT tolerant to seekdir() to the "closest" true offset.
  - Nothing to "remember in memory" or evict "old entries".
  - Works fine across NFS server reboots and also NFS head failover.
  - Tolerant to seekdir() to arbitrary locations.

  Algorithm:
  Each d_off can be encoded in either of the two schemes. There is no requirement to encode all d_offs of a directory or a reply-set in the same scheme.

  The topmost bit of the 64 bits is used to specify the "type" of encoding of this particular d_off. If the topmost bit (bit-63) is 1, it indicates that the encoding scheme holds a HUGE d_off. If the topmost bit is 0, it indicates that the "small" d_off encoding scheme is used.

  The goal of the "small" d_off encoding is to stay as dense as possible towards the lower bits even in the global d_off.

  The goal of the HUGE d_off encoding is to stay as accurate (close) as possible to the "true" d_off after a round of encoding and decoding.

  If DHT has N subvolumes, we need ROOF(Log2(N)) "bits" to encode the brick ID (call it "n").

  SMALL d_off
  ===========
  Encoding
  --------
  If the top n + 1 bits are free in a brick offset, then we leave the top bit as 0 and set the remaining bits based on the old formula:

      hi_mask = 0xffffffffffffffff
      hi_mask = ~(hi_mask >> (n + 1))
      if ((hi_mask & d_off_brick) != 0)
          do_large_d_off_encoding ()
      d_off_global = (d_off_brick * N) + brick_id

  Decoding
  --------
  If the top bit in the global offset is 0, it indicates that this is the encoding formula used. So decoding such a global offset inverts the old formula:

      if ((d_off_global & 0x8000000000000000) != 0)
          do_large_d_off_decoding()
      d_off_brick = (d_off_global / N)
      brick_id = d_off_global % N

  HUGE d_off
  ==========
  Encoding
  --------
  If the top n + 1 bits are NOT free in a given brick offset, then we set the top bit as 1 in the global offset. The low n bits are replaced by brick_id.

      low_mask = 0xffffffffffffffff << n // where n is ROOF(Log2(N))
      d_off_global = (0x8000000000000000 | d_off_brick & low_mask) + brick_id
      if (d_off_global == 0xffffffffffffffff)
          discard_entry();

  Decoding
  --------
  If the top bit in the global offset is set to 1, it indicates that the encoding formula used is the one above. So decoding would look like:

      hi_mask = (0xffffffffffffffff << n)
      low_mask = ~(hi_mask)
      d_off_brick = (global_d_off & hi_mask & 0x7fffffffffffffff)
      brick_id = global_d_off & low_mask

  If "losing" the low n bits in this decoding of d_off_brick looks "scary", we need to realize that till recently EXT4 used to only return what can now be expressed as (d_off_global >> 32). The extra 31 bits of hash added by EXT4 recently only decrease the probability of a collision; they do not eliminate it completely, anyways. In a way, the "lost" n bits are made up for by decreasing the probability of collision by sharding the files into N bricks / EXT directories -- call it "hash hedging", if you will :-)

  Change-Id: I9551c581c3f3d4c9e719764881036d554f60c557
  Thanks-to: Zach Brown <zab@redhat.com>
  BUG: 838784
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4799
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
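  The two encodings in this commit message can be tried out with a short Python sketch. This is an illustration of the formulas as written in the message, not the DHT source: the function names are made up here, and explicit 64-bit masking is added because Python integers are unbounded. The small decode divides by N to recover the brick offset and takes the remainder as the brick id, the inverse of `d_off_brick * N + brick_id`.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
TOP_BIT = 0x8000000000000000


def bits_for(n_subvols):
    """ROOF(Log2(N)): bits needed to hold a brick id in [0, N)."""
    return max(n_subvols - 1, 0).bit_length()


def encode(d_off_brick, brick_id, n_subvols):
    """Return the global d_off, or None when the entry must be discarded."""
    n = bits_for(n_subvols)
    hi_mask = ~(MASK64 >> (n + 1)) & MASK64
    if (d_off_brick & hi_mask) == 0:
        # SMALL: top n + 1 bits are free, keep the top bit 0.
        return d_off_brick * n_subvols + brick_id
    # HUGE: set the top bit and replace the low n bits with the brick id.
    low_mask = (MASK64 << n) & MASK64
    d_off_global = (TOP_BIT | (d_off_brick & low_mask)) + brick_id
    if d_off_global == MASK64:
        return None
    return d_off_global


def decode(d_off_global, n_subvols):
    """Return (d_off_brick, brick_id) for a global d_off."""
    n = bits_for(n_subvols)
    if (d_off_global & TOP_BIT) == 0:
        return d_off_global // n_subvols, d_off_global % n_subvols
    hi_mask = (MASK64 << n) & MASK64
    low_mask = ~hi_mask & MASK64
    return d_off_global & hi_mask & 0x7FFFFFFFFFFFFFFF, d_off_global & low_mask


small = encode(1000, 2, 3)                 # XFS-like offset, 3 subvolumes
huge = encode(0x7FFFFFFF12345677, 1, 3)    # EXT4-like 63-bit offset
```

  With three subvolumes (n = 2) the small offset round-trips exactly, while the huge offset comes back with its low two bits cleared, which is the loss of accuracy the message argues seekdir() tolerates.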
* mgmt/glusterd: Start fs-crawl in separate thread so as not to block epoll | Varun Shastry | 2013-03-19 | 1 | -4/+2

  tests/basic/quota.t covers test case for this.

  Patch is only for 3.4 branch, http://review.gluster.org/4495 fixes the issue in master.

  Change-Id: I92674f5413441cc896245d5b3d0925f44ce8b2d3
  BUG: 919998
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/4680
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/distribute: Fix layout overlaps due to spread-count in selfheal path | shishir gowda | 2013-03-09 | 1 | -50/+12

  We needed to zero out the layout range before we re-calculate the range. When spread-count is issued, we would end up with stale ranges in the layout.

  Replaced dht_selfheal_dir_xattr with dht_fix_dir_xattr, which correctly resets the layouts that are unused after re-calculation.

  Change-Id: I1a900d15df07335f59356bd23182ccec34381ab2
  BUG: 884455
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4648
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* performance/write-behind: guarantee non-overlapping concurrent writes (v3.4.0alpha2) | Jeff Darcy | 2013-03-06 | 1 | -1/+65

  Maintain a list of writes (either written behind or SYNC) which are currently "in progress" (i.e. STACK_WIND'ed towards server) and hold off any new STACK_WIND of write (either written behind or SYNC) which overlaps with any of the "in progress" writes.

  This is a guarantee which AFR's eager-lock depends upon (though not strictly a write-behind requirement).

  Change-Id: Icedd0b51b440366a906dc9223d62b7fd6ef2ca03
  BUG: 857673
  Original-author: Anand Avati <avati@redhat.com>
  Signed-off-by: Anand Avati <avati@redhat.com>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4642
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
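  As a rough illustration of the guarantee described above, the following Python sketch keeps a list of in-progress byte ranges and holds back any new write that overlaps one of them. The class and method names are hypothetical and the real translator works on write requests and STACK_WIND, not tuples; the extra check against already-held writes is an assumption added here so that overlapping writes keep their submission order.

```python
def overlaps(a_off, a_len, b_off, b_len):
    """True if byte ranges [a_off, a_off+a_len) and [b_off, b_off+b_len) intersect."""
    return a_off < b_off + b_len and b_off < a_off + a_len


class InProgressWrites:
    """Hypothetical tracker; 'winding' a write is modelled as list membership."""

    def __init__(self):
        self.in_progress = []   # (offset, length) already sent to the server
        self.held = []          # writes waiting for an overlapping one to finish

    def submit(self, offset, length):
        """Return True if the write is wound now, False if it is held back."""
        blockers = self.in_progress + self.held
        if any(overlaps(offset, length, o, l) for o, l in blockers):
            self.held.append((offset, length))
            return False
        self.in_progress.append((offset, length))
        return True

    def complete(self, offset, length):
        """Finish a write and wind any held writes that no longer overlap."""
        self.in_progress.remove((offset, length))
        released, still_held = [], []
        for w in self.held:
            if any(overlaps(*w, *p) for p in self.in_progress + still_held):
                still_held.append(w)
            else:
                self.in_progress.append(w)
                released.append(w)
        self.held = still_held
        return released


wb = InProgressWrites()
first = wb.submit(0, 4096)        # wound immediately
second = wb.submit(2048, 4096)    # overlaps the first write, so it is held
third = wb.submit(8192, 4096)     # disjoint, wound immediately
released = wb.complete(0, 4096)   # the held write can go out now
```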
* glusterd: Increasing throughput of synctask based mgmt ops. | Krishnan Parthasarathi | 2013-03-06 | 3 | -425/+563

  Change-Id: Ibd963f78707b157fc4c9729aa87206cfd5ecfe81
  BUG: 913662
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4638
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* volgen: Use bind-address option for bricks when option set on glusterd | Krishnan Parthasarathi | 2013-03-06 | 1 | -8/+20

  Brick processes listen on all the interfaces on a given port. When multiple glusterds run on one machine, each glusterd assumes that it 'owns' the ports on that machine. This can lead to the different glusterd instances stepping on each other's ports. This fix ensures that brick processes listen only on their host IP when glusterd has the bind-address option set.

  Change-Id: I4c1b05643c64d3098bf56e977e768e611ffce0f5
  BUG: 913662
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4637
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* performance/write-behind: mark fd bad if any written behind writes fail | Raghavendra G | 2013-03-06 | 1 | -57/+114

  BUG: 765473
  Change-Id: I1ddd6ef9f5361aed96f97aa1344823836c6ddecb
  Signed-off-by: Raghavendra G <raghavendra@gluster.com>
  Reviewed-on: http://review.gluster.org/4630
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* Do not call xdr_string() with a NULL error message | Jeff Darcy | 2013-03-05 | 1 | -0/+1

  It is illegal to call xdr_string() with a NULL string. Linux just returns false; NetBSD gets a SIGSEGV when xdr_string() calls strlen(NULL).

  BUG: 916439
  Change-Id: Ia958470ada6e8e55a86d439922ec942d038f5f13
  Original-author: Emmanuel Dreyfus <manu@netbsd.org>
  Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4629
* cluster/afr: do complete split-brain check in all the fd based fops | Raghavendra Bhat | 2013-03-05 | 4 | -19/+33

  fd based operations such as readv checked only for data split-brain instead of complete split-brain (i.e. both data + metadata), assuming that open would have done the complete split-brain check. However, open-behind would have unwound open without winding to afr, thus preventing the complete split-brain check, and some applications will be able to read the contents of the file even though the file has metadata split-brain.

  So let all the fd based fops do a defensive check of complete split-brain.

  Change-Id: I0ea52f782b371ce73e8e1c61f9def438fce1bd28
  BUG: 846240
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/4620
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix check for task-id existence in 'volume status' | Kaushal M | 2013-03-05 | 3 | -3/+8

  This fixes the issue of task-id tests failing randomly. The condition used to check whether rebalance/remove-brick was running was wrong, which could lead to the task-id for these tasks not being displayed even when the actual commit hadn't occurred.

  BUG: 857330
  Change-Id: I0f86c6bbe7acec586ee0ea6e663369ea26171904
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: http://review.gluster.org/4617
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* rpc: bring in root-squashing behavior in rpc | Raghavendra Bhat | 2013-03-04 | 2 | -0/+10

  * requests coming in as root are converted to nfsnobody
  * with open-behind some acl checks won't happen and nfsnobody can read a file whose owner is root and which other users do not have permission to read. This is because open-behind does not send the open to the brick and sends success to the application, so the acl related tests on the file, which would have prevented the file from being opened, won't happen.

  Change-Id: I12a3e6b2a12884d00bb81f2779074fed09b1b2e4
  BUG: 887145
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/4619
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Reopen fds in migration internally as root:root | shishir gowda | 2013-03-04 | 1 | -1/+10

  Though linkfile_create and rebalance dst file create sent a setattr with correct ownership, there is still a race window where the linkfile open (client open due to migration) will fail, as its ownership will be root:root.

  BUG: 884597
  Change-Id: Iba73681eae4f280d39ee6c9a40009e195768bee7
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4612
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Prevent spurious multiple defrag crawls | shishir gowda | 2013-03-04 | 1 | -9/+14

  In dht_notify, we used to create a thread to start defrag crawls after we had heard from all child subvols. This was incorrect, as a later event could also trigger the crawl again (due to the fact that all subvols had responded).

  The fix is to make sure the thread is started only once, after all subvols have responded the first time.

  BUG: 916449
  Change-Id: I1619344fbb1cb51d5e1db38d8a29821fa870fa8b
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4610
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
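  The start-once behaviour described in this message can be sketched in Python as a flag checked under a lock. Everything here is hypothetical (the class, the counter that stands in for spawning the crawl thread); it only shows why "all subvols have responded" is not enough as a trigger on its own.

```python
import threading


class DefragStarter:
    """Hypothetical model of a notify handler that starts a crawl once."""

    def __init__(self, n_subvols):
        self.pending = set(range(n_subvols))  # subvols not yet heard from
        self.started = False
        self.crawls = 0                       # stands in for spawned threads
        self.lock = threading.Lock()

    def notify(self, subvol):
        """Handle an event from a child; return True if a crawl was started."""
        with self.lock:
            self.pending.discard(subvol)
            if self.pending or self.started:
                # Either still waiting for a child, or the crawl already ran.
                return False
            self.started = True
            self.crawls += 1
            return True


starter = DefragStarter(2)
results = [starter.notify(0), starter.notify(1), starter.notify(1)]
```

  Without the `started` flag, the third event would see an empty `pending` set and start a second crawl.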
* cluster/distribute: Preserve file size during rebalance migration | shishir gowda | 2013-03-04 | 1 | -0/+6

  If holes are encountered, we do not write these to the dst, which sometimes causes the file size to be smaller than that of the src. Data is not corrupted, as when non-zero reads are received, we do write that data.

  Call truncate with the src file size, so that the dst does not end up shorter than the src in case the file ends with holes.

  Thanks to Brian Foster for providing the test case.

  BUG: 915554
  Change-Id: I7e1e0c475118b073c3ebb87e93220c1ec22e8b7d
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4609
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
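  A small Python sketch of the idea, assuming a copy loop that skips all-zero blocks (the function name and block size are made up for illustration and this is not the rebalance code): when the source ends in a hole, nothing is ever written at the tail of the destination, so a final truncate to the source size is what restores the length.

```python
import os


def copy_skipping_holes(src_path, dst_path, block=4096):
    """Copy src to dst, skipping all-zero blocks, then fix up the size."""
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        offset = 0
        while True:
            data = src.read(block)
            if not data:
                break
            if any(data):              # at least one non-zero byte
                dst.seek(offset)
                dst.write(data)
            offset += len(data)
        # Without this, a file ending in zeros would come out too short.
        dst.truncate(size)
    return size
```

  For a source of 4096 data bytes followed by 8192 zero bytes, only the first block is written, and the truncate call extends the destination to the full 12288 bytes (zero-filled on common platforms).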
* cluster/distribute: Remove spurious fd_unref call | shishir gowda | 2013-03-04 | 1 | -2/+0

  After fix http://review.gluster.org/4282 (libglusterfs/syncop: do not hold ref on the fd in cbk) was pushed, syncop_open does not take a ref anymore.

  BUG: 910661
  Change-Id: Idedff91270966e6e70e71ee83785c0228e238d31
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4608
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/dht: Create linkfile with file uid/gid | shishir gowda | 2013-03-04 | 4 | -4/+101

  Currently, linkfile creation happens as root. Use the uid/gid returned from _cbk (link/rename) to set the correct ownership of the link files.

  Also added test/dht.rc to implement common dht functions.

  BUG: 884597
  Change-Id: I6bc0e04f62d4716fc033681e5678e852a1be7a2f
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4607
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: use gf_strdup() in place of strdup() | Krutika Dhananjay | 2013-03-04 | 1 | -1/+1

  Change-Id: Idee71019dbc6eeaa0a808d671b29d6f3038a1a89
  BUG: 913487
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/4563
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* nfs/server: Fix multiple crashes in acl handler. | Vijay Bellur | 2013-03-03 | 1 | -10/+16

  Change-Id: I67c224c74c02f7058bcf546713501dd7ab810826
  BUG: 915280
  Signed-off-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: http://review.gluster.org/4606
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mount.glusterfs: Introduce mem-accounting as an option | Vijay Bellur | 2013-03-03 | 1 | -9/+17

  The mem-accounting option enables memory accounting for the client process.

  Refactored to keep options with values and options without values in different sections of mount.glusterfs.

  Change-Id: I54ebc31a1eae6d7a5ce7b0255cd7df74d37d46c1
  BUG: 834465
  Signed-off-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: http://review.gluster.org/4528
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: harden 'volume start' staging to check for brick dirs' presence | Krutika Dhananjay | 2013-02-08 | 1 | -13/+49

  PROBLEM:
  When the brick directory of a volume is absent on any of the servers, AND an attempt is made to start the volume, commit fails ONLY on the node where the brick dir is absent, leading to a split-brain like situation.

  FIX:
  Harden 'volume start' to check for the presence of brick directories at the time of staging, thereby preventing commit failure.

  Change-Id: I67faeb9afbd3aa76f08645924462db126bf7a977
  BUG: 889996
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/4365
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: pathinfo xattr changes for directories | Venky Shankar | 2013-02-08 | 2 | -92/+224

  Since directories have presence on all subvolumes, there is no definite meaning of ->hashed_subvol or ->cached_subvol. The getxattr() code path chooses ->cached_subvol for the pathinfo extended attribute. While this makes sense for files, it makes less sense for directories. Further, if a hashed or a cached subvolume is down and there's a getxattr request for a directory, we return with an errno.

  This patch changes the pathinfo extended attribute contents by aggregating information from all subvolumes that are up.

  Change-Id: I58adb741d63ccfd1d0239af75eb65f26f0fb384d
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  BUG: 856455
  Reviewed-on: http://review.gluster.org/4047
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made gsync set use synctask framework | Avra Sengupta | 2013-02-08 | 2 | -6/+2

  Change-Id: I409fa5a9f55434ece47a8a51d4812d3eca42d269
  BUG: 852147
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/4473
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Making volume-reset use synctask framework | Avra Sengupta | 2013-02-08 | 1 | -5/+1

  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Change-Id: Ib25c8fa69d84b8132505ae3f1e67cf88d3f6f9ec
  BUG: 852147
  Reviewed-on: http://review.gluster.org/4474
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd : Made volume-set use synctask framework. | Avra Sengupta | 2013-02-08 | 1 | -5/+1

  Change-Id: I1aa08ca843b87839180f9097bca370270a856e6d
  BUG: 852147
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/4488
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made volume-sync use synctask framework. | Avra Sengupta | 2013-02-08 | 1 | -5/+1

  Change-Id: I048aac2af4d9da9ed541d3756fefefbb2a29198e
  BUG: 852147
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/4489
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd : Made volume clear-locks use synctask framework. | Avra Sengupta | 2013-02-08 | 7 | -16/+36

  Change-Id: Ia1fe3d0500d999c1f95b43c9e53947834e39d680
  BUG: 852147
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/4490
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made volume-stop use synctask framework. | Avra Sengupta | 2013-02-08 | 1 | -5/+1

  Change-Id: I88270f70538bb89d828bb51830b54e9f59be258a
  BUG: 852147
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/4491
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made volume-delete use synctask framework. | Avra Sengupta | 2013-02-08 | 1 | -5/+1

  Change-Id: I52573bb49946f904484e2ead483e8f6f41cbd0c8
  BUG: 852147
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/4492
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made volume-statedump use synctask framework. | Avra Sengupta | 2013-02-08 | 1 | -4/+1

  Change-Id: I230ecbd8978725070b5910ead4249f21038224a6
  BUG: 852147
  Signed-off-by: Avra Sengupta <asengupt@redhat.com>
  Reviewed-on: http://review.gluster.org/4494
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* gsyncd: allow the override of the compiled-in python path | Joe Julian | 2013-02-07 | 1 | -3/+7

  .. using the environment variable $PYTHON

  Change-Id: Ieaad8be98b826c803268216826e250d9944c8190
  BUG: 882127
  Signed-off-by: Joe Julian <me@joejulian.name>
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/4252
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterfsd can't listen in a specified address | merfi | 2013-02-07 | 1 | -1/+1

  When the transport.socket.bind-address option is specified, only the glusterd daemon uses this address, while glusterfsd still works with the default localhost address.

  For instance, when we want to use a specific IPv6 address, we want all processes, including glusterfsd, to use that address.

  To handle this change we just need to replace the fixed address “localhost” with the specified brick address “brickinfo->hostname”.

  Change-Id: I540d30e6c155f71379a1cf1c0b459ac00faeb62c
  BUG: 865327
  Signed-off-by: Lahoucine BENLAHMR <lahoucine@benlahmr.com>
  Reviewed-on: http://review.gluster.org/3889
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* Use proper libtool option -avoid-version instead of bogus -avoidversion | Anand Avati | 2013-02-07 | 37 | -41/+41

  Change-Id: I1c9541058c7d07786539a3266ca125a6a15287d8
  BUG: 859835
  Signed-off-by: Anand Avati <avati@redhat.com>
  Original-author: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com>
  Signed-off-by: Kacper Kowalik (Xarthisius) <xarthisius.kk@gmail.com>
  Reviewed-on: http://review.gluster.org/3967
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* afr: serialize modification of {entrylk,inodelk}_lock_count | Anand Avati | 2013-02-07 | 1 | -53/+54

  Typically this lock was not needed in practice, but with http://review.gluster.org/3842 this code gets executed in multiple threads for different servers and we lose a count. This results in a leaked lock and a hang for a future transaction.

  Change-Id: I377ed20e44f2a45cff522289dfef181f0653eca2
  BUG: 765564
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/4480
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* fuse-bridge: use READDIRPLUS support when available | Anand Avati | 2013-02-07 | 3 | -2/+141

  This patch makes use of READDIRPLUS call when support is available in the kernel.

  Change-Id: Iac78881179567856b55af1f46594a2b2859309f0
  BUG: 908128
  Signed-off-by: Anand V. Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/3905
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Brian Foster <bfoster@redhat.com>
* dht: better layout-optimization algorithm | Jeff Darcy | 2013-02-07 | 2 | -22/+76

  This method deals with the case where swapping might gain a bigger overlap for the xlator currently under consideration, but sacrifices even more from the xlator we're swapping with. For example:

      A = 0x00000000 - 0x44444443 (new 0x00000000 - 0x55555554)
      B = 0x44444444 - 0x77777776 (new 0x55555555 - 0xaaaaaaa9)
      C = 0x77777777 - 0xffffffff (new 0xaaaaaaaa - 0xffffffff)

  Here, the new range for B has a bigger overlap with the old C than with the old B (0x33333333 vs. 0x22222222 to be precise) so looking only at that might lead us to swap. However, such a swap turns the new C's overlap from 0x55555556 (vs. old C) to *zero* (vs. old B). In other words, we've gained 0x11111111 for B but lost 0x55555556 for C, so it's a bad idea.

  The new algorithm accounts for all effects of the swap, so it not only avoids bad swaps but can make some good ones that would have been missed previously. For example, if swapping a range X with a later range Y would not increase the overlap for X we would previously have skipped it even if the swap would increase Y's overlap without affecting X's. This is the normal case when we're adding a new brick (which initially has zero overlap with any old range) so finding more good swaps is probably even more important than avoiding bad ones.

  Also, the logic in dht_overlap_calc was completely broken before, causing integer overflows instead of providing correct values, so no matter what higher-level algorithm was in place the GIGO effect would have resulted in bad decisions.

  Change-Id: If61ed513cfcb931916c6b51da293e3efbaaf385f
  BUG: 853258
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/3908
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
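  The arithmetic in the A/B/C example can be checked with a few lines of Python. The helper names `overlap` and `swap_gain` are invented for this sketch and the code is not a port of dht_overlap_calc; Python integers do not overflow, so the sketch sidesteps the overflow problem the message mentions rather than demonstrating a fix for it.

```python
def overlap(a, b):
    """Number of hash values shared by two inclusive (start, stop) ranges."""
    start = max(a[0], b[0])
    stop = min(a[1], b[1])
    return stop - start + 1 if stop >= start else 0


def swap_gain(old, new, i, j):
    """Net change in total overlap if new ranges i and j are swapped."""
    before = overlap(old[i], new[i]) + overlap(old[j], new[j])
    after = overlap(old[i], new[j]) + overlap(old[j], new[i])
    return after - before


# Old and new ranges for A, B and C, taken from the commit message.
old = [(0x00000000, 0x44444443), (0x44444444, 0x77777776), (0x77777777, 0xFFFFFFFF)]
new = [(0x00000000, 0x55555554), (0x55555555, 0xAAAAAAA9), (0xAAAAAAAA, 0xFFFFFFFF)]

gain_bc = swap_gain(old, new, 1, 2)   # swapping B with C
```

  Looking only at B, the swap appears to gain 0x11111111; counting both sides, the net change is negative, which is why the message calls it a bad swap.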
* glusterd: synctaskize 'volume create' operation (Krutika Dhananjay, 2013-02-06, 1 file, -42/+6)
    .. and also move brickpath validation to volume create stage

    Change-Id: Ia028677932ca5f6aa05dcf624f47033b62e7b212
    BUG: 862834
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/4213
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* rpc: get hostnames of client to allow FQDN based authentication (Rajesh Amaravathi, 2013-02-06, 1 file, -0/+4)
    If FQDNs are used to authenticate clients, then from this commit
    onward the client IP address (v4 or v6) is reverse-resolved with
    getnameinfo to obtain a hostname associated with it, if any, which
    makes FQDN-based rpc authentication possible.

    Change-Id: I4c5241e7079a2560de79ca15f611e65c0b858f9b
    BUG: 903553
    Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
    Reviewed-on: http://review.gluster.org/4439
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
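For readers unfamiliar with getnameinfo, the reverse lookup described above can be sketched roughly as follows. The helper `peer_name_from_ipv4` and its `numeric_only` switch are hypothetical and exist only for illustration; the four lines the commit actually adds are not reproduced here, and the sketch handles IPv4 only to stay short.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Hypothetical helper, not the function touched by the commit.
 * Writes a name for the IPv4 address `ip` (dotted text) into `host`.
 * numeric_only == 0: demand a real reverse lookup (NI_NAMEREQD).
 * numeric_only != 0: skip the resolver and format the address back.
 * Returns 0 on success, a non-zero EAI_* code on failure. */
static int
peer_name_from_ipv4 (const char *ip, int numeric_only,
                     char *host, size_t hostlen)
{
    struct sockaddr_in sin;

    assert (ip != NULL && host != NULL && hostlen > 0);

    memset (&sin, 0, sizeof (sin));
    sin.sin_family = AF_INET;
    if (inet_pton (AF_INET, ip, &sin.sin_addr) != 1)
        return EAI_FAIL;        /* not a valid IPv4 literal */

    return getnameinfo ((const struct sockaddr *) &sin, sizeof (sin),
                        host, (socklen_t) hostlen, NULL, 0,
                        numeric_only ? NI_NUMERICHOST : NI_NAMEREQD);
}
```

With NI_NAMEREQD the call fails when no hostname is associated with the address, which matches the "if any" in the message: a caller would then fall back to address-based authentication.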
* nfs/nlm: use req's uid and gid for open_and_resume (Rajesh Amaravathi, 2013-02-06, 1 file, -7/+4)
    Previously, NLM was setting the frame->root->{uid,gid} to root by
    default. This caused permission problems with root squashing for
    lock calls. Now, we obtain the uid and gid from the rpc request.
    Duplicate #defines are also removed from rpcsvc.h.

    Change-Id: I5d6c87aed8d04aab2619bb913408048c0a02d1e7
    BUG: 906884
    Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
    Reviewed-on: http://review.gluster.org/4466
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* open-behind: translator to perform open calls in background (Anand Avati, 2013-02-06, 6 files, -1/+959)
    This is functionality peeled out of quick-read into a separate
    translator.

    Fops which modify the file (where it is required to perform the
    operation on the true fd) will trigger and wait for the backend open
    to succeed and use that fd.

    Fops like fstat(), readv(), etc. will use an anonymous fd
    (configurable) when the original fd is unopened at the backend.

    Change-Id: Id9847fdbfdc82c1c8e956339156b6572539c1876
    BUG: 846240
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4406
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: Use client-op-versions during "volume set" (Kaushal M, 2013-02-06, 2 files, -2/+59)
    The supported op-versions of the client and the name of the
    requested volume are saved during server_getspec(). These are used
    during the staging of volume set. If any client that currently has
    the volume mounted does not support the option being set, the set
    will fail.

    Change-Id: I4e6b60b274d5200508762dc0204cfa848a6c0aa4
    BUG: 907311
    Signed-off-by: Kaushal M <kaushal@redhat.com>
    Reviewed-on: http://review.gluster.org/4424
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
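The staging check described above might look roughly like the sketch below. Everything here is illustrative: `struct client_info` and `stage_volume_set` are hypothetical names, and the sketch assumes that each client's supported op-versions are stored as a min/max pair and that a single mounted client outside that range is enough to fail the set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of what is remembered per connected client. */
struct client_info {
    const char *volname;        /* volume the client fetched */
    uint32_t    min_op_version; /* assumed representation */
    uint32_t    max_op_version;
};

/* Returns 0 if every client that has `volname` mounted supports
 * `option_op_version`, -1 as soon as one of them does not.
 * Clients of other volumes are ignored. */
static int
stage_volume_set (const struct client_info *clients, size_t nclients,
                  const char *volname, uint32_t option_op_version)
{
    assert (volname != NULL);

    for (size_t i = 0; i < nclients; i++) {
        const struct client_info *c = &clients[i];

        if (strcmp (c->volname, volname) != 0)
            continue;
        if (option_op_version < c->min_op_version ||
            option_op_version > c->max_op_version)
            return -1;
    }
    return 0;
}
```

Failing at staging, before anything is committed, means an option is never enabled while an older client that cannot interpret it still has the volume mounted.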
* glusterd,glusterfsd,libgfapi: Client op-version (Kaushal M, 2013-02-06, 2 files, -20/+69)
    This patch introduces op-version support for glusterfs clients. Now,
    a client sends its supported op-versions during the volfile fetch
    request and glusterd will return the volfile only if the client can
    support the current op-version of the cluster.

    Change-Id: Iab1f1f1706802962bcf27058657c44e8a344d2f6
    BUG: 907311
    Signed-off-by: Kaushal M <kaushal@redhat.com>
    Reviewed-on: http://review.gluster.org/4247
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
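As a minimal sketch of the volfile-fetch gate, assuming the client advertises its supported op-versions as an inclusive min/max range (the message only says "supported op-versions", so the representation is an assumption), the decision reduces to a range test. `client_supports_cluster` is a hypothetical name, not a glusterd function.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the cluster's current op-version lies inside the range
 * the client advertised, 0 otherwise. In this sketch the volfile would
 * be returned only when the result is 1. */
static int
client_supports_cluster (uint32_t cluster_op_version,
                         uint32_t client_min, uint32_t client_max)
{
    assert (client_min <= client_max);

    return client_min <= cluster_op_version &&
           cluster_op_version <= client_max;
}
```

A client advertising 1..1 would be refused by a cluster running at op-version 2, while a client advertising 1..2 would be served.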
* storage/posix: Fix open-fd-count virtual xattr (Pranith Kumar K, 2013-02-06, 1 file, -8/+3)
    Send open-fd-count maintained in inode.

    Change-Id: I23db5d052bdeb4f67978ff618ed5a0bed7d1592d
    BUG: 908146
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/4469
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>