Commit message | Author | Age | Files | Lines
* glusterd: big lock - a coarse-grained locking to prevent races [v3.4.0alpha3] | Krishnan Parthasarathi | 2013-04-17 | 17 | -106/+702

  There are primarily three lists that are part of glusterd process, that are concurrently accessed. Namely, priv->volumes, priv->peers and volinfo->bricks_list.

  Big-lock approach
  -----------------

  WHAT IS IT?
  Big lock is a coarse-grained lock which protects all three lists, mentioned above, from racy access.

  HOW DOES IT WORK?
  At any given point in time, glusterd's thread(s) are in execution _iff_ there is a preceding, inbound network event. Of course, the sigwaiter thread and timer thread are exceptions.

  A network event is an external trigger to glusterd, via the epoll thread, in the form of POLLIN and POLLERR.

  As long as we take the big-lock at all such entry points and yield it when we are done, we are guaranteed that all the network events, accessing the global lists, are serialised.

  This amounts to holding the big lock at
  - all the handlers of all the actors in glusterd. (POLLIN)
  - all the cbks in glusterd. (POLLIN)
  - rpc_notify (DISCONNECT event), if we access/modify one of the three lists. (POLLERR)

  In the case of synctask'ized volume operations, we must remember that, if we held the big lock for the entire duration of the handler, we may block other non-synctask rpc actors from executing. For eg, volume-start would block in PMAP SIGNIN, if done incorrectly. To prevent this, we need to yield the big lock, when we yield the synctask, and reacquire on waking up of the synctask.

  BUG: 948686
  Change-Id: I429832f1fed67bcac0813403d58346558a403ce9
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4835
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
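The locking discipline described in this commit message can be illustrated with a minimal sketch. It uses a plain pthread mutex in place of glusterd's actual lock, and every name in it (big_lock, run_with_big_lock, wait_without_big_lock, volume_start, other_actor_arrives) is hypothetical, not taken from the patch. It only shows the shape of the idea: take the lock at each entry point, and release it for the duration of any wait so that other actors can run.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the big lock; not glusterd's real lock type. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int volumes_count; /* stands in for a protected list such as priv->volumes */

/* Every entry point (actor, callback, notify) runs with the lock held. */
static int run_with_big_lock(int (*handler)(void *), void *arg)
{
    int ret;

    assert(handler != NULL);
    pthread_mutex_lock(&big_lock);
    ret = handler(arg);
    pthread_mutex_unlock(&big_lock);
    return ret;
}

/* Called from inside a handler: give up the lock while waiting, retake it after. */
static int wait_without_big_lock(int (*blocking_wait)(void *), void *arg)
{
    int ret;

    pthread_mutex_unlock(&big_lock);
    ret = blocking_wait(arg);
    pthread_mutex_lock(&big_lock);
    return ret;
}

static int add_volume(void *arg)
{
    (void)arg;
    volumes_count++;
    return 0;
}

/* Simulates another actor (for example a PMAP SIGNIN) arriving during the wait. */
static int other_actor_arrives(void *arg)
{
    return run_with_big_lock(add_volume, arg);
}

/* A long-running handler that modifies the list, then waits for a reply. */
static int volume_start(void *arg)
{
    volumes_count++;
    return wait_without_big_lock(other_actor_arrives, arg);
}
```

If volume_start kept the lock across the wait, other_actor_arrives could never acquire it, which is the blocking scenario the message warns about.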
* glusterd: Fixed spurious wakeups in glusterd syncops | Krishnan Parthasarathi | 2013-04-17 | 4 | -22/+64

  glusterd syncops perform a barrier_wake whenever rpc_clnt_submit returned -1. This is based on the wrong assumption that the cbkfn wasn't called. This would result in one more wakeup than there ought to be.

  BUG: 948686
  Change-Id: I839fd218a81255fe50c2047d67461d45360e894d
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4834
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncenv: be robust against spurious wake()s | Krishnan Parthasarathi | 2013-04-17 | 1 | -10/+10

  In the current implementation, when the callers of synctasks perform a spurious wake() of a sleeping synctask (i.e, an extra wake() soon after a wake() which already woke up a yielded synctask), there is now a possibility of two sync threads picking up the same synctask. This can result in a crash.

  The fix is to change ->slept = 0|1 and membership of synctask in runqueue atomically. Today we dequeue a task from the runqueue in syncenv_task(), but reset ->slept = 0 much later in synctask_switchto() in an unlocked manner -- which is safe, when there are no spurious wake()s.

  However, this opens a race window where, if a second wake() happens after the dequeue, but before setting ->slept = 0, it results in queueing the same synctask in the runqueue once again, where it gets picked up by a different sync thread.

  This has been diagnosed as the cause of the crashes in the regression tests of http://review.gluster.org/4784. However that patch still has a spurious wake() [the trigger for this bug] which is yet to be fixed.

  BUG: 948686
  Change-Id: I51858e887cad2680e46fb973629f8465f4429363
  Original-author: Anand Avati <avati@redhat.com>
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4833
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
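A minimal sketch of the invariant described above, assuming a simplified run queue guarded by one pthread mutex. The types and functions (struct task, struct env, env_wake, env_dequeue) are hypothetical and much smaller than the real syncenv code. The point is that the slept flag and run-queue membership change inside the same critical section, so an extra wake() finds slept == 0 and does nothing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct task {
    int slept;         /* 1 while the task is yielded and not on the run queue */
    struct task *next; /* run-queue link */
};

struct env {
    pthread_mutex_t mutex;
    struct task *runq; /* simple LIFO run queue, enough for the sketch */
    int runcount;
};

/* Queue the task only if it is still asleep; flag and queue change together. */
static int env_wake(struct env *env, struct task *t)
{
    int queued = 0;

    assert(env != NULL && t != NULL);
    pthread_mutex_lock(&env->mutex);
    if (t->slept) {
        t->slept = 0;
        t->next = env->runq;
        env->runq = t;
        env->runcount++;
        queued = 1;
    }
    pthread_mutex_unlock(&env->mutex);
    return queued;
}

/* A worker thread takes one task off the run queue, or NULL if it is empty. */
static struct task *env_dequeue(struct env *env)
{
    struct task *t;

    assert(env != NULL);
    pthread_mutex_lock(&env->mutex);
    t = env->runq;
    if (t != NULL) {
        env->runq = t->next;
        t->next = NULL;
        env->runcount--;
    }
    pthread_mutex_unlock(&env->mutex);
    return t;
}
```

With this arrangement a second wake() after the dequeue cannot put the same task back on the queue, so two workers can never pick it up.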
* tests: fix dependency on sleep in bug-874498.t | Krishnan Parthasarathi | 2013-04-17 | 1 | -7/+14

  With the introduction of http://review.gluster.org/4784, there are delays which break bug-874498.t, which wrongly depends on healing to finish within 2 seconds. Fix this by using 'EXPECT_WITHIN 60' instead of sleep 2.

  BUG: 874498
  Change-Id: I7131699908e63b024d2dd71395b3e94c15fe925c
  Original-author: Anand Avati <avati@redhat.com>
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4832
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* tests: fix further issues with bug-874498.t | Krishnan Parthasarathi | 2013-04-17 | 1 | -2/+1

  The failure of bug-874498.t seems to be a "bug" in glustershd. The situation seems to be when both subvolumes of a replica are "local" to glustershd, and in such cases glustershd is sensitive to the order in which the subvols come up.

  The core of the issue itself is that, without the patch (#4784), self-heal daemon completes the processing of index and no entries are left inside the xattrop index after a few seconds of volume start force. However with the patch, the stale "backing file" (against which index performs link()) is left. The likely reason is that an "INDEX" based crawl is not happening against the subvol when this patch is applied.

  Before #4784 patch, the order in which subvols came up was:

  [2013-04-09 22:55:35.117679] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-0: Connected to 10.3.129.13:49156, attached to remote volume '/d/backends/brick1'.
  ...
  [2013-04-09 22:55:35.118399] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-1: Connected to 10.3.129.13:49157, attached to remote volume '/d/backends/brick2'.

  However, with the patch, the order is reversed:

  [2013-04-09 22:53:34.945370] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-1: Connected to 10.3.129.13:49153, attached to remote volume '/d/backends/brick2'.
  ...
  [2013-04-09 22:53:34.950966] I [client-handshake.c:1456:client_setvolume_cbk] 0-patchy-client-0: Connected to 10.3.129.13:49152, attached to remote volume '/d/backends/brick1'.

  The index in brick2 has the list of files/gfid to heal. It appears to be the case that when brick1 is the first subvol to be detected as coming up, somehow an INDEX based crawl is clearing all the index entries in brick2, but if brick2 comes up as the first subvol, then the backing file is left stale.

  Also, doing a "gluster volume heal full" seems to leave stale backing files too, as the crawl is performed on the namespace and the backing file is never encountered there to get cleared out.

  So the interim (possibly permanent) fix is to have the script issue a regular self-heal command (and not a "full" one).

  The failure of the script itself is non-critical. The data files are all healed, and it is just the backing file which is left behind. The stale backing file too gets cleared in the next index based healing, either triggered manually or after 10mins.

  BUG: 874498
  Change-Id: I601e9adec46bb7f8ba0b1ba09d53b83bf317ab6a
  Original-author: Anand Avati <avati@redhat.com>
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4831
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* synctask: introduce synclocks for co-operative locking | Krishnan Parthasarathi | 2013-04-17 | 2 | -1/+167

  This patch introduces synclocks - co-operative locks for synctasks. Synctasks yield themselves when a lock cannot be acquired at the time of the lock call, and the unlocker will wake the yielded locker at the time of unlock.

  The implementation is safe in a multi-threaded syncenv framework.

  It is also safe for sharing the lock between non-synctasks. i.e, the same lock can be used for synchronization between a synctask and a regular thread. In such a situation, waiting synctasks will yield themselves while non-synctasks will sleep on a cond variable. The unlocker (which could be either a synctask or a regular thread) will wake up any type of lock waiter (synctask or regular).

  Usage:

  Declaration and Initialization
  ------------------------------

      synclock_t lock;

      ret = synclock_init (&lock);
      if (ret) {
              /* lock could not be allocated */
      }

  Locking and non-blocking lock attempt
  -------------------------------------

      ret = synclock_trylock (&lock);
      if (ret && (errno == EBUSY)) {
              /* lock is held by someone else */
              return;
      }

      synclock_lock (&lock);
      {
              /* critical section */
      }
      synclock_unlock (&lock);

  BUG: 763820
  Change-Id: I23066f7b66b41d3d9fb2311fdaca333e98dd7442
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Original-author: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/4830
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: fix segfault on volume status detail | Lars Ellenberg | 2013-04-16 | 1 | -9/+11

  If for some reason glusterd_get_brick_root() fails, it frees the gf_strdup'ed *mount_point in its own error path, and returns -1.

  Unfortunately it already had assigned that pointer value to the output argument, the caller function glusterd_add_brick_detail() sees a non-NULL pointer, and free() again: segfault.

  Could be fixed with a one-liner (*mount_point = NULL) in the error path, but I think glusterd_get_brick_root() should only assign to the output argument once all checks passed, so I use a local temporary pointer, which increases the patch a bit.

  Change-Id: I3f3035f01e80a5e9bdf2da895e4cf7baa3dfbd2f
  BUG: 919352
  Signed-off-by: Lars Ellenberg <lars@linbit.com>
  Reviewed-on: http://review.gluster.org/4646
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-on: http://review.gluster.org/4841
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
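The "local temporary pointer" pattern mentioned above can be sketched as follows. This is not the glusterd function: get_brick_root here is a hypothetical stand-in that copies a path and rejects anything that does not start with '/', purely to have a check that can fail after the allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup; the output argument is written only on success. */
static int get_brick_root(const char *path, char **mount_point)
{
    char *tmp;
    size_t len;

    assert(mount_point != NULL);
    if (path == NULL)
        return -1;

    len = strlen(path) + 1;
    tmp = malloc(len);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, path, len);

    if (tmp[0] != '/') {
        /* A later check failed: free the temporary exactly once.
         * *mount_point was never touched, so the caller cannot free it again. */
        free(tmp);
        return -1;
    }

    *mount_point = tmp; /* all checks passed; ownership moves to the caller */
    return 0;
}
```

A caller that initialises its pointer to NULL and frees it unconditionally is then safe on both the success and the failure path.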
* glusterd: allow multiple instances of glusterd on one machine | Krishnan Parthasarathi | 2013-04-16 | 4 | -1/+142

  This is needed to support automated testing of cluster-communication features such as probing and quorum. In order to use this, you need to do the following preparatory steps.

  * Copy /var/lib/glusterd to another directory for each virtual host
  * Ensure that each virtual host has a different UUID in its glusterd.info

  Now you can start each copy of glusterd with the following xlator-options.

  * management.transport.socket.bind-address=$ip_address
  * management.working-directory=$unique_working_directory

  You can use 127.x.y.z addresses for binding without needing to assign them to interfaces explicitly. Note that you must use addresses, not names, because of some stuff in the socket code that's not worth fixing just for this usage, but after that you can use names in /etc/hosts instead.

  At this point you can issue CLI commands to a specific glusterd using the --remote-host option. So far probe, volume create/start/stop, mount, and basic I/O all seem to work as expected with multiple instances.

  Change-Id: I1beabb44cff8763d2774bc208b2ffcda27c1a550
  BUG: 913555
  Original-author: Jeff Darcy <jdarcy@redhat.com>
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4838
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tests/cluster.rc: support for virtual multi-server glusterd | Jeff Darcy | 2013-04-15 | 1 | -0/+106

  tests

  Since http://review.gluster.org/4556 glusterd is capable of running many instances of itself on a single system. This patch exploits that feature and enhances the regression test framework to expose handy primitives so that test cases may be written to test glusterd in a cluster.

  Usage:

  1. Include "$(dirname)/../cluster.rc" to get access to the extensions
  2. Call launch_cluster $N where $N is the count of virtual servers

  Calling launch_cluster, starts $N glusterds which bind to $N different IPs and dynamically defines these primitives:

  - Variables $H1 .. $Hn assigned to hostnames of each "server".
  - Variables $CLI_1 .. $CLI_n assigned as commands to run CLI commands on the corresponding N'th server.
  - Variables $B1 .. $Bn assigned to the backend directories on each "server".
  - Function kill_glusterd, which accepts a parameter - index number of glusterd to be killed.
  - Variables $glusterd_1 .. $glusterd_n assigned to the command lines to restart the corresponding glusterd, if it was previously killed.

  The current set of primitives and functions were implemented with the goal of satisfying ./tests/bugs/bug-913555.t. The API will be made richer as we add more cluster test cases

  Change-Id: I6e79c58098ed0862cf75a0b56e4ce384ec2e4eb2
  BUG: 913555
  Original-author: Anand Avati <avati@redhat.com>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4836
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* rpc-transport: fix glusterd crash when rdma.so missing | Rajesh Amaravathi | 2013-04-12 | 1 | -2/+4

  Add checks before trying to delete vol_opt from list and free

  Change-Id: I2858f58518394beb8f74fa477be81d7bdd38304f
  BUG: 924215
  Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
  Reviewed-on: http://review.gluster.org/4704
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/4819
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* rpc: before freeing the volume options object, delete it from the list | Raghavendra Bhat | 2013-04-12 | 2 | -6/+7

  * Suppose there is an xlator option which is considered by the xlator only if the source was built with debug mode enabled (the only example in the current code base is run-with-valgrind option for glusterd), then giving that option would make the process crash if the source was not built with debug mode enabled.

  Reason: In rpc, after getting the options symbol dynamically, it was stored in the newly allocated volume options structure and the structure's list head was added to the xlator's volume_options list. But while freeing the structure the list was not deleted. Thus when the list was traversed, already freed structure was accessed leading to segfault.

  Change-Id: I3e9e51dd2099e34b206199eae7ba44d9d88a86ad
  BUG: 922877
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/4687
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-on: http://review.gluster.org/4818
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
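The failure mode in the "Reason" paragraph is a freed node that is still linked into a list. A minimal sketch of the corrected order, assuming a small hand-written circular list rather than the list macros the project uses; struct vol_opt, vol_opt_new, vol_opt_free and list_count are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_init(struct list_head *n)
{
    assert(n->next != NULL && n->prev != NULL);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* Hypothetical option object; the list link is embedded in it. */
struct vol_opt {
    struct list_head list;
    int id;
};

static struct vol_opt *vol_opt_new(struct list_head *head, int id)
{
    struct vol_opt *opt = malloc(sizeof(*opt));

    if (opt == NULL)
        return NULL;
    opt->id = id;
    list_add_tail(&opt->list, head);
    return opt;
}

/* Unlink first, then free, so no traversal can reach freed memory. */
static void vol_opt_free(struct vol_opt *opt)
{
    list_del_init(&opt->list);
    free(opt);
}

static int list_count(const struct list_head *head)
{
    int n = 0;
    const struct list_head *p;

    for (p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Traversing the list after vol_opt_free only ever visits live nodes, which is the property the original code lacked.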
* license: xlators/protocol/server dual license GPLv2 and LGPLv3+ | Kaleb S. KEITHLEY | 2013-04-12 | 10 | -145/+57

  cherry-pick from: refs/changes/16/4816/1; http://review.gluster.org/#/c/4816/

  BUG: 951551
  Change-Id: I3de5bd86d4238a60a0a85ba2e15d9c131969b210
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/4817
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Fixed volume-sync in synctask codepath. | Krishnan Parthasarathi | 2013-04-12 | 1 | -12/+18

  Change-Id: I2911d3ac80825310f84c5ba6bd7890e65e1ee219
  BUG: 950048
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4643
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* object-storage: rebase Swift to 1.8.0 (grizzly) | Kaleb S. KEITHLEY | 2013-04-12 | 2 | -37/+136

  Merged (git cherry-pick) from master/HEAD to release-3.4

  Change-Id: I24265c12a45eac4cec761748096118c9647440be
  BUG: 948041
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/4780
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Preserve mtime in self-heal | Pranith Kumar K | 2013-04-12 | 2 | -14/+77

  Problem:
  Data self-heal may choose the sink's iatt to set mtimes. This happens because, after syncing of data is done, self-heal does one more xattrop/fstat to determine sources and sinks to set the inode-ctx. Since this is done after data syncing and erase of xattrs, old source and old sink are now both sources, but their mtimes differ. Old code just takes the first source from the list and updates mtimes, which could be the sink from before the self-heal started.

  Fix:
  Set mtime from 'sources before syncing'.

  Change-Id: Id769e1b99aa4f041eaee775f64cbf2c57b799723
  BUG: 918437
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4658
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-on: http://review.gluster.org/4663
* socket: Make non-ssl sockets perform non-blocking connect() | Krishnan Parthasarathi | 2013-04-12 | 1 | -0/+12

  Change-Id: Icb60cf7ad3ea7ca0eeb12fd19b95a6b340857bb2
  BUG: 920916
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4685
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dict: Put "goto out" in dict_unserialize to avoid process crash | Venkatesh Somyajulu | 2013-04-12 | 1 | -0/+1

  Problem:
  In the dictionary unserialization function, if [(buf + vallen) > (orig_buf + size)], then the subsequent memdup fails.

  Fix:
  Put "goto out" whenever this condition is met.

  Change-Id: I8c07dd5187364ccd6ad7625e2e3907d8b56447a9
  BUG: 947824
  Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
  Reviewed-on: http://review.gluster.org/4771
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
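A sketch of the bounds check with an early exit, under a deliberately simplified record layout (a 4-byte big-endian length followed by the value bytes). That layout and the function name read_value are assumptions made for illustration and are not the real dict wire format; only the shape of the check and the "goto out" follow the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reads one length-prefixed value from orig_buf; returns 0 on success, -1 on error. */
static int read_value(const char *orig_buf, size_t size,
                      char **value, uint32_t *vallen_out)
{
    const unsigned char *buf = (const unsigned char *)orig_buf;
    uint32_t vallen;
    char *copy = NULL;
    int ret = -1;

    assert(orig_buf != NULL && value != NULL && vallen_out != NULL);

    if (size < 4)
        goto out;
    vallen = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
             ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    buf += 4;

    /* Same intent as (buf + vallen) > (orig_buf + size): the value would run
     * past the end of the input, so bail out before copying anything. */
    if (vallen > size - 4)
        goto out;

    copy = malloc(vallen ? vallen : 1);
    if (copy == NULL)
        goto out;
    memcpy(copy, buf, vallen);

    *value = copy;
    *vallen_out = vallen;
    ret = 0;
out:
    return ret;
}
```

Comparing vallen against the remaining size, rather than adding it to a pointer, also avoids forming an out-of-range pointer in the check itself.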
* build: add BuildRequires librdmacm-devel | Kaleb S. KEITHLEY | 2013-04-12 | 1 | -1/+1

  See http://review.gluster.org/149

  Installed librdmacm-devel RPM on the build server.

  cherry pick from http://review.gluster.org/#/c/4804/

  BUG: 819130
  Change-Id: I30e14ebf7646c19923940f86a72bf42497cac70c
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/4806
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Ignore non-participating subvols for layout checks | shishir gowda | 2013-04-11 | 4 | -27/+178

  Backporting fix http://review.gluster.org/#/c/4668/

  When subvols-per-directory is < available subvols, then there are layouts which are not populated. This leads to incorrect identification of holes or overlaps. We need to ignore layouts which have err == 0 and start == stop. In the current scenario, start == stop == 0.

  Additionally, in layout-merge, treat missing xattrs as err = 0. In case of missing layouts, anomalies will reset them. For any other valid subvols, err != 0 in case of layouts being zeroed out.

  Also reverted back dht_selfheal_dir_xattr, which does layout calculation only on subvols which have errors.

  BUG: 921408
  Change-Id: I75a8edcb92af5b53b3253c9addd7a812e9242836
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4800
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: improve transform/detransform of d_off (and be ext4 safe) | shishir gowda | 2013-04-11 | 1 | -5/+82

  Backporting Avati's fix http://review.gluster.org/4711

  The scheme to encode brick d_off and brick id into global d_off has two approaches. Since both brick d_off and global d_off are 64 bits wide, we need to be careful about how the brick id is encoded.

  Filesystems like XFS always give a d_off which fits within 32 bits. So we have another 32 bits (actually 31, in this scheme, as seen ahead) to encode the brick id - which is typically plenty.

  Filesystems like the recent EXT4 utilize up to the 63 low bits in d_off, as the d_off is calculated based on a hash function value. This leaves us no "unused" bits to encode the brick id.

  However both these filesystems (EXT4 more importantly) are "tolerant" in terms of the accuracy of the value presented back in seekdir(). i.e, a seekdir(val) actually seeks to the entry which has the "closest" true offset.

  This "two-prong" scheme exploits this behavior - which seems to be the best middle ground amongst various approaches and has all the advantages of the old approach:

  - Works against XFS and EXT4, the two most common filesystems out there. (which wasn't an "advantage" of the old approach as it is broken against EXT4)
  - Probably works against most of the others as well. The ones which would NOT work are those which return HUGE d_offs _and_ are NOT tolerant to seekdir() to "closest" true offset.
  - Nothing to "remember in memory" or evict "old entries".
  - Works fine across NFS server reboots and also NFS head failover.
  - Tolerant to seekdir() to arbitrary locations.

  Algorithm:

  Each d_off can be encoded in either of the two schemes. There is no requirement to encode all d_offs of a directory or a reply-set in the same scheme.

  The topmost bit of the 64 bits is used to specify the "type" of encoding of this particular d_off. If the topmost bit (bit-63) is 1, it indicates that the encoding scheme holds a HUGE d_off. If the topmost bit is 0, it indicates that the "small" d_off encoding scheme is used.

  The goal of the "small" d_off encoding is to stay as dense as possible towards the lower bits even in the global d_off.

  The goal of the HUGE d_off encoding is to stay as accurate (close) as possible to the "true" d_off after a round of encoding and decoding.

  If DHT has N subvolumes, we need ROOF(Log2(N)) "bits" to encode the brick ID (call it "n").

  SMALL d_off
  ===========

  Encoding
  --------

  If the top n + 1 bits are free in a brick offset, then we leave the top bit as 0 and set the remaining bits based on the old formula:

      hi_mask = 0xffffffffffffffff
      hi_mask = ~(hi_mask >> (n + 1))

      if ((hi_mask & d_off_brick) != 0)
              do_large_d_off_encoding ()

      d_off_global = (d_off_brick * N) + brick_id

  Decoding
  --------

  If the top bit in the global offset is 0, it indicates that this is the encoding formula used. So decoding such a global offset inverts the old formula:

      if ((d_off_global & 0x8000000000000000) != 0)
              do_large_d_off_decoding()

      d_off_brick = (d_off_global / N)
      brick_id = d_off_global % N

  HUGE d_off
  ==========

  Encoding
  --------

  If the top n + 1 bits are NOT free in a given brick offset, then we set the top bit as 1 in the global offset. The low n bits are replaced by brick_id.

      low_mask = 0xffffffffffffffff << n // where n is ROOF(Log2(N))

      d_off_global = (0x8000000000000000 | (d_off_brick & low_mask)) + brick_id

      if (d_off_global == 0xffffffffffffffff)
              discard_entry();

  Decoding
  --------

  If the top bit in the global offset is set to 1, it indicates that the encoding formula used is the one above. So decoding would look like:

      hi_mask = (0xffffffffffffffff << n)
      low_mask = ~(hi_mask)

      d_off_brick = (global_d_off & hi_mask & 0x7fffffffffffffff)
      brick_id = global_d_off & low_mask

  If "losing" the low n bits in this decoding of d_off_brick looks "scary", we need to realize that till recently EXT4 used to only return what can now be expressed as (d_off_global >> 32). The extra 31 bits of hash added by EXT4 recently only decrease the probability of a collision and do not eliminate it completely, anyway. In a way, the "lost" n bits are made up by decreasing the probability of collision by sharding the files into N bricks / EXT directories -- call it "hash hedging", if you will :-)

  Change-Id: I9551c581c3f3d4c9e719764881036d554f60c557
  Thanks-to: Zach Brown <zab@redhat.com>
  BUG: 838784
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4799
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
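The two encodings can be exercised with a small round-trip sketch. The function names (bits_for, d_off_encode, d_off_decode) are hypothetical and this is not the DHT source. It follows the formulas in the message, with the small-offset decoding taken as the inverse of the encoding (d_off_brick = d_off_global / N, brick_id = d_off_global % N), and it assumes 1 <= N and brick_id < N.

```c
#include <assert.h>
#include <stdint.h>

#define TOP_BIT 0x8000000000000000ULL

/* n = ROOF(Log2(N)): number of bits needed to hold a brick id in [0, N). */
static unsigned bits_for(uint64_t nsubvols)
{
    unsigned n = 0;

    while (n < 62 && (1ULL << n) < nsubvols)
        n++;
    return n;
}

/* Returns 0 on success, -1 if the entry has to be discarded. */
static int d_off_encode(uint64_t d_off_brick, uint64_t brick_id,
                        uint64_t nsubvols, uint64_t *d_off_global)
{
    unsigned n = bits_for(nsubvols);
    uint64_t hi_mask = ~(UINT64_MAX >> (n + 1));
    uint64_t g;

    assert(nsubvols >= 1 && brick_id < nsubvols);

    if ((hi_mask & d_off_brick) == 0) {
        /* SMALL: top n + 1 bits are free, so the top bit stays 0. */
        g = d_off_brick * nsubvols + brick_id;
    } else {
        /* HUGE: set the top bit and overwrite the low n bits with the id. */
        uint64_t keep = UINT64_MAX << n;

        g = (TOP_BIT | (d_off_brick & keep)) + brick_id;
        if (g == UINT64_MAX)
            return -1;
    }
    *d_off_global = g;
    return 0;
}

static void d_off_decode(uint64_t d_off_global, uint64_t nsubvols,
                         uint64_t *d_off_brick, uint64_t *brick_id)
{
    assert(nsubvols >= 1);

    if ((d_off_global & TOP_BIT) == 0) {
        *d_off_brick = d_off_global / nsubvols;
        *brick_id = d_off_global % nsubvols;
    } else {
        uint64_t hi_mask = UINT64_MAX << bits_for(nsubvols);

        *d_off_brick = d_off_global & hi_mask & ~TOP_BIT;
        *brick_id = d_off_global & ~hi_mask;
    }
}
```

With N = 4 (n = 2), a small offset such as 1000 on brick 3 round-trips exactly, while a 63-bit offset comes back with its low two bits cleared, which is the "closest offset" loss the message discusses.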
* glusterfs.spec.in: sync with fedora glusterfs.spec | Kaleb S. KEITHLEY | 2013-03-28 | 2 | -0/+13

  add --without ufo

  cherry-pick from refs/changes/42/4742/1

  Change-Id: If1b77003ded537f9664fa6ad677d48d118516c64
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  BUG: 819130
  Reviewed-on: http://review.gluster.org/4743
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: Luis Pabon <lpabon@redhat.com>
  Reviewed-by: Luis Pabon <lpabon@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs/dict: fix infinite loop in dict_keys_join() | Vijaykumar Koppad | 2013-03-27 | 1 | -2/+4

  - missing "pairs = next" caused infinite loop

  Change-Id: I3edc4f50473f7498815c73e1066167392718fddf
  BUG: 905871
  Signed-off-by: Vijaykumar Koppad <vkoppad@redhat.com>
  Reviewed-on: http://review.gluster.org/4728
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
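The class of bug named here, a filtered list walk that forgets to advance on one path, can be sketched as below. This is not the libglusterfs function: struct pair, keys_join and is_trusted are hypothetical, and the space-separated output format is an assumption made only to have something to check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pair {
    const char *key;
    struct pair *next;
};

/* Joins the keys accepted by filter into out, each followed by one space.
 * Returns the number of bytes written, or -1 if out is too small. */
static int keys_join(char *out, size_t outlen, const struct pair *pairs,
                     int (*filter)(const char *key))
{
    size_t used = 0;
    const struct pair *next;

    assert(out != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while (pairs != NULL) {
        next = pairs->next;
        if (filter == NULL || filter(pairs->key)) {
            size_t klen = strlen(pairs->key);

            if (used + klen + 2 > outlen)
                return -1;
            memcpy(out + used, pairs->key, klen);
            used += klen;
            out[used++] = ' ';
            out[used] = '\0';
        }
        pairs = next; /* advance on every path, including filtered-out keys */
    }
    return (int)used;
}

static int is_trusted(const char *key)
{
    return strncmp(key, "trusted.", 8) == 0;
}
```

Placing the advance at the bottom of the loop body, outside the filter branch, is what guarantees termination.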
* glusterfs.spec.in: resync with Fedora glusterfs.spec | Kaleb S. KEITHLEY | 2013-03-26 | 7 | -181/+1001

  cherry-pick from master, including commits:
  5d3b478e76f1015b11bfd7d48465ab12a4f0737e
  fd407a4f5cdb869dc52efe8fc9e1d284f60f5992
  6f6789884227b8260f140c39c063d77b0516af97
  84f5e4b354526fbb7f0665345816e81c81245c8f
  2398e1e0da61f4ec5f209c704e037b54b5c249e1

  Resync with Fedora's glusterfs.spec

  To build a set of RPMs:

      % ./autogen.sh
      % ./configure --enable-fusermount
      % make dist
      % cd extras/LinuxRPM && make glusterrpms

  Updated rpm.t

  BUG: 819130
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Change-Id: Ib73be0fbb7ee16a5c41b4f7c7a3f66d0224bfe6c
  Reviewed-on: http://review.gluster.org/4725
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* build: Fix package version to match release version | Krishnan Parthasarathi | 2013-03-20 | 1 | -1/+1

  Change-Id: I00e0ebc4e36cedd771a46b6bd1f3267439ab9474
  BUG: 922765
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4673
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mgmt/glusterd: Start fs-crawl in separate thread so as not to block epoll | Varun Shastry | 2013-03-19 | 2 | -4/+53

  tests/basic/quota.t covers test case for this.

  Patch is only for 3.4 branch, http://review.gluster.org/4495 fixes the issue in master.

  Change-Id: I92674f5413441cc896245d5b3d0925f44ce8b2d3
  BUG: 919998
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/4680
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* rpm: package /var/run/gluster so that statedumps can be created | Kaleb S. KEITHLEY | 2013-03-12 | 1 | -0/+5

  Creating statedumps fails when /var/run/gluster does not exist. This directory should be part of the 'glusterfs' package that is installed on storage servers and native clients.

  Merged Niels's change from both $HEAD and release-3.3

  BUG: 917554
  Change-Id: I6ffc497c0bb6bc90c97a91a72bba9118853d4c8c
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/4659
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/distribute: Fix layout overlaps due to spread-count in selfheal path | shishir gowda | 2013-03-09 | 2 | -52/+20

  We needed to zero out the layout range, before we re-calculate the range. When spread-count is issued, we would end up with stale ranges in the layout.

  Replaced dht_selfheal_dir_xattr with dht_fix_dir_xattr, which correctly resets the un-used (after re-cal) layouts.

  Change-Id: I1a900d15df07335f59356bd23182ccec34381ab2
  BUG: 884455
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4648
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: dict_unref() xattr_req in fop finish instead of dict_destroy() | Anand Avati | 2013-03-07 | 2 | -5/+5

  The current way of calling dict_destroy() at the end of an API fop assumes that xattr_req is not stored/ref'd by any translators in the stack. However when translators like DHT store xattr_req in dht_local_t with a dict_ref() and perform dict_unref() in the unwind path, things get subject to a race.

  The race is between the woken up thread (by syncop_wake) i.e, the gfapi invoking thread and the thread where the FOP was unwound. As the C stack of STACK_UNWIND unwinds back, dht_local_unref() gets invoked within the DHT_STACK_UNWIND macro. This thread attempts dict_unref, which would be "safe" if it wins the race against the gfapi invoking thread. However if the gfapi invoking thread wins the race, it will perform dict_destroy() first and therefore make dict_unref() within dht_local_unref perform a double free.

  This is the embarrassing on-screen bug which showed up in a roomful of people during the gluster dev summit demo of qemu/libgfapi integration.

  Change-Id: I284c93de87cdc128d5801f42c84aa87f753090d4
  BUG: 839950
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/4645
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
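The difference between destroying an object outright and dropping a reference can be shown with a minimal reference-counted sketch. It uses C11 atomics for the counter, which is an assumption of this sketch and not a statement about how the real dict implements its refcount; the type and function names carry a _sketch suffix to mark them as hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct dict_sketch {
    atomic_int refcount;
    int *freed; /* set to 1 when the object is released; for demonstration only */
};

static struct dict_sketch *dict_new_sketch(int *freed)
{
    struct dict_sketch *d = malloc(sizeof(*d));

    if (d == NULL)
        return NULL;
    atomic_init(&d->refcount, 1); /* the creator holds the first reference */
    d->freed = freed;
    return d;
}

static struct dict_sketch *dict_ref_sketch(struct dict_sketch *d)
{
    assert(d != NULL);
    atomic_fetch_add(&d->refcount, 1);
    return d;
}

/* Whoever drops the last reference frees the object, whichever thread that is. */
static void dict_unref_sketch(struct dict_sketch *d)
{
    assert(d != NULL);
    if (atomic_fetch_sub(&d->refcount, 1) == 1) {
        *d->freed = 1;
        free(d);
    }
}
```

If the API side drops its reference first, the object survives until the translator's unwind path drops the second one, so the order in which the two threads finish no longer matters.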
* performance/write-behind: guarantee non-overlapping concurrent writes [v3.4.0alpha2] | Jeff Darcy | 2013-03-06 | 1 | -1/+65

  Maintain a list of writes (either written behind or SYNC) which are currently "in progress" (i.e, STACK_WIND'ed towards server) and hold off any new STACK_WIND of write (either written behind or SYNC) which overlaps with any of the "in progress" writes.

  This is a guarantee which AFR's eager-lock depends upon (though not strictly a write-behind requirement)

  Change-Id: Icedd0b51b440366a906dc9223d62b7fd6ef2ca03
  BUG: 857673
  Original-author: Anand Avati <avati@redhat.com>
  Signed-off-by: Anand Avati <avati@redhat.com>
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/4642
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
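The "hold off overlapping writes" rule reduces to a range-overlap test against the in-progress list. A minimal sketch, assuming half-open byte ranges [offset, offset + size) and a singly linked list; struct wreq, overlaps and can_wind are hypothetical names, and offset + size is assumed not to overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct wreq {
    uint64_t offset;
    uint64_t size;
    struct wreq *next; /* link in the in-progress list */
};

/* Two half-open ranges overlap if each one starts before the other ends. */
static int overlaps(const struct wreq *a, const struct wreq *b)
{
    return a->offset < b->offset + b->size &&
           b->offset < a->offset + a->size;
}

/* Returns 1 if req may be sent now, 0 if it must wait for an in-progress write. */
static int can_wind(const struct wreq *in_progress, const struct wreq *req)
{
    assert(req != NULL);
    for (; in_progress != NULL; in_progress = in_progress->next) {
        if (overlaps(in_progress, req))
            return 0;
    }
    return 1;
}
```

Adjacent writes that merely touch at a boundary are allowed through, since they do not share any byte.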
* performance/write-behind: Add test case for fd being marked bad after write failures. | Raghavendra G | 2013-03-06 | 1 | -0/+33

  BUG: 765473
  Change-Id: Ia5d9fecc7f84ee4d51f8037e2dd1ed03f0394bd9
  Signed-off-by: Raghavendra G <raghavendra@gluster.com>
  Reviewed-on: http://review.gluster.org/4632
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Increasing throughput of synctask based mgmt ops. | Krishnan Parthasarathi | 2013-03-06 | 4 | -425/+564

  Change-Id: Ibd963f78707b157fc4c9729aa87206cfd5ecfe81
  BUG: 913662
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4638
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* volgen: Use bind-address option for bricks when option set on glusterd | Krishnan Parthasarathi | 2013-03-06 | 1 | -8/+20

  Brick processes listen on all the interfaces on a given port. When multiple glusterds run on one machine, each glusterd assumes that it 'owns' the ports on that machine. This can lead to the different glusterd instances stepping on each other's ports. This fix ensures that brick processes listen only on their host IP when glusterd has the bind-address option set.

  Change-Id: I4c1b05643c64d3098bf56e977e768e611ffce0f5
  BUG: 913662
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4637
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* synctask: support for (asymmetric) counted barriers  Krishnan Parthasarathi  2013-03-06  2  -76/+122
| [Backport of Avati's patch on master - http://review.gluster.org/4558]
|
| This patch introduces a new set of primitives:
|
| - synctask_barrier_init (stub)
| - synctask_barrier_waitfor (stub, count)
| - synctask_barrier_wake (stub)
|
| Unlike pthread_barrier_t, this barrier has an explicit notion of "waiter"
| and "waker". The "waiter" waits for @count number of "wakers" to call
| synctask_barrier_wake() before returning. The wait performed by the waiter
| via synctask_barrier_waitfor() is co-operative in nature and yields the
| thread for scheduling other synctasks in the meantime.
|
| Intended use case:
|
| Eliminate excessive serialization in glusterd and allow for concurrent
| RPC transactions.
|
| Code which is currently in this format:
|
| ---old---
|
|   list_for_each_entry (peerinfo, peers, op_peers_list) {
|     ...
|     GD_SYNCOP (peerinfo->rpc, stub, rpc_cbk, ...);
|   }
|
|   ...
|
|   int rpc_cbk (rpc, stub, ...)
|   {
|     ...
|     __wake (stub);
|   }
|
| ---old---
|
| can be restructured into the format:
|
| ---new---
|
|   synctask_barrier_init (stub);
|   {
|     list_for_each_entry (peerinfo, peers, op_peers_list) {
|       ...
|       rpc_submit (peerinfo->rpc, stub, rpc_cbk, ...);
|       count++;
|     }
|   }
|   synctask_barrier_wait (stub, count);
|
|   ...
|
|   int rpc_cbk (rpc, stub, ...)
|   {
|     ...
|     synctask_barrier_wake (stub);
|   }
|
| ---new---
|
| In the above structure, from the synctask's point of view, the region
| between synctask_barrier_init() and synctask_barrier_wait() spawns off
| asynchronous "threads" (or RPCs) and keeps count of how many such threads
| have been spawned. Each of those threads is expected to make one call to
| synctask_barrier_wake(). The call to synctask_barrier_wait() makes the
| synctask thread co-operatively wait/sleep till @count such threads call
| their wake function.
|
| This way, the synctask thread retains the "synchronous" flow in the code,
| yet at the same time allows for asynchronous "threads" to achieve
| parallelism over RPC.
|
| Change-Id: Ie037f99b2d306b71e63e3a56353daec06fb0bf41
| BUG: 913662
| Signed-off-by: Anand Avati <avati@redhat.com>
| Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
| Reviewed-on: http://review.gluster.org/4636
| Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
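The waiter/waker accounting described in this commit can be sketched with plain pthreads. This is an analogy under assumptions, not the synctask implementation: the `toy_barrier_*` names are hypothetical, and the waiter here blocks an OS thread on a condition variable, whereas the synctask barrier yields co-operatively so other synctasks can run.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical counted barrier: one waiter, any number of wakers. */
typedef struct toy_barrier {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             wakes; /* wake calls not yet consumed by a waiter */
} toy_barrier_t;

/* Returns 0 on success, -1 if the mutex or condvar cannot be set up. */
int toy_barrier_init(toy_barrier_t *b)
{
    b->wakes = 0;
    if (pthread_mutex_init(&b->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->cond, NULL) != 0) {
        pthread_mutex_destroy(&b->lock);
        return -1;
    }
    return 0;
}

/* Called once by each asynchronous completion (e.g. an RPC callback). */
void toy_barrier_wake(toy_barrier_t *b)
{
    pthread_mutex_lock(&b->lock);
    b->wakes++;
    pthread_cond_signal(&b->cond);
    pthread_mutex_unlock(&b->lock);
}

/* Blocks until `count` wake calls have arrived, then consumes them. */
void toy_barrier_wait(toy_barrier_t *b, int count)
{
    assert(count >= 0);
    pthread_mutex_lock(&b->lock);
    while (b->wakes < count)
        pthread_cond_wait(&b->cond, &b->lock);
    b->wakes -= count;
    pthread_mutex_unlock(&b->lock);
}
```

Because wake calls are counted rather than merely signalled, a callback that fires before the waiter reaches the wait call is not lost, which is what lets the submit loop run to completion before waiting.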
* testcase for the open-behind xlator when open fails  Jeff Darcy  2013-03-06  1  -0/+56
| Test that the fops which are put into a stub and are waiting for the open
| to complete are unwound with the error if the open call itself fails.
|
| Change-Id: I8c363d98303a7df1a0ca9ea6ef207c7123fdd388
| BUG: 846240
| Original-author: Raghavendra Bhat <raghavendra@redhat.com>
| Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
| Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
| Reviewed-on: http://review.gluster.org/4634
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
* performance/write-behind: mark fd bad if any written behind writes fail  Raghavendra G  2013-03-06  1  -57/+114
| | | | | | | | | BUG: 765473 Change-Id: I1ddd6ef9f5361aed96f97aa1344823836c6ddecb Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4630 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* tests: move common function definitions to include.rc  Raghavendra G  2013-03-06  8  -156/+22
| | | | | | | | | BUG: 765473 Change-Id: Id0d194374d34cfec8ee601090f7fe38b1856ac22 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4631 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* tests/fileio.rc: library for file descriptor based IO in tests  Raghavendra G  2013-03-06  1  -0/+61
| | | | | | | | | | | | | | | | | | | | There are situations in test scripts where we want to keep open file descriptors while performing other commands. Bash has abilities to manage file descriptors by numbers, but the syntax is a little brain damaging. This library provides wrappers around it to abstract away bash's syntax and also provides a helper function to pick a free file descriptor on the fly. The APIs are pretty self explanatory. Change-Id: I82f1d1957646dd6c468d9e85c90ec30c978c7ad6 BUG: 764966 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4635 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Jeff Darcy <jdarcy@redhat.com>
* Do not call xdr_string() with a NULL error message  Jeff Darcy  2013-03-05  1  -0/+1
| It is illegal to call xdr_string() with a NULL string. Linux just returns
| false; NetBSD gets a SIGSEGV when xdr_string() calls strlen(NULL).
|
| BUG: 916439
| Change-Id: Ia958470ada6e8e55a86d439922ec942d038f5f13
| Original-author: Emmanuel Dreyfus <manu@netbsd.org>
| Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
| Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
| Reviewed-on: http://review.gluster.org/4629
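One way to guard against this class of bug is to normalise a possibly-NULL message to an empty string before it reaches any string encoder. The sketch below is an assumption about the general approach, not the one-line change in this commit; `toy_safe_errstr` and `toy_encoded_len` are hypothetical helpers, and `strlen()` stands in for the length computation an encoder would perform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: never hand a NULL pointer to a string encoder. */
const char *toy_safe_errstr(const char *msg)
{
    return msg ? msg : "";
}

/* Stand-in for the length computation an encoder performs on the string. */
size_t toy_encoded_len(const char *msg)
{
    const char *safe = toy_safe_errstr(msg);
    assert(safe != NULL); /* invariant: strlen() below is always defined */
    return strlen(safe);
}
```

An empty string encodes as a zero-length value on every platform, so the behaviour no longer depends on how a given libc reacts to strlen(NULL).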
* cluster/afr: do complete split-brain check in all the fd based fops  Raghavendra Bhat  2013-03-05  4  -19/+33
| fd based operations such as readv checked only for data split-brain
| instead of complete split-brain (i.e. both data + metadata), assuming that
| open would have done the complete split-brain check. However, open-behind
| would have unwound open without winding to afr, thus preventing the
| complete split-brain check, and some applications will be able to read the
| contents of the file even though the file has metadata split-brain.
|
| So let all the fd based fops do a defensive check of complete split-brain.
|
| Change-Id: I0ea52f782b371ce73e8e1c61f9def438fce1bd28
| BUG: 846240
| Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
| Reviewed-on: http://review.gluster.org/4620
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix check for task-id existence in 'volume status'  Kaushal M  2013-03-05  3  -3/+8
| This fixes the issue of task-id tests failing randomly. The condition used
| to check whether rebalance/remove-brick was running was wrong, which could
| cause the task-id for these tasks not to be displayed even when the actual
| commit hadn't occurred.
|
| BUG: 857330
| Change-Id: I0f86c6bbe7acec586ee0ea6e663369ea26171904
| Signed-off-by: Kaushal M <kaushal@redhat.com>
| Reviewed-on: http://review.gluster.org/4617
| Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
* tests: Add spaces around '=' in a string comparison in TEST primitive  Kaushal M  2013-03-04  1  -1/+1
| | | | | | | | | BUG: 764966 Change-Id: I2da197bdddb4a4d098ebb044410e21ced4dbd806 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4618 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* rpc: bring in root-squashing behavior in rpc  Raghavendra Bhat  2013-03-04  7  -2/+139
| * requests coming in as root are converted to nfsnobody
|
| * with open-behind some acl checks won't happen and nfsnobody can read
|   the file "whose owner is root and other users do not have permission
|   to read the file". This is because open-behind does not send the open
|   to the brick and sends success to the application, thus the acl related
|   tests on the file, which would have prevented the file from being
|   opened, won't happen.
|
| Change-Id: I12a3e6b2a12884d00bb81f2779074fed09b1b2e4
| BUG: 887145
| Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
| Reviewed-on: http://review.gluster.org/4619
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
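The first bullet, converting root requests to an anonymous identity, can be sketched as a small credential mapping. Everything named here is an assumption for illustration: `toy_cred_t` is a hypothetical type, 65534 is only a commonly used numeric ID for the anonymous user (the real value of nfsnobody varies by system), and squashing the gid alongside the uid is assumed rather than stated in the commit message.

```c
#include <assert.h>
#include <sys/types.h>

/* Assumed anonymous IDs; the real nfsnobody IDs depend on the system. */
#define TOY_ANON_UID ((uid_t)65534)
#define TOY_ANON_GID ((gid_t)65534)

/* Hypothetical credential pair carried by an incoming request. */
typedef struct {
    uid_t uid;
    gid_t gid;
} toy_cred_t;

/* Map root credentials to the anonymous identity when squashing is on. */
toy_cred_t toy_root_squash(toy_cred_t in, int squash_enabled)
{
    /* Squashing to root itself would defeat the purpose. */
    assert(TOY_ANON_UID != 0 && TOY_ANON_GID != 0);
    if (squash_enabled) {
        if (in.uid == 0)
            in.uid = TOY_ANON_UID;
        if (in.gid == 0)
            in.gid = TOY_ANON_GID;
    }
    return in;
}
```

Non-root credentials pass through unchanged, so the mapping only removes root's ability to bypass permission checks.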
* cluster/distribute: Reopen fds in migration internally as root:root  shishir gowda  2013-03-04  1  -1/+10
| | | | | | | | | | | | | | Though linkfile_create and rebalance dst file create sent a setattr with correct ownership, there is still a race window where the linkfile open (client open due to migration) will fail, as its ownership will be root:root. BUG: 884597 Change-Id: Iba73681eae4f280d39ee6c9a40009e195768bee7 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4612 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* synctask: implement setuid-like SYNCTASK_SETID()  shishir gowda  2013-03-04  2  -0/+30
| | | | | | | | | | | | | | | | | synctasks can now call SYNCTASK_SETID(uid,gid) to set the effective uid/gid of the frame with which the FOP will be performed. Once called, the uid/gid is set either till the end of the synctask or till the next call of SYNCTASK_SETID() Back-porting Avati's patch http://review.gluster.org/#change,4269 BUG: 884597 Change-Id: Id0569da4bb8959636881457217fe004bf30c5b9d Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4611 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Prevent spurious multiple defrag crawls  shishir gowda  2013-03-04  1  -9/+14
| In dht_notify, we used to create a thread to start defrag crawls after we
| had heard from all child subvols. This was incorrect, as a later event
| could also trigger the crawl again (due to the fact that all subvols had
| responded).
|
| The fix is to make sure the thread is started only once, after all
| subvols have responded the first time.
|
| BUG: 916449
| Change-Id: I1619344fbb1cb51d5e1db38d8a29821fa870fa8b
| Signed-off-by: shishir gowda <sgowda@redhat.com>
| Reviewed-on: http://review.gluster.org/4610
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
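The "start only once" rule can be sketched as a latch that is set the first time every child has responded. This is a simplified model under assumptions: `toy_dht_state_t` and `toy_should_start_crawl` are hypothetical, the real notify path tracks per-subvolume state, and this version is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for child events. */
typedef struct {
    int  children_total; /* number of child subvolumes */
    int  children_heard; /* children that have responded at least once */
    bool crawl_started;  /* latch: set when the crawl thread is launched */
} toy_dht_state_t;

/*
 * Called for every child event. Returns true only for the single event
 * that should launch the crawl thread.
 */
bool toy_should_start_crawl(toy_dht_state_t *s, bool first_response_from_child)
{
    assert(s != NULL);
    if (first_response_from_child && s->children_heard < s->children_total)
        s->children_heard++;
    if (s->children_heard < s->children_total)
        return false; /* still waiting for some child */
    if (s->crawl_started)
        return false; /* later events must not re-trigger the crawl */
    s->crawl_started = true;
    return true;
}
```

Without the latch, every event that arrives after all children have responded would satisfy the "heard from everyone" condition and start another crawl.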
* cluster/distribute: Preserve file size during rebalance migration  shishir gowda  2013-03-04  3  -0/+94
| If holes are encountered, we do not write them to the dst, which
| sometimes leaves the dst file smaller than the src. Data is not corrupted,
| because when non-zero reads are received, we do write that data.
|
| Call truncate with the src file size so that the dst does not end up
| shorter than the src when the end of the file has holes.
|
| Thanks to Brian Foster for providing the test case.
|
| BUG: 915554
| Change-Id: I7e1e0c475118b073c3ebb87e93220c1ec22e8b7d
| Signed-off-by: shishir gowda <sgowda@redhat.com>
| Reviewed-on: http://review.gluster.org/4609
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Vijay Bellur <vbellur@redhat.com>
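The effect of the truncate call can be sketched with plain POSIX file I/O. This is an illustration under assumptions, not the rebalance code: `toy_copy_preserving_size` is a hypothetical helper that works on local file descriptors, treats any all-zero block as a hole, and uses a fixed 4096-byte block size.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLOCK 4096

/* True if the first n bytes of buf are all zero. */
static bool toy_all_zero(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != 0)
            return false;
    }
    return true;
}

/*
 * Copy src to dst, skipping all-zero blocks, then set dst's size to
 * match src. Returns 0 on success, -1 on error.
 */
int toy_copy_preserving_size(int src, int dst)
{
    struct stat st;
    char buf[TOY_BLOCK];
    off_t off = 0;

    assert(src >= 0 && dst >= 0);
    if (fstat(src, &st) != 0)
        return -1;
    for (;;) {
        ssize_t n = pread(src, buf, sizeof(buf), off);
        if (n < 0)
            return -1;
        if (n == 0)
            break; /* end of file */
        if (!toy_all_zero(buf, (size_t)n)) {
            if (pwrite(dst, buf, (size_t)n, off) != n)
                return -1;
        }
        off += n;
    }
    /* Without this step a trailing hole would leave dst shorter than src. */
    return ftruncate(dst, st.st_size);
}
```

Skipped blocks in the middle of the file are implicitly covered by later writes at higher offsets; only a hole at the very end needs the explicit size fix-up.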
* cluster/distribute: Remove spurious fd_unref call  shishir gowda  2013-03-04  1  -2/+0
| After fix http://review.gluster.org/4282
| (libglusterfs/syncop: do not hold ref on the fd in cbk)
| was pushed, syncop_open does not take a ref anymore.
|
| BUG: 910661
| Change-Id: Idedff91270966e6e70e71ee83785c0228e238d31
| Signed-off-by: shishir gowda <sgowda@redhat.com>
| Reviewed-on: http://review.gluster.org/4608
| Tested-by: Gluster Build System <jenkins@build.gluster.com>
| Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/dht: Create linkfile with file uid/gid  shishir gowda  2013-03-04  6  -4/+282
| | | | | | | | | | | | | | | | Currently, linkfile creation happens as root. use uid/gid returned from _cbk (link/rename) to set the correct ownership of the link files. Also added test/dht.rc to implement common dht functions BUG: 884597 Change-Id: I6bc0e04f62d4716fc033681e5678e852a1be7a2f Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: use gf_strdup() in place of strdup()  Krutika Dhananjay  2013-03-04  2  -1/+15
| | | | | | | | | Change-Id: Idee71019dbc6eeaa0a808d671b29d6f3038a1a89 BUG: 913487 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4563 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* nfs/server: Fix multiple crashes in acl handler.  Vijay Bellur  2013-03-03  1  -10/+16
| | | | | | | | | Change-Id: I67c224c74c02f7058bcf546713501dd7ab810826 BUG: 915280 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/4606 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>