summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
* performance/io-cache: check the inode context to be NULL before accessingv3.4.0beta2Raghavendra Bhat2013-05-231-0/+7
| | | | | | | | | | | | Change-Id: I475af7f8ffd5e5d8adbd2a74af20e56ad7751f69 BUG: 958108 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4916 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/5077 Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: Enable write-behind in nfsPranith Kumar K2013-05-211-1/+1
| | | | | | | | | | | | | We observed that the number of write requests thus inodelks are increasing very rapidly to thousands without write-behind in the graph. Change-Id: I901a6a820eb7b21b413d33e1a0a3420c7f4746a8 BUG: 928341 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4736 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* md-cache: Make options structure NULL terminatedKrishnan Parthasarathi2013-05-201-0/+1
| | | | | | | | | Change-Id: I8aa4f90ba7e1eecf3f978be04f8550049275464f BUG: 765785 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5028 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Start bricks on glusterd startup, only onceKrishnan Parthasarathi2013-05-164-5/+7
| | | | | | | | | | | | | | The restarting of bricks has been deffered until the cluster 'stabilizes' itself volumes' view. Since glusterd_spawn_daemons is executed everytime a peer 'joins' the cluster, it may inadvertently restart bricks that were taken offline for say, maintenance purposes. This fix avoids that. Change-Id: Ic2a0a9657eb95c82d03cf5eb893322cf55c44eba BUG: 960190 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5022 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Refresh glusterd-syncop fixes from masterKrishnan Parthasarathi2013-05-162-46/+80
| | | | | | | | | | | | | | | | | | | Following commits were cherry-picked from master, 044f8ce syncop: Remove task from synclock's waitq before 'wake' cb6aeed glusterd: Give up big lock before performing any RPC 46572fe Revert "glusterd: Fix spurious wakeups in glusterd syncops" 5021e04 synctask: implement barriers around yield, not the other way 4843937 glusterd: Syncop callbks should take big lock too Change-Id: I5ae71ab98f9a336dc9bbf0e7b2ec50a6ed42b0f5 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4938 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5021 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Perform NULL check on rsp.op_errstr before using itKrutika Dhananjay2013-05-161-2/+1
| | | | | | | | | Change-Id: I2ae30f08965b26a21db541f87de78772cb17135a BUG: 962362 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4996 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix uuid to hostname conversion for 'volume status'Kaushal M2013-05-141-3/+4
| | | | | | | | | | BUG: 927648 Change-Id: Ic016e9d1f090372329a8a2e530dac5fc6ed6c5ae Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4874 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Don't queue transactions during open-fd fixPranith Kumar K2013-05-086-385/+167
| | | | | | | | | | | | | | | | | | | | | Before Anonymous fds are available, afr had to queue up transactions if the file is not opened on one of its subvolumes. This happens until the attempt to open the file either succeeds or fails. These attempts happen until the file is successfully opened on the subvolume. Now client xlator uses anonymous fds to perform the fops if the fd used for the fop is not 'opened'. Fops will be successful even when the file is not opened so there is no need to queue up the transactions anymore in afr. Open is attempted on the subvolume where it is not opened independent of the fop. Change-Id: I6d59293023e2de41c606395028c8980b83faca3f BUG: 953887 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4868 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: fix dangerous "sharing" of fd in readdir between two requestsv3.4.0beta1Anand Avati2013-05-071-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | posix_fill_readdir() is a multi-step function which performs many readdir() calls, and expects the directory cursor to have not "seeked away" elsewhere between two successive iterations. Usually this is not a problem as each opendir() from an application has its own backend fd, and there is nobody else to "seek away" the directory cursor. However in case of NFS's use of anonymous fd, the same fd_t is shared between all NFS readdir requests, and two readdir loops can be executing in parallel on the same dir dragging away the cursor in a chaotic manner. The fix in this patch is to lock on the fd around the loop. Another approach could be to reimplement posix_fill_readdir() with a single getdents() call, but that's for another day. Change-Id: Ia42e9c7fbcde43af4c0d08c20cc0f7419b98bd3f BUG: 948086 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4774 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/4963
* storage/posix: honor O_SYNC and O_DSYNC sent in @flags of writev()Anand Avati2013-05-072-5/+10
| | | | | | | | | | | | | | | | | Historic bug - posix_writev() has been inspecting pfd->flushwrites for performing fsync() after write, instead of @flags for O_SYNC|O_DSYNC. pfd->flushwrites was never set anywhere and is unused completely. This is behavior from the time before anonymous FD where open() had @wbflags param. This is a leftover from that cleanup. Change-Id: Id9bfe562a60db4eb3bd0a7705bdba91f2df2f3ec BUG: 916372 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4738 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4962
* cluster/afr: Turn on eager-lock for fd DATA transactionsEmmanuel Dreyfus2013-05-072-21/+7
| | | | | | | | | | | | | | | | | | | | | | | Problem: With the present implementation, eager-lock is issued for any fd fop. eager-lock is being transferred to metadata transactions. But the lk-owner is set to local->fd address only for DATA transactions, but for METADATA transactions it is frame->root. Because of this unlock on the eager-lock fails and rebalance hangs. Fix: Enable eager-lock for fd DATA transactions This is a backport of change If30df7486a0b2f5e4150d3259d1261f81473ce8a http://review.gluster.org/#/c/4588/ BUG: 916226 Change-Id: Id41ac17f467c37e7fd8863e0c19932d7b16344f8 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/4899 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/io-cache: Avoid double mem_put in ioc_readvPranith Kumar K2013-05-071-2/+3
| | | | | | | | | | | | | | On readv error io-cache frame->local is not set to NULL so the local is mem_put in STACK_DESTROY as well. This patch sets frame->local to NULL in all cases. BUG: 955751 Change-Id: I4a7340189efe02473452986b5870b02fcfa9038e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4886 Reviewed-by: Raghavendra G <raghavendra@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: big lock - a coarse-grained locking to prevent racesv3.4.0alpha3Krishnan Parthasarathi2013-04-1716-106/+657
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are primarily three lists that are part of glusterd process, that are concurrently accessed. Namely, priv->volumes, priv->peers and volinfo->bricks_list. Big-lock approach ----------------- WHAT IS IT? Big lock is a coarse-grained lock which protects all three lists, mentioned above, from racy access. HOW DOES IT WORK? At any given point in time, glusterd's thread(s) are in execution _iff_ there is a preceding, inbound network event. Of course, the sigwaiter thread and timer thread are exceptions. A network event is an external trigger to glusterd, via the epoll thread, in the form of POLLIN and POLLERR. As long as we take the big-lock at all such entry points and yield it when we are done, we are guaranteed that all the network events, accessing the global lists, are serialised. This amounts to holding the big lock at - all the handlers of all the actors in glusterd. (POLLIN) - all the cbks in glusterd. (POLLIN) - rpc_notify (DISCONNECT event), if we access/modify one of the three lists. (POLLERR) In the case of synctask'ized volume operations, we must remember that, if we held the big lock for the entire duration of the handler, we may block other non-synctask rpc actors from executing. For eg, volume-start would block in PMAP SIGNIN, if done incorrectly. To prevent this, we need to yield the big lock, when we yield the synctask, and reacquire on waking up of the synctask. BUG: 948686 Change-Id: I429832f1fed67bcac0813403d58346558a403ce9 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4835 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fixed spurious wakeups in glusterd syncopsKrishnan Parthasarathi2013-04-172-15/+26
| | | | | | | | | | | | | glusterd syncops perform a barrier_wake whenever rpc_clnt_submit returned -1. This is based on the wrong assumption that the cbkfn wasn't called. This would result in one more wakeup than there ought to be. BUG: 948686 Change-Id: I839fd218a81255fe50c2047d67461d45360e894d Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4834 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: fix segfault on volume status detailLars Ellenberg2013-04-161-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | If for some reason glusterd_get_brick_root() fails, it frees the gf_strdup'ed *mount_point in its own error path, and returns -1. Unfortunately it already had assigned that pointer value to the output argument, the caller function glusterd_add_brick_detail() sees a non-NULL pointer, and free() again: segfault. Could be fixed with a one-liner (*mount_point = NULL) in the error path, but I think glusterd_get_brick_root() should only assign to the output argument once all checks passed, so I use a local temporary pointer, which increases the patch a bit. Change-Id: I3f3035f01e80a5e9bdf2da895e4cf7baa3dfbd2f BUG: 919352 Signed-off-by: Lars Ellenberg <lars@linbit.com> Reviewed-on: http://review.gluster.org/4646 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/4841 Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: allow multiple instances of glusterd on one machineKrishnan Parthasarathi2013-04-163-1/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed to support automated testing of cluster-communication features such as probing and quorum. In order to use this, you need to do the following preparatory steps. * Copy /var/lib/glusterd to another directory for each virtual host * Ensure that each virtual host has a different UUID in its glusterd.info Now you can start each copy of glusterd with the following xlator-options. * management.transport.socket.bind-address=$ip_address * management.working-directory=$unique_working_directory You can use 127.x.y.z addresses for binding without needing to assign them to interfaces explicitly. Note that you must use addresses, not names, because of some stuff in the socket code that's not worth fixing just for this usage, but after that you can use names in /etc/hosts instead. At this point you can issue CLI commands to a specific glusterd using the --remote-host option. So far probe, volume create/start/stop, mount, and basic I/O all seem to work as expected with multiple instances. Change-Id: I1beabb44cff8763d2774bc208b2ffcda27c1a550 BUG: 913555 Original-author: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4838 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* license: xlators/protocol/server dual license GPLv2 and LGPLv3+Kaleb S. KEITHLEY2013-04-1210-145/+57
| | | | | | | | | | | | cherry-pick from: refs/changes/16/4816/1; http://review.gluster.org/#/c/4816/ BUG: 951551 Change-Id: I3de5bd86d4238a60a0a85ba2e15d9c131969b210 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/4817 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Fixed volume-sync in synctask codepath.Krishnan Parthasarathi2013-04-121-12/+18
| | | | | | | | | Change-Id: I2911d3ac80825310f84c5ba6bd7890e65e1ee219 BUG: 950048 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4643 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Preserve mtime in self-healPranith Kumar K2013-04-121-14/+25
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Data self-heal may choose sink iatt to set mtimes. This happens because after syncing of data is done self-heal does one more xattrops/fstat to determine sources sinks to set the inode-ctx. Since this is done after data syncing and erase of xattrs, old source and old sink are now sources, but the mtimes of them differ. Old code just takes the first source from the list and update mtimes, which could be sink before the self-heal started. Fix: Set mtime from 'sources before syncing'. Change-Id: Id769e1b99aa4f041eaee775f64cbf2c57b799723 BUG: 918437 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4658 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/4663
* cluster/distribute: Ignore non-participating subvols for layout checksshishir gowda2013-04-112-20/+88
| | | | | | | | | | | | | | | | | | | | | | | | | Backporting fix http://review.gluster.org/#/c/4668/ When subvols-per-directory is < available subvols, then there are layouts which are not populated. This leads to incorrect identification of holes or overlaps. We need to ignore layouts, which have err == 0, and start == stop. In the current scenario (start == stop == 0). Additionally, in layout-merge, treat missing xattrs as err = 0. In case of missing layouts, anomalies will reset them. For any other valid subvoles, err != 0 in case of layouts being zeroed out. Also reverted back dht_selfheal_dir_xattr, which does layout calculation only on subvols which have errors. BUG: 921408 Change-Id: I75a8edcb92af5b53b3253c9addd7a812e9242836 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4800 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: improve transform/detransform of d_off (and be ext4 safe)shishir gowda2013-04-111-5/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backporting Avati's fix http://review.gluster.org/4711 The scheme to encode brick d_off and brick id into global d_off has two approaches. Since both brick d_off and global d_off are both 64-bit wide, we need to be careful about how the brick id is encoded. Filesystems like XFS always give a d_off which fits within 32bits. So we have another 32bits (actually 31, in this scheme, as seen ahead) to encode the brick id - which is typically plenty. Filesystems like the recent EXT4 utilize the upto 63 low bits in d_off, as the d_off is calculated based on a hash function value. This leaves us no "unused" bits to encode the brick id. However both these filesystmes (EXT4 more importantly) are "tolerant" in terms of the accuracy of the value presented back in seekdir(). i.e, a seekdir(val) actually seeks to the entry which has the "closest" true offset. This "two-prong" scheme exploits this behavior - which seems to be the best middle ground amongst various approaches and has all the advantages of the old approach: - Works against XFS and EXT4, the two most common filesystems out there. (which wasn't an "advantage" of the old approach as it is borken against EXT4) - Probably works against most of the others as well. The ones which would NOT work are those which return HUGE d_offs _and_ NOT tolerant to seekdir() to "closest" true offset. - Nothing to "remember in memory" or evict "old entries". - Works fine across NFS server reboots and also NFS head failover. - Tolerant to seekdir() to arbitrary locations. Algorithm: Each d_off can be encoded in either of the two schemes. There is no requirement to encode all d_offs of a directory or a reply-set in the same scheme. The topmost bit of the 64 bits is used to specify the "type" of encoding of this particular d_off. If the topmost bit (bit-63) is 1, it indicates that the encoding scheme holds a HUGE d_off. If the topmost bit is is 0, it indicates that the "small" d_off encoding scheme is used. The goal of the "small" d_off encoding is to stay as dense as possible towards the lower bits even in the global d_off. The goal of the HUGE d_off encoding is to stay as accurate (close) as possible to the "true" d_off after a round of encoding and decoding. If DHT has N subvolumes, we need ROOF(Log2(N)) "bits" to encode the brick ID (call it "n"). SMALL d_off =========== Encoding -------- If the top n + 1 bits are free in a brick offset, then we leave the top bit as 0 and set the remaining bits based on the old formula: hi_mask = 0xffffffffffffffff hi_mask = ~(hi_mask >> (n + 1)) if ((hi_mask & d_off_brick) != 0) do_large_d_off_encoding () d_off_global = (d_off_brick * N) + brick_id Decoding -------- If the top bit in the global offset is 0, it indicates that this is the encoding formula used. So decoding such a global offset will be like the old formula: if ((d_off_global & 0x8000000000000000) != 0) do_large_d_off_decoding() d_off_brick = (d_off_global % N) brick_id = d_off_global / N HUGE d_off ========== Encoding -------- If the top n + 1 bits are NOT free in a given brick offset, then we set the top bit as 1 in the global offset. The low n bits are replaced by brick_id. low_mask = 0xffffffffffffffff << n // where n is ROOF(Log2(N)) d_off_global = (0x8000000000000000 | d_off_brick & low_mask) + brick_id if (d_off_global == 0xffffffffffffffff) discard_entry(); Decoding -------- If the top bit in the global offset is set 1, it indicates that the encoding formula used is above. So decoding would look like: hi_mask = (0xffffffffffffffff << n) low_mask = ~(hi_mask) d_off_brick = (global_d_off & hi_mask & 0x7fffffffffffffff) brick_id = global_d_off & low_mask If "losing" the low n bits in this decoding of d_off_brick looks "scary", we need to realize that till recently EXT4 used to only return what can now be expressed as (d_off_global >> 32). The extra 31 bits of hash added by EXT recently, only decreases the probability of a collision, and not eliminate it completely, anyways. In a way, the "lost" n bits are made up by decreasing the probability of collision by sharding the files into N bricks / EXT directories -- call it "hash hedging", if you will :-) Change-Id: I9551c581c3f3d4c9e719764881036d554f60c557 Thanks-to: Zach Brown <zab@redhat.com> BUG: 838784 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4799 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mgmt/glusterd: Start fs-crawl in separate thread so as not to block epollVarun Shastry2013-03-191-4/+2
| | | | | | | | | | | | | | | tests/basic/quota.t covers test case for this. Patch is only for 3.4 branch, http://review.gluster.org/4495 fixes the issue in master. Change-Id: I92674f5413441cc896245d5b3d0925f44ce8b2d3 BUG: 919998 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/4680 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/distribute: Fix layout overlaps due to spread-count in selfheal pathshishir gowda2013-03-091-50/+12
| | | | | | | | | | | | | | | We needed to zero out the layout range, before we re-calculate the range. When spread-count is issued, we would end up with stale ranges in the layout. Replaced dht_selfheal_dir_xattr with dht_fix_dir_xattr, which correctly resets the un-used (after re-cal) layouts. Change-Id: I1a900d15df07335f59356bd23182ccec34381ab2 BUG: 884455 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4648 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* performance/write-behind: guarantee non-overlapping concurrent writesv3.4.0alpha2Jeff Darcy2013-03-061-1/+65
| | | | | | | | | | | | | | | | | | Maintain a list of writes (either written behind or SYNC) which are currently "in progress" (i.e, STACK_WIND'ed towards server) and hold off any new STACK_WIND of write (either written behind or SYNC) which overlaps with any of the "in progress" writes. This is a guarantee which AFR's eager-lock depends upon (though not strictly a write-behind requirement) Change-Id: Icedd0b51b440366a906dc9223d62b7fd6ef2ca03 BUG: 857673 Original-author: Anand Avati <avati@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4642 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Increasing throughput of synctask based mgmt ops.Krishnan Parthasarathi2013-03-063-425/+563
| | | | | | | | | Change-Id: Ibd963f78707b157fc4c9729aa87206cfd5ecfe81 BUG: 913662 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* volgen: Use bind-address option for bricks when option set on glusterdKrishnan Parthasarathi2013-03-061-8/+20
| | | | | | | | | | | | | | | | Brick processes listen on all the interfaces on a given port. When multiple glusterds run on one machine, glusterd assumes that it 'owns' the ports on that machine. This can lead to the different glusterd instances to step on each other's ports. This fix ensures that brick processes listen only on the its host IP when glusterd has bind-address option set. Change-Id: I4c1b05643c64d3098bf56e977e768e611ffce0f5 BUG: 913662 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4637 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* performance/write-behind: mark fd bad if any written behind writes failRaghavendra G2013-03-061-57/+114
| | | | | | | | | BUG: 765473 Change-Id: I1ddd6ef9f5361aed96f97aa1344823836c6ddecb Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4630 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* Do not call xdr_string() with a NULL error messageJeff Darcy2013-03-051-0/+1
| | | | | | | | | | | | | It is illegal to call xdr_string() with a NULL string. Linux just retruns false, NetBSD gets a SIGSEGV when xdr_string() calls strlen(NULL) BUG: 916439 Change-Id: Ia958470ada6e8e55a86d439922ec942d038f5f13 Original-author: Emmanuel Dreyfus <manu@netbsd.org> Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4629
* cluster/afr: do complete split-brain check in all the fd based fopsRaghavendra Bhat2013-03-054-19/+33
| | | | | | | | | | | | | | | | | fd based operations such as readv checked only for data split brain instead of complete split-brain (i.e both data + metadata) assuming that open would have done the complete split-brain check. However open-behind would have unwound open, without winding to afr thus preventing the complete split-brain check and some appliations will be able to read the contents of the file even though the file has metadata split-brain. So let all the fd based fops do a defensive check of complete split-brain. Change-Id: I0ea52f782b371ce73e8e1c61f9def438fce1bd28 BUG: 846240 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4620 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix check for task-id existence in 'volume status'Kaushal M2013-03-053-3/+8
| | | | | | | | | | | | | | This fixes the issue of task-id tests failing randomly. The condition used to check rebalance/remove-brick was running was wrong, which could lead to the task-id for these tasks to not be displayed even when the actual commit hadn't occured. BUG: 857330 Change-Id: I0f86c6bbe7acec586ee0ea6e663369ea26171904 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4617 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* rpc: bring in root-squashing behavior in rpcRaghavendra Bhat2013-03-042-0/+10
| | | | | | | | | | | | | | | | | | * requests coming in as root are converted to nfsnobody * with open-behind some acl checks wont happen and nfsnobody can read the file "whose owner is root and other users do not have permission to read the file". This is becasue open-behind does not send the open to the brick and sends success to the application, thus the acl related tests on the file wont happen which would have prevented the file from being opened. Change-Id: I12a3e6b2a12884d00bb81f2779074fed09b1b2e4 BUG: 887145 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4619 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Reopen fds in migration internally as root:rootshishir gowda2013-03-041-1/+10
| | | | | | | | | | | | | | Though linkfile_create and rebalance dst file create sent a setattr with correct ownership, there is still a race window where the linkfile open (client open due to migration) will fail, as its ownership will be root:root. BUG: 884597 Change-Id: Iba73681eae4f280d39ee6c9a40009e195768bee7 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4612 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Prevent spurious multiple defrag crawlsshishir gowda2013-03-041-9/+14
| | | | | | | | | | | | | | | | | In dht_notify, we used to create a thread to start defrag crawls after we had heard from all child subvols. This was in-correct, as a later event, could also trigger the crawl again(due to the fact that all subvols had responded). The fix is to make sure, the thread is started only once after all subvols have responded the first time BUG: 916449 Change-Id: I1619344fbb1cb51d5e1db38d8a29821fa870fa8b Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4610 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/distribute: Preserve file size during rebalance migrationshishir gowda2013-03-041-0/+6
| | | | | | | | | | | | | | | | | | If holes are encountered, then we do not write these to the dst, which sometimes causes file size to be lesser than src. Data is not corrupted, as when non-zero reads are received, we do write that data. Calling a truncrate to give file size to prevent it from being truncated to less than src in case the file end has holes. Thanks to Brian Foster for providing the test case BUG: 915554 Change-Id: I7e1e0c475118b073c3ebb87e93220c1ec22e8b7d Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4609 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/distribute: Remove suprious fd_unref callshishir gowda2013-03-041-2/+0
| | | | | | | | | | | | After fix http://review.gluster.org/4282 (libglusterfsterfs/syncop: do not hold ref on the fd in cbk) was pushed, syncop_open does not take a ref anymore. BUG: 910661 Change-Id: Idedff91270966e6e70e71ee83785c0228e238d31 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4608 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/dht: Create linkfile with file uid/gidshishir gowda2013-03-044-4/+101
| | | | | | | | | | | | | | | | Currently, linkfile creation happens as root. use uid/gid returned from _cbk (link/rename) to set the correct ownership of the link files. Also added test/dht.rc to implement common dht functions BUG: 884597 Change-Id: I6bc0e04f62d4716fc033681e5678e852a1be7a2f Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: use gf_strdup() in place of strdup()Krutika Dhananjay2013-03-041-1/+1
| | | | | | | | | Change-Id: Idee71019dbc6eeaa0a808d671b29d6f3038a1a89 BUG: 913487 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4563 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* nfs/server: Fix multiple crashes in acl handler.Vijay Bellur2013-03-031-10/+16
| | | | | | | | | Change-Id: I67c224c74c02f7058bcf546713501dd7ab810826 BUG: 915280 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/4606 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mount.glusterfs: Introduce mem-accounting as an optionVijay Bellur2013-03-031-9/+17
| | | | | | | | | | | | | | | | option mem-accounting enables memory accounting for the client process. re-factored to keep options with values and options without values in different sections of mount.glusterfs. Change-Id: I54ebc31a1eae6d7a5ce7b0255cd7df74d37d46c1 BUG: 834465 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/4528 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: harden 'volume start' staging to check for brick dirs' presenceKrutika Dhananjay2013-02-081-13/+49
| | | | | | | | | | | | | | | | | | | | | PROBLEM: When the brick directory of a volume is absent on any of the servers, AND an attempt is made to start the volume, commit fails ONLY on the node where the brick dir is absent, leading to a split-brain like situation. FIX: Harden 'volume start' to check for the presence of brick directories at the time of staging, thereby preventing commit failure. Change-Id: I67faeb9afbd3aa76f08645924462db126bf7a977 BUG: 889996 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4365 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: pathinfo xattr changes for directoriesVenky Shankar2013-02-082-92/+224
| | | | | | | | | | | | | | | | | | | | Since directories have presence on all subvolumes there is no definite meaning of ->hashed_subvol or ->cached_subvol. getxattr() code path chooses ->cached_subvol for pathinfo extended attribute. While this makes sense of files, it makes less sense for directories. Further if a hashed or a cached subvolume is down, and there's a getxattr request for a directory, we return with an errno. This patch changes pathinfo extended attribute contents by aggregating information from all subvolumes that are up. Change-Id: I58adb741d63ccfd1d0239af75eb65f26f0fb384d Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 856455 Reviewed-on: http://review.gluster.org/4047 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made gsync set use synctask frameworkAvra Sengupta2013-02-082-6/+2
| | | | | | | | | Change-Id: I409fa5a9f55434ece47a8a51d4812d3eca42d269 BUG: 852147 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4473 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Making volume-reset use synctask frameworkAvra Sengupta2013-02-081-5/+1
| | | | | | | | | Signed-off-by: Avra Sengupta <asengupt@redhat.com> Change-Id: Ib25c8fa69d84b8132505ae3f1e67cf88d3f6f9ec BUG: 852147 Reviewed-on: http://review.gluster.org/4474 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd : Made volume-set use synctask framework.Avra Sengupta2013-02-081-5/+1
| | | | | | | | | Change-Id: I1aa08ca843b87839180f9097bca370270a856e6d BUG: 852147 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4488 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made volume-sync use synctask framework.Avra Sengupta2013-02-081-5/+1
| | | | | | | | | Change-Id: I048aac2af4d9da9ed541d3756fefefbb2a29198e BUG: 852147 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4489 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd : Made volume clear-locks use synctask framework.Avra Sengupta2013-02-087-16/+36
| | | | | | | | | Change-Id: Ia1fe3d0500d999c1f95b43c9e53947834e39d680 BUG: 852147 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4490 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made volume-stop use synctask framework.Avra Sengupta2013-02-081-5/+1
| | | | | | | | | Change-Id: I88270f70538bb89d828bb51830b54e9f59be258a BUG: 852147 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4491 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made volume-delete use synctask framework.Avra Sengupta2013-02-081-5/+1
| | | | | | | | | Change-Id: I52573bb49946f904484e2ead483e8f6f41cbd0c8 BUG: 852147 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4492 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Made volume-statedump use synctask framework.Avra Sengupta2013-02-081-4/+1
| | | | | | | | | Change-Id: I230ecbd8978725070b5910ead4249f21038224a6 BUG: 852147 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4494 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gsyncd: allow the override of the compiled-in python pathJoe Julian2013-02-071-3/+7
| | | | | | | | | | | .. using the environment variable $PYTHON Change-Id: Ieaad8be98b826c803268216826e250d9944c8190 BUG: 882127 Signed-off-by: Joe Julian <me@joejulian.name> Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4252 Tested-by: Gluster Build System <jenkins@build.gluster.com>