path: root/xlators
Commit message | Author | Age | Files | Lines
* NFS is picking up geo-rep's already open (read-only) file descriptor [v3.3.2qa4] [v3.3.2] | Kaleb S. KEITHLEY | 2013-07-05 | 1 | -1/+1
| | | | | | | | | | | | | | Add anonymous member to fd_t and use it instead of over-loading pid for geo-rep and self heal Change-Id: I4d6b29a044a8ed4b8f69ff6e3f35ee227739b2af Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> BUG: 874272 Reviewed-on: http://review.gluster.org/4185 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/5283
* cluster/afr: detect in-progress creation in lookup and return ENOENT | Pranith Kumar K | 2013-06-18 | 1 | -0/+57
  Port of http://review.gluster.org/4625

  If any subvol returned ENOENT while the parent entrylk lock was held, yield and return ENOENT for the entire lookup.

  This is how the issue happens. Multiple clients A, B and C are attempting 'mkdir -p /mnt/a/b/c':

  1. Client A is in the middle of mkdir(/a). It has acquired the lock. It has performed mkdir(/a) on one subvol, and the second one is still in progress.
  2. Client B performs a lookup, sees directory /a on one subvol and ENOENT on the other, and succeeds the lookup.
  3. Client B performs lookup on /a/b on both subvols; both return ENOENT (one subvol because /a/b does not exist, the other because /a itself does not exist).
  4. Client B proceeds to mkdir /a/b. It obtains entrylk on inode=/a with basename=b on one subvol, but fails on the other subvol as /a is yet to be created by Client A.
  5. Client A finishes mkdir of /a on the other subvol.
  6. Client C also attempts to create /a/b; lookup returns ENOENT on both subvols.
  7. Client C tries to obtain entrylk on inode=/a with basename=b, obtains it on one subvol (where B had failed), and waits for B to unlock on the other subvol.
  8. Client B finishes mkdir() on one subvol with GFID-1, completes the transaction and unlocks.
  9. Client C gets the lock on the second subvol. At this stage the second subvol already has /a/b created by Client B, but Client C does not check that in the middle of the mkdir transaction.
  10. Client C attempts mkdir /a/b on both subvols. It succeeds on ONLY ONE (where Client B could not get the lock because of the missing parent /a dir) with GFID-2, and gets EEXIST from the other subvol.

  This way we have /a/b in GFID mismatch. One subvol got GFID-1 because Client B performed the transaction on only one subvol (because entrylk() could not be obtained on the second subvol because of the missing parent dir -- caused by premature/speculative succeeding of lookup() on /a when locks are detected). The other subvol gets GFID-2 from Client C because, while it was waiting for entrylk() on both subvols, Client B was in the middle of mkdir() on only one subvol, and Client C does not "expect" this when it is between the lock() and pre-op()/op() phases of the transaction.

  Change-Id: I40107d4638ffdcb7b1ff4748c8e5ea92e62697e8
  BUG: 860210
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/5173
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Linkfiles creation with correct uid/gid [v3.3.2qa3] | shishir gowda | 2013-05-16 | 2 | -20/+47
| | | | | | | | | | | | | | | | | | | | If renames are done with different uid/gid (non-owners), then we would end up with incorrect uid/gid. The fix is to create linkfiles, and heal the uid/gid as root:root. This preserves our notion of creation as root:root and heal the uid/gid as root:root in all paths. Additionally, we need to consider uid/gid from only src_cached subvol, and not from linkfiles. rename is also done as root:root if done on linkfile, as setattr of ownership on linkfile is done after the rename BUG: 884597 Change-Id: Ifaacd8dba0f39cb909761ffc8fe7e06cd44ec8de Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/5025 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Create linkfile with file uid/gid | shishir gowda | 2013-05-16 | 4 | -3/+96
| | | | | | | | | | | | | | Currently, linkfile creation happens as root. use uid/gid returned from _cbk (link/rename) to set the correct ownership of the link files. Change-Id: I5345cff193d5095442ca446fbe5ea05f2c2d86a3 Signed-off-by: shishir gowda <sgowda@redhat.com> BUG: 884597 Reviewed-on: http://review.gluster.org/5024 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* libglusterfs/statedump: move options file and statedumps from /tmp | Raghavendra Bhat | 2013-05-14 | 3 | -5/+6
| | | | | | | | | Change-Id: I6b107b9a668b0521b955dba8895cbbeaf9e7cb02 BUG: 764890 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/5005 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: retire old style ssh setup | Csaba Henk | 2013-04-27 | 3 | -1/+9
  Users are still using geo-rep with the old, deprecated, insecure, unsupported ssh setup. Not their fault -- the implementation of the new method had the following characteristics:

  - old method is possible, but with default settings it's not working
  - it can be made operational by fiddling with the "remote-gsyncd" tunable
  - with default settings, an unhelpful, actually misleading error message is produced
  - the UI gave no hint to the changes in the ssh setup

  http://review.gluster.org/4392 tried to fix these; what it accomplished was unrestricted support to the bad practice (by making the default old setup operational).

  From now on:

  - we disable the old method by reserving the "remote-gsyncd" tunable
  - if the old method is attempted, give a hint what to do

  Change-Id: Icade94725d8d8d2d4c89cab992d4226351637b86
  BUG: 895656
  Signed-off-by: Csaba Henk <csaba@redhat.com>
  Reviewed-on: http://review.gluster.org/4892
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: replace obsolete /usr/local reference for remote ssh/gsyncd | Csaba Henk | 2013-04-27 | 1 | -2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | See https://bugzilla.redhat.com/show_bug.cgi?id=895656 https://bugzilla.redhat.com/show_bug.cgi?id=764679 (GLUSTER-2947) https://bugzilla.redhat.com/show_bug.cgi?id=764623 (GLUSTER-2891) The comments in the bzs are a bit obtuse and/or vague. As near as I can make out we had, for a while, a "convenience symlink" to or from /usr/local/libexec/gsyncd, which no longer exists. And, lacking any comments in the code, I gather this is some sort of fallback or failsafe logic: if the first, normal attempt to invoke gsyncd fails then an attempt is made to ssh to the box and invoke it. In any event, there's nothing in /usr/local/... so it's unquestionably wrong to try to invoke anything there. [Backporting Kaleb's patch] BUG: 895656 Change-Id: I3b7ac7a049b91ce101b930599294830147cc60ad Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Signed-off-by: Csaba Henk <csaba@redhat.com> Reviewed-on: http://review.gluster.org/4891 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* distribute: Fix fds being leaked during rebalance | Kaushal M | 2013-04-26 | 1 | -8/+6
| | | | | | | | | | | | | | | | | | This patch is a backport of 2 patches from master branch which fixes the leak of fds during a rebalance process. The patches are, * libglusterfs/syncop: do not hold ref on the fd in cbk (e979c0de9dde14fe18d0ad7298c6da9cc878bbab) * cluster/distribute: Remove suprious fd_unref call (5d29e598665456b2b7250fdca14de7409098877a) Change-Id: Icea1d0b32cb3670f7decc24261996bca3fe816dc BUG: 928631 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4888 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/dht: Correct min_free_disk behaviour [v3.3.2qa2] | Varun Shastry | 2013-04-17 | 2 | -27/+90
  Problem: Files were being created in a subvol which had less than min_free_disk available even in cases where other subvols with more space were available.

  Solution: Changed the logic to look for a subvol which has more space available. In cases where all the subvols have less than min_free_disk available, the one with the most space and at least one inode available is chosen.

  Known Issue: Cannot ensure that the first file created right after the min-free-disk value is crossed on a brick will get created on another brick, because the disk usage stat takes some time to update in the gluster process. Will fix that as part of another bug.

  Change-Id: Icaba552db053ad8b00be0914b1f4853fb7661bd3
  BUG: 874554
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
  Signed-off-by: Varun Shastry <vshastry@redhat.com>
  Reviewed-on: http://review.gluster.org/4839
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: improve transform/detransform of d_off (and be ext4 safe) | shishir gowda | 2013-04-16 | 1 | -5/+82
  Backporting Avati's fix http://review.gluster.org/4711

  The scheme to encode brick d_off and brick id into global d_off has two approaches. Since brick d_off and global d_off are both 64 bits wide, we need to be careful about how the brick id is encoded.

  Filesystems like XFS always give a d_off which fits within 32 bits. So we have another 32 bits (actually 31, in this scheme, as seen ahead) to encode the brick id, which is typically plenty.

  Filesystems like the recent EXT4 utilize up to 63 low bits in d_off, as the d_off is calculated based on a hash function value. This leaves us no "unused" bits to encode the brick id.

  However, both these filesystems (EXT4 more importantly) are "tolerant" in terms of the accuracy of the value presented back in seekdir(), i.e. a seekdir(val) actually seeks to the entry which has the "closest" true offset.

  This "two-prong" scheme exploits this behavior, which seems to be the best middle ground amongst various approaches and has all the advantages of the old approach:

  - Works against XFS and EXT4, the two most common filesystems out there (which wasn't an "advantage" of the old approach, as it is broken against EXT4).
  - Probably works against most of the others as well. The ones which would NOT work are those which return HUGE d_offs _and_ are NOT tolerant to seekdir() to the "closest" true offset.
  - Nothing to "remember in memory" or evict "old entries".
  - Works fine across NFS server reboots and also NFS head failover.
  - Tolerant to seekdir() to arbitrary locations.

  Algorithm: Each d_off can be encoded in either of the two schemes. There is no requirement to encode all d_offs of a directory or a reply-set in the same scheme.

  The topmost bit of the 64 bits is used to specify the "type" of encoding of this particular d_off. If the topmost bit (bit 63) is 1, it indicates that the encoding scheme holds a HUGE d_off. If the topmost bit is 0, it indicates that the "small" d_off encoding scheme is used.

  The goal of the "small" d_off encoding is to stay as dense as possible towards the lower bits even in the global d_off.

  The goal of the HUGE d_off encoding is to stay as accurate (close) as possible to the "true" d_off after a round of encoding and decoding.

  If DHT has N subvolumes, we need ROOF(Log2(N)) "bits" to encode the brick ID (call it "n").

  SMALL d_off
  ===========

  Encoding
  --------
  If the top n + 1 bits are free in a brick offset, then we leave the top bit as 0 and set the remaining bits based on the old formula:

      hi_mask = 0xffffffffffffffff
      hi_mask = ~(hi_mask >> (n + 1))

      if ((hi_mask & d_off_brick) != 0)
          do_large_d_off_encoding ()

      d_off_global = (d_off_brick * N) + brick_id

  Decoding
  --------
  If the top bit in the global offset is 0, it indicates that this is the encoding formula used. So decoding such a global offset inverts the old formula:

      if ((d_off_global & 0x8000000000000000) != 0)
          do_large_d_off_decoding()

      d_off_brick = (d_off_global / N)
      brick_id = d_off_global % N

  HUGE d_off
  ==========

  Encoding
  --------
  If the top n + 1 bits are NOT free in a given brick offset, then we set the top bit as 1 in the global offset. The low n bits are replaced by brick_id.

      low_mask = 0xffffffffffffffff << n   // where n is ROOF(Log2(N))

      d_off_global = (0x8000000000000000 | (d_off_brick & low_mask)) + brick_id

      if (d_off_global == 0xffffffffffffffff)
          discard_entry();

  Decoding
  --------
  If the top bit in the global offset is set to 1, it indicates that the encoding formula used is the one above. So decoding would look like:

      hi_mask = (0xffffffffffffffff << n)
      low_mask = ~(hi_mask)

      d_off_brick = (d_off_global & hi_mask & 0x7fffffffffffffff)
      brick_id = d_off_global & low_mask

  If "losing" the low n bits in this decoding of d_off_brick looks "scary", we need to realize that till recently EXT4 used to only return what can now be expressed as (d_off_global >> 32). The extra 31 bits of hash added by EXT4 recently only decrease the probability of a collision and do not eliminate it completely, anyway. In a way, the "lost" n bits are made up by decreasing the probability of collision by sharding the files into N bricks / EXT4 directories -- call it "hash hedging", if you will :-)

  Change-Id: I9551c581c3f3d4c9e719764881036d554f60c557
  Thanks-to: Zach Brown <zab@redhat.com>
  BUG: 838784
  Signed-off-by: shishir gowda <sgowda@redhat.com>
  Reviewed-on: http://review.gluster.org/4799
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-on: http://review.gluster.org/4822
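The two d_off encodings described in the commit message above can be sketched in C roughly as follows. This is an illustration only, not the code from the patch: the names `brick_id_bits`, `doff_encode` and `doff_decode` are hypothetical, the brick count is assumed to lie between 1 and 2^32, and the SMALL decoding assumes the inverse of the encoding formula (divide for the offset, modulo for the brick id).

```c
#include <assert.h>
#include <stdint.h>

#define DOFF_HUGE_FLAG 0x8000000000000000ULL /* bit 63: HUGE encoding in use */
#define DOFF_ALL_ONES  0xffffffffffffffffULL

/* ROOF(Log2(N)): bits needed to hold a brick id in [0, n_bricks). */
static unsigned brick_id_bits(uint64_t n_bricks)
{
    unsigned bits = 0;

    assert(n_bricks >= 1 && n_bricks <= (1ULL << 32)); /* assumed range */
    while ((1ULL << bits) < n_bricks)
        bits++;
    return bits;
}

/* Encode a brick d_off. Returns 0 on success, -1 if the entry must be discarded. */
static int doff_encode(uint64_t d_off_brick, uint64_t brick_id,
                       uint64_t n_bricks, uint64_t *d_off_global)
{
    unsigned n = brick_id_bits(n_bricks);
    uint64_t hi_mask = ~(DOFF_ALL_ONES >> (n + 1));
    uint64_t encoded;

    assert(brick_id < n_bricks);
    if ((d_off_brick & hi_mask) == 0) {
        /* SMALL: the top n + 1 bits are free, so bit 63 stays clear. */
        *d_off_global = d_off_brick * n_bricks + brick_id;
        return 0;
    }
    /* HUGE: set bit 63 and overwrite the low n bits with the brick id. */
    encoded = (DOFF_HUGE_FLAG | (d_off_brick & (DOFF_ALL_ONES << n))) + brick_id;
    if (encoded == DOFF_ALL_ONES)
        return -1;
    *d_off_global = encoded;
    return 0;
}

/* Decode a global d_off back into (approximate) brick d_off and brick id. */
static void doff_decode(uint64_t d_off_global, uint64_t n_bricks,
                        uint64_t *d_off_brick, uint64_t *brick_id)
{
    if ((d_off_global & DOFF_HUGE_FLAG) == 0) {
        /* SMALL: invert d_off_brick * N + brick_id. */
        *d_off_brick = d_off_global / n_bricks;
        *brick_id = d_off_global % n_bricks;
    } else {
        /* HUGE: the low n bits carry the brick id; bit 63 is the flag. */
        uint64_t hi_mask = DOFF_ALL_ONES << brick_id_bits(n_bricks);

        *d_off_brick = d_off_global & hi_mask & ~DOFF_HUGE_FLAG;
        *brick_id = d_off_global & ~hi_mask;
    }
}
```

With four bricks (n = 2), a small offset such as 1000 on brick 3 round-trips exactly, while a hash-like 63-bit offset comes back with its low two bits cleared, which is the "closest offset" tolerance the message relies on.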
* cluster/afr: Try for all locks before failing in rename | Pranith Kumar K | 2013-04-10 | 1 | -3/+35
| | | | | | | | | Change-Id: If0e917e5d4914f6807b4a96f81668a467b15d0df BUG: 922809 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4689 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Preserve mtime in self-heal | Pranith Kumar K | 2013-04-10 | 1 | -14/+24
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: Data self-heal may choose sink iatt to set mtimes. This happens because after syncing of data is done self-heal does one more xattrops/fstat to determine sources sinks to set the inode-ctx. Since this is done after data syncing and erase of xattrs, old source and old sink are now sources, but the mtimes of them differ. Old code just takes the first source from the list and update mtimes, which could be sink before the self-heal started. Fix: Set mtime from 'sources before syncing'. Change-Id: Id769e1b99aa4f041eaee775f64cbf2c57b799723 BUG: 918437 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4658 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/4664 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* localtime and ctime are not MT-SAFE | Kaleb S. KEITHLEY | 2013-02-08 | 11 | -115/+85
  There are a number of nit-level issues throughout the source with the use of localtime and ctime. While they apparently aren't causing too many problems, apart from the one in bz 828058, they ought to be fixed. Among the "real" problems that are fixed in this patch:

  1) general localtime and ctime not MT-SAFE. There's a non-zero chance that another thread calling localtime (or ctime) will overwrite the static data about to be used in another thread
  2) localtime(& <64-bit-type>) or ctime(& <64-bit-type>) generally not a problem on 64-bit or little-endian 32-bit. But even though we probably have zero users on big-endian 32-bit platforms, it's still incorrect.
  3) multiple nested calls passed as params. Last one wins, i.e. overwrites the result of prior calls.
  4) Inconsistent error handling. Most of these calls are for logging, tracing, or dumping. I submit that if an error somehow occurs in the call to localtime or ctime, the log/trace/dump should still occur.
  5) Appliances should all have their clocks set to UTC, and all log entries, traces, and dumps should use GMT.
  6) fix strtok(), change to strtok_r()

  Other things this patch fixes/changes (that aren't bugs per se):

  1) Change "%Y-%m-%d %H:%M:%S" and similar to their equivalent shorthand, e.g. "%F %T"
  2) change sizeof(timestr) to sizeof timestr. sizeof is an operator, not a function. You don't use i +(32), why use sizeof(<var>). (And yes, you do use parens with sizeof(<type>).)
  3) change 'char timestr[256]' to 'char timestr[32]' where appropriate. Per-thread stack is limited. Time strings are never longer than ~20 characters, so why waste 220+ bytes on the stack?

  Things this patch doesn't fix:

  1) hodgepodge of %Y-%m-%d %H:%M:%S versus %Y/%m/%d-%H%M%S and other variations. It's not clear to me whether this ever matters, not to mention 3rd party log filtering tools may already rely on a particular format. Still it would be nice to have a single manifest constant and have every call to localtime/strftime consistently use the same format.

  BUG: 832173
  Change-Id: Iee9719db4576eacc6c75694d9107954d0912cba8
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/3613
  Reviewed-by: Anand Avati <avati@redhat.com>
  Tested-by: Anand Avati <avati@redhat.com>
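Several points from the localtime/ctime commit message above (reentrant conversion, copying into a real time_t, GMT output, the "%F %T" shorthand, a 32-byte buffer) can be combined in one small helper. This is a sketch, not code from the patch: `format_utc_timestamp` is a hypothetical name, and POSIX `gmtime_r` is assumed to be available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: render seconds since the epoch as
 * "YYYY-MM-DD HH:MM:SS" in UTC. Returns the number of characters
 * written, or 0 on failure (buf is then an empty string). */
static size_t format_utc_timestamp(int64_t secs, char *buf, size_t len)
{
    time_t t = (time_t)secs; /* copy into a real time_t, never cast the pointer */
    struct tm tm;            /* caller-owned storage, so no shared static data */

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    return strftime(buf, len, "%F %T", &tm);
}
```

A caller would declare `char timestr[32];` and pass `sizeof timestr`; on failure the empty string still lets the log or dump line be emitted, in the spirit of point 4.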
* performance/quick-read: fix race condition in unlink | Raghavendra G | 2013-02-07 | 1 | -2/+2
| | | | | | | | | | | use same lock (inode->lock), while incrementing/decrementing local->open_count. Change-Id: I08cbab5b5dec09b6057f43324fe3152f1564ce46 BUG: 902174 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4396 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* mount/fuse: add mount-option "enable-ino32" for the native client | Niels de Vos | 2013-02-07 | 4 | -30/+51
| | | | | | | | | | | | | | By default the GlusterFS-native client uses 64-bit inodes. Some 32-bit applications can not handle these correctly. Introduce a client-side mount option "enable-ino32" which causes the FUSE-client to squash the 64-bit inodes into a 32-bit value. Change-Id: I7544010a27b7eb2d3b9fadb84ed934e4e7dff21e BUG: 850352 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/3886 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
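One common way to squash a 64-bit inode number into 32 bits is to XOR the upper half into the lower half. The sketch below assumes that approach for illustration; the commit message does not say which folding the patch uses, and `squash_ino32` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fold a 64-bit inode number into 32 bits by
 * XOR-ing the high word into the low word (assumed technique). */
static uint32_t squash_ino32(uint64_t ino)
{
    return (uint32_t)(ino ^ (ino >> 32));
}
```

Values that already fit in 32 bits pass through unchanged, but distinct 64-bit inodes can collide after folding, which is presumably why the behaviour is an opt-in mount option.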
* system/posix-acl: prevent NULL pointer dereference of group_ce | Varun Shastry | 2013-01-21 | 1 | -1/+1
| | | | | | | | | | | Thanks Amar Tumballi. Change-Id: I3ac9b46d4c3fcd12d1eec779317a03c47d267556 BUG: 887098 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/4395 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com>
* protocol/client: Remember the gfid of opened fd | Rajesh Amaravathi | 2012-11-19 | 3 | -112/+107
| | | | | | | | | | | | | | | | This is needed when the fresh lookup triggers self-heal, gfid won't be present in inode yet. Similar situation happens with Rebalance as it does not perform inode_link. Added similar fix for re-opendir. Removed inode from fdctx and removed some duplication of code. BUG: 826080 Change-Id: I5840b86bf70ef73d40ae899b34a210b2dbcbf91f Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.org/4192 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Filter O_TRUNC in afr-fix-open | Pranith Kumar K | 2012-11-05 | 1 | -1/+2
  RCA: When open was done while a brick is down, afr opens the file after the brick comes back up. If this happens after the self-heal on the file is completed by self-heald etc., the file will end up in a truncated state.

  Fix: Filter O_TRUNC in afr-fix-open, because afr_open turns O_TRUNC into a truncate transaction, so there will be a pending changelog for the subvolume on which open fails.

  Testing: Had to simulate the race by stopping fix-open until self-heald completes self-heal on the file after the brick comes online.

  Change-Id: If99eb3eb272dea0ed8c7b754dce675eb6efaf802
  BUG: 841840
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/4147
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd-volgen: by default include 'cluster/distribute' in volfile | Emmanuel Dreyfus | 2012-10-25 | 1 | -6/+9
| | | | | | | | | | | | | This is a backport of Ie9d559e6b26aafd3d67908ab20a006e4e5e70d73 We need it in order to avoid spurious EINVAL when scaling from 1 brick to more in distributed volumes. BUG: 815227 Change-Id: I9858af03bf6d7724ff997f341faca62e89aecfb0 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/3838 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: return -1 if lstat call returns non zero value apart from -1 | Raghavendra Bhat | 2012-10-11 | 1 | -10/+30
| | | | | | | | | | | | | * If lstat() call in posix_{pstat, istat} returns non zero return value other than -1, then treat lstat() call to have been failed and return -1 itself. This might happen if there is some bug in the backend filesystem. Change-Id: Ie23787f6c838f14f92edadad71b83471e3d22289 BUG: 864401 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4054 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Fixed some general typing errors. | Varun Shastry | 2012-09-27 | 3 | -6/+6
| | | | | | | | | | | Eg: changed recieved to received Change-Id: I360fcb99c97c8a0222e373fee20ea2fccfb938db BUG: 860543 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/3999 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterfs SEGV on Fedora 17 from UFO fallocate(2) call [v3.3.1qa3] | Kaleb S. KEITHLEY | 2012-09-17 | 1 | -14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An upload of a file will cause the volume's glusterfs to SEGV when it fields a FUSE_FALLOCATE op. Swift inspects libc to determine if there is a symbol for fallocate(2) and if so will use it. And while the libc in RHEL 6 does have fallocate(2), the version of fuse in RHEL 6 does not support fallocate, and things are handled gracefully elsewhere (the kernel perhaps?) N.B. fallocate was added to version 7.19 of fuse. Fedora 17 and later (and maybe earlier too) has 7.19. RHEL 6 still has 7.13. Glusterfs uses the 7.13 version <linux/fuse.h> (in contrib/fuse-include/fuse_kernel.h) Thus on Fedora 17, with both fallocate(2) in libc and fallocate support in fuse, the fallocate invocation is dispatched to glusterfs, but the dispatch table (fuse_std_ops in xlators/mount/fuse/src/fuse-bridge.c) is too short for one thing; the fallocate opcode (43) indexes beyond the end of the table, and even when that doesn't directly cause a SEGV, the NULL pointer at that location does cause a SEGV when attempting to call the function through the pointer. BUG: 856704 Change-Id: Iffe3994dde6ca29444d07d27eb04d6f86773fa03 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/3941 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Mohammed Junaid <junaid@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
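The failure mode described above (an opcode indexing past the end of a handler table, or hitting a NULL slot) can be guarded against with a bounds and NULL check before the indirect call. The table below is a toy stand-in, not `fuse_std_ops`; the handler names and `dispatch` are hypothetical, and returning -ENOSYS for unknown opcodes is an assumption about a reasonable reply rather than a description of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*op_handler_t)(void);

static int op_lookup(void)  { return 1; }
static int op_getattr(void) { return 3; }

/* Toy dispatch table indexed by opcode; unlisted slots are NULL. */
static const op_handler_t demo_ops[] = {
    [1] = op_lookup,
    [3] = op_getattr,
};

#define DEMO_OPS_LEN (sizeof demo_ops / sizeof demo_ops[0])

/* Reject opcodes beyond the table and empty slots instead of
 * calling through a wild or NULL function pointer. */
static int dispatch(unsigned int opcode)
{
    if (opcode >= DEMO_OPS_LEN || demo_ops[opcode] == NULL)
        return -ENOSYS;
    return demo_ops[opcode]();
}
```

Here `dispatch(43)`, the fallocate opcode mentioned in the message, returns -ENOSYS instead of reading past the four-entry table.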
* mount.glusterfs: Add support for {attribute,entry}-timeout options | Kaushal M | 2012-09-17 | 1 | -0/+11
| | | | | | | | | | | Change-Id: Ib0c9b5be6f05cf9a36271df67e5e5c251c4c4628 BUG: 829279 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/3840 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jeff Darcy <obdurodon@gmail.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: nfs.disable fix for "volume set help" | Kaushal M | 2012-09-16 | 1 | -2/+7
| | | | | | | | | | | | | Fixes volgen to include "nfs.disable" in output of "volume set help". Change-Id: Idaac2cee04b7b38aad5a77db558808c0eb699fcf BUG: 828027 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/3881 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* storage/posix: Option to set brick(of a volume)'s root dir's uid/gid | Krishnan Parthasarathi | 2012-09-14 | 2 | -6/+46
  CLI
  ---
      gluster volume set VOLNAME owner-uid uid
      gluster volume set VOLNAME owner-gid gid

  where uid, gid are the owner's user id and group id respectively that would be set on the root of all brick (backend) fs.

  TODO: uid/gid should not be -1. Today we don't validate that in CLI.

  Change-Id: Ib6a2fb5e404691c5fe105a89faaeff3e1ab72e91
  BUG: 853842
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/3939
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* mount.glusterfs NetBSD portability fix | Emmanuel Dreyfus | 2012-09-12 | 1 | -1/+9
  NetBSD stat(1) gets the inode using -f %i while Linux uses -c %i.

  This has already been fixed a few lines above, but one test failed to be fixed.

  This is not based on master, as the code has been reworked a lot there, and is already bug-free.

  BUG: 764655
  Change-Id: I5dc1196ddba06ff31f695b7dbb0c6d28df32f324
  Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
  Reviewed-on: http://review.gluster.org/3926
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cli: Added special key "group" for bulk volume set. | Krishnan Parthasarathi | 2012-09-12 | 1 | -29/+37
      gluster volume set VOLNAME group group_name

  - where group_name is a file under /var/lib/glusterd/groups containing one key, value pair per line as below,

        key1=value1
        key2=value2
        [...]

  - the command sets key1 to value1 and so on.

  Change-Id: Ic4c8dedb98d013b29a74e57f8ee7c1d3573137d2
  BUG: 851237
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/3896
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* Honour configure --localstatedir and --sysconfdir | Emmanuel Dreyfus | 2012-09-09 | 1 | -3/+4
  Makes sure the /etc/glusterd to /var/lib/glusterd migration does honour configure --localstatedir and --sysconfdir.

  Backport of I65a5f96424d67531e81e75b084265bd4e6e30f29

  BUG: 764655
  Change-Id: I71e0d3b7f0d27b490b591dcc92ddfe26fb8e818d
  Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
  Reviewed-on: http://review.gluster.org/3911
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Expect setmntent(3) to return NULL | Krishnan Parthasarathi | 2012-09-09 | 1 | -2/+9
| | | | | | | | | | | - Closed the mtab FILE * using endmntent(3) Change-Id: I5e1ebb7f092abda638cfbb5524da693dcac6c872 BUG: 851109 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/3922 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
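The pattern named in the commit above (check setmntent(3) for NULL, and close the stream with endmntent(3)) looks roughly like this on a glibc system. `count_mount_entries` is a hypothetical helper written for illustration; it is not the glusterd code.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>

/* Hypothetical helper: count the entries in an mtab-style file.
 * Returns -1 if the file cannot be opened. */
static int count_mount_entries(const char *mtab_path)
{
    FILE *mtab;
    int count = 0;

    assert(mtab_path != NULL);
    mtab = setmntent(mtab_path, "r");
    if (mtab == NULL) /* setmntent(3) can fail; do not pass NULL on */
        return -1;

    while (getmntent(mtab) != NULL)
        count++;

    endmntent(mtab); /* closes the stream opened by setmntent(3) */
    return count;
}
```

Note that getmntent(3) returns a pointer to static storage, so threaded code would want the glibc getmntent_r variant instead.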
* cli: Proper xml output for "gluster peer status" [v3.3.1qa2] | Kaushal M | 2012-08-30 | 1 | -0/+5
| | | | | | | | | Change-Id: I90952ba2ea606552cf4ad67dd296a440f90592d6 BUG: 847760 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/3870 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fixed incorrect assumptions in rpcsvc actors of glusterd | Krishnan Parthasarathi | 2012-08-30 | 2 | -19/+27
| | | | | | | | | | | | | | | | | | Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/3864 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com> Conflicts: xlators/mgmt/glusterd/src/glusterd-handler.c Change-Id: Iabfcb401de9d658e32433aa1e8c87b329cbd2cf7 BUG: 851109 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/3876 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Self-heald: Fix inode leak | Pranith Kumar K | 2012-08-30 | 1 | -12/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | RCA: There is an inode-leak because inode_link returns linked inode by taking a reference. That needs to be unreffed. Fix: Added the code to perform unrefs. In addition to that updated the loc inode with the linked-inode because that is the best practice. The code to update the input inode's gfid can be removed later, its already removed in master. Tests: Checked that opendir comes with an loc with valid inode Checked that re-opendir happens successfully. Tested index, full self-heal work fine with the fix. BUG: 826580 Change-Id: I0c68192ff98f76152ed112b393d497b8fee93355 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.org/3518 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht/rebalance: set the correct ownership on the dst file. | shishir gowda | 2012-08-30 | 1 | -0/+8
| | | | | | | | | | | | | | Currently, the dst file created has root:root ownership, till migration is completed. During this phase, open fails on the dst file if uid/gid is non-root. Setting the dst_file to the correct ownership fixes the issue Change-Id: Icfec89eb10dc866cdee38dab17695fe21174ef99 BUG: 852361 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3862 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* storage/posix: implement native linux AIO support | shishir gowda | 2012-08-27 | 8 | -6/+657
| | | | | | | | | | | | | Configurable via cli with "storage.linux-aio" settable option Backported Avati's patch http://review.gluster.org/#change,3627 BUG: 837495 Change-Id: Ia7c26f5734d34d341debd422a5c59bba31eef844 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3849 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr: Avoid excessive logging in self-heal. [v3.3.1qa1] | Krishnan Parthasarathi | 2012-08-17 | 6 | -20/+26
| | | | | | | | | | | | | | | - (Excessive) Logging has been very useful as 'bread-crumbs' in many a root-cause analyses. This patch aims at avoiding logging when the information could be reconstructed using the xattrs, statedump, and/or "volume heal" CLI commands. Change-Id: I8f646cbee44e98495ea6963f9dfcae95375c8900 BUG: 844804 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.com/3827 Reviewed-by: Pranith Kumar Karampuri <pranithk@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Handle child_up & fd not opened case in xaction | Pranith Kumar K | 2012-08-17 | 1 | -7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RCA: When an fd is opened while a brick is down, after the brick comes back up afr issues open on the other brick. It can fail for a number of reasons (enoent etc). While the system is in that state, inode/entrylks pre-op happen only on the brick that is up and fd is opened for fd-fops. post-op should consider only the bricks where both pre-op and fop succeeded as success, rest of them as failures. Code now marks only the children that are down as failures as opposed to child_down & fd-not-opened. This makes change-log appear as success on the subvolume where we did not do any fop leading to no change-log but differences in data/metadata for reg-files. Fix: Mark non-participants of fop as failure. This is tracked in transaction.pre_op[]. Tests: Simulated the scenario using err-gen on top of one of the client xlator which fails all fops always. Performed fops and the changelog represented pending fops on the brick with err-gen loaded. Tested the case of brick down and perform entry/metadata/data operations to confirm they still work as expected. Change-Id: I41905936126b19abba56ca581c0301a894507e1a BUG: 844987 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3776 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rpc: Reduce frame-timeout for glusterd connectionsKaushal M2012-08-173-8/+46
| | | | | | | | | | | | | | | | Reduce frame-timeout for glusterd connections from 30mins to 10 mins. 30mins is too long when compared to cli timeout of 2mins. Changing to 10mins reduces the disparity between cli and glusterd. Also, fix glusterfs_submit_reply() so that a reply is sent even if serialize failed. BUG: 843003 Change-Id: Ie8d5ec16fbbb54318a5935a47065e66fd3338b87 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/3812 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Optimize readdirp calls in DHTshishir gowda2012-08-165-3/+68
| | | | | | | | | | | | | | | | | Bring in option which is supported by posix xlator to filter out directory's entries from being returned. DHT would now request non-first subvols to filter out directory entries. dht xlator-option readdir-optimize will enable this optimization Change-Id: Ibf99f1bef501f285ff44a1cecfbebee9e16063b6 BUG: 838199 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.com/3806 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/io-cache: use pthread_mutex_trylock to hold mutex in statedumpsRaghavendra Bhat2012-08-161-18/+81
| | | | | | | | | | | | Do not use pthread_mutex_lock and gf_log functions while dumping information to statedump, to avoid deadlocks. Change-Id: I6569366856fc2bc0fefb49c8379e2e4337717ce4 BUG: 843787 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.com/3799 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
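The trylock pattern that this and the neighbouring statedump commits describe can be sketched as below. This is an illustration under assumptions, not the io-cache code: `struct cache_state` and `cache_state_dump()` are hypothetical, and plain `fprintf` stands in for whatever the statedump writer uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for a translator's private state. */
struct cache_state {
    pthread_mutex_t lock;
    unsigned long   cache_size;
};

/*
 * Dump state without ever blocking on the mutex.
 * Returns 0 if the state was dumped, -1 if the lock was busy.
 */
int cache_state_dump(struct cache_state *st, FILE *out)
{
    assert(st != NULL && out != NULL);

    if (pthread_mutex_trylock(&st->lock) != 0) {
        /* Lock is held elsewhere: skip this section instead of waiting. */
        fprintf(out, "cache_state: lock busy, dump skipped\n");
        return -1;
    }
    fprintf(out, "cache_size=%lu\n", st->cache_size);
    pthread_mutex_unlock(&st->lock);
    return 0;
}
```

The trade-off is that a dump may be incomplete when a lock is contended, which is preferable to a dump path that can deadlock the process.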
* performance/write-behind: use pthread_mutex_trylock to hold mutex in statedumpsRaghavendra Bhat2012-08-151-3/+15
| | | | | | | | | Change-Id: I24c83b1b5e83ef3e38a019043c7fbca13b19ff43 BUG: 841543 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.com/3815 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/stripe: Filter coalesce xattr from getfattrShylesh Kumar2012-08-151-0/+3
| | | | | | | | | Change-Id: I1c5740e29699ef464a3d30365396711f03c24974 Signed-off-by: Shylesh Kumar <shmohan@redhat.com> BUG: 801887 Reviewed-on: http://review.gluster.com/3809 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <sgowda@redhat.com>
* cluster/afr: Avoid setting split-brain outside inode locksPranith Kumar K2012-08-125-34/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | RCA: The bug is observed because the decision to mark a file as split-brain is taken outside the appropriate locks. Lookup gathers xattrs outside any lock, so xattrs that indicate split-brain in lookup should be taken only as a hint. Appropriate inodelks should be taken before confirming a split-brain. Self-heal confirms this at the moment. Fix: Self-heals are launched to inspect xattrs when the data/metadata self-heal options are turned on. The decision to set/reset the split-brain flag is taken inside the appropriate locks. Known issue after fix: If data/metadata self-heal is turned off, the xattrs cannot be inspected, so split-brain detection does not work correctly. This bug is handled only in upstream. Change-Id: I59a43d5ce7bf9ca35bff54a51bf4cfa55d717a9e BUG: 833727 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3691 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/locks: Fix statedump codePranith Kumar K2012-08-124-74/+52
| | | | | | | | | | | | | | | | | | | | | | | | RCA: Taking blocking mutex/spin locks leads to deadlocks because of the locking order in statedumps. We were also asked to remove gf_log calls where possible to avoid extra cost in signal handlers. Fix: Changed blocking mutex/spin locks to their non-blocking variants. Removed gf_log calls from the locks xlator statedump code path. Tests: Statedump success cases work fine. Triggered try-lock failures by running statedumps in a while loop while, in parallel, running chown on the same file in a while loop. BUG: 843781 Change-Id: Iac9b75d79cd5e036cd3eafc1e106074e2c6b5c47 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/3752 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/io-threads: Provide option to turn off least-priorityPranith Kumar K2012-08-123-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | RCA: When self-heal is in progress, self-heal fops are starved because of least-priority. This affects other fops whose inode or entry locks conflict with self-heal. Fix: This patch adds an option to enable or disable least-priority. Additional changes: Moved the RCHECKSUM fop to low instead of least, because it still affects the performance of other fops if RCHECKSUM is in LEAST priority. Tests: Tested that the enabling/disabling of fops is working fine. Tested that RCHECKSUM fop priority is assigned LOW when least-priority is disabled. BUG: 843704 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Change-Id: I892f99d6d0a3e0ae6c0a280f82e2203af0c346f6 Reviewed-on: http://review.gluster.com/3751 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/read-ahead: use pthread_mutex_trylock to hold mutex in statedumpsRaghavendra Bhat2012-08-111-11/+18
| | | | | | | | | Change-Id: I4de64915a9c6a46e126ef4a5b987e49de558f827 BUG: 843796 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.com/3801 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/quick-read: use pthread_mutex_trylock to hold mutex in statedumpsRaghavendra Bhat2012-08-111-3/+17
| | | | | | | | | | | | Do not use pthread_mutex_lock and gf_log functions while dumping information to statedump, to avoid deadlocks. Change-Id: Ic77d96bc52f2a2a32629c0ae20bba797317e0a81 BUG: 843789 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.com/3800 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* protocol/server: use pthread_mutex_trylock while dumping statedumpsRaghavendra Bhat2012-08-111-11/+29
| | | | | | | | | Change-Id: I2b04dc35a51d940915197cf8e26e638f32fa4d7b BUG: 843821 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.com/3802 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* acl: enable handling of FMODE_EXEC flagAmar Tumballi2012-08-091-0/+7
| | | | | | | | | | | | | | | | | | | | | | On Linux systems, open() can receive the flag below, as defined in 'linux/fs.h'. /* File is opened for execution with sys_execve / sys_uselib */ '#define FMODE_EXEC ((fmode_t)0x20)' Instead of adding '#include <linux/fs.h>', it is better to copy this absolute number into another variable, because otherwise we would have to deal with declaring fmode_t and so on. With the fix, we can handle a file with '0711' permissions in the same way as backend Linux filesystems. Change-Id: Ib1097fc0d2502af89c92d561eb4123cba15713f5 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 843960 Reviewed-on: http://review.gluster.com/3746 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
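To illustrate why the flag matters for a '0711' file, here is a minimal sketch of mapping open() flags to the permission that should be checked. Only the value 0x20 comes from the message above; `SKETCH_FMODE_EXEC`, the `PERM_*` constants, and both helper functions are hypothetical and do not reproduce the acl xlator's actual code.

```c
#include <assert.h>
#include <fcntl.h>

/* Local copy of the kernel's FMODE_EXEC value quoted above (name is hypothetical). */
#define SKETCH_FMODE_EXEC 0x20

/* Hypothetical permission bits, matching the rwx layout of a mode triplet. */
enum { PERM_READ = 4, PERM_WRITE = 2, PERM_EXEC = 1 };

/* Decide which permission an open() with these flags should be checked against. */
int perm_for_open_flags(int flags)
{
    int perm = 0;

    switch (flags & O_ACCMODE) {
    case O_RDONLY:
        /* An open for execution needs execute permission, not read. */
        perm = (flags & SKETCH_FMODE_EXEC) ? PERM_EXEC : PERM_READ;
        break;
    case O_WRONLY:
        perm = PERM_WRITE;
        break;
    case O_RDWR:
        perm = PERM_READ | PERM_WRITE;
        break;
    }
    return perm;
}

/* Check the requested permission against the "other" triplet of a mode. */
int other_allows(unsigned int mode, int perm)
{
    assert(perm != 0);
    return ((int)(mode & 7) & perm) == perm;
}
```

With mode 0711, a non-owner has execute but not read permission, so an open carrying the exec flag passes the check in this sketch while a plain read-only open does not.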
* cluster: fix crash on link of named pipe in stripe/replicate volBrian Foster2012-08-092-12/+11
| | | | | | | | | | | | | | | | | | | | | | | A crash occurs when attempting to link a named pipe on a striped, replicated volume. The cause for this crash is attempting to deref a NULL inode pointer in stripe_link_cbk(). The RCA for this bug uncovered a couple of problems: - AFR ignores the inode pointer it receives on failure (returning NULL). - stripe assumes the inode pointer is valid on failure. Either one of these changes addresses the crash, but this patch includes both changes. AFR is modified to pass along the inode pointer it receives (which could still be NULL). stripe is modified to not assume the inode pointer is valid on fop failure. BUG: 842825 Change-Id: I368849b7cfbb137a08ae5f89d26406814ff5bb09 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.com/3790 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/stripe: don't fail if no fctx on a non-regular fileBrian Foster2012-08-091-10/+14
| | | | | | | | | | | | cluster/stripe broke directory rename. Only check for fctx on regular files. BUG: 842652 Change-Id: I29d7b265cbe40921226feb3e1c4e6b97b3a01d95 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.com/3789 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>