summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
* build: declare lvm_lv_from_name() if it is missing from lvm2app.hv3.4.0beta4Niels de Vos2013-06-271-0/+6
| | | | | | | | | | | | | | | | | | | | | The bd-xlator can not be built successfully on certain Debian distributions due to a missing declaration of lvm_lv_from_name(). This function is available for linking, but it does not exist in the header file. This change adds a detection for lvm_lv_from_name() in both the library for linking, and the declaration in the header file. If the 1st is missing, the bd-xlator can not be built, and if only the 2nd one is missing, we'll declare lvm_lv_from_name() ourselves. This makes it possible to build the bd-xlator on the affected Debian distributions too. Change-Id: If1845f6b6d676793677ebbcc6daf9ff12f7c3fd6 BUG: 976946 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/5260 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Disable transport before cleaning up rpc objectKrishnan Parthasarathi2013-06-183-19/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/5000 Problem: rpc_transport object, which is part of rpc_clnt, is destroyed prematurely. This is because, rpc_transport object is ref'd by socket layer and rpc layer. These ref's, until the synctask'izing of operations, were unref'd sequentially in the epoll thread. With more threads at play, the sequential unref guarantee is off. Fix: Shutting down the transport before proceeding with cleaning up of rpc_clnt object would serialize the unref's on the rpc_transport object and thus eliminating the race. Also, we don't store the address of brickinfo in brick's rpc notify function, to avoid the possibility of referring a freed brickinfo. Instead we use a string based id to 'reach' the corresponding brickinfo. Change-Id: If2739e2eeaee1e8b071ab2b6754b7ea0f81cfceb BUG: 962619 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5214 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Add a cmd for getting uuid of local nodeKrishnan Parthasarathi2013-06-181-0/+99
| | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/5175 (upstream) Usage: gluster system:: uuid get This is needed since we generate uuid of a node in a lazy manner. ie, we generate a uuid for the node only on the first volume or peer operation, when the node needs an external identity. With this command, we can force[1] the uuid generation, without a volume or peer operation performed. [1]: Querying for uuid (or uuid get), forces uuid to come into existence. Change-Id: I62c8b6754117756aa4d773dd48af4ddeb1a1d878 BUG: 971661 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5204 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Ignore directories matching *.tmp in storeKrishnan Parthasarathi2013-06-141-0/+1
| | | | | | | | | | | | | | Backport of http://review.gluster.org/5177 store being glusterd's persistent store under /var/lib/glusterd/ Change-Id: I1c01a09a8ce4a73ea612f05e7f14d4ab39ad1628 BUG: 971796 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5212 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Perform delayed changelog wakeups for anon fdv3.4.0beta3Pranith Kumar K2013-06-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Nfs xlator never does open on a file for performing writes, afr does not perform changelog wakeup for this fd so operations which do metadata operations as soon as the data operations are completed perceive a delay od 'post-op-delay-secs'. Fix: Perform changelog wakeup on anon-fd if the fd with same pid is not present in inode-list. Note: This approach is a short-term fix. A proper fix needs a new domain for taking metadata locks so that data/metadata locks don't compete with each other. BUG: 966018 Change-Id: Ia9188a253e7943801b665e1b9205e2f551952d87 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5067 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Fix crash in dht_migration_complete_check_task because of NULL fdEmmanuel Dreyfus2013-06-082-1/+3
| | | | | | | | | | | This is a backport of Ia5a5d40bcea7bfb320ef7096af1e035b8847d4ff BUG: 960055 Change-Id: Ibf3547a775d7ca2f3a097c880cdf38ffafb322da Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/5139 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* dht,posix: support for case discoveryAnand Avati2013-06-082-0/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is support for discovering a filename in a given directory which has a case insensitive match of a given name. It is implemented as a virtual extended attribute on the directory where the required filename is specified in the key. E.g: sh# getfattr -e "text" -n user.glusterfs.get_real_filename:FiLe-B /mnt/samba/patchy getfattr: Removing leading '/' from absolute path names # file: mnt/samba/patchy user.glusterfs.get_real_filename:FiLe-B="file-b" In reality, there can be multiple "answers" as the backend filesystem is case sensitive and there can be multiple files which can strcasecamp() successfully. In this case we pick the first matched file from the first responding server. If a matching file does not exist, we return ENOENT (and NOT ENODATA). This way the caller can differentiate between "unsupported" glusterfs API and file not existing. This API is used by Samba VFS to perform efficient discovery of the real filename without doing a full scan at the Samba level. Change-Id: I53054c4067cba69e585fd0bbce004495bc6e39e8 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5163 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gfapi: link inodes in relevant entry FOPsAnand Avati2013-06-081-3/+3
| | | | | | | | | | | | Do not let inode linking to happen only in lookup(). While that works, it is inefficient. Change-Id: I51bbfb6255ec4324ab17ff00566375f49d120c06 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5162 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* md-cache: support negative xattr entriesAnand Avati2013-06-081-10/+31
| | | | | | | | | | | | | | | Add support for negative xattr caching. For this, we need to fetch xattrs in every opportunity (including readdirplus) in order to treat missing key in cached dict as negative entry. This is crucial to detect missing ACL xattrs in Samba workload. Change-Id: I918a2ef4ab804724256f7546b15e808332ed518d BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5160 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* quick-read: prune cache on write/[f]truncateAnand Avati2013-06-081-0/+43
| | | | | | | | | | | | | Cache needs to be pruned on write and [f]truncate. The lack of this is causing Samba ping-pong test to return wierd 'data increment' values during startup. Change-Id: I9cd6a839bcd02de738d78638211b78f382f58e0a BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5158 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix-acl: fetch ACLs in readdirplusAnand Avati2013-06-081-0/+6
| | | | | | | | | | | | | Not fetching ACLs in readdirplus can potentially result in spurious wrong ACL decisions (which magically go away on a lookup() which populates the ACLs) Change-Id: Ided38b4d868fab482b477ce51b4878289ef9eed0 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5156 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Avoid order mismatch in blocking entrylksPranith Kumar K2013-06-061-6/+9
| | | | | | | | | | | | | | | | | | | | | | Problem: When taking blocking entrylks, afr orders the entrylks based on uuid_compare of gfids of parent dirs, if they are equal then it orders them based on the basenames. While this approach works fine, the implementation assumes loc->gfids to be populated at the time of the comparison, but loc may have gfid in loc->inode->gfid instead of loc->gfid which was leading to order mismatches and dead-locks. Fix: Implemented loc_gfid which gives gfid by checking both loc->gfid, loc->inode->gfid. Used this for ordering the blocking entrylks. Change-Id: I2743fcaff3d670fbeb6b8e0a496f106a3585dde1 BUG: 965987 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5063 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd-volgen: Improve volume op-versions calculationKaushal M2013-06-054-454/+582
| | | | | | | | | | | | | | | | | | Backport of patch on master branch, under review at http://review.gluster.org/4952 Volume op-versions calculations now take into account if an option, a. enables/disables an xlator, or b. is a boolean option. This prevents op-versions from being updated when a feature is disabled. BUG: 954256 Change-Id: Ic68032b9e55a3f0191f8fc3ecd6b5ced385ad943 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5094 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd-volgen: Enable open-behind based on op-versionKaushal M2013-06-053-11/+50
| | | | | | | | | | | | | | | | Backport of patch on master branch, under review at http://review.gluster.org/4866 This patch enables the open-behind by default only when the op-version allows it. Also the volume op-version calculations take account of this enablement. BUG: 954256 Change-Id: Ie739bc23ba90ec2f009feecef28187912a37487c Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5095 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Set op-version on startup based on install statusKaushal M2013-06-051-5/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of change fa227c0 glusterd: Set op-version on startup based on install status from master. If the current installation of glusterfs doesn't have a stored op-version and is, a. a fresh install, then set op-version to maximum b. an upgrade from release which didn't have op-version support, set it to minimum. The install status is detected using the peer-uuid. If both peer-uuid and op-version are not present in the store, the installation is fresh. If peer-uuid is present, but op-version is absent in the store, the installation has been upgraded from a version which didn't support op-versions. By setting the initial op-version as above, we can ensure that a. features are not enabled accidentally during upgrades b. a fresh install starts with all possible features enabled. BUG: 954256 Change-Id: I5cdd0c63fd16ecfa2fede99684da6fd6167823a8 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5001 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Introduce volume op-versionsKaushal M2013-06-0510-389/+566
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of change 9153855 glusterd: Introduce volume op-versions from master. Each volume is now associated with two op-versions, * op_version - the op-version of the highest op-versioned feature enabled * client_op_version - the op-version of the highest op-versioned feature enabled which affects the clients only. These two op-versions are generated dynamically and kept updated during runtime. Glusterd now uses the respective volumes' client-op-version during getspec requests. To achieve the above a new field in the vme table is introduced, client_option, this boolean field tells if the option is a client side option. BUG: 907311 Change-Id: I59af02644a714e1c54fc89f1ead5aa551bba7ee7 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4957 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Backport of vme table changes from masterKaushal M2013-06-0512-260/+1168
| | | | | | | | | | | | | | | | | | | | | This patch backports the following changes from the master branch 99fe09f glusterd: Moved the volume entry table to a separate file. e306d08 glusterd: Changing the volume entry table's representation. eac54f6 glusterd: Added option description, and validation function fields. bcb4235 glusterd: Added validation function for performance cache max and min size. 8897d08 glusterd: Added validation function for quota-timeout. 4579609 glusterd: Added validation function for stripe-block-size. 6788bad glusterd: Fix some options in vme table 549231d glusterd: Added the validation function for subvols-per-directory 9636e63 glusterd: Added description for nfs.transport-type option in volume set help. Change-Id: I4a64ad94f17df4b45a3a32262a83e2c35fb5f7da BUG: 907311 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4956 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/write-behind: Enable write-behind when strict_O_DIRECT is not set.Vijay Bellur2013-05-291-2/+1
| | | | | | | | | | | | | | | When open() with O_DIRECT happens, write-behind was being disabled for the fd irrespective of strict_O_DIRECT option. This commit disables write-behind only when strict_O_DIRECT is enabled. Change-Id: Ieef180e52910c3bf64d46b26b0e5dc3b8542f6d2 BUG: 923556 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/4697 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/5108 Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Give up biglock during rpc conn cleanupKrishnan Parthasarathi2013-05-281-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | glusterd could deadlock after a peer-detach command as follows, 1) glusterd_friend_cleanup function 'flushes' out messages in the rpc layer's queue, that haven't received a response. At this point, glusterd has already acquired the big lock. 2) The side-effect of flushing out the messages is that the corresponding call backs are called. Call backs themselves are executed after acquiring the big lock. This results in the big lock being acquired in a nested manner (in the same thread), which causes a deadlock. This can also happen during brick/NFS/SHD disconnect in volume-stop. Change-Id: Iab3aad143cd8ebbab53ea0b69687f0e7627dc8a9 BUG: 965533 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5084 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/io-cache: check the inode context to be NULL before accessingv3.4.0beta2Raghavendra Bhat2013-05-231-0/+7
| | | | | | | | | | | | Change-Id: I475af7f8ffd5e5d8adbd2a74af20e56ad7751f69 BUG: 958108 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4916 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/5077 Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: Enable write-behind in nfsPranith Kumar K2013-05-211-1/+1
| | | | | | | | | | | | | We observed that the number of write requests thus inodelks are increasing very rapidly to thousands without write-behind in the graph. Change-Id: I901a6a820eb7b21b413d33e1a0a3420c7f4746a8 BUG: 928341 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4736 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* md-cache: Make options structure NULL terminatedKrishnan Parthasarathi2013-05-201-0/+1
| | | | | | | | | Change-Id: I8aa4f90ba7e1eecf3f978be04f8550049275464f BUG: 765785 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5028 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Start bricks on glusterd startup, only onceKrishnan Parthasarathi2013-05-164-5/+7
| | | | | | | | | | | | | | The restarting of bricks has been deffered until the cluster 'stabilizes' itself volumes' view. Since glusterd_spawn_daemons is executed everytime a peer 'joins' the cluster, it may inadvertently restart bricks that were taken offline for say, maintenance purposes. This fix avoids that. Change-Id: Ic2a0a9657eb95c82d03cf5eb893322cf55c44eba BUG: 960190 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5022 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Refresh glusterd-syncop fixes from masterKrishnan Parthasarathi2013-05-162-46/+80
| | | | | | | | | | | | | | | | | | | Following commits were cherry-picked from master, 044f8ce syncop: Remove task from synclock's waitq before 'wake' cb6aeed glusterd: Give up big lock before performing any RPC 46572fe Revert "glusterd: Fix spurious wakeups in glusterd syncops" 5021e04 synctask: implement barriers around yield, not the other way 4843937 glusterd: Syncop callbks should take big lock too Change-Id: I5ae71ab98f9a336dc9bbf0e7b2ec50a6ed42b0f5 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4938 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5021 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Perform NULL check on rsp.op_errstr before using itKrutika Dhananjay2013-05-161-2/+1
| | | | | | | | | Change-Id: I2ae30f08965b26a21db541f87de78772cb17135a BUG: 962362 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/4996 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix uuid to hostname conversion for 'volume status'Kaushal M2013-05-141-3/+4
| | | | | | | | | | BUG: 927648 Change-Id: Ic016e9d1f090372329a8a2e530dac5fc6ed6c5ae Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4874 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Don't queue transactions during open-fd fixPranith Kumar K2013-05-086-385/+167
| | | | | | | | | | | | | | | | | | | | | Before Anonymous fds are available, afr had to queue up transactions if the file is not opened on one of its subvolumes. This happens until the attempt to open the file either succeeds or fails. These attempts happen until the file is successfully opened on the subvolume. Now client xlator uses anonymous fds to perform the fops if the fd used for the fop is not 'opened'. Fops will be successful even when the file is not opened so there is no need to queue up the transactions anymore in afr. Open is attempted on the subvolume where it is not opened independent of the fop. Change-Id: I6d59293023e2de41c606395028c8980b83faca3f BUG: 953887 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4868 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* posix: fix dangerous "sharing" of fd in readdir between two requestsv3.4.0beta1Anand Avati2013-05-071-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | posix_fill_readdir() is a multi-step function which performs many readdir() calls, and expects the directory cursor to have not "seeked away" elsewhere between two successive iterations. Usually this is not a problem as each opendir() from an application has its own backend fd, and there is nobody else to "seek away" the directory cursor. However in case of NFS's use of anonymous fd, the same fd_t is shared between all NFS readdir requests, and two readdir loops can be executing in parallel on the same dir dragging away the cursor in a chaotic manner. The fix in this patch is to lock on the fd around the loop. Another approach could be to reimplement posix_fill_readdir() with a single getdents() call, but that's for another day. Change-Id: Ia42e9c7fbcde43af4c0d08c20cc0f7419b98bd3f BUG: 948086 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4774 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/4963
* storage/posix: honor O_SYNC and O_DSYNC sent in @flags of writev()Anand Avati2013-05-072-5/+10
| | | | | | | | | | | | | | | | | Historic bug - posix_writev() has been inspecting pfd->flushwrites for performing fsync() after write, instead of @flags for O_SYNC|O_DSYNC. pfd->flushwrites was never set anywhere and is unused completely. This is behavior from the time before anonymous FD where open() had @wbflags param. This is a leftover from that cleanup. Change-Id: Id9bfe562a60db4eb3bd0a7705bdba91f2df2f3ec BUG: 916372 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4738 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4962
* cluster/afr: Turn on eager-lock for fd DATA transactionsEmmanuel Dreyfus2013-05-072-21/+7
| | | | | | | | | | | | | | | | | | | | | | | Problem: With the present implementation, eager-lock is issued for any fd fop. eager-lock is being transferred to metadata transactions. But the lk-owner is set to local->fd address only for DATA transactions, but for METADATA transactions it is frame->root. Because of this unlock on the eager-lock fails and rebalance hangs. Fix: Enable eager-lock for fd DATA transactions This is a backport of change If30df7486a0b2f5e4150d3259d1261f81473ce8a http://review.gluster.org/#/c/4588/ BUG: 916226 Change-Id: Id41ac17f467c37e7fd8863e0c19932d7b16344f8 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/4899 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/io-cache: Avoid double mem_put in ioc_readvPranith Kumar K2013-05-071-2/+3
| | | | | | | | | | | | | | On readv error io-cache frame->local is not set to NULL so the local is mem_put in STACK_DESTROY as well. This patch sets frame->local to NULL in all cases. BUG: 955751 Change-Id: I4a7340189efe02473452986b5870b02fcfa9038e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4886 Reviewed-by: Raghavendra G <raghavendra@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: big lock - a coarse-grained locking to prevent racesv3.4.0alpha3Krishnan Parthasarathi2013-04-1716-106/+657
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are primarily three lists that are part of glusterd process, that are concurrently accessed. Namely, priv->volumes, priv->peers and volinfo->bricks_list. Big-lock approach ----------------- WHAT IS IT? Big lock is a coarse-grained lock which protects all three lists, mentioned above, from racy access. HOW DOES IT WORK? At any given point in time, glusterd's thread(s) are in execution _iff_ there is a preceding, inbound network event. Of course, the sigwaiter thread and timer thread are exceptions. A network event is an external trigger to glusterd, via the epoll thread, in the form of POLLIN and POLLERR. As long as we take the big-lock at all such entry points and yield it when we are done, we are guaranteed that all the network events, accessing the global lists, are serialised. This amounts to holding the big lock at - all the handlers of all the actors in glusterd. (POLLIN) - all the cbks in glusterd. (POLLIN) - rpc_notify (DISCONNECT event), if we access/modify one of the three lists. (POLLERR) In the case of synctask'ized volume operations, we must remember that, if we held the big lock for the entire duration of the handler, we may block other non-synctask rpc actors from executing. For eg, volume-start would block in PMAP SIGNIN, if done incorrectly. To prevent this, we need to yield the big lock, when we yield the synctask, and reacquire on waking up of the synctask. BUG: 948686 Change-Id: I429832f1fed67bcac0813403d58346558a403ce9 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4835 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fixed spurious wakeups in glusterd syncopsKrishnan Parthasarathi2013-04-172-15/+26
| | | | | | | | | | | | | glusterd syncops perform a barrier_wake whenever rpc_clnt_submit returned -1. This is based on the wrong assumption that the cbkfn wasn't called. This would result in one more wakeup than there ought to be. BUG: 948686 Change-Id: I839fd218a81255fe50c2047d67461d45360e894d Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4834 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: fix segfault on volume status detailLars Ellenberg2013-04-161-9/+11
| | | | | | | | | | | | | | | | | | | | | | | | | If for some reason glusterd_get_brick_root() fails, it frees the gf_strdup'ed *mount_point in its own error path, and returns -1. Unfortunately it already had assigned that pointer value to the output argument, the caller function glusterd_add_brick_detail() sees a non-NULL pointer, and free() again: segfault. Could be fixed with a one-liner (*mount_point = NULL) in the error path, but I think glusterd_get_brick_root() should only assign to the output argument once all checks passed, so I use a local temporary pointer, which increases the patch a bit. Change-Id: I3f3035f01e80a5e9bdf2da895e4cf7baa3dfbd2f BUG: 919352 Signed-off-by: Lars Ellenberg <lars@linbit.com> Reviewed-on: http://review.gluster.org/4646 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/4841 Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterd: allow multiple instances of glusterd on one machineKrishnan Parthasarathi2013-04-163-1/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is needed to support automated testing of cluster-communication features such as probing and quorum. In order to use this, you need to do the following preparatory steps. * Copy /var/lib/glusterd to another directory for each virtual host * Ensure that each virtual host has a different UUID in its glusterd.info Now you can start each copy of glusterd with the following xlator-options. * management.transport.socket.bind-address=$ip_address * management.working-directory=$unique_working_directory You can use 127.x.y.z addresses for binding without needing to assign them to interfaces explicitly. Note that you must use addresses, not names, because of some stuff in the socket code that's not worth fixing just for this usage, but after that you can use names in /etc/hosts instead. At this point you can issue CLI commands to a specific glusterd using the --remote-host option. So far probe, volume create/start/stop, mount, and basic I/O all seem to work as expected with multiple instances. Change-Id: I1beabb44cff8763d2774bc208b2ffcda27c1a550 BUG: 913555 Original-author: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4838 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* license: xlators/protocol/server dual license GPLv2 and LGPLv3+Kaleb S. KEITHLEY2013-04-1210-145/+57
| | | | | | | | | | | | cherry-pick from: refs/changes/16/4816/1; http://review.gluster.org/#/c/4816/ BUG: 951551 Change-Id: I3de5bd86d4238a60a0a85ba2e15d9c131969b210 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/4817 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Fixed volume-sync in synctask codepath.Krishnan Parthasarathi2013-04-121-12/+18
| | | | | | | | | Change-Id: I2911d3ac80825310f84c5ba6bd7890e65e1ee219 BUG: 950048 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4643 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Preserve mtime in self-healPranith Kumar K2013-04-121-14/+25
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Data self-heal may choose sink iatt to set mtimes. This happens because after syncing of data is done self-heal does one more xattrops/fstat to determine sources sinks to set the inode-ctx. Since this is done after data syncing and erase of xattrs, old source and old sink are now sources, but the mtimes of them differ. Old code just takes the first source from the list and update mtimes, which could be sink before the self-heal started. Fix: Set mtime from 'sources before syncing'. Change-Id: Id769e1b99aa4f041eaee775f64cbf2c57b799723 BUG: 918437 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4658 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/4663
* cluster/distribute: Ignore non-participating subvols for layout checksshishir gowda2013-04-112-20/+88
| | | | | | | | | | | | | | | | | | | | | | | | | Backporting fix http://review.gluster.org/#/c/4668/ When subvols-per-directory is < available subvols, then there are layouts which are not populated. This leads to incorrect identification of holes or overlaps. We need to ignore layouts, which have err == 0, and start == stop. In the current scenario (start == stop == 0). Additionally, in layout-merge, treat missing xattrs as err = 0. In case of missing layouts, anomalies will reset them. For any other valid subvoles, err != 0 in case of layouts being zeroed out. Also reverted back dht_selfheal_dir_xattr, which does layout calculation only on subvols which have errors. BUG: 921408 Change-Id: I75a8edcb92af5b53b3253c9addd7a812e9242836 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4800 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dht: improve transform/detransform of d_off (and be ext4 safe)shishir gowda2013-04-111-5/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backporting Avati's fix http://review.gluster.org/4711 The scheme to encode brick d_off and brick id into global d_off has two approaches. Since both brick d_off and global d_off are both 64-bit wide, we need to be careful about how the brick id is encoded. Filesystems like XFS always give a d_off which fits within 32bits. So we have another 32bits (actually 31, in this scheme, as seen ahead) to encode the brick id - which is typically plenty. Filesystems like the recent EXT4 utilize the upto 63 low bits in d_off, as the d_off is calculated based on a hash function value. This leaves us no "unused" bits to encode the brick id. However both these filesystmes (EXT4 more importantly) are "tolerant" in terms of the accuracy of the value presented back in seekdir(). i.e, a seekdir(val) actually seeks to the entry which has the "closest" true offset. This "two-prong" scheme exploits this behavior - which seems to be the best middle ground amongst various approaches and has all the advantages of the old approach: - Works against XFS and EXT4, the two most common filesystems out there. (which wasn't an "advantage" of the old approach as it is borken against EXT4) - Probably works against most of the others as well. The ones which would NOT work are those which return HUGE d_offs _and_ NOT tolerant to seekdir() to "closest" true offset. - Nothing to "remember in memory" or evict "old entries". - Works fine across NFS server reboots and also NFS head failover. - Tolerant to seekdir() to arbitrary locations. Algorithm: Each d_off can be encoded in either of the two schemes. There is no requirement to encode all d_offs of a directory or a reply-set in the same scheme. The topmost bit of the 64 bits is used to specify the "type" of encoding of this particular d_off. If the topmost bit (bit-63) is 1, it indicates that the encoding scheme holds a HUGE d_off. If the topmost bit is is 0, it indicates that the "small" d_off encoding scheme is used. The goal of the "small" d_off encoding is to stay as dense as possible towards the lower bits even in the global d_off. The goal of the HUGE d_off encoding is to stay as accurate (close) as possible to the "true" d_off after a round of encoding and decoding. If DHT has N subvolumes, we need ROOF(Log2(N)) "bits" to encode the brick ID (call it "n"). SMALL d_off =========== Encoding -------- If the top n + 1 bits are free in a brick offset, then we leave the top bit as 0 and set the remaining bits based on the old formula: hi_mask = 0xffffffffffffffff hi_mask = ~(hi_mask >> (n + 1)) if ((hi_mask & d_off_brick) != 0) do_large_d_off_encoding () d_off_global = (d_off_brick * N) + brick_id Decoding -------- If the top bit in the global offset is 0, it indicates that this is the encoding formula used. So decoding such a global offset will be like the old formula: if ((d_off_global & 0x8000000000000000) != 0) do_large_d_off_decoding() d_off_brick = (d_off_global % N) brick_id = d_off_global / N HUGE d_off ========== Encoding -------- If the top n + 1 bits are NOT free in a given brick offset, then we set the top bit as 1 in the global offset. The low n bits are replaced by brick_id. low_mask = 0xffffffffffffffff << n // where n is ROOF(Log2(N)) d_off_global = (0x8000000000000000 | d_off_brick & low_mask) + brick_id if (d_off_global == 0xffffffffffffffff) discard_entry(); Decoding -------- If the top bit in the global offset is set 1, it indicates that the encoding formula used is above. So decoding would look like: hi_mask = (0xffffffffffffffff << n) low_mask = ~(hi_mask) d_off_brick = (global_d_off & hi_mask & 0x7fffffffffffffff) brick_id = global_d_off & low_mask If "losing" the low n bits in this decoding of d_off_brick looks "scary", we need to realize that till recently EXT4 used to only return what can now be expressed as (d_off_global >> 32). The extra 31 bits of hash added by EXT recently, only decreases the probability of a collision, and not eliminate it completely, anyways. In a way, the "lost" n bits are made up by decreasing the probability of collision by sharding the files into N bricks / EXT directories -- call it "hash hedging", if you will :-) Change-Id: I9551c581c3f3d4c9e719764881036d554f60c557 Thanks-to: Zach Brown <zab@redhat.com> BUG: 838784 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4799 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mgmt/glusterd: Start fs-crawl in separate thread so as not to block epollVarun Shastry2013-03-191-4/+2
| | | | | | | | | | | | | | | tests/basic/quota.t covers test case for this. Patch is only for 3.4 branch, http://review.gluster.org/4495 fixes the issue in master. Change-Id: I92674f5413441cc896245d5b3d0925f44ce8b2d3 BUG: 919998 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/4680 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/distribute: Fix layout overlaps due to spread-count in selfheal pathshishir gowda2013-03-091-50/+12
| | | | | | | | | | | | | | | We needed to zero out the layout range, before we re-calculate the range. When spread-count is issued, we would end up with stale ranges in the layout. Replaced dht_selfheal_dir_xattr with dht_fix_dir_xattr, which correctly resets the un-used (after re-cal) layouts. Change-Id: I1a900d15df07335f59356bd23182ccec34381ab2 BUG: 884455 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4648 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* performance/write-behind: guarantee non-overlapping concurrent writesv3.4.0alpha2Jeff Darcy2013-03-061-1/+65
| | | | | | | | | | | | | | | | | | Maintain a list of writes (either written behind or SYNC) which are currently "in progress" (i.e, STACK_WIND'ed towards server) and hold off any new STACK_WIND of write (either written behind or SYNC) which overlaps with any of the "in progress" writes. This is a guarantee which AFR's eager-lock depends upon (though not strictly a write-behind requirement) Change-Id: Icedd0b51b440366a906dc9223d62b7fd6ef2ca03 BUG: 857673 Original-author: Anand Avati <avati@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4642 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Increasing throughput of synctask based mgmt ops.Krishnan Parthasarathi2013-03-063-425/+563
| | | | | | | | | Change-Id: Ibd963f78707b157fc4c9729aa87206cfd5ecfe81 BUG: 913662 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* volgen: Use bind-address option for bricks when option set on glusterdKrishnan Parthasarathi2013-03-061-8/+20
| | | | | | | | | | | | | | | | Brick processes listen on all the interfaces on a given port. When multiple glusterds run on one machine, glusterd assumes that it 'owns' the ports on that machine. This can lead to the different glusterd instances to step on each other's ports. This fix ensures that brick processes listen only on the its host IP when glusterd has bind-address option set. Change-Id: I4c1b05643c64d3098bf56e977e768e611ffce0f5 BUG: 913662 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4637 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* performance/write-behind: mark fd bad if any written behind writes failRaghavendra G2013-03-061-57/+114
| | | | | | | | | BUG: 765473 Change-Id: I1ddd6ef9f5361aed96f97aa1344823836c6ddecb Signed-off-by: Raghavendra G <raghavendra@gluster.com> Reviewed-on: http://review.gluster.org/4630 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* Do not call xdr_string() with a NULL error messageJeff Darcy2013-03-051-0/+1
| | | | | | | | | | | | | It is illegal to call xdr_string() with a NULL string. Linux just retruns false, NetBSD gets a SIGSEGV when xdr_string() calls strlen(NULL) BUG: 916439 Change-Id: Ia958470ada6e8e55a86d439922ec942d038f5f13 Original-author: Emmanuel Dreyfus <manu@netbsd.org> Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4629
* cluster/afr: do complete split-brain check in all the fd based fopsRaghavendra Bhat2013-03-054-19/+33
| | | | | | | | | | | | | | | | | fd based operations such as readv checked only for data split brain instead of complete split-brain (i.e both data + metadata) assuming that open would have done the complete split-brain check. However open-behind would have unwound open, without winding to afr thus preventing the complete split-brain check and some appliations will be able to read the contents of the file even though the file has metadata split-brain. So let all the fd based fops do a defensive check of complete split-brain. Change-Id: I0ea52f782b371ce73e8e1c61f9def438fce1bd28 BUG: 846240 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4620 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix check for task-id existence in 'volume status'Kaushal M2013-03-053-3/+8
| | | | | | | | | | | | | | This fixes the issue of task-id tests failing randomly. The condition used to check rebalance/remove-brick was running was wrong, which could lead to the task-id for these tasks to not be displayed even when the actual commit hadn't occured. BUG: 857330 Change-Id: I0f86c6bbe7acec586ee0ea6e663369ea26171904 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4617 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* rpc: bring in root-squashing behavior in rpcRaghavendra Bhat2013-03-042-0/+10
| | | | | | | | | | | | | | | | | | * requests coming in as root are converted to nfsnobody * with open-behind some acl checks wont happen and nfsnobody can read the file "whose owner is root and other users do not have permission to read the file". This is becasue open-behind does not send the open to the brick and sends success to the application, thus the acl related tests on the file wont happen which would have prevented the file from being opened. Change-Id: I12a3e6b2a12884d00bb81f2779074fed09b1b2e4 BUG: 887145 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4619 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>