summaryrefslogtreecommitdiffstats
path: root/xlators/features
Commit message (Collapse)AuthorAgeFilesLines
* features/glupy: Rename Glupy python module to avoid namespace conflictrelease-3.5Justin Clift2014-03-249-17/+48
| | | | | | | | | | | | | | | | * Rename gluster.py to glupy.py to avoid namespace conflict (#1018619) * Move the main Glupy files into glusterfs-extra-xlators rpm * Move the Glupy Translator examples into glusterfs-devel rpm Backport of: http://review.gluster.org/#/c/6979/ Change-Id: Ie9b71b56502f8e98c527ade381c918446bc7f022 BUG: 1018619 Signed-off-by: Justin Clift <justin@gluster.org> Reviewed-on: http://review.gluster.org/7316 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* features/quota: fix the dict leak when quota is offVarun Shastry2014-03-151-2/+2
| | | | | | | | | Change-Id: I15036da90e96b69858eac5a19bd438df3bd8cc53 BUG: 1075506 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/7233 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep/gfid-access: Fix errno for non-existent GFID.Kotresh H R2014-03-021-0/+5
| | | | | | | | | | | | | | | | | | | | | Because of http://review.gluster.org/#/c/6318/ patch, ESTALE is returned for a lookukp on non-existent GFID. But ENOENT is more appropriate when lookup happens through virtual .gfid directory on aux-gfid-mount point. This is avoids confusion for the consumers of gfid-access-translator like geo-rep which expects ENOENT. Change-Id: I4add2edf5958bb59ce55d02726e6b3e801b101bb BUG: 1069191 Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/7154 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/7163 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/quota: fix crash in error handling after building ancestry.Raghavendra G2014-02-081-1/+1
| | | | | | | | | Change-Id: Ifbebf1aa496d49a6c4bb30258b83aaf9792828e5 BUG: 1059833 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6924 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* quota: get directory size before enforcing quota on renameKrishnan Parthasarathi2014-02-052-12/+72
| | | | | | | | | | | Backport of http://review.gluster.org/6155 BUG: 1023974 Change-Id: Icb4bca8c4a5fa9b14855478e69b1315f2e9a3c3d Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6887 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* quota: fix recording of last alert log messageKrishnan Parthasarathi2014-02-051-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/6532 PROBLEM: Alert log messages, corresponding to disk usage crossing soft-limit on a directory, weren't being logged as often as expected. CAUSE: The mechanism in place to log alert messages, once every alert-time seconds, set the previous logged time incorrectly. FIX: Update previous logged time only if we logged an alert message, ie. when the "time was right" to alert. BUG: 969461 Change-Id: Ie019c180c97d2b049fa664b14ce087b88e3b578f Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6886 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/marker: Filter quota xattrs on file as wellPranith Kumar K2014-02-022-3/+70
| | | | | | | | | | | | | | | | | | | | | | Problem: Quota contributions of a file/directory are tracked by quota xlator using xattrs on the file. Quota allows these xattrs to be healed as part of metadata self-heal. This leads to wrong quota calculations on this brick after self-heal because quota xattrs don't represent the actual contributions on the brick anymore. Fix: Don't let self-heal of this xattr happen as part of self-heal by filtering quota xattrs on file in listxattr. Change-Id: Iea68a116595ba271e58c6fdcc3dd21c7bb55ebb3 BUG: 1035576 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6374 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6812
* mgmt/glusterd: make sure quota enforcer has established connection with ↵Raghavendra G2014-01-282-6/+48
| | | | | | | | | | | | | | | | | | quotad before marking quota as enabled. without this patch there is a window of time when quota is marked as enabled in quota-enforcer, but connection to quotad wouldn't have been established. Any checklimit done during this period can result in a failed fop because of unavailability of quotad. Change-Id: I0d509fabc434dd55ce9ec59157123524197fcc80 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6572 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6820
* features/quota: Handle the corner case in statfs callVarun Shastry2014-01-281-3/+3
| | | | | | | | | | | | | | Problem: Even though limit is not set quota used to send 'quota-deem-statfs' key in the dictionary resulting in incorrect calculations. Change-Id: I643cb35cca6648e40f1c745135280876958ddaa9 BUG: 1046894 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6840 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/quota: Metadata cleanupVarun Shastry2014-01-281-4/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quota and marker uses 'trusted.glusterfs.quota*' and 'trusted.pgfid*' xattrs to store its configurations and accounting information and also to build the parent inode chain in case of absense of path. Problem: After disabling and then enabling quota back, the xattrs may contain stale data leading to impaired accounting and thus improper enforcement. Solution: Clean up all the quota related xattrs after quota disable. Marker xlator implements a virtual xattr to cleanup quota and pgfid xattrs. In this approach glusterd mounts an auxiliary mount and sends the below command to all the files by crawling the mountpoint. #setfattr -n "glusterfs.quota-xattr-cleanup" -v 1 <path/to/file> Credit: Krishnan Parthasarathi <kparthas@redhat.com> Varun Shastry <vshastry@redhat.com> Change-Id: I9380eca58a285dc27dd572de1767aac8f2cd8049 BUG: 969461 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6369 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6838
* features/quota: maintain correct link_count in ancestry buildingRaghavendra G2014-01-281-53/+68
| | | | | | | | | | | | | | | | | | | | | | codepath. Ancestry building codepath can be executed simultaneously when a file has hardlinks. So, following fixes are needed: * modify local->link_count under locks in ancestry building codepath. * before invoking check_limit on newly constructed parents, link_count should not be set, but instead incremented by as many number of new invocations of check_limit. * decrementing link_count and the check whether the count has hit zero to resume the waiting call_stub should be atomic. Change-Id: I6f52e50547362a5ded6bd9f085570223aea372d1 BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6491 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6819
* features/quota: remove in-memory accounting of files in enforcerRaghavendra G2014-01-282-221/+210
| | | | | | | | | | | | | | | | Accounting was done in enforcer (though marker is the ultimate source of truth) to offset cached directory size becoming stale. However, with enforcer being moved to brick we can no longer maintain correct cluster wide size for a directory. Hence removing accounting code from enforcer. Change-Id: I5ea94234da4da85ed5f5ced1354d8de3454b3fcb BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6434 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6818
* features/quota: use STACK_WIND_TAIL when quota is turned off.Raghavendra G2014-01-281-168/+207
| | | | | | | | | | Change-Id: I8a0b7f3a1995a72560c210efbad1eaafb0bdf329 BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6373 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6816
* features/quota: log usage only if hard limit not exceeded.Raghavendra G2014-01-281-4/+9
| | | | | | | | | | Change-Id: I60abf576999996e0d0d65534e1e416f6e10994c8 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 969461 Reviewed-on: http://review.gluster.org/6479 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6814
* features/changelog: more changelog fixes.Ajeet Jha2014-01-278-38/+212
| | | | | | | | | | | | | | | | -> log additional records. -> include FOP number for metadata. -> prevent crash if inode is not found in a fop. Change-Id: I9edd4b71819ebd68c6a2b4150ae279c471d129da BUG: 1036536 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6403 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/6808 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/gfid-access: fix lookup on .gfid/<parent>/bnameVenky Shankar2014-01-271-0/+17
| | | | | | | | | | | | | | | In gfid translator, lookup was not handling the case when the lookup is sent on .gfid/<parent>/bname. In this case, we flip with fake inode of the parent with the real inode in loc and send it downwards. Change-Id: I639ff1dce10ffc045da419e333d455e208b6a0f0 BUG: 1057881 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/6795 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6807
* features/gfid-access: populating inode during virtual_lookup_cbk.Ajeet Jha2014-01-272-5/+6
| | | | | | | | | | | | Setting appropriate ia_type and gfid for the inode, obtained during virtual_lookup_cbk of a directory by doing an "inode_link". Change-Id: I9582570c82e70ff5f1d4e441f9e9868ef82e9dc6 BUG: 1054199 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6806 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gfid-access: fix the issue of entry creation with wrong gfidAmar Tumballi2014-01-271-6/+39
| | | | | | | | | | | | | | | | | | * dict_set was happening with a string instead of 'uuid_t' causing entry creations to happen with wrong gfid * revalidate was causing excessive ESTALE logs as lookup was happening on '.gfid/' path itself causing server to become clueless Change-Id: I3b76ce7fdec9c2ff785be21f506f859f489f80f0 BUG: 1054199 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/6520 Reviewed-by: Anand Avati <avati@redhat.com> Tested-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6805 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gfid-access: do chown() after creating the new entries.Amar Tumballi2014-01-272-8/+73
| | | | | | | | | | | | | | changing the 'frame->root->uid' on the fly is not a good idea as posix-acl xlator on brick process would fail the op. Change-Id: I996b43e4ce6efb04f52949976339dad6eb89bede Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 847839 Reviewed-on: http://review.gluster.org/5833 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6801 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* locks: set @lock->frame = NULL when lock is grantedAnand Avati2014-01-222-3/+5
| | | | | | | | | | | | This way disconnect cleanup code can differentiate which locks are granted vs blocked. Change-Id: I2a835c6865b6c804231d852953ea84eeccef35a3 BUG: 849630 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6762 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* locks: various fixesAnand Avati2014-01-147-626/+383
| | | | | | | | | | | | | | | | - implement ref/unref of entry locks (and fix bad pointer deref crashes) - code cleanup and deleted various data types - fix improper read/write lock conflict detection in entrylk - fix indefinite hang of blocked locks on disconnect - register locks in client_t synchronously, fix crashes in disconnect path Change-Id: Id273690c9111b8052139d1847060d1fb5a711924 BUG: 849630 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6695 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Use linkat() instead of link() for portability sakeEmmanuel Dreyfus2014-01-021-1/+2
| | | | | | | | | | | | | | | | | | This is a backport of Iccd27ac076b7a74e40dcbaa1c4762fd3ad59da5f POSIX does not says wether link(2) on symlink should link on symlink itself or on target. Linux use symlink, most other systems use target. Using linkat(2) allows the behavior to be specified, so that the behavior is portable. Also fix configure test for NetBSD linkat(2), which ceased to work. BUG: 764655 Change-Id: Ifcabda5e81b15cd80982bcfc05afda4c9e5370ef Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6612 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Use linkat() instead of link() for portability sakeEmmanuel Dreyfus2014-01-021-0/+20
| | | | | | | | | | | | | | | | | | This is a backport of Iccd27ac076b7a74e40dcbaa1c4762fd3ad59da5f POSIX does not says wether link(2) on symlink should link on symlink itself or on target. Linux use symlink, most other systems use target. Using linkat(2) allows the behavior to be specified, so that the behavior is portable. Also fix configure test for NetBSD linkata(2), which ceased to work. BUG: 764655 Change-Id: I7cf9e62ea19c7eb356935c11b480cf637c83126b Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6594 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli, glusterd: More quota fixes ...Krutika Dhananjay2013-12-031-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... which may be grouped under the following categories: 1. Fix incorrect cli exit status for 'quota list' cmd 2. Print appropriate error message on quota parse errors in cli Authored by: Anuradha Talur <atalur@redhat.com> 3. glusterd: Improve quota validation during stage-op 4. Fix peer probe issues resulting from quota conf checksum mismatches 5. Enhancements to CLI output in the event of quota command failures Authored by: Kaushal Madappa <kmadappa@redhat.com> 7. Move aux mount location from /tmp to /var/run/gluster Authored by: Krishnan Parthasarathi <kparthas@redhat.com> 8. Fix performance issues in quota limit-usage Authored by: Krutika Dhananjay <kdhananj@redhat.com> Note: Some functions that were used in earlier version of quota, that aren't called anymore have been removed. Change-Id: I963d4145f3ecdfe30c61bfa8920baccb33d2d4bd BUG: 969461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6386 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: instruct marker whenever it shouldn't do accountingRaghavendra G2013-11-261-40/+2
| | | | | | | | | | | | | | | | | | This is needed for two reasons: * since dht-linkfiles are internal, they shouldn't be accounted. * hardlink handling in marker is broken. link/unlink of hardlinks present in same directory can break marker accounting. Hence, if src and dst are in same directory in case of rename, dht - if it breaks rename into link/unlink operations - should instruct marker to not to do accounting. Change-Id: I9c9f7384569f75a2792f6450ee7a5279bf751ae7 BUG: 1022995 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6203 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/marker-quota: exclude dht-linkfiles from being accounted.Raghavendra G2013-11-261-2/+2
| | | | | | | | | Change-Id: I3239f5e8477664dcc04434e4d455ae447493a7ac BUG: 1022995 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6153 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/quota: make writes short when the entire write cannot fit into ↵Raghavendra G2013-11-262-10/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | available space. This patch aims to prevent creation of infinite zero byte sized files due to amount of storage available before exceeding quota limit being less than write sizes. Imagine x bytes of storage is available before we exceed quota limit and quota enforcer is receiving writes of size y and (y > x). In this scenario, if we run a shell script like: # for i in $(seq 1 10); do dd if=/dev/zero of=$i bs=y count=1; done Then, we would end up with 10 zero byte sized files, because we allow only complete writes and all writes will fail because of lack of space. However, creates succeed since a create itself will consume zero bytes. In this pattern of creates and writes, size of volume would never grow and x bytes of space will always be available and we can end up with an infinite number of zero byte sized files. Change-Id: Ice148d6a2207883e41759f7b0be73abaa3198b41 BUG: 1012216 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6035 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/quota: Improvements to quotaRaghavendra G2013-11-2610-984/+2947
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Two stages of quota enforcement is done: Soft and hard quota Upon reaching soft quota limit on the directory it logs/alerts in the quota daemon log (ie DEFAULT_LOG_DIR/quotad.log) and no more writes allowed after hard quota limit. After reaching the soft-limit the daemon alerts the user/admin repeatively for every 'alert-time', which is configurable. * Quota enforcer is moved to server-side. It takes care of enforcing quota. Since enforcer doesn't have the cluster view, it relies on another service called quota-aggregator. Aggregator, on query can return the size of a directory based on the cluster view. Enforcer is always loaded in the server graph and is by passed if the feature is not enabled. Options specific to enforcer: server-quota - Specifies whether the feature is on/off. It is used to by pass the quota if turned off. deem-statfs - If set to on, it takes quota limits into consideration while estimating fs size. (df command). The algorithm followed is, i. Adjust statvfs based on limit configured on root. ii. If limit is set on the inode passed, use size/limits on that inode to populate statvfs. Otherwise, use size/limits configured on root. iii. Upon statvfs, update the ctx->size on the inode. iv. Don't let DHT aggregate, instead take the maximum of the usages from the subvols of the DHT, since each of it contains the complete information. Enforcer also makes use of gfid-to-path conversion functionality to work correctly when a client like nfs predominently relies on nameless lookups. * Quota Aggregator acts as a thin client to provide cluster view Its a lightweight *gluster client* process with no mount point, started upon enabling quota or restarting the volume. This is a single process run on each brick, which can answer queries on all volumes in the cluster. Its volfile stored in GLUSTERD_DEFAULT_WORKING_DIR/quotad/quotad.vol. Credits: Raghavendra Bhat <rabhat@redhat.com> Varun Shastry <vshastry@redhat.com> Shishir Gowda <sgowda@redhat.com> Kruthika Dhananjay <kdhananj@redhat.com> Brian Foster <bfoster@redhat.com> Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: Id1cb25b414951da34c665a55f77385d482e0f9de BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5952 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/marker: quota friendly changesRaghavendra G2013-11-266-90/+387
| | | | | | | | | | | | | | | | | | | | | | | | | * handles renames on dht linkfiles correctly * nameless lookup friendly changes. uses gfid-to-path conversion functionality from storage/posix to build ancestry till root. * log message cleanup. * build inode contexts in readdirp * Accounting still not correct with hardlinks. Credits: ======== Vijay Bellur <vbellur@redhat.com> Raghavendra Bhat <rabhat@redhat.com> Change-Id: I415b6fbbc9691f5a38d9fd3c5d083a61e578bb81 BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/5953 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* NetBSD missing backtrace(3) portability fixEmmanuel Dreyfus2013-11-191-0/+2
| | | | | | | | | | | | Implement backtrace(3) and backtrace_symbols(3) which do not exist in NetBSD While there, remove duplicate #include <stdio.h> BUG: 764655 Change-Id: Iccd695765906e085c3f8fcb670506d4fea68fa39 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/6285 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_tBharata B Rao2013-11-141-1/+1
| | | | | | | | | | | | | | glfs_zerofill() can be potentially called to zero-out entire file and hence allow for bigger value of length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/compress: Compression/DeCompression translatorPrashanth Pai2013-11-117-1/+1039
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * When a writev call occurs, the client compresses the data before sending it to server. On the server, compressed data is decompressed. Similarly, when a readv call occurs, the server compresses the data before sending it to client. On the client, the compressed data is decompressed. Thus the amount of data sent over the wire is minimized. * Compression/Decompression is done using Zlib library. * During normal operation, this is the format of data sent over wire : <compressed-data> + trailer(8) The trailer contains the CRC32 checksum and length of original uncompressed data. This is used for validation. HOW TO USE ---------- Turning on compression xlator: gluster volume set <vol_name> compress on Configurable options: gluster volume set <vol_name> compress.compression-level 8 gluster volume set <vol_name> compress.min-size 50 Change-Id: Ib7a66b6f1f70fe002b7c513588cdf75c69370805 BUG: 923540 Original-author : Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Prashanth Pai <nullpai@gmail.com> Signed-off-by: Prashanth Pai <ppai@redhat.com> Reviewed-on: http://review.gluster.org/3251 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/qemu-block: simplify coroutine model to use single synctask, ucontextBrian Foster2013-11-107-248/+72
| | | | | | | | | | | | | | | | | | | | | | | | The current coroutine model, mapping synctasks 1-1 with qemu internal Coroutines, has some unresolved raciness issues. This problem usually manifests as lifecycle mismatches between top-level (gluster created) synctasks and the subsequently created internal coroutines from that context. Qemu's internal queueing (and locking) can cause situations where the top-level synctask is destroyed before the internal scheduler has released references to memory, leading to use after free crashes and asserts. Simplify the coroutine model to use a single synctask as a coroutine processor and rely on the existing native ucontext coroutine implementation. The syncenv thread is donated to qemu and ensures a single top-level coroutine is processed at a time. Qemu now has complete control over coroutine scheduling. BUG: 986775 Change-Id: I38223479a608d80353128e390f243933fc946fd6 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/6110 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/qemu-block: add qemu backing image support (clone)Brian Foster2013-11-104-23/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add basic backing image support to the block-format mechanism. This is a functionality checkpoint that enables the raw mechanism required to support client driven "snapshot" and "clone" requests. This change enhances the block-format setxattr command to support an additional and optional backing image reference. For example: setxattr -n trusted.glusterfs.block-format -v "qcow2:10GB:<bimg>" ./newimage ... where <bimg> refers to the backing image for unallocated blocks in newimage. <bimg> can be provided in one of two formats: - a gfid string in the following format (assuming a valid gfid): <gfid:00000000-0000-0000-0000-000000000000> - or a filename that must be resident in the same directory as the new clone file being formatted. E.g., setxattr -n trusted.glusterfs.block-format -v "qcow2:10GB:baseimg" ./newimage This latter format is more restrictive, simply provided for convenience or until something more refined is available. This change makes no assumptions about the backing image file and affords no additional protection. It is up to the user/client to recognize the relationship between the files and manage them appropriately (i.e., no writes to the backing image, etc.). BUG: 986775 Change-Id: I7aff7bdc59b85a6459001a6bfeae4db6bf74f703 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5967 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterfs: zerofill supportM. Mohan Kumar2013-11-101-0/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero filled VM disk image provisioning or during scrubbing of VM disk images). Client/application can issue this FOP for zeroing out. Gluster server will zero out required range of bytes ie server offloaded zeroing. In the absence of this fop, client/application has to repetitively issue write (zero) fop to the server, which is very inefficient method because of the overheads involved in RPC calls and acknowledgements. WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks and this write is handled completely within the storage and hence is known as offload . Linux ,now has support for SCSI WRITESAME command which is exposed to the user in the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device , making it highly efficient. The fop takes two arguments offset and size. It zeroes out 'size' number of bytes in an opened file starting from 'offset' position. This patch adds zerofill support to the following areas: - libglusterfs - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi Client applications can exloit this fop by using glfs_zerofill introduced in libgfapi.FUSE support to this fop has not been added as there is no system call for this fop. Changes from previous version 3: * Removed redundant memory failure log messages Changes from previous version 2: * Rebased and fixed build error Changes from previous version 1: * Rebased for latest master TODO : * Add zerofill support to trace xlator * Expose zerofill capability as part of gluster volume info Here is a performance comparison of server offloaded zeofill vs zeroing out using repeated writes. [root@llmvm02 remote]# time ./offloaded aakash-test log 20 real 3m34.155s user 0m0.018s sys 0m0.040s [root@llmvm02 remote]# time ./manually aakash-test log 20 real 4m23.043s user 0m2.197s sys 0m14.457s [root@llmvm02 remote]# time ./offloaded aakash-test log 25; real 4m28.363s user 0m0.021s sys 0m0.025s [root@llmvm02 remote]# time ./manually aakash-test log 25 real 5m34.278s user 0m2.957s sys 0m18.808s The argument log is a file which we want to set for logging purpose and the third argument is size in GB . As we can see there is a performance improvement of around 20% with this fop. Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18 BUG: 1028673 Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5327 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* client_t: phase 2, refactor server_ctx and locks_ctx outKaleb S. KEITHLEY2013-10-317-117/+423
| | | | | | | | | | | | | | remove server_ctx and locks_ctx from client_ctx directly and store as into discrete entities in the scratch_ctx hooking up dump will be in phase 3 BUG: 849630 Change-Id: I94cea328326db236cdfdf306cb381e4d58f58d4c Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/5678 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/gfid-access: Handle inode remap when parent inode is NULLPranith Kumar K2013-10-311-7/+2
| | | | | | | | | | Change-Id: Ic3f9d30d75df0bbbdf8fe28446fabe62d90fa854 BUG: 1024666 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6194 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Add monotonic clocking counter for timer threadHarshavardhana2013-10-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gettimeofday() returns the current wall clock time and timezone. Using these functions in order to measure the passage of time (how long an operation took) therefore seems like a no-brainer. This time suffer's from some limitations: a. They have a low resolution: “High-performance” timing by definition, requires clock resolutions into the microseconds or better. b. They can jump forwards and backwards in time: Computer clocks all tick at slightly different rates, which causes the time to drift. Most systems have NTP enabled which periodically adjusts the system clock to keep them in sync with “actual” time. The adjustment can cause the clock to suddenly jump forward (artificially inflating your timing numbers) or jump backwards (causing your timing calculations to go negative or hugely positive). In such cases timer thread could go into an infinite loop. From 'man gettimeofday': ---------- .. .. The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2). .. .. ---------- Rationale: For calculating interval timing for Timer thread, all that’s needed should be clock as a simple counter that increments at a stable rate. This is necessary to avoid the jumps which are caused by using "wall time", this counter must be monotonic that can never “tick” backwards, ever. Change-Id: I701d31e71a85a73d21a6c5cd15583e7a5a645eeb BUG: 1017993 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/6070 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: [Feature] Command implementation to get heal-countVenkatesh Somyajulu2013-10-142-13/+316
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently to know the number of files to be healed, either user has to go to backend and check the number of entries present in indices/xattrop directory. But if a volume consists of large number of bricks, going to each backend and counting the number of entries is a time-taking task. Otherwise user can give gluster volume heal vol-name info command but with this approach if no. of entries are very hugh in the indices/ xattrop directory, it will comsume time. So as a feature, new command is implemented. Command 1: gluster volume heal vn statistics heal-count This command will get the number of entries present in every brick of a volume. The output displays only entries count. Command 2: gluster volume heal vn statistics heal-count replica 192.168.122.1:/home/user/brickname Here if we are concerned with just one replica. So providing any one of the brick of a replica will get the number of entries to be healed for that replica only. Example: Replicate volume with replica count 2. Backend status: -------------- [root@dhcp-0-17 xattrop]# ls -lia | wc -l 1918 NOTE: Out of 1918, 2 entries are <xattrop-gfid> dummy entries so actual no. of entries to be healed are 1916. [root@dhcp-0-17 xattrop]# pwd /home/user/2ty/.glusterfs/indices/xattrop Command output: -------------- Gathering count of entries to be healed on volume volume3 has been successful Brick 192.168.122.1:/home/user/22iu Status: Brick is Not connected Entries count is not available Brick 192.168.122.1:/home/user/2ty Number of entries: 1916 Change-Id: I72452f3de50502dc898076ec74d434d9e77fd290 BUG: 1015990 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/6044 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/qemu-block: invoke bdrv_init() only onceBrian Foster2013-10-091-1/+5
| | | | | | | | | | | | | | | | bdrv_init() is intended to be invoked only once. If invoked again after initialization (i.e., due to graph changes), the block driver registration code can reinsert entries into bdrv_drivers, effectively corrupting the NULL terminated linked list. This manifests as an infinite loop for callers attempt to traverse the list (to locate a driver). BUG: 986775 Change-Id: I2f95bda3c753dece4648cdad009d0e2a2d16f644 Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/6021 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/changelog : Improvement in changelog "encoding-change".ajha2013-09-294-17/+26
| | | | | | | | | | | | | | | | | change in encoding method of changelog was critical section for "fop dispatch thread", "roll-over thread" and "reconfigure dispatch thread". In this patch the "encoding-method" is changed by the reconfigure dispatch thread lazily during handle_change, which solves the concurrency among the racing threads. BUG: 1002940 Change-Id: I78c3e8887efa46d0fcc60755cdf4243031cfa3eb Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/5844 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
* core: block unused signals in created threadsAnand Avati2013-09-253-9/+9
| | | | | | | | | | | | | | | Block all signal except those which are set for explicit handling in glusterfs_signals_setup(). Since thread spawning code in libglusterfs and xlators can get called from application threads when used through libgfapi, it is necessary to do this blocking. Change-Id: Ia320f80521a83d2edcda50b9ad414583a0175281 BUG: 1011662 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5995 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* qemu-block: support readdirpBrian Foster2013-09-171-0/+58
| | | | | | | | | | | | | | Support the readdirp fop in qemu-block to ensure that image files are handled correctly when readdirp is enabled. E.g., without readdirp support, incorrect stat data for formatted files can be reported back (and cached) via the client. BUG: 986775 Change-Id: Ibc4bd0b42548953ebe30832f3d853bb68095f0ac Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5946 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* qemu-block: fix building from distribution tarball when glib2-devel is installedNiels de Vos2013-09-121-0/+8
| | | | | | | | | | | | | | | | | | | | | | Building RPMs from a 'make dist' tarball fails when qemu-block is enabled. Enabling is done automatically when the glib2 development files are available (enabled by ./configure). Manual building with: $ ./autogen.sh && ./configure && make dist && rpmbuild -ta *.tar.gz Building in mock works fine, glib2-devel is not installed by default so the qemu-block xlator gets disabled. This change also adds glib2-devel to the BuildRequires in the glusterfs.spec file, causing the qemu-block xlator to be built by default, and included in the glusterfs RPM. Change-Id: Ibb73628772586d9e07bbfde7a8ff2fc973489086 BUG: 986775 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/5896 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfid-access: Error logs for ga_newfile_parse_argsAvra Sengupta2013-09-051-11/+50
| | | | | | | | | | Change-Id: I7aab98a70793bee936272f0b795f4c22c3733dd2 BUG: 1001055 Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5705 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* features/marker: force xtime updates (configurable) for client-pid = -1Venky Shankar2013-09-042-6/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is required by Geo-Replication that does auxillary mount with client-pid as -1 (which has special treatment at specific places in GlusterFS), to trigger xtime updates on the intermediate master in a cascading setup. Marker too had a check to "not" mark updates for geo-replication's auxillary mounts. With the new geo-replication design, xtimes are not set by the master on the slave for all entities. Due to this cascading setups were broken. This patch introduces "geo-replication.ignore-pid-check" option as a "override" for the client-pid check for gsyncd's client-pid. When this options is enabled, marker start "marking" even if the updates are from the special client. Geo-Replication on the detection of itself being an intermediate master, enables this option. Change-Id: I9f7140edd12fef5480595ee0f93f35b94cdb8345 BUG: 996371 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5591 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gsyncd / geo-rep: Introduce basic crawl instrumentationVenky Shankar2013-09-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch extends the persistent instrumentation work done by Aravinda (@avishwa), by introducing a handfull of instrumentation variables for crawl. These variables are "pulled up" by glusterd in the event of a geo-replication status cli command and looks something like below: "Uptime=00:21:10;FilesSyned=2982;FilesPending=0;BytesPending=0;DeletesPending=0;" "FilesPending", "BytesPending" and "DeletesPending" are short-lived variables that are non-zero when a changelog is being processes (ie. when an active sync in ongoing). After a successfull changelog process "FilesPending" is summed up into "FilesSynced". The three short-lived variabled are then reset to zero and the data is persisted Additionally this patch also reverts some of the changes made for BZ #986929 (those were not needed). Change-Id: I948f1a0884ca71bc5e5bcfdc017d16c8c54fc30b BUG: 990420 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/5441 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* features/qemu-block: support for QCOW2 and QED formatsAnand Avati2013-09-0314-1/+2757
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for internals snapshots using QCOW2 and general framework for external snapshots (next patch) with QCOW2 and QED. For internal snapshots, the file must be "initialized" or "formatted" into QCOW2 format, and specify a file size. Snapshots can be created, deleted, and applied ("goto"). e.g: // Format and Initialize sh# setfattr -n trusted.glusterfs.block-format -v qcow2:10GB /mnt/imgfile sh# ls -l /mnt/imgfile -rw-r--r-- 1 root root 10G Jul 18 21:20 imgfile // Create a snapshot sh# setfattr -n trusted.glusterfs.block-snapshot-create -v name1 imgfile // Apply a snapshot sh# setfattr -n trusted.gluterfs.block-snapshot-goto -v name1 imgfile Change-Id: If993e057a9455967ba3fa9dcabb7f74b8b2cf4c3 BUG: 986775 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5367 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* features/locks : Improves debuggability of inode/entry locks.Anuradha Talur2013-09-036-41/+94
| | | | | | | | | | | | | | | | | Prints, in the statedump, the information about the mount that performed the inode/entry lk. For the entrylks that are granted after a blocked state, the blocked time is not printed. A patch for that will be sent later. Change-Id: Ib0c1ed21fa9328b435f96b590dd343f59814a08d BUG: 915629 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/5712 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: remove GLUSTERFS_CREATE_MODE_KEY usageAmar Tumballi2013-08-231-12/+5
| | | | | | | | | Change-Id: I23b8cb7223b91a55af1cd4214f61bbe0e87351f6 BUG: 952029 Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/5683 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>