summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
* cluster/afr: remove unnecessary child_up initializationXavier Hernandez2018-02-031-7/+0
| | | | | | | | | | | | The child_up array was initialized with all elements being -1 to allow afr_notify() to differentiate down bricks from bricks that haven't reported yet. With current implementation this is not needed anymore and it was causing unexpected results when other parts of the code considered that if child_up[i] != 0, it meant that it was up. Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472 BUG: 1541038 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/dht: Fixed a leak in inode_refN Balachandran2018-02-021-3/+2
| | | | | | | | Introduced by commit d9f773ba719397c128 Change-Id: I3f3103a5a80daed7562ace72e5aa53b77e74fb94 BUG: 1541264 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: avoid overwriting client writes during migrationSusant Palai2018-02-029-12/+213
| | | | | | | | | | | | | | | | | | | | | | | | For more details on this issue see https://github.com/gluster/glusterfs/issues/308 Solution: This is a restrictive solution where a file will not be migrated if a client writes to it during the migration. This does not check if the writes from the rebalance and the client actually do overlap. If dht_writev_cbk finds that the file is being migrated (PHASE1) it will set an xattr on the destination file indicating the file was updated by a non-rebalance client. Rebalance checks if any other client has written to the dst file and aborts the file migration if it finds the xattr. updates gluster/glusterfs#308 Change-Id: I73aec28bc9dbb8da57c7425ec88c6b6af0fbc9dd Signed-off-by: Susant Palai <spalai@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Fixed leak in dht_populate_inode_for_dentryN Balachandran2018-02-022-4/+10
| | | | | | | | | | Fixed an issue in dht_populate_inode_for_dentry where a layout is set in the inode without checking if it is already set. This overwrites the value each time without freeing the already existing layout. Change-Id: I651bf539a0b82b4ddc4c355890c16a8e91f5f1fd BUG: 1541264 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* sdfs: crash fixesAmar Tumballi2018-02-011-8/+14
| | | | | | | | | | | | | * from the patch which got tested in experimental branch, there was a code cleanup involved, which missed setting of a local variable, which led to crash immediately after enabling the feature. * added a sanity test case to validate all the fops of sdfs. Updates: #397 Change-Id: I7e0bebfc195c344620577cb16c1afc5f4e7d2d92 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* afr: don't treat all cases all bricks being blamed as split-brainRavishankar N2018-02-012-9/+48
| | | | | | | | | | | | | | | | | | | | | | | Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in split-brain purely from the afr xattrs point of view i.e each brick is blamed by atleast one of the others. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks while I/O or heal are in progress. Fix: Instead of undoing the post-op, pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain. Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae BUG: 1539358 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* protocol: utilize the version 4 xdrAmar Tumballi2018-02-015-150/+430
| | | | | | | updates #384 Change-Id: Id80bf470988dbecc69779de9eb64088559cb1f6a Signed-off-by: Amar Tumballi <amarts@redhat.com>
* cluster/ec: Do lock conflict check correctly for wait-listPranith Kumar K2018-02-011-8/+15
| | | | | | | | | | | | | | Problem: ec_link_has_lock_conflict() is traversing over only owner_list but the function is also getting called with wait_list. Fix: Modify ec_link_has_lock_conflict() to traverse lists correctly. Updated the callers to reflect the changes. BUG: 1540669 Change-Id: Ibd7ea10f4498e7c2761f9a6faac6d5cb7d750c91 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* protocol/auth: options update for GD2Milind Changire2018-02-012-5/+35
| | | | | | | addr and login options update for GD2 Change-Id: I3bb9a2ad368326036c2e7f6bd48b624bdd053051 Signed-off-by: Milind Changire <mchangir@redhat.com>
* afr: capture the correct errno in post-op quorum checkRavishankar N2018-01-311-8/+8
| | | | | | | | | If the post-op phase of txn did not meet quorm checks, use that errno to unwind the FOP rather than blindly setting ENOTCONN. Change-Id: I0cb0c8771ec75a45f9a25ad4cd8601103deddf0c BUG: 1506140 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* protocol: Implement put fopPoornima G2018-01-316-1/+342
| | | | | | Updates #353 Change-Id: I755b9208690be76935d763688fa414521eba3a40 Signed-off-by: Poornima G <pgurusid@redhat.com>
* glusterd: optimize glusterd import volumes code pathAtin Mukherjee2018-01-311-5/+7
| | | | | | | | | | In case there's a version mismatch detected for one of the volumes glusterd was ending up with updating all the volumes which is a overkill. Change-Id: I6df792db391ce3a1697cfa9260f7dbc3f59aa62d BUG: 1539510 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* quiesce, gfproxy: Implement failover across multiple gfproxy nodesPoornima G2018-01-309-82/+312
| | | | | | Updates: #242 Change-Id: I767e574a26e922760a7130bd209c178d74e8cf69 Signed-off-by: Poornima G <pgurusid@redhat.com>
* core: add some examples of site.h usageJeff Darcy2018-01-303-3/+3
| | | | | Change-Id: I6ce574a593eda8f3a6b2fc8969b5edf7c250b61c Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* protocol: Remove lock recovery logic from client and serverAnoop C S2018-01-2911-924/+36
| | | | | | Change-Id: I27f5e1e34fe3eac96c7dd88e90753fb5d3d14550 BUG: 1272030 Signed-off-by: Anoop C S <anoopcs@redhat.com>
* debug/io-stats: io-stats options for GD2Krutika Dhananjay2018-01-261-0/+54
| | | | | | | Updates #302 Change-Id: I8e7ff391cec88ea76f63ffe05b7404bdb31eaf8e Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* features/shard: shard options for GD2.Krutika Dhananjay2018-01-261-0/+3
| | | | | | | Updates #302 Change-Id: Ife21440ffcf5805ce5858360dc94a456ead891e5 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* glusterd: add profile_enabled flag in get-stateAtin Mukherjee2018-01-254-22/+27
| | | | | | Change-Id: I09f348ed7ae6cd481f8c4d8b4f65f2f2f6aad84e BUG: 1537364 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: process pmap sign in only when port is marked as freeAtin Mukherjee2018-01-251-0/+15
| | | | | | | | | | | | | | | | Because of some crazy race in volume start code path because of friend handshaking with volumes with quorum enabled we might end up into a situation where glusterd would start a brick and get a disconnect and then immediately try to start the same brick instance based on another friend update request. And then if for the very first brick even if the process doesn't come up at the end sign in event gets sent and we end up having two duplicate portmap entries for the same brick. Since in brick start we mark the previous port as free, its better to consider a sign in request as no op if the corresponding port type is marked as free. Change-Id: I995c348c7b6988956d24b06bf3f09ab64280fc32 BUG: 1537362 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/dht: Skip '..' for the volume root dirN Balachandran2018-01-241-0/+5
| | | | | | | | | | | | dht_populate_inode_for_dentry tries to update the layout for the '..' entry when listing the root of the volume. This entry does not correspond to an entry in the volume and therefore does not have a gfid or a layout on disk, causing layout processing to fail. Change-Id: I2b7470e1c5e20d87b5545160697f24d041045140 BUG: 1537457 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* dentry fop serializer: added new server side xlator for dentry fop serializationSakshi Bansal2018-01-248-4/+1528
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problems addressed by this xlator : [1]. To prevent race between parallel mkdir,mkdir and lookup etc. Fops like mkdir/create, lookup, rename, unlink, link that happen on a particular dentry must be serialized to ensure atomicity. Another possible case can be a fresh lookup to find existance of a path whose gfid is not set yet. Further, storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (lookup & mkdir), we eliminate the race altogether. [2]. Staleness of dentries This causes exponential increase in traversal time for any inode in the subtree of the directory pointed by stale dentry. Cause : Stale dentry is created because of following two operations: a. dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create and b. dentry unlinking due to various operations like rmdir, rename, unlink. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially. Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock. Some documentation from email conversation: [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html With this patch, the feature is optional, enable it by running: `gluster volume set $volname features.sdfs enable` Also the feature is tested for a month without issues in the experiemental branch for all the regression. Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b Fixes: #397 BUG: 1304962 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* features/error-gen: update volume options for GD2Atin Mukherjee2018-01-231-1/+11
| | | | | | | Updates #302 Change-Id: I8d04bb4f46b1e53a8d176a75da2382fdbf0cfc6f Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* performance/read-ahead: volume option fixes for GD2Csaba Henk2018-01-221-0/+7
| | | | | | | Updates #302 Change-Id: I842ab746502bc4ff3b312ee19f397a6177edc6f7 Signed-off-by: Csaba Henk <csaba@redhat.com>
* protocol/server: Fix "auth-path" in options tableKotresh HR2018-01-221-0/+5
| | | | | | | | | | | Options set will crash the brick with glusterd2. glusterd used to set "auth-path" during volfile generation. With glusterd2, it should come from options table. Updates: #302 Change-Id: Ie41a17779c185b87ace7e0bce0d0ba594b415a75 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* performance/write-behind: volume option fixes for GD2Csaba Henk2018-01-221-0/+18
| | | | | | | Updates #302 Change-Id: I2ade3dc4aef4da619c5b64058872594f0f1e004f Signed-off-by: Csaba Henk <csaba@redhat.com>
* libgfapi: Add new api for supporting mandatory-locksAnoop C S2018-01-221-1/+1
| | | | | | | | | | | | | | | | The current API for byte-range locks [glfs_posix_lock()] doesn't allow applications to specify whether it is advisory or mandatory type locks. This particular change is to introduce an extended byte-range lock API with an additional argument for including the byte-range lock mode to be one among advisory(default) or mandatory. Patch also includes a gfapi test case which make use of this new api to acquire mandatory locks. Ref: https://github.com/gluster/glusterfs-specs/blob/master/done/GlusterFS%203.8/Mandatory%20Locks.md Change-Id: Ia09042c755d891895d96da857321abc4ce03e20c Updates #393 Signed-off-by: Anoop C S <anoopcs@redhat.com>
* cluster/ec : EC options for GD2Sunil Kumar Acharya2018-01-221-1/+34
| | | | | | | Updates #302 Change-Id: I31b4648f7b1a394fceece5cba8120c579c66edd9 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* md-cache: Implement dynamic configuration of xattr list for cachingPoornima G2018-01-222-276/+166
| | | | | | | | | | | | | | | Currently, the list of xattrs that md-cache can cache is hard coded in the md-cache.c file, this necessiates code change and rebuild everytime a new xattr needs to be added to md-cache xattr cache list. With this patch, the user will be able to configure a comma seperated list of xattrs to be cached by md-cache Updates #297 Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e Signed-off-by: Poornima G <pgurusid@redhat.com>
* protocol: make on-wire-change of protocol using new XDR definition.Amar Tumballi2018-01-1915-641/+17627
| | | | | | | | | | | | | | | | | | | | | | With this patchset, some major things are changed in XDR, mainly: * Naming: Instead of gfs3/gfs4 settle for gfx_ for xdr structures * add iattx as a separate structure, and add conversion methods * the *_rsp structure is now changed, and is also reduced in number (ie, no need for different strucutes if it is similar to other response). * use proper XDR methods for sending dict on wire. Also, with the change of xdr structure, there are changes needed outside of xlator protocol layer to handle these properly. Mainly because the abstraction was broken to support 0-copy RDMA with payload for write and read FOP. This made transport layer know about the xdr payload, hence with the change of xdr payload structure, transport layer needed to know about the change. Updates #384 Change-Id: I1448fbe9deab0a1b06cb8351f2f37488cefe461f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* afr: add quorum checks in post-opRavishankar N2018-01-191-0/+29
| | | | | | | | | | afr relies on pending changelog xattrs to identify source and sinks and the setting of these xattrs happen in post-op. So if post-op fails, we need to unwind the write txn with a failure. Change-Id: I0f019ac03890108324ee7672883d774918b20be1 BUG: 1506140 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* readdir-ahead: Cleanup the xattr request codePoornima G2018-01-191-40/+2
| | | | | | | Updates #297 Change-Id: Ia0c697583751290a455da3cd1894e0c5685d1bd8 Signed-off-by: Poornima G <pgurusid@redhat.com>
* upcall: Allow md-cache to specify invalidations on xattr with wildcardPoornima G2018-01-191-4/+21
| | | | | | | | | | | | | Currently, md-cache sends a list of xattrs, it is inttrested in recieving invalidations for. But, it cannot specify any wildcard in the xattr names Eg: user.* - invalidate on updating any xattr with user. prefix. This patch, enable upcall to honor wildcard in the xattr key names Updates: #297 Change-Id: I98caf0ed72f11ef10770bf2067d4428880e0a03a Signed-off-by: Poornima G <pgurusid@redhat.com>
* features/locks: volume option fixes for GD2Pranith Kumar K2018-01-191-0/+18
| | | | | | Updates #302 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Change-Id: I1667fc32b7a31562b289fcab878d94be202407e4
* posix: In getxattr, honor the wildcard '*'Poornima G2018-01-192-23/+43
| | | | | | | | | | | | | | | | | Currently, the posix_xattr_fill performas a sys_getxattr on all the keys requested, there are requirements where the keys could contain a wildcard, in which case sys_getxattr would return ENODATA, eg: if the xattr requested is user.* all the xattrs with prefix user. should be returned, with their values. This patch, changes posix_xattr_fill, to honor wildcard in the keys requested. Updates #297 Change-Id: I3d52da2957ac386fca3c156e26ff4cdf0b2c79a9 Signed-off-by: Poornima G <pgurusid@redhat.com>
* cluster/afr: Adding option to take full file lockkarthik-us2018-01-194-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In replica 3 volumes there is a possibilities of ending up in split brain scenario, when multiple clients writing data on the same file at non overlapping regions in parallel. Scenario: - Initially all the copies are good and all the clients gets the value of data readables as all good. - Client C0 performs write W1 which fails on brick B0 and succeeds on other two bricks. - C1 performs write W2 which fails on B1 and succeeds on other two bricks. - C2 performs write W3 which fails on B2 and succeeds on other two bricks. - All the 3 writes above happen in parallel and fall on different ranges so afr takes granular locks and all the writes are performed in parallel. Since each client had data-readables as good, it does not see file going into split-brain in the in_flight_split_brain check, hence performs the post-op marking the pending xattrs. Now all the bricks are being blamed by each other, ending up in split-brain. Fix: Have an option to take either full lock or range lock on files while doing data transactions, to prevent the possibility of ending up in split brains. With this change, by default the files will take full lock while doing IO. If you want to make use of the old range lock change the value of "cluster.full-lock" to "no". Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962 BUG: 1535438 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* selinux-xlator : validate dict before calling dict_rename_key()Jiffin Tony Thottan2018-01-181-4/+4
| | | | | | Change-Id: I71da3b64e5e8c82e8842e119b2b05da3e2ace550 BUG: 1535772 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* md-cache: Serve nameless lookup from cachePoornima G2018-01-181-10/+8
| | | | | | Updates #232 Change-Id: I97e92312a53a50c2d1660bf8d657201fc05a76eb Signed-off-by: Poornima G <pgurusid@redhat.com>
* fix features/leases: volume option fixes for GD2Atin Mukherjee2018-01-181-0/+4
| | | | | | | Updates #302 Change-Id: I31091765b5bfd118fc5370b3515128f0e84e360f Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* readdir-ahead: Add parallel-readdir option in readdir-aheadPoornima G2018-01-182-0/+16
| | | | | | | | | | parallel-readdir option is defined as belonging to readdir-ahead in glusterd-volume-set.c, but was not defined in options of readdir-ahead, fixing the same. Change-Id: I97cc88b38ab99ade5f066519ca1cb9bfed03a7da BUG: 1506197 Signed-off-by: Poornima G <pgurusid@redhat.com>
* performance/io-threads: Fix checked_return coverity errorVarsha Rao2018-01-181-5/+18
| | | | | | | | | Change the return type of set_stack_size function to integer and also check the return value of pthread_attr_init(). This fixes the Checked_Return coverity issue. Change-Id: I270b8bd168b09f0b071437c117e4e23b01398534 Signed-off-by: Varsha Rao <varao@redhat.com>
* features/bit-rot: Fix default value for scrub-throttleKotresh HR2018-01-181-0/+1
| | | | | | Updates #302 Change-Id: Ifc23d5f8b5bc78b95f7c9d92d6df37a9168102f7 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* rpc/*: auth-header changesAmar Tumballi2018-01-173-11/+20
| | | | | | | | | | | | | | | | | Introduce another authentication header which can now send more data. This is useful because this data can be common for all the fops, and we don't need to change all the signatures. As part of this, made rpc-clnt.c little more modular to support multiple authentication structures. stack.h changes are placeholder for the ctime etc, can be moved later based on need. updates #384 Change-Id: I6111c13cfd2ec92e2b4e9295896bf62a8a33b2c7 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* features/bit-rot: fix volume options for GD2Atin Mukherjee2018-01-171-1/+11
| | | | | | | Updates #302 Change-Id: I8809f269b93253bce049fdbf28a7f44e85a2b9e7 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* fuse: write out reverse notification to fuse dumpCsaba Henk2018-01-171-30/+59
| | | | | | BUG: 1534602 Change-Id: Ide42cf9cffe462d0cc46272b327c2a05999f09ba Signed-off-by: Csaba Henk <csaba@redhat.com>
* core: fix some of the dict_{get,set} with proper APIsAmar Tumballi2018-01-1715-48/+33
| | | | | | | updates #220 Change-Id: I6e25dbb69b2c7021e00073e8f025d212db7de0be Signed-off-by: Amar Tumballi <amarts@redhat.com>
* locks: added inodelk/entrylk contention upcall notificationsXavier Hernandez2018-01-1615-260/+937
| | | | | | | | | | | | | | The locks xlator now is able to send a contention notification to the current owner of the lock. This is only a notification that can be used to improve performance of some client side operations that might benefit from extended duration of lock ownership. Nothing is done if the lock owner decides to ignore the message and to not release the lock. For forced release of acquired resources, leases must be used. Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
* posix: delete stale gfid handles in nameless lookupRavishankar N2018-01-161-1/+16
| | | | | | | | | ..in order for self-heal of symlinks to work properly (see BZ for details). Change-Id: I9a011d00b07a690446f7fd3589e96f840e8b7501 BUG: 1529488 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* storage/posix: Set f_bfree to 0 if brick fullN Balachandran2018-01-151-1/+12
| | | | | | | | | Return 0 free blocks if the brick is full or has less than the reserved limit. Change-Id: I2c5feda0303d0f4abe5af22fac903011792b2dc8 BUG: 1533736 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* performance/readdir-ahead: fix cache usage update issueZhang Huan2018-01-152-6/+9
| | | | | | | | | Use atomic operation to modify cache-size to protect it from concurrent modification. Change-Id: Ie73cdd4abbaf0232b1db4ac856c01d24603890ad BUG: 1533804 Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
* cluster/afr: Fixing the flaws in arbiter becoming source patchkarthik-us2018-01-137-180/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Setting the write_subvol value to read_subvol in case of metadata transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03) might lead to the original problem of arbiter becoming source. Scenario: 1) All bricks are up and good 2) 2 writes w1 and w2 are in progress in parallel 3) ctx->read_subvol is good for all the subvolumes 4) w1 succeeds on brick0 and fails on brick1, yet to do post-op on the disk 5) read/lookup comes on the same file and refreshes read_subvols back to all good 6) metadata transaction happens which makes ctx->write_subvol to be assigned with ctx->read_subvol which is all good 7) w2 succeeds on brick1 and fails on brick0 and this will update the brick in reverse order leading to arbiter becoming source Fix: Instead of setting the ctx->write_subvol to ctx->read_subvol in the pre-op statge, if there is a metadata transaction, check in the function __afr_set_in_flight_sb_status() if it is a data/metadata transaction. Use the value of ctx->write_subvol if it is a data transactions and ctx->read_subvol value for other transactions. With this patch we assign the value of ctx->write_subvol in the afr_transaction_perform_fop() with the on disk value, instead of assigning it in the afr_changelog_pre_op() with the in memory value. Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4 BUG: 1482064 Signed-off-by: karthik-us <ksubrahm@redhat.com>