Commit message | Author | Age | Files | Lines
* afr: capture the correct errno in post-op quorum check [release-3.13] | Ravishankar N | 2018-02-06 | 1 | -8/+8
    If the post-op phase of the txn did not meet quorum checks, use that errno to unwind the FOP rather than blindly setting ENOTCONN.

    Change-Id: I0cb0c8771ec75a45f9a25ad4cd8601103deddf0c
    BUG: 1536346
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    (cherry picked from commit 440a048f24b006c80af3d7bcd0a1f13fe3459d87)
* afr: don't treat all cases of all bricks being blamed as split-brain | Ravishankar N | 2018-02-06 | 5 | -9/+165
    Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in split-brain purely from the afr xattrs point of view, i.e. each brick is blamed by at least one of the others. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks while I/O or heal are in progress.

    Fix: Instead of undoing the post-op, pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain.

    Change-Id: I3d0224b883eb0945785ade0e9697a1c828aec0ae
    BUG: 1541458
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    (cherry picked from commit 0e6e8216823c2d9dafb81aae0f6ee3497c23d140)
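The source-selection rule in the fix can be illustrated with a small model. This is only a sketch in Python, not the afr implementation: the `blames` matrix and the `pick_sources` name are hypothetical, and treating an empty result as a remaining split-brain is an assumption of the sketch.

```python
from typing import List


def pick_sources(blames: List[List[bool]]) -> List[int]:
    """Return the indices of bricks usable as heal sources.

    blames[i][j] is True when the pending xattrs on brick i blame brick j
    (hypothetical in-memory model of the on-disk changelog xattrs).
    """
    n = len(blames)
    sources = []
    for j in range(n):
        # Count how many *other* bricks blame brick j.
        accusers = sum(1 for i in range(n) if i != j and blames[i][j])
        # Rule from the commit message: a brick blamed by 2 bricks is a sink.
        if accusers < 2:
            sources.append(j)
    # Assumption of this sketch: an empty list means no source could be
    # picked, i.e. the file is still treated as split-brain.
    return sources
```

For a replica-3 file where each brick is blamed by exactly one other brick there is no majority, so the sketch returns all three bricks as sources.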
* cluster/afr: remove unnecessary child_up initialization | Xavier Hernandez | 2018-02-05 | 1 | -7/+0
    The child_up array was initialized with all elements being -1 to allow afr_notify() to differentiate down bricks from bricks that haven't reported yet. With the current implementation this is no longer needed, and it was causing unexpected results when other parts of the code assumed that child_up[i] != 0 meant the brick was up.

    Backport of:
    > BUG: 1541038

    Change-Id: I2a9d712ee64c512f24bd5cd3a48dcb37e3139472
    BUG: 1541929
    Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: Do lock conflict check correctly for wait-list | Pranith Kumar K | 2018-02-02 | 1 | -8/+15
    Problem: ec_link_has_lock_conflict() is traversing over only owner_list but the function is also getting called with wait_list.

    Fix: Modify ec_link_has_lock_conflict() to traverse lists correctly. Updated the callers to reflect the changes.

    BUG: 1540896
    Change-Id: Ibd7ea10f4498e7c2761f9a6faac6d5cb7d750c91
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr: add quorum checks in post-op | Ravishankar N | 2018-02-01 | 2 | -1/+30
    afr relies on pending changelog xattrs to identify source and sinks, and the setting of these xattrs happens in post-op. So if post-op fails, we need to unwind the write txn with a failure.

    Change-Id: I0f019ac03890108324ee7672883d774918b20be1
    BUG: 1536346
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    (cherry picked from commit a40a87ec3b226ae86a6ed8f4af25b45965a20cad)
* build: glibc has removed rpc headers and rpcgen in Fedora28, use libtirpc | Kaleb S. KEITHLEY | 2018-01-25 | 5 | -52/+108
    Other Linux distributions are doing the same; some have already done so.

    Switch to libtirpc(-devel) and unbundled rpcgen packages. For now rpcgen is still provided by the glibc-rpcgen RPM, but rpcsvc-proto's rpcgen subpackage is available now; it will not be used until glibc-rpcgen is retired. (Note: rpcsvc-proto's rpcgen is just named rpcgen-...rpm, i.e. not rpcsvc-proto-rpcgen-...rpm.) Either one will satisfy the BuildRequires: rpcgen.

    Also, when a .spec file has
        BuildRequires: foo-devel
    it is not necessary to also have
        BuildRequires: foo
    or even
        BuildRequires: foo foo-devel
    The foo-devel package has a dependency on foo, which will install foo automatically.

    It's usually also not necessary to have a corresponding Requires: foo, as the rpmbuild process will also automatically determine the install-time dependencies.

    See also Change-Id: I4a8292de2eddad16137df5998334133fc1e11261 and/or https://review.gluster.org/19311 and Change-Id: I97dc39c7844f44c36fe210aa813480c219e1e415 and/or https://review.gluster.org/#/c/19330/

    Change-Id: I86f847dfda0fef83e22c6e8b761342d652a2d9ba
    BUG: 1536187
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* doc: Added release notes for 3.13.2 [v3.13.2] | ShyamsundarR | 2018-01-19 | 2 | -1/+34
    Change-Id: I80f411f3820f82cb27fd5f8cf1cf99d5565d8b9d
    BUG: 1530334
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* selinux-xlator : validate dict before calling dict_rename_key() | Jiffin Tony Thottan | 2018-01-19 | 1 | -4/+4
    Upstream reference:
    > Change-Id: I71da3b64e5e8c82e8842e119b2b05da3e2ace550
    > BUG: 1535772
    > Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
    > (cherry picked from commit bee06ccd7b80e3f5804f0c7c7c56936fed6d2b4e)

    Change-Id: I71da3b64e5e8c82e8842e119b2b05da3e2ace550
    BUG: 1536294
* cluster/afr: Adding option to take full file lock | karthik-us | 2018-01-19 | 5 | -4/+25
    Problem: In replica 3 volumes there is a possibility of ending up in a split-brain scenario when multiple clients write data to the same file at non-overlapping regions in parallel.

    Scenario:
    - Initially all the copies are good and all the clients get the value of data readables as all good.
    - Client C0 performs write W1 which fails on brick B0 and succeeds on the other two bricks.
    - C1 performs write W2 which fails on B1 and succeeds on the other two bricks.
    - C2 performs write W3 which fails on B2 and succeeds on the other two bricks.
    - All 3 writes above happen in parallel and fall on different ranges, so afr takes granular locks and all the writes are performed in parallel. Since each client had data-readables as good, it does not see the file going into split-brain in the in_flight_split_brain check, and hence performs the post-op marking the pending xattrs. Now all the bricks are being blamed by each other, ending up in split-brain.

    Fix: Have an option to take either a full lock or a range lock on files while doing data transactions, to prevent the possibility of ending up in split-brain. With this change, by default the files will take a full lock while doing IO. If you want to make use of the old range lock, change the value of "cluster.full-lock" to "no".

    Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
    BUG: 1535438
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
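The scenario above can be replayed with a toy model of the pending xattrs. This is a Python sketch under simplifying assumptions, not afr code: `pending[i][j]` is a hypothetical counter standing in for the changelog xattr on brick i that blames brick j, and `apply_post_op` is an illustrative name.

```python
def apply_post_op(pending, failed_brick):
    """Record a write that failed on one brick and succeeded on the rest.

    Every brick where the write succeeded bumps its pending counter
    against the brick where it failed.
    """
    for brick in range(len(pending)):
        if brick != failed_brick:
            pending[brick][failed_brick] += 1


# Three bricks B0..B2, no pending markers initially.
pending = [[0] * 3 for _ in range(3)]

# W1 fails on B0, W2 fails on B1, W3 fails on B2; with range locks all
# three post-ops are allowed to proceed.
for failed in (0, 1, 2):
    apply_post_op(pending, failed)

# Every brick is now blamed by both of the other bricks.
every_brick_blamed = all(
    all(pending[i][j] > 0 for i in range(3) if i != j) for j in range(3)
)
```

With a full file lock the three transactions would be serialized, so the second and third writers would refresh their view after the first post-op rather than marking from a stale "all good" state.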
* tests: Use /dev/urandom instead of /dev/random for dd | Pranith Kumar K | 2018-01-19 | 1 | -1/+1
    If there's not enough entropy in the system, then reading /dev/random takes a significant time, since it takes a long time for the /dev/random buffers to fill up as required by this dd run. Milind found that this test file takes almost 1000 seconds or more to pass instead of just a minute because of this.

    Backport of:
    > BUG: 1431955

    BUG: 1533023
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Change-Id: I9145b17f77f09d0ab71816ae249c69b8fe14c1a5
* cluster/afr: Fixing the flaws in arbiter becoming source patch | karthik-us | 2018-01-18 | 7 | -180/+277
    Problem: Setting the write_subvol value to read_subvol in case of a metadata transaction during pre-op (commit 19f9bcff4aada589d4321356c2670ed283f02c03) might lead to the original problem of arbiter becoming source.

    Scenario:
    1) All bricks are up and good
    2) 2 writes w1 and w2 are in progress in parallel
    3) ctx->read_subvol is good for all the subvolumes
    4) w1 succeeds on brick0 and fails on brick1, yet to do post-op on the disk
    5) read/lookup comes on the same file and refreshes read_subvols back to all good
    6) metadata transaction happens which makes ctx->write_subvol to be assigned with ctx->read_subvol which is all good
    7) w2 succeeds on brick1 and fails on brick0 and this will update the brick in reverse order leading to arbiter becoming source

    Fix: Instead of setting ctx->write_subvol to ctx->read_subvol in the pre-op stage if there is a metadata transaction, check in the function __afr_set_in_flight_sb_status() whether it is a data or metadata transaction. Use the value of ctx->write_subvol if it is a data transaction and the ctx->read_subvol value for other transactions. With this patch we assign the value of ctx->write_subvol in afr_transaction_perform_fop() with the on-disk value, instead of assigning it in afr_changelog_pre_op() with the in-memory value.

    Change-Id: Id2025a7e965f0578af35b1abaac793b019c43cc4
    BUG: 1516313
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
    (cherry picked from commit ba149bac92d169ae2256dbc75202dc9e5d06538e)
* cluster/ec: OpenFD heal implementation for EC | Sunil Kumar Acharya | 2018-01-18 | 10 | -43/+307
    Existing EC code doesn't try to heal the OpenFD to avoid unnecessary healing of the data later.

    Fix implements the healing of open FDs before carrying out file operations on them by making an attempt to open the FDs on required up nodes.

    > BUG: 1431955
    > Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
    > Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>

    Upstream Patch: https://review.gluster.org/#/c/17077/

    BUG: 1533023
    Change-Id: Ib696f59c41ffd8d5678a484b23a00bb02764ed15
    Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* posix: delete stale gfid handles in nameless lookup | Ravishankar N | 2018-01-16 | 2 | -1/+81
    ..in order for self-heal of symlinks to work properly (see BZ for details).

    Backport of https://review.gluster.org/#/c/19070/

    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    Change-Id: I9a011d00b07a690446f7fd3589e96f840e8b7501
    BUG: 1534842
* cluster/dht: Add migration checks to dht_(f)xattrop | N Balachandran | 2018-01-10 | 1 | -2/+23
    The earlier backport was incorrect. Added the missing lines of code.

    The dht_(f)xattrop implementation did not implement migration phase1/phase2 checks which could cause issues with rebalance on sharded volumes. This does not solve the issue where fops may reach the target out of order.

    > Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
    > BUG: 1471031
    > Signed-off-by: N Balachandran <nbalacha@redhat.com>

    BUG: 1515434
    Change-Id: I183d52530e0220e3007e73672991cb79b44c022a
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* mount/fuse: use fstat in getattr implementation if any opened fd is available | Raghavendra G | 2018-01-09 | 2 | -12/+1
    The restriction of using fds opened by the same Pid means fds cannot be shared across threads of a multithreaded application. Note that fops from the kernel have a different Pid for different threads. Imagine the following sequence of operations:
    * Turn off performance.open-behind
    * Thread t1 opens an fd - fd1 - on file "file". Let's assume nodeid of "file" is "nodeid-file".
    * Thread t2 does RENAME ("newfile", "file"). Let's assume nodeid of "newfile" is "nodeid-newfile".
    * t2 proceeds to do fstat (fd1)

    The above set of operations can sometimes result in ESTALE/ENOENT errors. RENAME overwrites "file" with "newfile", changing its nodeid from "nodeid-file" to "nodeid-newfile", and post RENAME, "nodeid-file" is removed from the backend. If fstat carries nodeid-file as argument, which can happen if lookup has not refreshed the nodeid of "file", then since t2 doesn't have an fd opened, fuse_getattr_resume uses STAT, which will fail as "nodeid-file" no longer exists.

    Since the above set of operations and sharing of fds across multiple threads are valid, this is a bug.

    The fix is to use any fd opened on the inode. In this specific example fuse_getattr_resume will find fd1 and wind down the call as fstat (fd1), which won't fail.

    Cross-checked with "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for any security issues with this solution and he approves the solution.

    Thanks to "Miklos Szeredi" <mszeredi.at.redhat.dot.com> for all the pointers and discussions.

    > Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c
    > BUG: 1510401
    > Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    (cherry picked from commit 8b57378e5596f287a7b9d106dd6fb56a624b42ee)

    Change-Id: I88dd29b3607cd2594eee9d72a1637b5346c8d49c
    BUG: 1529084
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
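The difference between the old and new fd selection can be modelled in a few lines. This is a hedged Python sketch, not fuse-bridge code: `OpenFd`, `choose_getattr_call` and the `same_pid_only` switch are hypothetical names used only to contrast the two behaviours.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OpenFd:
    fd: int   # descriptor opened on the inode
    pid: int  # Pid the kernel reported for the opening thread


def choose_getattr_call(
    open_fds: List[OpenFd], caller_pid: int, same_pid_only: bool
) -> Tuple[str, Optional[int]]:
    """Decide whether getattr is wound as fstat(fd) or as a path-based stat.

    same_pid_only=True models the old restriction; False models the fix,
    where any fd opened on the inode is acceptable.
    """
    for entry in open_fds:
        if not same_pid_only or entry.pid == caller_pid:
            return ("fstat", entry.fd)
    # No usable fd: fall back to stat on the (possibly stale) nodeid.
    return ("stat", None)
```

In the example from the message, fd1 was opened by t1 while t2 issues the getattr: the restricted lookup falls back to stat, while the relaxed one finds fd1.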
* glusterd: Nullify pmap entry for bricks belonging to same port | Atin Mukherjee | 2018-01-09 | 1 | -1/+1
    Commit 30e0b86 tried to address all the stale port issues glusterd had when a brick is abruptly killed. In the brick multiplexing case, the portmap entry was not getting removed because of a bug. This patch addresses that.

    > mainline patch : https://review.gluster.org/#/c/19119/

    Change-Id: Ib020b967a9b92f1abae9cab9492f0cacec59aaa1
    BUG: 1530449
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: connect to an existing brick process when quorum status is NOT_APPLICABLE_QUORUM | Atin Mukherjee | 2018-01-09 | 8 | -14/+40
    First of all, this patch reverts commit 635c1c3, as it is causing a regression with bricks not coming up on time when a node is rebooted. This patch tries to fix the problem in a different way by just trying to connect to an existing running brick when quorum status is not applicable.

    > mainline patch : https://review.gluster.org/#/c/19134/

    Change-Id: I0efb5901832824b1c15dcac529bffac85173e097
    BUG: 1511293
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* cli: Fixed a use_after_free | N Balachandran | 2018-01-09 | 1 | -1/+2
    gf_event in cli_cmd_volume_create_cbk was accessing memory that had already been freed.

    > Change-Id: I447c939fa9b31e18819a62c3b356c14cca390787
    > BUG: 1530910
    > Signed-off-by: N Balachandran <nbalacha@redhat.com>
    (cherry picked from commit fa903173540df5b82c295a8f7b24848098e49a41)

    Change-Id: I447c939fa9b31e18819a62c3b356c14cca390787
    BUG: 1531371
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Add migration checks to dht_(f)xattrop | N Balachandran | 2018-01-03 | 9 | -45/+341
    The dht_(f)xattrop implementation did not implement migration phase1/phase2 checks which could cause issues with rebalance on sharded volumes. This does not solve the issue where fops may reach the target out of order.

    > Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
    > BUG: 1471031
    > Signed-off-by: N Balachandran <nbalacha@redhat.com>

    Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
    BUG: 1515434
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Serialize mds update code path with lookup unwind in selfheal | Mohit Agrawal | 2018-01-02 | 4 | -308/+216
    Problem: The test case ./tests/bugs/bug-1371806_1.t sometimes fails on CentOS due to a race condition between a fresh lookup and the setxattr fop.

    Solution: In the selfheal code path we save the mds on inode_ctx, but this was not serialized with lookup unwind. As a result, if the mds had not yet been saved on inode_ctx after lookup unwind, any subsequent setxattr fop failed with ENOENT because no mds was found on the inode ctx. To resolve this, saving the mds on the inode ctx is now serialized with lookup unwind.

    > BUG: 1498966
    > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

    Change-Id: I8d4bb40a6cbf0cec35d181ec0095cc7142b02e29
    BUG: 1529055
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* performance/write-behind: fix bug while handling short writes | Raghavendra G | 2017-12-26 | 1 | -2/+3
    The variable "fulfilled" in wb_fulfill_short_write is not reset to 0 while handling every member of the list.

    This has some interesting consequences:
    * If we break from the loop while processing the last member of the list head->winds, req is reset to head as the list is a circular one. However, head is already fulfilled and can potentially be freed. So, we end up adding a freed request to the wb_inode->todo list. This is the RCA for the crash tracked by the bug associated with this patch (note that we saw "holder" which is freed in the todo list).
    * If we break from the loop while processing any member but the last of the list head->winds, req is set to the next member in the list, skipping the current request, even though it is not entirely synced. This can lead to data corruption.

    The fix is very simple: we change the code to make sure "fulfilled" reflects whether the current request is fulfilled or not and doesn't carry the history of previous requests in the list.

    > Change-Id: Ia3d6988175a51c9e08efdb521a7b7938b01f93c8
    > BUG: 1528558
    > Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    (cherry picked from commit 0bc22bef7f3c24663aadfb3548b348aa121e3047)

    Change-Id: Ia3d6988175a51c9e08efdb521a7b7938b01f93c8
    BUG: 1529094
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
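The effect of a status flag that carries history across loop iterations can be shown with a deliberately simplified model. This Python sketch does not reproduce wb_fulfill_short_write or its circular list; `pending_after_short_write` and `reset_each_request` are hypothetical names, and requests are reduced to their sizes.

```python
from typing import List


def pending_after_short_write(
    sizes: List[int], written: int, reset_each_request: bool = True
) -> List[int]:
    """Return indices of requests not fully covered by a short write.

    sizes   -- byte size of each queued request, in order
    written -- number of bytes the short write actually synced
    reset_each_request=False mimics the bug: the flag keeps the value
    it got from an earlier request.
    """
    pending = []
    accounted = 0
    fulfilled = False
    for idx, size in enumerate(sizes):
        if reset_each_request:
            # The fix: the flag describes only the current request.
            fulfilled = False
        if accounted + size <= written:
            fulfilled = True
            accounted += size
        if not fulfilled:
            pending.append(idx)
    return pending
```

With three 4-byte requests and 8 bytes written, the per-request reset reports request 2 as pending, while the stale flag reports nothing pending, which is the kind of wrongly "fulfilled" request the message describes.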
* doc: Release notes for 3.13.1 [v3.13.1] | ShyamsundarR | 2017-12-20 | 1 | -0/+29
    Change-Id: I8398627b87dd08f1bfaa8f11dfb8f20889c9ec6d
    BUG: 1523780
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* write-behind: Allow trickling-writes to be configurable | Csaba Henk | 2017-12-20 | 2 | -1/+15
    This is the undisputed/trivial part of Shreyas' patch he attached to https://bugzilla.redhat.com/1364740 (of which the current bug is a clone).

    We need more evaluation for the page_size and window_size bits before taking them on.

    Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9
    BUG: 1428060
    Co-authored-by: Shreyas Siravara <sshreyas@fb.com>
    Signed-off-by: Csaba Henk <csaba@redhat.com>
* rpc-transport/rdma: Add a mutex for the list of RDMA Memory Region (MR) access | Yi Wang | 2017-12-20 | 2 | -38/+72
    Problem: gf_rdma_device_t->all_mr is a list of __gf_rdma_arena_mr (which includes MR content) in the rdma rpc-transport. The rdma rpc-transport adds and deletes items in the list when MRs are registered, deregistered, and freed. Because gf_rdma_device_t->all_mr is used by different threads and is not mutex protected, the rdma transport may access obsolete items in it.

    Solution: Add mutex protection for gf_rdma_device_t->all_mr.

    > Change-Id: I2b7de0f7aa516b90bb6f3c6aae3aadd23b243900
    > BUG: 1522651
    > Signed-off-by: Yi Wang <wangyi@storswift.com>
    (cherry picked from commit 8483ed87165c1695b513e223549d33d2d63891d9)

    Signed-off-by: Yi Wang <wangyi@storswift.com>
    Change-Id: I2b7de0f7aa516b90bb6f3c6aae3aadd23b243900
    BUG: 1527699
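The general pattern of guarding a shared list with a mutex looks roughly like this. It is a Python sketch of the idea only, not the rdma transport code: `MrRegistry` and its methods are hypothetical stand-ins for the all_mr list and its register/deregister paths.

```python
import threading
from typing import List


class MrRegistry:
    """A list of memory-region handles shared between threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._all_mr: List[int] = []

    def register(self, mr: int) -> None:
        # Every mutation happens with the lock held.
        with self._lock:
            self._all_mr.append(mr)

    def deregister(self, mr: int) -> bool:
        with self._lock:
            if mr in self._all_mr:
                self._all_mr.remove(mr)
                return True
            return False

    def snapshot(self) -> List[int]:
        # Readers copy under the lock so they never iterate a list that
        # another thread is modifying.
        with self._lock:
            return list(self._all_mr)
```

Readers working on a snapshot trade a copy for never observing a half-updated list; holding the lock for the whole traversal is the alternative when the copy is too expensive.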
* feature/bitrot: remove internal xattrs from lookup cbk | Ravishankar N | 2017-12-19 | 2 | -7/+21
    Problem: afr requests all xattrs in lookup via the list-xattr key. If bitrot is enabled and later disabled, or if the bitrot xattrs were present due to an older version of bitrot which used to create the xattrs without enabling the feature, the xattrs (trusted.bit-rot.version in particular) were not getting filtered and ended up reaching the client stack. AFR, on noticing different values of the xattr across bricks of the replica, started triggering spurious metadata heals.

    Fix: Filter all internal xattrs in the bitrot xlator before unwinding lookup and (f)getxattr.

    Thanks to Kotresh for the help in RCA'ing.

    Change-Id: I5bc70e4b901359c3daefc67b8e4fa6ddb47f046c
    BUG: 1527275
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    (cherry picked from commit d341f20230b9921391aff22337eaf9be82f44d88)
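Filtering internal keys out of an xattr reply before it is unwound can be sketched as below. This is an illustrative Python model, not the bitrot xlator: the `trusted.bit-rot.` prefix is an assumption generalized from the one key named in the message (trusted.bit-rot.version), and `filter_internal_xattrs` is a hypothetical helper.

```python
from typing import Dict

# Assumption: all internal bitrot keys share this prefix. Only
# trusted.bit-rot.version is named in the commit message.
INTERNAL_PREFIX = "trusted.bit-rot."


def filter_internal_xattrs(xattrs: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return a copy of the reply dict without internal bitrot keys."""
    return {
        key: value
        for key, value in xattrs.items()
        if not key.startswith(INTERNAL_PREFIX)
    }
```

Once the key is stripped on every brick, a replica has nothing to compare for that xattr, so differing per-brick values can no longer look like a metadata mismatch.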
* core/memacct: save allocs in mem_acct_rec list | N Balachandran | 2017-12-11 | 4 | -1/+61
    With configure --enable-debug, add all object allocations to a list in the corresponding mem_acct_rec. This allows us to see all objects of a particular type and allows for additional debugging in case of memory leaks.

    This is not compiled in by default and must be explicitly enabled. It is intended to be used by developers.

    > Change-Id: I7cf2dbeadecf994423d7e7591e85f18d2575cce8
    > BUG: 1522662
    > Signed-off-by: N Balachandran <nbalacha@redhat.com>
    (cherry picked from commit 47d01546a1826dc14a8331ea8700015f1cfdc4db)

    Change-Id: I7cf2dbeadecf994423d7e7591e85f18d2575cce8
    BUG: 1523456
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: don't overfill the buffer in readdir(p) | Raghavendra G | 2017-12-11 | 1 | -3/+18
    Superfluous dentries that cannot fit in the buffer size provided by the kernel are thrown away by fuse-bridge. This means:
    * the next readdir(p) seen by readdir-ahead would have an offset of a dentry returned in a previous readdir(p) response. When readdir-ahead detects a non-monotonic offset it turns itself off, which can result in poor readdir performance.
    * readdirp can be cpu-intensive on the brick and there is no point in reading all those dentries just to have them thrown away by fuse-bridge.

    So, the best strategy is to fill the buffer optimally - neither overfill nor underfill.

    > Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
    > BUG: 1492625
    > Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    (cherry picked from commit e785faead91f74dce7c832848f2e8f3f43bd0be5)

    Change-Id: Idb3d85dd4c08fdc4526b2df801d49e69e439ba84
    BUG: 1522710
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
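Filling a reply up to, but not beyond, the caller's buffer size can be sketched as follows. This is a Python illustration of the strategy, not dht code: the per-entry cost of a fixed 24-byte header plus the name length is a made-up figure, and `fill_dirents` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

# Hypothetical fixed per-entry overhead; the real on-wire dentry size
# depends on the fuse/readdirp structures.
ENTRY_HEADER_BYTES = 24


def fill_dirents(names: Iterable[str], buf_size: int) -> Tuple[List[str], int]:
    """Collect entries until the next one would not fit in buf_size.

    Returns the entries that fit and the number of bytes they use.
    """
    filled: List[str] = []
    used = 0
    for name in names:
        cost = ENTRY_HEADER_BYTES + len(name.encode())
        if used + cost > buf_size:
            # Stop here: anything further would be dropped by the caller
            # and make the next offset go backwards.
            break
        filled.append(name)
        used += cost
    return filled, used
```

Stopping at the first entry that does not fit keeps the returned offsets monotonic, which is the property readdir-ahead depends on according to the message.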
* cluster/dht: make rebalance use truncate in case | Susant Palai | 2017-12-11 | 3 | -71/+99
    .. the brick file system does not support fallocate.

    > Change-Id: Id76cda2d8bb3b223b779e5e7a34f17c8bfa6283c
    > BUG: 1488103
    > Signed-off-by: Susant Palai <spalai@redhat.com>

    Change-Id: Id76cda2d8bb3b223b779e5e7a34f17c8bfa6283c
    BUG: 1520232
    Signed-off-by: Susant Palai <spalai@redhat.com>
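A fallback from fallocate to truncate can be sketched with the standard library. This is a generic Python illustration, not the rebalance code path: `preallocate` is a hypothetical helper, and the set of errno values treated as "not supported" is an assumption of the sketch.

```python
import errno
import os


def preallocate(fd: int, size: int) -> str:
    """Size a file to `size` bytes, preferring fallocate over truncate.

    Returns the name of the method that was used.
    """
    fallocate = getattr(os, "posix_fallocate", None)  # absent on some platforms
    if fallocate is not None:
        try:
            fallocate(fd, 0, size)
            return "fallocate"
        except OSError as err:
            # Assumed "unsupported" errors; anything else (e.g. ENOSPC)
            # is a real failure and is re-raised.
            if err.errno not in (errno.EOPNOTSUPP, errno.ENOSYS, errno.EINVAL):
                raise
    # Fallback: sets the file length but does not reserve disk blocks.
    os.ftruncate(fd, size)
    return "truncate"
```

The two paths are not equivalent: truncate only sets the length, so space is not guaranteed up front and a later write can still fail with ENOSPC.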
* glusterd: Free up svc->conn on volume delete | Atin Mukherjee | 2017-12-07 | 1 | -0/+5
    Daemons like snapd, tierd and gfproxyd are maintained on per volume basis and on a volume delete we should destroy the rpc connection established for them.

    > mainline patch : https://review.gluster.org/#/c/18957/

    Change-Id: Id1440e39da07b990fdb9b207df18da04b1ca8014
    BUG: 1523046
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    (cherry picked from commit 36ce4c614a3391043a3417aa061d0aa16e60b2d3)
* doc: Added final release notes for 3.13.0 [v3.13.0] | ShyamsundarR | 2017-12-01 | 1 | -84/+242
    Change-Id: I1f2d42f6c24ba55414888bcaf938a905e34ecb4d
    BUG: 1510012
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* posix: Change GD_OP_VERSION to 3_13_0 from 3_12_0 for storage.reserve | Mohit Agrawal | 2017-12-01 | 1 | -1/+1
    Problem: Change GD_OP_VERSION to 3_13_0 from 3_12_0 for option storage.reserve

    Solution: The feature was actually merged in the 3.13.0 branch, so GD_OP_VERSION needs to change from 3_12_0 to 3_13_0.

    BUG: 1518512
    Change-Id: I6f753978fd607919efcc60f73c37feaadc4f32ef
    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* release-notes: add details for max-port range | Atin Mukherjee | 2017-11-30 | 1 | -0/+6
    Change-Id: I9801706f67332d6079e8ec58f9e08b332b35804a
    BUG: 1510012
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* release-notes: Add release notes for DISCARD FOP on EC | Sunil Kumar Acharya | 2017-11-30 | 1 | -0/+7
    BUG: 1518744
    Change-Id: Id7508eee3c0891d6e03cc2805facdc682a31f53d
    Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* Disable gfid2path by default on NetBSD | Emmanuel Dreyfus | 2017-11-30 | 1 | -0/+11
    NetBSD storage of extended attributes for UFS1 scales badly when the list of extended attribute names grows. gfid2path can add as many extended attribute names as we have files, hence we keep it disabled for performance's sake.

    > Change-Id: Id77b5f5ceb4d5eba1b3362b4b9fc693450ffbc2b
    > Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
    > BUG: 1129939

    Change-Id: I17c12251d80dbb41b7d4864d5739d1ad3d6877a0
    Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
    BUG: 1513259
* cluster/ec: EC DISCARD doesn't punch hole properly | Sunil Kumar Acharya | 2017-11-29 | 2 | -3/+13
    Problem: The DISCARD operation on an EC volume was punching a hole smaller than the specified size in some cases.

    Solution: EC was not handling the punch hole for the tail part in some cases. Updated the code to handle it appropriately.

    > BUG: 1516206
    > Change-Id: If3e69e417c3e5034afee04e78f5f78855e65f932
    > Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>

    BUG: 1518257
    Change-Id: If3e69e417c3e5034afee04e78f5f78855e65f932
    Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* features/locks: Fix memory leaks | Xavier Hernandez | 2017-11-28 | 5 | -5/+11
    Backport of:
    > BUG: 1515161

    Change-Id: Ic1d2e17a7d14389b6734d1b88bd28c0a2907bbd6
    BUG: 1517692
    Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/afr: Fix for arbiter becoming source | karthik-us | 2017-11-27 | 4 | -6/+102
    Problem: When eager-lock is on, and two writes happen in parallel on an FD, we were observing the following behaviour:
    - First write fails on one data brick
    - Since the post-op has not yet happened, the inode refresh will get both the data bricks as readable and set it in the inode context
    - The in-flight split-brain check sees both the data bricks as readable and allows the second write
    - Second write fails on the other data brick
    - Now the post-op happens and marks both the data bricks as bad, and the arbiter will become source for healing

    Fix: Add one more variable called write_subvol to the inode context, holding the in-memory representation of the writable subvols. Inode refresh will not update this value, and its lifetime is pre-op through unlock in the afr transaction. Initially the pre-op will set this value to be the same as read_subvol in the inode context, and then in the in-flight split-brain check we will use this value instead of read_subvol. After all the checks we will update this value and set read_subvol to the same value, to avoid leaving an incorrect value there.

    Change-Id: I2ef6904524ab91af861d59690974bbc529ab1af3
    BUG: 1516313
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
    (cherry picked from commit 19f9bcff4aada589d4321356c2670ed283f02c03)
* cluster/afr: Print heal info summary output in stream fashion | karthik-us | 2017-11-27 | 1 | -0/+1
    Problem: The heal info summary was printing its output only at the end, after the crawl for pending heal entries had completed on all the bricks.

    Fix: Print the output immediately after the crawl on each individual brick completes, so that it won't give the impression of the CLI being hung.

    Change-Id: Ieaf5718736a7ee6837bac02bd30a95836e605dab
    BUG: 1514419
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
    (cherry picked from commit 77e3bc671aab2fda68ada53f38ec368b20675f59)
* doc: Added initial version of the release notes for 3.13.0 [v3.13.0rc0] | ShyamsundarR | 2017-11-22 | 1 | -0/+251
    Change-Id: I1b6600cd519578faffe2f67ac0ee3e2555c3cb31
    BUG: 1510012
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
    Signed-off-by: Niels de Vos <ndevos@redhat.com>
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
    [ndevos: added "memory pool in statedump" and "glfs_mem_header" notes]
    [ksubrahm: added notes for "Addition of summary option to heal info"]
* features/worm: new config option to manage deletion of Worm files. | Vishal Pandey | 2017-11-22 | 5 | -1/+28
    Add a new configuration option worm-files-deletable to file-level Worm in order to control the behaviour of Worm files upon deletion.

    Steps to Test:
    1. Add all the configuration options to a volume to activate file-level-worm
    2. Option features.worm-files-deletable is set to 1 by default.
    3. Create a new file and wait for the retention time to expire.
    4. After the retention time expires, do a truncate, rename, unlink, link or write to send the file into Worm state.
    5. After that do `rm -f filename`.
    6. The file is successfully removed.
    7. Repeat from step 2 by setting features.worm-files-deletable 0. This time deletion should not be successful.

    Change-Id: Ibc89861ee296e065330b93a9f9606be5da40af31
    BUG: 1508898
    Signed-off-by: Vishal Pandey <vishpandey2014@gmail.com>
* afr: add checks for allowing lookups | Ravishankar N | 2017-11-21 | 6 | -116/+164
    Problem: In an arbiter volume, lookup was being served from one of the sink bricks (source brick was down). shard uses the iatt values from lookup cbk to calculate the size and block count, which in this case were incorrect values. shard_local_t->last_block was thus initialised to -1, resulting in an infinite while loop in shard_common_resolve_shards().

    Fix: Use client quorum logic to allow or fail the lookups from afr if there are no readable subvolumes. So in replica-3 or arbiter vols, if there is no good copy or if quorum is not met, fail lookup with ENOTCONN.

    With this fix, we are also removing support for quorum-reads xlator option. So if quorum is not met, neither read nor write txns are allowed and we fail the fop with ENOTCONN.

    Change-Id: Ic65c00c24f77ece007328b421494eee62a505fa0
    BUG: 1515572
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    (cherry picked from commit bd44d59741bb8c0f5d7a62c5b1094179dd0ce8a4)
* cluster/dht: Don't set ACLs on linkto file | N Balachandran | 2017-11-21 | 1 | -0/+11
    The trusted.SGI_ACL_FILE appears to set posix ACLs on the linkto file that is a target of file migration. This can mess up file permissions and cause linkto identification to fail. Now we remove all ACL xattrs from the results of the listxattr call on the source before setting them on the target.

    > BUG: 1514329
    > Signed-off-by: N Balachandran <nbalacha@redhat.com>

    Change-Id: I56802dbaed783a16e3fb90f59f4ce849f8a4a9b4
    BUG: 1515045
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr: Print heal info split-brain output in stream fashion | karthik-us | 2017-11-17 | 1 | -0/+3
    Problem: When we trigger the heal info split-brain command the output is not streamed as it is received, but dumped at the end for all the bricks together. This gives the perception that the command is hung.

    Fix: When we get a split-brain entry while crawling through the pending heal entries, flush it immediately so that the output is printed in a stream fashion and it doesn't look like the CLI is hung.

    Change-Id: I7547e86b83202d66616749b8b31d4d0dff0abf07
    BUG: 1514419
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
    (cherry picked from commit 05f9c13f4d69e4113f5a851f4097ef35ba3f33b2)
* cluster/ec: Fix op-version for disperse.other-eager-lock | Xavier Hernandez | 2017-11-16 | 2 | -2/+2
    The op-version used for the new option was wrong. It has been set to 3.13.0.

    > Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf
    > BUG: 1502610
    > Signed-off-by: Xavier Hernandez <jahernan@redhat.com>

    Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf
    BUG: 1512460
    Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: create eager-lock option for non-regular files | Xavier Hernandez | 2017-11-16 | 8 | -13/+57
    A new option is added to allow independent configuration of eager locking for regular files and non-regular files.

    > Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
    > BUG: 1502610
    > Signed-off-by: Xavier Hernandez <jahernan@redhat.com>

    Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
    BUG: 1512460
    Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cli: gluster help changes | N Balachandran | 2017-11-16 | 7 | -76/+387
    gluster cli help now shows only the top level help commands. gluster <component> help will now show help commands for <component>.

    > BUG: 1474768
    > Signed-off-by: N Balachandran <nbalacha@redhat.com>
    (cherry picked from commit 89dc54f50c9f800ca4446ea8fe736e4860588845)

    Change-Id: I263f53a0870d80ef4cfaad455fdaa47e2ac4423b
    BUG: 1509789
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* tests: fix bug-1483058-replace-brick-quorum-validation.t spurious failure | Atin Mukherjee | 2017-11-14 | 3 | -1/+19
    > mainline patch : https://review.gluster.org/#/c/18710/

    Change-Id: I04c35305bfb663eabbf715eee78695adfd4a2d20
    BUG: 1512435
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    (cherry picked from commit 76a83f98b78a0bdf29bbb0f8e4c9ab74dae52be4)
* glusterd: display gluster volume status, when quorum type is server | Sanju Rakonde | 2017-11-14 | 1 | -0/+6
    Problem: When server-quorum-type is server, after restarting glusterd on the node which is up, gluster volume status gives incorrect information.

    Fix: Check whether server is blank before adding other keys into the dictionary.

    Change-Id: I926ebdffab330ccef844f23f6d6556e137914047
    BUG: 1511768
    Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
    (cherry picked from commit 046c7e3199fca715592762e271e6061ac99b0c4b)
* glusterd: restart the brick if quorum status is NOT_APPLICABLE_QUORUM | Atin Mukherjee | 2017-11-14 | 1 | -1/+2
    If a volume does not have server quorum enabled and all the glusterd instances on the other peers in the trusted storage pool are down, restarting glusterd does not trigger the brick start, so the brick does not come up.

    > mainline patch : https://review.gluster.org/#/c/18669/

    Change-Id: If1458e03b50a113f1653db553bb2350d11577539
    BUG: 1511293
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    (cherry picked from commit 635c1c3691a102aa658cf1219fa41ca30dd134ba)
* cli: correct rebalance status elapsed check | N Balachandran | 2017-11-09 | 1 | -1/+5
    Check that elapsed time has crossed 10 mins for at least one rebalance process before displaying the estimates.

    > BUG: 1479528
    > Signed-off-by: N Balachandran <nbalacha@redhat.com>
    (cherry picked from commit 56aef68530b3bab27730aa62e4fbc513d3dba65f)

    Change-Id: Ib357a6f0d0125a178e94ede1e31514fdc6ce3593
    BUG: 1511274
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
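The check described above amounts to an "any process past the threshold" test. This is a Python sketch of that condition only, not the cli code: `should_show_estimates` is a hypothetical name, and treating exactly 600 seconds as having crossed the threshold is an assumption.

```python
from typing import Iterable

ESTIMATE_THRESHOLD_SECS = 10 * 60  # 10 minutes, from the commit message


def should_show_estimates(elapsed_secs: Iterable[int]) -> bool:
    """True when at least one rebalance process has run long enough.

    elapsed_secs holds the elapsed run time of each rebalance process.
    Assumption: exactly 600 s already counts as "crossed".
    """
    return any(secs >= ESTIMATE_THRESHOLD_SECS for secs in elapsed_secs)
```

Using `any` rather than `all` matches the "at least one rebalance process" wording: a single long-running node is enough to display the estimates.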