summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* cluster/ec: Implement FALLOCATE FOP for ECSunil Kumar Acharya2017-05-233-3/+209
| | | | | | | | | | | | | | | FALLOCATE file operations is not implemented in the existing EC code. This change set implements it for EC. BUG: 1448293 Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/15200 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Fix crash in dht_selfheal_dir_setattrN Balachandran2017-05-191-2/+6
| | | | | | | | | | | | | | | Use a local variable to store the call cnt used in the for loop for the STACK_WIND so as not to access local which may be freed by STACK_UNWIND after all fops return. Change-Id: I24f49b6dbd29a2b706e388e2f6d5196c0f80afc5 BUG: 1452102 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17343 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/afr: Remove debug logs in fix_quorum_options()Vijay Bellur2017-05-191-5/+0
| | | | | | | | | | | Change-Id: Id019b0c6425849eece8a9aba7acec9a521dfb10b BUG: 1452378 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: https://review.gluster.org/17335 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: initialize throttle option "normal" to same in init and reconfigureSusant Palai2017-05-181-77/+52
| | | | | | | | | | | | | | | | Normal value were different in dht_init and dht_reconfigure. Initialization/reconfigure of throttle option are carved out to a separate function (dht_configure_throttle) now. Normal value will be "2". Change-Id: Ie323eae019af41d6bef0a136e3d284dc82bab9a1 BUG: 1451162 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17303 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/afr: Return the list of node_uuids for the subvolumekarthik-us2017-05-173-50/+145
| | | | | | | | | | | | | | | | | | | | | | | | Problem: AFR was returning the node uuid of the first node for every file if the replica set was healthy, which was resulting in only one node migrating all the files. Fix: With this patch AFR returns the list of node_uuids to the upper layer, so that they can decide on which node to migrate which files, resulting in improved performance. Ordering of node uuids will be maintained based on the ordering of the bricks. If a brick is down, then the node uuid for that will be set to all zeros. Change-Id: I73ee0f9898ae473584fdf487a2980d7a6db22f31 BUG: 1366817 Signed-off-by: karthik-us <ksubrahm@redhat.com> Reviewed-on: https://review.gluster.org/17084 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: return all node uuids from all subvolumesXavier Hernandez2017-05-172-105/+141
| | | | | | | | | | | | | | | | | | | | | | | | EC was retuning the UUID of the brick with smaller value. This had the side effect of not evenly balancing the load between bricks on rebalance operations. This patch modifies the common functions that combine multiple subvolume values into a single result to take into account the subvolume order and, optionally, other subvolumes that could be damaged. This makes easier to add future features where brick order is important. It also makes possible to easily identify the originating brick of each answer, in case some brick will have an special meaning in the future. Change-Id: Iee0a4da710b41224a6dc8e13fa8dcddb36c73a2f BUG: 1366817 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: https://review.gluster.org/17297 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/dht: Rebalance on all nodes should migrate filesN Balachandran2017-05-166-16/+211
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Rebalance compares the node-uuid of a file against its own to and migrates a file only if they match. However, the current behaviour in both AFR and EC is to return the node-uuid of the first brick in a replica set for all files. This means a single node ends up migrating all the files if the first brick of every replica set is on the same node. Fix: AFR and EC will return all node-uuids for the replica set. The rebalance process will divide the files to be migrated among all the nodes by hashing the gfid of the file and using that value to select a node to perform the migration. This patch makes the required DHT and tiering changes. Some tests in rebal-all-nodes-migrate.t will need to be uncommented once the AFR and EC changes are merged. Change-Id: I5ce41600f5ba0e244ddfd986e2ba8fa23329ff0c BUG: 1366817 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17239 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Fix crash in dht rmdirN Balachandran2017-05-161-4/+10
| | | | | | | | | | | | | | | | | | | | Using local->call_cnt to check STACK_WINDs can cause dht_rmdir_do to be called erroneously if dht_rmdir_readdirp_cbk unwinds before we check if local->call_cnt is zero in dht_rmdir_opendir_cbk. This can cause frame corruptions and crashes. Thanks to Shyam (srangana@redhat.com) for the analysis. Change-Id: I5362cf78f97f21b3fade0b9e94d492002a8d4a11 BUG: 1451083 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17305 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* afr: propagate correct errno for fop failures in arbiterRavishankar N2017-05-154-15/+12
| | | | | | | | | | | | | | | | | | | | | | | Problem: If quorum is not met in fop cbk, arbiter sends an ENOTCONN error to the upper xlators. In a VM workload with sharding enabled, this was leading to the VM pausing when replace-brick was performed as described in the BZ. Fix: Move the fop cbk arbitration logic to afr_handle_quorum() because in normal replica volumes, that is the function that has the quorum and errno checks in the fop cbk path before doing a post-op. Thanks to Pranith for suggesting this approach. Change-Id: Ie6315db30c5e36326b71b90a01da824109e86796 BUG: 1449610 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17235 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/tier: Don't update cached subvolN Balachandran2017-05-121-15/+3
| | | | | | | | | | | | | | | | | | | tier_readdirp_cbk updates the cached subvol to the hot tier if it finds a linkto file. However, if no lookup has been sent to the hot tier, lower layers will not have updated the inode-ctx causing later fops to fail. Change-Id: Ib8a5e58a6e7fd7750cf6a0ea85da611aa24c7512 BUG: 1402406 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/16163 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@gmail.com> Tested-by: Dan Lambright <dlambrig@gmail.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* afr: send the correct iatt values in fsync cbkRavishankar N2017-05-111-25/+43
| | | | | | | | | | | | | | | | | | | | | | | | Problem: afr unwinds the fsync fop with an iatt buffer from one of its children on whom fsync was successful. But that child might not be a valid read subvolume for that inode because of pending heals or because it happens to be the arbiter brick etc. Thus we end up sending the wrong iatt to mdcache which will in turn serve it to the application on a subsequent stat call as reported in the BZ. Fix: Pick a child on whom the fsync was successful *and* that is readable as indicated in the inode context. Change-Id: Ie8647289219cebe02dde4727e19a729b3353ebcf BUG: 1449329 RCA'ed-by: Miklós Fokin <miklos.fokin@appeartv.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17227 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* afr: fixes to quorum-type in afr_priv_dump()Ravishankar N2017-05-101-0/+2
| | | | | | | | | | | | | | | | | | Include the 'none' option as well in the output. This fixes the bug in commit 335555d256d444f4952ce239168f72b393370f01. Also added a test-case. This is a Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: I479a14ae69ecae5a03e85e73ed50c19b483df603 BUG: 1448804 Reviewed-on: https://review.gluster.org/17215 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: fix incorrect answer check in seek fopXavier Hernandez2017-05-091-15/+8
| | | | | | | | | | | | | | | A bad check in the answer of a seek request caused a segmentation fault when seek reported an error. Change-Id: Ifb25ae8bf7cc4019d46171c431f7b09b376960e8 BUG: 1439068 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: https://review.gluster.org/16998 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: include quorum type and count when dumping afr privRavishankar N2017-05-081-0/+6
| | | | | | | | | | | | | | | | Dump the client quorum type ('auto' or 'fixed'). If it is 'fixed', also dump the quorum-count. This information will be available in the client statedump and in /<fuse_mount>/.meta/graphs/active/testvol-replicate-X/private. Change-Id: Idbd6e2acbd622d4e6cfabf511e649a6da0e42384 BUG: 1448804 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17196 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/dht: Fix ret checkN Balachandran2017-05-081-1/+1
| | | | | | | | | | | | | | Fixed an incorrect return code check in the rebalance code. Change-Id: I60804ff121cec7a2f0419e2ee70dd22ea7533c0c BUG: 1448640 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17197 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* coreutils: use coreutils instead of duplicate codeZhou Zhengping2017-05-051-3/+1
| | | | | | | | | | | | | Change-Id: I0e442331d2bbb22ec18c37af87ab2a8852737c43 BUG: 1448265 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: https://review.gluster.org/16975 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* Fix wrong operatorsMichael Scherer2017-05-042-4/+3
| | | | | | | | | | | | | | | | | | | | | | Coverty rightfully note that if we verify that A =! C or A != B, it will always be true. In one case, that prevent healing from continuing. In the other, that trigger useless logs. Fixing this bug also show that ENOSPC shouldn't abort the rebalance operation, as seen during the review of the first patch on https://review.gluster.org/#/c/16676/1/xlators/cluster/dht/src/dht-rebalance.c Change-Id: I93c4df43b880b211da202a7e49cef6b1ce7ab68f BUG: 1424817 Signed-off-by: Michael Scherer <misc@redhat.com> Reviewed-on: https://review.gluster.org/16676 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Tested-by: Jeff Darcy <jeff@pl.atyp.us> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* Halo Replication feature for AFR translatorKevin Vigor2017-05-025-58/+422
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Halo Geo-replication is a feature which allows Gluster or NFS clients to write locally to their region (as defined by a latency "halo" or threshold if you like), and have their writes asynchronously propagate from their origin to the rest of the cluster. Clients can also write synchronously to the cluster simply by specifying a halo-latency which is very large (e.g. 10seconds) which will include all bricks. In other words, it allows clients to decide at mount time if they desire synchronous or asynchronous IO into a cluster and the cluster can support both of these modes to any number of clients simultaneously. There are a few new volume options due to this feature: halo-shd-latency: The threshold below which self-heal daemons will consider children (bricks) connected. halo-nfsd-latency: The threshold below which NFS daemons will consider children (bricks) connected. halo-latency: The threshold below which all other clients will consider children (bricks) connected. halo-min-replicas: The minimum number of replicas which are to be enforced regardless of latency specified in the above 3 options. If the number of children falls below this threshold the next best (chosen by latency) shall be swapped in. New FUSE mount options: halo-latency & halo-min-replicas: As descripted above. This feature combined with multi-threaded SHD support (D1271745) results in some pretty cool geo-replication possibilities. Operational Notes: - Global consistency is gaurenteed for synchronous clients, this is provided by the existing entry-locking mechanism. - Asynchronous clients on the other hand and merely consistent to their region. Writes & deletes will be protected via entry-locks as usual preventing concurrent writes into files which are undergoing replication. Read operations on the other hand should never block. - Writes are allowed from _any_ region and propagated from the origin to all other regions. The take away from this is care should be taken to ensure multiple writers do not write the same files resulting in a gfid split-brain which will require resolution via split-brain policies (majority, mtime & size). Recommended method for preventing this is using the nfs-auth feature to define which region for each share has RW permissions, tiers not in the origin region should have RO perms. TODO: - Synchronous clients (including the SHD) should choose clients from their own region as preferred sources for reads. Most of the plumbing is in place for this via the child_latency array. - Better GFID split brain handling & better dent type split brain handling (i.e. create a trash can and move the offending files into it). - Tagging in addition to latency as a means of defining which children you wish to synchronously write to Test Plan: - The usual suspects, clang, gcc w/ address sanitizer & valgrind - Prove tests Reviewers: jackl, dph, cjh, meyering Reviewed By: meyering Subscribers: ethanr Differential Revision: https://phabricator.fb.com/D1272053 Tasks: 4117827 Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1 BUG: 1428061 Signed-off-by: Kevin Vigor <kvigor@fb.com> Reviewed-on: http://review.gluster.org/16099 Reviewed-on: https://review.gluster.org/16177 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/dht: Make rebalance throttle option tuned by numberSusant Palai2017-04-293-26/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current rebalance throttle options: lazy/normal/aggressive may not always be sufficient for the purpose of throttling. In our recent test, we observed for certain setups, normal and aggressive modes behaved similarly consuming full disk bandwidth. So in cases like this admin should be able to tune it down(or vice versa) depending on the need. Along with old throttle configurations, thread counts are tuned based on number. e.g. gluster v set vol-name cluster-rebal.throttle 5. Admin can tune up/down between 0 and the number of cores available. Note: For heterogenous servers, validation will fail on the old server if "number" is given for throttle configuration. The message looks something like this: "volume set: failed: Staging failed on vm2. Error: cluster.rebal-throttle should be {lazy|normal|aggressive}" Test: Manual test by logging active thread number after reconfiguring throttle option. testcase: tests/basic/distribute/throttle-rebal.t Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3 BUG: 1438370 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/16980 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: send lookup on old name inside rename with bname and pargfidSusant Palai2017-04-291-9/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | Inside rename, a lookup is done on the source name to make sure that the file is there. But we used to do a gfid based lookup and hence, even if the source name was renamed to a new name from some other client, lookup will be successful as server3_3_lookup will fetch the new path based on the gfid. So even if the source file does not exist any more rename will carry on, and as server3_3_link(destination is hashed to a different brick other than source cached scenario) also does gfid based resolve, it wont detect that the source name does not exist and hardlink creation will be successful (since gfid based resolve will get the new dentry). To solve this problem, do a name based lookup inside rename. So that rename will fail right away if the source does not exist. Change-Id: Ieba8bdd6675088dbf18de90ed4622df043d163bd BUG: 1412135 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/16375 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: rebalance perf enhancementSusant Palai2017-04-292-108/+246
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Throttle settings "normal" and "aggressive" for rebalance did not have performance difference. normal mode spawns $(no. of cores - 4)/2 threads and aggressive spawns $(no. of cores - 4) threads. Though aggressive mode has twice the number of threads compared to that of normal mode, there was no performance gain when switched to aggressive mode from normal mode. RCA: During the course of debugging the above problem, we tried assigning migration job to migration threads spawned by rebalance, rather than synctasks(as there is more overhead associated to manage the task queue and threads). This gave us a significant improvement over rebalance under synctasks. This patch does not really gurantee that there will be a clear performance difference between normal and aggressive mode, but this patch certainly maximized the disk utilization for 1GBfiles run. Results: Test enviroment: Gluster Config: Number of Bricks: 2 (one brick per disk(RAID-6 12 disk)) Bricks: Brick1: server1:/brick/test1/1 Brick2: server2:/brick/test1/1 Options Reconfigured: performance.readdir-ahead: on server.event-threads: 4 client.event-threads: 4 1000 files with 1GB each were created/renamed such that all files will have server1 as cached and server2 as hashed, so that all files will be migrated. Test machines had 24 cores each. Results with/without synctask based migration: ----------------------------------------------- mode normal(10threads) aggressive(20threads) timetaken 0:55:30 (h:m:s) 0:56:3 (h:m:s) withsynctask timetaken with migrator 0:38:3 (h:m:s) 0:23:41 (h:m:s) threads From above table it can be seen that, there is a clear 2x perf gain between rebalance with synctask vs rebalance with migrator threads. Additionally this patch modifies the code so that caller will have the exact error number returned by dht_migrate_file(earlier the errno meaning was overloaded). This will help avoiding scenarios where migration failure due to ENOENT, can result in rebalance abort/failure. Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e BUG: 1420166 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/16427 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Pass the correct xdata in fremovexattr fopKrutika Dhananjay2017-04-281-8/+4
| | | | | | | | | | | Change-Id: Id84bc87e48f435573eba3b24d3fb3c411fd2445d BUG: 1440051 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17126 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht Remove redundant logs in dht rmdirN Balachandran2017-04-261-8/+7
| | | | | | | | | | | | | | | Removing redundant logs were introduced in https://review.gluster.org/#/c/17065/ Change-Id: I0d6055488b51a13c91d2121e87f653cdb94888b0 BUG: 1445590 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17118 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* feature/dht: Directory synchronizationKotresh HR2017-04-2610-1141/+1897
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Design doc: https://review.gluster.org/16876 Directory creation is now synchronized with blocking inodelk of the parent on the hashed subvolume followed by the entrylk on the hashed subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup selfheal mkdir. To maintain internal consistency of directories across all subvols of dht, we need locks. Specifically we are interested in: 1. Consistency of layout of a directory. Only one writer should modify the layout at a time. A writer (layout setting during directory heal as part of lookup) shouldn't modify the layout while there are readers (all other fops like create, mkdir etc., which consume layout) and readers shouldn't read the layout while a writer is in progress. Readers can read the layout simultaneously. Writer takes a WRITE inodelk on the directory (whose layout is being modified) across ALL subvols. Reader takes a READ inodelk on the directory (whose layout is being read) on ANY subvol. 2. Consistency of directory namespace across subvols. The path and associated gfid should be same on all subvols. A gfid should not be associated with more than one path on any subvol. All fops that can change directory names (mkdir, rmdir, renamedir, directory creation phase in lookup-heal) takes an entrylk on hashed subvol of the directory. NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a directory, the transaction itself is a consumer of layout on parent directory. So, the transaction is a reader of parent layout and does an inodelk on parent directory just like any other layout reader. So a mkdir (dir/subdir) would: > Acquire a READ inodelk on "dir" on any subvol. > Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir". > creates directory on hashed subvol and possibly on non-hashed subvols. > UNLOCK (entrylk) > UNLOCK (inodelk) NOTE2: mkdir fop while setting the layout of the directory being created is considered as a reader, but NOT a writer. The reason is for a fop which can consume the layout of a directory to come either of the following conditions has to be true: > mkdir syscall from application has to complete. In this case no need of synchronization. > A lookup issued on the directory racing with mkdir has to complete. Since layout setting by a lookup is considered as a writer, only one of either mkdir or lookup will set the layout. Code re-organization: All the lock related routines are moved to "dht-lock.c" file. New wrapper function is introduced to take blocking inodelk followed by entrylk 'dht_protect_namespace' Updates #191 Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31 Signed-off-by: Kotresh HR <khiremat@redhat.com> BUG: 1443373 Reviewed-on: https://review.gluster.org/15472 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: correct space check for rebalanceSusant Palai2017-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | With rebalance doing fallocate on destination, we don't need to add file size to the "destination available space" to decide whether to migrate the file or not. Notes: Fallocate would have already occupied the file size space on destination Change-Id: If7f6a6654e6257726680cf20d618482a6e9095a6 BUG: 1441508 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17104 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Implement self-heal-window_size optionSunil Kumar Acharya2017-04-253-3/+16
| | | | | | | | | | | | | | | | Fix implements the heal window size option for EC. This option control the maximum size of read/write operation carried out in self-heal process. BUG: 1441491 Change-Id: I6c0ef65c9ca18b0828f91b319d4f52ac5b77d0d8 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/17098 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Pass the req dict instead of NULL in dht_attr2()Krutika Dhananjay2017-04-243-54/+71
| | | | | | | | | | | | | | | | | | | | | This bug was causing VMs to pause during rebalance. When qemu winds down a STAT, shard fills the trusted.glusterfs.shard.file-size attribute in the req dict which DHT doesn't wind its STAT fop with upon detecting the file has undergone migration. As a result shard doesn't find the value to this key in the unwind path, causing it to fail the STAT with EINVAL. Also, the same bug exists in other fops too, which is also fixed in this patch. Change-Id: Id7823fd932b4e5a9b8779ebb2b612a399c0ef5f0 BUG: 1440051 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/17085 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: rm -rf fails if dir has stale linkto filesN Balachandran2017-04-211-45/+203
| | | | | | | | | | | | | | | | | | | rm -rf <dir> fails with ENOENT if dir contains a lot of stale linkto files. This is because a single readdirp is sent as part of the rmdir which would return and delete only as many linkto files on the bricks as would fit in one readdirp buffer. Running rm -rf <dir> multiple times will eventually delete all the files. The fix sends readdirp on each subvol until no more entries are returned. Change-Id: I447f2d193de4bd8ac16e4541c6b919d22250e39e BUG: 1442724 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: https://review.gluster.org/17065 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/afr: GFID split brain resolution with favorite-child-policykarthik-us2017-04-202-45/+162
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently the automatic split brain resolution with favorite child policy is not resolving the GFID split brains. Fix: When there is a GFID split brain and the favorite child policy is set to size/mtime/ctime/majority, based on the policy decide on the source and sinks. Delete the entry from the sinks and recreate it from the source. Mark the appropriate pending attributes and resolve the GFID split brain. When the heal takes place it will complete the pending heals and reset the attributes. Change-Id: Ie30e5373f94ca6f276745d9c3ad662b8acca6946 BUG: 1430719 Signed-off-by: karthik-us <ksubrahm@redhat.com> Reviewed-on: https://review.gluster.org/16878 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Skip file migration if the subvol that meets min-free-diskSusant Palai2017-04-193-25/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | criteria happens to be the same subvol containing data-file Rebalance need to figure out a new subvol in case the hashed subvol does not have enough space. In the process of figuring out the new subvol, we need to ignore the source subvol, otherwise it will lead to data loss. Test: Manual Ran the following sizeof /tmp/1: 1.5GB sizeof /brick/1: 16GB sizeof /tmp/2: 1.5GB <start> glusterd; gluster v create test1 vm1:/brick/1 vm1:/tmp/1; gluster v start test1; mount -t glusterfs vm1:test1 /mnt; for i in {1..2000} do dd if=/dev/zero of=/mnt/file$i bs=1KB count=1 &> /dev/null; done gluster v add-brick test1 vm1:/tmp/2 gluster v set test1 min-free-disk 12GB gluster v remove-brick test1 vm1:/tmp/1 star <end> file count and data were intact. Change-Id: Ib8fc8467a3d48a7c12958824c4f0b88e160b86c1 BUG: 1441508 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17064 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr: don't do a post-op on a brick if op failedRavishankar N2017-04-181-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In afr-v2, self-blaming xattrs are not there by design. But if the FOP failed on a brick due to an error other than ENOTCONN (or even due to ENOTCONN, but we regained connection before postop was wound), we wind the post-op also on the failed brick, leading to setting self-blaming xattrs on that brick. This can lead to undesired results like healing of files in split-brain etc. Fix: If a fop failed on a brick on which pre-op was successful, do not perform post-op on it. This also produces the desired effect of not resetting the dirty xattr on the brick, which is how it should be because if the fop failed on a brick, there is no reason to clear the dirty bit which actually serves as an indication of the failure. Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4 BUG: 1438255 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16976 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* dht: Add missing braces in dht_opendirPoornima G2017-04-131-1/+2
| | | | | | | | | | | Change-Id: I6adce98f52e17953f501bc590ff7189cceac3c31 BUG: 1431908 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17057 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Make rebalance honor min-free-diskSusant Palai2017-04-133-17/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | test: Manual created files of size 1K on 2 brick(of size 1GB) setup . added a brick of size 16GB. set min-free-disk to 12GB(so that first two bricks won't receive any files). removed one of the 1st brick of size 1GB. Logs from test: [2017-04-12 08:52:08.196484] W [MSGID: 0] [dht-rebalance.c:895:__dht_check_free_space] 0-test1-dht: Write will cross min-free-disk for file - /tile32 on subvol - test1-client-1. Looking for new subvol. [2017-04-12 08:52:08.196904] I [MSGID: 0] [dht-rebalance.c:925:__dht_check_free_space] 0-test1-dht: new target found - test1-client-2 for file - /tile32 - Post migration we have two files. The new destination (/brick/1) has the data file [root@vm1 ~]# ll /brick/1/tile32 -rw-r--r--. 2 root root 0 Apr 12 14:22 /brick/1/tile32 - On the old target the linkto file is there with linkto xattr pointing to /brick/1 [root@vm1 ~]# ll /tmp/2/tile32 ---------T. 2 root root 1000 Apr 12 14:22 /tmp/2/tile32 [root@vm1 ~]# getfattr -m . -de text /tmp/2/tile32 getfattr: Removing leading '/' from absolute path names security.selinux="unconfined_u:object_r:user_tmp_t:s0" trusted.gfid="����:Aс�#�/'b2" trusted.glusterfs.dht.linkto="test1-client-2" Marking ./tests/features/worm_sh.t as bad test. Reason being, this patch failed on master branch as well and it has nothing to do with rebalance/remove-brick. BUG: 1441508 Change-Id: I90bae251cda3d957a49cdceda90cd08311a392fb Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17034 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec : Don't count healing brick as healthy brickAshish Pandey2017-04-121-1/+1
| | | | | | | | | | | | | | | | | | | In ec_child_select, we should send fop on healing bricks unconditionaly but to check the number of healthy bricks against fragments and minimum count, we should not count these healing bricks. Count bits of fop->mask before adding ealing brick to fop->mask Change-Id: I3fa80bdd5ca34ca070d610116b84154b917c5999 BUG: 1439527 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/17007 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* dht/rebalance: Crawler performance improvementSusant Palai2017-04-101-127/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The job of the crawler in rebalance is to fetch files from each local subvolume and push them to migration queue if it is eligible for migration. And we do a lookup on the entries received to figure out the eligibilty. Since, the lookup done is on a local subvolume we receive linkto files and regular files as well. This requires us to do two lookups. first: do a lookup on the file to figure out whether it is a linkto file second: do a lookup on the file to figure out if it should be migrated Note: The migrator thread also does one lookup for the file before migration. Optimization: Remove the lookup done by the crawler. Offload these task to the migrator threads. For linkto file verification get the stat and xattr information from readdirp. So in total we have one lookup instead of three for each entry. Performance numbers: Create two node, two brick setup. Created 100000 files. And started rebalance. Since, there is no add-brick, no files will be migrated and we will get the crawler performance. Without patch: [root@gprfs039 ~]# grs Node Rebalanced-files size scanned failures skipped status run time in h:m:s --------- ----------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 50070 0 0 completed 0:0:48 server2 0 0Bytes 49930 0 0 completed 0:0:44 volume rebalance: test1: success Total: 48 seconds WiththecurrentPatch: [root@gprfs039 mnt]# gluster v rebalance test1 status Node Rebalanced-files size scanned failures skipped status run time in h:m:s --------- ----------- ----------- ----------- ----------- ----------- ------------ -------------- localhost 0 0Bytes 50070 0 0 completed 0:0:12 server2 0 0Bytes 49930 0 0 completed 0:0:12 volume rebalance: test1: success Total: 12 seconds That's 4X speed gain. :) Updates glusterfs#155 Change-Id: Idc8e5b366e76c54aa40d698876ae62fe1630b6cc BUG: 1439571 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/15781 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: The xattrs sent in readdirp should be sent in opendir aswellPoornima G2017-04-061-28/+44
| | | | | | | | | | | | | | As readdir-ahead can be loaded as a child of dht, dht has to specify the xattrs it is intrested in, as part of opendir call itself. Change-Id: I012ef96cc143b0cef942df78aa7150d85ec38606 BUG: 1431908 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16902 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* build: clang has __builtin_popcount() and __builtin_ffs()Kaleb S. KEITHLEY2017-04-051-1/+1
| | | | | | | | | | | | | | | | | Note: Even though gcc(1) will automatically treat ffs() and popcount() as built-in, calling them explicitly as __builtin in the source helps make it easier to find them. (And if no __builtin_ffs use the one in libc.) Change-Id: Ib74d9b221ff03a01df5ad05907024da1a83a7a88 BUG: 1438772 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/16993 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/dht: Modify local->loc.gfid in thread safe mannerPranith Kumar K2017-04-041-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: local->loc.gfid in dht_lookup_directory() will be null-gfid for a fresh lookup. dht_lookup_dir_cbk() updates local->loc.gfid while in other thread dht_lookup_directory() is still winding lookup calls to subvolumes so there is a chance of partial gfid being seen by EC. We saw in 12x(4+2) volume, ec is receiving an loc where the gfid has last 10 bytes matching with the gfid of the directory and the first 4 bytes are all-zeros. This is leading to EC erroring out the lookup with EINVAL which leads to NFS failing lookup with EIO. snip from gdb: $37 = (dht_local_t *) 0x7fde5de5b3cc (gdb) p /x $37->loc.gfid $39 = {0x3b, 0x82, 0x10, 0x5e, 0x40, 0x65, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c, 0x2c, 0xb8, 0x56} (gdb) fr 7 state=<optimized out>) at ec-generic.c:837 837 ec_lookup_rebuild(fop->xl->private, fop, cbk); (gdb) p /x fop->loc[0].gfid $40 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x43, 0x14, 0xa0, 0xc6, 0x8, 0xf5, 0x6c, 0x2c, 0xb8, 0x56} snip from log: [2017-01-29 03:22:30.132328] W [MSGID: 122019] [ec-helpers.c:354:ec_loc_gfid_check] 0-butcher-disperse-4: Mismatching GFID's in loc [2017-01-29 03:22:30.132709] W [MSGID: 112199] [nfs3-helpers.c:3515:nfs3_log_newfh_res] 0-nfs-nfsv3: /linux-4.9.5/Documentation => (XID: b27b9474, MKDIR: NFS: 5(I/O error), POSIX: 5(Input/output error)), FH: exportid 00000000-0000-0000-0000-000000000000, gfid 00000000-0000-0000-0000-000000000000, mountid 00000000-0000-0000-0000-000000000000 [Invalid argument] Fix: update local->loc.gfid in last-call to make sure there are no races. BUG: 1438411 Change-Id: Ifcb7e911568c1f1f83123da6ff0cf742b91800a0 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16986 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* build: miscellaneous spelling fixesPatrick Matthäi2017-04-021-1/+1
| | | | | | | | | | | | | | | | Debian builds detected spelling issues with GlusterFS 3.10.1. Instead of carrying the patch in the Debian sources, let's include the fixes here too. Change-Id: I38db6adf142f7ec247bffd47aa1e6ff1a0c49e00 Reported-by: Patrick Matthäi <pmatthaei@debian.org> BUG: 1437853 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16973 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* syncop: don't wake task in synctask_wake unless really neededRavishankar N2017-03-281-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In EC and AFR, we launch synctasks during self-heal. (i) These tasks usually stackwind a FOP to all its children and call synctask_yield() which does a swapcontext to synctask_switchto() and puts the task in syncenv's waitq by calling __wait(task). This happends as long as the FOP ckbs from all children haven't been received. (ii) For each FOP cbk, we call synctask_wake() which again does a swapcontext to synctask_switchto() which now puts the task in syncenv's runq by calling __run(task). When the task runs and the conext switches back to the FOP path, it puts the task in waitq because we haven't heard from all children as explained in (i). Thus we are unnecessarily using the swapcontext syscalls to just toggle the task back and forth between the waitq and runq. Fix: Store the stackwind count in new variable 'syncbarrier->waitfor' before winding the fop. In each cbk when we call synctask_wake(), perform an actual wake only if the cbk count == stackwind count. Change-Id: Id62d3b6ffed5a8c50f8b79267fb34e9470ba5ed5 BUG: 1434274 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/16931 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Undo pending xattrs only on the up brickskarthik-us2017-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | Problem: While doing conservative merge, even if a brick is down, it will reset the pending xattr on that. When that brick comes up, as part of the heal, it will consider this brick as the source and removes the entries on the other bricks, which leads to data loss. Fix: Undo pending only for the bricks which are up. Change-Id: I18436fa0bb1faa5f60531b357dea3f6b20446303 BUG: 1433571 Signed-off-by: karthik-us <ksubrahm@redhat.com> Reviewed-on: https://review.gluster.org/16913 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* cluster/ec: Metadata healing fails to update the versionSunil Kumar Acharya2017-03-211-7/+5
| | | | | | | | | | | | | | | | | | During meatadata heal, we were not updating the version though all the inode attributes were in sync. Updated the code to adjust version when all the inode attributes are in sync. BUG: 1425703 Change-Id: I6723be3c5f748b286d4efdaf3c71e9d2087c7235 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/16772 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* afr: do not mention split-brain in log message in read_txnRavishankar N2017-03-201-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | I am seeing a lot of messages in qe/customer logs where read_txn complains that file is possibly in split-brain because of no readable subvol being found, does inode refresh and then there is no split-brain message post the inode refresh. This means that a lookup was not issued on the indoe to populate 'readable' or it can mean one brick is source for data and the other for metadata, making readable to be zero (because readable=intersection of (data,metadata readable) since commit 7a1c1e290470149696. Since we anyway log actual split-brains post inode-refresh, move this message to DEBUG log level. Change-Id: Idb88b8ea362515279dc9b246f06b6b646c6d8013 BUG: 1433838 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16879 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* dht/rebalance: update op_errno for rebalance failuresSusant Palai2017-03-091-3/+4
| | | | | | | | | | | | | | | | | | function "rebalance_task_completion" did not honor returned values other than -1 and 1. This led to ignoring errno e.g in the current bug, ENOTSUP returned by __is_file_migratable and it returned EINVAL(op_errno initialized) to it's caller leading to failure messages logged in ERROR level. Change-Id: I45549fba35c72b278539269b750768fd89e0faf2 BUG: 1393338 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/15810 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec: Don't mark dirty on entry/meta ops in query-infoPranith Kumar K2017-03-072-9/+2
| | | | | | | | | | | | | | | | | | | | | We wanted to mark dirty for metadata/entry operations whenever query-info is set and info is not yet there because we are anyway sending xattrop over the network. But this is causing 25% regression from 3.8.8 so removing this optimization Also fixed two small issues that we didn't find in the previous patch 1) reconfigure failure was sending return value 0 for optimistic-changelog 2) ec->optimistic_changelog was set to true even before OPTION_INIT BUG: 1408809 Change-Id: Iabb0b64bd4d3623688790e4b67e5c20b4da977a1 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16865 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* afr: restore atime/mtime for non-regular filesRavishankar N2017-03-064-51/+64
| | | | | | | | | | | | | | | | | | | AFR restores atime/mtime only as a part of data heal. For non-regular files (dirs, symlinks, char/block/socket files etc) which do not undergo data-heal, atime/mtime is not restored. This patch restores atime/mtime as a part of metadata heal for such files. Change-Id: Id8da885fc93fdf65c2f4bae2af3605b146ac1f16 BUG: 1429198 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16844 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Introduce optimistic changelog in ECPranith Kumar K2017-03-044-6/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Fix to https://bugzilla.redhat.com/show_bug.cgi?id=1316873 has made changes to set dirty flag before every update fop, data or metadata, and unset it after successful operation. That makes some of the fops very slow such as entry operations or metadata operations. Solution: File data operations are the only operation which take some time and setting dirty flag before a fop and unsetting it after serves the purpose as probability of failure of a fop is high when the time duration is more. For all the other operations, set dirty flag at the end of the fop, if any brick is down and need heal. Providing following option to choose between high performance or better heal marking for metadata and entry fops. Set/Unset dirty flag for every update fop at the start of the fop. If ON, this option impacts performance of entry operations or metadata operations as it will set dirty flag at the start and unset it at the end of ALL update fop. If OFF and all the bricks are good, dirty flag will be set at the start only for file fops For metadata and entry fops dirty flag will not be set at the start, if all the bricks are good. This does not impact performance for metadata operations and entry operation but has a very small window to miss marking entry as dirty in case it is required to be healed. Thanks to Xavi and Ashish for the design Picked the .t file from Ashish' patch https://review.gluster.org/16298 BUG: 1408809 Change-Id: I3ce860063f0e2901e50754dcfc3e4ed22daf819f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16821 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/dht: Fix crash in "nuke-dir" featureKrutika Dhananjay2017-03-021-1/+10
| | | | | | | | | | | | | | | | | | | My patch at https://review.gluster.org/16419 is resulting in core dumps everytime I run tests/features/nuke.t. Turns out dht, upon successfully "nuking" a directory, which was initiated through a setxattr, unwinds the operation with rmdir fop signature, resulting in readdir-ahead casting a struct iatt (preparent) to dict_t, leading to a crash. Change-Id: If5f50417be9eb93e731b06c79b9bf027e5dd4d55 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: https://review.gluster.org/16829 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* dht/rebalance: Increase maximum read block size from 128 KB to 1 MBShreyas Siravara2017-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: - The maximum block size, `DHT_REBALANCE_BLKSIZE`, is set to 128 KB. - As a result, migrating files in the megabytes to gigabytes can take much longer than necessary. - Some preliminary results by bumping the blocksize: With 128 KB: [2016-08-04 11:40:19.251167] I [MSGID: 109028] [dht-rebalance.c:2196:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 15.00 secs [2016-08-04 11:40:19.251189] I [MSGID: 109028] [dht-rebalance.c:2200:gf_defrag_status_get] 0-glusterfs: Files migrated: 49, size: 2569011200, lookups: 149, failures: 0, skipped: 0 With 1 MB: [2016-08-04 11:41:21.093662] I [MSGID: 109028] [dht-rebalance.c:2196:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 7.00 secs [2016-08-04 11:41:21.093687] I [MSGID: 109028] [dht-rebalance.c:2200:gf_defrag_status_get] 0-glusterfs: Files migrated: 49, size: 2569011200, lookups: 149, failures: 0, skipped: 0 - This is a cherry-pick of D3670927 to 3.8. Test Plan: Tested rebalance on devserver. Reviewed By: dph, rwareing Change-Id: Ide2edbf87ef9ae2b32a03f189c57b63e2f233fc8 BUG: 1428055 Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Reviewed-on: https://review.gluster.org/16797 Tested-by: Vijay Bellur <vbellur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* Remove double free for disk_layoutMichael Scherer2017-02-271-2/+0
| | | | | | | | | | | | | | | | | Since disk_layout is freed after the jump to err, we do not need to free it again before the goto. Found by coverity scan. Change-Id: Ie0c0262f6b95c51c61a59faefbca70352bf1e604 BUG: 789278 Signed-off-by: Michael Scherer <misc@redhat.com> Reviewed-on: https://review.gluster.org/16720 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Michael Scherer <misc@fedoraproject.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>