Commit messages
|
Change-Id: Id91ef35f890055cd42b9a94462f92297c77f1fff
Bug: 1475282
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17868
Tested-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
To calculate the available space on a subvolume, __dht_check_free_space
used to do the following:
post_availspace = (dst_statfs.f_bavail * dst_statfs.f_frsize) - stbuf->ia_size
Subtracting the file size from the available space is tricky here:
sometimes the available space is less than the file size, and since all the
participating members of the calculation are unsigned integers, the result
wraps around to a huge number (unsigned integer wraparound).
Solution: We do not need to subtract the file size from the available space,
since fallocate has already reserved the file size worth of space.
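A minimal sketch of the wraparound (illustrative types and values, not the
exact dht code):

    uint64_t avail = dst_statfs.f_bavail * dst_statfs.f_frsize; /* say 1 MB free */
    uint64_t post_availspace = avail - stbuf->ia_size;          /* ia_size say 4 MB */
    /* 1 MB - 4 MB wraps to ~2^64 - 3 MB, so the free-space check passes wrongly */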
Change-Id: I4f724358c44b9911933742ff3ff8d55b3dfda1cb
BUG: 1475282
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17876
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
DHT fd-based fops will now check whether the fd is
open on the cached subvol only if the call fails
with EBADF.
This improves performance for scenarios where
a rebalance is not running, which is most of
the time.
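A rough sketch of the lazy-check pattern in a fop callback (the helper name
here is an assumption; the real logic lives in the DHT fd-fop paths):

    if (op_ret < 0 && op_errno == EBADF) {
            /* fd not yet opened on this subvol: open it and retry the fop */
            dht_check_and_open_fd_on_subvol (this, frame);
            return 0;
    }
    /* common case: no extra fd-ctx check, just unwind */
    DHT_STACK_UNWIND (readv, frame, op_ret, op_errno, vector, count,
                      stbuf, iobref, xdata);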
Change-Id: Idfaeb8927af769c6110d07a165a0fe2307369239
BUG: 1476665
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17922
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
Problem:
In the case of truncate, if writev or open fails on a brick,
in some cases the failure is not marked on lock->good_mask.
This causes the size and version to be updated on all the bricks
even though the fop failed on one of them, which ultimately
causes data corruption.
Solution:
In the callbacks of such writev and open calls, mark fop->good
on the parent too.
Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the
root cause.
Change-Id: I8a1da2888bff53b91a0d362b8c44fcdf658e7466
BUG: 1476205
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17906
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
Every time a thread sleeps or wakes up, we log a message
about that event. This can be noisy when the files eligible
for migration are placed far apart from each other.
Moving the logs to DEBUG.
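For instance (the message text is illustrative):

    /* before: logged at INFO for every sleep/wake event */
    gf_msg (this->name, GF_LOG_INFO, 0, 0, "Thread woke up");
    /* after: visible only with debug logging enabled */
    gf_msg_debug (this->name, 0, "Thread woke up");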
Change-Id: I4dc2cc9fdf4f42d4001754532a5bc4aeb3f0f959
BUG: 1474639
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17866
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
The calculation of the rebalance estimates will start
after the rebalance operation has been running for 10
minutes. This patch also changes the CLI rebalance status
code to use unsigned variables for the time calculations.
Change-Id: Ic76f517c59ad938a407f1cf5e3b9add571690a6c
BUG: 1457985
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17863
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
The size of non-migrated files was not added to the
size_processed causing incorrect rebalance estimate
calculations. This has been fixed.
Change-Id: I9f338c44da22b856e9fdc6dc558f732ae9a22f15
BUG: 1467209
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17867
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
Corrected the iterator for looping over the list of
decommissioned bricks while checking if the new target
determined because of min-free-disk values has been
decommissioned.
Change-Id: Iee778547eb7370a8069e954b5d629fcedf54e59b
BUG: 1474318
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17861
Reviewed-by: Susant Palai <spalai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
local->call_cnt was being accessed and updated inside
the loop in which the entries were processed and the calls
were wound.
This could end up in a scenario where local->call_cnt became
0 before the processing was complete, causing a crash when the
next entry was processed.
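The usual fix for this pattern, sketched with assumed names:

    int call_cnt = local->call_cnt;   /* snapshot before winding */

    for (i = 0; i < call_cnt; i++) {
            /* a fast callback may decrement local->call_cnt and free
             * local via STACK_UNWIND, so never reread it here */
            STACK_WIND (frame, dht_entry_cbk, subvols[i],
                        subvols[i]->fops->lookup, &local->loc, NULL);
    }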
Change-Id: I930f61f1a1d1948f90d4e58e80b7d6680cf27f2f
BUG: 1472949
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17825
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
Set names to threads on creation for easier
debugging.
Output of top -H -p <PID-OF-GLUSTERFSD>
Before:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterfsd
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
After:
19773 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19774 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustertimer
19775 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterfsd
19776 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustermemsweep
19777 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc0
19778 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glustersproc1
19779 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll0
19780 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteridxwrker
19781 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusteriotwr0
19782 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrssign
19783 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterbrswrker
19784 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterclogecon
19785 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd0
19786 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd1
19787 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.01 glusterclogd2
19789 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixjan
19790 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixfsy
25178 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll1
5398 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterepoll2
7881 root 20 0 1301.3m 12.6m 8.4m S 0.0 0.1 0:00.00 glusterposixhc
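The names are applied at thread creation time; on Linux this ultimately
uses pthread_setname_np, which caps names at 15 characters plus the
terminating NUL (hence the truncated names above). A minimal sketch:

    #define _GNU_SOURCE
    #include <pthread.h>

    pthread_t thr;
    pthread_create (&thr, NULL, epoll_worker, NULL);  /* hypothetical worker */
    pthread_setname_np (thr, "glusterepoll0");        /* <= 15 chars + NUL */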
Change-Id: Id5f333755c1ba168a2ffaa4fce6e71c375e10703
BUG: 1254002
Updates: #271
Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-on: https://review.gluster.org/11926
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
Problem:
Currently there is no way for the admin from CLI to resolve gfid
split-brain based on some policy like choice of the brick, mtime
or size.
Fix:
With the existing CLI options based on size, mtime, and choice of
brick, we do a lookup on the parent for the specified file. As
part of the lookup, if we find a gfid mismatch, we resolve it
based on the policy and return. If the file is not in gfid split-
brain, then we check for data and metadata split-brain in the
getxattr code path, and resolve those if present.
This works only when the absolute path to the file is given to the
CLI, not the gfid of the file. Hence the source-brick policy
without any file path will also not resolve gfid split-brain,
since it operates on the gfid of the files. But it can resolve any
other type of split-brain, skipping the gfid mismatch resolution
with the usual error message.
Reverting the change https://review.gluster.org/17290. This patch
resolves the issue.
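Illustrative CLI invocations for the three policies (check
`gluster volume heal help` on your build for the exact syntax):

    gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
    gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
    gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>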
Fixes gluster/glusterfs#135
Change-Id: Iaeba6fc32f184a34255d03be87cda02773130a09
BUG: 1459530
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Reviewed-on: https://review.gluster.org/17485
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
Problem:
Enabling optimistic changelog on an EC volume did not
handle node-down scenarios appropriately, resulting
in volume data inaccessibility.
Solution:
Update the dirty xattr appropriately on good bricks whenever
nodes are down. This fixes the metadata information
as part of heal and thus ensures data accessibility.
BUG: 1468261
Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17703
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
Problem:
In a 3-way replica, when the source brick does not have pending xattrs
for the sinks, but the 2 sinks blame each other, metadata heal was not
happening because we were not setting all non-sources as sinks.
Fix: Mark all non-sources as sinks, as is done in data and entry
heal.
Change-Id: I534978940f5087302e307fcc810a48ffe898ce08
BUG: 1468279
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/17717
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
Problem:
To allow parallel writes we cannot depend on ia_size being the same on
all the bricks in each write_cbk(). But we need to make sure the backend
size is correct on all the bricks and that no crashes/manual
modifications happened.
Fix:
At the time of get_size_version() we do one check to make sure the size
of the file is the same across the bricks. From then on each FOP reports
its status, so we rely on that information to track which bricks are
good/bad.
Updates #251
Change-Id: I1df645347e2e9f2e09cfa4411b6cc305d7f4e4e5
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17741
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
A brief description of how hardlink migration works:
- Different hardlinks (to the same file) may hash to different bricks,
but their cached subvol will be the same. Rebalance picks up the first
hardlink, calculates its hash (call it TARGET) and sets the hashed
subvolume as an xattr on the data file.
- All the hardlinks that come after this will fetch that xattr and will
create linkto files on TARGET (all linkto files for the hardlinks will be
hardlinks to each other on TARGET).
- When the number of hardlinks on the source is equal to the number of
hardlinks on TARGET, the data migration happens.
RACE:1
Since rebalance is multi-threaded, the first lookup (which decides what the
TARGET subvol should be) can be issued by two hardlink migrations in
parallel, and they may end up creating linkto files on two different TARGET
subvols. Hence, the hardlinks won't be migrated.
Fix: Rely on the xattr response of the lookup inside
gf_defrag_handle_hardlink, since it is executed under a synclock.
RACE:2
The linkto files on TARGET can also be created by other clients if they are
doing lookups on the hardlinks. Consider a scenario where you have 100
hardlinks. When rebalance is migrating the 99th hardlink, as a result of
continuous lookups from another client, the linkcount on TARGET equals the
source linkcount. Rebalance will migrate the data on the 99th hardlink
itself. On the 100th hardlink migration, the hardlink will have TARGET as
its cached subvolume. If its hash is also the same, then a migration will
be triggered from TARGET to TARGET, leading to data loss.
Fix: Make sure that before the final data migration, the source is not the
same as the destination.
RACE:3
Since a hardlink can be migrating to a non-hashed subvolume, a lookup from
another client, or even rebalance itself, might delete the linkto file on
TARGET, leading to hardlinks never getting migrated.
This will be addressed in a different patch in the future.
Change-Id: If0f6852f0e662384ee3875a2ac9d19ac4a6cea98
BUG: 1469964
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17755
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
If the target of a file migration was changed because
of min-free-disk limits, the dst_fd was closed but the
clean_dst flag was not set back to false. If the file could
not be created on the new target for some reason, the
ftruncate call to clean up the dst was sent on the now
invalid fd, causing the process to deadlock.
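In essence (simplified sketch; the helper signatures are assumptions):

    /* target changed due to min-free-disk: drop the old dst fd */
    syncop_close (dst_fd);
    clean_dst = _gf_false;   /* the missing reset: dst_fd is no longer valid */

    /* ... later, on a failure path ... */
    if (ret < 0 && clean_dst) {
            /* truncate the partial dst copy: only reached now while
             * dst_fd is actually valid */
            syncop_ftruncate (to, dst_fd, 0, NULL, NULL);
    }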
Change-Id: I5bfa80f519b04567413d84229cf62d143c6e2f04
BUG: 1469029
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17735
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
There is another race between the cached subvol
being updated in the inode_ctx and the fd being opened on
the target:
1. fop1 -> fd1 -> subvol0
2. file migrated from subvol0 to subvol1 and cached_subvol
changed to subvol1 in inode_ctx
3. fop2 -> fd1 -> subvol1 [takes new cached subvol]
4. fop2 -> checks fd ctx (fd not open on subvol1) -> opens fd1 on subvol1
5. fop1 -> checks fd ctx (fd not open on subvol0)
-> tries to open fd1 on subvol0 -> fails with "No such file or directory".
Fix:
If dht_fd_open_on_dst fails with ENOENT or ESTALE, wind to old subvol
and let the phase1/phase2 checks handle it.
Change-Id: I34f8011574a8b72e3bcfe03b0cc4f024b352f225
BUG: 1465075
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17731
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
|
The earlier approach of using the number of files
to determine when the rebalance would complete did
not work well when file sizes differed widely.
The new approach now gets the total data size and
uses that information to determine how long
the rebalance is expected to take.
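Roughly, the estimate works out as:

    rate      = size_processed / elapsed_seconds
    time_left = (total_data_size - size_processed) / rate

so a few large pending files no longer skew the projection the way a raw
file count did.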
Change-Id: I84e80a0893efab72ff06130e4596fa71c9c8c868
BUG: 1467209
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17668
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
Problem:
Consider a 4 + 2 EC volume configuration.
If an untar of the linux kernel is going on and we kill a brick,
indices will be created for the files/dirs which need
to be healed. ec_shd_index_sweep spawns threads to
scan these entries and start heal. If in the middle
of this we kill one more brick, we end up in a
situation where an entry cannot be healed, as only
"ec->fragment" number of bricks are UP.
However, the scan continues and triggers heal for
those entries.
Solution:
When a heal is triggered for an entry, check whether it
*CAN* be healed. If not, bail out with ENOTCONN.
Change-Id: I305be7701c289f36bd7bde22491b71074771424f
BUG: 1464359
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17692
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
When a SEEK_HOLE was issued near the end of the file, sometimes an
offset beyond the end of the file was returned. Another problem was
that some offsets greater than the end of the file were answered
successfully instead of failing with ENXIO.
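The semantics the fix enforces match lseek(2):

    off_t off = lseek (fd, offset, SEEK_HOLE);
    /* offset < file size: returns the start of the next hole (EOF is
     * treated as an implicit hole), never a value beyond EOF.
     * offset >= file size: returns -1 with errno == ENXIO. */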
Change-Id: I238d2884ba02fd19a78116b0f8f8e8d6338fb3f5
BUG: 1449348
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/17228
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
Plus minor readability improvements.
Reported-by: pmatthaei@debian.org
Change-Id: I5393819a2fc9f240a19811143bb57b127df717cf
BUG: 1466785
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/17660
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
Use a local variable to store the call count
in the STACK_WIND for loop. Using frame->local
is dangerous as it could be freed while the loop
is still being processed
Change-Id: Ie65cdcfb7868509b4a83bc2a5b5d6304eabfbc8e
BUG: 1466110
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17645
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Nigel Babu <nigelb@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
If an fd is opened on a file, the file is migrated
and the cached subvol is updated in the inode_ctx
before an fd based fop is sent, the fop is sent to
the dst subvol on which the fd is not opened.
This causes the FOP to fail with EBADF.
Now, every fd based fop will check to see that the fd
has been opened on the dst subvol before winding it down.
Change-Id: Id92ef5eb7a5b5226688e2d2868b15e383f5f240e
BUG: 1465075
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17630
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
Problem:
"gluster v heal <volname> info" takes a
long time to respond when a brick is down.
RCA:
The heal info command does a virtual mount.
EC waits for 10 seconds before sending the UP call to the upper
xlator, to get a notification (DOWN or UP) from all the bricks.
Currently, we increase ec->xl_notify_count based on
the current status of the brick. So, if a DOWN event notification
comes in and the brick is already down, we do not increase
ec->xl_notify_count in ec_handle_down.
Solution:
Handle a DOWN event notification irrespective of the
current status of the brick.
Change-Id: I0acac0db7ec7622d4c0584692e88ad52f45a910f
BUG: 1464091
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/17606
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
AFR has introduced a new key, GF_XATTR_LIST_NODE_UUIDS_KEY,
through which rebalance will figure out its local subvolumes
(reference bug id 1463250).
The key GF_XATTR_NODE_UUID_KEY will continue to serve its old
purpose of returning the first AFR child.
test: prove tests/basic/distribute/rebal-all-nodes-migrate.t
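From a client mount this can be queried with getfattr (the on-wire key
names below are assumed from the macro names):

    # single uuid (first AFR child), old behaviour preserved
    getfattr -n trusted.glusterfs.node-uuid /mnt/glusterfs/file
    # list of node uuids, one per replica member
    getfattr -n trusted.glusterfs.list-node-uuids /mnt/glusterfs/file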
Change-Id: I4d602feda2a05b29d2210c712a07a4ac6b8bc112
BUG: 1463648
Signed-off-by: Susant Palai <spalai@redhat.com>
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17595
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
The rebalance used to get the file count in the beginning
and not update it. This caused estimates to fail
if the number changed during the rebalance.
The rebalance now updates the file count periodically.
Change-Id: I1667ee69e8a1d7d6bc6bc2f060fad7f989d19ed4
BUG: 1464110
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17607
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
Problem:
The change in EC to return a list of node uuids for
GF_XATTR_NODE_UUID_KEY was causing problems with
geo-rep.
Fix:
This patch allows getting the single node uuid
as before with the key
"GF_XATTR_NODE_UUID_KEY", and also allows getting
the list of node uuids by using a new key,
"GF_XATTR_LIST_NODE_UUIDS_KEY". This solves
the problem with geo-rep and any other features that
were depending on this.
BUG: 1462790
Change-Id: I2d9214a9658d4a41a3d6de08600884d2bda5f3eb
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17594
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
This is similar to an issue fixed for rename in https://review.gluster.org/#/c/16016/
For hardlinks, if the cached and hashed subvolumes are different, DHT will
first create a linkto file on the hashed subvol using root permissions, but
the actual hardlink creation fails with EACCES and the stale linkto file is
never removed. All follow-up hardlink calls with the file name then result
in ESTALE, because the linkto file creation fails with EEXIST, the
follow-up lookup on the linkto file returns a gfid mismatch (old linkto
file), and the call finally fails with ESTALE.
Steps to reproduce:
(From the link/00.t test in the posix-testsuite)
Steps executed in the script:
* create a file "abc" using root
* change the ownership of the file to a non-root user
* create hardlink "link" for "abc" using a non-root user; it fails with EACCES
* delete "abc"
* create directory "abc" using root
* again try to create hardlink "link" for "abc" using the non-root user; it
  fails with ESTALE
Also fixed other bugs in dht_linkfile_create_cbk() and posix_lookup.
Thanks to Susant for the help in debugging the issue and the suggestion for
this patch.
Change-Id: I7a5a1899d3fd1fdb13578b37f9d52a084492e35d
BUG: 1452084
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: https://review.gluster.org/17331
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
Problem:
The change in AFR to return a list of node uuids was causing problems
with geo-rep.
Fix:
This patch allows getting the single node uuid as before
with the key "GF_XATTR_NODE_UUID_KEY", and also allows
getting the list of node uuids by using a new key,
"GF_XATTR_LIST_NODE_UUIDS_KEY". This solves the problem with
geo-rep and any other features that were depending on this.
Change-Id: I09885dac6dfca127be94b708470c8c2941356f9a
BUG: 1462790
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Reviewed-on: https://review.gluster.org/17576
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
There are many gf_msg calls where errno should be passed
as the errnum argument instead of embedding
strerror(errno) in the message.
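An illustrative before/after (the message text and msgid are hypothetical):

    /* before: errno rendered by hand, the errnum slot left as 0 */
    gf_msg (this->name, GF_LOG_ERROR, 0, P_MSG_OPEN_FAILED,
            "open failed: %s", strerror (errno));
    /* after: pass errno as the errnum argument and let the logging
     * framework render it */
    gf_msg (this->name, GF_LOG_ERROR, errno, P_MSG_OPEN_FAILED,
            "open failed");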
Change-Id: I15048a5e0b41f9752a2023afe8470eca6f2cd383
Bug: 1454701
Signed-off-by: AnkitRaj <anraj@redhat.com>
Reviewed-on: https://review.gluster.org/17464
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: N Balachandran <nbalacha@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Karthik U S <ksubrahm@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
|
The rebalance estimates calculation was not handling
calculations correctly when no files had been processed,
i.e., when rate_lookedup was 0.
Now, the estimated time is set to 0 in such scenarios as
there is no way for rebalance to figure out how long the
process will take to complete without knowing the rate at
which the files are being processed.
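A sketch of the guard (variable names assumed):

    if (rate_lookedup > 0)
            time_to_complete = files_remaining / rate_lookedup;
    else
            time_to_complete = 0;  /* nothing processed yet: no basis for an estimate */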
Change-Id: I7b6378e297e1ba139852bcb2239adf2477336b5b
BUG: 1457985
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17564
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
Problem:
At the moment, with a replica 3 or arbiter setup, even when
lk succeeds on just one brick we return success to the application,
which is wrong.
Fix:
Consider quorum-number of successes as success when quorum is enabled.
BUG: 1461792
Change-Id: I5789e6eb5defb68f8a0eb9cd594d316f5cdebaea
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17524
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
|
Problem:
When an application sends a blocking lock, the lk fop actually waits under
an inodelk. This can lead to a deadlock.
1) Let's say app-1 takes an exclusive-fcntl-lock on the file.
2) app-2 attempts an exclusive-fcntl-lock on the file, which goes to the
blocking stage. Note: app-2 is blocked inside a transaction which holds an
inode lock.
3) app-1 tries to perform a write which needs the inode lock, so it gets
blocked on app-2 to unlock the inodelk, while app-2 is blocked on app-1 to
unlock the fcntl lock.
Fix:
The correct way to fix this issue and make fcntl locks perform well would
be to introduce 2-phase locking for fcntl locks:
1) Implement a try-lock phase where the locks xlator will not merge the lk
call with existing calls until a commit-lock phase.
2) If in the try-lock phase we get a quorum number of successes without any
EAGAIN error, then send a commit-lock which will merge the locks.
3) In case there are any errors, unlock should just delete the lock object
that was tried earlier and shouldn't touch the committed locks.
Unfortunately this is a sizeable feature and needs to be thought through
for corner cases. Until then, remove the transaction from the lk call.
BUG: 1455049
Change-Id: I18a782903ba0eb43f1e6526fb0cf8c626c460159
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/17542
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
Updates #234
Change-Id: I016f6d4f1e5ad2ea56a611c1bffbd189f10650db
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: https://review.gluster.org/17525
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: hari gowtham <hari.gowtham005@gmail.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
Empty directories were not being considered while
calculating rebalance estimates leading to negative
time-left values being displayed as part of the
rebalance status.
Change-Id: I48d41d702e72db30af10e6b87b628baa605afa98
BUG: 1457985
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17448
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
Problem-1: Recursive healing of the same file happens
when IO is going on even after data heal completes.
Solution:
RCA: At the end of the write, when ec_update_size_version
gets called, we send it only on the good bricks and not
on the healing brick. Due to this, the xattr on the healing brick
will always remain out of sync, and when the background
heal checks source and sink, it finds this brick needs to
be healed and starts healing from scratch. That involves
an ftruncate and writing all of the data again.
To solve this, send the xattrop on all the good bricks as
well as the healing bricks.
Problem-2: The above fix exposes data corruption
during heal. If a write on a file is going on and
heal finishes, we find that the file gets corrupted.
RCA:
The real problem happens in ec_rebuild_data(). Here we receive the
'size' argument which contains the real file size at the time of
starting self-heal, and it's assigned to heal->total_size.
After that, a sequence of calls to ec_sync_heal_block() is done. Each
call ends up calling ec_manager_heal_block(), which does the actual work
of healing a block.
First a lock on the inode is taken in state EC_STATE_INIT using
ec_heal_inodelk(). When the lock is acquired, ec_heal_lock_cbk() is
called. This function calls ec_set_inode_size() to store the real size
of the inode (it uses heal->total_size).
The next step is to read the block to be healed. This is done using a
regular ec_readv(). One of the things this call does is to trim the
returned size if the file is smaller than the requested size.
In our case, when we read the last block of a file whose size was = 512
mod 1024 at the time of starting self-heal, ec_readv() will return only
the first 512 bytes, not the whole 1024 bytes.
This isn't a problem since the following ec_writev() sent from the heal
code only attempts to write the amount of data read, so it shouldn't
modify the remaining 512 bytes.
However ec_writev() also checks the file size. If we are writing the
last block of the file (determined by the size stored on the inode,
which we have set to heal->total_size), any data beyond the (imposed)
end of file will be cleared with 0's. This causes the 512 bytes after
heal->total_size to be cleared. Since the file was written to after heal
started, these bytes contained data, so the block written to the
damaged brick will be incorrect.
Solution:
Align heal->total_size to a multiple of the stripe size.
Thanks to "Xavier Hernandez" <xhernandez@datalab.es>
for finding the root cause and fixing the issue.
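The alignment itself is simple round-up arithmetic (sketch; heal->total_size
and the stripe size are the quantities described above):

    /* round heal->total_size up to the next stripe boundary so the heal
     * path never treats a partially-read last block as end-of-file and
     * zero-fills live data written after heal started */
    heal->total_size = ((heal->total_size + stripe_size - 1) / stripe_size)
                       * stripe_size;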
Change-Id: I6c9f37b3ff9dd7f5dc1858ad6f9845c05b4e204e
BUG: 1428673
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/16985
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
|
Addresses review comment in https://review.gluster.org/#/c/17413
Change-Id: Ic247729e5e92a5bb0148543764e0b30790444004
BUG: 1456582
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/17436
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
fixes for various minor spelling errors and typos
Reported-by: Patrick Matthäi <pmatthaei@debian.org>
Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
BUG: 1457808
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/17442
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
dht_readdirp must unwind with the list of entries only after
the entire buffer requested by the kernel is filled, to avoid
extra syscalls occurring when returning a partially filled
buffer. Also wind the readdir call to the next subvol on reaching
EOD for the directory on that subvol, to avoid an extra network call.
Change-Id: If2e1a2722f813d95457c7542bff25fef56c7a041
BUG: 1356453
Signed-off-by: Sakshi <sabansal@redhat.com>
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://review.gluster.org/12271
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
|
Problem:
When parallel `rm -rf`s were being done from CIFS clients, opendir might
fail on some replicas with ENOENT. DHT ignores partial opendir failures
in dht_fd_cbk() and winds readdirs on those replicas. AFR inode refresh
(as a part of the readdirp read_txn) sees in its fd context that the state
of the fds is *not* AFR_FD_OPENED and bails out to
afr_inode_refresh_done() without doing a refresh. When this happens, the
errno is set to EIO due to the lack of readable subvols, logging
split-brain messages in the logs.
Fix:
Introduce an errno argument to afr_inode_refresh_do() to bail out with
the right error value when the inode refresh is not performed.
Change-Id: I075707fbb73fd93a923b77b923a96aac79e847f9
BUG: 1456582
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: https://review.gluster.org/17413
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
On-demand migration of files, i.e. migration done by clients
triggered by a setfattr, was broken.
A dependency on defrag led to a crash when migration was triggered from
a client.
Note: This functionality is not available for tiered volumes. Migration
from a tier-served client will fail with ENOTSUP.
Usage (but refer to the steps mentioned below to avoid any issues):
setfattr -n "trusted.distribute.migrate-data" -v "1" <filename>
The purpose of fixing on-demand client migration was to provide a
workaround for users who have lots of empty directories compared to
files and want to do a remove-brick process.
Here are the steps to trigger file migration for the remove-brick process
from a client. (It is highly recommended to follow these steps as is.)
Let's say it is a replica volume and the user wants to remove a replica
pair named brick1 and brick2. (Make sure healing is completed before you
run these steps.)
Step-1: Start remove-brick process
- gluster v remove-brick <volname> brick1 brick2 start
Step-2: Kill the rebalance daemon
- ps aux | grep glusterfs | grep rebalance\/ | awk '{print $2}' | xargs kill
Step-3: Do a fresh mount as mentioned here
- glusterfs -s ${localhostname} --volfile-id rebalance/$volume-name /tmp/mount/point
Step-4: Go to one of the bricks (among brick1 and brick2)
- cd <brick1 path>
Step-5: Run the following command.
- find . -not \( -path ./.glusterfs -prune \) -type f -not -perm 01000 -exec bash -c 'setfattr -n "distribute.fix.layout" -v "1" ${mountpoint}/$(dirname '{}')' \; -exec setfattr -n "trusted.distribute.migrate-data" -v "1" ${mountpoint}/'{}' \;
This command ignores the linkto files and empty directories, does a
fix-layout of the parent directory, and triggers a migration operation on
the files.
Step-6: Once this process is completed, do "remove-brick force"
- gluster v remove-brick <volname> brick1 brick2 force
Note: Use the above script only when there are a large number of empty
directories. Since the script crawls the brick side directly and avoids
empty directories, the time spent fixing the layout on those directories is
eliminated (even if the script does not do a fix-layout on empty
directories, post remove-brick a fresh layout will be built for the
directory, hence not affecting application continuity).
Detailing the expectation for hardlink migration with this patch:
Hardlinks are migrated only by the remove-brick process. It is essential
to have a new mount (step-3) for the hardlink migration to happen. Why?
setfattr is an inode-based operation. Since we are doing setfattr from a
fuse mount here, inode_path will try to build the path from the dentries
linked to the inode. For a file without hardlinks the path construction
will be correct. But for hardlinks, the inode will have multiple dentries
linked.
Without a fresh mount, inode_path will always get the most recently linked
dentry. E.g. if there are three hardlinks named dir1/link1, dir2/link2,
dir3/link3 on a client where these hardlinks have been looked up,
inode_path will always return the path dir3/link3 if dir3/link3 was looked
up most recently. Hence, we won't be able to create linkto files for all
the other hardlinks on the destination (read gf_defrag_handle_hardlink for
more details on hardlink migration).
With a fresh mount, the lookup and setfattr become serialized. E.g. link2
won't be looked up until link1 is looked up and migrated. Hence, inode_path
will always have the correct path; in this case the link1 dentry is picked
up (as this is the most recently looked up inode) and the path is built
right.
Note: If you run the above script on an existing mount (all entries looked
up), hardlinks may not be migrated, but there should not be any other
issue. Please raise a bug if you find any issue.
Tests: Manual
Tests: Manual
Change-Id: I9854cdd4955d9e24494f348fb29ba856ea7ac50a
BUG: 1450975
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17115
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
The directory self-heal code path does not always
have local->hashed_subvol populated. Populate it,
since the self-heal would otherwise fail.
Change-Id: I03b64709fd7a68e28f9e7438243e817c53c6ef5d
BUG: 1455104
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://review.gluster.org/17381
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
The FALLOCATE file operation was not implemented in the
existing EC code. This change set implements it
for EC.
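Once wired up, a client can exercise the fop, e.g. through libgfapi
(illustrative; check glfs.h on your build for the exact prototype):

    #include <glusterfs/api/glfs.h>

    glfs_fd_t *fd = glfs_open (fs, "/somefile", O_RDWR);
    /* reserve 1 MiB at offset 0 on the EC volume */
    int ret = glfs_fallocate (fd, 0 /* mode/keep-size flags */, 0, 1 << 20);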
BUG: 1448293
Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/15200
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
Use a local variable to store the call count used in the
for loop for the STACK_WIND, so as not to access local,
which may be freed by STACK_UNWIND after all fops return.
Change-Id: I24f49b6dbd29a2b706e388e2f6d5196c0f80afc5
BUG: 1452102
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17343
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
Change-Id: Id019b0c6425849eece8a9aba7acec9a521dfb10b
BUG: 1452378
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: https://review.gluster.org/17335
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
The "normal" throttle value differed between dht_init and dht_reconfigure.
Initialization/reconfiguration of the throttle option is now carved out
into a separate function (dht_configure_throttle). The normal value will
be "2".
Change-Id: Ie323eae019af41d6bef0a136e3d284dc82bab9a1
BUG: 1451162
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/17303
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
Problem:
AFR was returning the node uuid of the first node for every file if
the replica set was healthy, which was resulting in only one node
migrating all the files.
Fix:
With this patch AFR returns the list of node_uuids to the upper layer,
so that they can decide on which node to migrate which files, resulting
in improved performance. Ordering of node uuids will be maintained based
on the ordering of the bricks. If a brick is down, then the node uuid
for that will be set to all zeros.
Change-Id: I73ee0f9898ae473584fdf487a2980d7a6db22f31
BUG: 1366817
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Reviewed-on: https://review.gluster.org/17084
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
EC was returning the UUID of the brick with the smallest value. This had
the side effect of not evenly balancing the load between bricks on
rebalance operations.
This patch modifies the common functions that combine multiple subvolume
values into a single result to take into account the subvolume order
and, optionally, other subvolumes that could be damaged.
This makes it easier to add future features where brick order is
important. It also makes it possible to easily identify the originating
brick of each answer, in case some brick has a special meaning in the
future.
Change-Id: Iee0a4da710b41224a6dc8e13fa8dcddb36c73a2f
BUG: 1366817
Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-on: https://review.gluster.org/17297
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
|
Problem:
Rebalance compares the node-uuid of a file against its own
and migrates the file only if they match. However, the
current behaviour in both AFR and EC is to return
the node-uuid of the first brick in a replica set for all
files. This means a single node ends up migrating all
the files if the first brick of every replica set is on the
same node.
Fix:
AFR and EC will return all node-uuids for the replica set.
The rebalance process will divide the files to be migrated
among all the nodes by hashing the gfid of the file and
using that value to select a node to perform the migration.
This patch makes the required DHT and tiering changes.
Some tests in rebal-all-nodes-migrate.t will need to be
uncommented once the AFR and EC changes are merged.
Change-Id: I5ce41600f5ba0e244ddfd986e2ba8fa23329ff0c
BUG: 1366817
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17239
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
Using local->call_cnt to check STACK_WINDs can
cause dht_rmdir_do to be called erroneously if
dht_rmdir_readdirp_cbk unwinds before we check if
local->call_cnt is zero in dht_rmdir_opendir_cbk.
This can cause frame corruptions and crashes.
Thanks to Shyam (srangana@redhat.com) for the
analysis.
Change-Id: I5362cf78f97f21b3fade0b9e94d492002a8d4a11
BUG: 1451083
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17305
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>