glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	client/protocol: fix the log level for removexattr_cbk	Amar Tumballi	2018-05-17	2	-2/+12
\| \| \| \| \| \| \| \| \| \|	noticed that server protocol actually logs all the errors for removexattr as INFO, instead of WARNING like client, and hence, doesn't create a confusion in user. updates: bz#1576418 Change-Id: Ia6681e9ee433fda3c77a4509906c78333396e339 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glusterfs: Resolve brick crashes at the time of inode_unref	Mohit Agrawal	2018-05-15	2	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Sometimes brick process is getting crash at the time of calling inode_unref in fd_destroy Solution: Brick process is getting crash because inode is already free by xlator_mem_cleanup call by server_rpc_notify.To resolve the same move code specific to call transport_unref in last in free_state. BUG: 1577574 Change-Id: Ia517c230d68af4e929b6b753e4c374a26c39dc1a fixes: bz#1577574 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	glusterd/geo-rep: Fix glusterd crash	Kotresh HR	2018-05-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Using strdump instead of gf_strdup crashes during free if mempool is being used. gf_free checks the magic number in the header which will not be taken care if strdup is used. fixes: bz#1576392 Change-Id: Iab36496554b838a036af9d863e3f5fd07fd9780e Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	Glusterfsd: brick crash during get-state	hari gowtham	2018-05-11	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \|	The xprt's dereferencing wasn't checked before using it for the strcmp, which caused the segfault and crashed the brick process. fix: Check every deferenced variable before using it. Change-Id: I7f705d1c88a124e8219bb877156fadb17ecf11c3 fixes: bz#1575864 Signed-off-by: hari gowtham <hgowtham@redhat.com>
*	dht: Excessive 'dict is null' logs in dht_discover_complete	Mohit Agrawal	2018-05-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Problem: In Geo-Rep setup excessive "dict is null" logs in dht_discover_complete while xattr is NULL Solution: To avoid the logs update a condition in dht_discover_complete BUG: 1576767 Change-Id: Ic7aad712d9b6d69b85b76e4fdf2881adb0512237 fixes: bz#1576767 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	features/cloudsync: Remove multiple definition in Makefile.am	Anoop C S	2018-05-10	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CLOUDSYNC_SRC is defined twice in the same Makefile.am which generates the following warning: xlators/features/cloudsync/src/Makefile.am:9: warning: CLOUDSYNC_SRC multiply defined in condition TRUE ... xlators/features/cloudsync/src/Makefile.am:5: ... 'CLOUDSYNC_SRC' previously defined here Therefore removing the duplicate definition. Change-Id: I00c3e2f3d64ad45e4080c2c82766516cd3e2bf63 fixes: bz#1193929 BUG: 1193929 Signed-off-by: Anoop C S <anoopcs@redhat.com>
*	Quota: Turn on ssl for crawler clients if needed	Sanoj Unnikrishnan	2018-05-09	1	-1/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Quota uses per brick client generated by glusterd_generate_client_per_brick_volfile to crawl the individual bricks. These clients were not being configured with ssl if volume has client.ssl turned on. Solution: turn on client.ssl if the volume has client.ssl option set to on. Change-Id: Id3a13d5110c4376d734480c42da1ce6844cc8240 fixes: bz#1575858 Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
*	cluster/dht: Debug virtual xattrs for dht	N Balachandran	2018-05-09	2	-0/+101
\| \| \| \| \| \| \| \| \|	Provide a virtual xattr with which to query the hashed subvol for a file. Change-Id: Ic7abd031f875da4b9084841ea7c25d6c8a851992 fixes: bz#1574421 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	cluster/dht: Debug logs in dht_readdir(p)_cbk	N Balachandran	2018-05-09	1	-2/+27
\| \| \| \| \| \| \| \| \|	Additional log messages to help debug issues with file listings. Change-Id: Iccd07498ba01d597c0c40f026f4177dd06d7e901 fixes: bz#1575887 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	protocol/server: don't assume there would be a volfile id	Amar Tumballi	2018-05-08	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Earlier glusterfs never had an assumption someone would start it with right arguments, and brick processes would be spawned by a management layer. It just assume the role based on the volfile. Other than volfile, no other arguments should be technically mandatory for working of glusterfs. With this patch, that assumption holds true. Updates: github issue # 352 A note on why this particular issue for this basic sanity? As per the design of thin-arbiter/tie-breaker, it can be started independently on any machine, without need of glusterd. So, similar to 'glusterd', we should be able to spawn a process with any translator without options/volume id etc. fixes: bz#1569399 Change-Id: I5c0650fe0bfde35ad94ccba60e63f6cdcd1ae5ff Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	dht: Avoid dict log flooding for internal MDS xattr	Mohit Agrawal	2018-05-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Before populate MDS internal xattr first dht checks if MDS is present in xattr or not.If xattr dictionary is NULL dict_get log the message either dict or key is NULL Solution: Before call dict_get check xattr, if it is NULL then no need to call dict_get. BUG: 1575910 Change-Id: I81604ec5945b85eba14b42f4583d06ec713028f4 fixes: bz#1575910 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	fuse: make sure the send lookup on root instead of getattr()	Amar Tumballi	2018-05-07	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change was done in https://review.gluster.org/16945. While the changes added there were required, it was not necessary to remove the getattr part. As fuse's lookup on root(/) comes as getattr only, this change is very much required. The previous fix for this bug was to add the check for revalidation in lookup when it was sent on root. But I had removed the part where getattr is coming on root. The removing was not requried to fix the issue then. Added back this part of the code, to make sure we have proper validation of root inode in many places like acl, etc. updates: bz#1437780 Change-Id: I859c4ee1a3f407465cbf19f8934530848424ff50 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glusterd: handling brick termination in brick-mux	Sanju Rakonde	2018-05-07	4	-23/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: There's a race between the glusterfs_handle_terminate() response sent to glusterd from last brick of the process and the socket disconnect event that encounters after the brick process got killed. Solution: When it is a last brick for the brick process, instead of sending GLUSTERD_BRICK_TERMINATE to brick process, glusterd will kill the process (same as we do it in case of non brick multiplecing). The test case is added for https://bugzilla.redhat.com/show_bug.cgi?id=1549996 Change-Id: If94958cd7649ea48d09d6af7803a0f9437a85503 fixes: bz#1545048 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	cluster/dht: fixes to parallel renames to same destination codepath	Raghavendra G	2018-05-07	6	-81/+558
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test case: # while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test" -f \|\| break; echo "done:$uuid"; done This script was run in parallel from multiple mountpoints Along the course of getting the above usecase working, many issues were found: Issue 1: ======= consider a case of rename (src, dst). We can encounter a situation where, * dst is a file present at the time of lookup * dst is removed by the time rename fop reaches glusterfs In this scenario, acquring inodelk on dst fails with ESTALE resulting in failure of rename. However, as per POSIX irrespective of whether dst is present or not, rename should be successful. Acquiring entrylk provides synchronization even in races like this. Algorithm: 1. Take inodelks on src and dst (if dst is present) on respective cached subvols. These inodelks are done to preserve backward compatibility with older clients, so that synchronization is preserved when a volume is mounted by clients of different versions. Once relevant older versions (3.10, 3.12, 3.13) reach EOL, this code can be removed. 2. Ignore ENOENT/ESTALE errors of inodelk on dst. 3. protect namespace of src and dst. To protect namespace of a file, take inodelk on parent on hashed subvol, then take entrylk on the same subvol on parent with basename of file. inodelk on parent is done to guard against changes to parent layout so that hashed subvol won't change during rename. 4. <rest of rename continues> 5. unlock all locks Issue 2: ======== linkfile creation in lookup codepath can race with a rename. Imagine the following scenario: * lookup finds a data-file with gfid - gfid-dst - without a corresponding linkto file on hashed-subvol. It decides to create linkto file with gfid - gfid-dst. - Note that some codepaths of dht-rename deletes linkto file of dst as first step. So, a lookup racing with an in-progress rename can easily run into this situation. * a rename (src-path:gfid-src, dst-path:gfid-dst) renames data-file and hence gfid of data-file changes to gfid-src with path dst-path. * lookup proceeds and creates linkto file - dst-path - with gfid - dst-gfid - on hashed-subvol. * rename tries to create a linkto file dst-path with src-gfid on hashed-subvol, but it fails with EEXIST. But EEXIST is ignored during linkto file creation. Now we've ended with dst-path having different gfids - dst-gfid on linkto file and src-gfid on data file. Future lookups on dst-path will always fail with ESTALE, due to differing gfids. The fix is to synchronize linkfile creation in lookup path with rename using the same mechanism of protecting namespace explained in solution of Issue 1. Once locks are acquired, before proceeding with linkfile creation, we check whether conditions for linkto file creation are still valid. If not, we skip linkto file creation. Issue 3: ======== gfid of dst-path can change by the time locks are acquired. This means, either another rename overwrote dst-path or dst-path was deleted and recreated by a different client. When this happens, cached-subvol for dst can change. If rename proceeds with old-gfid and old-cached subvol, we'll end up in inconsistent state(s) like dst-path with different gfids on different subvols, more than one data-file being present etc. Fix is to do the lookup with a new inode after protecting namespace of dst. Post lookup, we've to compare gfids and correct local state appropriately to be in sync with backend. Issue 4: ======== During revalidate lookup, if following a linkto file doesn't lead to a valid data-file, local->cached-subvol was not reset to NULL. This means we would be operating on a stale state which can lead to inconsistency. As a fix, reset it to NULL before proceeding with lookup everywhere. Issue 5: ======== Stale dentries left out in inode table on brick resulted in failures of link fop even though the file/dentry didn't exist on backend fs. A patch is submitted to fix this issue. Please check the dependency tree of current patch on gerrit for details In short, we fix the problem by not blindly trusting the inode-table. Instead we validate whether dentry is present by doing lookup on backend fs. Change-Id: I832e5c47d232f90c4edb1fafc512bf19bebde165 updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	cluster/dht: log error only if layout healing is required	Raghavendra G	2018-05-07	2	-188/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	selfhealing of directory is invoked on two conditions: 1. no layout on disk or layout has some anomalies (holes/overlaps) 2. mds xattr is not set on the directory When dht_selfheal_directory is called with a correct layout just to set mds xattr, we see error msgs complaining about "not able to form layout on directory", which is misleading as the layout is correct. So, log this msg only if layout has anomalies. Change-Id: I4af25246fc3a2450c2426e9902d1a5b372eab125 updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	glusterd: Fix for memory leak in volume tier status command	Sanju Rakonde	2018-05-06	1	-0/+8
\| \| \| \| \| \|	Fixes: bz#1573220 Change-Id: Ia60f40fa4f1e525cae6f571a24e5385ba1e004c0 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	glusterd/ctime: Provide option to enable/disable ctime feature	Kotresh HR	2018-05-06	6	-16/+41
\| \| \| \| \| \|	Updates: #208 Change-Id: If6f52b9b1b5b823ad64faeed662e96ceb848c54c Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	glusterd/utime: glusterd utime changes	Kotresh HR	2018-05-06	2	-0/+21
\| \| \| \| \| \| \| \| \|	Load utime xlator in the client side just after (below) performance xlators. Updates: #208 Change-Id: Ie15f156943fa8e7dac7050e5479c906da747b568 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	utime: ctime client side xlator	Kotresh HR	2018-05-06	11	-1/+613
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The client side utime xlator does two things. 1. Update unix epoch time in frame->root->ctime 2. Update the frame->root->flags based on the fop which indicates time attributes that should be updated for the parent/entry. Credits: Rafi KC <rkavunga@redhat.com> Updates: #208 Change-Id: I9cad297040c70798a0a8468a080eb4aeff73138d Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	posix/ctime: posix hook to set ctime xattr in relevant fops	Kotresh HR	2018-05-06	7	-34/+271
\| \| \| \| \| \| \| \| \| \| \|	This patch uses the ctime posix APIs to set consistent time across replica on disk. It also stores the time attributes in the inode context. Credits: Rafi KC <rkavunga@redhat.com> Updates: #208 Change-Id: I1a8d74d1e251f1d6d142f066fc99258025c0bcdd Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	posix/ctime: posix hooks to get consistent time xattr	Kotresh HR	2018-05-06	11	-103/+233
\| \| \| \| \| \| \| \| \| \| \| \|	This patch uses the ctime posix APIs to get consistent time across replica. The time attributes are got from from inode context or from on disk if not found and merged with iatt to be returned. Credits: Rafi KC <rkavunga@redhat.com> Updates: #208 Change-Id: Id737038ce52468f1f5ebc8a42cbf9c6ffbd63850 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	posix: APIs in posix to get and set time attributes	Kotresh HR	2018-05-06	8	-21/+614
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is part of the effort to provide consistent time across distribute and replica set for time attributes (ctime, atime, mtime) of the object. This patch contains the APIs to set and get the attributes from on disk and in inode context. Credits: Rafi KC <rkavunga@redhat.com> Updates: #208 Change-Id: I5d3cba53eef90ac252cb8299c0da42ebab3bde9f Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	afr: Add lease() fop	Poornima G	2018-05-05	3	-0/+157
\| \| \| \| \| \| \| \|	Change-Id: Ied047dd5ee44e9d5a5d3db214826f7df30332ef9 updates: #350 BUG: 1319992 Signed-off-by: Poornima G <pgurusid@redhat.com> Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
*	mount,fuse: make fuse dumping available as mount option	Csaba Henk	2018-05-04	1	-0/+7
\| \| \| \| \| \|	Updates: bz#1193929 Change-Id: I4dd4d0e607f89650ebb74b893b911b554472826d Signed-off-by: Csaba Henk <csaba@redhat.com>
*	fuse: add support for kernel writeback cache	Csaba Henk	2018-05-04	4	-4/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Added kernel-writeback-cache command line and xlator option for requesting utilisation of the writeback cache of the kernel in FUSE_INIT (see [1]). - Added attr-times-granularity command line and xlator option via which granularity of the {a,m,c}time in stat (attr) data that we support can be indicated to kernel. This is a means to avoid divergence of the attr times between kernel and userspace that could occur with writeback-cache, while still maintaining maximum time precision the FUSE server is capable of (see [2]). - Handling FATTR_CTIME flag in FUSE_SETATTR that indicates presence of ctime in setattr payload. Currently we cannot associate arbitrary ctimes to files on backend, so we just touch them to update their ctimes to current time. Having ctimes in setattr payload is also a side effect of writeback cache (see [3] and [4]). [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d99ff8, "fuse: Turn writeback cache on" [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e27c9d3, "fuse: fuse: add time_gran to INIT_OUT" [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e18bda, "fuse: add .write_inode" [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9e13f, "fuse: allow ctime flushing to userspace" Updates: #435 Change-Id: Id174c8e0c815c4456c35f8c53e41a6a507d91855 Signed-off-by: Csaba Henk <csaba@redhat.com>
*	feature/leases : fixing bugs found while testing glfs_test.t	Jiffin Tony Thottan	2018-05-04	3	-14/+56
\| \| \| \| \| \|	Change-Id: Iee8f431601ecda184108a079f665e05902b0f78b updates: #350 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
*	features/bitrot: print the path of the corrupted objects	Raghavendra Bhat	2018-05-04	4	-6/+250
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently "gluster volume bitrot <volume name> scrub status" gives the list of the corrupted objects (files as of now). But only the gfids of those corrupted objects are seen and one has to do getfattr, find etc operations to get the actual path of those objects for removal etc. This change makes an attempt to print the path of those files as much as possible. * Try to get the path using the on disk gfid2path xattr. * If the above operation fails, then go for in memory path (provided that the object has its dentry properly created and linked in the inode table of the brick where the corrupted object is present) So the gfid to path resolution is a soft resolution, i.e. based on the inode and dentry cache in the brick's memory. If the path cannot be obtained via inode table also, then only gfid is printed. Change-Id: Ie9a30307f43a49a2a9225821803c7d40d231de68 fixes: bz#1570962 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
*	glusterd: enable self-heal in daemons	Ravishankar N	2018-05-04	3	-14/+0
\| \| \| \| \| \| \| \| \|	..like rebalance, quota and tier because that seems to be the consensus (see BZ). Change-Id: I912336a12f4e33ea4ec55f804df403fab0dc89fc BUG: 1536024 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	posix: Avoid changelog retries for geo-rep	Mohit Agrawal	2018-05-03	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: georep is slowdown to migrate directory from master volume to slave volume due to lot of changelog retries Solution: Update the condition in posix_getxattr to ignore MDS_INTERNAL_XATTR as it(posix) ignored other internal xattrs BUG: 1571069 Change-Id: I4d91ec73e5b1ca1cb3ecf0825ab9f49e261da70e fixes: bz#1571069 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	Don't use hardcoded /sbin, /usr/bin etc. paths. Fixes #1450546	Niklas Hambüchen	2018-05-03	3	-18/+6
\| \| \| \| \| \| \| \| \|	Instead, rely on programs to be in PATH, as gluster already does in many places across its code base. Change-Id: Id21152fe42f5b67205d8f1571b0656c4d5f74246 BUG: 1450546 Signed-off-by: Niklas Hambuechen <mail@nh2.me>
*	cluster/dht: unwind if dht_selfheal_dir_mkdir returns an error	Raghavendra G	2018-05-03	1	-1/+5
\| \| \| \| \| \| \| \| \| \|	If dht_selfheal_dir_mkdir returns an error, cbk passed to dht_selfheal_directory is not invoked. So, Current codepath leaves an unwound frame resulting in a hung fop forever. Change-Id: I422308b8a34a074301ca46b029ffe676f5e0f66c fixes: bz#1574305 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	protocol/server : unwind as per op version	Ashish Pandey	2018-05-03	2	-3/+10
\| \| \| \| \| \| \|	Change-Id: Id6717640ac14881b490e512c4682e45ffffa7f5b fixes: bz#1570538 BUG: 1570538 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	glusterd/geo-rep: Fix UNUSED_VALUE coverity issue	Varsha Rao	2018-05-03	2	-12/+12
\| \| \| \| \| \| \| \| \| \| \|	The return value of glusterd_get_local_brickpaths is unused so add goto statement. As it is reinitialized outside the if block. Also change the if condition to check the failure case, when return value is -1 and path_list is NULL. Change-Id: I6b47d7751263f704bd69a6452a7e71bfcf226d49 updates: bz#789278 Signed-off-by: Varsha Rao <varao@redhat.com>
*	core/various: python3 compat, prepare for python2 -> python3	Kaleb S. KEITHLEY	2018-05-02	11	-190/+204
\| \| \| \| \| \| \| \| \| \|	see https://review.gluster.org/#/c/19788/ use print fn from __future__ Change-Id: If5075d8d9ca9641058fbc71df8a52aa35804cda4 updates: #411 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
*	glusterd: Fix for memory leak in get-state detail	Sanju Rakonde	2018-05-01	1	-1/+8
\| \| \| \| \| \|	Fixes: bz#1573066 Change-Id: I76fe3bdde7351736b32eb3d6c4cc5f8f276257ed Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE error	Susant Palai	2018-04-30	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \|	Problem: A directory deletion can happen just before gf_defrag_settle_hash which internally does a setxattr operation on a directory. Solution: Ignore ENOENT and ESTALE errors Fixes: bz#1572581 Change-Id: I2f91809f3b5e02976c4c3a5a596406a8b2f8f6f2 Signed-off-by: Susant Palai <spalai@redhat.com>
*	cluster/afr: shd changes for thin arbiter	karthik-us	2018-04-30	1	-0/+184
\| \| \| \| \| \| \|	Updates #352 Change-Id: I1bbb3c652ba33cec6aa37f3700370674077fb17d Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	afr: initial changes for thin arbiter	Ravishankar N	2018-04-30	6	-8/+229
\| \| \| \| \| \| \| \| \|	1. Create thin arbiter index file during mount. 2. Set pending marker in thin arbiter id file in case of failure. Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82 updates: #352 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	server/resolver: don't trust inode-table for RESOLVE_NOT	Raghavendra G	2018-04-30	1	-4/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There have been known races between fops which add a dentry (like lookup, create, mknod etc) and fops that remove a dentry (like rename, unlink, rmdir etc) due to which stale dentries are left out in inode table even though the dentry doesn't exist on backend. For eg., consider a lookup (parent/bname) and unlink (parent/bname) racing in the following order: * lookup hits storage/posix and finds that dentry exists * unlink removes the dentry on storage/posix * unlink reaches protocol/server where the dentry (parent/bname) is unlinked from the inode * lookup reaches protocol/server and creates a dentry (parent/bname) on the inode Now we've a stale dentry (parent/bname) associated with the inode in itable. This situation is bad for fops like link, create etc which invoke resolver with type RESOLVE_NOT. These fops fail with EEXIST even though there is no such dentry on backend fs. This issue can be solved in two ways: * Enable "dentry fop serializer" xlator [1]. # gluster volume set features.sdfs on * Make sure resolver does a lookup on backend when it finds a dentry in itable and validates the state of itable. - If a dentry is not found, unlink those stale dentries from itable and continue with fop - If dentry is found, fail the fop with EEXIST This patch implements second solution as sdfs is not enabled by default in brick xlator stack. Once sdfs is enabled by default, this patch can be reverted. [1] https://github.com/gluster/glusterfs/issues/397 Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a updates: bz#1543279 BUG: 1543279 Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
*	libglusterfs: Capture the dict response in syncop_xattrop_cbk	karthik-us	2018-04-27	3	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: Currently it is not possible to capture the xattrs values which are set on the bricks by calling syncop_(f)xattrop, because the response dict is not being assigned to any of the dictionaries. Fix: In the xattrop callback capture the response dict and send it back to the caller if it is requested. Change-Id: I9de9bcd97d6008091c9b060bcca3676cb9ae8ef9 fixes: bz#1572076 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	feature/thin-arbiter: Implement thin-arbiter translator	Ashish Pandey	2018-04-25	7	-1/+773
\| \| \| \| \| \| \|	Updates #352 Change-Id: I3d8caa6479dc8e48bec62a09b056971bb061f0cf Signed-off-by: Ashish Pandey <aspandey@redhat.com>
*	performance/md-cache: purge cache on ENOENT/ESTALE errors	Raghavendra G	2018-04-25	1	-87/+438
\| \| \| \| \| \| \| \| \| \|	If not, next lookup could be served from cache and can be success, which is wrong. This can affect retry logic of VFS when it receives an ESTALE. Change-Id: Iad8e564d666aa4172823343f19a60c11e4416ef6 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1566303
*	cluster/afr: Keep child-up until ping-event	Pranith Kumar K	2018-04-25	3	-25/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If we have 2 bricks, brick-A and brick-B with brick-A within halo-max-latency and brick-B more than halo-max-latency. If we set both halo-min, halo-max replicas as '1'. In this case, brick-A comes online and then ping-latency will be updated for it. When brick-B comes online, we have 2 up-bricks, so the code tries to find the brick with worst latency to mark it down. Since Brick-B just came online it always had '0' latency so brick-B used to be marked offline and Brick-B would eventually be the one to be online even when brick-A is more suited. Fix: Consider latency of just-up child as HALO_MAX_LATENCY so that worst-child until ping-latency is found as the just-up brick. Also keep ping-latency as -1 until child-up during initialization. BUG: 1567881 fixes bz#1567881 Change-Id: I148262fe505468190f0eb99225d0f6d57cdb6f04 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	features/shard: Add option to barrier parallel lookup and unlink of shards	Krutika Dhananjay	2018-04-23	2	-28/+89
\| \| \| \| \| \| \| \| \|	Also move the common parallel unlink callback for GF_FOP_TRUNCATE and GF_FOP_FTRUNCATE into a separate function. Change-Id: Ib0f90a5f62abdfa89cda7bef9f3ff99f349ec332 updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	cluster/dht: Fix dht_rename lock order	N Balachandran	2018-04-23	1	-18/+47
\| \| \| \| \| \| \| \| \| \|	Fixed dht_order_rename_lock to use the same inodelk ordering as that of the dht selfheal locks (dictionary order of lock subvolumes). Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d fixes: bz#1568348 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	server/auth: add option for strict authentication	Mohammed Rafi KC	2018-04-20	6	-12/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When this option is enabled, we will check for a matching username and password, if not found then the connection will be rejected. This also does a checksum validation of volfile The option is invalid when SSL/TLS is in use, at which point the SSL/TLS certificate user name is used to validate and hence authorize the right user. This expects TLS allow rules to be setup correctly rather than the default *. This option is not settable, as a result this cannot be enabled for volumes using the CLI. This is used with the shared storage volume, to restrict access to the same in non-SSL/TLS environments to the gluster peers only. Tested: ./tests/bugs/protocol/bug-1321578.t ./tests/features/ssl-authz.t - Ran tests on volumes with and without strict auth checking (as brick vol file needed to be edited to test, or rather to enable the option) - Ran tests on volumes to ensure existing mounts are disconnected when we enable strict checking Change-Id: I2ac4f0cfa5b59cc789cc5a265358389b04556b59 fixes: bz#1568844 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Signed-off-by: ShyamsundarR <srangana@redhat.com>
*	shared storage: Prevent mounting shared storage from non-trusted client	Mohammed Rafi KC	2018-04-20	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gluster shared storage is a volume used for internal storage for various features including ganesha, geo-rep, snapshot. So this volume should not be exposed to the client, as it is a special volume for internal use. This fix wont't generate non trusted volfile for shared storage volume. Change-Id: I8ffe30ae99ec05196d75466210b84db311611a4c fixes: bz#1568844 BUG: 1568844 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	server: fix unresolved symbols by moving them to libglusterfs	Mohit Agrawal	2018-04-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: glusterd2 build is failed due to undefined symbol (xlator_mem_cleanup , glusterfsd_ctx) in server.so Solution: To resolve the same done below two changes 1) Move xlator_mem_cleanup code from glusterfsd-mgmt.c to xlator.c to be part of libglusterfs.so 2) replace glusterfsd_ctx to this->ctx because symbol glusterfsd_ctx is not part of server.so BUG: 1544090 Change-Id: Ie5e6fba9ed458931d08eb0948d450aa962424ae5 fixes: bz#1544090 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	cluster/afr: Need heal-timeout to be configured as low as 5 seconds	Pranith Kumar K	2018-04-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	In Halo replication, there are pending heals more often than not. It makes sense to give users the capability to configure it as low as 5 seconds. BUG: 1569489 fixes bz#1569489 Change-Id: I451c1975827f66398b903f659c981ef3121d5376 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	features/bitrot: show the corresponding brick for the corrupted objects	Raghavendra Bhat	2018-04-20	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \|	Currently with "gluster volume bitrot <volume name> scrub status" command the corrupted objects of a node are shown. But to what brick that corrupted object belongs to is not shown. Showing the brick of the corrupted object will help in situations where a node hosts multiple bricks of a volume. Change-Id: I7fbdea1e0072b9d3487eb10757468bc02d24df21 fixes: bz#1569198 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>