glusterfs.git -

	Commit message	Author	Age	Files	Lines
*	Update rfc.sh for release-7 v7.0alpha	Rinku Kothiya	2019-06-27	1	-1/+1
	Change-Id: I37d0e2ff122e5250af7bb3f7c8de2a16bd8e3912 Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
*	glusterd/shd: Change shd logfile to a unique name	Mohammed Rafi KC	2019-06-24	6	-33/+39
	With the shd mux changes, shd was using a logfile named after the first started volume. This created a lot of confusion, as other volumes' data was also logged to a logfile carrying a different volume name. With this change the logfile gets a unique name, i.e. "/var/log/glusterfs/glustershd.log". This was the logfile name before the shd mux changes. Change-Id: I2b94c1f0b2cf3c9493505dddf873687755a46dda fixes: bz#1721601 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	core: replace inet_addr with inet_pton	Rinku Kothiya	2019-06-24	1	-1/+7
	Fixes a warning raised by RPMDiff on the use of inet_addr, which may impact IPv6 support. fixes: bz#1721385 Change-Id: Id2d9afa1747efa64bc79d90dd2566bff54deedeb Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
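The motivation can be illustrated with Python's socket module, which wraps the same C library routines. inet_addr handles only IPv4 and signals failure with a value that is also a valid address (255.255.255.255), whereas inet_pton takes an explicit address family and reports errors separately. The sketch below is not the patch itself; the helper name parse_ip is hypothetical.

```python
import socket


def parse_ip(text):
    """Return (family, packed_bytes) for an IPv4 or IPv6 literal, else None.

    Hypothetical helper: tries each address family in turn with inet_pton,
    which raises OSError when the text is not valid for that family.
    """
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            return family, socket.inet_pton(family, text)
        except OSError:
            continue
    return None


if __name__ == "__main__":
    for candidate in ("192.0.2.1", "::1", "not-an-ip"):
        print(candidate, parse_ip(candidate))
```

Trying AF_INET first and AF_INET6 second keeps IPv4 behaviour unchanged while accepting IPv6 literals that an inet_addr-style parser would reject.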
*	shd/mux: Fix race between mux_proc unlink and stop	Mohammed Rafi KC	2019-06-24	1	-0/+3
	There is a small race window where we have a shd proc without a connection. This happens when we stop the last shd running on a process. The entry was removed from the list outside of a lock, just after stopping the process. So there is a window where the process has been stopped but the shd proc list still contains the entry. Change-Id: Id82a82509e5cd72acac24e8b7b87197626525441 fixes: bz#1722541 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	cluster/ec: Prevent double pre-op xattrops	Pranith Kumar K	2019-06-22	2	-6/+30
	Problem: Race between two threads:
	Thread-1: 1) Does ec_get_size_version() to perform pre-op fxattrop as part of write-1.
	Thread-2: 2) Calls ec_set_dirty_flag() in ec_get_size_version() for write-2. This sets dirty[] to 1.
	Thread-1: 3) Completes executing ec_prepare_update_cbk, leading to ctx->dirty[] = '1'.
	Thread-2: 4) Takes LOCK(inode->lock) to check if there are any flags and sets dirty-flag because lock->waiting_flag is 0 now. This leads to fxattrop incrementing on-disk dirty[] to '2'.
	At the end of the writes the file will be marked for heal even when it doesn't need heal.
	Fix: Perform ec_set_dirty_flag() and other checks inside LOCK() to prevent dirty[] from being marked as '1' in step 2) above.
	Updates bz#1593224 Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	posix/ctime: Fix ctime upgrade issue	Kotresh HR	2019-06-21	4	-16/+223
	Problem: On an EC volume, during upgrade from an older version where the ctime feature is not enabled (or not present) to a newer version where the ctime feature is available (enabled by default), the self heal hangs and doesn't complete. Cause: The ctime feature has both client side code (utime) and server side code (posix). The feature is driven from the client. Only if the client side sets the time in the frame should the server side set the time attributes in xattr. But posix setattr/fsetattr was not doing that. When one of the server nodes is updated, since ctime is enabled by default, it starts setting the xattr on setattr/fsetattr on the updated node/brick. On an EC volume the first two updated nodes (bricks) are not a problem because there are 4 other bricks with consistent data. However, once the third brick is updated, the new attribute (mdata xattr) will cause a metadata inconsistency on 3 bricks, which prevents the file from being repaired. Fix: Don't create the mdata xattr with the utimes/utimensat system call. Only update it if already present. Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c fixes: bz#1720201 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	WORM-Xlator: Avoid performing fsetxattr if fd is NULL	David Spisla	2019-06-21	1	-0/+7
	If worm_create_cbk receives an error (op_ret == -1), fd will be NULL, so performing fsetxattr would lead to a segfault and crash the brick process. To avoid this we allow fsetxattr only if op_ret >= 0. If an error happens we explicitly unwind. Change-Id: Ie7f8a198add93e5cd908eb7029cffc834c3b58a6 fixes: bz#1717757 Signed-off-by: David Spisla <david.spisla@iternity.com>
*	ec-heal: check file's gfid when deleting stale name	Kinglong Mee	2019-06-20	1	-1/+11
	A name-less lookup does not contain the parent's stat, so it is hard to check that the looked-up file is at the right path. This patch changes to a named lookup and checks the file's gfid against the expected gfid. If the gfid is different, mark it estale. fixes: bz#1702131 Change-Id: I2de20b10d680eed1e2fb1d3830b3b3dec4520dbf Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
*	fix template file after clang-format	Amar Tumballi	2019-06-20	2	-39/+37
	clang-format gets applied to all files ending with .c or .h, but in this case new-xlator.c was a template file. Hence change the suffix to reflect that, which also avoids auto-formatting of the template file. updates: bz#1193929 Change-Id: I1c00a28f165f34dbe00fd3b6b070d868a56f9157 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	afr/read: Implement latency based read child selection	Mohammed Rafi KC	2019-06-20	3	-27/+98
	Network latency is an important factor in selecting a read subvolume, so this patch adds two new policies. 1) We measure the latency of a child during a GF_DUMP rpc call, then use this latency to pick the read subvol with the least latency. 2) The second is a hybrid mode, which calculates the effective latency by multiplying the number of outstanding pending read requests by the latency and chooses the least one. Change-Id: Ia49c8a08ab61f7dcdad8b8950aa4d338e7accf97 fixes: #520 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
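The two policies described above can be sketched as follows. This is an illustration based only on the commit message, not the AFR implementation; the Child type, its field names and the microsecond unit are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical view of one read subvolume."""
    name: str
    latency_us: float   # latency measured on a ping-style RPC (assumed unit)
    pending_reads: int  # outstanding read requests on this child


def pick_least_latency(children):
    """Policy 1: choose the child with the lowest measured latency."""
    return min(children, key=lambda c: c.latency_us)


def pick_hybrid(children):
    """Policy 2: choose the lowest effective latency (pending reads x latency)."""
    return min(children, key=lambda c: c.pending_reads * c.latency_us)


if __name__ == "__main__":
    kids = [Child("a", 100.0, 5), Child("b", 300.0, 1), Child("c", 200.0, 4)]
    print(pick_least_latency(kids).name)
    print(pick_hybrid(kids).name)
```

In the sample data the fastest child is also the busiest, so the hybrid policy prefers a slower but nearly idle child. Ties are resolved in favour of the first child in the list, which is a property of this sketch rather than something the commit message specifies.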
*	posix: fix crash in posix_cs_set_state	Susant Palai	2019-06-20	2	-3/+9
	Fixes: bz#1721474 Change-Id: Ic2a53fa3d1e9e23424c6898e0986f80d52c5e3f6 Signed-off-by: Susant Palai <spalai@redhat.com>
*	encryption/crypt: remove from volume file	Amar Tumballi	2019-06-20	6	-492/+0
	The feature is not supported and was moved out of the codebase in the glusterfs-5.x release. It doesn't make sense to keep the code to support it. Those who want to upgrade from a version supporting it to a higher version should do a 'gluster volume reset $VOL encryption reset' and then continue with the upgrade process. updates: bz#1648169 Change-Id: I8cf822c0d7195940bd37f6af2432a3cac68d44d1 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glusterfind: integrate with gfid2path	Milind Changire	2019-06-20	1	-4/+41
	Integration with gfid2path helps avoid a file-system crawl and saves precious time. Extended attributes starting with "trusted.gfid2path." are read and the <PGFID>/<BN> values are extracted. The <PGFID> is then iteratively resolved from the brick backend to arrive at the full path. Change-Id: I593b02880e3413b77bfceed4a36b00d401f03bc0 fixes: #529 Signed-off-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	extras/hooks: Install and package newly added post add-brick hook script	Anoop C S	2019-06-19	2	-2/+3
	https://review.gluster.org/c/glusterfs/+/22834 added a new SELinux hook script as a post add-brick operation to label new brick paths. But the change failed to install and package the new script. Therefore make the necessary changes to the Makefile and spec file to get it installed and packaged. Change-Id: I67b8f4982c2783c34a4bc749fb4387c19a038225 fixes: bz#1717953 Signed-off-by: Anoop C S <anoopcs@redhat.com>
*	md-cache: only update generation for inode at upcall and NULL stat	Kinglong Mee	2019-06-19	1	-20/+36
	1. For parallel writes from nfs-ganesha, there are two fops with two generations, but the fop replies may be returned out of order. 2. The inode md-cache timeout should not increase conf->generation. With this patch: 1. A fop only gets the generation from the inode md-cache or conf and does not increase it. 2. The generation is increased at upcall invalidate, at estale/enoent error invalidate, and on a reply with zeroed-out stat from write-behind. Change-Id: I897ecaa143fd18bc024c1948c7d1a6f831fd53da Updates: bz#1683594 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
*	cluster/dht: Strip out dht xattrs	N Balachandran	2019-06-19	1	-0/+2
	Some internal DHT xattrs were not being removed when calling getxattr in pass-through mode. This has been fixed. Change-Id: If7e3dbc7b495db88a566bd560888e3e9c167defa fixes: bz#1721435 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	geo-rep: Fix permissions for GEOREP_DIR in non-root setup	Sunny Kumar	2019-06-19	1	-1/+1
	During mountbroker setup, the 'gluster-mountbroker <mountbroker-root> <group>' command, which sets the permission and group for the GEOREP_DIR directory (/var/lib/glusterd/geo-replication), fails due to an extra argument. This step is essential for non-root geo-rep setup. fixes: bz#1721441 Change-Id: Ia83442733bf0b29f630e8c9e398097316efca092 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	glusterd-volgen.c: remove BD xlator from the graph	Yaniv Kaul	2019-06-18	9	-506/+1
	The BD xlator was removed some time ago. Remove it from the graph. We can also remove the caps settings - only the BD xlator was using it. Lastly, remove the caps (which only BD was using) and the document describing the translator. Change-Id: Id0adcb2952f4832a5dc6301e726874522e07935d updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	core: fedora 30 compiler warnings	SheetalPamecha	2019-06-18	7	-14/+10
	warning: ‘%s’ directive argument is null [-Wformat-overflow=] Change-Id: I69b8d47f0002c58b00d1cc947fac6f1c64e0b295 updates: bz#1193929 Signed-off-by: SheetalPamecha <spamecha@redhat.com>
*	tests: subdir-mount.t is failing for brick_mux regression	Mohit Agrawal	2019-06-17	1	-3/+8
	To avoid the failure, wait for the hook script S13create-subdir-mounts.sh to run after the test case executes the add-brick command. Change-Id: I063b6d0f86a550ed0a0527255e4dfbe8f0a8c02e fixes: bz#1720993 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
*	core: improve timer accuracy	Xavier Hernandez	2019-06-17	3	-83/+63
	Also fixed some issues on test ec-1468261.t. Change-Id: If156f86af986d9eed13cdd1f15c5a7214cd11706 Updates: bz#1193929 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
*	glusterd: log error message only when rsp.op_ret is negative	Sanju Rakonde	2019-06-17	1	-1/+1
	Problem: commit d42221bec9 added a log message based on a check of rsp.op_ret, but while running subdir-mount.t this message is seen even on successful mounts. Solution: in __server_getspec(), the return value of sys_read() is assigned to ret, which is a non-negative number when sys_read() succeeds. This non-zero value is assigned to rsp.op_ret. We should log an error only when rsp.op_ret is negative. fixes: bz#1718848 Change-Id: Ieef8ba33c2c7b4a97d4aef17543f58e66fd3b341 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfile	Atin Mukherjee	2019-06-17	2	-1/+4
	... without which volume creation fails with "volume create: <xyz>: failed: Failed to create volume files" Fixes: bz#1716812 Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
*	uss: Fix tar issue with ctime and uss enabled	Kotresh HR	2019-06-17	1	-9/+13
	Problem: If ctime and uss are enabled, tar still complains with 'file changed as we read it'. Cause: To clear the nfs cache (gluster-nfs), the ctime was incremented in the snap-view client on stat cbk. Fix: The ctime should not be incremented manually. Since gluster-nfs is planned to be deprecated, this code is being removed to fix the issue. Change-Id: Iae7f100c20fce880a50b008ba716077350281404 fixes: bz#1720290 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	afr/fini: Free local_pool data during an afr fini	Mohammed Rafi KC	2019-06-17	1	-0/+6
	We should free the mem_pool local_pool during an afr_fini. Otherwise this will lead to a memory leak for shd. Change-Id: I805a34a88077bf7b886c28b403798bf9eeeb1c0b Updates: bz#1716695 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	cli: don't fail if logging initialize fails	Amar Tumballi	2019-06-17	1	-1/+0
	In many cases, gluster's cli can run in non-privileged mode (as in geo-rep non-root setup). Just because logging fails in the cli, let's not fail the overall process. CLI logs are not much debugging help anyway. Most of the debugging happens once the call reaches the server (glusterd). Fixes: bz#1535511 Change-Id: I9f07c61b8c3acc95ec08230ff539a35dfd0ff9dc Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	geo-rep/gsyncd: name is not freed in one of the cases	Sheetal Pamecha	2019-06-17	1	-9/+11
	CID: 1400730 updates: bz#789278 Change-Id: I0f6924050a31d3d2cc0b555f859920e349728e0a Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
*	build: always build glusterfs-cli to allow monitoring/managing from clients	Niels de Vos	2019-06-15	3	-10/+5
	Fixes: bz#1720615 Change-Id: I5071f3255ff615113b36b08cd5326be6e37d907d Signed-off-by: Niels de Vos <ndevos@redhat.com>
*	tests: Add missing NFS test tag to the testfile	Aravinda VK	2019-06-15	1	-0/+2
	$SRC/glusterfs/bugs/nfs/showmount-many-clients.t Change-Id: I48758cc66fcb55f48c4a8a0a738b06867f6814a1 Signed-off-by: Aravinda VK <avishwan@redhat.com> Updates: bz#1193929
*	clang-scan: resolve warning	Amar Tumballi	2019-06-15	2	-4/+6
	dht-common.c: because there was a 'goto err' before assigning the 'local' variable, there is a possibility of a NULL dereference. As the check which was done wouldn't ever be true, removed the check. glusterd-geo-rep.c: a possible path where 'slave_host' could be NULL when it gets passed to strcmp() was found. strcmp() expects a valid string. Add a NULL check. Updates: bz#1622665 Change-Id: I64c280bc1beac9a2b109e8fa88f2a5ce8b823c3a Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glfs: add syscall.h after header cleanup	Amar Tumballi	2019-06-14	1	-0/+1
	In one of the recent patches, we cleaned up the unnecessary header file includes. Because of the order in which the patches were merged, a compile error cropped up. updates: bz#1193929 Change-Id: I2ad52aa918f9c698d5273bb293838de6dd50ac31 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	multiple files: another attempt to remove includes	Yaniv Kaul	2019-06-14	118	-363/+82
	There are many include statements that are not needed. A previous, more ambitious attempt failed because of the *BSD platform (see https://review.gluster.org/#/c/glusterfs/+/21929/ ). Now trying a more conservative reduction. It does not solve all the circular deps that we have, but it does reduce some of them. There is just too much to handle reasonably (dht-common.h includes dht-lock.h which includes dht-common.h ...), but it does reduce the overall number of include lines we need to look at in the future to understand and fix the mess later on. Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
*	eventsapi: Fix Exception class for Python3	Aravinda VK	2019-06-14	1	-1/+8
	Python 2 exceptions provide a message attribute for custom exceptions, but it is not available in Python 3. Add an init method to the custom exception to handle this. The original crash (IndexError reported in the bug) was fixed with https://review.gluster.org/#/c/glusterfs/+/22294/ but that patch only works in Python 2 and fails in Python 3, since Exception in Python 3 doesn't have a "message" attribute. Fixes: bz#1573226 Change-Id: If9117048f9ff0615f5da1880075ec12c0ff4855e Signed-off-by: Aravinda VK <avishwan@redhat.com>
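The approach described in this message can be sketched as below. The class name is hypothetical and the actual eventsapi exception may differ; the point is only that an explicit __init__ restores the message attribute that Python 3 no longer provides.

```python
class EventsError(Exception):
    """Hypothetical custom exception that works on Python 2 and Python 3."""

    def __init__(self, message):
        # Pass the text to the base class so str(exc) still works,
        # and store it explicitly because Python 3 has no .message.
        super(EventsError, self).__init__(message)
        self.message = message


if __name__ == "__main__":
    try:
        raise EventsError("webhook sync failed")
    except EventsError as exc:
        print(exc.message)
```

Code that reads exc.message then behaves the same on both interpreters, while str(exc) keeps returning the same text.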
*	gfapi: provide an api for setting statedump path	Amar Tumballi	2019-06-14	5	-0/+103
	Currently, for an application using glfsapi to access glusterfs, a statedump is written to the /var/run/gluster directory. This can be a concern, as the directory may be owned by some other user, and hence taking the statedump may fail. Such applications should have an option to use a different path. This patch provides an API to do so. Updates: bz#1689097 Change-Id: I8918e002bc823d83614c972b6c738baa04681b23 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	gnfs: support inode dump	Xie Changlong	2019-06-14	1	-0/+16
	So, we will get more debug info. fixes: #679 Change-Id: I3588e204ad25c20b69271c1a4ee17d0d158bd794 Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
*	upcall: Avoid sending notifications for invalid inodes	Soumya Koduri	2019-06-14	1	-1/+18
	For nameless LOOKUPs, the server creates a new inode which remains invalid until the fop is successfully processed, after which it is linked to the inode table. But in case there is an already linked inode for that entry, it discards the newly created inode, which results in an upcall notification. This may result in the client being bombarded with unnecessary upcalls, affecting performance if the data set is huge. This issue can be avoided by looking up and storing the upcall context in the original linked inode (if it exists), thus saving those extra callbacks. Change-Id: I044a1737819bb40d1a049d2f53c0566e746d2a17 fixes: bz#1718338 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	gfapi: fix incorrect initialization of upcall syncop arguments	Soumya Koduri	2019-06-14	1	-37/+72
	While sending upcall notifications via synctasks, the argument used to carry relevant data for these tasks is not initialized properly. This patch fixes that. Change-Id: I9fa8f841e71d3c37d3819fbd430382928c07176c fixes: bz#1718316 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	cli: Remove-brick warning seems unnecessary	Shwetha K Acharya	2019-06-12	1	-8/+9
	As the force-migration option is disabled by default, the warning seems unnecessary. Rephrased the warning so that it makes better sense. fixes: bz#1712668 Change-Id: Ia18c3c5e7b3fec808fce2194ca0504a837708822 Signed-off-by: Shwetha K Acharya <sacharya@redhat.com>
*	extras/hooks: Add SELinux label on new bricks during add-brick	Anoop C S	2019-06-12	1	-0/+100
	Change-Id: Ifd8ae5eeb91b968cc1a9a9b5d15844c5233d56db fixes: bz#1717953 Signed-off-by: Anoop C S <anoopcs@redhat.com>
*	geo-rep : fix mountbroker setup	Sunny Kumar	2019-06-12	1	-1/+1
	Problem: Unable to set up the mountbroker root directory while creating a geo-replication session for a non-root user. Cause: With patch [1], which defines the max-port for glusterd, one extra space got added in the field of 'option max-port'. [1]. https://review.gluster.org/#/c/glusterfs/+/21872/ In geo-rep, splitting of the key-value pair from the vol file was done on the basis of a single space, so this additional space caused "ValueError: too many values to unpack". Solution: Use split so that it treats consecutive whitespace as a single separator. Fixes: bz#1709248 Change-Id: Ia22070a43f95d66d84cb35487f23f9ee58b68c73 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
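The failure mode and the fix can be reproduced with a few lines of Python. The volfile line and both function names below are hypothetical; the exact position of the extra space in the real file is not stated in the message.

```python
def parse_option_single_space(line):
    """Old behaviour (sketch): split on exactly one space character."""
    _keyword, key, value = line.strip().split(" ")
    return key, value


def parse_option(line):
    """Fixed behaviour (sketch): split() with no argument collapses runs of whitespace."""
    _keyword, key, value = line.split()
    return key, value


if __name__ == "__main__":
    line = "    option  max-port 60999"  # note the doubled space (assumed position)
    print(parse_option(line))
    try:
        parse_option_single_space(line)
    except ValueError as err:
        print("old parser fails:", err)
```

With the doubled space, split(" ") yields an empty string between the two spaces, so unpacking into three names raises ValueError, while split() returns exactly three fields.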
*	libglusterfs: cleanup iovec functions	Xavi Hernandez	2019-06-11	7	-114/+177
	This patch cleans some iovec code and creates two additional helper functions to simplify management of iovec structures. iov_range_copy(struct iovec dst, uint32_t dst_count, uint32_t dst_offset, struct iovec src, uint32_t src_count, uint32_t src_offset, uint32_t size); This function copies up to 'size' bytes from 'src' at offset 'src_offset' to 'dst' at 'dst_offset'. It returns the number of bytes copied. iov_skip(struct iovec iovec, uint32_t count, uint32_t size); This function removes the initial 'size' bytes from 'iovec' and returns the updated number of iovec vectors remaining. The signature of iov_subset() has also been modified to make it safer and easier to use. The new signature is: iov_subset(struct iovec src, int src_count, uint32_t start, uint32_t size, struct iovec *dst, int32_t dst_count); This function creates a new iovec array containing the subset of the 'src' vector starting at 'start' with size 'size'. The resulting array is allocated if 'dst' is NULL, or copied to '*dst' if it fits (based on 'dst_count'). It returns the number of iovec vectors used. A new set of functions to iterate through an iovec array has been created. They can be used to simplify the implementation of other iovec-based helper functions. Change-Id: Ia5fe57e388e23392a8d6cdab17670e337cadd587 Updates: bz#1193929 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
*	tests: keep glfsxmp in tests directory	Amar Tumballi	2019-06-11	10	-17/+1821
	This is critical so that all the tests are contained in the same directory, and one can just 'cp -a tests/ <any-location>/' and run the glusterfs tests. Only 'glfsxmp.c' was an exception, as it was just copied from the api example directory. Now moved it to tests. updates: bz#1193929 Change-Id: I00359d64be580bffc5b3c3a090968d86c2c6952a Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	tests: Fix split-brain-favorite-child-policy.t failure	karthik-us	2019-06-10	1	-3/+4
	Problem: The test case is failing to heal the volume within $HEAL_TIMEOUT @195. This happens because, as part of split-brain resolution, the file gets expunged from the sink and the new entry mark for that file is done on the source bricks as part of impunging. Since the source bricks' shd threads failed to get the heal-domain lock, they wait for the heal-timeout of 10 minutes, which is greater than $HEAL_TIMEOUT. Fix: Set cluster.heal-timeout to 5 seconds to trigger the heal, so that one of the source bricks heals the file within $HEAL_TIMEOUT. Change-Id: Ie73c578cc5361c0d617a48ccc86026734d20ba8c fixes: bz#1718998 Signed-off-by: karthik-us <ksubrahm@redhat.com>
*	Cluster/afr: Don't treat all bricks having metadata pending as split-brain	karthik-us	2019-06-10	4	-67/+133
	Problem: We currently don't have a roll-back/undoing of post-ops if quorum is not met. Though the FOP is still unwound with failure, the xattrs remain on the disk. Due to these partial post-ops and partial heals (healing only when 2 bricks are up), we can end up in metadata split-brain purely from the afr xattrs point of view, i.e. each brick is blamed by at least one of the others for metadata. These scenarios are hit when there is frequent connect/disconnect of the client/shd to the bricks. Fix: Pick a source based on the xattr values. If 2 bricks blame one, the blamed one must be treated as sink. If there is no majority, all are sources. Once we pick a source, self-heal will then do the heal instead of erroring out due to split-brain. This patch also adds the restriction that all the bricks must be up to perform metadata heal, to avoid any metadata loss. Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t as it was doing metadata heal even when only 2 of 3 bricks were up. Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09 fixes: bz#1717819 Signed-off-by: karthik-us <ksubrahm@redhat.com>
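The source-selection rule stated in the fix can be sketched for a 3-brick replica as follows. This is a simplified reading of the commit message, not the AFR code: the pending matrix layout (pending[i][j] > 0 meaning brick i blames brick j) and the function name are assumptions, and cases the message does not describe are not handled.

```python
def pick_sources(pending):
    """Return (sources, sinks) as lists of brick indices.

    A brick blamed by at least two other bricks is treated as a sink.
    If no brick is blamed by a majority, every brick stays a source.
    """
    n = len(pending)
    sinks = []
    for blamed in range(n):
        blamers = sum(
            1 for i in range(n) if i != blamed and pending[i][blamed] > 0
        )
        if blamers >= 2:
            sinks.append(blamed)
    sources = [b for b in range(n) if b not in sinks]
    return sources, sinks


if __name__ == "__main__":
    # Each brick blamed by exactly one other: previously reported as split-brain.
    circular = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(pick_sources(circular))
    # Bricks 0 and 1 both blame brick 2.
    majority = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
    print(pick_sources(majority))
```

In the circular case no brick is blamed by a majority, so all three remain sources and heal can proceed instead of failing with a split-brain error.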
*	tests: added cleanup for lock files	Sunny Kumar	2019-06-10	1	-0/+35
	Problem: useradd fails with: Cannot allocate memory useradd: cannot lock /etc/passwd; try again later. Solution: Lock files should get removed automatically once the "useradd" or "groupadd" command finishes. But sometimes we encounter situations (bugs) where some of these files are not properly unlocked after the command has run. In that case, the next time we execute useradd it may show the error “cannot lock /etc/passwd” or “unable to lock group file”. So, to avoid any such errors, check for any lock files under /etc and remove them. updates: bz#1193929 Change-Id: If6456a271c2bc0717f768d7101a40ce44a9af3d7 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	features/shard: Fix extra unref when inode object is lru'd out and added back	Krutika Dhananjay	2019-06-09	2	-4/+36
	Long tale of double unref! But do read...
	In cases where a shard base inode is evicted from the lru list while still being part of the fsync list, but added back soon before its unlink, there could be an extra inode_unref() leading to premature inode destruction and a crash.
	One such specific case is the following. Consider features.shard-deletion-rate = features.shard-lru-limit = 2. This is an oversimplified example but explains the problem clearly.
	First, a file is FALLOCATE'd to a size such that the number of shards under /.shard = 3 > lru-limit. Shards 1, 2 and 3 need to be resolved. 1 and 2 are resolved first.
	Resultant lru list: 1 -----> 2
	refs on base inode - (1) + (1) = 2
	3 needs to be resolved. So 1 is lru'd out.
	Resultant lru list - 2 -----> 3
	refs on base inode - (1) + (1) = 2
	Note that 1 is inode_unlink()d but not destroyed because there are non-zero refs on it, since it is still participating in this ongoing FALLOCATE operation.
	FALLOCATE is sent on all participant shards. In the cbk, all of them are added to fsync_list.
	Resulting fsync list - 1 -----> 2 -----> 3 (order doesn't matter)
	refs on base inode - (1) + (1) + (1) = 3
	Total refs = 3 + 2 = 5
	Now an attempt is made to unlink this file. Background deletion is triggered. The first $shard-deletion-rate shards need to be unlinked in the first batch. So shards 1 and 2 need to be resolved. inode_resolve fails on 1 but succeeds on 2, and so it's moved to the tail of the list.
	lru list now - 3 -----> 2
	No change in refs.
	Shard 1 is looked up. In lookup_cbk, it's linked and added back to the lru list at the cost of evicting shard 3.
	lru list now - 2 -----> 1
	refs on base inode: (1) + (1) = 2
	fsync list now - 1 -----> 2 (again order doesn't matter)
	refs on base inode - (1) + (1) = 2
	Total refs = 2 + 2 = 4
	After eviction, it is found that 3 needs fsync. So fsync is wound, yet to be ack'd. So it is still inode_link()d.
	Now deletion of shards 1 and 2 completes. The lru list is empty. Base inode is unref'd and destroyed.
	In the next batched deletion, 3 needs to be deleted. It is inode_resolve()able. It is added back to the lru list, but the base inode passed to __shard_update_shards_inode_list() is NULL since the inode is destroyed. But its ctx->inode still contains the base inode ptr from the first addition to the lru list, with no additional ref on it.
	lru list now - 3
	refs on base inode - (0)
	Total refs on base inode = 0
	Unlink is sent on 3. It completes. Now, since the ctx contains a ptr to base_inode and the shard is part of the lru list, the base shard is unref'd, leading to a crash.
	FIX: When a shard is re-added to the lru list, copy the base inode pointer as is into its inode ctx, even if it is NULL. This is needed to prevent double unrefs at the time of deleting it.
	Change-Id: I99a44039da2e10a1aad183e84f644d63ca552462 Updates: bz#1696136 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
*	ec/fini: Fix race between xlator cleanup and on going async fop	Mohammed Rafi KC	2019-06-08	6	-15/+56
	Problem: While we process a cleanup, there is a chance of a race with async operations, for example ec_launch_replace_heal. This can lead to invalid memory access. Solution: Just as we track ongoing heal fops, we can also track fops like ec_launch_replace_heal, so that we can decide when to send a PARENT_DOWN request. Change-Id: I055391c5c6c34d58aef7336847f3b570cb831298 fixes: bz#1703948 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	xlator/log: Add more logging in xlator_is_cleanup_starting	Mohammed Rafi KC	2019-06-08	1	-3/+9
	This patch adds two extra logs for invalid arguments. Change-Id: I3950b4f4b9d88b1f1e788ef93d8f09d4bd8d4d8b updates: bz#1703948 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
*	Fixing formatting errors in markdown files	kshithijiyer	2019-06-08	19	-468/+474
	There are a lot of formatting errors in the markdown files present under the /doc directory of the project. Fixing the formatting errors and sending a patch. Fixes: bz#1718273 Change-Id: I08f938088bbaaafddf634f73616ea0dbfe7aedf3 Signed-off-by: kshithijiyer <kshithij.ki@gmail.com>
*	uss: Ensure that snapshot is deleted before creating a new snapshot	Raghavendra Bhat	2019-06-08	4	-3/+29
	* Also some logging enhancements in snapview-server Change-Id: I6a7646771cedf4bd1c62806eea69d720bbaf0c83 fixes: bz#1715921 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>