glusterfs.git -

	Commit message (Collapse)	Author	Age	Files	Lines
*	leases: Reset lease_ctx->timer post deletion	Soumya Koduri	2019-01-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	To avoid use_after_free, reset lease_ctx->timer back to NULL after the structure has been freed. Change-Id: Icd213ec809b8af934afdb519c335a4680a1d6cdc updates: bz#1655532 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit a9b0003c717087ff168bc143c70559162e53e0d5)
*	leases: Do not conflict with internal fops	Soumya Koduri	2019-01-03	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Internal fops (with frame->root->pid < 0) are used to heal or move data and maintains data integrity. That is they do not modify client data which holds the lease. Hence no need to recall Lease for such fops. Note: Like for locks, we would need rebalance and self-heal daemon process to heal lease state as well. Change-Id: I8988693fef8d00e17c19dcc842e2238f9eb5ab48 updates: bz#1655532 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	lease: Treat unlk request as noop if lease not found	Soumya Koduri	2019-01-03	1	-4/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the glusterfs server recalls the lease, it expects client to flush data and unlock the lease. If not it sets a timer (starting from the time it sends RECALL request) and post timeout, it revokes it. Here we could have a race where in client did send UNLK lease request but because of network delay it may have reached after server revokes it. To handle such situations, treat such requests as noop and return sucesss. Change-Id: I166402d10273f4f115ff04030ecbc14676a01663 updates: bz#1655532 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	io-cache: xdata needs to be passed for readv operations	Soumya Koduri	2019-01-03	2	-6/+21
\| \| \| \| \| \| \| \| \| \| \| \| \|	io-cache xlator has been skipping xdata references when the date needs to be read into page cache. This patch fixes the same. Note: similar changes may be needed for other fops as well which are handled by io-cache. Change-Id: I28d73d4ba471d13eb55d0fd0b5197d222df77a2a updates: bz#1655532 Signed-off-by: Soumya Koduri <skoduri@redhat.com> (cherry picked from commit b3d88a0904131f6851f4185e43f815ecc3353ab5)
*	leases: Fix incorrect inode_ref/unrefs	Soumya Koduri	2018-12-26	2	-3/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From testing & code-reading, found couple of places where we incorrectly unref the inode resulting in use_after_free crash or ref leaks. This patch addresses couple of them. a) When we try to grant the very first lease for a inode, inode_ref is taken in __add_lease. This ref should be active till all the leases granted to that inode are released (i.e, till lease_cnt > 0). In addition even after lease_cnt becomes '0', the inode should be active till all the blocked fops are resumed. Hence release this ref, after resuming all those fops. To avoid granting new leases while resuming those fops, defined a new boolean (blocked_fops_resuming) to flag it in the lease_ctx. b) 'new_lease_inode' which creates new lease_inode_entry and takes ref on inode, is used while adding that entry to client_list and recall_list. Use its counter function '__destroy_lease_inode' which does unref while removing those entries from those lists. c) inode ref is also taken when added to timer->data. Unref the same after processing timer->data. Change-Id: Ie77c78ff4a971e0d9a66178597fb34faf39205fb updates: bz#1655532 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	afr: open_ftruncate_cbk should read fd from local->cont.open struct	Soumya Koduri	2018-12-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	afr_open stores the fd as part of its local->cont.open struct but when it calls ftruncate (if open flags contain O_TRUNC), the corresponding cbk function (afr_ open_ftruncate_cbk) is incorrectly referencing uninitialized local->fd. This patch fixes the same. Change-Id: Icbdedbd1b8cfea11d8f41b6e5c4cb4b44d989aba updates: bz#1655527 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	afr: assign gfid during name heal when no 'source' is present	Ravishankar N	2018-12-03	4	-51/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of https://review.gluster.org/#/c/glusterfs/+/21297/ Problem: If parent dir is in split-brain or has dirty xattrs set, and the file has gfid missing on one of the bricks, then name heal won't assign the gfid. Fix: Use the brick we select the gfid from as the 'source'. Note: Problem was found while trying to debug a split-brain issue on Cynthia Zhou's setup. fixes: bz#1655561 Change-Id: Id088d4f0fb017aa35122de426654194e581ed742 Reported-by: Cynthia Zhou <cynthia.zhou@nokia-sbell.com> Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	io-stats: prevent taking file dump on server side	Amar Tumballi	2018-11-12	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \|	By allowing clients taking dump in a file on brick process, we are allowing compromised clients to create io-stats dumps on server, which can exhaust all the available inodes. Fixes: CVE-2018-14659 Fixes: bz#1647669 Change-Id: I32bfde9d4fe646d819a45e627805b928cae2e1ca Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	protocol: remove the option 'verify-volfile-checksum'	Amar Tumballi	2018-11-12	3	-357/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'getspec' operation is not used between 'client' and 'server' ever since we have off-loaded volfile management to glusterd, ie, at least 7 years. No reason to keep the dead code! The removed option had no meaning, as glusterd didn't provide a way to set (or unset) this option. So, no regression should be observed from any of the existing glusterfs deployment, supported or unsupported. Updates: CVE-2018-14653 Updates: bz#1647670 Change-Id: I4a2e0f673c5bcd4644976a61dbd2d37003a428eb Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	glusterd-handshake: prevent a buffer overflow	Amar Tumballi	2018-11-09	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \|	as key size in xdr can be anything, it can be bigger than the 'NAME_MAX' allowed in the structure, which can allow for service denial attacks. Fixes: CVE-2018-14653 Fixes: bz#1647670 Change-Id: I2dc5e99af27ddf44c12c94b07e51adb8674cce80 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	server: don't allow '/' in basename	Amar Tumballi	2018-11-09	2	-4/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Server stack needs to have all the sort of validation, assuming clients can be compromized. It is possible for a compromized client to send basenames with paths with '/', and with that create files without permission on server. By sanitizing the basename, and not allowing anything other than actual directory as the parent for any entry creation, we can mitigate the effects of clients not able to exploit the server. Fixes: CVE-2018-14651 Fixes: bz#1647667 Change-Id: I5dc0da0da2713452ff2b65ac2ddbccf1a267dc20 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	features/locks: fix statedump string	Amar Tumballi	2018-11-09	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there are possibilities in few places, where a user-controlled (like filename, program parameter etc) string can be passed as 'fmt' for printf(), which can lead to segfault, if the user's string contains '%s', '%d' in it. Fixes: CVE-2018-14661 NOTE: this change is a focused fix for the CVE, but is just subset of changes in master. This is done so that we keep the changes in the codebase to minimum, and also as clang coding standard is implemented, the changes wouldn't apply cleanly from master, so there is scope for mistakes. By keeping it to minimum, we solve CVE, and also prevent errors. Fixes: bz#1647668 Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	features/locks:Use pthread_mutex_unlock() instead of pthread_mutex_lock()	Susant Palai	2018-11-08	1	-1/+1
\| \| \| \| \| \| \| \| \|	Fixes CID 1396581 Change-Id: Ic04091b5783a75d8e1e605a9c1c28b77fea048d3 updates: bz#1647972 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Signed-off-by: Susant Palai <spalai@redhat.com>
*	lock: Do not allow meta-lock count to be more than one	Susant Palai	2018-11-08	1	-1/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the current scheme of glusterfs where lock migration is experimental, (ideally) the rebalance process which is migrating the file should request for a metalock. Hence, the metalock count should not be more than one for an inode. In future, if there is a need for meta-lock from other clients, this patch can be reverted. Since pl_metalk is called as part of setxattr operation, any client process(non-rebalance) residing outside trusted network can exhaust memory of the server node by issuing setxattr repetitively on the metalock key. The current patch makes sure that more than one metalock cannot be granted on an inode. Fixes CVE-2018-14660 updates: bz#1647972 Change-Id: Ie1e697766388718804a9551bc58351808fe71069 Signed-off-by: Susant Palai <spalai@redhat.com>
*	index: prevent arbitrary file creation outside entry-changes folder	Ravishankar N	2018-11-06	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch in master: https://review.gluster.org/#/c/glusterfs/+/21534/ A compromised client can set arbitrary values for the GF_XATTROP_ENTRY_IN_KEY and GF_XATTROP_ENTRY_OUT_KEY during xattrop fop. These values are consumed by index as a filename to be created/deleted according to the key. Thus it is possible to create/delete random files even outside the gluster volume boundary. Fix: Index expects the filename to be a basename, i.e. it must not contain any pathname components like "/" or "../". Enforce this. Fixes: CVE-2018-14654 Fixes: bz#1646200 Change-Id: I35f2a39257b5917d17283d0a4f575b92f783f143 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	glusterd: ensure volinfo->caps is set to correct value	Sanju Rakonde	2018-11-05	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the commit febf5ed4848, during the volume create op, we are setting volinfo->caps to 0, only if any of the bricks belong to the same node and brickinfo->vg[0] is null. Previously, we used to set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. With this patch, we set volinfo->caps to 0, when either brick doesn't belong to the same node or brickinfo->vg[0] is null. (as we do earlier without commit febf5ed4848). > BUG: bz#1635820 > Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> fixes: bz#1643052 Change-Id: I00a97415786b775fb088ac45566ad52b402f1a49 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	posix/ctime: Avoid log flood in posix_update_utime_in_mdata	Kotresh HR	2018-11-05	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \|	posix_update_utime_in_mdata() unconditionally logs an error if consistent time attributes features is not enabled. This log does not add any value, prints an incorrect errno & floods the log file. Hence nuking this log message in this patch. fixes: bz#1644524 Change-Id: I01736d2ed48d14f12ccd8a808521f59145e42ccb Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	features/locks: add buffer overflow checks in pl_getxattr	Ravishankar N	2018-11-02	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: A compromised client can send a variable length buffer value for the GF_XATTR_CLRLK_CMD virtual xattr. If the length is greater than the size of the "key" used to send the response back, locks xlator can segfault when it tries to do a dict_set because of the buffer overflow in strncpy of pl_getxattr(). Fix: Perform size checks while forming the 'key'. Note: This fix is already there in the master branch upstream as a part of the commit 052849983e51a061d7fb2c3ffd74fa78bb257084 (https://review.gluster.org/#/c/glusterfs/+/20933/) This patch just picks the code change needed to fix the vulnerability. Fixes: CVE-2018-14652 fixes: bz#1645363 Change-Id: I101693e91f9ea2bd26cef6c0b7d82527fefcb3e2 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr/lease: Read child nodes from lease structure	root	2018-10-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	For lease operation, we allocate and store child nodes data in lease structure. Use the same in afr_lease_cbk() while checking for the quorum. Change-Id: If1fdd5a0798888afd39ad3df57d96487baf9d1e6 fixes: bz#1644474 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	leases:Mark the fop conflicting if lease_id not set	Soumya Koduri	2018-10-24	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Glusterfs leases expects lease_id to be set and sent for each fop to determine conflict resolution with the existing lease. Incase if not set (most likely if there is an older client in a mixed cluster), it makes sense to consider it as conflicitng fop and recall the lease. Also fixed the return status check for __remove_lease(), wherein non-negative value is considered as success case. This is backport of below mainline patch - https://review.gluster.org/21458 Change-Id: I5bcfba4f7c71a5af7cdedeb03436d0b818e85783 updates: #350 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
*	afr: prevent winding inodelks twice for arbiter volumes	Ravishankar N	2018-10-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of https://review.gluster.org/#/c/glusterfs/+/21380/ Problem: In an arbiter volume, if there is a pending data heal of a file only on arbiter brick, self-heal takes inodelks twice due to a code-bug but unlocks it only once, leaving behind a stale lock on the brick. This causes the next write to the file to hang. Fix: Fix the code-bug to take lock only once. This bug was introduced master with commit eb472d82a083883335bc494b87ea175ac43471ff Thanks to Pranith Kumar K <pkarampu@redhat.com> for finding the RCA. fixes: bz#1637953 Change-Id: I15ad969e10a6a3c4bd255e2948b6be6dcddc61e1 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	afr: fix incorrect reporting of directory split-brain	Ravishankar N	2018-10-05	7	-16/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of https://review.gluster.org/#/c/glusterfs/+/21135/ Problem: When a directory has dirty xattrs due to failed post-ops or when replace/reset brick is performed, AFR does a conservative merge as expected, but heal-info reports it as split-brain because there are no clear sources. Fix: Modify pending flag to contain information about pending heals and split-brains. For directories, if spit-brain flag is not set,just show them as needing heal and not being in split-brain. Change-Id: I09ef821f6887c87d315ae99e6b1de05103cd9383 fixes: bz#1633634 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
*	glusterd: make sure that brickinfo->uuid is not null	Sanju Rakonde	2018-10-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: After an upgrade from the version where shared-brick-count option is not present to a version which introduced this option causes issue at the mount point i.e, size of the volume at mount point will be reduced by shared-brick-count value times. Cause: shared-brick-count is equal to the number of bricks that are sharing the file system. gd_set_shared_brick_count() calculates the shared-brick-count value based on uuid of the node and fsid of the brick. https://review.gluster.org/#/c/glusterfs/+/19484 handles setting of fsid properly during an upgrade path. This patch assumed that when the code path is reached, brickinfo->uuid is non-null. But brickinfo->uuid is null for all the bricks, as the uuid is null https://review.gluster.org/#/c/glusterfs/+/19484 couldn't reached the code path to set the fsid for bricks. So, we had fsid as 0 for all bricks, which resulted in gd_set_shared_brick_count() to calculate shared-brick-count in a wrong way. i.e, the logic written in gd_set_shared_brick_count() didn't work as expected since fsid is 0. Solution: Before control reaches the code path written by https://review.gluster.org/#/c/glusterfs/+/19484, adding a check for whether brickinfo->uuid is null and if brickinfo->uuid is having null value, calling glusterd_resolve_brick will set the brickinfo->uuid to a proper value. When we have proper uuid, fsid for the bricks will be set properly and shared-brick-count value will be caluculated correctly. Please take a look at the bug https://bugzilla.redhat.com/show_bug.cgi?id=1632889 for complete RCA Steps followed to test the fix: 1. Created a 2 node cluster, the cluster is running with binary which doesn't have shared-brick-count option 2. Created a 2x(2+1) volume and started it 3. Mouted the volume, checked size of volume using df 4. Upgrade to a version where shared-brick-count is introduced (upgraded the nodes one by one i.e, stop the glusterd, upgrade the node and start the glusterd). 5. after upgrading both the nodes, bumped up the cluster.op-version 6. At mount point, df shows the correct size for volume. > BUG: 1632889 > Change-Id: Ib9f078aafb15e899a01086eae113270657ea916b > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit f1e9b878ce2067db83a0baa5f384eda87287719d) fixes: bz#1633479 Change-Id: Ib9f078aafb15e899a01086eae113270657ea916b Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	cluster/afr: Make data eager-lock decision based on number of locks	Pranith Kumar K	2018-10-05	3	-10/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For both Virt and block workloads the file is opened multiple times leading to dynamically setting eager-lock to off for the workload. Instead of depending on the number-of-open-fds, if we change the logic to depend on number of inodelks, then it will give better performance than the earlier logic. When there is an eager-lock and number of inodelks is more than 1 we know that there is a conflicting lock, so depend on that information to decide whether to keep the current transaction go through delayed-post-op or not. Locks xlator doesn't have implementation to query number of locks in fxattrop in releases older than 3.10 so to keep things backward compatible in 3.12, data transactions will use new logic where as fxattrop transactions will use old logic. I am planning to send one more patch which makes metadata domain locks also depend on inodelk-count Profile info for a dd of 500MB to a file with another fd opened on the file using exec 250>filename Without this patch: 0.14 67.41 us 16.72 us 3870.82 us 892 FINODELK 0.59 279.87 us 95.71 us 2085.89 us 898 FXATTROP 3.46 366.43 us 81.75 us 6952.79 us 4000 WRITE 95.79 148733.99 us 50568.12 us 919127.86 us 273 FSYNC With this patch: 0.00 51.01 us 38.07 us 80.16 us 4 FINODELK 0.00 235.43 us 235.43 us 235.43 us 1 TRUNCATE 0.00 125.07 us 56.80 us 193.33 us 2 GETXATTR 0.00 135.86 us 62.13 us 209.59 us 2 INODELK 0.00 197.88 us 155.39 us 253.90 us 4 FXATTROP 0.00 450.59 us 394.28 us 506.89 us 2 XATTROP 0.00 56.96 us 19.06 us 406.59 us 23 FLUSH 37.81 273648.93 us 48.43 us 6017657.05 us 44 LOOKUP 62.18 4951.86 us 93.80 us 1143154.75 us 3999 WRITE postgresql benchmark performance changed from ~1130 TPS to ~2300TPS randio fio job inside Ovirt based VM went from ~600IOPs to ~2000IOPS fixes bz#1635980 Change-Id: If7f7388d2f08cf7f17ca517a4ea222560661dc36 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Batch writes in same lock even when multiple fds are open	Pranith Kumar K	2018-10-05	1	-9/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When eager-lock is disabled because of multiple-fds opened and app writes come on conflicting regions, the number of locks grows very fast leading to all the CPU being spent just in locking and unlocking by traversing huge queues in locks xlator for granting locks. Fix: Reduce the number of locks in transit by bundling the writes in the same lock and disable delayed piggy-pack when we learn that multiple fds are open on the file. This will reduce the size of queues in the locks xlator. This also reduces the number of network calls like inodelk/fxattrop. Please note that this problem can still happen if eager-lock is disabled as the writes will not be bundled in the same lock. fixes bz#1635979 Change-Id: I8fd1cf229aed54ce5abd4e6226351a039924dd91 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	mgmt/glusterd: use proper path to the volfile	Raghavendra Bhat	2018-10-04	3	-8/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Till now, glusterd was generating the volfile path for the snapshot volume's bricks like this. /snaps/<snap name>/<brick volfile> But in reality, the path to the brick volfile for a snapshot volume is /snaps/<snap name>/<snap volume name>/<brick volfile> The above workaround was used to distinguish between a mount command used to mount the snapshot volume, and a brick of the snapshot volume, so that based on what is actually happening, glusterd can return the proper volfile (client volfile for the former and the brick volfile for the latter). But, this was causing problems for snapshot restore when brick multiplexing is enabled. Because, with brick multiplexing, it tries to find the volfile and sends GETSPEC rpc call to glusterd using the 2nd style of path i.e. /snaps/<snap name>/<snap volume name>/<brick volfile> So, when the snapshot brick (which is multiplexed) sends a GETSPEC rpc request to glusterd for obtaining the brick volume file, glusterd was returning the client volume file of the snapshot volume instead of the brick volume file. Change-Id: I28b2dfa5d9b379fe943db92c2fdfea879a6a594e fixes: bz#1636218 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
*	cluster/afr: Delegate name-heal when possible	Pranith Kumar K	2018-09-21	2	-27/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When name-self-heal is triggered on the mount, it blocks lookup until name-self-heal completes. But that can lead to hangs when lot of clients are accessing a directory which needs name heal and all of them trigger heals waiting for other clients to complete heal. Fix: When a name-heal is needed but quorum number of names have the file and pending xattrs exist on the parent, then better to delegate the heal to SHD which will be completed as part of entry-heal of the parent directory. We could also do the same for quorum-number of names not present but we don't have any known use-case where this is a frequent occurrence so not changing that part at the moment. When there is a gfid mismatch or missing gfid it is important to complete the heal so that next rename doesn't assume everything is fine and perform a rename etc fixes bz#1625575 Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	cluster/afr: Delegate metadata heal with pending xattrs to SHD	Pranith Kumar K	2018-09-21	3	-38/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When metadata-self-heal is triggered on the mount, it blocks lookup until metadata-self-heal completes. But that can lead to hangs when lot of clients are accessing a directory which needs metadata heal and all of them trigger heals waiting for other clients to complete heal. Fix: Only when the heal is needed but the pending xattrs are not set, trigger metadata heal that could block lookup. This is the only case where different clients may give different metadata to the clients without heals, which should be avoided. Updates bz#1625575 Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
*	libgfchangelog: Fix changelog history API	Kotresh HR	2018-09-21	1	-5/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If requested start time and end time doesn't fall into first HTIME file, then history API fails even though continuous changelogs are avaiable for the requested range in other HTIME files. This is induced by changelog disable and enable which creates fresh HTIME index file. Cause and Analysis: Each HTIME index file represents the availability of continuous changelogs. If changelog is disabled and enabled, a new HTIME index file is created represents non availability of continuous changelogs. So as long as the requested start and end falls into single HTIME index file and not across, history API should succeed. But History API checks for the changelogs only in first HTIME index file and errors out if not available. Fix: Check in all HTIME index files for availability of continuous changelogs for requested change. Backport of: > Patch: https://review.gluster.org/21016/ > BUG: bz#1622549 > Change-Id: I80eeceb5afbd1b89f86a9dc4c320e161907d3559 > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 35aa67001c8fac99b040fbc61f36ef4f1b1590ac) fixes: bz#1630141 Change-Id: I80eeceb5afbd1b89f86a9dc4c320e161907d3559 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	geo-rep: Fix issues related config set	Kotresh HR	2018-09-21	2	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. '--ignore-mising-args' option for rsync is not being used even though the rsync version is greater than 3.1.0. Fixed the same. 2. '--existing' option for rsync is also not being used. Fixed the same. 3. geo-rep config fails to set rsync-options as the value contains '--'. Interestingly, python argsparse treats the value with '--' (e.g., --ignore-missing-args) as option. But when passed with something like --value=--ignore-missing-args, it succeeds. Fixed the same. Backport of: > Patch: https://review.gluster.org/21191 > Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1 > Signed-off-by: Kotresh HR <khiremat@redhat.com> > BUG: 1629561 Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1 Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1630140
*	storage/posix: Avoid log flood in posix_set_parent_ctime()	Vijay Bellur	2018-09-17	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	posix_set_parent_ctime() unconditionally logs an error if consistent time attributes is not enabled. This log does not add any value, prints an incorrect errno & floods the log file. Hence nuking this log message in this patch. Backport of : > Patch: https://review.gluster.org/20547/ > Change-Id: I82a78f2f8ce5ab518f8cdf6d9086a97049712f75 > BUG: 1607049 > Signed-off-by: Vijay Bellur <vbellur@redhat.com> (cherry picked from commit e0df887ba044ce92e9a2822be9261d0f712b02bd) Change-Id: I82a78f2f8ce5ab518f8cdf6d9086a97049712f75 fixes: bz#1629548 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
*	posix: disable open/read/write on special files	Amar Tumballi	2018-09-06	1	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the file system, the responsibility w.r.to the block and char device files is related to only support for 'creating' them (using mknod(2)). Once the device files are created, the read/write syscalls for the specific devices are handled by the device driver registered for the specific major number, and depending on the minor number, it knows where to read from. Hence, we are at risk of reading contents from devices which are handled by the host kernel on server nodes. By disabling open/read/write on the device file, we would be safe with the bypass one can achieve from client side (using gfapi) Fixes: bz#1625096 Change-Id: I48c776b0af1cbd2a5240862826d3d8918601e47f Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	protocol: don't use alloca	Amar Tumballi	2018-09-06	2	-103/+58
\| \| \| \| \| \| \| \| \| \| \| \|	current implementation of alloca can cause issues when strings larger than the allocated buffer is passed to the xdr. Hence it makes sense to allow XDR decode functions to deal with memory allocations, which we can free later. Fixes: bz#1625097 Change-Id: I3a05553f5702de9575c244649ca0e5ac9abaac94 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	io-stats: dump io-stats info in /var/run/gluster	Amar Tumballi	2018-09-06	1	-9/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It wouldn't make sense to allow iostats file to be written in any directory. While the formating makes sure we try to append io-stats-name for the file, so overwriting existing file is slim, but in any case it makes sense to restrict dumping to one directory. Below are the sample commands, and files created for the corresponding values: $ setfattr -n trusted.io-stats-dump -v file-for-dump $M0 In this case, the file would be in /var/run/gluster/file-for-dump $ setfattr -n trusted.io-stats-dump -v /dir1/dir2/file-for-dump $M0 In this case, then the dump file is in /var/run/gluster/dir1-dir2-file-for-dump Note that the value passed for this virtual xattr would be treated as a file, and even if the value has '/' in it, it would be changed to '-' for sanity. Fixes: bz#1625106 Change-Id: Id9ae6a40a190b8937c51662e6e1c2a0f6c86a0e0 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	server-protocol: don't allow '../' path in 'name'	Amar Tumballi	2018-09-06	2	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \|	This will prevent any arbitrary file creation through glusterfs by modifying the client bits. Also check for the similar flaw inside posix too, so we prevent any changes in layers in-between. Fixes: bz#1625095 Signed-off-by: Amar Tumballi <amarts@redhat.com> Change-Id: Id9fe0ef6e86459e8ed85ab947d977f058c5ae06e
*	posix: remove not supported get/set content	Amar Tumballi	2018-09-06	3	-186/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getting and setting a file's content using extended attribute worked great as a GET/PUT alternative when an object storage is supported on top of Gluster. But it needs application changes, and also, it skips some caching layers. It is not used over years, and not supported any more. Removing the dead code. Fixes: bz#1625102 Change-Id: Ide3b3f1f644f6ca58558bbe45561f346f96b95b7 Signed-off-by: Amar Tumballi <amarts@redhat.com>
*	storage/posix: Increment trusted.pgfid in posix_mknod	N Balachandran	2018-08-29	1	-4/+14
\| \| \| \| \| \| \| \| \| \|	The value of trusted.pgfid.xx was always set to 1 in posix_mknod. This is incorrect if posix_mknod calls posix_create_link_if_gfid_exists. Change-Id: Ibe87ca6f155846b9a7c7abbfb1eb8b6a99a5eb68 fixes: bz#1623317 Signed-off-by: N Balachandran <nbalacha@redhat.com>
*	dht: Delete MDS internal xattr from dict in dht_getxattr_cbk	Mohit Agrawal	2018-08-16	2	-31/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: At the time of fetching xattr to heal xattr by afr it is not able to fetch xattr because posix_getxattr has a check to ignore if xattr name is MDS Solution: To ignore same xattr update a check in dht_getxattr_cbk instead of having a check in posix_getxattr Backport of: > BUG: 1584098 > Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc fixes: bz#1611116 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
*	geo-rep : fix possible crash	Sunny Kumar	2018-08-16	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem : In 'glusterd_verify_slave' while tokenizing error message we call 'strtok_r' and store return value in 'tmp' which can be NULL. We are passing this 'tmp' as 1st argument to 'strcmp' which will lead to segmentation fault. Solution : before calling 'strcmp' we should NULL check 'tmp'. Backport of: > Change-Id: Ifd3864b904afe6cd09d9e5a4b55c6d0578e22b9d > BUG: 1602121 > Signed-off-by: Sunny Kumar <sunkumar@redhat.com> Change-Id: Ifd3864b904afe6cd09d9e5a4b55c6d0578e22b9d fixes: bz#1611115 Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
*	glusterd: memory leak in geo-rep status	Sanju Rakonde	2018-08-16	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \|	Backport of: > BUG: 1580352 > Change-Id: I9648e73090f5a2edbac663a6fb49acdb702cdc49 > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> fixes: bz#1611110 Change-Id: I9648e73090f5a2edbac663a6fb49acdb702cdc49 Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
*	glusterd/geo-rep: Fix glusterd crash	Kotresh HR	2018-08-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using strdump instead of gf_strdup crashes during free if mempool is being used. gf_free checks the magic number in the header which will not be taken care if strdup is used. Backport of: > BUG: 1576392 > Change-Id: Iab36496554b838a036af9d863e3f5fd07fd9780e > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1611106 Change-Id: Iab36496554b838a036af9d863e3f5fd07fd9780e Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	storage/posix: Fix excessive logging in WRITE fop path	Krutika Dhananjay	2018-08-10	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Backport of: https://review.gluster.org/#/c/glusterfs/+/20250 I was running some write-intensive tests on my volume, and in a matter of 2 hrs, the 50GB space in my root partition was exhausted. On inspecting further, figured that excessive logging in bricks was the cause - specifically in posix write when posix_check_internal_writes() does dict_get() without a NULL-check on xdata. Change-Id: I89de57a3a90ca5c375e5b9477801a9e5ff018bbf fixes: bz#1596686 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit 81701e4d92ae7b1d97e5bc955703719f2e9e773a)
*	glusterd: _is_prefix should handle 0-length paths	Kaushal M	2018-07-24	1	-0/+9
\| \| \| \| \| \| \| \| \|	If one of the paths given to _is_prefix is 0-length, then it is not a prefix of the other. Hence, _is_prefix should return false. Change-Id: I54aa577a64a58940ec91872d0d74dc19cff9106d fixes: bz#1599785 Signed-off-by: Kaushal M <kaushal@redhat.com>
*	afr: switch lk_owner only when pre-op succeeds	Ravishankar N	2018-07-23	2	-15/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In a disk full scenario, we take a failure path in afr_transaction_perform_fop() and go to unlock phase. But we change the lk-owner before that, causing unlock to fail. When mount issues another fop that takes locks on that file, it hangs. Fix: Change lk-owner only when we are about to perform the fop phase. Also fix the same issue for arbiters when afr_txn_arbitrate_fop() fails the fop. Also removed the DISK_SPACE_CHECK_AND_GOTO in posix_xattrop. Otherwise truncate to zero will fail pre-op phase with ENOSPC when the user is actually trying to freee up space. Change-Id: Ic4c8a596b4cdf4a7fc189bf00b561113cf114353 fixes: bz#1603056 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit ec0d7d77de3e4bd485a4fa2e53c9137e25c71ce7)
*	posix: check before removing stale symlink	Ravishankar N	2018-07-23	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BZ 1564071 complains of directories with missing gfid symlinks and corresponding "Found stale gfid handle" messages in the logs. Hence add a check to see if the symlink points to an actual directory before removing it. Note: Removing stale symlinks was added via commit 3e9a9c029fac359477fb26d9cc7803749ba038b2 Change-Id: I5d91fab8e5f3a621a9ecad4a1f9c898a3c2d346a Updates: bz#1603099 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 9ab218be5e69b9f71fe4eea9ca8d114b78cafd25)
*	cluster/afr: Prevent execution of code after call_count decrementing	Pranith Kumar K	2018-07-10	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: When call_count is decremented by one thread, another thread can go ahead with the operation leading to undefined behavior for the thread executing statements after decrementing call count. Fix: Do the operations necessary before decrementing call count. fixes bz#1599629 Change-Id: Icc90cd92ac16e5fbdfe534d9f0a61312943393fe Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> (cherry picked from commit 03f1f5bdc46076178f1afdf8e2a76c5b973fe11f)
*	afr: heal gfids when file is not present on all bricks	Ravishankar N	2018-07-09	5	-12/+51
\| \| \| \| \| \| \| \| \| \| \| \|	commit 20fa80057eb430fd72b4fa31b9b65598b8ec1265 introduced a regression wherein if a file is present in only 1 brick of replica and doesn't have a gfid associated with it, it doesn't get healed upon the next lookup from the client. Fix it. Change-Id: I7d1111dcb45b1b8b8340a7d02558f05df70aa599 fixes: bz#1597117 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit eb472d82a083883335bc494b87ea175ac43471ff)
*	cluster/afr: Make sure lk-owner is assigned at the time of lock	Pranith Kumar K	2018-07-04	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: In the new eager-lock implementation lk-owner is assigned after the 'local' is added to the eager-lock list, so there exists a possibility of lock being sent even before lk-owner is assigned. Fix: Make sure to assign lk-owner before adding local to eager-lock list fixes bz#1598193 Change-Id: I26d1b7bcf3e8b22531f1dc0b952cae2d92889ef2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> (cherry picked from commit c6f93e422855f656d3a86461a8458f37ad0103eb)
*	posix/ctime: Fix differential ctime duing entry operations	Kotresh HR	2018-07-02	1	-51/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We should not be relying on backend file's time attributes to load the initial ctime time attribute structure. This is incorrect as each replica set would have witnessed the file creation at different times. For new file creation, ctime, atime and mtime should be same, hence initiate the ctime structure with the time from the frame. But for the files which were created before ctime feature is enabled, this is not accurate but still fine as the times would get eventually accurate. Backport of: > Patch: https://review.gluster.org/#/c/20281/ > BUG: 1592275 > Change-Id: I206a469c83ee7b26da2fe096ae7bf8ff5986ad67 > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 841991130c94e3fcf4076917be6da9ce90406932) fixes: bz#1593537 Change-Id: I206a469c83ee7b26da2fe096ae7bf8ff5986ad67 Signed-off-by: Kotresh HR <khiremat@redhat.com>
*	afr: don't update readables if inode refresh failed on all children	Ravishankar N	2018-07-02	4	-32/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem: If inode refresh failed on all children of afr due to ENOENT (say file migrated by dht), it resets the readables to zero. Any inflight txn which then later comes on the inode fails with EIO because no readable children present for the inode. Fix: Don't update readables when inode refresh fails on all children of afr. In that way any inflight txns will either proceed with its own inode refresh if needed and fail it with the right errno or use the old value of readables and continue with the txn. Also, add quorum checks to the beginning of afr_transaction(). Otherwise, we seem to be winding the lock and checking for quorum only in pre-op pahse. Note: This should ideally fix BZ 1329505 since the stop gap fix for it is has been reverted at https://review.gluster.org/#/c/20028. Change-Id: Ia638c092d8d12dc27afb3cdad133394845061319 updates: bz#1597116 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 0f13eed0c1fa74cefed486538b02e0c8a8708456)