glusterfs.git, branch v4.1.5

doc: Added release notes for release 4.1.5

2018-09-21T13:37:19+00:00

Fixes: bz#1630186
Change-Id: Ie5ea9b69fea22eab65d7e85215f8538b617da456
Signed-off-by: ShyamsundarR

cluster/afr: Delegate name-heal when possible

2018-09-21T13:27:00+00:00

Problem:
When name-self-heal is triggered on the mount, it blocks
lookup until name-self-heal completes. This can lead to
hangs when many clients access a directory that needs
name heal: all of them trigger heals and wait for the
other clients to complete the heal.

Fix:
When a name-heal is needed but a quorum number of names have
the file and pending xattrs exist on the parent, it is better
to delegate the heal to SHD, which completes it as part of
entry-heal of the parent directory. We could do the same when
a quorum number of names are not present, but we don't have
any known use-case where this occurs frequently, so that part
is unchanged for now. When there is a gfid mismatch or a
missing gfid, it is important to complete the heal so that the
next rename (or a similar operation) doesn't assume everything
is fine and proceed.
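
The decision described above can be modelled roughly as follows. This
is only a sketch in Python, not the AFR code (which is C); the names
HealAction and name_heal_action, and passing the quorum count in as a
parameter, are assumptions made for illustration.

```python
from enum import Enum


class HealAction(Enum):
    NONE = "no heal needed"
    HEAL_IN_LOOKUP = "heal on the mount, blocking lookup"
    DELEGATE_TO_SHD = "leave the heal to the self-heal daemon"


def name_heal_action(need_heal, names_present, quorum_count,
                     parent_has_pending_xattrs, gfid_bad):
    """Decide who performs a name heal (hypothetical model, not AFR code).

    gfid_bad is True on a gfid mismatch or a missing gfid.
    """
    if not need_heal:
        return HealAction.NONE
    if gfid_bad:
        # Must be healed right away so that a later rename does not
        # operate on inconsistent entries.
        return HealAction.HEAL_IN_LOOKUP
    if names_present >= quorum_count and parent_has_pending_xattrs:
        # SHD fixes this as part of entry-heal of the parent directory.
        return HealAction.DELEGATE_TO_SHD
    # Everything else keeps the existing behaviour.
    return HealAction.HEAL_IN_LOOKUP
```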

fixes bz#1625575
Change-Id: I8b002c85dffc6eb6f2833e742684a233daefeb2c
Signed-off-by: Pranith Kumar K

cluster/afr: Delegate metadata heal with pending xattrs to SHD

2018-09-21T13:27:00+00:00

Problem:
When metadata-self-heal is triggered on the mount, it blocks
lookup until metadata-self-heal completes. This can lead to
hangs when many clients access a directory that needs
metadata heal: all of them trigger heals and wait for the
other clients to complete the heal.

Fix:
Trigger a metadata heal that could block lookup only when the
heal is needed but the pending xattrs are not set. This is the
only case where, without a heal, different clients may be served
different metadata, which should be avoided.

Updates bz#1625575
Change-Id: I6089e9fda0770a83fb287941b229c882711f4e66
Signed-off-by: Pranith Kumar K

libgfchangelog: Fix changelog history API

2018-09-21T13:25:59+00:00

Problem:
If the requested start time and end time don't fall into the
first HTIME file, the history API fails even though
continuous changelogs are available for the requested range
in other HTIME files. This is induced by disabling and
re-enabling changelog, which creates a fresh HTIME index file.

Cause and Analysis:
Each HTIME index file represents the availability of
continuous changelogs. If changelog is disabled and enabled,
a new HTIME index file is created, which marks a break in the
continuity of changelogs. So as long as the requested start
and end fall into a single HTIME index file and not across
files, the history API should succeed.

But the history API checks for the changelogs only in the first
HTIME index file and errors out if they are not available there.

Fix:
Check all HTIME index files for availability of continuous
changelogs for the requested range.
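
A minimal model of the lookup, in Python for brevity (libgfchangelog
itself is C). The function names and the list of (first_ts, last_ts)
tuples standing in for the on-disk HTIME index files are hypothetical.

```python
def find_htime_index(htime_ranges, start, end):
    """Return the index of the first range fully covering [start, end].

    htime_ranges is a list of (first_ts, last_ts) tuples, one per HTIME
    index file (an in-memory stand-in for the on-disk files).
    Returns None when no single file covers the whole request.
    """
    for index, (first_ts, last_ts) in enumerate(htime_ranges):
        if first_ts <= start and end <= last_ts:
            return index
    return None


def find_htime_index_first_only(htime_ranges, start, end):
    """Model of the old behaviour: only the first HTIME file is checked."""
    return find_htime_index(htime_ranges[:1], start, end)
```

With two index files covering (100, 200) and (300, 400), a request for
310..390 is found in the second file by find_htime_index() but is
rejected by find_htime_index_first_only(). A request for 150..350
spans the gap and is rejected by both.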

Backport of:
 > Patch: https://review.gluster.org/21016/ 
 > BUG: bz#1622549
 > Change-Id: I80eeceb5afbd1b89f86a9dc4c320e161907d3559
 > Signed-off-by: Kotresh HR 
(cherry picked from commit 35aa67001c8fac99b040fbc61f36ef4f1b1590ac)


fixes: bz#1630141
Change-Id: I80eeceb5afbd1b89f86a9dc4c320e161907d3559
Signed-off-by: Kotresh HR

geo-rep: Fix issues related config set

2018-09-21T13:25:43+00:00

1. The '--ignore-missing-args' option for rsync was not
   being used even though the rsync version is
   greater than 3.1.0. Fixed the same.

2. '--existing' option for rsync is also not being
   used. Fixed the same.

3. geo-rep config fails to set rsync-options when the
   value contains '--'. Python's argparse treats a
   value starting with '--' (e.g., --ignore-missing-args)
   as an option. But when passed in the form
   --value=--ignore-missing-args, it succeeds. Fixed the
   same.
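
The argparse behaviour in item 3 can be reproduced with a small
standalone script. This is only an illustration: the helper name
parse_config_value and the program name are made up here, and the
actual geo-rep argument parser is more involved.

```python
import argparse


def parse_config_value(argv):
    """Parse a '--value' option; return the value, or None on a parse error."""
    parser = argparse.ArgumentParser(prog="config-set-demo")
    parser.add_argument("--value")
    try:
        return parser.parse_args(argv).value
    except SystemExit:
        # argparse reports the error on stderr and exits; treat as failure.
        return None
```

parse_config_value(["--value", "--ignore-missing-args"]) fails because
the second token is taken for another option, while
parse_config_value(["--value=--ignore-missing-args"]) returns the
string unchanged.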

Backport of:
 > Patch: https://review.gluster.org/21191
 > Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1
 > Signed-off-by: Kotresh HR 
 > BUG: 1629561

Change-Id: Iaeb838acaff1c2920fee9c7f920c99edce13a0a1
Signed-off-by: Kotresh HR 
fixes: bz#1630140

geo-rep: Fix deadlock during worker start

2018-09-21T13:25:43+00:00

Analysis:
The monitor process spawns monitor threads (one per brick).
Each monitor thread forks worker and agent processes.
Each monitor thread, while initializing, updates the
monitor status file. This is synchronized using flock.
The race is that one thread can fork a worker while
another thread has the status file open, so the worker
process ends up holding a reference to that fd.

Cause:
flock gets unlocked either by explicitly unlocking it
or by closing all duplicate fds referring to the file.
The code was relying on fd close, hence a reference held
in the worker/agent process after fork could cause the deadlock.

Fix:
1. flock is unlocked explicitly.
2. Also made sure to update the status file in appropriate places so
that the reference is not leaked to the worker/agent process.

With this fix, both the deadlock and the possible fd
leaks are solved.
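
The underlying flock behaviour can be shown with a simplified,
single-threaded sketch (Linux/Unix only). It is not the monitor code:
the function names are hypothetical, and a plain fork() stands in for
the worker that inherits the status file fd.

```python
import fcntl
import os
import tempfile


def lock_is_free(path):
    """Return True if an exclusive flock on path can be taken right now."""
    with open(path, "a") as probe:
        try:
            fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        fcntl.flock(probe, fcntl.LOCK_UN)
        return True


def lock_free_after_update(path, explicit_unlock):
    """Lock the file, fork a child that inherits the fd, then release.

    Returns whether the lock is free while the child is still alive.
    """
    status = open(path, "a")
    fcntl.flock(status, fcntl.LOCK_EX)

    # The pipe keeps the child alive until the parent has probed the lock.
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(wfd)
        os.read(rfd, 1)  # returns at EOF, once the parent closes wfd
        os._exit(0)
    os.close(rfd)

    if explicit_unlock:
        fcntl.flock(status, fcntl.LOCK_UN)  # releases the lock for all dups
    status.close()  # alone, this does not unlock: the child holds a dup

    free = lock_is_free(path)
    os.close(wfd)
    os.waitpid(pid, 0)
    return free


def demo():
    """Return (free when relying on close, free with explicit unlock)."""
    with tempfile.NamedTemporaryFile() as tmp:
        return (lock_free_after_update(tmp.name, False),
                lock_free_after_update(tmp.name, True))
```

When only close() is relied upon, the lock stays held for as long as
the child lives, which is the situation that led to the deadlock.
With the explicit LOCK_UN the lock is released regardless of the
inherited fd.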

Backport of:
 > Patch: https://review.gluster.org/20704
 > BUG: bz#1614799
 > Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
 > Signed-off-by: Kotresh HR 

fixes: bz#1630145
Change-Id: I0d1ce93072dab07d0dbcc7e779287368cd9f093d
Signed-off-by: Kotresh HR

geo-rep/hook-script: Fix ssh/scp options

2018-09-21T13:25:43+00:00

Always use ssh and scp with the "-oPasswordAuthentication=no"
and "-oStrictHostKeyChecking=no" options. Otherwise the post
script might hang, leading to geo-rep setup failure.
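
As a rough illustration (the hook script itself is a shell script),
the command lines could be assembled as below. The helper names and
the example host and paths are hypothetical; only the two -o options
come from this change.

```python
# Options that keep ssh/scp from stopping at an interactive prompt.
NONINTERACTIVE_OPTS = [
    "-oPasswordAuthentication=no",
    "-oStrictHostKeyChecking=no",
]


def ssh_argv(host, remote_cmd):
    """Build an ssh argument list that never asks for a password."""
    return ["ssh"] + NONINTERACTIVE_OPTS + [host, remote_cmd]


def scp_argv(src, dest):
    """Build an scp argument list with the same options."""
    return ["scp"] + NONINTERACTIVE_OPTS + [src, dest]
```

For example, ssh_argv("slave.example.com", "true") yields a list that
can be handed to subprocess.run(); if key-based login is not set up,
ssh then fails instead of waiting for input.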

Also increased the geo-rep timeout. Occasionally it takes
more time to reach Active/Passive status, especially on the
first start after create.

Backport of:
 > Patch: https://review.gluster.org/20601
 > BUG: bz#1610405
 > Change-Id: I9560d64dbe0edf5db73446a9fc97dda19b88d233
 > Signed-off-by: Kotresh HR 

fixes: bz#1630144
Change-Id: I9560d64dbe0edf5db73446a9fc97dda19b88d233
Signed-off-by: Kotresh HR

rpc: handle EAGAIN when SSL_ERROR_SYSCALL is returned

2018-09-21T13:09:41+00:00

Problem:
ENODATA was forcibly returned whenever SSL_get_error(r) returned
SSL_ERROR_SYSCALL. Sometimes SSL_ERROR_SYSCALL is a transient error,
which is identified by errno being set to EAGAIN. EAGAIN is not a
fatal error and indicates that the syscall needs to be retried.

Solution:
Bubble up the errno in case SSL_get_error(r) returns SSL_ERROR_SYSCALL
and let the upper layers handle it appropriately.

fixes: bz#1601356
Change-Id: I76eff278378930ee79abbf9fa267a7e77356eed6
Signed-off-by: Milind Changire

storage/posix: Avoid log flood in posix_set_parent_ctime()

2018-09-17T04:56:46+00:00

posix_set_parent_ctime() unconditionally logs an error if the
consistent time attributes feature is not enabled. This log does not
add any value, prints an incorrect errno, and floods the log file.

Hence this patch removes the log message.

Backport of :
> Patch: https://review.gluster.org/20547/
> Change-Id: I82a78f2f8ce5ab518f8cdf6d9086a97049712f75
> BUG: 1607049
> Signed-off-by: Vijay Bellur 
(cherry picked from commit e0df887ba044ce92e9a2822be9261d0f712b02bd)


Change-Id: I82a78f2f8ce5ab518f8cdf6d9086a97049712f75
fixes: bz#1629548
Signed-off-by: Vijay Bellur

doc: Release notes for v4.1.4

2018-09-06T16:11:39+00:00

Change-Id: Idfce8b9ec79303b92045e68ab98765f7e2f98940
fixes: bz#1623161
Signed-off-by: Jiffin Tony Thottan