glusterfs.git/tests, branch v6.9

features/utime: Don't access frame after stack-wind

2020-04-22T05:23:09+00:00

Problem:
The frame is accessed after stack-wind. This can lead to a crash
if the cbk frees the frame.

Fix:
Use a new frame for the wind instead.

Updates: #832
Change-Id: I64754609f1114b0bbd4d1336fa81a56f2cca6e03
Signed-off-by: Pranith Kumar K

mount/fuse: Wait for 'mount' child to exit before dying

2020-04-22T05:21:08+00:00

Problem:
tests/bugs/protocol/bug-1433815-auth-allow.t sometimes fails
because of a stale mount. The stale mount appears when the
parent process dies without waiting for the child process
that mounts the fuse fs to exit.

Fix:
Wait for the mounting child process to exit before dying.
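
The wait-before-exit pattern can be illustrated with a minimal Python
sketch. This is not the fuse mount code (which is C); the function name
run_child_and_wait and the child_fn callback are hypothetical and only
stand in for "the child process that performs the mount".

```python
import os


def run_child_and_wait(child_fn):
    """Fork a child that runs child_fn and wait for it before returning.

    Hypothetical helper: child_fn stands in for the work done by the
    mounting child and must return an integer exit status.
    """
    pid = os.fork()
    if pid == 0:
        # Child: run the work and exit without returning to the caller.
        try:
            status = child_fn()
        except Exception:
            status = 1
        os._exit(status)

    # Parent: block until the child has exited, so the parent can never
    # die while the child is still in the middle of its work.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -1
```

Because the parent only returns after waitpid(), the child's outcome is
known before the parent decides to exit.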

Fixes: #1152
Change-Id: I8baee8720e88614fdb762ea822d5877973eef8dc
Signed-off-by: Pranith Kumar K

write-behind: fix data corruption

2020-04-20T09:33:13+00:00

There was a bug in write-behind that allowed a previously completed write
to overwrite the overlapping region of data from a future write.

Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are
sequential, and W3 writes at the same offset as W2:

    W2.offset = W3.offset = W1.offset + W1.size

Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes.
So W3 should *always* overwrite the overlapping part of W2.

Suppose write-behind processes the requests from 2 concurrent threads:

    Thread 1                    Thread 2

    
                                
    wb_enqueue_tempted(W1)
    /* W1 is assigned gen X */
                                wb_enqueue_tempted(W2)
                                /* W2 is assigned gen X */

                                wb_process_queue()
                                  __wb_preprocess_winds()
                                    /* W1 and W2 are sequential and all
                                     * other requisites are met to merge
                                     * both requests. */
                                    __wb_collapse_small_writes(W1, W2)
                                    __wb_fulfill_request(W2)

                                  __wb_pick_unwinds() -> W2
                                  /* In this case, since the request is
                                   * already fulfilled, wb_inode->gen
                                   * is not updated. */

                                wb_do_unwinds()
                                  STACK_UNWIND(W2)

                                /* The application has received the
                                 * result of W2, so it can send W3. */
                                

                                wb_enqueue_tempted(W3)
                                /* W3 is assigned gen X */

                                wb_process_queue()
                                  /* Here we have W1 (which contains
                                   * the conflicting W2) and W3 with
                                   * same gen, so they are interpreted
                                   * as concurrent writes that do not
                                   * conflict. */
                                  __wb_pick_winds() -> W3

                                wb_do_winds()
                                  STACK_WIND(W3)

    wb_process_queue()
      /* Eventually W1 will be
       * ready to be sent */
      __wb_pick_winds() -> W1
      __wb_pick_unwinds() -> W1
        /* Here wb_inode->gen is
         * incremented. */

    wb_do_unwinds()
      STACK_UNWIND(W1)

    wb_do_winds()
      STACK_WIND(W1)

So, as we can see, W3 is sent before W1, which shouldn't happen.

The problem is that wb_inode->gen is only incremented for requests that
have not been fulfilled but, after a merge, the request is marked as
fulfilled even though it has not been sent to the brick. This allows
future requests to be assigned to the same generation, so they can be
reordered internally.

Solution:

Increment wb_inode->gen before any unwind, even if it's for a fulfilled
request.
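
The generation rule described above can be modeled with a toy Python
sketch. It is not the write-behind code: the class, the dictionary fields
and the bump_gen_for_fulfilled switch are hypothetical names made up for
illustration. Only the rule itself (increment wb_inode->gen on every
unwind) comes from this commit message.

```python
class WbInodeModel:
    """Toy model of the generation counter; not the real write-behind code."""

    def __init__(self, bump_gen_for_fulfilled):
        self.gen = 0
        # False models the old behaviour, True models the fix.
        self.bump_gen_for_fulfilled = bump_gen_for_fulfilled

    def enqueue(self, name):
        # A new request is stamped with the current generation.
        return {"name": name, "gen": self.gen, "fulfilled": False}

    def merge(self, holder, merged):
        # Model of collapsing `merged` into `holder`: `merged` is marked
        # fulfilled although its data has not reached the brick yet.
        merged["fulfilled"] = True

    def unwind(self, req):
        # Old behaviour: only unfulfilled requests bump the generation.
        # Fixed behaviour: every unwind bumps it.
        if self.bump_gen_for_fulfilled or not req["fulfilled"]:
            self.gen += 1


def w3_can_overtake_w1(bump_gen_for_fulfilled):
    inode = WbInodeModel(bump_gen_for_fulfilled)
    w1 = inode.enqueue("W1")
    w2 = inode.enqueue("W2")
    inode.merge(w1, w2)
    inode.unwind(w2)  # the application sees W2 complete and sends W3
    w3 = inode.enqueue("W3")
    # Same generation means the requests are treated as concurrent,
    # so W3 may be sent before W1 (which carries W2's data).
    return w3["gen"] == w1["gen"]
```

In this model, w3_can_overtake_w1(False) reproduces the reordering window
and w3_can_overtake_w1(True) closes it.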

Special thanks to Stefan Ring for writing a reproducer that has been
crucial in identifying the issue.

Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45
Signed-off-by: Xavi Hernandez 
Fixes: #884

snap_scheduler: python3 compatibility and new test case

2020-04-20T07:50:18+00:00

Problem:
The "snap_scheduler.py init" command fails with the following traceback:

[root@dhcp43-104 ~]# snap_scheduler.py init
Traceback (most recent call last):
  File "/usr/sbin/snap_scheduler.py", line 941, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/usr/sbin/snap_scheduler.py", line 851, in main
    initLogger()
  File "/usr/sbin/snap_scheduler.py", line 153, in initLogger
    logfile = os.path.join(process.stdout.read()[:-1], SCRIPT_NAME + ".log")
  File "/usr/lib64/python3.6/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib64/python3.6/genericpath.py", line 151, in _check_arg_types
    raise TypeError("Can't mix strings and bytes in path components") from None
TypeError: Can't mix strings and bytes in path components

Solution:

Added the 'universal_newlines' flag to Popen so that stdout is read as
str rather than bytes on python3, while staying compatible with python2.

Added a basic test for snapshot scheduler.
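
The effect of 'universal_newlines' can be seen in a small standalone
sketch. It does not reproduce snap_scheduler.py: the helper names, the
child command and the "/tmp/logdir" path are placeholders chosen for
illustration. Only the Popen flag and the os.path.join() pattern mirror
the traceback above.

```python
import os
import subprocess
import sys

SCRIPT_NAME = "snap_scheduler"


def read_stdout(universal_newlines):
    """Run a child that prints a directory and return its raw stdout."""
    process = subprocess.Popen(
        # Placeholder command; it just prints a made-up log directory.
        [sys.executable, "-c", "print('/tmp/logdir')"],
        stdout=subprocess.PIPE,
        universal_newlines=universal_newlines,
    )
    out, _ = process.communicate()
    return out


def build_logfile_path():
    """Join the child's output with a str file name, as initLogger does."""
    # With universal_newlines=True the output is str on Python 3, so
    # joining it with another str no longer raises TypeError.
    logdir = read_stdout(universal_newlines=True).strip()
    return os.path.join(logdir, SCRIPT_NAME + ".log")
```

On Python 3, read_stdout(False) returns bytes, which is what triggered
the "Can't mix strings and bytes" error when joined with a str.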

Backport of:

   >Upstream Patch: https://review.gluster.org/#/c/glusterfs/+/24257/
   >Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
   >Fixes: #1134
   >Signed-off-by: Sunny Kumar 
   >(cherry picked from commit a7d7ec066e56ac03bf252c26beb20fdc2c3b6772)

Change-Id: I78e8fabd866fd96638747ecd21d292f5ca074a4e
Fixes: #1134
Signed-off-by: Sunny Kumar

afr: mark pending xattrs as a part of metadata heal

2020-04-20T07:40:10+00:00

...if pending xattrs are zero for all children.

Problem:
If there are no pending xattrs and a metadata heal needs to be
performed, we can end up with xattrs inadvertently
deleted from all bricks, as explained in the BZ.

Fix:
After picking one among the sources as the good copy, mark pending xattrs on
all sources to blame the sinks. Now even if this metadata heal fails midway,
a subsequent heal will still choose one of the valid sources that it
picked previously.

Updates: #1067
Change-Id: If1b050b70b0ad911e162c04db4d89b263e2b8d7b
Signed-off-by: Ravishankar N 
(cherry picked from commit 2d5ba449e9200b16184b1e7fc84cabd015f1f779)

glusterd: Brick process fails to come up with brickmux on

2020-03-04T07:42:10+00:00

Issue:
1- In a cluster of 3 nodes (N1, N2, N3), create 3 volumes (vol1,
vol2, vol3), each with 3 bricks (one from each node)
2- Set cluster.brick-multiplex on
3- Start all 3 volumes
4- Check if all bricks on a node are running on the same port
5- Kill N1
6- Set performance.readdir-ahead for volumes vol1, vol2, vol3
7- Bring N1 up and check volume status
8- The brick processes are not running on N1.

Root Cause -
Because the volfile versions on N1 differ from those on N2 and N3,
glusterd_import_friend_volume() is called. It copies the new_volinfo,
deletes the old_volinfo and then calls glusterd_start_bricks().
glusterd_start_bricks() looks for the volfiles and sends an rpc
request to glusterfs_handle_attach(). However,
glusterd_delete_stale_volume() has already removed the volinfo from
the priv->volumes list before glusterd_start_bricks() runs, while
glusterd_create_volfiles_and_notify_services() and
glusterd_list_add_order() are only called after glusterd_start_bricks().
As a result, the attach RPC request gets an empty volfile path,
which causes the brick to crash.

Fix- Call glusterd_list_add_order() and
glusterd_create_volfiles_and_notify_services() before the
glusterd_start_bricks() call is made in glusterd_import_friend_volume()

> Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
> Bug: bz#1773856
> Signed-off-by: Mohammed Rafi KC 

Change-Id: Idfe0e8710f7eb77ca3ddfa1cabeb45b2987f41aa
Fixes: bz#1808966
Signed-off-by: Sanju Rakonde

afr: prevent spurious entry heals leading to gfid split-brain

2020-02-28T06:06:10+00:00

Problem:
In a hyperconverged setup with granular-entry-heal enabled, if a file is
recreated while one of the bricks is down, and an index heal is triggered
(with the brick still down), entry-self heal was doing a spurious heal
with just the 2 good bricks. It was doing a post-op leading to removal
of the filename from .glusterfs/indices/entry-changes as well as
erroneous setting of afr xattrs on the parent. When the brick came up,
the xattrs were cleared, resulting in the renamed file not getting
healed and leading to gfid split-brain and EIO on the mount.

Fix:
Proceed with entry heal only when shd can connect to all bricks of the replica,
just like in data and metadata heal.

fixes: bz#1804594
Change-Id: I916ae26ad1fabf259bc6362da52d433b7223b17e
Signed-off-by: Ravishankar N 
(cherry picked from commit 06453d77d056fbaa393a137ca277a20e38d2f67e)

Cluster/afr: Don't treat all bricks having metadata pending as split-brain

2020-02-25T07:06:51+00:00

Problem:
We currently don't have a roll-back/undoing of post-ops if quorum is not met.
Though the FOP is still unwound with failure, the xattrs remain on the disk.
Due to these partial post-ops and partial heals (healing only when 2 bricks
are up), we can end up in metadata split-brain purely from the afr xattrs
point of view, i.e. each brick is blamed by at least one of the others for
metadata. These scenarios are hit when there is frequent connect/disconnect
of the client/shd to the bricks.

Fix:
Pick a source based on the xattr values. If 2 bricks blame one, the blamed
one must be treated as sink. If there is no majority, all are sources. Once
we pick a source, self-heal will then do the heal instead of erroring out
due to split-brain.
This patch also requires all the bricks to be up before performing
metadata heal, to avoid any metadata loss.
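
The source-picking rule above can be sketched in Python as follows. This
is an illustrative model only, assuming a 3-brick replica and a boolean
blame matrix; the function name pick_sources and the matrix layout are
hypothetical and do not reflect how afr stores or reads its xattrs.

```python
def pick_sources(blames):
    """Return the indices of bricks treated as sources.

    blames[i][j] is True when brick i's xattrs blame brick j
    (hypothetical representation for this sketch).
    """
    n = len(blames)
    sinks = set()
    for j in range(n):
        accusers = sum(1 for i in range(n) if i != j and blames[i][j])
        # If 2 bricks blame one, the blamed one is treated as a sink.
        if accusers >= 2:
            sinks.add(j)
    # With no majority against any brick, nothing lands in `sinks`
    # and all bricks are sources. The case where every brick is blamed
    # by two others is not handled by this sketch.
    return [b for b in range(n) if b not in sinks]


# Each brick is blamed by exactly one other: no majority.
circular = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]

# Bricks 0 and 1 both blame brick 2.
majority = [
    [False, False, True],
    [False, False, True],
    [False, False, False],
]
```

Here pick_sources(circular) keeps all three bricks as sources, while
pick_sources(majority) leaves bricks 0 and 1 as sources and brick 2 as
the sink.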

Removed the test case tests/bugs/replicate/bug-1468279-source-not-blaming-sinks.t
as it was doing metadata heal even when only 2 of 3 bricks were up.

Change-Id: I07a9d62f84ceda329dcab1f02a33aeed258dcb09
fixes: bz#1805097
Signed-off-by: karthik-us

server: Mount fails after reboot 1/3 gluster nodes

2020-02-11T08:44:38+00:00

Problem: When one server node (1x3) comes up after a reboot, the
client is unmounted. This happens because the client receives an
AUTH_FAILED event and calls fini for the graph. The client receives
AUTH_FAILED because the brick is not attached to a graph at that
moment.

Solution: To avoid unmounting the client graph, return an ENOENT error
          from the server if the brick is not attached to the server at
          the time clients are authenticated.

> Credits: Xavi Hernandez 
> Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
> Fixes: bz#1793852
> Signed-off-by: Mohit Agrawal 
> (cherry picked from commit f6421dff22a6ddaf14134f6894deae219948c89d)

Change-Id: Ie6fbd73cbcf23a35d8db8841b3b6036e87682f5e
Fixes: bz#1794020
Signed-off-by: Mohit Agrawal

test: fix non-root test case for geo-rep

2019-12-24T05:20:57+00:00

Problem:
On a freshly installed system, the non-root geo-rep test case gets blocked.

Solution:

On a freshly installed system, the remote key needs to be accepted automatically by ssh-copy-id.

Credits: M. Scherer 

Backport of:

>    Change-Id: I5077f99a6681660f7e3e84c25ef216f521b7c29c
>    Fixes: bz#1779742
>    Signed-off-by: Sunny Kumar 

Change-Id: I5077f99a6681660f7e3e84c25ef216f521b7c29c
Fixes: bz#1784796
Signed-off-by: Sunny Kumar