glusterfs.git/xlators/performance/write-behind, branch v6.9

write-behind: fix data corruption

2020-04-20T09:33:13+00:00

There was a bug in write-behind that allowed a previous completed write
to overwrite the overlapping region of data from a future write.

Suppose we want to send three writes (W1, W2 and W3). W1 and W2 are
sequential, and W3 writes at the same offset of W2:

    W2.offset = W3.offset = W1.offset + W1.size

Both W1 and W2 are sent in parallel. W3 is only sent after W2 completes.
So W3 should *always* overwrite the overlapping part of W2.

Suppose write-behind processes the requests from 2 concurrent threads:

    Thread 1                    Thread 2

    
                                
    wb_enqueue_tempted(W1)
    /* W1 is assigned gen X */
                                wb_enqueue_tempted(W2)
                                /* W2 is assigned gen X */

                                wb_process_queue()
                                  __wb_preprocess_winds()
                                    /* W1 and W2 are sequential and all
                                     * other requisites are met to merge
                                     * both requests. */
                                    __wb_collapse_small_writes(W1, W2)
                                    __wb_fulfill_request(W2)

                                  __wb_pick_unwinds() -> W2
                                  /* In this case, since the request is
                                   * already fulfilled, wb_inode->gen
                                   * is not updated. */

                                wb_do_unwinds()
                                  STACK_UNWIND(W2)

                                /* The application has received the
                                 * result of W2, so it can send W3. */
                                

                                wb_enqueue_tempted(W3)
                                /* W3 is assigned gen X */

                                wb_process_queue()
                                  /* Here we have W1 (which contains
                                   * the conflicting W2) and W3 with
                                   * same gen, so they are interpreted
                                   * as concurrent writes that do not
                                   * conflict. */
                                  __wb_pick_winds() -> W3

                                wb_do_winds()
                                  STACK_WIND(W3)

    wb_process_queue()
      /* Eventually W1 will be
       * ready to be sent */
      __wb_pick_winds() -> W1
      __wb_pick_unwinds() -> W1
        /* Here wb_inode->gen is
         * incremented. */

    wb_do_unwinds()
      STACK_UNWIND(W1)

    wb_do_winds()
      STACK_WIND(W1)

So, as we can see, W3 is sent before W1, which shouldn't happen.

The problem is that wb_inode->gen is only incremented for requests that
have not been fulfilled but, after a merge, the request is marked as
fulfilled even though it has not been sent to the brick. This allows
that future requests are assigned to the same generation, which could
be internally reordered.

Solution:

Increment wb_inode->gen before any unwind, even if it's for a fulfilled
request.

Special thanks to Stefan Ring for writing a reproducer that has been
crucial to identify the issue.

Change-Id: Id4ab0f294a09aca9a863ecaeef8856474662ab45
Signed-off-by: Xavi Hernandez 
Fixes: #884

perf/write-behind: Clear frame->local on conflict error

2019-10-04T05:22:36+00:00

WB saves the wb_inode in frame->local for the truncate and
ftruncate fops. This value is not cleared in case of error
on a conflicting write request. FRAME_DESTROY finds a non-null
frame->local and tries to free it using mem_put. However,
wb_inode is allocated using GF_CALLOC, causing the
process to crash.

credit: vpolakis@gmail.com

Change-Id: I217f61470445775e05145aebe44c814731c1b8c5
fixes: bz#1755679
Signed-off-by: N Balachandran

performance/write-behind: fix use-after-free in readdirp

2019-02-22T03:33:07+00:00

Two issues were found:
1. in wb_readdirp_cbk, inode should unrefed after wb_inode is
unlocked. Otherwise, inode and hence the context wb_inode can be freed
by the type we try to unlock wb_inode
2. wb_readdirp_mark_end iterates over a list of wb_inodes of children
of a directory. But inodes could've been freed and hence the list
might be corrupted. To fix take a reference on inode before adding it
to invalidate_list of parent.

Change-Id: I911b0e0b2060f7f41ded0b05db11af6f9b7c09c5
Signed-off-by: Raghavendra Gowdappa 
Updates: bz#1678570
(cherry picked from commit 64cc4458918e8c8bfdeb114da0a6501b2b98491a)

performance/write-behind: handle call-stub leaks

2019-02-19T06:23:34+00:00

Change-Id: I7be9a5f48dcad1b136c479c58b1dca1e0488166d
Signed-off-by: Raghavendra Gowdappa 
Fixes: bz#1678570
(cherry picked from commit 6175cb10cd5f59f3c7ae4100bc78f359b68ca3e9)

clang: Fix various missing checks for empty list

2018-12-14T04:33:15+00:00

When using list_for_each_entry(_safe) functions, care needs
to be taken that the list passed in are not empty, as these
functions are not empty list safe.

clag scan reported various points where this this pattern
could be caught, and this patch fixes the same.

Additionally the following changes are present in this patch,
- Added an explicit op_ret setting in error case in the
macro MAKE_INODE_HANDLE to address another clang issue reported
- Minor refactoring of some functions in quota code, to address
possible allocation failures in certain functions (which in turn
cause possible empty lists to be passed around)

Change-Id: I1e761a8d218708f714effb56fa643df2a3ea2cc7
Updates: bz#1622665
Signed-off-by: ShyamsundarR

write-behind/bit-rot: fix identifier

2018-12-11T06:43:38+00:00

Rename the identifiers, bit-rot-server to bit-rot in bit-rot.c & write-ahead to
write-behind in write-behind.c to ensure GD2 understands the options

Change-Id: Id271ae97de2e54f4e30174482c4e1fb6afc728d3
Fixes: #164
Signed-off-by: rishubhjain

New xlator option to control enable/disable of xlators in Gd2

2018-12-07T05:28:23+00:00

Since glusterd2 don't maintain the xlator option details in code, it
directly reads the xlators options table from `*.so` files. To support
enable and disable of xlator new option added to the option table with
the name same as xlator name itself.

This change will not affect the functionality with glusterd1.

Change-Id: I23d9e537f3f422de72ddb353484466d3519de0c1
updates: #302
Signed-off-by: Aravinda VK

all: add xlator_api to many translators

2018-12-06T07:54:28+00:00

Fixes: #164
Change-Id: I93ad6f0232a1dc534df099059f69951e1339086f
Signed-off-by: Amar Tumballi

libglusterfs: Move devel headers under glusterfs directory

2018-12-05T21:47:04+00:00

libglusterfs devel package headers are referenced in code using
include semantics for a program, this while it works can be better
especially when dealing with out of tree xlator builds or in
general out of tree devel package usage.

Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation <> in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs

This change although big, is just moving around the headers and
making it correct when including these headers from other sources.

This helps us correctly include libglusterfs includes without
namespace conflicts.

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR

all: fix the format string exceptions

2018-11-05T18:50:59+00:00

Currently, there are possibilities in few places, where a user-controlled
(like filename, program parameter etc) string can be passed as 'fmt' for
printf(), which can lead to segfault, if the user's string contains '%s',
'%d' in it.

While fixing it, makes sense to make the explicit check for such issues
across the codebase, by making the format call properly.

Fixes: CVE-2018-14661

Fixes: bz#1644763
Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4
Signed-off-by: Amar Tumballi