glusterfs.git/xlators, branch v7dev

glusterd: get-state command should not fail if any brick is gone bad

2019-02-05T14:40:25+00:00

Problem: get-state command will error out, if any of the underlying
brick(s) of volume(s) in the cluster go bad.

It is expected that get-state command should not error out, but
should generate an output successfully.

Solution: In glusterd_get_state(), a statfs call is made on the
brick path for every bricks of the volumes to calculate the total
and free memory available. If any of statfs call fails on any
brick, we should not error out and should report total memory and free
memory of that brick as 0.

This patch also handles a statfs failure scenario in
glusterd_store_retrieve_bricks().

fixes: bz#1672205

Change-Id: Ia9e8a1d8843b65949d72fd6809bd21d39b31ad83
Signed-off-by: Sanju Rakonde

glusterd: manage upgrade to current master

2019-02-04T03:47:15+00:00

Scenarios tested:

* Upgrade the node when there are stripe / tiering and regular
type of volumes are present.
  - All volumes are started fine (as the change was not on brick volfile)
  - For tier, the functionality may not even work, as changetimerecorder
    is not present.
  - 'gluster volume info' properly shows as 'NOT SUPPORTED' for stripe and
    tier type of volume.

* Upgrade in a rolling upgrade scenario, where an old version is
able to connect to higher master.
  - on a normal volume, if the volfile-server was new, the newer client
    volfiles needed to have utime xlator conditionally.
  - with this one change, all other changes seem to work fine.

Change-Id: Ib2d3b69dafa02b2c695a735b13c1aa70aba07cb8
updates: bz#1635688
Signed-off-by: Amar Tumballi

cluster/dht: Do not use gfid-req in fresh lookup

2019-02-02T03:09:36+00:00

Fuse sets a random gfid-req value for a fresh lookup. Posix
lookup will set this gfid on entries with missing gfids causing
a GFID mismatch for directories.
DHT will now ignore the Fuse provided gfid-req and use the GFID
returned from other subvols to heal the missing gfid.

Change-Id: I5f541978808f246ba4542564251e341ec490db14
fixes: bz#1670259
Signed-off-by: N Balachandran

mount/fuse: expose auto-invalidation as a mount option

2019-02-02T03:07:35+00:00

Auto invalidation is necessary when same (meta)data is shared/access
across multiple mounts. However, if (meta)data is not shared, all
relevant I/O goes through the cache of single mount and hence is
coherent with (meta)data on bricks always. So, fuse-auto-invalidation
can be disabled for this case which gives a huge performance boost for
workloads that write data and then immediately read the data they just
wrote.

From glusterfs --help,


      --auto-invalidation[=BOOL]   controls whether fuse-kernel can
                             auto-invalidate attribute, dentry and page-cache.
                             Disable this only if same files/directories are
                             not accessed across two different mounts
                             concurrently [default: "on"]


Details on how disabling auto-invalidation helped to reduce pgbench
init times can be found at [1]. Time taken for pgbench init of scale
8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s)
with auto-invalidations turned off along with other
optimizations. Just disabling auto-invalidation contributed 56%
improvement by reducing the total time taken by 33260s.

[1] https://www.spinics.net/lists/gluster-devel/msg25907.html

Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce
Signed-off-by: Raghavendra Gowdappa 
updates: bz#1664934

core: make gf_thread_create() easier to use

2019-02-01T09:37:20+00:00

This patch creates a specific function to set the thread name using a
string format and a variable argument list, like printf().

This function is used to set the thread name from gf_thread_create(),
which now accepts a variable argument list to create the full name. It's
not necessary anymore to use a local array to build the name of the
thread. This is done automatically.

Change-Id: Idd8d01fd462c227359b96e98699f8c6d962dc17c
Updates: bz#1193929
Signed-off-by: Xavi Hernandez

cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelog

2019-02-01T05:44:36+00:00

If a fop to create an entry fails on one of the data brick,
we mark the pending changelog on the entry on brick for which
it was successful. This is done as part of post op phase to
make sure that entry gets healed even if it gets renamed to
some other path where its parent was not marked as bad.

As it happens as part of post op, we should consider thin-arbiter
to check if the brick, which was successful, is the good brick or not.
This will avoide split brain and other issues.

Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
updates: bz#1662264
Signed-off-by: Ashish Pandey

cluster/dht: Remove internal permission bits

2019-02-01T05:29:59+00:00

Rebalance sets the sgid and t bits on a file
that is being migrated. These permissions are
not removed in dht_readdirp_cbk when listing files
causing them to show up on the mountpoint.
We now remove these permissions if a non-linkto
file has the linkto xattr set.

Change-Id: I5c69b2ecfe2df804fe50faea903b242d01729596
fixes: bz#1669937
Signed-off-by: N Balachandran

feature/bitrot: Avoid thread creation if xlator is not enabled

2019-01-31T09:47:03+00:00

Problem: Avoid thread creation for bitrot-stub
         for a volume if feature is not enabled

Solution: Before thread creation check the flag if feature
          is enabled

Updates: #475
Change-Id: I2c6cc35bba142d4b418cc986ada588e558512c8e
Signed-off-by: Mohit Agrawal 
Signed-off-by: Kotresh HR

core: heketi-cli is throwing error "target is busy"

2019-01-31T06:23:39+00:00

Problem: When rpc-transport-disconnect happens, server_connection_cleanup_flush_cbk() 
         is supposed to call rpc_transport_unref() after open-files on 
         that transport are flushed per transport.But open-fd-count is 
         maintained in bound_xl->fd_count, which can be incremented/decremented 
         cumulatively in server_connection_cleanup() by all transport 
         disconnect paths. So instead of rpc_transport_unref() happening 
         per transport, it ends up doing it only once after all the files 
         on all the transports for the brick are flushed leading to 
         rpc-leaks.

Solution: To avoid races maintain fd_cnt at client instead of maintaining
          on brick

Credits: Pranith Kumar Karampuri
Change-Id: I6e8ea37a61f82d9aefb227c5b3ab57a7a36850e6
fixes: bz#1668190
Signed-off-by: Mohit Agrawal

readdir-ahead: do not zero-out iatt in fop cbk

2019-01-31T06:18:14+00:00

...when ctime is zero. ia_type and ia_gfid always need to be non-zero
for things to work correctly.

Problem:
Commit c9bde3021202f1d5c5a2d19ac05a510fc1f788ac zeroed out the iatt
buffer in the cbks of modification fops before unwinding if the ctime in
the buffer was zero. This was causing the fops to fail: noticeable when
AFR's 'consistent-metadata' option was enabled. (AFR zeros out the ctime
when the option is set. See commit
4c4624c9bad2edf27128cb122c64f15d7d63bbc8).

Fixes:
-Do not zero out the ia_type and ia_gfid of the iatt buff under any
circumstance.
-Also, fixed _rda_inode_ctx_update_iatts() to always update these values from
the incoming buf when ctime is zero. Otherwise we end up with zero
ia_type and ia_gfid the first time the function is called *and* the
incoming buf has ctime set to zero.

fixes: bz#1670253
Reported-By:Michael Hanselmann 
Change-Id: Ib72228892d42c3513c19fc6dfb543f2aa3489eca
Signed-off-by: Ravishankar N