glusterfs.git/xlators/features, branch exp

upcall: Fix 'use after free' in a log message

2016-12-13T14:48:41+00:00

There is chance of accessing freed pointer in a log message at TRACE
level while cleaning up expired client entries.

Change-Id: I06b4dad755df63978ab04ca52442bfd4600d139a
BUG: 1404168
Reported-by: Ravishankar N 
Signed-off-by: Soumya Koduri 
Reviewed-on: http://review.gluster.org/16117
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Niels de Vos 
Smoke: Gluster Build System

all: remove dead translators

2016-11-29T20:49:54+00:00

The following have been completely removed from the source tree,
makefiles, configure script, and RPM specfile.

  cluster/afr/pump
  cluster/ha
  cluster/map
  features/filter
  features/mac-compat
  features/path-convertor
  features/protect

Change-Id: I2f966999ac3c180296ff90c1799548fba504f88f
Signed-off-by: Jeff Darcy 
Reviewed-on: http://review.gluster.org/15906
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Vijay Bellur

features/index: Delete granular entry indices of already healed directories during crawl

2016-11-25T03:03:14+00:00

If granular name indices are already in existence for a volume, and
before they are healed, granular entry heal be disabled, a crawl on
indices/xattrop will clear the changelogs on these directories. When
their corresponding entry-changes indices are crawled subsequently,
if it is found that the directories don't need heal anymore, the
granular indices are not cleaned up.
This patch fixes that problem by ensuring that the zero-xattrop
also deletes the stale indices at the level of index translator.

Change-Id: Ifbaa6bec2a14e3041addfee4054131babbf4d35e
BUG: 1370410
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15880
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Pranith Kumar Karampuri 
CentOS-regression: Gluster Build System

marker: Fix inode value in loc, in setxattr fop

2016-11-17T10:53:15+00:00

On recieving a rename fop, marker_rename() stores the,
oldloc and newloc in its 'local' struct, once the rename
is done, the xtime marker(last updated time) is set on
the file, but sending a setxattr fop. When upcall
receives the setxattr fop, the loc->inode is NULL and
it crashes. The loc->inode can be NULL only in one valid
case, i.e. in rename case where the inode of new loc
can be NULL. Hence, marker should have filled the inode
of the new_loc before issuing a setxattr.

Change-Id: Id638f678c3daaf4a5c29b970b58929d377ae8977
BUG: 1394131
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15826
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Kotresh HR 
Smoke: Gluster Build System 
Reviewed-by: Rajesh Joseph

upcall: Fix a log level

2016-11-08T20:44:47+00:00

In upcall_cache_invalidation(), the gfid can be NULL in certain
valid test cases(eg: entry for ".." in readdirp), hence change
the log level from WARNING to DEBUG.

Change-Id: Ic90167a0e2076694e9131913114460df7b939b30
BUG: 1392167
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15777
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Smoke: Gluster Build System 
Reviewed-by: Rajesh Joseph 
Reviewed-by: Jeff Darcy

features/shard: Fill loc.pargfid too for named lookups on individual shards

2016-11-08T11:05:27+00:00

On a sharded volume when a brick is replaced while IO is going on, named
lookup on individual shards as part of read/write was failing with
ENOENT on the replaced brick, and as a result AFR initiated name heal in
lookup callback. But since pargfid was empty (which is what this patch
attempts to fix), the resolution of the shards by protocol/server used
to fail and the following pattern of logs was seen:

Brick-logs:

[2016-11-08 07:41:49.387127] W [MSGID: 115009]
[server-resolve.c:566:server_resolve] 0-rep-server: no resolution type
for (null) (LOOKUP)
[2016-11-08 07:41:49.387157] E [MSGID: 115050]
[server-rpc-fops.c:156:server_lookup_cbk] 0-rep-server: 91833: LOOKUP(null)
(00000000-0000-0000-0000-000000000000/16d47463-ece5-4b33-9c93-470be918c0f6.82)
==> (Invalid argument) [Invalid argument]

Client-logs:
[2016-11-08 07:41:27.497687] W [MSGID: 114031]
[client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-0: remote
operation failed. Path: (null) (00000000-0000-0000-0000-000000000000)
[Invalid argument]
[2016-11-08 07:41:27.497755] W [MSGID: 114031]
[client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-1: remote
operation failed. Path: (null) (00000000-0000-0000-0000-000000000000)
[Invalid argument]
[2016-11-08 07:41:27.498500] W [MSGID: 114031]
[client-rpc-fops.c:2930:client3_3_lookup_cbk] 2-rep-client-2: remote
operation failed. Path: (null) (00000000-0000-0000-0000-000000000000)
[Invalid argument]
[2016-11-08 07:41:27.499680] E [MSGID: 133010]

Also, this patch makes AFR by itself choose a non-NULL pargfid even if
its ancestors fail to initialize all pargfid placeholders.

Change-Id: I5f85b303ede135baaf92e87ec8e09941f5ded6c1
BUG: 1392445
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/15788
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Ravishankar N 
Reviewed-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System

xlators/trash : Remove upper limit for trash max file size

2016-11-03T14:09:41+00:00

Currently file which size exceeds more than 1GB never moved to
trash directory. This is due to the hard coded check using
GF_ALLOWED_MAX_FILE_SIZE.

Change-Id: I2ed707bfe1c3114818896bb27a9856b9a164be92
BUG: 1386766
Signed-off-by: Jiffin Tony Thottan 
Reviewed-on: http://review.gluster.org/15689
Smoke: Gluster Build System 
Reviewed-by: Anoop C S 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System 
Reviewed-by: Vijay Bellur

md-cache, afr: Reduce the window of stale read

2016-10-20T07:07:55+00:00

Problem:
Consider a replica setup, where one mount writes data to a
file and the other mount reads the file. In afr, read operations
are not transaction based, a brick(read subvolume) is chosen as
a part of lookup or other operations, read is always wound only
to the read subvolume, even if there was write from a different client
that failed on this brick. This stale read continues until there is
a lookup or any write operation from the mount point. Currently, this
is not a major issue, as a lookup is issued before every read and it will
switch the read subvolume to a correct one. But with the plan of
increasing md-cache timeout to 600s, the stale read problem will be
more pronounced, i.e. stale read can continue for 600s(or more if cascaded
with readdirp), as there will be no lookups.

Solution:
Afr doesn't have any built-in solution for stale read(without affecting
the performance). The solution that came up, was to use upcall. When a file
on any brick is marked bad for the first time, upcall sends a notification
to all the clients that had recently accessed the file. The solution has
2 parts:
- Identifying when a file is marked bad, on any of the bricks,
  for the first time
- Client side actions on recieving the notifications

Identifying when a file is marked bad on any of the bricks for the first time:
-----------------------------------------------------------------------------
The idea is to track xattrop in upcall. xattrop currently comes with 2 afr
xattrs - afr dirty bit and afr pending xattrs.
   Dirty xattr is set to 1 before every write, and is unset if write succeeds.
In certain scenarios, dirty xattr can be 0 and still the file could be bad
copy. Hence do not track dirty xattr.
   Pending xattr is set on the good copy, indicating the other bricks that have
bad copy. It is still not as simple as, notifying when any of the pending xattrs
change. It could lead to flood of notifcations, in case the other brick is
completely down or consistantly failing. Hence it is important to notify only
once, the first time a good copy is marked bad.

Client side actions on recieving pending xattr change, notification:
--------------------------------------------------------------------
md-cache will invalidate the cache of that file, so that further lookup is
passed down to afr and hence update the read subvolume. Invalidating only in
md-cache is not enough, consider the folling oder of opertaions:
- pending xattr invalidation - invalidate md-cache
- readdirp on the bad read subvolume - fill md-cache
- lookup (served from md-cache)
- read - wound to the old read subvol.
Hence, along with invalidating md-cache, it is very important to reset the
read subvolume for that file, in afr.

Design Credit: Anuradha Talur, Ravishankar N

1. xattrop doesn't carry info saying post op/pre op.
2. Pre xattrop will have 0 value for all pending xattrs,
   the cbk of pre xattrop carries the on-disk xattr value.
   Non zero indicated healing is required.
3. Post xattrop will have non zero value for any of the
   pending xattrs, if the fop failed on any of the bricks.

Change-Id: I469cbc111714c433984fe1c922be2ef113c25804
BUG: 1211863
Signed-off-by: Poornima G 
Reviewed-on: http://review.gluster.org/15398
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri 
Smoke: Gluster Build System 
NetBSD-regression: NetBSD Build System 
CentOS-regression: Gluster Build System

cluster/tier: handle fast demotions

2016-10-19T19:51:48+00:00

Demote files on priority if hi-watermark has been breached and continue
to demote until the watermark drops below hi-watermark.

Monitor watermark more frequently.
Trigger demotion as soon as hi-watermark is breached.
Add cluster.tier-emergency-demote-query-limit option to limit number
of files returned from the database query for every iteration of
tier_migrate_using_query_file(). If watermark hasn't dropped below
hi-watermark during the first iteration, the next iteration will be
triggered approximately 1 second after tier_demote() returns to the
main tiering loop.
Update changetimerecorder xlator to handle query for emergency demote
mode.

Add tier-ctr-interface.h:
Move tier and ctr interface specific macros and struct definition from
libglusterfs/src/gfdb/gfdb_data_store.h to new header
libglusterfs/src/tier-ctr-interface.h

Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811
BUG: 1366648
Signed-off-by: Milind Changire 
Reviewed-on: http://review.gluster.org/15158
Smoke: Gluster Build System 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System 
Reviewed-by: Dan Lambright

features/read-only: reten_mode is invalid to be free by mem_put()

2016-10-18T14:13:08+00:00

priv->reten_mode is initialised by option 'retention-mode'. and it
reference the memory in this->options. so fini() use mem_put to free
priv->reten_mode will cause a problem.
there is no need to call mem_put(), so just remove it will be fine.

Change-Id: Iee6f9d1d54df38cba8c9b9100e2824f4f2b18ab4
BUG: 1369523
Signed-off-by: Ryan Ding 
Reviewed-on: http://review.gluster.org/15296
Smoke: Gluster Build System 
Reviewed-by: Niels de Vos 
CentOS-regression: Gluster Build System 
NetBSD-regression: NetBSD Build System