glusterfs.git/xlators/cluster/dht/src/dht-helper.c, branch v3.6.6

cluster/dht: fix memory corruption in locking api.

2014-09-17T04:25:56+00:00



     The contents of the array are sorted in ascending order
     according to a comparison function pointed to by compar, which is
     called with two arguments that "point to the objects being
     compared".



qsort passes "pointers to members of the array" to the comparison
function. Since the members of the array are of type (dht_lock_t *),
the arguments passed to dht_lock_request_cmp are of type (dht_lock_t
**). Previously we assumed them to be of type (dht_lock_t *), which
resulted in memory corruption.
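
A minimal sketch of the corrected comparator pattern follows. The type
demo_lock_t, its key field, and both function names are hypothetical
stand-ins, not the actual dht_lock_t layout or dht_lock_request_cmp body;
only the extra level of indirection is the point.

```c
#include <stdlib.h>

/* Hypothetical stand-in for dht_lock_t; the real structure differs. */
typedef struct {
        int key;
} demo_lock_t;

/* qsort hands the comparator pointers to array elements. The elements
 * here are (demo_lock_t *), so each argument is really (demo_lock_t **)
 * and must be dereferenced once before use. */
static int
demo_lock_cmp (const void *a, const void *b)
{
        const demo_lock_t *lock1 = *(const demo_lock_t * const *) a;
        const demo_lock_t *lock2 = *(const demo_lock_t * const *) b;

        /* Yields -1, 0 or 1 without risking integer overflow. */
        return (lock1->key > lock2->key) - (lock1->key < lock2->key);
}

/* Sort an array of lock pointers in ascending key order. */
void
demo_sort_locks (demo_lock_t **locks, size_t count)
{
        qsort (locks, count, sizeof (*locks), demo_lock_cmp);
}
```

Casting the arguments directly to (demo_lock_t *) would make the
comparator read pointer bytes as if they were the structure itself, which
is the kind of mistake the commit describes.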

Change-Id: Iee0758704434beaff3c3a1ad48d549cbdc9e1c96
BUG: 1142406
Signed-off-by: Raghavendra G 
Reviewed-on-master: http://review.gluster.org/8659
Tested-by: Gluster Build System 
Reviewed-by: Shyamsundar Ranganathan 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8750

cluster/dht: invoke callback when there are no locks to be unlocked.

2014-09-12T11:23:26+00:00

Change-Id: I375cb68f1075c2d58cf9d09ed6bd5e2746e1637d
BUG: 1138395
Signed-off-by: Raghavendra G 
Reviewed-on-master: http://review.gluster.org/8549
Tested-by: Gluster Build System 
Reviewed-by: N Balachandran 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8611

cluster/dht: introduce locking api.

2014-09-09T19:23:13+00:00

Change-Id: I41389ba91951d3e63e617aa32cd0bee848261c72
BUG: 1138395
Signed-off-by: Raghavendra G 
Reviewed-on-master: http://review.gluster.org/8521
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/8609
Reviewed-by: Jeff Darcy

Cluster/DHT : Logging changes

2014-06-19T05:59:57+00:00

Removed trailing spaces from the code

Change-Id: I427c9a01b514824f903e301863c2c29071db6483
BUG: 1075611
Signed-off-by: Nithya Balachandran 
Reviewed-on: http://review.gluster.org/8096
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

Cluster/DHT: New logging framework

2014-06-16T13:25:51+00:00

Moved all relevant DHT gf_log calls to the new logging
framework.

Change-Id: I3af3cfe0416e332774a6c4ff6a091d006c400af2
BUG: 1075611
Signed-off-by: Nithya Balachandran 
Reviewed-on: http://review.gluster.org/7929
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

DHT/readdirp: Directory not shown/healed on mount point if exists on single brick (non first up subvolume).

2014-06-16T12:14:20+00:00

Problem: If a snapshot is taken when mkdir has succeeded only on the
hashed_subvolume, then after restoring the snapshot the directory
is not shown on the mount point.

Why:    dht_readdirp takes into account only those directory entries
which are present on first_up_subvolume. Hence, if the
"hashed subvolume" is not the same as first_up_subvolume, the directory
won't be listed on the mount point and also won't be healed.

Solution:
Case 1: (Rebalance not running) If the hashed subvolume is NULL or down, then
filter in first_up_subvolume. Otherwise the corresponding hashed subvolume
will take care of the directory entry.

Case 2: If readdirp_optimize option is turned on then read from first_up_subvol
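
The two cases can be read as a small decision function. The sketch below
is an illustration only: demo_subvol_t, demo_keep_dir_entry and all
parameter names are hypothetical, the rebalance condition from Case 1 is
not modelled, and the real dht_readdirp code is structured differently.

```c
#include <stddef.h>

/* Hypothetical stand-in for a subvolume; only its up/down state matters. */
typedef struct {
        int up;
} demo_subvol_t;

/* Return 1 if a directory entry read from read_subvol should be kept
 * in the listing, 0 if it should be filtered out. */
int
demo_keep_dir_entry (const demo_subvol_t *read_subvol,
                     const demo_subvol_t *hashed_subvol,
                     const demo_subvol_t *first_up_subvol,
                     int readdirp_optimize)
{
        /* Case 2: with the optimization on, read from first_up_subvol. */
        if (readdirp_optimize)
                return read_subvol == first_up_subvol;

        /* Case 1: hashed subvolume unknown or down, so fall back to
         * first_up_subvol. */
        if (hashed_subvol == NULL || !hashed_subvol->up)
                return read_subvol == first_up_subvol;

        /* Otherwise the hashed subvolume takes care of the entry. */
        return read_subvol == hashed_subvol;
}
```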

Change-Id: Idaad28f1c9f688dbfb1a8a3ab8b244510c02365e
BUG: 1092433
Signed-off-by: Susant Palai 
Reviewed-on: http://review.gluster.org/7599
Reviewed-by: Raghavendra G 
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

cluster/dht: force set dir inode ctx cached time in setattr()

2014-04-17T18:41:48+00:00

In setattr, the inode times may have been explicitly set "back
in time". In such cases, if the inode ctx times are not force
set, then they stay higher, and future calls to
dht_inode_ctx_time_update() continue serving the higher (stale) value.
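
A minimal sketch of the idea, assuming the cached time normally only moves
forward. The names demo_time_t and demo_time_update and the force
parameter are hypothetical; they do not reproduce the actual
dht_inode_ctx_time_update() implementation.

```c
#include <stdint.h>

/* Hypothetical cached timestamp kept in the inode ctx. */
typedef struct {
        uint32_t sec;
        uint32_t nsec;
} demo_time_t;

/* Update the cached time. Without force the cache only moves forward,
 * so a time set "back in time" would be ignored. With force the new
 * value always replaces the cached one. */
void
demo_time_update (demo_time_t *cached, uint32_t sec, uint32_t nsec,
                  int force)
{
        int newer = (sec > cached->sec) ||
                    (sec == cached->sec && nsec > cached->nsec);

        if (force || newer) {
                cached->sec  = sec;
                cached->nsec = nsec;
        }
}
```

In this sketch a setattr path would pass force = 1, while other paths
would pass force = 0 and keep the forward-only behaviour.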

Change-Id: I9cbfa7cf7c4069b0106d1f462de08c5d59bc91b5
BUG: 1083324
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/7378
Reviewed-by: Harshavardhana 
Tested-by: Harshavardhana 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra G 
Reviewed-by: Vijay Bellur

cluster/dht: Set restrictive open flags for files under rebalance

2014-02-04T17:56:57+00:00

Files that are being rebalanced are created in the new volume,
and the access path needs to open these files to write changing
data in parallel to both the old and new locations. While opening
the file in the new location, we need to restrict the open flags
so that they do not include the truncate, create, or fail-if-exists
flags. This prevents open failures and inadvertent truncation of the
file under rebalance.
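
As an illustration, the restriction can be expressed as a flag mask. This
sketch assumes the flags in question map to the POSIX O_TRUNC, O_CREAT and
O_EXCL bits; the helper name demo_rebalance_open_flags is hypothetical and
is not taken from the patch.

```c
#include <fcntl.h>

/* Strip the flags that are unsafe when opening the copy of a file that
 * is under rebalance: O_TRUNC could wipe migrated data, and O_CREAT
 * together with O_EXCL would fail because the file already exists. */
int
demo_rebalance_open_flags (int flags)
{
        return flags & ~(O_CREAT | O_EXCL | O_TRUNC);
}
```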

Change-Id: I12130e0377adc393f1925c45585200ad991fd0d5
BUG: 1058569
Signed-off-by: ShyamsundarR 
Reviewed-on: http://review.gluster.org/6830
Reviewed-by: Raghavendra G 
Reviewed-by: Krutika Dhananjay 
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat 
Reviewed-by: Vijay Bellur

cluster/dht: set op_errno correctly during migration.

2014-01-25T08:03:26+00:00

Change-Id: I65acedf92c1003975a584a2ac54527e9a2a1e52f
BUG: 1010241
Signed-off-by: Raghavendra G 
Reviewed-on: http://review.gluster.org/6219
Reviewed-by: Shyamsundar Ranganathan 
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

syncop: Change return value of syncop

2014-01-20T07:05:15+00:00

Problem:
We found a day-1 bug when syncop_xxx() infra is used inside a synctask with
compilation optimization (CFLAGS -O2).

Detailed explanation of the root cause:
We found the bug in 'gf_defrag_migrate_data' in the rebalance operation.

Let's look at the interesting parts of the function:

int
gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc,
                        dict_t *migrate_data)
{
.....
code section - [ Loop ]
        while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
                                       &entries)) != 0) {
.....
code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a
thread)
                /* Need to keep track of ENOENT errno, that means, there is no
                   need to send more readdirp() */
                readdir_operrno = errno;
.....
code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
                        ret = syncop_getxattr (this, &entry_loc, &dict,
                                               GF_XATTR_LINKINFO_KEY);
code section - [ ERRNO-2]   (checking for failures of syncop_getxattr(). This
may not always be executed in same thread which executed [SYNCOP-1])
                        if (ret < 0) {
                                if (errno != ENODATA) {
                                        loglevel = GF_LOG_ERROR;
                                        defrag->total_failures += 1;
.....
}

The function above could be executed by one thread (t1) up to [SYNCOP-1], and
the code from [ERRNO-2] onwards can be executed by a different thread (t2)
because of the way the syncop infra schedules tasks.

When the code is compiled with -O2 optimization, this is the assembly code
that is generated:
 [ERRNO-1]
1165                        readdir_operrno = errno; <<---- errno gets expanded
as *(__errno_location())
   0x00007fd149d48b60 <+496>:        callq  0x7fd149d410c0 
   0x00007fd149d48b72 <+514>:        mov    %rax,0x50(%rsp) <<------ Address
returned by __errno_location() is stored in a special location in stack for
later use.
   0x00007fd149d48b77 <+519>:        mov    (%rax),%eax
   0x00007fd149d48b79 <+521>:        mov    %eax,0x78(%rsp)
....
 [ERRNO-2]
1281                                        if (errno != ENODATA) {
   0x00007fd149d492ae <+2366>:        mov    0x50(%rsp),%rax <<-----  Because
it already stored the address returned by __errno_location(), it just
dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE
EXECUTED BY SAME THREAD!!!
   0x00007fd149d492b3 <+2371>:        mov    $0x9,%ebp
   0x00007fd149d492b8 <+2376>:        mov    (%rax),%edi
   0x00007fd149d492ba <+2378>:        cmp    $0x3d,%edi

The problem is that the __errno_location() values of t1 and t2 are different.
So [ERRNO-2] ends up reading the errno of t1 instead of the errno of t2, even
though t2 is executing the [ERRNO-2] code section.

When the code is compiled without any optimization, [ERRNO-2] becomes:
1281                                        if (errno != ENODATA) {
   0x00007fd58e7a326f <+2237>:        callq  0x7fd58e797300
<<--- As it is calling __errno_location() again it gets the
location from t2 so it works as intended.
   0x00007fd58e7a3274 <+2242>:        mov    (%rax),%eax
   0x00007fd58e7a3276 <+2244>:        cmp    $0x3d,%eax
   0x00007fd58e7a3279 <+2247>:        je     0x7fd58e7a32a1


Fix:
Make syncop_xxx() return (-errno) as the return value in case of errors.
All the functions which call syncop_xxx() will need to use (-ret) to
figure out the reason for failure when syncop_xxx() fails.
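
A minimal sketch of the calling convention described in the fix. Both
functions are hypothetical stand-ins: demo_syncop_getxattr only simulates
a syncop_xxx() call, and demo_is_real_failure mirrors the ENODATA check
from the excerpt above without reading errno at all.

```c
#include <errno.h>

/* Hypothetical stand-in for a syncop_xxx() call: returns 0 on success
 * and the negated error number on failure. */
int
demo_syncop_getxattr (int simulated_errno)
{
        if (simulated_errno != 0)
                return -simulated_errno;
        return 0;
}

/* Return 1 if ret reports a failure that should be counted as an error,
 * 0 otherwise. The reason for failure comes from (-ret), so it does not
 * matter which thread runs this check. */
int
demo_is_real_failure (int ret)
{
        if (ret >= 0)
                return 0;

        return (-ret != ENODATA);
}
```

Because the error travels in the return value, a cached
__errno_location() address from another thread can no longer affect the
result.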

Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57
BUG: 1040356
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/6475
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati