glusterfs.git/xlators/features/locks, branch v3.7dev

Changes in statedump for stack, frame, locks

2014-07-07T17:27:00+00:00

Internal call-stacks don't have an lk-owner, so it is difficult
to confirm whether a stack is hung by comparing two statedump
files. This change prints the addresses of the call-stack and of
its frames, which makes that comparison possible.

Lock times and log times are not in the same timezone, so one
has to convert the times manually while debugging issues. This
change prints the blocked and granted times in UTC as well.
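
A minimal sketch of printing a timestamp in UTC, assuming a plain
time_t as input. demo_format_utc() is a hypothetical helper written
for illustration; it is not the function the patch adds, and the
output format is an assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* for gmtime_r() */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not GlusterFS code): format `t` as
 * "YYYY-MM-DD HH:MM:SS" in UTC. Returns the number of characters
 * written, or 0 if the buffer is too small or conversion fails. */
static size_t
demo_format_utc (time_t t, char *buf, size_t len)
{
        struct tm tm_utc;

        assert (buf != NULL);

        /* gmtime_r() yields broken-down UTC time and is reentrant,
         * unlike localtime(), which depends on the TZ setting. */
        if (gmtime_r (&t, &tm_utc) == NULL)
                return 0;

        return strftime (buf, len, "%Y-%m-%d %H:%M:%S", &tm_utc);
}
```

Formatting both lock times and log times this way would let the two
be compared directly without a manual timezone conversion.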

Also fixed a line truncation issue that occurred when
client-unique-string is long.

Change-Id: I116372c0d63476823a36ca6dbfba91648f9234cc
BUG: 1114188
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/8197
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat 
Reviewed-by: Ravishankar N 
Reviewed-by: Vijay Bellur

features/locks: Clean up logging of cleanup in DISCONNECT codepath

2014-06-12T01:42:54+00:00

Now, gfid is printed as opposed to path in cleanup messages.

Also, refkeeper update is eliminated in inodelk and entrylk.
Instead, the patch ensures inode and pl_inode are kept alive as
long as there is at least one lock (granted/blocked) on an inode.

Also, every inode is unref'd appropriately on a DISCONNECT from the
lock-owning client.

Change-Id: I531b1a02fe1b889fdd7f54b1fd522e78a18ed1df
BUG: 1104915
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/7981
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Pranith Kumar Karampuri

features/locks: Remove stale entrylk objects from 'blocked_locks' list

2014-04-26T07:52:25+00:00

* In the event of a DISCONNECT from a client, as part of cleanup,
  entrylk objects are not removed from the blocked_locks list before
  being unref'd and freed, causing the brick process to crash at
  some point when the (now) stale object is accessed again in the list.

* Also during cleanup, it is pointless to try to grant a previously
  blocked entrylk (say L1) as part of releasing another conflicting
  lock (L2) if L1 also belongs to the DISCONNECTing client. This
  happened because L1 was not deleted from the blocked_locks list
  before grant_blocked_entry_locks() ran in cleanup. This patch
  fixes the problem.

Change-Id: I3d684c6bafc7e6db89ba68f0a2ed1dcb333791c6
BUG: 1089470
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/7560
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

features/locks: Remove stale inodelk objects from 'blocked_locks' list

2014-04-25T07:34:29+00:00

* In the event of a DISCONNECT from a client, as part of cleanup,
  inodelk objects are not removed from the blocked_locks list before
  being unref'd and freed, causing the brick process to crash at
  some point when the (now) stale object is accessed again in the list.

* Also during cleanup, it is pointless to try to grant a previously
  blocked inodelk (say L1) as part of releasing another conflicting
  lock (L2) if L1 also belongs to the DISCONNECTing client. This
  happened because L1 was not deleted from the blocked_locks list
  before grant_blocked_inode_locks() ran in cleanup. This patch
  fixes the problem.

* Also, the codepath in cleanup of entrylks seems to be granting
  blocked inodelks, when it should be attempting to grant blocked
  entrylks, which is fixed in this patch.
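
The ordering described above (unlink from blocked_locks first, then
unref) can be sketched as follows. This is an illustration under
stated assumptions, not the patch itself: demo_list, demo_lock and
every demo_* function are hypothetical stand-ins, and the real lock
structures carry many more fields.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal intrusive doubly linked list (hypothetical stand-in). */
struct demo_list {
        struct demo_list *next;
        struct demo_list *prev;
};

static void
demo_list_init (struct demo_list *n)
{
        n->next = n;
        n->prev = n;
}

static void
demo_list_add_tail (struct demo_list *n, struct demo_list *head)
{
        n->prev = head->prev;
        n->next = head;
        head->prev->next = n;
        head->prev = n;
}

/* Unlink and re-initialise, so a repeated unlink is harmless. */
static void
demo_list_del_init (struct demo_list *n)
{
        n->prev->next = n->next;
        n->next->prev = n->prev;
        demo_list_init (n);
}

/* Hypothetical lock object; field names are illustrative only. */
struct demo_lock {
        struct demo_list blocked;   /* must stay the first member */
        int              refcount;
        const void      *client;    /* client that owns the lock */
};

static struct demo_lock *
demo_lock_new (struct demo_list *blocked_locks, const void *client)
{
        struct demo_lock *l = calloc (1, sizeof (*l));

        assert (l != NULL);
        l->refcount = 1;
        l->client = client;
        demo_list_add_tail (&l->blocked, blocked_locks);
        return l;
}

static void
demo_lock_unref (struct demo_lock *l)
{
        if (--l->refcount == 0)
                free (l);
}

/* Release every blocked lock owned by `client`. Each lock is unlinked
 * before its reference is dropped, so the list never contains a freed
 * node. Returns the number of locks released. */
static int
demo_release_blocked (struct demo_list *blocked_locks, const void *client)
{
        struct demo_list *pos = blocked_locks->next;
        int               released = 0;

        while (pos != blocked_locks) {
                struct demo_list *next = pos->next;  /* saved before unlink */
                /* Valid because `blocked` is the first member. */
                struct demo_lock *l = (struct demo_lock *) pos;

                if (l->client == client) {
                        demo_list_del_init (&l->blocked);
                        demo_lock_unref (l);
                        released++;
                }
                pos = next;
        }
        return released;
}
```

Because the disconnecting client's locks leave the list before any
grant pass runs, a later walk over blocked_locks cannot reach them.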

Change-Id: I8493365c33020333b3f61aa15f505e4e7e6a9891
BUG: 1089470
Signed-off-by: Krutika Dhananjay 
Reviewed-on: http://review.gluster.org/7512
Reviewed-by: Raghavendra G 
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System 
Reviewed-by: Krishnan Parthasarathi 
Reviewed-by: Anand Avati

build: MacOSX Porting fixes

2014-04-24T21:41:48+00:00

git@forge.gluster.org:~schafdog/glusterfs-core/osx-glusterfs

Working functionality on MacOSX

 - GlusterD (management daemon)
 - GlusterCLI (management cli)
 - GlusterFS FUSE (using OSXFUSE)
 - GlusterNFS (without NLM - issues with rpc.statd)

Change-Id: I20193d3f8904388e47344e523b3787dbeab044ac
BUG: 1089172
Signed-off-by: Harshavardhana 
Signed-off-by: Dennis Schafroth 
Tested-by: Harshavardhana 
Tested-by: Dennis Schafroth 
Reviewed-on: http://review.gluster.org/7503
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

features/locks: Fix a missing assignment in new_entrylk_lock()

2014-04-09T06:40:26+00:00

Change-Id: If5c03456d61ec930d588b57781fb545eed18e4a2
BUG: 1085220
Signed-off-by: Vijay Bellur 
Reviewed-on: http://review.gluster.org/7413
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System 
Reviewed-by: Santosh Pradhan

locks: fix unconditional op_ret success of entrylk

2014-03-12T17:58:15+00:00

Bug introduced in a recent refactoring. op_ret of entrylk() was always
set to 0, even when a second locker had not actually been granted the
lock. This resulted in multiple contenders being granted locks at the
same time.
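
A rough sketch of the intended behaviour, assuming a non-blocking
lock attempt: op_ret is set to 0 only when the lock was really
granted. All names here (demo_entrylk, demo_try_lock, demo_entrylk_op)
are hypothetical and do not mirror the actual entrylk() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical single-holder lock used only for this illustration. */
struct demo_entrylk {
        int held;
        int owner;
};

/* Returns 0 when the lock is granted, -EAGAIN when another owner
 * already holds it. */
static int
demo_try_lock (struct demo_entrylk *lk, int owner)
{
        assert (lk != NULL);

        if (lk->held && lk->owner != owner)
                return -EAGAIN;

        lk->held = 1;
        lk->owner = owner;
        return 0;
}

/* op_ret starts at -1 and becomes 0 only on the success path, so a
 * failed attempt can never be reported as a granted lock. */
static int
demo_entrylk_op (struct demo_entrylk *lk, int owner, int *op_errno)
{
        int op_ret = -1;
        int ret = demo_try_lock (lk, owner);

        if (ret < 0) {
                *op_errno = -ret;
                goto out;
        }

        op_ret = 0;
        *op_errno = 0;
out:
        return op_ret;
}
```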

Change-Id: I99c187a9285fb80cc500b38f468f2ebda7048cab
Signed-off-by: Anand Avati 
BUG: 849630
Reviewed-on: http://review.gluster.org/7224
Reviewed-by: Pranith Kumar Karampuri 
Tested-by: Gluster Build System 
Reviewed-by: Kaleb KEITHLEY

locks: set @lock->frame = NULL when lock is granted

2014-01-23T01:30:24+00:00

This way the disconnect cleanup code can tell granted locks
apart from blocked ones.
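
A small sketch of the idea, assuming the frame pointer is the only
marker needed. demo_lk and the demo_lk_* helpers are hypothetical
names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the call frame and the lock object. */
struct demo_frame {
        int id;
};

struct demo_lk {
        struct demo_frame *frame;  /* non-NULL only while blocked */
};

/* A blocked lock remembers the frame that is waiting on it. */
static void
demo_lk_block (struct demo_lk *l, struct demo_frame *f)
{
        assert (l != NULL && f != NULL);
        l->frame = f;
}

/* Granting clears the frame pointer. */
static void
demo_lk_grant (struct demo_lk *l)
{
        assert (l != NULL);
        l->frame = NULL;
}

/* Cleanup code can then classify a lock with a single check. */
static int
demo_lk_is_blocked (const struct demo_lk *l)
{
        return l->frame != NULL;
}
```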

Change-Id: I2a835c6865b6c804231d852953ea84eeccef35a3
BUG: 849630
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/6730
Tested-by: Gluster Build System 
Reviewed-by: Raghavendra Bhat 
Reviewed-by: Krishnan Parthasarathi

syncop: Change return value of syncop

2014-01-20T07:05:15+00:00

Problem:
We found a day-1 bug when syncop_xxx() infra is used inside a synctask with
compilation optimization (CFLAGS -O2).

Detailed explanation of the Root cause:
We found the bug in 'gf_defrag_migrate_data' in rebalance operation:

Let's look at the interesting parts of the function:

int
gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc,
                        dict_t *migrate_data)
{
.....
code section - [ Loop ]
        while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
                                       &entries)) != 0) {
.....
code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a
thread)
                /* Need to keep track of ENOENT errno, that means, there is no
                   need to send more readdirp() */
                readdir_operrno = errno;
.....
code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
                        ret = syncop_getxattr (this, &entry_loc, &dict,
                                               GF_XATTR_LINKINFO_KEY);
code section - [ ERRNO-2]   (checking for failures of syncop_getxattr(). This
may not always be executed in same thread which executed [SYNCOP-1])
                        if (ret < 0) {
                                if (errno != ENODATA) {
                                        loglevel = GF_LOG_ERROR;
                                        defrag->total_failures += 1;
.....
}

The function above can be executed by one thread (t1) up to [SYNCOP-1], while
the code from [ERRNO-2] onward can be executed by a different thread (t2),
because of the way the syncop infra schedules tasks.

When the code is compiled with -O2 optimization, this is the assembly code that
is generated:
 [ERRNO-1]
1165                        readdir_operrno = errno; <<---- errno gets expanded
as *(__errno_location())
   0x00007fd149d48b60 <+496>:        callq  0x7fd149d410c0 
   0x00007fd149d48b72 <+514>:        mov    %rax,0x50(%rsp) <<------ Address
returned by __errno_location() is stored in a special location in stack for
later use.
   0x00007fd149d48b77 <+519>:        mov    (%rax),%eax
   0x00007fd149d48b79 <+521>:        mov    %eax,0x78(%rsp)
....
 [ERRNO-2]
1281                                        if (errno != ENODATA) {
   0x00007fd149d492ae <+2366>:        mov    0x50(%rsp),%rax <<-----  Because
it already stored the address returned by __errno_location(), it just
dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE
EXECUTED BY SAME THREAD!!!
   0x00007fd149d492b3 <+2371>:        mov    $0x9,%ebp
   0x00007fd149d492b8 <+2376>:        mov    (%rax),%edi
   0x00007fd149d492ba <+2378>:        cmp    $0x3d,%edi

The problem is that __errno_location() returns different addresses for t1 and
t2. So [ERRNO-2] ends up reading the errno of t1 instead of the errno of t2,
even though t2 is executing the [ERRNO-2] code section.

When code is compiled without any optimization for [ERRNO-2]:
1281                                        if (errno != ENODATA) {
   0x00007fd58e7a326f <+2237>:        callq  0x7fd58e797300
<<--- As it is calling __errno_location() again it gets the
location from t2 so it works as intended.
   0x00007fd58e7a3274 <+2242>:        mov    (%rax),%eax
   0x00007fd58e7a3276 <+2244>:        cmp    $0x3d,%eax
   0x00007fd58e7a3279 <+2247>:        je     0x7fd58e7a32a1


Fix:
Make syncop_xxx() return (-errno) as its return value in case of errors.
All callers of syncop_xxx() then need to use (-ret) to determine the
reason for a failure, instead of reading errno.

Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57
BUG: 1040356
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/6475
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

locks: various fixes

2014-01-14T05:44:19+00:00

- implement ref/unref of entry locks (and fix bad pointer deref crashes)
- code cleanup and deleted various data types
- fix improper read/write lock conflict detection in entrylk
- fix indefinite hang of blocked locks on disconnect
- register locks in client_t synchronously, fix crashes in disconnect path

Change-Id: Id273690c9111b8052139d1847060d1fb5a711924
BUG: 849630
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/6638
Tested-by: Gluster Build System 
Reviewed-by: Kaleb KEITHLEY 
Reviewed-by: Vijay Bellur