glusterfs.git/xlators/cluster/afr/src/afr-inode-write.c, branch v3.5.7

cluster/afr: Fix bugs in quorum implementation

2014-05-10T20:30:48+00:00

- Have common place to perform quorum fop wind check
- Check if fop succeeded in a way that matches quorum
  to avoid marking changelog in split-brain.
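
A minimal sketch of the second point, for illustration only. The helper name
fop_meets_quorum, its parameters, and the "count successes against a required
number" rule are assumptions; they are not taken from the AFR source, where
the actual quorum rule depends on the volume configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of AFR: returns true when the fop
 * succeeded on at least 'quorum_count' children. A child is counted as
 * successful when its op_ret is non-negative. */
bool
fop_meets_quorum (const int *op_rets, size_t child_count, size_t quorum_count)
{
        size_t success = 0;

        assert (op_rets != NULL);

        for (size_t i = 0; i < child_count; i++) {
                if (op_rets[i] >= 0)
                        success++;
        }

        return success >= quorum_count;
}
```

With a check of this shape, a fop that succeeded on too few children can be
treated as failed as a whole, instead of being recorded as a success on a
minority of bricks.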

Change-Id: I663072ece0e1de6e1ee9fccb03e1b6c968793bc5
BUG: 1066996
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/7513
Reviewed-by: Ravishankar N 
Tested-by: Gluster Build System 
Reviewed-by: Niels de Vos

zerofill: Change the type of len argument of glfs_zerofill() to off_t

2013-11-15T07:29:48+00:00

glfs_zerofill() can potentially be called to zero out an entire file, so
allow a bigger value for the length parameter.

Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63
BUG: 1028673
Signed-off-by: Bharata B Rao 
Reviewed-on: http://review.gluster.org/6266
Tested-by: Gluster Build System 
Reviewed-by: M. Mohan Kumar 
Tested-by: M. Mohan Kumar 
Reviewed-by: Anand Avati

glusterfs: zerofill support

2013-11-11T05:25:49+00:00

Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
the specified range. This fop will be useful when a whole file needs to
be initialized with zeroes (for example, when provisioning zero-filled
VM disk images or when scrubbing VM disk images).

A client/application can issue this fop to zero out a range. The Gluster
server then zeroes out the required range of bytes, i.e. the zeroing is
offloaded to the server. In the absence of this fop, the
client/application has to repeatedly issue write (zero) fops to the
server, which is very inefficient because of the overhead of the RPC
calls and acknowledgements.

WRITESAME is a SCSI T10 command that takes a block of data as input and
writes the same data to other blocks. This write is handled completely
within the storage and hence is known as offload. Linux now has support
for the SCSI WRITESAME command, which is exposed to the user in the form
of the BLKZEROOUT ioctl. The BD xlator can exploit the BLKZEROOUT ioctl
to implement this fop. Thus zeroing-out operations can be completely
offloaded to the storage device, making them highly efficient.

The fop takes two arguments, offset and size. It zeroes out 'size' bytes
in an opened file, starting from position 'offset'.
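
For contrast, the repeated-write approach that this fop replaces can be
sketched with plain POSIX calls. This is an illustration of the semantics
only: zero_range_by_writes is a hypothetical helper, it works on a local file
descriptor rather than through Gluster, and the 64 KiB chunk size is an
arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical client-side fallback: zero 'size' bytes of an open file
 * starting at 'offset' by writing a zeroed buffer repeatedly.
 * Returns 0 on success, -1 on failure with errno set. */
int
zero_range_by_writes (int fd, off_t offset, off_t size)
{
        char buf[65536];

        assert (fd >= 0 && offset >= 0 && size >= 0);
        memset (buf, 0, sizeof (buf));

        while (size > 0) {
                size_t  chunk = sizeof (buf);
                ssize_t ret   = 0;

                if (size < (off_t) sizeof (buf))
                        chunk = (size_t) size;

                ret = pwrite (fd, buf, chunk, offset);
                if (ret < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted, retry */
                        return -1;
                }
                if (ret == 0) {
                        errno = EIO;            /* no progress, give up */
                        return -1;
                }

                /* Short writes are fine: advance by what was written. */
                offset += ret;
                size   -= ret;
            }

        return 0;
}
```

Every loop iteration here is one write call; over the network each of those
becomes an RPC round trip, which is the overhead the offloaded fop avoids.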

This patch adds zerofill support to the following areas:
	- libglusterfs
	- io-stats
	- performance/md-cache,open-behind
	- quota
	- cluster/afr,dht,stripe
	- rpc/xdr
	- protocol/client,server
	- io-threads
	- marker
	- storage/posix
	- libgfapi

Client applications can exploit this fop by using glfs_zerofill, introduced
in libgfapi. FUSE support for this fop has not been added, as there is no
system call for this fop.

Changes from previous version 3:
* Removed redundant memory failure log messages

Changes from previous version 2:
* Rebased and fixed build error

Changes from previous version 1:
* Rebased for latest master

TODO :
     * Add zerofill support to trace xlator
     * Expose zerofill capability as part of gluster volume info

Here is a performance comparison of server-offloaded zerofill vs zeroing
out using repeated writes.

[root@llmvm02 remote]# time ./offloaded aakash-test log 20

real	3m34.155s
user	0m0.018s
sys	0m0.040s
[root@llmvm02 remote]# time ./manually aakash-test log 20

real	4m23.043s
user	0m2.197s
sys	0m14.457s
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;

real	4m28.363s
user	0m0.021s
sys	0m0.025s
[root@llmvm02 remote]# time ./manually aakash-test log 25

real	5m34.278s
user	0m2.957s
sys	0m18.808s

The second argument, log, is the file used for logging, and the third
argument is the size in GB.

As we can see, there is an improvement of around 20% in elapsed (real)
time with this fop.
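
The figure can be checked from the "real" times above. A small sketch of the
arithmetic, where percent_time_saved is a hypothetical helper and the inputs
are the elapsed times converted to seconds:

```c
#include <assert.h>

/* Percentage of elapsed time saved relative to the baseline run. */
double
percent_time_saved (double baseline_secs, double offloaded_secs)
{
        assert (baseline_secs > 0.0);
        return 100.0 * (baseline_secs - offloaded_secs) / baseline_secs;
}
```

For the 20 GB run, percent_time_saved (263.043, 214.155) gives about 18.6;
for the 25 GB run, percent_time_saved (334.278, 268.363) gives about 19.7.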

Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
BUG: 1028673
Signed-off-by: Aakash Lal Das 
Signed-off-by: M. Mohan Kumar 
Reviewed-on: http://review.gluster.org/5327
Tested-by: Gluster Build System 
Reviewed-by: Vijay Bellur

cluster/afr: Have common inode-write-fop cbk

2013-09-18T21:01:52+00:00

Change-Id: Ia7b324b86d6a7051d187106d7a060155e77defc5
BUG: 910217
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/5238
Reviewed-by: Ravishankar N 
Tested-by: Gluster Build System 
Reviewed-by: Anand Avati

afr: make NOP truncate/ftruncate efficient

2013-09-03T23:09:09+00:00

If truncate/ftruncate is called with the offset as the current size
of file, then skip the durability fsync and unwind quickly.
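
The condition can be sketched as a simple predicate. The name truncate_is_nop
and its parameters are assumptions for illustration, not the identifiers used
in AFR.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical predicate: truncating a file to its current size changes
 * no data, so there is nothing new that needs to be made durable. */
bool
truncate_is_nop (off_t current_size, off_t requested_offset)
{
        assert (current_size >= 0 && requested_offset >= 0);
        return requested_offset == current_size;
}
```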

Change-Id: I0baec68d96c6d4d8217d33bd9738f7ed0d1b40c5
BUG: 958118
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/5737
Reviewed-by: Pranith Kumar Karampuri 
Reviewed-by: Vijay Bellur 
Tested-by: Gluster Build System

cluster/afr: Add special handling for failure postops

2013-08-29T00:42:59+00:00

The idea is to not leave the file in a FOOL-FOOL scenario when the data
transaction failed with EDQUOT on all the bricks, to avoid adding
unnecessary self-heal load to the system.

For directory transactions, don't leave a pending changelog when the
failures are seen on all the subvolumes.

Change-Id: I38a5561d1d581a78347a76a4a509514e4a0c3fb7
BUG: 969461
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/5709
Reviewed-by: Anand Avati 
Tested-by: Gluster Build System

cluster/afr: Don't delay post op in cases of failures

2013-08-28T22:40:31+00:00

Change-Id: Ib0c3af6babc61dc3ed45252582876e2f243d6446
BUG: 958118
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/5635
Tested-by: Gluster Build System 
Reviewed-by: Jeff Darcy 
Reviewed-by: Anand Avati

cluster/afr: Add largest file is source policy

2013-08-14T16:30:43+00:00

For Write Once Read Many (WORM) type workloads, choosing the largest
file as the source will always resolve fool-fool scenarios correctly.
In other cases we fsync() the files and will have a reliable
'wise man'.
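
A minimal sketch of such a policy, assuming the sizes of the copies on each
replica are already known. pick_largest_as_source is a hypothetical helper,
and breaking ties in favour of the lowest index is an arbitrary choice made
here, not something stated in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: return the index of the replica holding the
 * largest copy of the file. Ties go to the lowest index. */
size_t
pick_largest_as_source (const off_t *sizes, size_t child_count)
{
        size_t source = 0;

        assert (sizes != NULL && child_count > 0);

        for (size_t i = 1; i < child_count; i++) {
                if (sizes[i] > sizes[source])
                        source = i;
        }

        return source;
}
```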

Change-Id: Ic4dbea8d06db6d578fbcb866fb65ee2d066ac7ba
BUG: 958118
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/5519
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

afr: treat appending writes as stable writes.

2013-08-14T06:45:03+00:00

Durability of appending writes is implicit in the file size. Therefore
performing an explicit fsync() is unnecessary in such cases, as self-heal
can check the size of the file when the pending changelog is ambiguous.
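
One way to picture the classification, as a sketch only: treat a write as
appending when it starts exactly at the end of the file as it was before the
write. The name write_is_append and this exact criterion are assumptions for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical predicate: a write that starts at the pre-write end of
 * file only grows the file, so whether it reached disk can later be
 * inferred from the file size. */
bool
write_is_append (off_t size_before_write, off_t write_offset)
{
        assert (size_before_write >= 0 && write_offset >= 0);
        return write_offset == size_before_write;
}
```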

Change-Id: I05446180a91d20e0dbee5de5a7085b87d57f178a
BUG: 927146
Signed-off-by: Anand Avati 
Reviewed-on: http://review.gluster.org/5501
Tested-by: Gluster Build System 
Reviewed-by: Pranith Kumar Karampuri

cluster/afr: Disable eager-lock if open-fd-count > 1

2013-08-02T09:25:55+00:00

Let's say mount1 holds an eager-lock (full lock), and after the
eager-lock is taken mount2 opens the same file. mount2 won't be able to
perform any data operations until mount1 releases the eager-lock.
To avoid this scenario, do not enable eager-lock for a transaction
if open-fd-count is > 1. Delaying of changelog piggybacking is also
avoided in this situation.
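
The rule reduces to a small predicate. may_use_eager_lock and its parameters
are hypothetical names for illustration, not the identifiers used in AFR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: use eager-lock for a transaction only when the
 * option is enabled and no other fd is open on the file. */
bool
may_use_eager_lock (bool eager_lock_enabled, uint32_t open_fd_count)
{
        /* The fd the transaction runs on is itself open. */
        assert (open_fd_count >= 1);
        return eager_lock_enabled && open_fd_count == 1;
}
```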

Change-Id: I51b45d6a7c216a78860aff0265a0b8dabc6423a5
BUG: 910217
Signed-off-by: Pranith Kumar K 
Reviewed-on: http://review.gluster.org/5432
Tested-by: Gluster Build System 
Reviewed-by: venkatesh somyajulu 
Reviewed-by: Vijay Bellur