path: root/xlators/performance
Commit message | Author | Age | Files | Lines
* make snapview-server more compatible with NFS server | Raghavendra Bhat | 2014-07-16 | 1 | -0/+19
  * There was no handle-based API for listxattr. With this change, glfs_h_getxattrs also handles the listxattr functionality by checking whether the name is NULL or not (like posix).

    However, all the gfapi functions for listxattr (glfs_h_getxattrs, glfs_listxattr and glfs_flistxattr) return the names of the xattrs in a buffer provided by the caller, whereas snapview-server has to return the list of xattrs in a dict (similar to the posix xlator). Since the buffer contains only the names of the xattrs, a zero-byte value (i.e. "") is set in the dict for each xattr and sent back.

    Translators which do xattr caching (as of now md-cache, which caches SELinux- and ACL-related xattrs) should not cache xattrs whose value is zero-byte data (""). md-cache has therefore been changed to ignore zero-byte values.

  * The NFS server was not linking the inodes to the inode table in readdirp. This led to applications getting errors. The following sequence of operations would produce an error:

    1) ls -l in one of the snapshots (snapview-server would generate gfids for each entry on the fly and link the inodes associated with those entries).
    2) The NFS server, upon getting the readdirp reply, would not link the inodes of the entries. But it used to generate filehandles for each entry, associate the gfid of that entry with the filehandle, and send it as part of the reply to the NFS client.
    3) The NFS client would send the filehandle of one of those entries when some activity was done on it.
    4) The NFS server would not be able to find the inode for the gfid present in the filehandle (as the inode was not linked) and would go for hard resolution by sending a lookup on the gfid, creating a new inode.
    5) snapview-client would not be able to identify whether the inode is a real inode existing in the main volume or a virtual inode existing in the snapshots, as there would not be any inode context.
    6) Since the gfid upon which the lookup is sent is a virtual gfid which is not present on disk, the lookup would fail and the application would get an error.

    To handle the above situation, the NFS server now also does inode linking in readdirp.

  Change-Id: Ibb191408347b6b5f21cff72319ccee619ea77bcd
  BUG: 1115949
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/8230
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
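The md-cache rule described in this commit (do not cache xattrs whose value is zero bytes long) can be sketched as a small predicate. This is only an illustration: `struct xattr_entry`, `xattr_is_cacheable()` and `count_cacheable()` are hypothetical names, not md-cache code, and the real translator works on GlusterFS dicts rather than on a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one name/value pair of an xattr reply. */
struct xattr_entry {
    const char *name;
    const char *value; /* may be NULL */
    size_t      len;   /* length of value in bytes */
};

/* A zero-byte value carries only the xattr name (as produced by the
 * listxattr path described above), so it must not enter the cache. */
static bool xattr_is_cacheable(const struct xattr_entry *e)
{
    assert(e != NULL);
    return e->value != NULL && e->len > 0;
}

/* Count how many entries of a reply would be admitted to the cache. */
static size_t count_cacheable(const struct xattr_entry *entries, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (xattr_is_cacheable(&entries[i]))
            kept++;
    return kept;
}
```

With a reply holding one name-only entry and one entry with a real value, only the second would be cached.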
* porting: Port for FreeBSD rebased from Mike Ma's efforts | Harshavardhana | 2014-07-02 | 1 | -0/+1
  - Provides a working Gluster Management Daemon, CLI
  - Provides a working GlusterFS server, GlusterNFS server
  - Provides a working GlusterFS client
  - execinfo port from FreeBSD is moved into ./contrib/libexecinfo for ease of portability on NetBSD. (FreeBSD 10 and OSX provide execinfo natively)
  - More portability cleanups for Darwin, FreeBSD and NetBSD
  - Provides a new rc script for FreeBSD

  Change-Id: I8dff336f97479ca5a7f9b8c6b730051c0f8ac46f
  BUG: 1111774
  Original-Author: Mike Ma <mikemandarine@gmail.com>
  Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
  Reviewed-on: http://review.gluster.org/8141
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* performance/md-cache: Guard against null dict | Pranith Kumar K | 2014-07-01 | 1 | -1/+1
  BUG: 1114677
  Change-Id: Ica4f4ad97d7d1edc3e48e7f1a6ec70b14acffc66
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/8205
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* build: MacOSX Porting fixes | Harshavardhana | 2014-04-24 | 5 | -19/+25
  git@forge.gluster.org:~schafdog/glusterfs-core/osx-glusterfs

  Working functionality on MacOSX
  - GlusterD (management daemon)
  - GlusterCLI (management cli)
  - GlusterFS FUSE (using OSXFUSE)
  - GlusterNFS (without NLM - issues with rpc.statd)

  Change-Id: I20193d3f8904388e47344e523b3787dbeab044ac
  BUG: 1089172
  Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
  Signed-off-by: Dennis Schafroth <dennis@schafroth.com>
  Tested-by: Harshavardhana <harsha@harshavardhana.net>
  Tested-by: Dennis Schafroth <dennis@schafroth.com>
  Reviewed-on: http://review.gluster.org/7503
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* build: do not create versioned <xlator>.so files | Niels de Vos | 2014-03-21 | 1 | -1/+1
  There has been a misspelled option in the Makefile.am files. The option is called -avoid-version, and not -avoidversion.

  It is not trivial to provide a test-case for this. One way would be to check generated RPMs with a command like this (output should be empty):

    $ rpm -qlp *.rpm | grep -E '/xlator/.+.so.0'

  Change-Id: I2a6cc557eada4d098b73af5a254f8c75707543da
  BUG: 1078365
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/7299
  Reviewed-by: Lalatendu Mohanty <lmohanty@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* write-behind: track filesize when doing extending writes | Niels de Vos | 2014-02-27 | 1 | -4/+161
  A program that calls mmap() on a newly created sparse file may receive a SIGBUS signal. If SIGBUS is not handled, a segmentation fault will occur and the program will exit.

  A bug in the write-behind translator can cause the creation of a sparse file created with open(), seek(), write() to be cached. The last write() may not be sent to the server until write-behind deems this necessary.

  * open(.., O_TRUNC, ...)/creat() the file, it is 0 bytes big
  * seek() into the file, use offset 31
  * write() 1 byte to the file
  * the range of bytes 0-30 is unwritten, so-called 'sparse'

  The following illustration tries to capture this:

  Legend:
    [ = start of file
    _ = unallocated/unwritten bytes
    # = allocated bytes in the file
    ] = end of file

    [_______________#]
     |              |
     '- byte 0      '- byte 31

  Without this change, reading from bytes 0-30 will return an error, and reading the same area through an mmap()'d pointer will trigger a SIGBUS. Reading from this range did not trigger the outstanding write() to be flushed. The brick that receives the read() (translated over the network from mmap()) does not know that the file has been extended, and returns -EINVAL. This error gets transported back from the brick to the glusterfs-fuse client, and translated by the Linux kernel/VFS into a SIGBUS triggered by mmap().

  In order to solve this, a new attribute is introduced in the wb_inode structure: the current size of the file. All FOPs that can modify the size are expected to update wb_inode->size. This makes it possible for extending writes with an offset bigger than EOF to mark the unwritten area as modified/pending.

  Change-Id: If5ba6646732e6be26568541ea9b12852a5d0b988
  BUG: 1058663
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/6835
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
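The sequence from this commit message (truncate, seek to offset 31, write one byte, then read bytes 0-30) can be replayed with plain POSIX calls. The sketch below is an assumption-laden illustration, not the test shipped with the patch: the function name and the temporary path are made up, and it runs against a local file, where the hole is expected to read back as zeroes. On a volume affected by the bug, the same read would be the one that failed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for mkstemp, pread, ftruncate */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Create the sparse file from the commit message and read the hole.
 * Returns 0 if bytes 0-30 read back as zeroes, -1 on any failure. */
static int sparse_hole_reads_as_zeroes(void)
{
    char path[] = "/tmp/wb-sparse-XXXXXX"; /* hypothetical location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* file disappears once fd is closed */

    int ok = -1;
    char hole[31];
    if (ftruncate(fd, 0) == 0 &&                 /* file is 0 bytes big */
        lseek(fd, 31, SEEK_SET) == 31 &&         /* seek past EOF */
        write(fd, "#", 1) == 1 &&                /* extending write */
        pread(fd, hole, sizeof(hole), 0) == (ssize_t)sizeof(hole)) {
        ok = 0;
        for (size_t i = 0; i < sizeof(hole); i++)
            if (hole[i] != 0)
                ok = -1;
    }
    close(fd);
    return ok;
}
```

To exercise write-behind itself, the temporary path would have to point into a GlusterFS mount, and the read would typically go through an mmap()'d pointer as described above.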
* performance/io-threads: Remove code duplication | Pranith Kumar K | 2014-02-24 | 1 | -1896/+78
  Change-Id: Ic905cc6074c796efce2972857b79ab53700a2de4
  BUG: 1065657
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/7010
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* io-cache: Unlock and then goto out in failure case. | Raghavendra Talur | 2014-02-17 | 1 | -0/+1
  Fix for coverity bug CID:1124625

  Change-Id: I76df453a17f2af7c48a80b6fc0ccd411ab96e371
  BUG: 789278
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-on: http://review.gluster.org/6949
  Reviewed-by: Santosh Pradhan <spradhan@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* performance/io-cache: Fix dereferencing of freed pointer | Poornima | 2014-02-12 | 1 | -2/+6
  Change-Id: Ic4276c6d76c36f4eb77282dc06d2b8b212b58f08
  BUG: 789278
  Signed-off-by: Poornima <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/6822
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* protocol/server: do not do root-squashing for trusted clients | Raghavendra Bhat | 2014-02-10 | 1 | -2/+21
  * As of now, clients mounting within the storage pool using that machine's ip/hostname are trusted clients (i.e. clients local to the glusterd).

  * Be careful when the request itself comes in as nfsnobody (e.g. posix tests). So move the squashing part to protocol/server, when it creates a new frame for the request, instead of the auth part of the rpc layer.

  * For NFS servers, do root-squashing without checking whether it is a trusted client, as all the NFS servers would be running within the storage pool and hence would be trusted clients for the bricks.

  * Provide one more mount option which says whether root-squash should or should not happen. This value is given priority only for trusted clients. For non-trusted clients, the volume option takes priority. But for trusted clients, if root-squash should not happen, they have to be mounted with the root-squash=no option. (This is done because blocking root-squashing by default for the trusted clients would cause problems for smb and UFO clients, for which the requests have to be squashed if the option is enabled.)

  * For geo-replication and defrag clients, do not do root-squashing.

  * Introduce a new option in open-behind for doing read after successful open.

  Change-Id: I8a8359840313dffc34824f3ea80a9c48375067f0
  BUG: 954057
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/4863
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
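One possible reading of the root-squash rules in this commit, written as a decision function. Everything here is a hypothetical model for illustration (the enum, the parameter names and `should_root_squash()` do not come from protocol/server), and it covers only the cases the message spells out: geo-replication and defrag clients are never squashed, NFS servers follow the volume option regardless of trust, and a trusted client escapes squashing only when mounted with root-squash=no.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of the connecting client. */
enum client_kind {
    CLIENT_NATIVE,     /* ordinary mount (fuse, smb, UFO, ...) */
    CLIENT_NFS_SERVER, /* gluster NFS server process */
    CLIENT_GEOREP,     /* geo-replication */
    CLIENT_DEFRAG      /* rebalance/defrag */
};

/* volume_root_squash:   the volume-level root-squash option
 * trusted:              client mounted from within the storage pool
 * mount_no_root_squash: client was mounted with root-squash=no */
static bool should_root_squash(bool volume_root_squash, bool trusted,
                               bool mount_no_root_squash,
                               enum client_kind kind)
{
    if (kind == CLIENT_GEOREP || kind == CLIENT_DEFRAG)
        return false;              /* never squashed */
    if (kind == CLIENT_NFS_SERVER)
        return volume_root_squash; /* trust is not consulted */
    if (trusted && mount_no_root_squash)
        return false;              /* mount option wins for trusted clients */
    return volume_root_squash;     /* everyone else follows the volume */
}
```

A trusted client mounted without root-squash=no is still squashed when the volume option is on, which matches the stated concern about smb and UFO clients.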
* performance/io-cache: Fix for the bugs reported by coverity | Poornima | 2014-02-10 | 1 | -0/+2
  Change-Id: I24c10d874511a2f24dda2fb84d31f5074da1616f
  BUG: 789278
  Signed-off-by: Poornima <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/6869
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/quick-read: Allocated memory not freed when not used. | Christopher R. Hertel | 2014-02-08 | 1 | -2/+3
  If memory is successfully allocated in the call to qr_content_extract(), but is not used, it is not being freed. This patch frees the allocated memory if it is not passed to qr_content_refresh().

  BUG: 789278
  CID: 1124735
  Change-Id: I1c1f03a3b92fa26321ec6ee8822e6fa41da79875
  Signed-off-by: Christopher R. Hertel <crh@redhat.com>
  Reviewed-on: http://review.gluster.org/6827
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* quick-read: Remove unref of a freed iobuf. | Poornima | 2014-02-07 | 1 | -1/+0
  Change-Id: Ie21414658db571c9a483730b6d5e8997f04255c1
  BUG: 789278
  Signed-off-by: Poornima <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/6823
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Fix for 'use after free' errors reported by coverity. | Poornima | 2014-02-05 | 2 | -2/+3
  Change-Id: I941fc89b2d696c7f227330321ed4bba3ed1deac4
  BUG: 789278
  Signed-off-by: Poornima <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/6868
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/io-cache: reduce the severity of log-message | Raghavendra G | 2014-01-03 | 1 | -1/+1
  During a genuine error condition like network outage, the log grows with redundant information.

  Change-Id: I5a4f2f62da10ef656f14200c4c84a6917b1f0ddd
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  BUG: 1048084
  Reviewed-on: http://review.gluster.org/6635
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* write-behind: handle iobref_merge() error gracefully | Anand Avati | 2013-11-26 | 1 | -2/+3
  .. by UNWINDing ENOMEM error, rather than crashing.

  Change-Id: Ica2d6399eaf7e381e7ebc41155620559c139c4d3
  BUG: 1034398
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/6349
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@gmail.com>
* io-cache: handle iobref_merge() error gracefully | Anand Avati | 2013-11-26 | 1 | -1/+4
  .. by UNWINDing ENOMEM, rather than leaving pointer in vector pointing to stale memory.

  Change-Id: I7f3917ac056fae144f845c9d123233e91e278187
  BUG: 1034398
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/6351
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@gmail.com>
* read-ahead: handle iobref_merge() error gracefully | Anand Avati | 2013-11-26 | 1 | -1/+6
  .. by UNWINDing ENOMEM rather than leaving pointers in the vector which point to unref'ed (or even worse, re-used) iobufs.

  Change-Id: I849d8cbe5fc02ee992d4e28b7212c49aad4925c7
  BUG: 1034398
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/6350
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@gmail.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_t | Bharata B Rao | 2013-11-14 | 5 | -6/+6
  glfs_zerofill() can potentially be called to zero out an entire file, so allow a bigger value for the length parameter.

  Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63
  BUG: 1028673
  Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
  Reviewed-on: http://review.gluster.org/6266
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com>
  Tested-by: M. Mohan Kumar <mohan@in.ibm.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs/inode: introduce new APIs for ctx handling | Amar Tumballi | 2013-11-13 | 1 | -1/+1
  * inode_ctx_reset{0,1,2}() for resetting value1, value2, and both respectively
  * inode_ctx_get0() - to get the first value only
  * inode_ctx_set0() - to set the first value only
  * inode_ctx_get1() - to get the second value only
  * inode_ctx_set1() - to set the second value only

  Change-Id: I4dfbdac81d6a3f4e5784e060c76edabb1692ce03
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-on: http://review.gluster.org/5890
  Reviewed-by: Anand Avati <avati@redhat.com>
  Tested-by: Anand Avati <avati@redhat.com>
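The naming scheme listed in this commit (suffix 0 for the first value, 1 for the second, and reset2 for both) can be modelled with a toy two-slot context. The `toy_ctx_*` names and the struct below are hypothetical and deliberately do not reproduce the libglusterfs signatures, which also take the inode and the translator; the sketch only shows the intended semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-inode context with two independent 64-bit slots. */
struct toy_ctx {
    uint64_t value1, value2;
    bool     has1, has2;
};

/* set0/get0 touch only the first value. */
static void toy_ctx_set0(struct toy_ctx *c, uint64_t v) { c->value1 = v; c->has1 = true; }
static int toy_ctx_get0(const struct toy_ctx *c, uint64_t *out)
{
    if (!c->has1)
        return -1;
    *out = c->value1;
    return 0;
}

/* set1/get1 touch only the second value. */
static void toy_ctx_set1(struct toy_ctx *c, uint64_t v) { c->value2 = v; c->has2 = true; }
static int toy_ctx_get1(const struct toy_ctx *c, uint64_t *out)
{
    if (!c->has2)
        return -1;
    *out = c->value2;
    return 0;
}

/* reset0 clears value1, reset1 clears value2, reset2 clears both. */
static void toy_ctx_reset0(struct toy_ctx *c) { c->value1 = 0; c->has1 = false; }
static void toy_ctx_reset1(struct toy_ctx *c) { c->value2 = 0; c->has2 = false; }
static void toy_ctx_reset2(struct toy_ctx *c) { toy_ctx_reset0(c); toy_ctx_reset1(c); }
```

The point of the split accessors is that a translator using only one slot does not have to read or rewrite the other one.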
* glusterfs: zerofill support | M. Mohan Kumar | 2013-11-10 | 5 | -0/+191
  Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero-filled VM disk image provisioning or during scrubbing of VM disk images).

  A client/application can issue this FOP for zeroing out. The Gluster server will zero out the required range of bytes, i.e. server-offloaded zeroing. In the absence of this fop, the client/application has to repeatedly issue write (zero) fops to the server, which is a very inefficient method because of the overheads involved in RPC calls and acknowledgements.

  WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks. This write is handled completely within the storage and hence is known as offload. Linux now has support for the SCSI WRITESAME command, which is exposed to the user in the form of the BLKZEROOUT ioctl. The BD xlator can exploit the BLKZEROOUT ioctl to implement this fop. Thus zeroing-out operations can be completely offloaded to the storage device, making it highly efficient.

  The fop takes two arguments, offset and size. It zeroes out 'size' number of bytes in an opened file starting from the 'offset' position.

  This patch adds zerofill support to the following areas:
  - libglusterfs
  - io-stats
  - performance/md-cache,open-behind
  - quota
  - cluster/afr,dht,stripe
  - rpc/xdr
  - protocol/client,server
  - io-threads
  - marker
  - storage/posix
  - libgfapi

  Client applications can exploit this fop by using glfs_zerofill introduced in libgfapi. FUSE support for this fop has not been added as there is no system call for this fop.

  Changes from previous version 3:
  * Removed redundant memory failure log messages

  Changes from previous version 2:
  * Rebased and fixed build error

  Changes from previous version 1:
  * Rebased for latest master

  TODO:
  * Add zerofill support to trace xlator
  * Expose zerofill capability as part of gluster volume info

  Here is a performance comparison of server-offloaded zerofill vs zeroing out using repeated writes.

  [root@llmvm02 remote]# time ./offloaded aakash-test log 20
  real 3m34.155s
  user 0m0.018s
  sys 0m0.040s

  [root@llmvm02 remote]# time ./manually aakash-test log 20
  real 4m23.043s
  user 0m2.197s
  sys 0m14.457s

  [root@llmvm02 remote]# time ./offloaded aakash-test log 25;
  real 4m28.363s
  user 0m0.021s
  sys 0m0.025s

  [root@llmvm02 remote]# time ./manually aakash-test log 25
  real 5m34.278s
  user 0m2.957s
  sys 0m18.808s

  The argument log is a file which we want to set for logging purposes, and the third argument is the size in GB.

  As we can see, there is a performance improvement of around 20% with this fop.

  Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
  BUG: 1028673
  Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com>
  Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
  Reviewed-on: http://review.gluster.org/5327
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
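For contrast with the offloaded fop, the "repeated writes" approach that this commit calls inefficient looks roughly like the loop below. It is a sketch on a local file descriptor using pwrite(); `zero_range_by_writes()` and the 4096-byte chunk size are assumptions for illustration, not the benchmark program from the message. Over a network filesystem, each iteration of such a loop becomes a separate write request, which is the overhead ZEROFILL avoids.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for pwrite, pread, mkstemp */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Zero 'len' bytes of fd starting at 'offset' using ordinary writes.
 * Returns 0 on success, -1 on error (errno is set by pwrite). */
static int zero_range_by_writes(int fd, off_t offset, off_t len)
{
    static const char zeroes[4096]; /* static storage: all zero bytes */

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeroes) ? (size_t)len
                                                   : sizeof(zeroes);
        ssize_t n = pwrite(fd, zeroes, chunk, offset);
        if (n < 0 && errno == EINTR)
            continue;  /* interrupted: retry the same chunk */
        if (n <= 0)
            return -1; /* real error, or no progress */
        offset += n;   /* short writes simply advance less */
        len -= n;
    }
    return 0;
}
```

The function takes the same two quantities the fop is described as taking, an offset and a size, so a libgfapi client would replace the whole loop with a single glfs_zerofill call on its file handle.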
* gNFS: Incorrect NFS ACL encoding for XFS | Santosh Kumar Pradhan | 2013-09-29 | 1 | -2/+3
  Problem: Incorrect NFS ACL encoding causes "system.posix_acl_default" setxattr failure on bricks on XFS file system. XFS (potentially others?) doesn't understand when the 0x10 prefix is added to the ACL type field for default ACLs (which the Linux NFS client adds) which causes setfacl()->setxattr() to fail silently. NFS client adds NFS_ACL_DEFAULT(0x1000) for default ACL.

  FIX: Mask the prefix (added by NFS client) OFF, so the setfacl is not rejected when it hits the FS.

  Original patch by: "Richard Wareing"

  Change-Id: I17ad27d84f030cdea8396eb667ee031f0d41b396
  BUG: 1009210
  Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
  Reviewed-on: http://review.gluster.org/5980
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
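The fix described here amounts to clearing one flag bit from the ACL type field before the entry reaches the brick filesystem. A minimal sketch, assuming a 32-bit type field and using the 0x1000 value quoted in the message; `strip_default_acl_flag()` is a hypothetical helper name, not the function touched by the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Flag the NFS client adds to default-ACL entries, as quoted above. */
#define NFS_ACL_DEFAULT 0x1000u

/* Clear the client-side flag and keep the remaining type bits intact. */
static uint32_t strip_default_acl_flag(uint32_t type)
{
    return type & ~NFS_ACL_DEFAULT;
}
```

An entry that arrives as 0x1001 leaves as 0x0001, while an entry without the flag passes through unchanged.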
* core: block unused signals in created threads | Anand Avati | 2013-09-25 | 1 | -1/+1
  Block all signals except those which are set for explicit handling in glusterfs_signals_setup(). Since thread spawning code in libglusterfs and xlators can get called from application threads when used through libgfapi, it is necessary to do this blocking.

  Change-Id: Ia320f80521a83d2edcda50b9ad414583a0175281
  BUG: 1011662
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/5995
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
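The usual POSIX way to get a thread that starts with signals blocked is to block them in the creating thread, call pthread_create() (the new thread inherits the mask), and then restore the creator's mask. The sketch below assumes that pattern and, for simplicity, blocks every signal; the commit instead exempts the signals handled in glusterfs_signals_setup(), and that exemption list is not reproduced here. Both function names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for pthread_sigmask, sigfillset */
#endif
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Thread body for demonstration: report whether SIGUSR1 is blocked. */
static void *report_mask(void *arg)
{
    int *blocked = arg;
    sigset_t cur;
    pthread_sigmask(SIG_SETMASK, NULL, &cur); /* query only */
    *blocked = sigismember(&cur, SIGUSR1);
    return NULL;
}

/* Create a thread whose initial signal mask blocks all signals, then
 * restore the caller's own mask. Returns 0 or a pthread error code. */
static int spawn_with_signals_blocked(pthread_t *tid,
                                      void *(*fn)(void *), void *arg)
{
    sigset_t all, old;
    sigfillset(&all);
    int ret = pthread_sigmask(SIG_BLOCK, &all, &old);
    if (ret != 0)
        return ret;
    ret = pthread_create(tid, NULL, fn, arg); /* inherits blocked mask */
    pthread_sigmask(SIG_SETMASK, &old, NULL); /* caller is unchanged */
    return ret;
}
```

Restoring the old mask matters in the libgfapi case mentioned above, since the caller may be an application thread with its own signal handling.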
* performance/readdir-ahead: introduce directory read-ahead translator | Brian Foster | 2013-09-04 | 6 | -1/+649
  This is a translator to improve the performance of typical, sequential directory reads (i.e., ls). readdir-ahead begins preloading the contents of a directory on open and serves readdir requests from the preloaded content. readdir-ahead is currently implemented to only handle the single threaded directory read case.

  readdir-ahead is currently disabled by default. It can be enabled with the following command:

    gluster volume set <volname> readdir-ahead on

  The following are results of a getdents test on a single brick volume.

  Test info:
  - Single VM, gluster client/server.
  - Volume mounted with native client using --gid-timeout=2.
  - getdents on single directory with 100k 0-byte files.

  Test results:
  - !readdir-ahead
    read 3120080 bytes from offset 0
    3 MiB, 4348 ops, 0:00:07.00 (416.590 KiB/sec and 594.4737 ops/sec)
  - readdir-ahead
    read 3120080 bytes from offset 0
    3 MiB, 4348 ops, 0:00:03.00 (820.116 KiB/sec and 1170.3043 ops/sec)

  BUG: 980517
  Change-Id: Ieceb9e1eb47d1d5b5af8da2bf03839537364653f
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/4519
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* mount/fuse: perform lookup() on inodes linked through readdirplus | Anand Avati | 2013-08-23 | 1 | -0/+7
  Some xlators still require the lookup() fop to be sent for proper working. This patch remembers inodes which have been linked through readdirplus and makes the resolver send lookups on them.

  Change-Id: Ibe8a04a659539d90dfc794521b51bf2bda017a0b
  BUG: 979910
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/5267
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* io-cache: fix unsafe typecasting of pointer to uint64 | Anand Avati | 2013-08-22 | 1 | -1/+3
  The typecast of a pointer to uint64_t *, followed by the setting of 64 bits in inode_ctx_get(), results in memory corruption on 32-bit systems.

  Change-Id: I32fa3bf3b853ed2690a9b9a471099a59b9d7186a
  BUG: 997902
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/5682
  Tested-by: Morten Johansen <morten@bzzt.no>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Santosh Pradhan <spradhan@redhat.com>
  Reviewed-by: Brian Foster <bfoster@redhat.com>
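The hazard in this commit is passing the address of a pointer variable, cast to uint64_t *, to a getter that stores a full 64 bits: on a 32-bit system the pointer is only 4 bytes wide, so the store overwrites whatever lies next to it. A common safe pattern is to receive the value in a real uint64_t and convert afterwards. The sketch assumes that pattern; `toy_ctx_get()`, `struct toy_table` and `load_table()` are hypothetical stand-ins, not the io-cache code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical getter that, like the context API described above,
 * always stores 64 bits through 'out'. */
static int toy_ctx_get(uint64_t stored, uint64_t *out)
{
    *out = stored;
    return 0;
}

struct toy_table { int unused; }; /* hypothetical cached object */

/* Unsafe (what the commit removes), shown only as a comment:
 *     struct toy_table *t = NULL;
 *     toy_ctx_get(stored, (uint64_t *)&t);  // 8-byte store into a
 *                                           // 4-byte object on 32-bit
 * Safe: land the value in a uint64_t, then narrow via uintptr_t. */
static struct toy_table *load_table(uint64_t stored)
{
    uint64_t tmp = 0;
    if (toy_ctx_get(stored, &tmp) != 0)
        return NULL;
    return (struct toy_table *)(uintptr_t)tmp;
}
```

Going through uintptr_t keeps the conversion well defined on both 32-bit and 64-bit builds.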
* md-cache: invalidate attributes on xattr update | Anand Avati | 2013-08-19 | 1 | -0/+164
  xattr update will result in at least ctime change. So invalidate attributes in xattr callback.

  Change-Id: Ie6e8f2fd9a11c56c27e78bd58c2ff1e1d6edce6e
  BUG: 953694
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/5641
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* performance/write-behind: invoke request queue processing if | Raghavendra G | 2013-08-14 | 1 | -19/+30
  we find fd marked bad while trying to fulfill lies.

  * flush was queued behind some unfulfilled write.
  * A previously wound write returned an error and hence fd was marked bad with corresponding error.
  * wb_fulfill_head (invocation probably rooted in wb_flush), before winding checks for failures of previous writes and since there was a failure, calls wb_head_done without even winding one request in head.
  * wb_head_done unrefs all the requests in list "head".
  * since flush was last operation on fd (and most likely last operation on inode itself), no one invokes wb_process_queue and flush is stuck in request queue for eternity.

  Change-Id: I3b5b114a1c401d477dd7ff64fb6119b43fda2d18
  BUG: 988642
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: http://review.gluster.org/5398
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* md-cache: fix xattr caching code in getxattr | Anand Avati | 2013-08-07 | 1 | -2/+2
  Bad condition check, fix it!

  Change-Id: I6e047de70f77d7b98b2ca771a467f14a76fd62fe
  BUG: 994392
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/5513
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Amar Tumballi <amarts@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* performance/open-behind: Fix fd-leaks in unlink, rename | Pranith Kumar K | 2013-08-03 | 1 | -0/+4
  Change-Id: Ia8d4bed7ccd316a83c397b53b9c1b1806024f83e
  BUG: 991622
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/5493
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* performance/io-threads: fix potential use after free crash | Brian Foster | 2013-08-01 | 1 | -1/+1
  do_iot_schedule() enqueues the stub and kicks the worker thread. The stub is eventually destroyed after it has been resumed and thus unsafe to access after being enqueued. Though likely difficult to reproduce in a real deployment, a crash is reproducible by running a smallfile benchmark on a replica 2 volume on a single vm.

  Reorder the debug log message prior to the do_iot_schedule() call to avoid the crash.

  BUG: 989579
  Change-Id: Ifc6502c02ae455c959a90ff1ca62a690e31ceafb
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/5418
  Reviewed-by: Santosh Pradhan <spradhan@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* write-behind: preserve error returned as-is | Amar Tumballi | 2013-07-24 | 1 | -5/+1
  Change-Id: Ib766403774c1323e0bbddafedeaa47e7fa3a59fa
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 987415
  Reviewed-on: http://review.gluster.org/5296
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* performance/io-cache: check for non-null gfid before calling inode_path | Raghavendra G | 2013-07-10 | 1 | -10/+13
  A new non-linked inode is added to the lru list. Hence the gfid might be NULL when inode_dump is called. To pass the asserts in inode_path, we have to check for a non-null gfid before invoking that procedure.

  Signed-off-by: Raghavendra G <raghavendra@gluster.com>
  Change-Id: Iff14efc6d6e2faa33b9f7a81e0a66f6a947b77ed
  BUG: 976189
  Reviewed-on: http://review.gluster.org/5241
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterfs: discard (hole punch) support | Brian Foster | 2013-06-13 | 5 | -0/+190
  Add support for the DISCARD file operation. Discard punches a hole in a file in the provided range. Block de-allocation is implemented via fallocate() (as requested via fuse and passed on to the brick fs) but a separate fop is created within gluster to emphasize the fact that discard changes file data (the discarded region is replaced with zeroes) and must invalidate caches where appropriate.

  BUG: 963678
  Change-Id: I34633a0bfff2187afeab4292a15f3cc9adf261af
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/5090
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* gluster: add fallocate fop support | Brian Foster | 2013-06-13 | 3 | -1/+111
  Implement support for the fallocate file operation. fallocate allocates blocks for a particular inode such that future writes to the associated region of the file are guaranteed not to fail with ENOSPC.

  This patch adds fallocate support to the following areas:
  - libglusterfs
  - mount/fuse
  - io-stats
  - performance/md-cache,open-behind
  - quota
  - cluster/afr,dht,stripe
  - rpc/xdr
  - protocol/client,server
  - io-threads
  - marker
  - storage/posix
  - libgfapi

  BUG: 949242
  Change-Id: Ice8e61351f9d6115c5df68768bc844abbf0ce8bd
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/4969
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
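From an application's point of view, the preallocation behaviour described in this commit can be tried with posix_fallocate(), the portable counterpart of the Linux fallocate() call. The sketch below is an illustration on a local temporary file; `preallocate_and_size()` and the path are hypothetical, and whether a given GlusterFS mount honours the request depends on the fop support this patch adds along the stack.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for posix_fallocate, mkstemp */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Preallocate 'len' bytes in a fresh temporary file and return the
 * resulting file size, or -1 on failure. */
static off_t preallocate_and_size(off_t len)
{
    char path[] = "/tmp/falloc-XXXXXX"; /* hypothetical location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    off_t size = -1;
    struct stat st;
    /* posix_fallocate() returns 0 on success or an error number. */
    if (posix_fallocate(fd, 0, len) == 0 && fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    return size;
}
```

After a successful call, writes inside the first len bytes should not fail for lack of space, which is the guarantee the commit message describes.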
* performance/write-behind: Enable write-behind when strict_O_DIRECT is not set. | Vijay Bellur | 2013-05-28 | 1 | -2/+1
  When open() with O_DIRECT happens, write-behind was being disabled for the fd irrespective of strict_O_DIRECT option. This commit disables write-behind only when strict_O_DIRECT is enabled.

  Change-Id: Ieef180e52910c3bf64d46b26b0e5dc3b8542f6d2
  BUG: 923556
  Signed-off-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-on: http://review.gluster.org/4697
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* md-cache: support negative xattr entries | Anand Avati | 2013-05-25 | 1 | -10/+31
  Add support for negative xattr caching. For this, we need to fetch xattrs at every opportunity (including readdirplus) in order to treat a missing key in the cached dict as a negative entry.

  This is crucial to detect missing ACL xattrs in Samba workloads.

  Change-Id: I918a2ef4ab804724256f7546b15e808332ed518d
  BUG: 953694
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/4929
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
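Negative caching as described here rests on one idea: once the cache is known to hold every xattr of interest, a key that is absent from it means "this xattr does not exist", and the request can be answered without going to the brick. A toy model of the three possible outcomes, with hypothetical names and a fixed-size array in place of the cached dict:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum xattr_lookup {
    XATTR_HIT,      /* key is cached: serve the cached value */
    XATTR_NEGATIVE, /* cache is valid and key is absent: xattr is missing */
    XATTR_UNKNOWN   /* cache not populated: must ask the server */
};

/* Toy cache: 'valid' means the full xattr set was fetched earlier. */
struct xattr_cache {
    bool        valid;
    const char *names[8];
    size_t      count;
};

static enum xattr_lookup cache_lookup(const struct xattr_cache *c,
                                      const char *name)
{
    if (!c->valid)
        return XATTR_UNKNOWN;
    for (size_t i = 0; i < c->count; i++)
        if (strcmp(c->names[i], name) == 0)
            return XATTR_HIT;
    return XATTR_NEGATIVE; /* absence is an answer, not a miss */
}
```

This is why the commit fetches xattrs at every opportunity: the negative answer is only trustworthy if the cached set is complete.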
* xlator: NULL terminate volume_options struct | Santosh Kumar Pradhan | 2013-05-22 | 2 | -5/+7
  Problem: volume_options struct for open-behind and quick-read xlators were not NULL terminated.

  Fix: Make them NULL terminated.

  Change-Id: I2615a1f15c6e5674030a219a99ddf91596bf346b
  BUG: 965995
  Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com>
  Reviewed-on: http://review.gluster.org/5064
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* md-cache: Make options structure NULL terminated. | Krishnan Parthasarathi | 2013-05-20 | 1 | -0/+1
  Change-Id: I8aa4f90ba7e1eecf3f978be04f8550049275464f
  BUG: 765785
  Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-on: http://review.gluster.org/4994
  Reviewed-by: Anand Avati <avati@redhat.com>
  Tested-by: Anand Avati <avati@redhat.com>
* quick-read: prune cache on write/[f]truncate | Anand Avati | 2013-05-20 | 1 | -0/+43
  Cache needs to be pruned on write and [f]truncate. The lack of this is causing the Samba ping-pong test to return weird 'data increment' values during startup.

  Change-Id: I9cd6a839bcd02de738d78638211b78f382f58e0a
  BUG: 953694
  Signed-off-by: Anand Avati <avati@redhat.com>
  Reviewed-on: http://review.gluster.org/5033
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mount/fuse: enable fuse real async dio when available | Brian Foster | 2013-05-15 | 1 | -0/+5
  fuse has support for optimized async. direct I/O handling via the FUSE_ASYNC_DIO init flag. Enable FUSE_ASYNC_DIO when advertised by fuse.

  performance/write-behind: fix dio hang

  Also fix a hang observed during aio-stress testing due to conflicting request handling in write-behind. Overlapping requests are skipped in pick_winds and may never continue when the conflicting write in progress returns. Add a wb_process_queue() call after a non-wb request completes to keep the queue moving.

  BUG: 963258
  Change-Id: Ifba6e8aba7a7790b288a32067706b75f263105d4
  Signed-off-by: Brian Foster <bfoster@redhat.com>
  Reviewed-on: http://review.gluster.org/5014
  Reviewed-by: Anand Avati <avati@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* performance/io-cache: check the inode context to be NULL before accessing  Raghavendra Bhat  2013-05-01  1  -0/+7
| | | | | | | | | | Change-Id: I475af7f8ffd5e5d8adbd2a74af20e56ad7751f69 BUG: 958108 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4916 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Avoid self-healing extended attribute used by SELinux.  Vijay Bellur  2013-04-30  1  -1/+1
| | | | | | | | | | | | | | | | | | Since removexattr() fails to remove "security.selinux" in a system where SELinux is enforcing, xattr self-healing fails. As a consequence of this, user extended attributes are not being healed. Added a check in afr to prune SELinux xattr from the dictionary used for removing xattrs from the sink. Minor changes in tests and md-cache as well. Signed-off-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I854bfc0098dde812ce2afe64b125ee40c04bdeb1 BUG: 957877 Reviewed-on: http://review.gluster.org/4905 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* performance/io-cache: Avoid double mem_put in ioc_readv  Pranith Kumar K  2013-04-26  1  -2/+3
| | | | | | | | | | | | | On readv error io-cache frame->local is not set to NULL so the local is mem_put in STACK_DESTROY as well. This patch sets frame->local to NULL in all cases. Change-Id: I00013df1377475aa5f3c0c681dcb58b32e1e8063 BUG: 955751 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4884 Reviewed-by: Raghavendra G <raghavendra@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
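The double release described above can be sketched in isolation. Everything here is a hypothetical stand-in (`struct demo_frame`, `demo_put`, and the other `demo_*` names are invented; the release is only counted, whereas a real pool would take the memory back). The sketch assumes the destroy path releases whatever `local` still points to, so the error path detaches the pointer before releasing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a call frame that owns a per-call "local". */
struct demo_frame {
    void *local;
};

/* Number of releases performed so far (illustration only). */
static int demo_release_count;

/* Simulated release: only counts non-NULL pointers handed back. */
static void demo_put(void *ptr)
{
    if (ptr != NULL)
        demo_release_count++;
}

/* Error path: detach local from the frame first, then release it once. */
void demo_fail_request(struct demo_frame *frame)
{
    void *local;

    assert(frame != NULL);
    local = frame->local;
    frame->local = NULL;   /* the destroy path will now see NULL */
    demo_put(local);
}

/* Destroy path: releases whatever local is still attached to the frame. */
void demo_destroy_frame(struct demo_frame *frame)
{
    assert(frame != NULL);
    demo_put(frame->local);
    frame->local = NULL;
}

/* Run the error path followed by the destroy path; return the release count. */
int demo_releases_after_error(void)
{
    static int payload;                 /* stands in for the allocated local */
    struct demo_frame frame = { &payload };

    demo_release_count = 0;
    demo_fail_request(&frame);
    demo_destroy_frame(&frame);
    return demo_release_count;
}
```

If `demo_fail_request` skipped the `frame->local = NULL` assignment, the same pointer would be released twice, once on each path.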
* performance/io-threads: Fix range-check for least-rate-limit  Pranith Kumar K  2013-03-21  1  -0/+1
| | | | | | | | | | | | | The issue could be fixed with .validate=GF_OPT_VALIDATE_MIN, but adding a max value is more robust. Change-Id: Ia69c6f86855dbd34a26e20391e77bfa0f796a200 BUG: 923573 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4698 Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* performance/write-behind: guarantee non-overlapping concurrent writes  Anand Avati  2013-02-20  1  -1/+65
| | | | | | | | | | | | | | | | | Maintain a list of writes (either written behind or SYNC) which are currently "in progress" (i.e., STACK_WIND'ed towards the server) and hold off any new STACK_WIND of a write (either written behind or SYNC) which overlaps with any of the "in progress" writes. This is a guarantee which AFR's eager-lock depends upon (though not strictly a write-behind requirement). Change-Id: Icedd0b51b440366a906dc9223d62b7fd6ef2ca03 BUG: 857673 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4551 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <raghavendra@gluster.com>
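The hold-off rule described above amounts to a byte-range overlap test against every write that is still in flight. The following is a minimal sketch of that test only, not the write-behind implementation: `struct demo_write` and the `demo_*` functions are hypothetical names, the in-progress list is a plain array, and offset arithmetic is assumed not to overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one write: the byte range [offset, offset + size). */
struct demo_write {
    uint64_t offset;
    uint64_t size;
};

/* Two half-open ranges overlap when each one starts before the other ends.
 * Assumes offset + size does not overflow. */
bool demo_ranges_overlap(const struct demo_write *a, const struct demo_write *b)
{
    assert(a != NULL && b != NULL);
    if (a->size == 0 || b->size == 0)
        return false;   /* an empty write touches no bytes */
    return a->offset < b->offset + b->size &&
           b->offset < a->offset + a->size;
}

/* A candidate may be sent only if it overlaps none of the in-progress writes. */
bool demo_can_wind(const struct demo_write *in_progress, size_t count,
                   const struct demo_write *candidate)
{
    assert(candidate != NULL);
    for (size_t i = 0; i < count; i++) {
        if (demo_ranges_overlap(&in_progress[i], candidate))
            return false;   /* hold off until the conflicting write returns */
    }
    return true;
}
```

A write that is held off would have to be retried once a conflicting write completes, which is why the queue needs to be processed again at that point.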
* read-ahead: re-enable support for variable page size  Anand Avati  2013-02-20  1  -1/+12
| | | | | | | | | | | | | | | | Support for variable size page-size was disabled with the introduction of fixed size iobufs. Since the introduction of variable sized iobufs there is no reason to not have configurable page-size in read-ahead. This patch enables necessary changes in the translator for configurable page-size. Change-Id: I677d70fef50641eb041269aca92a088b9d4961cc BUG: 764204 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4526 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <raghavendra@gluster.com>
* open-behind: propagate errors from ob_wake_cbk  Anand Avati  2013-02-20  1  -9/+25
| | | | | | | | | | | | | | | If opening fd in background fails, then remember the error and fail all further calls on the fd. Use the newly introduced call_unwind_error() function from call-stub cleanup to fail the future calls. Change-Id: I3b09b7969c98d915abd56590a2777ce833b81813 BUG: 846240 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4521 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* call-stub: internal refactor  Anand Avati  2013-02-19  1  -40/+40
| | | | | | | | | | | | | | | | | - re-structure members of call_stub_t with new simpler layout - easier to inspect call_stub_t contents in gdb now - fix a bunch of double unrefs and double frees in cbk stub - change all STACK_UNWIND to STACK_UNWIND_STRICT and thereby fixed a lot of bad params - implement new API call_unwind_error() which can even be called on fop_XXX_stub(), and not necessarily fop_XXX_cbk_stub() Change-Id: Idf979f14d46256af0afb9658915cc79de157b2d7 BUG: 846240 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4520 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* performance/open-behind: use anonymous fd for doing fstat and readv  Raghavendra Bhat  2013-02-19  1  -2/+2
| | | | | | | | | Change-Id: I61a3c221e0a15736ab6315e2538c03dac27480a5 BUG: 846240 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/4483 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>