path: root/libglusterfs/src/syncop.h
Commit message (Author, Age, Files, Lines)
* Add GF_FOP_IPC for inter-translator communication. (Jeff Darcy, 2014-03-11, 1 file, -0/+2)

    Several features - e.g. encryption, erasure codes, or NSR - involve multiple cooperating translators which sometimes need a "private" means of communication amongst themselves. Historically we've used virtual or synthetic xattrs, but that's not very elegant and clutters up the getxattr/setxattr path, which must also handle real xattr requests. This new fop should address that.

    The only argument is an int32_t "op" which should be recognized by the target translator. It is recommended that translators using this feature follow some convention regarding the ops that they define, to avoid conflicts. Using a hash of the target translator's type string as a base for a series of ops would probably be a good start. Any other information can be passed in both directions using xdata.

    The default behavior for this fop, as with any other, is to pass through to FIRST_CHILD. That makes use of this fop "transparent" to other translators that were written before it existed, but it also means that it only really works with pass-through translators. If a routing translator (such as DHT) or a fan-out translator (such as AFR) is involved, the IPC might not reach its intended destination unless those translators are modified to forward IPC fops along all paths.

    If an IPC gets all the way to storage/posix it is considered an error, much like an uncaught exception. We don't actually *do* anything in that case, but we do flag it as an error in the log.

    Change-Id: I7f37c9247ee35536f8136c7aea758e6fe04616c4
    Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
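The op-numbering convention suggested in the message above (a hash of the translator's type string as a base for a series of ops) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the choice of FNV-1a, the 8 bits reserved for the op index, and the helper names `ipc_op_base` and `ipc_op` are not part of GlusterFS.

```c
#include <assert.h>
#include <stdint.h>

/* Number of low bits reserved for per-translator op indices (assumption). */
#define IPC_OP_INDEX_BITS 8

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Derive a non-negative op base from a translator type string. */
static int32_t ipc_op_base(const char *xl_type)
{
    uint32_t h = fnv1a32(xl_type);

    h &= 0x7fffffffu;                                 /* keep the int32_t non-negative */
    h &= ~((UINT32_C(1) << IPC_OP_INDEX_BITS) - 1u);  /* clear the index bits */
    return (int32_t)h;
}

/* Build the op value for the index-th operation of one translator. */
static int32_t ipc_op(int32_t base, unsigned index)
{
    assert(index < (1u << IPC_OP_INDEX_BITS));
    return base + (int32_t)index;
}
```

A translator would compute its base once, for example `ipc_op_base("features/nsr")` (a hypothetical type string), and define its ops as `ipc_op(base, 0)`, `ipc_op(base, 1)`, and so on. Two translators can still collide on the hashed base, so this reduces conflicts rather than ruling them out.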
* Merge branch 'upstream' into merge (Jeff Darcy, 2014-03-04, 1 file, -4/+13)
|\
    Signed-off-by: Jeff Darcy <jdarcy@redhat.com>

    Conflicts:
        api/src/glfs-fops.c
        libglusterfs/src/syncop.c
        libglusterfs/src/syncop.h

    Change-Id: I8c3fa7a20fb167d9e6bc2749e177c0c8b366827b
| * syncops: add support for custom PID (Anand Avati, 2014-02-13, 1 file, -2/+9)

    AFR self-heal needs to issue syncops with a special PID. Extend the custom UID/GID support to include custom PIDs.

    Change-Id: I736c0e177f862b029f203acc87f9eb46c8cb839b
    BUG: 1021686
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/6888
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
| * core: add @xdata parameter to syncop_[f]removexattr() (Anand Avati, 2014-02-13, 1 file, -2/+4)

    To be used in afr metadata self-heal

    Change-Id: I8dac4b19d61e331702427eeb5b606aab3d20b328
    BUG: 1021686
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/6941
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* | Roll-up patch for NSR so far. (Jeff Darcy, 2013-12-11, 1 file, -0/+32)
|/
    Previous history: https://forge.gluster.org/~jdarcy/glusterfs-core/glusterfs-nsr

    Change-Id: I2b56328788753c6a74d9589815f2dd705ac9ce6a
    Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
* syncops: expose @flags in syncop_rmdir() (Anand Avati, 2013-11-21, 1 file, -1/+1)

    Change-Id: I9b73c1db728e4cb3948fc118cceb292b21d48b96
    BUG: 1021686
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/6112
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_t (Bharata B Rao, 2013-11-14, 1 file, -1/+1)

    glfs_zerofill() can potentially be called to zero out an entire file, so allow a larger value for the length parameter.

    Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63
    BUG: 1028673
    Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
    Reviewed-on: http://review.gluster.org/6266
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com>
    Tested-by: M. Mohan Kumar <mohan@in.ibm.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* glusterfs: zerofill support (M. Mohan Kumar, 2013-11-10, 1 file, -0/+2)

    Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero-filled VM disk image provisioning or during scrubbing of VM disk images).

    A client/application can issue this FOP for zeroing out. The Gluster server will zero out the required range of bytes, i.e. server-offloaded zeroing. In the absence of this fop, the client/application has to repeatedly issue write (zero) fops to the server, which is very inefficient because of the overheads involved in RPC calls and acknowledgements.

    WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks. This write is handled completely within the storage and hence is known as offload. Linux now has support for the SCSI WRITESAME command, which is exposed to the user in the form of the BLKZEROOUT ioctl. The BD xlator can exploit the BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device, making it highly efficient.

    The fop takes two arguments, offset and size. It zeroes out 'size' number of bytes in an opened file starting from the 'offset' position.

    This patch adds zerofill support to the following areas:
    - libglusterfs
    - io-stats
    - performance/md-cache,open-behind
    - quota
    - cluster/afr,dht,stripe
    - rpc/xdr
    - protocol/client,server
    - io-threads
    - marker
    - storage/posix
    - libgfapi

    Client applications can exploit this fop by using glfs_zerofill introduced in libgfapi. FUSE support for this fop has not been added as there is no system call for this fop.

    Changes from previous version 3:
    * Removed redundant memory failure log messages

    Changes from previous version 2:
    * Rebased and fixed build error

    Changes from previous version 1:
    * Rebased for latest master

    TODO:
    * Add zerofill support to trace xlator
    * Expose zerofill capability as part of gluster volume info

    Here is a performance comparison of server-offloaded zerofill vs zeroing out using repeated writes.

    [root@llmvm02 remote]# time ./offloaded aakash-test log 20
    real 3m34.155s
    user 0m0.018s
    sys 0m0.040s

    [root@llmvm02 remote]# time ./manually aakash-test log 20
    real 4m23.043s
    user 0m2.197s
    sys 0m14.457s

    [root@llmvm02 remote]# time ./offloaded aakash-test log 25;
    real 4m28.363s
    user 0m0.021s
    sys 0m0.025s

    [root@llmvm02 remote]# time ./manually aakash-test log 25
    real 5m34.278s
    user 0m2.957s
    sys 0m18.808s

    The argument log is a file which we want to set for logging purposes and the third argument is the size in GB.

    As we can see there is a performance improvement of around 20% with this fop.

    Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
    BUG: 1028673
    Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com>
    Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
    Reviewed-on: http://review.gluster.org/5327
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
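The "around 20%" figure in the zerofill message can be checked against the four `real` timings it quotes. The helpers below are a small sketch for that arithmetic only; the names `to_seconds` and `percent_saved` are made up here.

```c
#include <assert.h>

/* Convert a time(1) reading given as minutes and seconds into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Percentage of wall-clock time saved by `fast` relative to `slow`. */
static double percent_saved(double fast, double slow)
{
    assert(slow > 0.0);
    return (slow - fast) / slow * 100.0;
}
```

For the 20 GB run, `percent_saved(to_seconds(3, 34.155), to_seconds(4, 23.043))` comes to roughly 18.6%; for the 25 GB run, `percent_saved(to_seconds(4, 28.363), to_seconds(5, 34.278))` comes to roughly 19.7%. Both are consistent with the rounded figure in the message.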
* gfapi: object handle based API extensions (R.Shyamsundar, 2013-10-11, 1 file, -17/+61)

    There is an ongoing effort to integrate NFS Ganesha (https://github.com/nfs-ganesha/nfs-ganesha/wiki) with GlusterFS as one of the file system back ends.

    Towards this we need extensions to gfapi that can handle object based operations. That is, instead of using full paths or relative paths from cwd, we need APIs, like the *at POSIX variants, to create, lookup, open etc. files and directories. Hence the objects are the files or directories themselves, and we give out handles to these objects that can be used for further operations.

    This code drop is an initial implementation of the proposed APIs. The new APIs are implemented as glfs_h_XXX variants in the file glfs-handleops.c to mirror glfs-fops.c style.

    The code holds onto inode references and hands these out as opaque/cookie type objects to the callers, so that they can be used as handles in other operations. An fd based approach was considered, but due to the extra footprint that the fd structure and its counterparts would incur, it was dropped in favor of holding the inode references themselves.

    Tested by extending glfsxmp.c to invoke and exercise the added APIs, and further tested with a reference integration of the same as an FSAL with NFS Ganesha.

    Change-Id: I23629c99e905b54070fa2e6565147812e5f3fa5d
    BUG: 1016000
    Signed-off-by: R.Shyamsundar <srangana@redhat.com>
    Reviewed-on: http://review.gluster.org/5936
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* synctask: minor enhancements (Anand Avati, 2013-08-28, 1 file, -1/+8)

    - Enhance syncenv_new() to accept scaling parameters of syncproc. Previously the scaling parameters were hardcoded and decided at compile time.

    - New API synctask_create() which returns the created synctask. This is similar to synctask_new(), which only returned the status of whether a synctask could be created or not. A NULL cbk in synctask_create() means the task is "joinable". Until synctask_join() is called on such a synctask, the task is not reaped and resources are not destroyed. The task is in a zombie state after synctask_fn returns and before synctask_join() is called.

    Change-Id: I368ec9037de9510d2ba951f0aad86aaf18d9a6b6
    BUG: 986775
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/5365
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
* core: increase the auxillary group limit to 65536 (Anand Avati, 2013-07-24, 1 file, -1/+16)

    Make the allocation of groups dynamic and increase the limit to 65536.

    Change-Id: I702364ff460e3a982e44ccbcb3e337cac9c2df51
    BUG: 953694
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/5111
    Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterfs: discard (hole punch) support (Brian Foster, 2013-06-13, 1 file, -0/+1)

    Add support for the DISCARD file operation. Discard punches a hole in a file in the provided range. Block de-allocation is implemented via fallocate() (as requested via fuse and passed on to the brick fs) but a separate fop is created within gluster to emphasize the fact that discard changes file data (the discarded region is replaced with zeroes) and must invalidate caches where appropriate.

    BUG: 963678
    Change-Id: I34633a0bfff2187afeab4292a15f3cc9adf261af
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-on: http://review.gluster.org/5090
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* gluster: add fallocate fop support (Brian Foster, 2013-06-13, 1 file, -0/+2)

    Implement support for the fallocate file operation. fallocate allocates blocks for a particular inode such that future writes to the associated region of the file are guaranteed not to fail with ENOSPC.

    This patch adds fallocate support to the following areas:
    - libglusterfs
    - mount/fuse
    - io-stats
    - performance/md-cache,open-behind
    - quota
    - cluster/afr,dht,stripe
    - rpc/xdr
    - protocol/client,server
    - io-threads
    - marker
    - storage/posix
    - libgfapi

    BUG: 949242
    Change-Id: Ice8e61351f9d6115c5df68768bc844abbf0ce8bd
    Signed-off-by: Brian Foster <bfoster@redhat.com>
    Reviewed-on: http://review.gluster.org/4969
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: link inodes in relevant entry FOPs (Anand Avati, 2013-05-25, 1 file, -4/+5)

    Do not let inode linking happen only in lookup(). While that works, it is inefficient.

    Change-Id: I51bbfb6255ec4324ab17ff00566375f49d120c06
    BUG: 953694
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4931
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
* syncop: synctask shouldn't yawn, it could miss a 'wake' (Krishnan Parthasarathi, 2013-05-21, 1 file, -9/+6)

    Change-Id: I7731fd33ca0c925cc52f8d105275b44fc625a1e2
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Reviewed-on: http://review.gluster.org/5058
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Update synctask state appropriately (Krishnan Parthasarathi, 2013-05-20, 1 file, -2/+0)

    * Earlier, the SYNCOP macro, the only consumer of synctask_yield, would set the task->state to SYNCTASK_SUSPEND. Today, glusterd has its own wrapper macros which don't set the task's state. There is also the syncbarrier and synclock framework, which also participates in a synctask's scheduling (and needs to keep a task's state up to date). It makes more sense to leave a synctask's state to the synctask library, since it's an internal affair.

    * Need to 'yawn' before 'yield' to avoid re-running tasks to set task->woken appropriately.

    Change-Id: Ic7a59e6ebcc46f03e53223ca237668d45a3cba40
    BUG: 948686
    Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Reviewed-on: http://review.gluster.org/4985
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* synctask: implement barriers around yield, not the other way (Anand Avati, 2013-05-04, 1 file, -13/+27)

    In the current implementation, barriers are in the core of the syncprocessors. Wake()s are treated as syncbarrier wakes. This is however delicate, as spurious wake()s of the synctask can mess up the accounting of the barrier and wake it prematurely.

    The fix is to keep yield() and wake() as the basic primitives, and implement barriers as an object implemented on top of these primitives. This way, only an explicit barrier_wake() gets counted towards the barrier accounting, and spurious wakes will be truly safe.

    Change-Id: I8087f0f446113e5b2d0853431c0354335ccda076
    BUG: 948686
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4921
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* gfapi: POSIX locking support (Anand Avati, 2013-04-24, 1 file, -0/+3)

    Change-Id: I37d9e1fb4a715094876be6af3856c1b4cf398021
    BUG: 953694
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4881
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: set credentials of running process in @frame (Anand Avati, 2013-04-24, 1 file, -1/+18)

    Inherit the pid/euid/egid/groups of the running process in the frame. Do this only in cases where a loaded frame was not presented to the synctask.

    This behavior is required for Samba VFS.

    Change-Id: Ib181c90f47c6741197b9ce9f67a19e2914b647d2
    BUG: 953694
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4878
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
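Collecting the pid/euid/egid/groups of the running process, as the message above describes, can be done with standard POSIX calls. The sketch below shows one way; `struct proc_creds` and `proc_creds_load` are hypothetical names and do not reflect the actual frame layout in GlusterFS. The group list is sized at run time with a first `getgroups(0, NULL)` call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container; not the GlusterFS frame structure. */
struct proc_creds {
    pid_t  pid;
    uid_t  uid;      /* effective uid */
    gid_t  gid;      /* effective gid */
    int    ngroups;  /* number of entries in groups */
    gid_t *groups;   /* heap-allocated, the caller frees it */
};

/* Fill `c` from the calling process; returns 0 on success, -1 on failure. */
static int proc_creds_load(struct proc_creds *c)
{
    assert(c != NULL);

    c->pid = getpid();
    c->uid = geteuid();
    c->gid = getegid();
    c->groups = NULL;

    /* With size 0, getgroups() only reports how many groups there are. */
    c->ngroups = getgroups(0, NULL);
    if (c->ngroups < 0)
        return -1;
    if (c->ngroups == 0)
        return 0;

    c->groups = calloc((size_t)c->ngroups, sizeof(gid_t));
    if (c->groups == NULL)
        return -1;

    /* Second call fills the list and returns the number actually stored. */
    c->ngroups = getgroups(c->ngroups, c->groups);
    if (c->ngroups < 0) {
        free(c->groups);
        c->groups = NULL;
        return -1;
    }
    return 0;
}
```

A caller would run `proc_creds_load(&c)` once before issuing the operation and `free(c.groups)` afterwards.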
* synctask: introduce synclocks for co-operative locking (Anand Avati, 2013-04-02, 1 file, -0/+18)

    This patch introduces synclocks - co-operative locks for synctasks. Synctasks yield themselves when a lock cannot be acquired at the time of the lock call, and the unlocker will wake the yielded locker at the time of unlock.

    The implementation is safe in a multi-threaded syncenv framework.

    It is also safe for sharing the lock between non-synctasks, i.e. the same lock can be used for synchronization between a synctask and a regular thread. In such a situation, waiting synctasks will yield themselves while non-synctasks will sleep on a cond variable. The unlocker (which could be either a synctask or a regular thread) will wake up any type of lock waiter (synctask or regular).

    Usage:

    Declaration and Initialization
    ------------------------------

    synclock_t lock;

    ret = synclock_init (&lock);
    if (ret) {
        /* lock could not be allocated */
    }

    Locking and non-blocking lock attempt
    -------------------------------------

    ret = synclock_trylock (&lock);
    if (ret && (errno == EBUSY)) {
        /* lock is held by someone else */
        return;
    }

    synclock_lock (&lock);
    {
        /* critical section */
    }
    synclock_unlock (&lock);

    Change-Id: I081873edb536ddde69a20f4a7dc6558ebf19f5b2
    BUG: 763820
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4717
    Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Raghavendra G <raghavendra@gluster.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
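The message above says that non-synctask waiters sleep on a cond variable. The sketch below models only that regular-thread path with plain pthreads; it is an illustration under assumptions, not the synclock implementation. The names `struct thread_lock` and `thread_lock_*` are hypothetical, the synctask (yield) path is left out, and `thread_lock_try` returns EBUSY directly rather than setting errno.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock type covering only the regular-thread path. */
struct thread_lock {
    pthread_mutex_t guard;  /* protects `held` */
    pthread_cond_t  cond;   /* waiters sleep here */
    int             held;   /* 1 while some thread owns the lock */
};

/* Initialize the lock; returns 0 on success or a pthread error number. */
static int thread_lock_init(struct thread_lock *l)
{
    int ret = pthread_mutex_init(&l->guard, NULL);

    if (ret)
        return ret;
    ret = pthread_cond_init(&l->cond, NULL);
    if (ret) {
        pthread_mutex_destroy(&l->guard);
        return ret;
    }
    l->held = 0;
    return 0;
}

/* Block the calling thread until the lock is free, then take it. */
static void thread_lock_acquire(struct thread_lock *l)
{
    pthread_mutex_lock(&l->guard);
    while (l->held)                     /* re-check after every wakeup */
        pthread_cond_wait(&l->cond, &l->guard);
    l->held = 1;
    pthread_mutex_unlock(&l->guard);
}

/* Take the lock if it is free; otherwise return EBUSY without waiting. */
static int thread_lock_try(struct thread_lock *l)
{
    int ret = 0;

    pthread_mutex_lock(&l->guard);
    if (l->held)
        ret = EBUSY;
    else
        l->held = 1;
    pthread_mutex_unlock(&l->guard);
    return ret;
}

/* Release the lock and wake one sleeping waiter, if there is one. */
static void thread_lock_release(struct thread_lock *l)
{
    pthread_mutex_lock(&l->guard);
    assert(l->held);
    l->held = 0;
    pthread_cond_signal(&l->cond);
    pthread_mutex_unlock(&l->guard);
}
```

The inner mutex is held only while the `held` flag is inspected or changed, so the critical section protected by the outer lock can be arbitrarily long without blocking other threads on the mutex itself.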
* glusterd: Increasing throughput of synctask based mgmt ops. (Krishnan Parthasarathi, 2013-02-26, 1 file, -0/+1)

    Change-Id: Ibd963f78707b157fc4c9729aa87206cfd5ecfe81
    BUG: 913662
    Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Reviewed-on: http://review.gluster.org/4570
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* synctask: support for (assymetric) counted barriers (Anand Avati, 2013-02-21, 1 file, -19/+32)

    This patch introduces a new set of primitives:

    - synctask_barrier_init (stub)
    - synctask_barrier_waitfor (stub, count)
    - synctask_barrier_wake (stub)

    Unlike pthread_barrier_t, this barrier has an explicit notion of "waiter" and "waker". The "waiter" waits for @count number of "wakers" to call synctask_barrier_wake() before returning. The wait performed by the waiter via synctask_barrier_waitfor() is co-operative in nature and yields the thread for scheduling other synctasks in the meantime.

    Intended use case:

    Eliminate excessive serialization in glusterd and allow for concurrent RPC transactions.

    Code which is currently in this format:

    ---old---

    list_for_each_entry (peerinfo, peers, op_peers_list) {
            ...
            GD_SYNCOP (peerinfo->rpc, stub, rpc_cbk, ...);
    }

    ...

    int rpc_cbk (rpc, stub, ...)
    {
            ...
            __wake (stub);
    }

    ---old---

    Can be restructured into the format:

    ---new---

    synctask_barrier_init (stub);
    {
            list_for_each_entry (peerinfo, peers, op_peers_list) {
                    ...
                    rpc_submit (peerinfo->rpc, stub, rpc_cbk, ...);
                    count++;
            }
    }
    synctask_barrier_wait (stub, count);

    ...

    int rpc_cbk (rpc, stub, ...)
    {
            ...
            synctask_barrier_wake (stub);
    }

    ---new---

    In the above structure, from the synctask's point of view, the region between synctask_barrier_init() and synctask_barrier_wait() spawns off asynchronous "threads" (or RPCs) and keeps count of how many such threads have been spawned. Each of those threads is expected to make one call to synctask_barrier_wake(). The call to synctask_barrier_wait() makes the synctask thread co-operatively wait/sleep till @count such threads call their wake function.

    This way, the synctask thread retains the "synchronous" flow in the code, yet at the same time allows for asynchronous "threads" to achieve parallelism over RPC.

    Change-Id: Ie037f99b2d306b71e63e3a56353daec06fb0bf41
    BUG: 913662
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4558
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
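The waiter/waker accounting described in the counted-barrier message can be modelled in a few lines. This is a single-threaded model written for illustration only: `struct counted_barrier`, `barrier_ready` and `fan_out` are hypothetical, replies are "delivered" by a plain loop, and the real primitives yield the synctask instead of polling.

```c
#include <assert.h>

/* Hypothetical model of a counted barrier; not the GlusterFS structure. */
struct counted_barrier {
    int wakes;   /* explicit wake calls received so far */
};

static void barrier_init(struct counted_barrier *b)
{
    b->wakes = 0;
}

/* Called once by each "waker", e.g. from an RPC completion callback. */
static void barrier_wake(struct counted_barrier *b)
{
    b->wakes++;
}

/* Returns 1 once `waitfor` wakes have arrived, consuming them; else 0. */
static int barrier_ready(struct counted_barrier *b, int waitfor)
{
    if (b->wakes < waitfor)
        return 0;
    b->wakes -= waitfor;
    return 1;
}

/* Stand-in for an RPC callback: its only job here is to wake the barrier. */
static void rpc_cbk(struct counted_barrier *b)
{
    barrier_wake(b);
}

/*
 * Submit one request per peer, then wait for all replies.
 * Returns how many replies had to arrive before the waiter could go on.
 */
static int fan_out(int npeers)
{
    struct counted_barrier b;
    int count = 0;
    int delivered = 0;

    barrier_init(&b);
    for (int i = 0; i < npeers; i++)
        count++;                     /* one "rpc_submit" per peer */

    /* The real waiter yields here; this model delivers replies inline. */
    while (!barrier_ready(&b, count)) {
        rpc_cbk(&b);
        delivered++;
    }
    return delivered;
}
```

Because only `barrier_wake()` changes the count, a wakeup that arrives for any other reason leaves `barrier_ready()` returning 0 and the waiter simply keeps waiting.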
* syncop: Fixed indentation and whitespaces. (Krishnan Parthasarathi, 2013-02-20, 1 file, -76/+76)

    Change-Id: I90e496b5d5027ac702ab3804ba52f26d537812a0
    BUG: 764890
    Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Reviewed-on: http://review.gluster.org/4554
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* synctask: implement setuid-like SYNCTASK_SETID() (Anand Avati, 2013-02-13, 1 file, -0/+11)

    synctasks can now call SYNCTASK_SETID(uid,gid) to set the effective uid/gid of the frame with which the FOP will be performed.

    Once called, the uid/gid stays set either until the end of the synctask or until the next call of SYNCTASK_SETID().

    Change-Id: I7eb74f7c473099bcae39310d2ab353d58f8eb2ba
    BUG: 884597
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4269
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Shishir Gowda <sgowda@redhat.com>
* syncop: fix symlink param (Anand Avati, 2012-10-03, 1 file, -1/+2)

    make syncop_symlink() accept 'const char *linkname' instead of 'char *linkname'

    Change-Id: I7751d552e4a4cc6e8b8e587b9e520213f4e11b45
    BUG: 839950
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4020
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
* syncop: Implement some missing operations (Anand Avati, 2012-10-03, 1 file, -0/+5)

    - syncop_mkdir()
    - syncop_rmdir()
    - syncop_rename()

    Change-Id: I177db0f9af7c99fc6645d59521c8fb82f73812ca
    BUG: 839950
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/4019
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
* syncop: Implement access fop (Pranith Kumar K, 2012-09-19, 1 file, -0/+1)

    Change-Id: I959144451790d7e47ae48564923d324451a9db23
    BUG: 858602
    Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
    Reviewed-on: http://review.gluster.org/3958
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* libglusterfs: Implementation of syncop_fsyncdir (Pranith Kumar K, 2012-09-06, 1 file, -0/+1)

    Change-Id: I832b9c0bfbe804fbca98dc9e8fbe7d3174fecc82
    BUG: 854326
    Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
    Reviewed-on: http://review.gluster.org/3902
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Added scaling down logic (Pranith Kumar K, 2012-08-20, 1 file, -0/+1)

    RCA:
    Whenever the self-heald tests are done with more than 16 replicates, the number of sync procs goes to > 2. These threads never die.

    Fix:
    Added scaling down logic in syncops so that the threads terminate themselves whenever the extra thread is idle for ~10 minutes. The minimum number of threads is still 2.

    Tests:
    Added logs for launching and terminating procs, set the timeout to 6 seconds and ran volume-heal in a while loop. After the logs said the max number of procs had been launched, attached gdb to the process and verified that the number of syncop threads was 16. Stopped volume-heal and observed the logs for terminating the procs. Attached gdb to the process again to check that there were just 2 syncop threads. Did this 5 times. Things worked fine. Which procs were terminated was random. No proc structure was erroneously re-used. Procs never exceeded 16 and were never < 2.

    Change-Id: I61dd9c25cc478ac8cbda190bee841a995b93c55c
    BUG: 814074
    Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
    Reviewed-on: http://review.gluster.org/3195
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
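The scale-down rule stated in the message above (terminate an extra thread after roughly 10 idle minutes, never go below 2, never exceed 16) reduces to a small predicate. The sketch below is an assumption about how such a check could be phrased; the macro and function names are invented here and the real code may structure this differently.

```c
#include <assert.h>

/* Limits quoted in the commit message; the names are hypothetical. */
#define PROCS_MIN        2
#define PROCS_MAX        16
#define IDLE_TIMEOUT_SEC 600   /* "~10 minutes" */

/*
 * Decide whether an idle worker should terminate itself.
 * nprocs   - number of worker threads currently alive
 * idle_sec - how long this worker has been waiting without work
 * Returns 1 to exit, 0 to keep waiting.
 */
static int should_scale_down(int nprocs, long idle_sec)
{
    assert(nprocs >= PROCS_MIN && nprocs <= PROCS_MAX);
    return idle_sec >= IDLE_TIMEOUT_SEC && nprocs > PROCS_MIN;
}
```

In a worker loop this check would typically run after a timed wait for new work expires, so a busy pool never pays for it.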
* syncop: handle 'dataonly' flag in syncop_fsync() (Amar Tumballi, 2012-08-20, 1 file, -1/+1)

    * and also in syncop_readv(), don't look at _cbk args if op_ret is < 0.

    Change-Id: I3ab2982bc6d186e75b6adb74c8981e4ff7058bbe
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
    BUG: 839950
    Reviewed-on: http://review.gluster.org/3828
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: accomodate non-syncenv calls (Anand Avati, 2012-07-13, 1 file, -7/+60)

    Use mutex/cond and support syncop_XXXXXX() calls in non-syncenv environments. syncenv environments continue to use swapcontext based soft context switches. In non-syncenv environments this blocks the caller thread on the mutex. The intended use case is in libgfapi where it is expected to block the caller thread while performing synchronous calls.

    Change-Id: Id6470c99bdc2fe4b7610372139f7fa99b2da400b
    BUG: 839950
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.com/3662
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
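Blocking a caller thread on a mutex/cond pair until an asynchronous callback fires, as the message above describes for non-syncenv callers, follows a well-known pattern. The sketch below shows that pattern in isolation; `struct sync_args`, `sync_wake`, `sync_call` and `async_double` are hypothetical and not the syncop API. For simplicity the demo operation completes inline, whereas in a real client the callback would normally run on another thread.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical completion object shared by the caller and the callback. */
struct sync_args {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             done;    /* set to 1 by the callback */
    int             op_ret;  /* result handed back to the caller */
};

/* Callback side: record the result and wake the blocked caller. */
static void sync_wake(struct sync_args *args, int op_ret)
{
    pthread_mutex_lock(&args->mutex);
    args->op_ret = op_ret;
    args->done = 1;
    pthread_cond_signal(&args->cond);
    pthread_mutex_unlock(&args->mutex);
}

/* Signature of a hypothetical asynchronous operation. */
typedef void (*async_fn)(struct sync_args *args, int input);

/* Demo operation: "completes" immediately with twice its input. */
static void async_double(struct sync_args *args, int input)
{
    sync_wake(args, input * 2);
}

/* Caller side: start the operation, then block until the callback fires. */
static int sync_call(async_fn fn, int input)
{
    struct sync_args args;
    int ret;

    pthread_mutex_init(&args.mutex, NULL);
    pthread_cond_init(&args.cond, NULL);
    args.done = 0;
    args.op_ret = -1;

    fn(&args, input);

    pthread_mutex_lock(&args.mutex);
    while (!args.done)               /* the loop guards against spurious wakeups */
        pthread_cond_wait(&args.cond, &args.mutex);
    pthread_mutex_unlock(&args.mutex);

    ret = args.op_ret;
    pthread_cond_destroy(&args.cond);
    pthread_mutex_destroy(&args.mutex);
    return ret;
}
```

Checking `done` under the mutex before waiting is what makes the pattern safe when the callback runs before the caller reaches `pthread_cond_wait()`.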
* libglusterfs: syncop for flush () (Rajesh Amaravathi, 2012-07-13, 1 file, -0/+1)

    Change-Id: I17f925345782313c75102c4767121ba8e283028e
    BUG: 764813
    Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com>
    Reviewed-on: http://review.gluster.com/3667
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Anand Avati <avati@redhat.com>
* license: dual license under GPLV2 and LGPLV3+ (Kaleb KEITHLEY, 2012-05-10, 1 file, -14/+5)

    Note that the license was not changed in any of the following:
    .../argp-standalone/...
    .../booster/...
    .../cli/...
    .../contrib/...
    .../extras/...
    .../glusterfsd/...
    .../glusterfs-hadoop/...
    .../mod_clusterfs/...
    .../scheduler/...
    .../swift/...

    The license was not changed in any of the non-building xlators. The license was not changed in any of the xlators that seemed, to me, to be clearly server-side only, e.g. protocol/server.

    Note too that copyright was changed along with the license; I did not change the copyright in files where the license did not change.

    If you find any errors or omissions please don't hesitate to let me know.
The complete list of files with the license change is: libglusterfs/src/byte-order.h libglusterfs/src/call-stub.c libglusterfs/src/call-stub.h libglusterfs/src/checksum.c libglusterfs/src/checksum.h libglusterfs/src/circ-buff.c libglusterfs/src/circ-buff.h libglusterfs/src/common-utils.c libglusterfs/src/common-utils.h libglusterfs/src/compat-errno.c libglusterfs/src/compat-errno.h libglusterfs/src/compat.c libglusterfs/src/compat.h libglusterfs/src/daemon.c libglusterfs/src/daemon.h libglusterfs/src/defaults.c libglusterfs/src/defaults.h libglusterfs/src/dict.c libglusterfs/src/dict.h libglusterfs/src/event-history.c libglusterfs/src/event-history.h libglusterfs/src/event.c libglusterfs/src/event.h libglusterfs/src/fd-lk.c libglusterfs/src/fd-lk.h libglusterfs/src/fd.c libglusterfs/src/fd.h libglusterfs/src/gf-dirent.c libglusterfs/src/gf-dirent.h libglusterfs/src/globals.c libglusterfs/src/globals.h libglusterfs/src/glusterfs.h libglusterfs/src/graph-print.c libglusterfs/src/graph-utils.h libglusterfs/src/graph.c libglusterfs/src/hashfn.c libglusterfs/src/hashfn.h libglusterfs/src/iatt.h libglusterfs/src/inode.c libglusterfs/src/inode.h libglusterfs/src/iobuf.c libglusterfs/src/iobuf.h libglusterfs/src/latency.c libglusterfs/src/latency.h libglusterfs/src/list.h libglusterfs/src/lkowner.h libglusterfs/src/locking.h libglusterfs/src/logging.c libglusterfs/src/logging.h libglusterfs/src/mem-pool.c libglusterfs/src/mem-pool.h libglusterfs/src/mem-types.h libglusterfs/src/options.c libglusterfs/src/options.h libglusterfs/src/rbthash.c libglusterfs/src/rbthash.h libglusterfs/src/run.c libglusterfs/src/run.h libglusterfs/src/scheduler.c libglusterfs/src/scheduler.h libglusterfs/src/stack.c libglusterfs/src/stack.h libglusterfs/src/statedump.c libglusterfs/src/statedump.h libglusterfs/src/syncop.c libglusterfs/src/syncop.h libglusterfs/src/syscall.c libglusterfs/src/syscall.h libglusterfs/src/timer.c libglusterfs/src/timer.h libglusterfs/src/trie.c 
libglusterfs/src/trie.h libglusterfs/src/xlator.c libglusterfs/src/xlator.h libglusterfsclient/src/libglusterfsclient-dentry.c libglusterfsclient/src/libglusterfsclient-internals.h libglusterfsclient/src/libglusterfsclient.c libglusterfsclient/src/libglusterfsclient.h rpc/rpc-lib/src/auth-glusterfs.c rpc/rpc-lib/src/auth-null.c rpc/rpc-lib/src/auth-unix.c rpc/rpc-lib/src/protocol-common.h rpc/rpc-lib/src/rpc-clnt.c rpc/rpc-lib/src/rpc-clnt.h rpc/rpc-lib/src/rpc-transport.c rpc/rpc-lib/src/rpc-transport.h rpc/rpc-lib/src/rpcsvc-auth.c rpc/rpc-lib/src/rpcsvc-common.h rpc/rpc-lib/src/rpcsvc.c rpc/rpc-lib/src/rpcsvc.h rpc/rpc-lib/src/xdr-common.h rpc/rpc-lib/src/xdr-rpc.c rpc/rpc-lib/src/xdr-rpc.h rpc/rpc-lib/src/xdr-rpcclnt.c rpc/rpc-lib/src/xdr-rpcclnt.h rpc/rpc-transport/rdma/src/name.c rpc/rpc-transport/rdma/src/name.h rpc/rpc-transport/rdma/src/rdma.c rpc/rpc-transport/rdma/src/rdma.h rpc/rpc-transport/socket/src/name.c rpc/rpc-transport/socket/src/name.h rpc/rpc-transport/socket/src/socket.c rpc/rpc-transport/socket/src/socket.h xlators/cluster/afr/src/afr-common.c xlators/cluster/afr/src/afr-dir-read.c xlators/cluster/afr/src/afr-dir-read.h xlators/cluster/afr/src/afr-dir-write.c xlators/cluster/afr/src/afr-dir-write.h xlators/cluster/afr/src/afr-inode-read.c xlators/cluster/afr/src/afr-inode-read.h xlators/cluster/afr/src/afr-inode-write.c xlators/cluster/afr/src/afr-inode-write.h xlators/cluster/afr/src/afr-lk-common.c xlators/cluster/afr/src/afr-mem-types.h xlators/cluster/afr/src/afr-open.c xlators/cluster/afr/src/afr-self-heal-algorithm.c xlators/cluster/afr/src/afr-self-heal-algorithm.h xlators/cluster/afr/src/afr-self-heal-common.c xlators/cluster/afr/src/afr-self-heal-common.h xlators/cluster/afr/src/afr-self-heal-data.c xlators/cluster/afr/src/afr-self-heal-entry.c xlators/cluster/afr/src/afr-self-heal-metadata.c xlators/cluster/afr/src/afr-self-heal.h xlators/cluster/afr/src/afr-self-heald.c xlators/cluster/afr/src/afr-self-heald.h 
xlators/cluster/afr/src/afr-transaction.c xlators/cluster/afr/src/afr-transaction.h xlators/cluster/afr/src/afr.c xlators/cluster/afr/src/afr.h xlators/cluster/afr/src/pump.c xlators/cluster/afr/src/pump.h xlators/cluster/dht/src/dht-common.c xlators/cluster/dht/src/dht-common.h xlators/cluster/dht/src/dht-diskusage.c xlators/cluster/dht/src/dht-hashfn.c xlators/cluster/dht/src/dht-helper.c xlators/cluster/dht/src/dht-inode-read.c xlators/cluster/dht/src/dht-inode-write.c xlators/cluster/dht/src/dht-layout.c xlators/cluster/dht/src/dht-linkfile.c xlators/cluster/dht/src/dht-mem-types.h xlators/cluster/dht/src/dht-rebalance.c xlators/cluster/dht/src/dht-rename.c xlators/cluster/dht/src/dht-selfheal.c xlators/cluster/dht/src/dht.c xlators/cluster/dht/src/nufa.c xlators/cluster/dht/src/switch.c xlators/cluster/stripe/src/stripe-helpers.c xlators/cluster/stripe/src/stripe-mem-types.h xlators/cluster/stripe/src/stripe.c xlators/cluster/stripe/src/stripe.h xlators/features/index/src/index-mem-types.h ¹ xlators/features/index/src/index.c ¹ xlators/features/index/src/index.h ¹ xlators/performance/io-cache/src/io-cache.c xlators/performance/io-cache/src/io-cache.h xlators/performance/io-cache/src/ioc-inode.c xlators/performance/io-cache/src/ioc-mem-types.h xlators/performance/io-cache/src/page.c xlators/performance/io-threads/src/io-threads.c xlators/performance/io-threads/src/io-threads.h xlators/performance/io-threads/src/iot-mem-types.h xlators/performance/md-cache/src/md-cache-mem-types.h xlators/performance/md-cache/src/md-cache.c xlators/performance/quick-read/src/quick-read-mem-types.h xlators/performance/quick-read/src/quick-read.c xlators/performance/quick-read/src/quick-read.h xlators/performance/read-ahead/src/page.c xlators/performance/read-ahead/src/read-ahead-mem-types.h xlators/performance/read-ahead/src/read-ahead.c xlators/performance/read-ahead/src/read-ahead.h xlators/performance/symlink-cache/src/symlink-cache.c 
xlators/performance/write-behind/src/write-behind-mem-types.h xlators/performance/write-behind/src/write-behind.c xlators/protocol/auth/addr/src/addr.c ¹ xlators/protocol/auth/login/src/login.c ¹ xlators/protocol/client/src/client-callback.c xlators/protocol/client/src/client-handshake.c xlators/protocol/client/src/client-helpers.c xlators/protocol/client/src/client-lk.c xlators/protocol/client/src/client-mem-types.h xlators/protocol/client/src/client.c xlators/protocol/client/src/client.h xlators/protocol/client/src/client3_1-fops.c ¹ Copyright only, license reverted to original Change-Id: If560e826c61b6b26f8b9af7bed6e4bcbaeba31a8 BUG: 820551 Signed-off-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.com/3304 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* libgluster: Fix counting, synctask state errors
  Author: Pranith Kumar K | Age: 2012-05-03 | Files: 1 | Lines: -0/+2

  When a synctask is executed in synctask_switchto, a reply may arrive and __wake may be called before the woken/sleep check is reached. The task, which is already running, is then put in runq, and this generates a false warning "re-running already running task".

  If the reply does not come before the woken/sleep check, the running task is put in waitq, which decrements env->runcount even though the task is not in runq. This leads to a negative runcount every time a task goes from runq->switchto->waitq.

  This patch fixes both problems by introducing a new task state, SYNCTASK_SUSPEND, which is set just before the task yields in SYNCOP.

  Change-Id: Ib82182cf950f9d85b5656f6243541489a104ca3d
  BUG: 816551
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3249
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
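The runcount problem described in this commit message can be illustrated with a toy model. This is only a sketch: the `toy_*` names, the two counters, and the exact points where they are adjusted are assumptions made for illustration, not the libglusterfs definitions. It shows why a task that is still marked as running gets its run-queue count decremented twice, and how a separate "suspended" state avoids that.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the libglusterfs definitions. */
enum toy_state { TOY_RUN, TOY_SUSPEND, TOY_WAIT };

struct toy_env {
    int runcount;  /* number of tasks currently on the run queue */
    int waitcount; /* number of tasks currently on the wait queue */
};

struct toy_task {
    enum toy_state state;
};

/* Put a task on the run queue. */
static void toy_enqueue_run(struct toy_env *env, struct toy_task *task)
{
    if (task->state == TOY_WAIT)
        env->waitcount--; /* the task is leaving the wait queue */
    env->runcount++;
    task->state = TOY_RUN;
}

/* A processor takes the task off the run queue and starts executing it.
 * The state stays TOY_RUN because the task is now running. */
static void toy_dequeue_for_exec(struct toy_env *env, struct toy_task *task)
{
    (void)task;
    env->runcount--;
}

/* Park a task on the wait queue after it yields. */
static void toy_enqueue_wait(struct toy_env *env, struct toy_task *task)
{
    if (task->state == TOY_RUN)
        env->runcount--; /* assumes a TOY_RUN task is still on the run queue */
    env->waitcount++;
    task->state = TOY_WAIT;
}

/* One runq -> switchto -> waitq cycle; returns the resulting runcount. */
static int toy_cycle(bool mark_suspended)
{
    struct toy_env env = { 0, 0 };
    struct toy_task task = { TOY_SUSPEND }; /* starts on neither queue */

    toy_enqueue_run(&env, &task);      /* runcount becomes 1 */
    toy_dequeue_for_exec(&env, &task); /* runcount becomes 0 */
    if (mark_suspended)
        task.state = TOY_SUSPEND;      /* set just before the yield */
    toy_enqueue_wait(&env, &task);
    return env.runcount;
}
```

In this model, `toy_cycle(false)` ends with a runcount of -1 because the parked task is still marked as running, while `toy_cycle(true)` ends at 0.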
* libglusterfs: Never block syncproc
  Author: Pranith Kumar K | Age: 2012-04-23 | Files: 1 | Lines: -38/+4

  Change-Id: I64cd8a2ef37926173c19a33df0716183530e22bf
  BUG: 814074
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/3194
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* core: adding extra data for fops
  Author: Amar Tumballi | Age: 2012-03-22 | Files: 1 | Lines: -0/+1

  With this change, the xlator APIs take a dictionary as an extra argument, which is passed between all the layers. It can be used to overload some of the operations.

  Change-Id: I58a8186b3ef647650280e63f3e5e9b9de7827b40
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  BUG: 782265
  Reviewed-on: http://review.gluster.com/2960
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: bring in feature to use syncop for mgmt ops
  Author: Amar Tumballi | Age: 2012-03-21 | Files: 1 | Lines: -0/+5

  * new syncop routines added to the mgmt program
  * one should not use 'glusterd_op_begin()'; use the synctask framework, 'glusterd_op_begin_synctask()', instead
  * currently used for the following operations: 'volume start', 'volume rebalance', 'volume quota', 'volume replace-brick' and 'volume add-brick'

  Change-Id: I0bee76d06790d5c5bb5db15d443b44af0e21f1c0
  BUG: 762935
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  Reviewed-on: http://review.gluster.com/479
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Make syntask scalable
  Author: Pranith Kumar K | Age: 2012-03-09 | Files: 1 | Lines: -3/+6

  At the moment, synctask uses task->frame to perform all the syncops. This leads to high memory usage if the task crawls millions of directories, i.e. performs millions of STACK_WINDs/UNWINDs. To prevent this, each task now creates a new stack to perform the fops, and that stack is reset after every syncop.

  Change-Id: I53c262ec348be9b1d91af73da01f1c217f31ce6e
  BUG: 798907
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2850
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/afr: Self-heald, Index integration
  Author: Pranith Kumar K | Age: 2012-02-20 | Files: 1 | Lines: -0/+3

  Change-Id: Ic68eb00b356a6ee3cb88fe2bde50374be7a64ba3
  BUG: 763820
  Signed-off-by: Pranith Kumar K <pranithk@gluster.com>
  Reviewed-on: http://review.gluster.com/2749
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* syncop: Multi-processor support in syncenv
  Author: Anand Avati | Age: 2012-02-20 | Files: 1 | Lines: -19/+32

  This patch introduces:

  - Multithreading of syncop processors, permitting synctasks to be executed concurrently if the runqueue has many tasks.
  - Auto-scaling of syncop processors based on runqueue length.
  - Executing a synctask (synctask_new) in a blocking way if the callback function is set to NULL. The return value of the syncfn will be the return value of synctask_new().

  Change-Id: Iff369709af9adfd07be3386842876a24e1a5a9b5
  BUG: 763820
  Reviewed-on: http://review.gluster.com/443
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Support for hardlink rebalance when decommissioning
  Author: shishir gowda | Age: 2012-02-19 | Files: 1 | Lines: -1/+1

  Hardlink rebalance is supported only when decommissioning a node. It can be triggered in two ways:

  1. remove-brick start
  2. a normal rebalance command, if the decommission node value is set in the vol file

  The way we handle it is:

  if (nlink > 1)
      do
          * if src file doesn't have linkto xattr
              * mark src's linkto to the dst
          * else
              * perform a link on the dst
          * do a lookup
          * if nlinks = dst.nlinks
              * migrate data
          * else
              * continue crawling
      done

  Signed-off-by: shishir gowda <shishirng@gluster.com>
  Change-Id: If43b5524b872fd1413e9f7aa7f436cb244e30d8d
  BUG: 763844
  Reviewed-on: http://review.gluster.com/2737
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
* core: add an extra flag to readv()/writev() API
  Author: Amar Tumballi | Age: 2012-02-14 | Files: 1 | Lines: -2/+4

  Needed to implement proper handling of open-flag alterations made with fcntl() on an fd.

  Change-Id: Ic280d5db6f1dc0418d5c439abb8db1d3ac21ced0
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  BUG: 782265
  Reviewed-on: http://review.gluster.com/2723
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@redhat.com>
* complete the implementation of missing 'f**xattr()' fops
  Author: Amar Tumballi | Age: 2012-01-25 | Files: 1 | Lines: -0/+1

  in debug/* and cluster/* translators, and add a syncop_fsetxattr().

  Added a test case for testing the working of 'f-fop()' on a fuse mount.

  Change-Id: I0c2aeeb30a0fb382ef2495cca1e66b00abaffd35
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  BUG: 766571
  Reviewed-on: http://review.gluster.com/802
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
* core: add 'fremovexattr()' fop
  Author: Amar Tumballi | Age: 2012-01-25 | Files: 1 | Lines: -0/+1

  so that extended attribute removal can be done on an fd

  Change-Id: Ie026f1b53793aeb4ae33e96ea5408c7a97f34bf6
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  BUG: 766571
  Reviewed-on: http://review.gluster.com/778
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
* core: get xattrs also as part of readdirp
  Author: Amar Tumballi | Age: 2012-01-25 | Files: 1 | Lines: -0/+1

  The readdirp_req() call sends a dict_t * as an argument, which contains all the requested xattr keys. Each entry returned in readdirp_rsp() carries a dictionary filled with the values of those xattrs.

  Change-Id: I8b7e1290740ea3e884e67d19156ce849227167c0
  Signed-off-by: Amar Tumballi <amar@gluster.com>
  BUG: 765785
  Reviewed-on: http://review.gluster.com/771
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
* distribute: handle migration of symlink and special files
  Author: Amar Tumballi | Age: 2011-10-20 | Files: 1 | Lines: -0/+5

  TODO: with respect to rebalance/decommissioning, the only pending item is hardlink migration.

  Change-Id: I30cd06802e84c95601a5a081198f1f09c6d6bc01
  BUG: 3714
  Reviewed-on: http://review.gluster.com/578
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Shishir Gowda <shishirng@gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
* distribute rebalance: handle the open file migration
  Author: Amar Tumballi | Age: 2011-09-12 | Files: 1 | Lines: -0/+7

  Complexity involved:

  To migrate a file with an open fd, we have to notify the other client process that holds the open fd, and make sure the write()s happening on that fd are properly synced to the migrated file.

  Once the migration is complete, the client process that has the open fd should get notified, and it should start performing all operations on the new subvolume instead of the earlier cached volume.

  How to solve the notification part:

  We can overload the 'postbuf' attribute in the _cbk() function to tell whether a file is in the 'under-migration' or 'migration-complete' state. (This is similar to deciding whether a file is a DHT linkfile by its 'mode'.)

  The overall change includes the following major changes:

  1. A DHT linkfile is identified by only 2 factors (mode (01000) and xattr (trusted.glusterfs.dht.linkto)), instead of the earlier 3 factors (which also included size==0).

  2. In the linkfile self-heal part (in 'dht_lookup_everywhere_cbk()'), don't delete a linkfile if there is an open fd on it. An open fd means a migration may be in progress.

  3. If a file's revalidate fails with ENOENT, it may be due to file migration, and hence needs a lookup_everywhere().

  4. There are 2 phases of file migration.

     -> Phase 1: Migration in progress
        * The source data file has the SGID and STICKY bits set in its mode.
        * The source data file has a 'linkto' xattr pointing to the destination.
        * The destination file has its mode set to '01000', and its 'linkto' xattr set to itself.

     -> Phase 2: File migration complete
        * The source data file has mode '01000' and is truncated to size 0.
        * The destination file inherits its mode from the source (without the SGID and STICKY bits), and its 'linkto' attribute is removed.

  5. Changes in distribute to work smoothly with a file which is in migration or has been migrated.

     The fops are divided into 3 categories: inode-read, inode-write and others. inode-read fops need to handle only the 'phase 2' notification, whereas inode-write fops need to handle both 'phase 1' and 'phase 2'. The inode-write operations are done on the source file, and if any of the file-migration states is detected in _cbk(), the operation should be performed on the destination too. When phase 2 is detected, the inode-ctx itself should be changed to represent the new layout.

  With these changes, open file migration will work smoothly with multiple clients.

  Change-Id: I512408463814e650f34c62ed009bf2101d016fd6
  BUG: 3071
  Reviewed-on: http://review.gluster.com/209
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vijay@gluster.com>
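The mode-bit conventions in this commit message (a linkfile is mode 01000 plus the linkto xattr; phase 1 sets SGID and STICKY on the source; phase 2 leaves the source at mode 01000) can be sketched as a pair of small classifiers. This is a minimal illustration, not the DHT code: every `sketch_*` and `SKETCH_*` name is hypothetical, the octal constants are the conventional Linux values for these bits, and the presence of the linkto xattr is passed in as a plain boolean rather than read from a dictionary.

```c
#include <stdbool.h>

/* Conventional octal values, spelled out so the sketch is self-contained. */
#define SKETCH_IFMT   0170000u /* file-type bits */
#define SKETCH_SGID   0002000u /* set-group-ID bit */
#define SKETCH_STICKY 0001000u /* sticky bit; alone it is the '01000' mode */

enum sketch_phase {
    SKETCH_NOT_MIGRATING,
    SKETCH_PHASE1, /* migration in progress */
    SKETCH_PHASE2  /* migration complete */
};

/* Two-factor linkfile test: mode is exactly 01000 (ignoring the file-type
 * bits) and the linkto xattr is present. Size is deliberately not checked. */
static bool sketch_is_linkfile(unsigned mode, bool has_linkto_xattr)
{
    return (mode & ~SKETCH_IFMT) == SKETCH_STICKY && has_linkto_xattr;
}

/* Classify the source file's state from the mode seen in a callback. */
static enum sketch_phase sketch_source_phase(unsigned mode)
{
    unsigned perm = mode & ~SKETCH_IFMT;

    if (perm == SKETCH_STICKY)
        return SKETCH_PHASE2; /* source reduced to mode 01000 */
    if ((perm & SKETCH_SGID) && (perm & SKETCH_STICKY))
        return SKETCH_PHASE1; /* both marker bits set on the data file */
    return SKETCH_NOT_MIGRATING;
}
```

Under the fop categories described in the message, an inode-read fop would only need to react to `SKETCH_PHASE2`, while an inode-write fop would react to both phases.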
* Change Copyright current year
  Author: Pranith Kumar K | Age: 2011-08-10 | Files: 1 | Lines: -1/+1

  Change-Id: I2d10f2be44f518f496427f257988f1858e888084
  BUG: 3348
  Reviewed-on: http://review.gluster.com/200
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
* LICENSE: s/GNU Affero General Public/GNU General Public/
  Author: Pranith Kumar K | Age: 2011-08-06 | Files: 1 | Lines: -3/+3

  Change-Id: I3914467611e573cccee0d22df93920cf1b2eb79f
  BUG: 3348
  Reviewed-on: http://review.gluster.com/182
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Anand Avati <avati@gluster.com>
* libglusterfs/syncop: add more functions
  Author: Amar Tumballi | Age: 2011-07-11 | Files: 1 | Lines: -1/+6

  do proper 'ref's and implement 'write()' and 'ftruncate()'

  Signed-off-by: Amar Tumballi <amar@gluster.com>
  Signed-off-by: Anand Avati <avati@gluster.com>
  BUG: 3081 (synchronous operations should be enhanced)
  URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3081