summaryrefslogtreecommitdiffstats
path: root/libglusterfs/src/syncop.c
Commit message (Collapse)AuthorAgeFilesLines
* core: add lease fopPoornima G2016-04-211-0/+49
| | | | | | | | | | | | | Change-Id: Ia27d66b1061b0377857827515590eb89b18515c9 BUG: 1319992 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/11596 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* syncop: add seek() FOPNiels de Vos2016-01-311-1/+36
| | | | | | | | | | | | | | | Add the new seek() FOP to the syncop framework. gfapi will use this in the future. Change-Id: I0c15153beb27de73d5844b6f692175750fc28f60 BUG: 1220173 Singed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11481 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* syncop: Include iatt to 'syncop_link' argsSoumya Koduri2015-07-101-2/+8
| | | | | | | | | | | | | | Include iatt to 'syncop_link' args to fetch proper attributes of the newly linked inode. Signed-off-by: Soumya Koduri <skoduri@redhat.com> Change-Id: If6b92961bd7a89add3791ed3a9b494087348b492 BUG: 1241788 Reviewed-on: http://review.gluster.org/11611 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* quota/marker: use smaller stacksize in synctask for marker updationvmallika2015-07-091-7/+22
| | | | | | | | | | | | | | | | | Default stacksize that synctask uses is 2M. For marker we set it to 16k Also move market xlator close to io-threads to have smaller stack Change-Id: I8730132a6365cc9e242a3564a1e615d94ef2c651 BUG: 1207735 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11499 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* libglusterfs removing strerror in logging Mohamed Ashiq Liyazudeen2015-07-091-4/+2
| | | | | | | | | | Change-Id: I8a0f40834da1151ddaef6139af3782bc076df57e BUG: 1194640 Signed-off-by: Mohamed Ashiq Liyazudeen <mliyazud@redhat.com> Reviewed-on: http://review.gluster.org/11464 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* libglusterfs: Use GF_CALLOC/GF_FREE instead of CALLOC/FREEPranith Kumar K2015-07-021-8/+8
| | | | | | | | | | | | | - Also removed numbers for the types as the string form of type is printed in statedump now, so the numbers are not needed anymore. Change-Id: I6e8c15a1dc8cb6187842f96f1d46ec0f26a602b4 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11495 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mem-pool,stack,store,syncop,timer/libglusterfs : Porting to a new logging ↵Mohamed Ashiq2015-06-261-40/+51
| | | | | | | | | | | framework Change-Id: Idd3dcaf7eeea5207b3a5210676ce3df64153197f BUG: 1194640 Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> Reviewed-on: http://review.gluster.org/10827 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* libglusterfs: Copy d_len and dict as well into dst direntKrutika Dhananjay2015-06-021-2/+16
| | | | | | | | | | | | | | | Also, added memory allocation failure checks in light of the comments received @ http://review.gluster.org/#/c/10809/2/libglusterfs/src/gf-dirent.c, and http://review.gluster.org/#/c/10809/1/xlators/features/shard/src/shard.c Change-Id: Ie4092218545c8f4f8a0e6cc1fec6ba37bbbf2620 BUG: 1226551 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11026 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* build: do not #include "config.h" in each fileNiels de Vos2015-05-291-5/+0
| | | | | | | | | | | | | | | | | | Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/bitrot: Throttle filesystem scrubberVenky Shankar2015-05-071-17/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces multithreaded filesystem scrubber based on throttling option configured for a particular volume. The implementation "logically" breaks scanning and scrubbing with the number of scrubber threads auto-configured depending upon the throttle configuration. Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing. This mechanism helps us implement "pause/resume" with ease: all one need to do is to cleanup scrubber threads and let the main scanner thread "wait" untill scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the deamons, where crawl initiates from root directory again, but I guess that's OK). [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ] Subsequent patches would introduce CPU throttling during hash calculation for scrubber. Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10511 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Implementation of sync lock as recursive lock to avoid crash.anand2015-04-281-48/+126
| | | | | | | | | | | | | | | | | | | | | | | Problem : In glusterd,we are using big lock which is implemented based on sync task frame work for thread synchronization and rcu lock for data consistency. sync task frame work swap the threads if there is no worker poll threads available,due to this rcu lock and rcu unlock was happening in different threads (urcu-bp will not allow this),resulting into glusterd crash. fix : To avoid releasing the sync lock(big lock) in between rcu critical section,implemented sync lock as recursive lock. More details: link : http://www.spinics.net/lists/gluster-devel/msg14632.html Change-Id: I2b56c1caf3f0470f219b1adcaf62cce29cdc6b88 BUG: 1211640 Signed-off-by: anand <anekkunt@redhat.com> Reviewed-on: http://review.gluster.org/10285 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs/syncop: Pass xdata even in error caseRaghavendra Talur2015-04-271-2/+2
| | | | | | | | | | | | | xdata should be passed even in error cases. lookup() call was missed in previous patch set. Change-Id: I1ad2c452d05a3b4433b640762aaea5d3a91f2ba5 BUG: 1209869 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/10193 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: Implement syncop_fxattropPranith Kumar K2015-04-261-0/+20
| | | | | | | | | | | Change-Id: Ifc7937ceb451f6e11e40a9513017226fd0f115b0 BUG: 1215265 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10382 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs/syncop: Add xdata to all syncop callsRaghavendra Talur2015-04-081-109/+394
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for xdata in both the request and response path of syncops. Few calls like lookup already had the support; have renamed variables in few places to maintain uniformity. xdata passed downwards is known as xdata_in and xdata passed upwards is known as xdata_out. There is an old patch by Jeff Darcy at http://review.gluster.org/#/c/8769/3 which does the same for some selected calls. It also brings in xdata support at gfapi level. xdata support at gfapi level would be introduced in subsequent patches. Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc BUG: 1158621 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9859 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Quota/marker : Support for inode quotavmallika2015-03-171-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the only way to retrieve the number of files/objects in a directory or volume is to do a crawl of the entire directory/volume. This is expensive and is not scalable. The new mechanism proposes to store count of objects/files as part of an extended attribute of a directory. Each directory's extended attribute value will indicate the number of files/objects present in a tree with the directory being considered as the root of the tree. Currently file usage is accounted in marker by doing multiple FOPs like setting and getting xattrs. Doing this with STACK WIND and UNWIND can be harder to debug as involves multiple callbacks. In this code we are replacing current mechanism with syncop approach as syncop code is much simpler to follow and help us implement inode quota in an organized way. Change-Id: Ibf366fbe07037284e89a241ddaff7750fc8771b4 BUG: 1188636 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/9567 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* every/where: add GF_FOP_IPC for inter-translator communicationJeff Darcy2015-03-171-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several features - e.g. encryption, erasure codes, or NSR - involve multiple cooperating translators which sometimes need a "private" means of communication amongst themselves. Historically we've used virtual or synthetic xattrs, but that's not very elegant and clutters up the getxattr/setxattr path which must also handle real xattr requests. This new fop should address that. The only argument is an int32_t "op" which should be recognized by the target translator. It is recommended that translators using these feature follow some convention regarding the ops that they define, to avoid conflicts. Using a hash of the target translator's type string as a base for a series of ops would probably be a good start. Any other information can be passed in both directions using xdata. The default behavior for this fop, as with any other, is to pass through to FIRST_CHILD. That makes use of this fop "transparent" to other translators that were written before it existed, but it also means that it only really works with pass-through translators. If a routing translator (such as DHT) or a fan-out translator (such as AFR) is involved, the IPC might not reach its intended destination unless those translators are modified to forward IPC fops along all paths. If an IPC gets all the way to storage/posix it is considered an error, much like an uncaught exception. We don't actually *do* anything in that case, but we do log it send back an EOPNOTSUPP error. This makes the "unrecognized opcode" condition distinguishable from the "no IPC support" condition (which would yield an RPC error instead) so clients can probe for the presence of a handler for their own favorite opcode and either use that or use old-school xattrs depending on the result. BUG: 1158628 Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: I84af1b17babe5b30ec03ecf027ae37d09b873968 Reviewed-on: http://review.gluster.org/8812 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: assign lk_owner for the newly created framevmallika2015-03-141-0/+2
| | | | | | | | | | | | | | | | syncop_inodelk doesn't work properly as lk_owner is not set in the frame created by 'synctask_create'. There is a possibility that more than one thread can acquire inode lock with syncop_inodelk Change-Id: I8193edb0d24b3a6e3a3f6a0c5d7ab5a1be8e7daf BUG: 1188636 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9858 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: Add the code to destroy the synenv processors andPoornima G2015-03-021-3/+70
| | | | | | | | | | | syncenv structures Change-Id: I28020eb2fc08d886cd7c05ff96daf7ebb4264ffe BUG: 1093594 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/9693 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Moved common functions as utils in syncop/common-utilsPranith Kumar K2015-02-271-166/+0
| | | | | | | | | | | | | | | These will be used by both afr and ec. Moved syncop_dirfd, syncop_ftw, syncop_dir_scan functions also into syncop-utils.c Change-Id: I467253c74a346e1e292d36a8c1a035775c3aa670 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9740 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Added support to set 'frame->root->lkowner'Soumya Koduri2015-02-231-0/+40
| | | | | | | | | | | | | | This support can be used by the clients using SYNCOP framework, to pass unique owners for various locks taken on a file, so that the glusterfs-server can treat them as being locks from different owners. Change-Id: Ie88014053af40fc7913ad6c1f7730d54cc44ddab BUG: 1186713 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/9482 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* syncop: Provide syncop_ftw and syncop_dir_scan utilsPranith Kumar K2015-02-061-0/+166
| | | | | | | | | | | | | | | | | ftw provides file tree walk. dir_scan does just a readdir not readdirp. Also changed Afr's self-heal-daemon's crawling functions to use this. These utils will be used by ec in future to do proactive/full healing. Change-Id: I05715ddb789592c1b79a71e98f1e8cc29aac5c26 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9485 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: change signature of syncop_(f)getxattrRavishankar N2015-01-051-4/+6
| | | | | | | | | | | | | | | | | Pass xdata dict to syncop_(f)getxattr calls. This patch [1/3] is required as a part of afr automated split-brain resolution implementation. Change-Id: I3970b3dd6daf64681a031e37f8e9afb14fb3d668 BUG: 1136769 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9375 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rebalance: ``check_free_space`` should ignore quota_statfsHarshavardhana2014-10-311-2/+9
| | | | | | | | | | | | | | | | | | | | | quota_statfs() returns aggregated details of space usage of bricks this causes distribute to be confused during ``rebalance``, where ``statfs()`` values are used to schedule file migration. We can make sure the values of ``statfs`` are from individual bricks by selectively instructing ``quota_statfs()`` to return non aggregated values. Change-Id: I1397faeee66a1b9c26709cfda693286d227a4170 BUG: 1158262 Signed-off-by: Harshavardhana <harsha@harshavardhana.net> Reviewed-on: http://review.gluster.org/8996 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* synctask: add backtrace per waiting taskKrishnan Parthasarathi2014-09-221-1/+3
| | | | | | | | | | | | | | | The backtrace is 'saved' in a per-task buffer. This would come handy while debugging code using synctasks. Change-Id: I732b275f6d15b31f31361f5ecf2ba47cacde9b54 BUG: 1138503 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/8622 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: Invoke dict_unref() in inodelk only if dictionary is not NULLVijay Bellur2014-09-171-2/+4
| | | | | | | | | | | | | | | | | | In the absence of this check, logs can get flooded with messages like this when rebalance is run: [2014-09-04 17:48:07.148262] W [dict.c:480:dict_unref] (-->/lib64/libc.so.6() [0x30daa47a00] (-->/usr/local/lib/libglusterfs.so.0(synctask_wrap+0x12) [0x7fa20b7c6ec2] (-->/usr/local/lib/glusterfs/3.7dev/xlator/cluster/distribute.so(dht_migrate_file+0x23f) [0x7fa200fdb58f]))) 0-dict: dict is NULL Change-Id: I4c93e4485293b35d86ba07df4d583d2758ec3f49 BUG: 1130888 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/8601 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* libglusterfs/syncop: implement inodelkRaghavendra G2014-08-261-0/+40
| | | | | | | | | | Change-Id: Iea489157490b70cb2bb03576b0d4943c6d8f052d BUG: 1130888 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/8522 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* user servicable snapshotsRaghavendra Bhat2014-05-291-1/+16
| | | | | | | | | | Change-Id: Idbf27dbe088e646a8ab81cedc5818413795895ea Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Anand Subramanian <anands@redhat.com> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/7700 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncops: add support for custom PIDAnand Avati2014-02-131-0/+40
| | | | | | | | | | | | AFR self-heal needs to issue syncops with special PID. Extend the custom UID/GID support to include custom PIDs Change-Id: I736c0e177f862b029f203acc87f9eb46c8cb839b BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6888 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* core: add @xdata parameter to syncop_[f]removexattr()Anand Avati2014-02-131-4/+4
| | | | | | | | | | | To be used in afr metadata self-heal Change-Id: I8dac4b19d61e331702427eeb5b606aab3d20b328 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6941 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* syncop: Change return value of syncopPranith Kumar K2014-01-191-39/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: We found a day-1 bug when syncop_xxx() infra is used inside a synctask with compilation optimization (CFLAGS -O2). Detailed explanation of the Root cause: We found the bug in 'gf_defrag_migrate_data' in rebalance operation: Lets look at interesting parts of the function: int gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc, dict_t *migrate_data) { ..... code section - [ Loop ] while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL, &entries)) != 0) { ..... code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a thread) /* Need to keep track of ENOENT errno, that means, there is no need to send more readdirp() */ readdir_operrno = errno; ..... code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread) ret = syncop_getxattr (this, &entry_loc, &dict, GF_XATTR_LINKINFO_KEY); code section - [ ERRNO-2] (checking for failures of syncop_getxattr(). This may not always be executed in same thread which executed [SYNCOP-1]) if (ret < 0) { if (errno != ENODATA) { loglevel = GF_LOG_ERROR; defrag->total_failures += 1; ..... } the function above could be executed by thread(t1) till [SYNCOP-1] and code from [ERRNO-2] can be executed by a different thread(t2) because of the way syncop-infra schedules the tasks. when the code is compiled with -O2 optimization this is the assembly code that is generated: [ERRNO-1] 1165 readdir_operrno = errno; <<---- errno gets expanded as *(__errno_location()) 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0 <address@hidden> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------ Address returned by __errno_location() is stored in a special location in stack for later use. 0x00007fd149d48b77 <+519>: mov (%rax),%eax 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp) .... [ERRNO-2] 1281 if (errno != ENODATA) { 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<----- Because it already stored the address returned by __errno_location(), it just dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE EXECUTED BY SAME THREAD!!! 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp 0x00007fd149d492b8 <+2376>: mov (%rax),%edi 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi The problem is that __errno_location() value of t1 and t2 are different. So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is executing [ERRNO-2] code section. When code is compiled without any optimization for [ERRNO-2]: 1281 if (errno != ENODATA) { 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300 <address@hidden><<--- As it is calling __errno_location() again it gets the location from t2 so it works as intended. 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1 <gf_defrag_migrate_data+2287> Fix: Make syncop_xxx() return (-errno) value as the return value in case of errors and all the functions which make syncop_xxx() will need to use (-ret) to figure out the reason for failure in case of syncop_xxx() failures. Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57 BUG: 1040356 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6475 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncops: expose @flags in syncop_rmdir()Anand Avati2013-11-211-2/+2
| | | | | | | | | Change-Id: I9b73c1db728e4cb3948fc118cceb292b21d48b96 BUG: 1021686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6112 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* zerofill: Change the type of len argument of glfs_zerofill() to off_tBharata B Rao2013-11-141-1/+1
| | | | | | | | | | | | | | glfs_zerofill() can be potentially called to zero-out entire file and hence allow for bigger value of length parameter. Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63 BUG: 1028673 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-on: http://review.gluster.org/6266 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterfs: zerofill supportM. Mohan Kumar2013-11-101-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in the specified range. This fop will be useful when a whole file needs to be initialized with zero (could be useful for zero filled VM disk image provisioning or during scrubbing of VM disk images). Client/application can issue this FOP for zeroing out. Gluster server will zero out required range of bytes ie server offloaded zeroing. In the absence of this fop, client/application has to repetitively issue write (zero) fop to the server, which is very inefficient method because of the overheads involved in RPC calls and acknowledgements. WRITESAME is a SCSI T10 command that takes a block of data as input and writes the same data to other blocks and this write is handled completely within the storage and hence is known as offload . Linux ,now has support for SCSI WRITESAME command which is exposed to the user in the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to implement this fop. Thus zeroing out operations can be completely offloaded to the storage device , making it highly efficient. The fop takes two arguments offset and size. It zeroes out 'size' number of bytes in an opened file starting from 'offset' position. This patch adds zerofill support to the following areas: - libglusterfs - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi Client applications can exloit this fop by using glfs_zerofill introduced in libgfapi.FUSE support to this fop has not been added as there is no system call for this fop. Changes from previous version 3: * Removed redundant memory failure log messages Changes from previous version 2: * Rebased and fixed build error Changes from previous version 1: * Rebased for latest master TODO : * Add zerofill support to trace xlator * Expose zerofill capability as part of gluster volume info Here is a performance comparison of server offloaded zeofill vs zeroing out using repeated writes. [root@llmvm02 remote]# time ./offloaded aakash-test log 20 real 3m34.155s user 0m0.018s sys 0m0.040s [root@llmvm02 remote]# time ./manually aakash-test log 20 real 4m23.043s user 0m2.197s sys 0m14.457s [root@llmvm02 remote]# time ./offloaded aakash-test log 25; real 4m28.363s user 0m0.021s sys 0m0.025s [root@llmvm02 remote]# time ./manually aakash-test log 25 real 5m34.278s user 0m2.957s sys 0m18.808s The argument log is a file which we want to set for logging purpose and the third argument is size in GB . As we can see there is a performance improvement of around 20% with this fop. Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18 BUG: 1028673 Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5327 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gfapi: object handle based API extensionsR.Shyamsundar2013-10-111-0/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is an ongoing effort to integrate NFS Ganesha ( https://github.com/nfs-ganesha/nfs-ganesha/wiki ) with GlusterFS as one of the file system back ends. Towards this we need extensions to gfapi that can handle object based operations. Meaning, instead of using full paths or relative paths from cwd, it is required that we can work with APIs, like the *at POSIX variants, to be able to create, lookup, open etc. files and directories. Hence the objects are the files or directories themselves and we give out handles to these objects that can be used for further operations. This code drop is an initial implementation of the proposed APIs. The new APIs are implemented as glfs_h_XXX variants in the file glfs-handleops.c to mirror glfs-fops.c style. The code leverages holding onto inode references and doling these out as opaque/cookie type objects to the callers, to enable them to be used as handles in other operations. An fd based approach was considered, but due to the extra footprint that the fd structure and its counterparts would incur, this was dropped to take the approach of holding inode references themselves. Tested by extending glfsxmp.c to invoke and exercise the added APIs, and further tested with a reference integration of the same as an FSAL with NFS Ganesha. Change-Id: I23629c99e905b54070fa2e6565147812e5f3fa5d BUG: 1016000 Signed-off-by: R.Shyamsundar <srangana@redhat.com> Reviewed-on: http://review.gluster.org/5936 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: block unused signals in created threadsAnand Avati2013-09-251-4/+4
| | | | | | | | | | | | | | | Block all signal except those which are set for explicit handling in glusterfs_signals_setup(). Since thread spawning code in libglusterfs and xlators can get called from application threads when used through libgfapi, it is necessary to do this blocking. Change-Id: Ia320f80521a83d2edcda50b9ad414583a0175281 BUG: 1011662 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5995 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* synctask: minor enhancementsAnand Avati2013-08-281-30/+73
| | | | | | | | | | | | | | | | | | | | | | | - Enhance syncenv_new() to accept scaling parameters of syncproc. Previously the scaling parameters were hardcoded and decided at compile time. - New API synctask_create() which returns the created synctask. This is similar to synctask_new which only returned the status of whether a synctask could be created or not. The meaning of NULL cbk in synctask_create() means the task is "joinable". Until synctask_join() is called on such a synctask, the task is not reaped and resources are not destroyed. The task would be in a zombie state after synctask_fn returns and before synctask_join() is called. Change-Id: I368ec9037de9510d2ba951f0aad86aaf18d9a6b6 BUG: 986775 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5365 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
* glusterfs: discard (hole punch) supportBrian Foster2013-06-131-0/+30
| | | | | | | | | | | | | | | | Add support for the DISCARD file operation. Discard punches a hole in a file in the provided range. Block de-allocation is implemented via fallocate() (as requested via fuse and passed on to the brick fs) but a separate fop is created within gluster to emphasize the fact that discard changes file data (the discarded region is replaced with zeroes) and must invalidate caches where appropriate. BUG: 963678 Change-Id: I34633a0bfff2187afeab4292a15f3cc9adf261af Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5090 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gluster: add fallocate fop supportBrian Foster2013-06-131-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement support for the fallocate file operation. fallocate allocates blocks for a particular inode such that future writes to the associated region of the file are guaranteed not to fail with ENOSPC. This patch adds fallocate support to the following areas: - libglusterfs - mount/fuse - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi BUG: 949242 Change-Id: Ice8e61351f9d6115c5df68768bc844abbf0ce8bd Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4969 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: link inodes in relevant entry FOPsAnand Avati2013-05-251-4/+28
| | | | | | | | | | | | Do not let inode linking to happen only in lookup(). While that works, it is inefficient. Change-Id: I51bbfb6255ec4324ab17ff00566375f49d120c06 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4931 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* syncop: copy inode pointer in readdirplusAnand Avati2013-05-251-0/+2
| | | | | | | | | Change-Id: I9ab2b8ac2da9fe13f56b8b08f715a0b603ece0cb BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4930 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* syncop: synctask shouldn't yawn, it could miss a 'wake'Krishnan Parthasarathi2013-05-211-17/+0
| | | | | | | | | Change-Id: I7731fd33ca0c925cc52f8d105275b44fc625a1e2 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5058 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Remove task from syncbarrier's waitq before 'wake'Krishnan Parthasarathi2013-05-201-7/+5
| | | | | | | | | | | | | | | | Removing task from syncbarrier's waitq after wake could result in a subsequent syncbarrier_wake, wake'ing up the already running task. This fix makes the removal from waitq and wake 'atomic' The root cause and the fix are similar in spirit to what was observed in synclock's waitq implementation. Change-Id: I7dd9e6ad5945742bcda20eb5a06a9376bb18528e BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5047 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Update synctask state appropriatelyKrishnan Parthasarathi2013-05-201-0/+5
| | | | | | | | | | | | | | | | | | | | * Earlier, SYNCOP macro, the only consumer of synctask_yield, would set the task->state to SYNCTASK_SUSPEND. Today, we have glusterd having its own wrapper macros which don't set task's state. There is also the syncbarrier and synclock framework, which also participate in a synctask's scheduling (and need to keep a task's state up to date). It only makes more sense to leave a synctask's state to the synctask library, since its an internal affair. * Need to 'yawn' before 'yield' to avoid re-running tasks to set task->woken appropriately. Change-Id: Ic7a59e6ebcc46f03e53223ca237668d45a3cba40 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4985 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Remove task from synclock's waitq before 'wake'Krishnan Parthasarathi2013-05-151-6/+6
| | | | | | | | | | | | | Removing task from synclock's waitq after wake could result in a subsequent unlock, wake'ing up the already running task. This fix makes the removal from waitq and wake 'atomic'. Change-Id: Ie51ccf9d38f2cee84471097644aab95496f488b1 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4987 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* synctask: implement barriers around yield, not the other wayAnand Avati2013-05-041-22/+120
| | | | | | | | | | | | | | | | | | | | In the current implementation, barriers are in the core of the syncprocessors. Wake()s are treated as syncbarrier wake. This is however delicate, as spurious wake()s of the synctask can mess up the accounting of the barrier and waking it prematurely. The fix is to keep yield() and wake() as the basic primitives, and implement barriers as an object impelemented on top of these primitives. This way, only an explicit barrier_wake() gets counted towards the barrier accounting, and spurious wakes will be truly safe. Change-Id: I8087f0f446113e5b2d0853431c0354335ccda076 BUG: 948686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4921 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* Fix uninitialized mutex usage in synctask_destroyEmmanuel Dreyfus2013-04-301-3/+4
| | | | | | | | | | | | | | | synctask_new() initialize task->mutex is task->synccbk is NULL. synctask_done() calls synctask_destroy() if task->synccbk is not NULL. synctask_destroy() always destroys the mutex. Fix that by checking for task->synccbk in synctask_destroy() BUG: 764655 Change-Id: I50bb53bc6e2738dc0aa830adc4c1ea37b24ee2a0 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/4913 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gfapi: POSIX locking supportAnand Avati2013-04-241-0/+34
| | | | | | | | | Change-Id: I37d9e1fb4a715094876be6af3856c1b4cf398021 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4881 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncenv: be robust against spurious wake()sAnand Avati2013-04-091-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current implementation, when the callers of synctasks perform a spurious wake() of a sleeping synctask (i.e, an extra wake() soon after a wake() which already woke up a yielded synctask), there is now a possibility of two sync threacs picking up the same synctask. This can result in a crash. The fix is to change ->slept = 0|1 and membership of synctask in runqueue atomically. Today we dequeue a task from the runqueue in syncenv_task(), but reset ->slept = 0 much later in synctask_switchto() in an unlocked manner -- which is safe, when there are no spurious wake()s. However, this opens a race window where, if a second wake() happens after the dequeue, but before setting ->slept = 0, it results in queueing the same synctask in the runqueue once again, and get picked up by a different synctask. This is has been diagnosed to be the crashes in the regression tests of http://review.gluster.org/4784. However that patch still has a spurious wake() [the trigger for this bug] which is yet to be fixed. Change-Id: I9b4b9dd5115d6e62ba45162ae90dd5e917a4f83d BUG: 948686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4795 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* synctask: introduce synclocks for co-operative lockingAnand Avati2013-04-021-1/+149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a synclocks - co-operative locks for synctasks. Synctasks yield themselves when a lock cannot be acquired at the time of the lock call, and the unlocker will wake the yielded locker at the time of unlock. The implementation is safe in a multi-threaded syncenv framework. It is also safe for sharing the lock between non-synctasks. i.e, the same lock can be used for synchronization between a synctask and a regular thread. In such a situation, waiting synctasks will yield themselves while non-synctasks will sleep on a cond variable. The unlocker (which could be either a synctask or a regular thread) will wake up any type of lock waiter (synctask or regular). Usage: Declaration and Initialization ------------------------------ synclock_t lock; ret = synclock_init (&lock); if (ret) { /* lock could not be allocated */ } Locking and non-blocking lock attempt ------------------------------------- ret = synclock_trylock (&lock); if (ret && (errno == EBUSY)) { /* lock is held by someone else */ return; } synclock_lock (&lock); { /* critical section */ } synclock_unlock (&lock); Change-Id: I081873edb536ddde69a20f4a7dc6558ebf19f5b2 BUG: 763820 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4717 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <raghavendra@gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* synctask: support for (assymetric) counted barriersAnand Avati2013-02-211-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new set of primitives: - synctask_barrier_init (stub) - synctask_barrier_waitfor (stub, count) - synctask_barrier_wake (stub) Unlike pthread_barrier_t, this barrier has an explicit notion of "waiter" and "waker". The "waiter" waits for @count number of "wakers" to call synctask_barrier_wake() before returning. The wait performed by the waiter via synctask_barrier_waitfor() is co-operative in nature and yields the thread for scheduling other synctasks in the mean time. Intended use case: Eliminate excessive serialization in glusterd and allow for concurrent RPC transactions. Code which are currently in this format: ---old--- list_for_each_entry (peerinfo, peers, op_peers_list) { ... GD_SYNCOP (peerinfo->rpc, stub, rpc_cbk, ...); } ... int rpc_cbk (rpc, stub, ...) { ... __wake (stub); } ---old--- Can be restructred into the format: ---new--- synctask_barrier_init (stub); { list_for_each_entry (peerinfo, peers, op_peers_list) { ... rpc_submit (peerinfo->rpc, stub, rpc_cbk, ...); count++; } } synctask_barrier_wait (stub, count); ... int rpc_cbk (rpc, stub, ...) { ... synctask_barrier_wake (stub); } ---new--- In the above structure, from the synctask's point of view, the region between synctask_barrier_init() and synctask_barrier_wait() are spawning off asynchronous "threads" (or RPC) and keep count of how many such threads have been spawned. Each of those threads are expected to make one call to synctask_barrier_wake(). The call to synctask_barrier_wait() makes the synctask thread co-operatively wait/sleep till @count such threads call their wake function. This way, the synctask thread retains the "synchronous" flow in the code, yet at the same time allows for asynchronous "threads" to acheive parallelism over RPC. Change-Id: Ie037f99b2d306b71e63e3a56353daec06fb0bf41 BUG: 913662 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4558 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>