summaryrefslogtreecommitdiffstats
path: root/xlators/features
Commit message (Collapse)AuthorAgeFilesLines
* features/bit-rot: check for both inmemory and ondisk stalenessRaghavendra Bhat2015-06-152-19/+138
| | | | | | | | | | | | | | | | | | | | | * Let bit-rot stub check both on disk ongoing version, signed version xattrs and the in memory flags in the inode and then decide whether the inode is stale or not. This information is used by one shot crawler in BitD to decide whether to trigger the sign for the object or skip it. NOTE: The above check should be done only for BitD. For scrubber its still the old way of comparing on disk ongoing version with signed version. * BitD's one shot crawler should not sign zero byte objects if they do not contain signature. (Means the object was just created and nothing was written to it). Change-Id: I6941aefc2981bf79a6aeb476e660f79908e165a8 BUG: 1224611 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/10947 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* quota: don't log error when disk quota exceededvmallika2015-06-151-6/+9
| | | | | | | | | | | | | | When disk quota exceeded, quota enforcer logs alert message, so no need to log error message as this can fill up the log file Change-Id: Ia913f47bc0cedb7c0a9c611330ee5124d3bb6c9d BUG: 1229609 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11135 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/bitrot: tuanble object signing waiting time value for bitrotGaurav Kumar Garg2015-06-152-8/+31
| | | | | | | | | | | | | | Currently bitrot using 120 second waiting time for object to be signed after all fop's released. This signing waiting time value should be tunable. Command for changing the signing waiting time will be #gluster volume bitrot <VOLNAME> signing-time <waiting time value in second> Change-Id: I89f3121564c1bbd0825f60aae6147413a2fbd798 BUG: 1228680 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/11105
* features/quota: port QUOTA messages to new logging frameworkSusant Palai2015-06-146-322/+594
| | | | | | | | | | | Change-Id: I5e3df8860ea35bce14a802391be9b22ad64f1ad4 BUG: 1075611 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/7574 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* upcall: prevent busy loop in reaper threadNiels de Vos2015-06-141-0/+5
| | | | | | | | | | | | | | | | http://review.gluster.org/10342 introduced a cleanup thread for expired client entries. When enabling the 'features.cache-invalidation' volume option, the brick process starts to run in a busy-loop. Obviously this is not intentional, and a process occupying 100% of the cycles on a CPU or core is not wanted. Change-Id: I453c612d72001f4d8bbecdd5ac07aaed75b43914 BUG: 1200267 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11198 Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/changelog: Do htime setxattr without XATTR_REPLACE flagKotresh HR2015-06-122-10/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | HTIME_KEY marks the last changelog rolled over. The xattr is maintained on .glusterfs/changelog/htime/HTIME.TSTAMP file. On every rollover of the changelog file, the xattr is updated. It is being updated with XATTR_REPLACE flag as xattr gets created during changelog enable. But it is once found that the xattrs on the file is cleared and is not reproduced later on. This patch protects that case, if it happens by setting xattr without XATTR_REPLACE flag in failure case. The reason behind doing this in failure case is not to mask the actual cause of xattrs getting cleared. This provides the log message if the original issue still exists but the consequential effects are fixed. Also changed the log messages to depict the events happened during changelog enable. Change-Id: I699ed09a03667fd823d01d65c9c360fa7bc0e455 BUG: 1230015 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11150 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/qemu-block: Add comment about root_inode handlingPranith Kumar K2015-06-121-1/+2
| | | | | | | | | | | | | This is handling the comment on 3.7 backport http://review.gluster.com/11047 Change-Id: I253a8e0bfebde63f25c839efd55ed3cc30e6cd9b BUG: 1226276 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11076 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Poornima G <pgurusid@redhat.com>
* features/changelog: Avoid setattr fop logging during renameSaravanakumar Arumugam2015-06-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When a file is renamed and the (renamed)file's Hashing falls into a different brick, DHT creates a special file(linkto file) in the brick(Hashed subvolume) and carries out setattr operation on that file. Currently, Changelog records this(setattr) operation in Hashed subvolume. glusterfind in turn records this operation as MODIFY operation. So, there is a NEW entry in Cached subvolume and MODIFY entry in Hashed subvolume for the same file. Solution: Avoid logging setattr operation carried out, by marking the operation as internal fop using xdata. In changelog translator, check whether setattr is set as internal fop and skip accordingly. Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd BUG: 1230007 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/11137 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/marker : Pass along xdata to lower translatorSachin Pandit2015-06-111-1/+3
| | | | | | | | | | | | | | | The problem was in marker xlator, where during rename a NULL value is passed during STACK_WIND. Marker needs to pass the xdata un-modified to next translator if marker does not rely on that. Change-Id: I9e47e504fd241263987645abfed7ca13c0d54a80 BUG: 1228492 Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/11089 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* changetimerecorder : Porting to new logging frameworkMohamed Ashiq2015-06-104-125/+198
| | | | | | | | | Change-Id: I66e7ccc5e62482c3ecf0aab302568e6c9ecdc05d BUG: 1194640 Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> Reviewed-on: http://review.gluster.org/10938 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Joseph Fernandes
* Upcall/cache-invalidation: Ignore fops with frame->root->client not setSoumya Koduri2015-06-092-6/+23
| | | | | | | | | | | | | | | | | | | Server-side internally generated fops like 'quota/marker' will not have any client associated with the frame. Hence we need a check for clients to be valid before processing for upcall cache invalidation. Also fixed an issue with initializing reaper-thread. Added a testcase to test the fix. Change-Id: If7419b98aca383f4b80711c10fef2e0b32498c57 BUG: 1227204 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10909 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* gfid-access: Remove dead increment (dead store)Prasanna Kumar Kalever2015-06-071-3/+0
| | | | | | | | | | | | This patch remove stores to variables that are no longer live. Change-Id: Ib6acd8c70cbb7ea875c01b7cfd6620ac1d641d36 BUG: 1223378 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/10841 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* features/marker: Cleanup loc in case of errorsVijay Bellur2015-06-041-3/+1
| | | | | | | | | | | | | | | | | Missing loc_wipe() for error paths in mq_readdir_cbk() can cause memory leaks. loc_wipe() is now done for both happy and unhappy paths. Change-Id: I882aa5dcca06e25b56a828767fb2b91a1efaf83b BUG: 1227904 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/11074 Reviewed-by: Sachin Pandit <spandit@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* uss: Take ref on root inodeMohammed Rafi KC2015-06-041-2/+3
| | | | | | | | | | | | | | If we recieve a statfs call on snap directory, we will redirect the call into the root, by creating a new root loc. So it is better to take a ref on the root inode. (http://review.gluster.org/#/c/10358/5/xlators/features/ snapview-client/src/snapview-client.c) Change-Id: I5649addac442d391b2550346b115dec58fed5b86 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10750 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* features/quota: Fix ref-leakPranith Kumar K2015-06-031-0/+1
| | | | | | | | | | Change-Id: I0b44b70f07be441e044d9dfc5c2b64bd5b4cac18 BUG: 1207735 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11045 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Handle symlinks appropriately in fopsKrutika Dhananjay2015-06-031-5/+15
| | | | | | | | | | | | (f)stat, unlink and rename must skip doing inode_ctx_get() of shard block size on symbolic links. Change-Id: I68688532164dd2ab491ff5c59b343174f8c4ce7f BUG: 1223759 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10995 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Fix incorrect parameter to get_lowest_block()Krutika Dhananjay2015-06-021-2/+3
| | | | | | | | | | | | | | | | | | Due to get_lowest_block() being a macro, what needs to be passed to it is the evaluation of the expression (local->offset - 1), without which its substitution can cause junk values to be assigned to local->first_block. This patch also fixes calls to get_highest_block() where if offset and size are both equal to zero, it could return negative values. Change-Id: I3ae918a0a3251ffd9ce8d2294bc5f9b681447627 BUG: 1200082 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10804 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/quota : Do unwind if postbuf is NULLAnuradha2015-06-011-1/+1
| | | | | | | | | | | | | | | If postbuf in quota_writev_cbk is NULL directly an unwind should be done. Trying to dereference it will lead to a crash. Change-Id: Idba6ce3cd1bbf37ede96c7f17d01007d6c07057a BUG: 1221577 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10898 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/changelog: Remove inline keyword to avoid warnings (gcc v5.1.1)Anoop C S2015-06-015-16/+16
| | | | | | | | | | | | | | | | | | | | | | | When compiled with gcc5, following warnings were displayed and volume start failed: changelog-helpers.h:499:1: warning: inline function 'changelog_dispatch_event' declared but never defined changelog_dispatch_event (xlator_t *, changelog_priv_t *, changelog_event_t *); gf-changelog-journal-handler.c:692:17: warning: 'list_add_tail' is static but used in inline function 'gf_changelog_queue_journal' which is not static list_add_tail (&entry->list, &jnl_proc->entries); Fix is to remove the keyword from function prototype and definitions. Change-Id: I188b35b7ca087a94d7a48a052b05a6d845e3b74b BUG: 1226307 Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/11004 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* quota: retry connecting to quotad on ENOTCONN errorvmallika2015-05-313-25/+111
| | | | | | | | | | | | | | | | | | | | Suppose if there are two volumes vol1 and vol2, and quota is enabled and limit is set on vol1. Now if IO is happening on vol1 and quota is enabled/disabled on vol2, quotad gets restarted and client will receive ENOTCONN in the IO path of vol1. This patch will retry connecting to quotad upto 60sec in a interval of 5sec (12 retries) If not able to connect with 12 retries, then return ENOTCONN Change-Id: Ie7f5d108633ec68ba9cc3a6a61d79680485193e8 BUG: 1211220 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/10230 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/quota: prevent statfs frame-loss when an error happens duringvmallika2015-05-311-15/+14
| | | | | | | | | | | | | | | ancestry building. We do quota_build_ancestry in function 'quota_get_limit_dir', suppose if quota_build_ancestry fails, then we don't have a frame saved to continue the statfs FOP and client can hang. Change-Id: I92e25c1510d09444b9d4810afdb6b2a69dcd92c0 BUG: 1178619 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9380 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/shard: Fix issue with readdir(p) fopKrutika Dhananjay2015-05-312-43/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When readdir(p) is performed on '/' and ".shard" happens to be the last of the entries read in a given iteration of dht_readdir(p) (in other words the entry with the highest offset in the dirent list sorted in ascending order of d_offs), shard xlator would delete this entry as part of handling the call so as to avoid exposing its presence to the application. This would cause xlators above (like fuse, readdir-ahead etc) to wind the next readdirp as part of the same req at an offset which is (now) the highest d_off (post deletion of .shard) from the previously unwound list of entries. This offset would be less than that of ".shard" and therefore cause /.shard to be read once again. If by any chance this happens to be the only entry until end-of-directory, shard xlator would delete this entry and unwind with 0 entries, causing the xlator(s) above to think there is nothing more to readdir and the fop is complete. This would prevent DHT from gathering entries from the rest of its subvolumes, causing some entries to disappear. Fix: At the level of shard xlator, if ".shard" happens to be the last entry, make shard xlator wind another readdirp at offset equal to d_off of ".shard". That way, if ".shard" happens to be the only other entry under '/' until end-of-directory, DHT would receive an op_ret=0. This would enable it to wind readdir(p) on the rest of its subvols and gather the complete picture. Also, fixed a bug in shard_lookup_cbk() wherein file_size should be fetched unconditionally in cbk since it is set unconditionally in the wind path, failing which, lookup would be unwound with ia_size and ia_blocks only equal to that of the base file. Change-Id: I6c2bc770f1bcdad51c273c777ae0b42c88c53f61 BUG: 1222379 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10809 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/bit-rot-stub: implement mknod fopRaghavendra Bhat2015-05-311-0/+51
| | | | | | | | | | | | | | | | | | | With the absence of mknod() fop implementation in bitrot stub, further operations that trigger versioning resulted in crashes as they expect the inode context to be valid. Therefore, this patch implements mknod() following similar simantics to fops such as create(). Furthermore, bitrot stub test C program is fixed to stop lying and validate obj versions according to the versioning protocol. Change-Id: If76f252577445d1851d6c13c7e969e864e2183ef BUG: 1221914 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10790 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* featuress/changelog: On snapshot, notify irrespective of failuresKotresh HR2015-05-313-23/+33
| | | | | | | | | | | | | | | | | | | | During snapshot, changelog barrier is enabled and a explicit rollover of changelog is initiated. During rollover of changelog, if any error or changelog is empty, the notification was not sent to reconfigure and hence snapshot was failing because of timeout. This patch addresses it by sending notification irrespective of failures and sends error if any back to barrier. Change-Id: I898af624b44555281a9e43c69066077e0e121c17 BUG: 1225542 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10951 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Aravinda VK <avishwan@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bitrot: serialize versioningVenky Shankar2015-05-313-32/+174
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current signing interface (fsetxattr()) had couple of issues: One, a signing request (by bitrot daemon) is denied if the version against which an object is to be signed is unequal to the current version of the object (cases where another subsequent modification increments the version). Such request(s) are rejected with EINVAL sent back to the signer resulting in a bunch of errors (in logs) reported by bitrot daemon. Although, the object would be eventaully signed with the version matching the current version, the "lagging" request should be correctly handled. Two, more than one signing request could race against each other with the object getting signed with a version depending on which request ended up last in the race. Although harmless to some extent, such a case could end up marking the object's signature as stale for infinity (if the object is *never* touched) thereby resulting in scrubber skipping the object during verification. This patch fixes these issues by ordering signing request(s) and fixing version comparison checks at the time of signing. Change-Id: I9fa83dfa3be664ba4db61d7f2edc408f4bde77dd BUG: 1221938 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10832 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/qemu-block: Don't unref root inodePranith Kumar K2015-05-301-2/+1
| | | | | | | | | | | | Root inode doesn't participate in ref/unref. Don't do it in fini as by the time fini is called itable would be destroyed. BUG: 1226276 Change-Id: I704d0a3c0813cb8f6c3f1f7d613c89aca8f4f9ad Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11002 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* build: do not #include "config.h" in each fileNiels de Vos2015-05-2975-373/+0
| | | | | | | | | | | | | | | | | | Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* worm: Let lock, zero xattrop calls succeedPranith Kumar K2015-05-291-32/+42
| | | | | | | | | | | | | Locks can be taken just to inspect the data as well, so allow them. Xattrops are internal fops so we can allow them as well as longs as it doesn't change the xattr value, i.e. All-zero xattrop. Change-Id: Idc06d2043eb472c064db40d811a80058f0bda378 BUG: 1211123 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10727 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System
* features/bitrot: refactor brick connection logicRaghavendra Bhat2015-05-282-63/+68
| | | | | | | | | | | | | | | | | | | | | | Brick connection was bloated (and not implemented efficiently) with calls which were not required to be called under lock. This resulted in starvation of lock by critical code paths. This eventally did not scale when the number of bricks per volume increases (add-brick and the likes). Also, this patch cleans up some of the weird reconnection logic that added more to the starvation of resources and cleans up uncontrolled growing of log files. Change-Id: I05e737f2a9742944a4a543327d167de2489236a4 BUG: 1207134 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/10763 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: NetBSD Build System
* features/bitrot: reimplement scrubbing frequencyVenky Shankar2015-05-285-180/+302
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reimplments existing scrub-frequency mechanism used to schedule scrubber runs. Existing mechanism uses periodic sleeps (waking up periodically on minimum granularity) and performing a number of tracking checks based on counters and sleep times. This patch does away with all the nifty counters and uses timer-wheel to schedule scrub runs. Scheduling changes are peformed by merely calculating the new expiry time and calling mod_timer() [mod_timer_pending() in some cases] making the code more debuggable and easier to follow. This also introduces "hourly" scrubbing tunable as an aid for testing scrubbing during development/testing cycle. One could also implement on-demand scrubbing with ease: by invoking mod_timer() with an expiry of one (1) second, thereby scheduling a scrub run the very next second. Change-Id: I6c7c5f0c6c9f886bf574d88c04cde14b76e60a8b BUG: 1224596 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10893 Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: stub improvements and fixesVenky Shankar2015-05-285-426/+435
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the signing trigger mechanism used by bitrot daemon as a "catch up" meachanism to sign files which _missed_ signing on the last run either due to bitrot being disabled and enabled again or if bitrot is enabled for a volume with existing data. Existing implementation relies on overloading writev() to trigger signing which just by the looks sounded dangerous and I hated it to the core. This change moves all that business to the setxattr interface thereby keeping the writev path strictly for client IO. Why not use IPC fop to trigger signing? There's a need to access the object's inode to perform various maintainance operations. inode is not _directly_ accessible in the IPC fop (although, it can be found via inode_grep() for the object's GFID - the inode just needs to be pinned in memory, which is the case if there's an active fd on the inode). This patch relies on good old technique of overloading fsetxattr() to do the job instead of using IPC fop. There are some pretty nice cleanups along the lines of memory deallocations, unncessary allocations and redundant ref()ing of structures (such as fd's) provided by this patch. All in all - much improved code navigation. Change-Id: Id93fe90b1618802d1a95a5072517dac342b96cb8 BUG: 1224600 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10942 Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Skip block count and size update for directoriesKrutika Dhananjay2015-05-271-0/+2
| | | | | | | | | | Change-Id: Iaa7022c95a8d9c9c471db025ec644e0bcc4eeb29 BUG: 1221104 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10772 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* quota: fix for spurious failurevmallika2015-05-271-20/+31
| | | | | | | | | | | | | | | | | | | | | | During ancestry build, loc path was set to invalid path. path was set to one of its child instead of itself. Because of this quota accounting was going wrong This patch fix the issue Below mentioned tests removed from bad test list as part of patch# 10930 ./tests/basic/ec/quota.t ./tests/basic/quota-nfs.t ./tests/bugs/quota/bug-1035576.t Change-Id: Iaa65b2d968c04c9abcd476d0e9f588cb7fd39294 BUG: 1223798 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/10918 Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/locks: Handle virtual getxattrs in more fopsPranith Kumar K2015-05-272-279/+339
| | | | | | | | | | | | With this patch getxattr of inodelk/entrylk counts can be requested in readv/writev/create/unlink/opendir. Change-Id: If7430317ad478a3c753eb33bdf89046cb001a904 BUG: 1165041 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10880 Tested-by: NetBSD Build System Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* uss : implement statfs fop for snapdMohammed Rafi KC2015-05-102-0/+112
| | | | | | | | | | | | | | | snapview-client and snapview-server doesnot have statfs fop implemented Change-Id: I2cdd4c5784414b0549a01af9a28dbc723b7cdc67 BUG: 1176837 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10358 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: scrubber should crawl based on the scrubber frequency valueGaurav Kumar Garg2015-05-103-5/+192
| | | | | | | | | | | | | | | Currently scrubber is crawling all the files continuously. It should crawl files based on the scrubber frequency which user have set. By default scrubber crawling frequency value will be biweekly. Change-Id: I5762a92c1e700134cfe4283d1f631904adbfe31d BUG: 1208131 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10602 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* CTR/Libgfdb: Log typo fixJoseph Fernandes2015-05-101-1/+1
| | | | | | | | | | | | | | Log typo fix for CTR Xlator and Libgfdb Change-Id: I8d7208b9756199f8dc0a72771a3c953b4fa00f3f BUG: 1215896 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10668 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* geo-rep: rename handling in dht volume(changelog changes)Saravanakumar Arumugam2015-05-081-38/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background: Glusterfs changelogs are stored in each brick, which records the changes happened in that brick. Georep will run in all the nodes of master and processes changelogs "independently". Processing changelogs is in brick level, but all the fops will be replayed on "slave mount" point. Problem: With a DHT volume, in changelog "internal fops" are NOT recorded. For Rename case, Rename is recorded in "hashed" brick changelog. (DHT's internal fops like creating linkto file, unlink is NOT recorded). This lead us to inconsistent rename operations. For example, Distribute volume created with Two bricks B1, B2. //Consider master volume mounted @ /mnt/master and following operations executed: cd /mnt/master touch f1 // f1 falls on B1 Hash mv f1 f2 // f2 falls on B2 Hash // Here, Changelogs are recorded as below: @B1 CREATE f1 @B2 RENAME f1 f2 Here, race exists between Brick B1 and B2, say B2 will get executed first. Source file f1 itself is "NOT PRESENT", so it will go ahead and create f2 (Current implementation). We have this problem When rename falls in another brick and file is unlinked in Master. Similar kind of issue exists in following case too(multiple rename): CREATE f1 RENAME f1 f2 RENAME f2 f1 Solution: Instead of carrying out "changelogging" at "HASHED volume", carry out at the "CACHED volume". This way we have rename operations carried out where actual files are present. So,Changelog recorded as : @B1 CREATE f1 RENAME f1 f2 Note: This patch is dependent on dht changes from this patch. http://review.gluster.org/10410/ changelog related changes are separated out for review. In changelog, xdata passed from DHT is considered as : 1. In case of unlink (internal operation as part of rename), xdata value is set , it is considered as RENAME and recorded accordingly. 2. In case of rename (Hash and Cache different), xdata value is NOT set, recording rename operation is SKIPPED. BUG: 1141379 Change-Id: Ifca675e6d4ef8c4e3b7ef4a7ec85de8b3a38dc08 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/10220 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kotresh HR <khiremat@redhat.com>
* features/changelog: Fix buffer overflow in snprintfKotresh HR2015-05-081-8/+8
| | | | | | | | | | Change-Id: Ie7e7c6028c7bffe47e60a2e93827e0e8767a3d66 BUG: 1219894 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10687 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
* features/bit-rot-stub: versioning of objects in write/truncate fop instead ↵Raghavendra Bhat2015-05-086-339/+1101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of open * This patch brings in the changes where object versioning is done in write and truncate fops instead of tracking them in open and create fops. This model works for both regular and anonymous fds. It also removes the race associated with open calls, create and lookups. This patch follows the below method for object versioning and notifications: Before sending writev on the fd, increase the ongoing version first. This makes anonymous fd write similar to the regular fd write by having the ongoing version increased before doing the write. Do following steps to do versioning: 1) For anonymous fds set the fd context (so that release is invoked) and add the fd context to the list maintained in the inode context. For regular fds the above think would have been done in open itself. 2) Increase the on-disk ongoing version 3) Increase the in memory ongoing version and mark inode as non-dirty 3) Once versioning is successfully done send write operation. If versioning fails, then fail the write fop. 5) In writev_cbk mark inode as modified. Change-Id: I7104391bbe076d8fc49b68745d2ec29a6e92476c BUG: 1207979 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/10233 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: Fixing compilation issueAravinda VK2015-05-082-2/+5
| | | | | | | | | | | | This issue introduced due to manual rebase. Change-Id: I0589f4a0a1270190340f419b8022d6483bcf853d Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1219479 Reviewed-on: http://review.gluster.org/10685 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com>
* features/changelog: Avoid creation of empty changelogsSaravanakumar Arumugam2015-05-083-23/+154
| | | | | | | | | | | | | | | | | | An empty changelog when rolled over gets unlinked and indexed with a modified path-name in htime file. The modification is "changelog" not "CHANGELOG" in basename of the empty changelog file. Change-Id: I77fd0b48b5c33c245418f5ac7a9756f08ece24d9 BUG: 1208470 Signed-off-by: Ajeet Jha <ajha@redhat.com> Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/9572 Tested-by: NetBSD Build System Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Scrubber pause/resumeVenky Shankar2015-05-083-9/+58
| | | | | | | | | | | | | | | | | | | | | | With logical scan/scrub split, pausing filesystem scrubber is an override to the thread throttling mechanism, which effectively throttles "down" number of scrubber threads to zero. This causes scanner to wait until threads are spawned again (when resumed) thereby continuing where it left off (since the file tree walk stack is effectively preserved when the main scanner thread is waiting for scrubbers to consume scanned entries). The only catch is when scrubber daemon restarts: file tree walk stack is lost and scrubbing initiates from root. This is probably OK for now (can be changed later to persist parent directory information before entering pause state). Change-Id: I5109a749b7fccd0f5367765078f46e6522dd32a1 BUG: 1208131 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10521 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Throttle filesystem scrubberVenky Shankar2015-05-075-51/+712
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces multithreaded filesystem scrubber based on throttling option configured for a particular volume. The implementation "logically" breaks scanning and scrubbing with the number of scrubber threads auto-configured depending upon the throttle configuration. Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing. This mechanism helps us implement "pause/resume" with ease: all one need to do is to cleanup scrubber threads and let the main scanner thread "wait" untill scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the deamons, where crawl initiates from root directory again, but I guess that's OK). [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ] Subsequent patches would introduce CPU throttling during hash calculation for scrubber. Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10511 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Implement [f]truncate fopsKrutika Dhananjay2015-05-072-307/+802
| | | | | | | | | | | | | | To-Do: * Make ftruncate work even in the absence of path * Aggregate and update ia_blocks appropriately when a file is truncated to a lower size. Change-Id: Ifd24c2f5e80d2c3bc921261f5481251df8948126 BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10631 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glupy: fix tuntime search path and python module directory layoutEmmanuel Dreyfus2015-05-073-3/+11
| | | | | | | | | | | | | | | | | | | | 1) The glupy.so xlator should embed the runtime search path for the python libraries. Unfortunately, python-config does not gives the appprioate flags, therefore we need to also use pkg-config to obtain them 2) Fix the glupy python module directory layout so that python can import the module without problem That two fixes seems to let glupy.t pass on NetBSD again. BUG: 1129939 Change-Id: I397aa726ab8bf7d91fa0d6d870a30910a5f4a5d9 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10616 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* features/bit-rot: Token Bucket based throttlingVenky Shankar2015-05-076-9/+432
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BitRot daemons (signer & scrubber) are disk/cpu hoggers when left running full throttle. Checksum calculations (especially SHA family of hash routines) can be quite CPU intensive. Moreover periodic disk scans performed by scrubber followed by reading data blocks for hash calculation (which is also done by signer) generate lot of heavy IO request(s). This causes interference with actual client operations (be it a regular client or filesystems daemons such as self-heal, etc..) and results in degraded system performance. This patch introduces throttling based on Token Bucket Filtering[1]. It's a well known algorithm for checking (and ensuring) that data transmission conform to defined limits and generally used in packet switched networks. Linux control groups (Cgroups) uses a variant[2] of this algorithm to provide block device IO throttling (cgroup subsys "blkio": blk-iothrottle). So, why not just live with Cgroups? Cgroups is linux specific. We need to have a throttling mechanism for other supported UNIXes. Moreover, having our own implementation gives much more finer control in terms of tuning it for our needs (plus the simplicity of the alogorithm itself). Ideally, throttling should be a part of server stack (either as a separate translator or integrated with io-threads) since that's the point of entry for IO request(s) from *all* client(s). That way one could selectively throttle IO request(s) based on client PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc.. (*actual* clients can run full throttle). This implementation avoids that deliberately (there needs to be a much more smarter queueing mechanism) and throttles CPU usage for hash calculations. This patch is just the infrastructure part with no interfaces exposed to set various throttling values. The tunable selected here (basically hardcoded) avoids 100% CPU usage during hash calculation (with some bursts cycles). We'd need much more intensive test(s) to assign values for various throttling options (lazy/normal/aggressive). [1] https://en.wikipedia.org/wiki/Token_bucket [2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8 BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10307 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: Version support for Changelog ParserAravinda VK2015-05-062-11/+70
| | | | | | | | | | | | | | | | | | As optional feature, during unlink, full path will be recorded. Changelog Version number to be bumped up to 1.2. With this patch, parser checks the version number before parsing and handles accordingly. Change-Id: Ic1ad98259c39e417029a08e26a1d4b467817e65a BUG: 1214561 Signed-off-by: Aravinda VK <avishwan@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10166 Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* quota/marker: turn off inode quotas by defaultvmallika2015-05-063-8/+63
| | | | | | | | | | | | | | | | | | | | | inode quota is a new feature implemented in glusterfs-3.7 if quota is enabled in the older version and is upgraded to a new version, we can hit setxattr spike during self-heal of inode quotas. So, when a quota is enabled, turn off inode-quotas with a xlator option. With this patch, we still account for inode quotas but only when a write operation is performed for a particular file. User will be able to query inode quotas once the Inode-quota xlator option is enabled. Change-Id: I52fb28bf7024989ce7bb08ac63a303bf3ec1ec9a BUG: 1209430 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/10152 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ctr/xlator: Named lookup heal of pre-existing files, before ctr was ON.Joseph Fernandes2015-05-066-4/+945
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The CTR xlator records file meta (heat/hardlinks) into the data. This works fine for files which are created after ctr xlator is switched ON. But for files which were created before CTR xlator is ON, CTR xlator is not able to record either of the meta i.e heat or hardlinks. Thus making those files immune to promotions/demotions. Solution: The solution that is implemented in this patch is do ctr-db heal of all those pre-existent files, using named lookup. For this purpose we use the inode-xlator context variable option in gluster. The inode-xlator context variable for ctr xlator will have the following, a. A Lock for the context variable b. A hardlink list: This list represents the successful looked up hardlinks. These are the scenarios when the hardlink list is updated: 1) Named-Lookup: Whenever a named lookup happens on a file, in the wind path we copy all required hardlink and inode information to ctr_db_record structure, which resides in the frame->local variable. We dont update the database in wind. During the unwind, we read the information from the ctr_db_record and , Check if the inode context variable is created, if not we create it. Check if the hard link is there in the hardlink list. If its not there we add it to the list and send a update to the database using libgfdb. Please note: The database transaction can fail(and we ignore) as there already might be a record in the db. This update to the db is to heal if its not there. If its there in the list we ignore it. 2) Inode Forget: Whenever an inode forget hits we clear the hardlink list in the inode context variable and delete the inode context variable. Please note: An inode forget may happen for two reason, a. when the inode is delete. b. the in-memory inode is evicted from the inode table due to cache limits. 3) create: whenever a create happens we create the inode context variable and add the hardlink. The database updation is done as usual by ctr. 4) link: whenever a hardlink is created for the inode, we create the inode context variable, if not present, and add the hardlink to the list. 5) unlink: whenever a unlink happens we delete the hardlink from the list. 6) mknod: same as create. 7) rename: whenever a rename happens we update the hardlink in list. if the hardlink was not present for updation, we add the hardlink to the list. What is pending: 1) This solution will only work for named lookups. 2) We dont track afr-self-heal/dht-rebalancer traffic for healing. Change-Id: Ia4bbaf84128ad6ce8c3ddd70bcfa82894c79585f BUG: 1212037 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10370 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>