summaryrefslogtreecommitdiffstats
path: root/xlators/features
Commit message (Collapse)AuthorAgeFilesLines
* quota: marker accounting goes bad with rename while writing a filevmallika2015-07-021-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/11403/ > With below test-case, marker accounting becomes bad: > 1) Create a volume with 1 brick > 2) fuse mount > 3) on one terminal write some data > dd if=/dev/zero of=f1 bs=1M count=500 oflag=sync > 4) on another terminal execute below rename operation while the write is > still in progress > for i in {1..50}; do > ii=`expr $i + 1`; > mv f$i f$ii; > done > > remove-xattr is already on while doing rename operation, > we should not be doing again in background when reducing the > parent size. > > Change-Id: I969a64bb559e2341315928b55b99203e9ddee3f2 > BUG: 1235195 > Signed-off-by: vmallika <vmallika@redhat.com> > Reviewed-on: http://review.gluster.org/11403 > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Tested-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ic37c7f7bd74093ee7e155b305834dbc1fdd24b10 BUG: 1235990 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11425 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* bit-rot : New logging framework for bit-rot log messageMohamed Ashiq2015-07-018-201/+815
| | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10297 Cherry picked from 2f0d36d16c241365760aaa6d857b7a4d438e1042 >Change-Id: I83c494f2bb60d29495cd643659774d430325af0a >BUG: 1194640 >Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> >Reviewed-on: http://review.gluster.org/10297 >Tested-by: Venky Shankar <vshankar@redhat.com> >Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> >Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com> >Tested-by: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Venky Shankar <vshankar@redhat.com> Change-Id: I83c494f2bb60d29495cd643659774d430325af0a BUG: 1217722 Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> Reviewed-on: http://review.gluster.org/11379 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* stack: use list_head for managing framesKrishnan Parthasarathi2015-07-011-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM -------- statedump requests that traverse call frames of all call stacks in execution may race with a STACK_RESET on a stack. This could crash the corresponding glusterfs process. For e.g, recently we observed this in a regression test case tests/basic/afr/sparse-self-heal.t. FIX --- gf_proc_dump_pending_frames takes a (TRY_LOCK) call_pool->lock before iterating through call frames of all call stacks in progress. With this fix, STACK_RESET removes its call frames under the same lock. Additional info ---------------- This fix makes call_stack_t to use struct list_head in place of custom doubly-linked list implementation. This makes call_frame_t manipulation easier to maintain in the context of STACK_WIND et al. BUG: 1234408 Change-Id: I7e43bccd3994cd9184ab982dba3dbc10618f0d94 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/11095 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> (cherry picked from commit 79e4c7b2fad6db15863efb4e979525b1bd4862ea) Reviewed-on: http://review.gluster.org/11352
* features/marker: Cleanup loc in case of errorsVijay Bellur2015-06-291-3/+1
| | | | | | | | | | | | | | | | Backport of http://review.gluster.org/11074 Missing loc_wipe() for error paths in mq_readdir_cbk() can cause memory leaks. loc_wipe() is now done for both happy and unhappy paths. Change-Id: I882aa5dcca06e25b56a828767fb2b91a1efaf83b BUG: 1228535 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/11098 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* features/changelog: Always log directory rename operationsVijay Bellur2015-06-281-1/+3
| | | | | | | | | | | | | | | Directory renames are being ignored as special renames. Special renames can happen only on files. Hence always log directory rename operations in changelog. Change-Id: I4fbdb3e02e634a39a8846fb2f7a4c6cc2ba74400 BUG: 1235242 Reviewed-On: http://review.gluster.org/11356 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/11378 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* tier/ctr: Ignore creation of T file and Ctr Lookup heal improvememntsDan Lambright2015-06-275-22/+256
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a back port of 11334 1) Ignore creation of T file in ctr_mknod 2) Ignore lookup for T file in ctr_lookup 3) Ctr_lookup: a. If the gfid and pgfid in empty dont record b. Decreased log level for multiple heal attempts c. Inode/File heal happens after an expiry period, which is configurable. d. Hardlink heal happens after an expiry period, which is configurable. > Change-Id: Id8eb5092e78beaec22d05f5283645081619e2452 > BUG: 1235269 > Signed-off-by: Joseph Fernandes <josferna@redhat.com> > Reviewed-on: http://review.gluster.org/11334 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Dan Lambright <dlambrig@redhat.com> > Tested-by: Dan Lambright <dlambrig@redhat.com> Change-Id: Ia28a5cf975e41d318906f707deca447aaa35630f BUG: 1236288 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11446 Reviewed-by: Joseph Fernandes Tested-by: Joseph Fernandes Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Upcall: Fix an issue with invalidating parent entriesSoumya Koduri2015-06-271-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | Any directory or file creation should result in cache-invalidation requests sent to parent directory. However that is not the case currently due to a bug while processing these requests in the upcall xlator. We need to do invalidation checks on parent inode. Fixed the same. Also fixed an issue with null client entries while sending upcall notifications. This is backport of the below fix - http://review.gluster.org/#/c/11387/ Change-Id: I3da7c79091291ba36fd8f8ebcfebcd77a192f250 BUG: 1236274 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/11387 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Meghana M <mmadhusu@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/11440 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* quota/marker: fix mem-leak in markervmallika2015-06-264-83/+157
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/11361/ > When removing contribution xattr, we also need to free > contribution node in memory > Use ref/unref mechanism to handle contribution node memory > > local->xdata should be freed in mq_local_unref > > There is another huge memory consumption happens > in function mq_inspect_directory_xattr_task > where dirty flag is not set > > Change-Id: Ieca3ab4bf410c51259560e778bce4e81b9d888bf > BUG: 1207735 > Signed-off-by: vmallika <vmallika@redhat.com> > Reviewed-on: http://review.gluster.org/11361 > Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> > Tested-by: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Tested-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I3038b41307f30867fa728054469ba917fd625e95 BUG: 1229282 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11401 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/bitrot: log scrub frequency & throttle valuesVenky Shankar2015-06-261-0/+28
| | | | | | | | | | | | Backport of http://review.gluster.org/11190 Change-Id: I56d5236c37a413046b5766320184047a908f2c8d BUG: 1231024 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/11397 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: check for both inmemory and ondisk stalenessRaghavendra Bhat2015-06-262-19/+138
| | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10947 * Let bit-rot stub check both on disk ongoing version, signed version xattrs and the in memory flags in the inode and then decide whether the inode is stale or not. This information is used by one shot crawler in BitD to decide whether to trigger the sign for the object or skip it. NOTE: The above check should be done only for BitD. For scrubber its still the old way of comparing on disk ongoing version with signed version. * BitD's one shot crawler should not sign zero byte objects if they do not contain signature. (Means the object was just created and nothing was written to it). Change-Id: I580b45b85f62fc075616ee3da9c15a3c8335d7a8 BUG: 1232199 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11249 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/quota: port QUOTA messages to new logging frameworkSusant Palai2015-06-236-323/+594
| | | | | | | | | | | | | Change-Id: I2fa725323ee9a9959fd105ea014c90e2fad11c14 BUG: 1234297 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/7574 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/11341
* quota: allow writes when with ENOENT/ESTALE on active fdvmallika2015-06-221-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/11307/ > We may get ENOENT/ESTALE in case of below scenario > fd = open file.txt > unlink file.txt > write on fd > Here build_ancestry can fail as the file is removed. > For now ignore ENOENT/ESTALE on active fd with > writev and fallocate. > We need to re-visit this code once we understand > how other file-system behave in this scenario > > Below patch fixes the issue in DHT: > http://review.gluster.org/#/c/11097 > > Change-Id: I7be683583b808c280e3ea2ddd036c1558a6d53e5 > BUG: 1188242 > Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: Ic836d200689fe6f27d4675bc0ff89063b7dc3882 BUG: 1219358 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11326 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Skip shards resolution if lookup on /.shard returns ENOENTKrutika Dhananjay2015-06-221-22/+118
| | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/11065 This change is done in [f]truncate, rename, unlink and readv. Also, this patch also makes lookup in shard delete GF_CONTENT_KEY as a workaround for the problems with read caching of sparse files by quick-read. A proper solution would involve shard_lookup_cbk() performing a readv, aggregating and ordering the responses and setting it in the xdata before unwinding the response to upper translators, which will be done in a separate patch. Change-Id: I31e5cec8815db0269e664c17ce3e221c55c8863f BUG: 1227572 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11332 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/bitrot: fix fd leak in truncate (stub)Venky Shankar2015-06-191-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/11077 The need to perform object versioning in the truncate() code path required an fd to reuse existing versioning infrastructure that's used by fd based operations (such as writev(), ftruncate(), etc..). This tempted the use of anonymous fd which was never ever unref()'d after use resulting in fd and/or memory leak depending on the code path taken. Versioning resulted in a dangling file descriptor left open in the filesystem effecting the signing process of a given object (no release() would be trigerred, hence no signing would be performed). On the other hand, cases where the object need not be versioned, the anonymous fd in still ref()'d resulting in memory leak (NOTE: there's no "dangling" file descriptor in this case). Change-Id: I29c3d2af9bbc5cd4b8ddf38954080e3c7a44ba61 BUG: 1232179 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/11300 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* libgfchangelog: Fix crash in gf_changelog_processKotresh HR2015-06-192-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Crash observed in gf_changelog_process and gf_changelog_callback_invoker. Cause: Assignments to arguments passed to thread is done post thread creation. If the thread created gets scheduled before the assignment and access these variables, it would crash with segmentation fault. Solution: Assignments to arguments are done prior to the thread creation. BUG: 1233044 Change-Id: I520599ab43026d25f4064ce71bd5a8b8e0d4b90a Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11273 Reviewed-on: http://review.gluster.org/11308 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* quota: fix double accounting with rename operationvmallika2015-06-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/11264/ > When a rename operation is performed, we are renaming > the file first and performing remove-xattr when reducing > the contri size from parents. > This remove-xattr fails as the file is alreday renamed, > this failure causes reduce-parent-size to abort resulting > in double quota accounting > > This patch fixes the problem. We don't need to perform remove-xattr > operation on a file when performing reduce-parent-size txn as this > will be alreday done before starting reduce-parent-size txn > > Change-Id: If86e3dbb0233f6deaaa90bee72cb0ec1689c7325 > BUG: 1232572 > Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: I7b12962d731ce9acf3ac78a99b2c23491722b78a BUG: 1233117 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11312 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* features/bitrot: tuanble object signing waiting time value for bitrotGaurav Kumar Garg2015-06-172-8/+31
| | | | | | | | | | | | | | | | | Currently bitrot using 120 second waiting time for object to be signed after all fop's released. This signing waiting time value should be tunable. Command for changing the signing waiting time will be #gluster volume bitrot <VOLNAME> signing-time <waiting time value in second> Change-Id: I89f3121564c1bbd0825f60aae6147413a2fbd798 BUG: 1231832 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/11105 (cherry picked from commit 554fa0c1315d0b4b78ba35a2d332d7ac0fd07d48) Reviewed-on: http://review.gluster.org/11235 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* quota: don't log error when disk quota exceededvmallika2015-06-171-4/+10
| | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/11135 > When disk quota exceeded, quota enforcer logs > alert message, so no need to log error message > as this can fill up the log file > > Change-Id: Ia913f47bc0cedb7c0a9c611330ee5124d3bb6c9d > BUG: 1229609 > Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: I7b0b91965d8eb1b38ef6fa5ce0276a6b6da3372d BUG: 1232135 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11242 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* changetimerecorder : port log messages to a new frameworkMohamed Ashiq2015-06-174-125/+198
| | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/10938/ Cherry picked from 00f9a61fe8884062c141edd662424625d349a377 >Change-Id: I66e7ccc5e62482c3ecf0aab302568e6c9ecdc05d >BUG: 1194640 >Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> >Reviewed-on: http://review.gluster.org/10938 >Tested-by: Gluster Build System <jenkins@build.gluster.com> >Reviewed-by: Joseph Fernandes >Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> Change-Id: I66e7ccc5e62482c3ecf0aab302568e6c9ecdc05d BUG: 1217722 Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> Reviewed-on: http://review.gluster.org/11195 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Joseph Fernandes Tested-by: Joseph Fernandes
* upcall: prevent busy loop in reaper threadNiels de Vos2015-06-151-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | http://review.gluster.org/10342 introduced a cleanup thread for expired client entries. When enabling the 'features.cache-invalidation' volume option, the brick process starts to run in a busy-loop. Obviously this is not intentional, and a process occupying 100% of the cycles on a CPU or core is not wanted. Cherry picked from commit a367d4c6965e1f0da36f17ab6c5fdbd37925ebdd)\: > Change-Id: I453c612d72001f4d8bbecdd5ac07aaed75b43914 > BUG: 1200267 > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/11198 > Reviewed-by: soumya k <skoduri@redhat.com> > Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> > Tested-by: Gluster Build System <jenkins@build.gluster.com> Change-Id: I453c612d72001f4d8bbecdd5ac07aaed75b43914 BUG: 1231516 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11211 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com>
* features/changelog: Do htime setxattr without XATTR_REPLACE flagKotresh HR2015-06-122-10/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | HTIME_KEY marks the last changelog rolled over. The xattr is maintained on .glusterfs/changelog/htime/HTIME.TSTAMP file. On every rollover of the changelog file, the xattr is updated. It is being updated with XATTR_REPLACE flag as xattr gets created during changelog enable. But it is once found that the xattrs on the file is cleared and is not reproduced later on. This patch protects that case, if it happens by setting xattr without XATTR_REPLACE flag in failure case. The reason behind doing this in failure case is not to mask the actual cause of xattrs getting cleared. This provides the log message if the original issue still exists but the consequential effects are fixed. Also changed the log messages to depict the events happened during changelog enable. Change-Id: I699ed09a03667fd823d01d65c9c360fa7bc0e455 BUG: 1230694 Reviewed-On: http://review.gluster.org/#/c/11150/ Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/11181 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/qemu-block: Don't unref root inodePranith Kumar K2015-06-121-2/+2
| | | | | | | | | | | | | | | | Backport of http://review.gluster.org/11002 http://review.gluster.org/11076 Root inode doesn't participate in ref/unref. Don't do it in fini as by the time fini is called itable would be destroyed. BUG: 1226272 Change-Id: I776d8e9359c8b51763d9d2588a80311434a0ede3 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11047 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com>
* features/changelog: Avoid setattr fop logging during renameSaravanakumar Arumugam2015-06-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When a file is renamed and the (renamed)file's Hashing falls into a different brick, DHT creates a special file(linkto file) in the brick(Hashed subvolume) and carries out setattr operation on that file. Currently, Changelog records this(setattr) operation in Hashed subvolume. glusterfind in turn records this operation as MODIFY operation. So, there is a NEW entry in Cached subvolume and MODIFY entry in Hashed subvolume for the same file. Solution: Avoid logging setattr operation carried out, by marking the operation as internal fop using xdata. In changelog translator, check whether setattr is set as internal fop and skip accordingly. Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd BUG: 1230687 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/11183 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/marker : Pass along xdata to lower translatorSachin Pandit2015-06-111-1/+3
| | | | | | | | | | | | | | | | | | The problem was in marker xlator, where during rename a NULL value is passed during STACK_WIND. Marker needs to pass the xdata un-modified to next translator if marker does not rely on that. Change-Id: I9e47e504fd241263987645abfed7ca13c0d54a80 BUG: 1230693 Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/11089 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/11182 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/quota: Fix ref-leakPranith Kumar K2015-06-111-0/+1
| | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/11045 > Change-Id: I0b44b70f07be441e044d9dfc5c2b64bd5b4cac18 > BUG: 1207735 > Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> > Reviewed-on: http://review.gluster.org/11045 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Tested-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: Id740d74fb5cf7a9b23027dbbb0a9f42616dcf2fc BUG: 1229282 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11124 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* Upcall/cache-invalidation: Ignore fops with frame->root->client not setSoumya Koduri2015-06-092-6/+23
| | | | | | | | | | | | | | | | | | | Server-side internally generated fops like 'quota/marker' will not have any client associated with the frame. Hence we need a check for clients to be valid before processing for upcall cache invalidation. Also fixed an issue with initializing reaper-thread. Added a testcase to test the fix. Change-Id: If7419b98aca383f4b80711c10fef2e0b32498c57 BUG: 1221941 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/10909 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11141 Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/shard: Handle symlinks appropriately in fopsKrutika Dhananjay2015-06-061-5/+15
| | | | | | | | | | | | | | | Backport of: http://review.gluster.org/10995 (f)stat, unlink and rename must skip doing inode_ctx_get() of shard block size on symbolic links. Change-Id: Iaf2502512a5838db137e5e1f0c14b12f5058865f BUG: 1227572 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11066 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* worm: Let lock, zero xattrop calls succeedPranith Kumar K2015-06-051-32/+42
| | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10727 Locks can be taken just to inspect the data as well, so allow them. Xattrops are internal fops so we can allow them as well as longs as it doesn't change the xattr value, i.e. All-zero xattrop. BUG: 1225320 Change-Id: I9e72806e0605ab2938348a87935966909f1a721f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11046 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* quota: retry connecting to quotad on ENOTCONN errorvmallika2015-06-043-25/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/10230/ > Suppose if there are two volumes vol1 and vol2, > and quota is enabled and limit is set on vol1. > Now if IO is happening on vol1 and quota is enabled/disabled > on vol2, quotad gets restarted and client will receive > ENOTCONN in the IO path of vol1. > > This patch will retry connecting to quotad upto 60sec > in a interval of 5sec (12 retries) > If not able to connect with 12 retries, then return ENOTCONN > > Change-Id: Ie7f5d108633ec68ba9cc3a6a61d79680485193e8 > BUG: 1211220 > Signed-off-by: vmallika <vmallika@redhat.com> > Reviewed-on: http://review.gluster.org/10230 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Tested-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I94d8d4a814a73d69e934f3e77e989e5f3bf2e65a BUG: 1226789 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11024 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/quota: prevent statfs frame loss when an error happens during ancestryvmallika2015-06-041-15/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | building This is a backport of http://review.gluster.org/#/c/9380/ > We do quota_build_ancestry in function 'quota_get_limit_dir', > suppose if quota_build_ancestry fails, then we don't have a > frame saved to continue the statfs FOP and client can hang. > > Change-Id: I92e25c1510d09444b9d4810afdb6b2a69dcd92c0 > BUG: 1178619 > Signed-off-by: Raghavendra G <rgowdapp@redhat.com> > Signed-off-by: vmallika <vmallika@redhat.com> > Reviewed-on: http://review.gluster.org/9380 > Tested-by: Gluster Build System <jenkins@build.gluster.com> Change-Id: I998f417d62d6ea4f57f237f547243d44c2da438c BUG: 1226792 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/11025 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Fix incorrect parameter to get_lowest_block()Krutika Dhananjay2015-06-031-2/+3
| | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/10804 Due to get_lowest_block() being a macro, what needs to be passed to it is the evaluation of the expression (local->offset - 1), without which its substitution can cause junk values to be assigned to local->first_block. This patch also fixes calls to get_highest_block() where if offset and size are both equal to zero, it could return negative values. Change-Id: I8f1bc54b536587d6af3a5c193434d06dccbf76dc BUG: 1227572 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11051 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/quota : Do unwind if postbuf is NULLAnuradha2015-06-031-1/+1
| | | | | | | | | | | | | | | | If postbuf in quota_writev_cbk is NULL directly an unwind should be done. Trying to dereference it will lead to a crash. Change-Id: Idba6ce3cd1bbf37ede96c7f17d01007d6c07057a BUG: 1227235 Reviewed-on: http://review.gluster.org/10898 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/11040
* features/shard: Fix issue with readdir(p) fopKrutika Dhananjay2015-06-022-43/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/10809 Problem: When readdir(p) is performed on '/' and ".shard" happens to be the last of the entries read in a given iteration of dht_readdir(p) (in other words the entry with the highest offset in the dirent list sorted in ascending order of d_offs), shard xlator would delete this entry as part of handling the call so as to avoid exposing its presence to the application. This would cause xlators above (like fuse, readdir-ahead etc) to wind the next readdirp as part of the same req at an offset which is (now) the highest d_off (post deletion of .shard) from the previously unwound list of entries. This offset would be less than that of ".shard" and therefore cause /.shard to be read once again. If by any chance this happens to be the only entry until end-of-directory, shard xlator would delete this entry and unwind with 0 entries, causing the xlator(s) above to think there is nothing more to readdir and the fop is complete. This would prevent DHT from gathering entries from the rest of its subvolumes, causing some entries to disappear. Fix: At the level of shard xlator, if ".shard" happens to be the last entry, make shard xlator wind another readdirp at offset equal to d_off of ".shard". That way, if ".shard" happens to be the only other entry under '/' until end-of-directory, DHT would receive an op_ret=0. This would enable it to wind readdir(p) on the rest of its subvols and gather the complete picture. Also, fixed a bug in shard_lookup_cbk() wherein file_size should be fetched unconditionally in cbk since it is set unconditionally in the wind path, failing which, lookup would be unwound with ia_size and ia_blocks only equal to that of the base file. Change-Id: I0ff0b48b6c9c12edbef947b6840a77a54c131650 BUG: 1226880 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11031 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/changelog: Remove inline keyword to avoid warnings (gcc v5.1.1)v3.7.1Anoop C S2015-06-015-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/11004 When compiled with gcc5, following warnings were displayed and volume start failed: changelog-helpers.h:499:1: warning: inline function 'changelog_dispatch_event' declared but never defined changelog_dispatch_event (xlator_t *, changelog_priv_t *, changelog_event_t *); gf-changelog-journal-handler.c:692:17: warning: 'list_add_tail' is static but used in inline function 'gf_changelog_queue_journal' which is not static list_add_tail (&entry->list, &jnl_proc->entries); Fix is to remove the keyword from function prototype and definitions. Change-Id: I188b35b7ca087a94d7a48a052b05a6d845e3b74b BUG: 1226853 Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/11030 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* featuress/changelog: On snapshot, notify irrespective of failuresKotresh HR2015-05-313-23/+33
| | | | | | | | | | | | | | | | | | | | During snapshot, changelog barrier is enabled and a explicit rollover of changelog is initiated. During rollover of changelog, if any error or changelog is empty, the notification was not sent to reconfigure and hence snapshot was failing because of timeout. This patch addresses it by sending notification irrespective of failures and sends error if any back to barrier. BUG: 1225543 Change-Id: I971ef3bdc63bb50bda0b655e55cd814e44254ba9 Reviewed-On: http://review.gluster.org/10951 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10988 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bit-rot-stub: implement mknod fopRaghavendra Bhat2015-05-311-0/+51
| | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10790 With the absence of mknod() fop implementation in bitrot stub, further operations that trigger versioning resulted in crashes as they expect the inode context to be valid. Therefore, this patch implements mknod() following similar simantics to fops such as create(). Furthermore, bitrot stub test C program is fixed to stop lying and validate obj versions according to the versioning protocol. Change-Id: If76f252577445d1851d6c13c7e969e864e2183ef BUG: 1226139 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10987 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bitrot: serialize versioningVenky Shankar2015-05-313-32/+174
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10832 Current signing interface (fsetxattr()) had couple of issues: One, a signing request (by bitrot daemon) is denied if the version against which an object is to be signed is unequal to the current version of the object (cases where another subsequent modification increments the version). Such request(s) are rejected with EINVAL sent back to the signer resulting in a bunch of errors (in logs) reported by bitrot daemon. Although, the object would be eventaully signed with the version matching the current version, the "lagging" request should be correctly handled. Two, more than one signing request could race against each other with the object getting signed with a version depending on which request ended up last in the race. Although harmless to some extent, such a case could end up marking the object's signature as stale for infinity (if the object is *never* touched) thereby resulting in scrubber skipping the object during verification. This patch fixes these issues by ordering signing request(s) and fixing version comparison checks at the time of signing. Change-Id: I9fa83dfa3be664ba4db61d7f2edc408f4bde77dd BUG: 1224650 Signed-off-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/10900 Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/bitrot: refactor brick connection logicRaghavendra Bhat2015-05-302-63/+68
| | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/10763 Brick connection was bloated (and not implemented efficiently) with calls which were not required to be called under lock. This resulted in starvation of lock by critical code paths. This eventally did not scale when the number of bricks per volume increases (add-brick and the likes). Also, this patch cleans up some of the weird reconnection logic that added more to the starvation of resources and cleans up uncontrolled growing of log files. Change-Id: I05e737f2a9742944a4a543327d167de2489236a4 BUG: 1226146 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10986 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/bitrot: reimplement scrubbing frequencyVenky Shankar2015-05-305-180/+302
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch reimplments existing scrub-frequency mechanism used to schedule scrubber runs. Existing mechanism uses periodic sleeps (waking up periodically on minimum granularity) and performing a number of tracking checks based on counters and sleep times. This patch does away with all the nifty counters and uses timer-wheel to schedule scrub runs. Scheduling changes are peformed by merely calculating the new expiry time and calling mod_timer() [mod_timer_pending() in some cases] making the code more debuggable and easier to follow. This also introduces "hourly" scrubbing tunable as an aid for testing scrubbing during development/testing cycle. One could also implement on-demand scrubbing with ease: by invoking mod_timer() with an expiry of one (1) second, thereby scheduling a scrub run the very next second. Change-Id: I6c7c5f0c6c9f886bf574d88c04cde14b76e60a8b BUG: 1224647 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10902 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/bitrot: stub improvements and fixesVenky Shankar2015-05-305-426/+435
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the signing trigger mechanism used by bitrot daemon as a "catch up" meachanism to sign files which _missed_ signing on the last run either due to bitrot being disabled and enabled again or if bitrot is enabled for a volume with existing data. Existing implementation relies on overloading writev() to trigger signing which just by the looks sounded dangerous and I hated it to the core. This change moves all that business to the setxattr interface thereby keeping the writev path strictly for client IO. Why not use IPC fop to trigger signing? There's a need to access the object's inode to perform various maintainance operations. inode is not _directly_ accessible in the IPC fop (although, it can be found via inode_grep() for the object's GFID - the inode just needs to be pinned in memory, which is the case if there's an active fd on the inode). This patch relies on good old technique of overloading fsetxattr() to do the job instead of using IPC fop. There are some pretty nice cleanups along the lines of memory deallocations, unncessary allocations and redundant ref()ing of structures (such as fd's) provided by this patch. All in all - much improved code navigation. Change-Id: Id93fe90b1618802d1a95a5072517dac342b96cb8 BUG: 1225709 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10953 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* quota: fix for spurious failurevmallika2015-05-281-20/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of http://review.gluster.org/#/c/10918/ > During ancestry build, loc path was set to invalid > path. path was set to one of its child instead > of itself. Because of this quota accounting was > going wrong > > This patch fix the issue > > Below mentioned tests removed from bad test list > as part of patch# 10930 > ./tests/basic/ec/quota.t > ./tests/basic/quota-nfs.t > ./tests/bugs/quota/bug-1035576.t > > Change-Id: Iaa65b2d968c04c9abcd476d0e9f588cb7fd39294 > BUG: 1223798 > Signed-off-by: vmallika <vmallika@redhat.com> > Reviewed-on: http://review.gluster.org/10918 > Tested-by: NetBSD Build System > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I2986d18c111790bef8f2a7a1df5e43f46755e85e BUG: 1224894 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/10958 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/shard: Skip block count and size update for directoriesKrutika Dhananjay2015-05-281-0/+2
| | | | | | | | | | | Backport of: http://review.gluster.org/10772 Change-Id: I3594641ef0bf6a17e1ceab3c9ad87ef18b981d2e BUG: 1225922 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10972 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/locks: Handle virtual getxattrs in more fopsPranith Kumar K2015-05-282-279/+339
| | | | | | | | | | | | | | Backport of http://review.gluster.com/10880 With this patch getxattr of inodelk/entrylk counts can be requested in readv/writev/create/unlink/opendir. BUG: 1225279 Change-Id: Ifb9378ce650377e67a8601147eac95cfbdf0abf0 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10925 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
* uss : implement statfs fop for snapdMohammed Rafi KC2015-05-102-0/+112
| | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/10358/ snapview-client and snapview-server doesnot have statfs fop implemented Change-Id: I2cdd4c5784414b0549a01af9a28dbc723b7cdc67 BUG: 1218741 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10740 Tested-by: NetBSD Build System Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bit-rot-stub: versioning of objects in write/truncate fop instead ↵Raghavendra Bhat2015-05-106-339/+1101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of open * This patch brings in the changes where object versioning is done in write and truncate fops instead of tracking them in open and create fops. This model works for both regular and anonymous fds. It also removes the race associated with open calls, create and lookups. This patch follows the below method for object versioning and notifications: Before sending writev on the fd, increase the ongoing version first. This makes anonymous fd write similar to the regular fd write by having the ongoing version increased before doing the write. Do following steps to do versioning: 1) For anonymous fds set the fd context (so that release is invoked) and add the fd context to the list maintained in the inode context. For regular fds the above think would have been done in open itself. 2) Increase the on-disk ongoing version 3) Increase the in memory ongoing version and mark inode as non-dirty 3) Once versioning is successfully done send write operation. If versioning fails, then fail the write fop. 5) In writev_cbk mark inode as modified. > Change-Id: I7104391bbe076d8fc49b68745d2ec29a6e92476c > BUG: 1207979 > Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> > Reviewed-on: http://review.gluster.org/10233 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I4bb86989b5fab02b9ed2950798b1a80e566f1024 BUG: 1220041 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/10722 Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bitrot: scrubber should crawl based on the scrubber frequency valueGaurav Kumar Garg2015-05-103-5/+192
| | | | | | | | | | | | | | | Currently scrubber is crawling all the files continuously. It should crawl files based on the scrubber frequency which user have set. By default scrubber crawling frequency value will be biweekly. Change-Id: I5762a92c1e700134cfe4283d1f631904adbfe31d BUG: 1220068 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10739 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* features/bitrot: Scrubber pause/resumeVenky Shankar2015-05-103-9/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | With logical scan/scrub split, pausing filesystem scrubber is an override to the thread throttling mechanism, which effectively throttles "down" number of scrubber threads to zero. This causes scanner to wait until threads are spawned again (when resumed) thereby continuing where it left off (since the file tree walk stack is effectively preserved when the main scanner thread is waiting for scrubbers to consume scanned entries). The only catch is when scrubber daemon restarts: file tree walk stack is lost and scrubbing initiates from root. This is probably OK for now (can be changed later to persist parent directory information before entering pause state). > Change-Id: I5109a749b7fccd0f5367765078f46e6522dd32a1 > BUG: 1208131 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10521 > Reviewed-by: Vijay Bellur <vbellur@redhat.com> > Tested-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I9b60f2ce24ca3787423a45ec7d502f89215fe45f Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10721 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
* features/bitrot: Throttle filesystem scrubberVenky Shankar2015-05-105-51/+712
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces multithreaded filesystem scrubber based on throttling option configured for a particular volume. The implementation "logically" breaks scanning and scrubbing with the number of scrubber threads auto-configured depending upon the throttle configuration. Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing. This mechanism helps us implement "pause/resume" with ease: all one need to do is to cleanup scrubber threads and let the main scanner thread "wait" untill scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the deamons, where crawl initiates from root directory again, but I guess that's OK). [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ] Subsequent patches would introduce CPU throttling during hash calculation for scrubber. > Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f > BUG: 1207020 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10511 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I5a125b2d0ac7dafd3e278b7fe4c6c9dd07af76dd Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10720 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
* features/bit-rot: Token Bucket based throttlingVenky Shankar2015-05-106-9/+432
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BitRot daemons (signer & scrubber) are disk/cpu hoggers when left running full throttle. Checksum calculations (especially SHA family of hash routines) can be quite CPU intensive. Moreover periodic disk scans performed by scrubber followed by reading data blocks for hash calculation (which is also done by signer) generate lot of heavy IO request(s). This causes interference with actual client operations (be it a regular client or filesystems daemons such as self-heal, etc..) and results in degraded system performance. This patch introduces throttling based on Token Bucket Filtering[1]. It's a well known algorithm for checking (and ensuring) that data transmission conform to defined limits and generally used in packet switched networks. Linux control groups (Cgroups) uses a variant[2] of this algorithm to provide block device IO throttling (cgroup subsys "blkio": blk-iothrottle). So, why not just live with Cgroups? Cgroups is linux specific. We need to have a throttling mechanism for other supported UNIXes. Moreover, having our own implementation gives much more finer control in terms of tuning it for our needs (plus the simplicity of the alogorithm itself). Ideally, throttling should be a part of server stack (either as a separate translator or integrated with io-threads) since that's the point of entry for IO request(s) from *all* client(s). That way one could selectively throttle IO request(s) based on client PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc.. (*actual* clients can run full throttle). This implementation avoids that deliberately (there needs to be a much more smarter queueing mechanism) and throttles CPU usage for hash calculations. This patch is just the infrastructure part with no interfaces exposed to set various throttling values. The tunable selected here (basically hardcoded) avoids 100% CPU usage during hash calculation (with some bursts cycles). We'd need much more intensive test(s) to assign values for various throttling options (lazy/normal/aggressive). [1] https://en.wikipedia.org/wiki/Token_bucket [2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket > Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8 > BUG: 1207020 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10307 > Reviewed-by: Vijay Bellur <vbellur@redhat.com> > Tested-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I034ba1095aa3bfc3212a67a63ffb931431b372f6 Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10719 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
* features/bitrot: Follow xattr naming conventionsVenky Shankar2015-05-103-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*" NOTE: With this patch, data on existing volumes would be resigned (which should be OK as of now since we do not expect many users as of now :-)) > Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0 > BUG: 1170075 > Signed-off-by: Venky Shankar <vshankar@redhat.com> > Reviewed-on: http://review.gluster.org/10181 > Tested-by: NetBSD Build System > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I3c18d7dc2db4beaca6e8d8d231b4171a7b18795f Signed-off-by: Venky Shankar <vshankar@redhat.com> BUG: 1220041 Reviewed-on: http://review.gluster.org/10718 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>