path: root/xlators/features/bit-rot/src/bitd/bit-rot-scrub.c
Commit message | Author | Age | Files | Lines
* features / bitrot: Prevent spurious pthread_cond_wait() wakeup | Venky Shankar | 2016-01-28 | 1 | -1/+1
  pthread_cond_wait() is prone to spurious wakeups, so it is essential to check a boolean predicate before the thread continues. See man(3) pthread_cond_wait() for details.

  The following is done in the bitrot scrubber:

      if (list_empty (&fsscrub->scrublist))
              pthread_cond_wait (&fsscrub->cond, &fsscrub->mutex);

  followed by:

      list_first_entry (&fsscrub->scrublist, ...)

  After a spurious wakeup from pthread_cond_wait(), list_empty() is not checked again, which causes list_first_entry() to return garbage.

  Change-Id: I08786b9686b5503fcad6127e4c2a2cfac4bb7849
  BUG: 1302201
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/13302
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
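  The usual remedy for the problem described in this commit is to re-check the predicate in a loop around pthread_cond_wait(). The following is a minimal sketch of that pattern, not the scrubber's actual code: the work_queue and work_item types and the queue_push()/queue_pop() helpers are hypothetical stand-ins for the scrub list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work list (LIFO); names are illustrative only. */
struct work_item {
    struct work_item *next;
    int               id;
};

struct work_queue {
    pthread_mutex_t   mutex;
    pthread_cond_t    cond;
    struct work_item *head;
};

/* Producer: push at the head and wake one waiter. */
static void queue_push(struct work_queue *q, struct work_item *item)
{
    assert(q != NULL && item != NULL);
    pthread_mutex_lock(&q->mutex);
    item->next = q->head;
    q->head = item;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->mutex);
}

/* Consumer: the predicate is re-checked in a loop, so a spurious wakeup
 * goes back to waiting instead of dereferencing an empty list. */
static struct work_item *queue_pop(struct work_queue *q)
{
    struct work_item *item;

    pthread_mutex_lock(&q->mutex);
    while (q->head == NULL)
        pthread_cond_wait(&q->cond, &q->mutex);
    item = q->head;
    q->head = item->next;
    pthread_mutex_unlock(&q->mutex);
    return item;
}
```

  With `while` in place of `if`, the only way out of the wait is for the list to be non-empty while the mutex is held, which is the property the original code lacked.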
* bitrot: getting correct value of scrub stat's | Gaurav Kumar Garg | 2015-12-14 | 1 | -17/+120
  When a user executes the bitrot scrub status command, gluster does not report correct values for Number of Scrubbed files, Number of Unsigned files, Last completed scrub time, and Duration of last scrub.

  With this patch, scrub status reports correct values for all of the above fields.

  Change-Id: Ic966f76d22db5b0c889e6386a1c2219afbda1f49
  BUG: 1285989
  Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: http://review.gluster.org/12776
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bit-rot: scrubber changes for getting the list of bad objects from stub | Raghavendra Bhat | 2015-11-22 | 1 | -0/+277
  Change-Id: I62885e4aba4a9b345db3c78c3291d563ff3d3567
  BUG: 1207627
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-on: http://review.gluster.org/12654
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Venky Shankar <vshankar@redhat.com>
* all: reduce "inline" usage | Jeff Darcy | 2015-09-01 | 1 | -13/+13
  There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect.

  In the process, several pieces of dead code were flagged by the compiler, and were removed.

  Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155
  BUG: 1245331
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/11769
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bitrot: Fix scrubber frequency set | Kotresh HR | 2015-08-23 | 1 | -5/+19
  When bitrot is configured on multiple volumes in a cluster and scrubber-frequency is changed for one volume, the scrub schedule of all other volumes is also reset, each with respect to its own scrubber-frequency. This should not happen: changing scrubber-frequency should affect only the volume on which it is set. This patch fixes the issue.

  Logs are also restricted to the configured volume.

  Change-Id: I90d6e864b131e3d8dd4010079a00f924032f2098
  BUG: 1252825
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: http://review.gluster.org/11897
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-by: Venky Shankar <vshankar@redhat.com>
* bitrot: Scrubber log should mark bad file as a ALERT in the scrubber log | Gaurav Kumar Garg | 2015-08-20 | 1 | -2/+2
  If the scrubber detects a bad file, it should log that file as an ALERT message in the scrubber log.

  Change-Id: I410429e78fd3768655230ac028fa66f7fc24b938
  BUG: 1240218
  Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Reviewed-on: http://review.gluster.org/11965
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/bitrot: Fix rescheduling scrub-frequency | Kotresh HR | 2015-08-12 | 1 | -20/+11
  While rescheduling the scrub frequency, the boot time of the brick was taken into account although it is not required. In addition, the delta was calculated using an unsigned int, which lost the fractional part and led to a wrong scrub frequency. Boot time is completely removed and the delta calculation is simplified.

  Change-Id: If54697389f663afc86408dc8a01a3ea07e00f2dc
  BUG: 1251042
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: http://review.gluster.org/11853
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bitrot: Scrub log improvement | Venky Shankar | 2015-07-10 | 1 | -1/+4
  Change-Id: I4937a578185ebacd2558cb8e22f130cd10193188
  BUG: 1240219
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11547
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* features/bitrot: convert pending gf_log() to gf_msg() | Venky Shankar | 2015-06-25 | 1 | -13/+19
  Change-Id: Idfd245327b485459ccbda503510b8ca0127bb66c
  BUG: 1231619
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11396
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* features/bitrot: handle scrub states via state machine | Venky Shankar | 2015-06-25 | 1 | -32/+149
  The number of command line options for the scrubber prompted the use of a state machine to track the scrubber's current state under the various circumstances in which those options could be in effect.

  Change-Id: Id614bb2e6af30a90d2391ea31ae0a3edeb4e0d69
  BUG: 1231619
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11149
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
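  As an illustration of the state-machine approach this commit describes, a table-driven design might look like the sketch below. The state and event names (ST_*, EV_*) and the transitions themselves are assumptions made for the example; the commit message does not list the real ones.

```c
#include <assert.h>

/* Hypothetical states and events; the scrubber's real names may differ. */
enum scrub_state { ST_INACTIVE, ST_PENDING, ST_ACTIVE, ST_PAUSED, ST_MAX };
enum scrub_event { EV_SCHEDULE, EV_START, EV_FINISH, EV_PAUSE, EV_RESUME, EV_MAX };

/* transitions[state][event] is the next state. An entry equal to the
 * current state means the event is ignored in that state. */
static const enum scrub_state transitions[ST_MAX][EV_MAX] = {
    /*                SCHEDULE     START        FINISH       PAUSE      RESUME      */
    [ST_INACTIVE] = { ST_PENDING,  ST_INACTIVE, ST_INACTIVE, ST_PAUSED, ST_INACTIVE },
    [ST_PENDING]  = { ST_PENDING,  ST_ACTIVE,   ST_PENDING,  ST_PAUSED, ST_PENDING  },
    [ST_ACTIVE]   = { ST_ACTIVE,   ST_ACTIVE,   ST_INACTIVE, ST_PAUSED, ST_ACTIVE   },
    [ST_PAUSED]   = { ST_PAUSED,   ST_PAUSED,   ST_PAUSED,   ST_PAUSED, ST_PENDING  },
};

/* Look up the next state for an event arriving in the current state. */
static enum scrub_state scrub_next_state(enum scrub_state cur, enum scrub_event ev)
{
    assert(cur < ST_MAX && ev < EV_MAX);
    return transitions[cur][ev];
}
```

  Keeping every (state, event) pair in one table makes it easy to audit what each option does in each situation, which is the kind of bookkeeping that becomes error-prone when spread over ad hoc flags.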
* features/bitrot: cleanup, v2 | Venky Shankar | 2015-06-25 | 1 | -8/+19
  This patch uses the "cleanup, v1" infrastructure to clean up the scrubber (data structures, threads, timers, etc.) on brick disconnection. The signer is not cleaned up yet; that would probably be done as part of another patch.

  Change-Id: I78a92b8a7f02b2f39078aa9a5a6b101fc499fd70
  BUG: 1231619
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11148
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* bit-rot : New logging framework for bit-rot log message | Mohamed Ashiq | 2015-06-24 | 1 | -44/+57
  Change-Id: I83c494f2bb60d29495cd643659774d430325af0a
  BUG: 1194640
  Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com>
  Reviewed-on: http://review.gluster.org/10297
  Tested-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bitrot: log scrub frequency & throttle values | Venky Shankar | 2015-06-20 | 1 | -0/+28
  Change-Id: I56d5236c37a413046b5766320184047a908f2c8d
  BUG: 1231620
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/11190
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* build: do not #include "config.h" in each file | Niels de Vos | 2015-05-29 | 1 | -5/+0
  Instead of including config.h in each file, have config.h included from the compiler command line (-include option).

  When a .c file tested for a certain #define and config.h was not included, incorrect assumptions were made. With this change, that cannot happen again.

  BUG: 1222319
  Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: http://review.gluster.org/10808
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/bitrot: reimplement scrubbing frequency | Venky Shankar | 2015-05-28 | 1 | -165/+208
  This patch reimplements the existing scrub-frequency mechanism used to schedule scrubber runs. The existing mechanism uses periodic sleeps (waking up periodically on minimum granularity) and performs a number of tracking checks based on counters and sleep times. This patch does away with all the nifty counters and uses timer-wheel to schedule scrub runs. Scheduling changes are performed by merely calculating the new expiry time and calling mod_timer() [mod_timer_pending() in some cases], making the code more debuggable and easier to follow.

  This also introduces an "hourly" scrubbing tunable as an aid for testing scrubbing during the development/testing cycle.

  One could also implement on-demand scrubbing with ease: by invoking mod_timer() with an expiry of one (1) second, thereby scheduling a scrub run the very next second.

  Change-Id: I6c7c5f0c6c9f886bf574d88c04cde14b76e60a8b
  BUG: 1224596
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10893
  Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Tested-by: NetBSD Build System
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
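  The "calculate the new expiry time" step in this commit can be sketched roughly as follows. This is only an illustration under stated assumptions: the scrub_freq enum, scrub_period() and scrub_next_expiry() are hypothetical names, the set of frequencies beyond "hourly" and "biweekly" is assumed, and a month is approximated as 30 days. The returned value is what one would hand to a timer call such as the mod_timer() the commit mentions.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical frequency settings; "hourly" is the testing aid. */
enum scrub_freq { FREQ_HOURLY, FREQ_DAILY, FREQ_WEEKLY, FREQ_BIWEEKLY, FREQ_MONTHLY };

/* Period in seconds; a month is approximated as 30 days in this sketch. */
static time_t scrub_period(enum scrub_freq freq)
{
    switch (freq) {
    case FREQ_HOURLY:   return 60 * 60;
    case FREQ_DAILY:    return 24 * 60 * 60;
    case FREQ_WEEKLY:   return 7 * 24 * 60 * 60;
    case FREQ_BIWEEKLY: return 14 * 24 * 60 * 60;
    case FREQ_MONTHLY:  return 30 * 24 * 60 * 60;
    }
    return 0;
}

/* Seconds from `now` until the next run, given when the current period
 * began. If the period has already elapsed (for example after the
 * frequency was shortened), the run is due on the very next second. */
static time_t scrub_next_expiry(time_t period_start, time_t now, enum scrub_freq freq)
{
    time_t due = period_start + scrub_period(freq);

    assert(now >= period_start);
    return (due > now) ? due - now : 1;
}
```

  A frequency change then reduces to one call: recompute the expiry with the new frequency and re-arm the timer, with no counters to keep in sync.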
* features/bitrot: scrubber should crawl based on the scrubber frequency value | Gaurav Kumar Garg | 2015-05-10 | 1 | -5/+186
  Currently the scrubber crawls all the files continuously. It should crawl files based on the scrubber frequency that the user has set. By default, the scrubber crawling frequency is biweekly.

  Change-Id: I5762a92c1e700134cfe4283d1f631904adbfe31d
  BUG: 1208131
  Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Reviewed-on: http://review.gluster.org/10602
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Scrubber pause/resume | Venky Shankar | 2015-05-08 | 1 | -9/+56
  With logical scan/scrub split, pausing filesystem scrubber is an override to the thread throttling mechanism, which effectively throttles "down" number of scrubber threads to zero. This causes scanner to wait until threads are spawned again (when resumed) thereby continuing where it left off (since the file tree walk stack is effectively preserved when the main scanner thread is waiting for scrubbers to consume scanned entries).

  The only catch is when scrubber daemon restarts: file tree walk stack is lost and scrubbing initiates from root. This is probably OK for now (can be changed later to persist parent directory information before entering pause state).

  Change-Id: I5109a749b7fccd0f5367765078f46e6522dd32a1
  BUG: 1208131
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10521
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
  Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Throttle filesystem scrubber | Venky Shankar | 2015-05-07 | 1 | -20/+545
  This patch introduces a multithreaded filesystem scrubber based on the throttling option configured for a particular volume. The implementation "logically" separates scanning from scrubbing, with the number of scrubber threads auto-configured depending upon the throttle configuration.

  Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, the scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing.

  This mechanism helps us implement "pause/resume" with ease: all one needs to do is clean up the scrubber threads and let the main scanner thread "wait" until scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the daemons, where the crawl initiates from the root directory again, but I guess that's OK).

  [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ]

  Subsequent patches would introduce CPU throttling during hash calculation for the scrubber.

  Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f
  BUG: 1207020
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10511
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
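  The round robin selection across bricks that this commit describes could be sketched as below. The scrub_sched structure, MAX_BRICKS, and sched_pick() are hypothetical; the real scrubber keeps lists of scanned entries per brick rather than simple counters.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BRICKS 8

/* Hypothetical per-volume view: pending[i] counts entries queued for brick i. */
struct scrub_sched {
    size_t nbricks;
    size_t pending[MAX_BRICKS];
    size_t cursor;   /* brick to try first on the next pick */
};

/* Pick the next brick with pending work, starting at the cursor and
 * wrapping around. Consumes one entry and returns the brick index,
 * or -1 if no brick has anything queued. */
static int sched_pick(struct scrub_sched *s)
{
    assert(s->nbricks > 0 && s->nbricks <= MAX_BRICKS);
    for (size_t n = 0; n < s->nbricks; n++) {
        size_t i = (s->cursor + n) % s->nbricks;

        if (s->pending[i] > 0) {
            s->pending[i]--;
            s->cursor = (i + 1) % s->nbricks;
            return (int)i;
        }
    }
    return -1;
}
```

  Advancing the cursor past the brick that was just served is what prevents one busy brick from starving the others.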
* features/bitrot: Follow xattr naming conventions | Venky Shankar | 2015-05-06 | 1 | -1/+1
  Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*"

  NOTE: With this patch, data on existing volumes would be re-signed (which should be OK since we do not expect many users as of now :-))

  Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0
  BUG: 1170075
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10181
  Tested-by: NetBSD Build System
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* bitrot: Scrubber log should report 'bad' file detection as ALERT in log | Gaurav Kumar Garg | 2015-04-30 | 1 | -2/+2
  If the scrubber detects a bad object because its checksum does not match the signer's, the log message should be an alert instead of a warning.

  Change-Id: I075d80700cbe6182e525a04419a80ab18419ff91
  BUG: 1210687
  Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
  Reviewed-on: http://review.gluster.org/10226
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* bitrot/scrub: fix induced throttling in syncop_ftw_throttle() | Venky Shankar | 2015-04-26 | 1 | -12/+14
  Failing to reset scanning counter causes "incorrect" delay of around 50 seconds per directory entry. This causes scrubber to run extremely slowly.

  [ NOTE: This is a temporary fix. With the introduction of token bucket based throttling, inducing throttle via sleep() call would be unneeded. ]

  Also, fix logging messages in scrubber to log brick and full path of the object which is identified/marked as corrupted.

  Change-Id: Id501bd15dcdbd8a09613f80f9d84050304740027
  BUG: 1170075
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10375
  Tested-by: NetBSD Build System
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
* libglusterfs/syncop: Add xdata to all syncop calls | Raghavendra Talur | 2015-04-08 | 1 | -4/+4
  This patch adds support for xdata in both the request and response path of syncops. A few calls, such as lookup, already had this support; variables have been renamed in a few places to maintain uniformity.

  xdata passed downwards is known as xdata_in and xdata passed upwards is known as xdata_out.

  There is an old patch by Jeff Darcy at http://review.gluster.org/#/c/8769/3 which does the same for some selected calls. It also brings in xdata support at gfapi level. xdata support at gfapi level would be introduced in subsequent patches.

  Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc
  BUG: 1158621
  Signed-off-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-on: http://review.gluster.org/9859
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* bitrot/scrub: Scrubber fixes | Venky Shankar | 2015-04-08 | 1 | -73/+174
  This patch fixes a handful of problems with the scrubber, which are detailed below.

  The scrubber used to skip objects for verification due to a missing fd interface to fetch versioning extended attributes. Similar to the inode interface, an fd based interface in POSIX is now introduced.

  Moreover, this patch also fixes potential false reporting by the scrubber due to:

  - An object gets dirtied and signed while the scrubber is busy calculating the object checksum. This is fixed by caching the signed version when an object is first inspected for staleness, i.e., during the pre-compute stage. This version is used to verify the checksum in the post-compute stage when the signatures are compared for possible corruption.

  - As a side effect of _not_ sending the signature length during signing, a "truncated" signature could be set for an object. Now, at the time of signing, the signature length is sent and is used in place of invoking strlen() to get the signature length (the signature could contain zero bytes). The signature length itself is not persisted in the signature xattr, but is calculated on-the-fly by subtracting the "structure" header size from the xattr length.

  Some of the log entries are made more meaningful (as an aid for debugging).

  Change-Id: I938bee5aea6688d5d99eb2640053613af86d6269
  BUG: 1207624
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/10118
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
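  The on-the-fly length calculation and the reason strlen() is unsuitable for binary signatures can be illustrated with the sketch below. The sig_xattr layout and the helper names are assumptions made for the example, not the on-disk format used by the bitrot stub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a fixed header followed by raw signature bytes. */
struct sig_xattr {
    uint8_t  type;         /* hash algorithm identifier */
    uint8_t  pad[3];
    uint64_t version;
    uint8_t  signature[];  /* binary digest; may contain zero bytes */
};

/* The signature length is whatever follows the header. Returns 0 if the
 * xattr is too short to hold anything beyond the header. */
static size_t sig_length(size_t xattr_len)
{
    size_t hdr = offsetof(struct sig_xattr, signature);

    return (xattr_len > hdr) ? xattr_len - hdr : 0;
}

/* Compare a freshly computed digest with the stored one using explicit
 * lengths and memcmp(), so embedded zero bytes are handled correctly. */
static int sig_matches(const struct sig_xattr *x, size_t xattr_len,
                       const uint8_t *digest, size_t digest_len)
{
    assert(x != NULL && digest != NULL);
    return sig_length(xattr_len) == digest_len &&
           memcmp(x->signature, digest, digest_len) == 0;
}
```

  A digest such as `ab 00 cd 00` would be reported as one byte long by strlen(), which is how a truncated signature could end up being stored.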
* features/bit-rot: filesystem scrubber | Venky Shankar | 2015-03-24 | 1 | -0/+291
  The scrubber performs signature verification for objects that were signed by the signer. This is done by recalculating the signature (using the hash algorithm the object was signed with) and verifying it against the object's persisted signature.

  Since the object could be undergoing an IO operation at the time of hash calculation, the signature may not match the object's persisted signature. Bitrot stub provides additional information about the staleness of an object's signature (determined by its versioning mechanism). This additional bit of information is used by the scrubber to determine the staleness of the signature, and in such cases the object is skipped for verification. The staleness check is performed twice: once before initiation of hash calculation and once after it (an object could be modified after the first staleness check).

  The implementation is a part of the bitrot xlator (signer), which acts as a signer or scrubber based on a translator option. As of now the scrub process is ever running (but has some form of weak throttling mechanism during filesystem scan). Going forward, there needs to be some form of scrub scheduling and IO throttling (during hash calculation) tunables (via CLI).

  Change-Id: I665ce90208f6074b98c5a1dd841ce776627cc6f9
  BUG: 1170075
  Original-Author: Raghavendra Bhat <rabhat@redhat.com>
  Original-Author: Venky Shankar <vshankar@redhat.com>
  Signed-off-by: Venky Shankar <vshankar@redhat.com>
  Reviewed-on: http://review.gluster.org/9914
  Tested-by: Vijay Bellur <vbellur@redhat.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
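  The verification flow with its two staleness checks might be sketched as follows. Everything here is illustrative: the object structure, toy_hash() (a stand-in for the real hash algorithm), and the during_hash hook used to simulate concurrent IO are assumptions, not the xlator's interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum scrub_result { SCRUB_OK, SCRUB_SKIPPED, SCRUB_CORRUPT };

/* Hypothetical view of an object: `ongoing` is the current version and
 * `signed_version` is the version the stored signature was computed for. */
struct object {
    uint64_t ongoing;
    uint64_t signed_version;
    uint32_t stored_sig;
    uint32_t data;
};

/* The signature is stale when the object changed after it was signed. */
static bool is_stale(const struct object *o)
{
    return o->ongoing != o->signed_version;
}

/* Toy stand-in for the real hash algorithm. */
static uint32_t toy_hash(uint32_t data)
{
    return data * 2654435761u;
}

/* Example hook: a write that lands while the hash is being computed. */
static void simulate_write(struct object *o)
{
    o->data ^= 1u;
    o->ongoing++;
}

static enum scrub_result scrub_object(struct object *o,
                                      void (*during_hash)(struct object *))
{
    uint32_t sig;

    assert(o != NULL);
    if (is_stale(o))                 /* check before hash calculation */
        return SCRUB_SKIPPED;
    sig = toy_hash(o->data);
    if (during_hash != NULL)         /* simulated concurrent IO */
        during_hash(o);
    if (is_stale(o))                 /* check again after hash calculation */
        return SCRUB_SKIPPED;
    return (sig == o->stored_sig) ? SCRUB_OK : SCRUB_CORRUPT;
}
```

  Only an object that is fresh both before and after hashing is compared against its stored signature, so a mismatch is reported as corruption only when no write could explain it.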