summaryrefslogtreecommitdiffstats
path: root/xlators/features/bit-rot/src/stub
Commit message (Collapse)AuthorAgeFilesLines
...
* features/bit-rot: stub changes for showing bad objects in the statusRaghavendra Bhat2015-11-226-91/+955
| | | | | | | | | | Change-Id: If905132f6f1df4aebd9ab255e1e8c59902f84fe5 BUG: 1207627 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/12503 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* build: export minimum symbols from xlators for correct resolutionKaleb S. KEITHLEY2015-09-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've been lucky that we haven't had any symbol collisions until now. Now we have a collision between the snapview-client's svc_lookup() and libntirpc's svc_lookup() with nfs-ganesha's FSAL_GLUSTER and libgfapi. As a short term solution all the snapview-client's FOP methods were changed to static scope. See http://review.gluster.org/11805. This works in snapview-client because all the FOP methods are defined in a single source file. This solution doesn't work for other xlators with FOP methods defined in multiple source files. To address this we link with libtool's '-export-symbols $symbol-file' (a wrapper around `ld --version-script ...` --- on linux anyway) and only export the minimum required symbols from the xlator sharedlib. N.B. the libtool man page says that the symbol file should be named foo.sym, thus the rename of *.exports to *.sym. While foo.exports worked, we will follow the documentation. Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> BUG: 1248669 Change-Id: I1de68b3e3be58ae690d8bfb2168bfc019983627c Reviewed-on: http://review.gluster.org/11814 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* all: reduce "inline" usageJeff Darcy2015-09-011-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect. In the process, several pieces of dead code were flagged by the compiler, and were removed. Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155 BUG: 1245331 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/11769 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bit-rot-stub: fail the fop if inode context get failsRaghavendra Bhat2015-08-112-23/+96
| | | | | | | | | | | | | | | In stub, for fops like readv, writev etc, if the the object is bad, then the fop is denied. But for checking if the object is bad inode context should be checked. Now, if the inode context is not there, then the fop is allowed to continue. This patch fixes it and the fop is unwound with an error, if the inode context is not found. Change-Id: I5ea4d4fc1a91387f7f9d13ca8cb43c88429f02b0 BUG: 1243391 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11449 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bit-rot-stub: handle REOPEN_WAIT on forgotten inodesRaghavendra Bhat2015-07-281-1/+43
| | | | | | | | | | Change-Id: Ia8706ec9b66d78c4e33e7b7faf69f0d113ba68a4 BUG: 1245981 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11729 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* dict: dict_set_bin() should never free the pointer on errorNiels de Vos2015-07-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | dict_set_bin() is handling the pointer that it passed inconsistently. Depending on the errors that can occur, the pointer passed to the dict can be free'd, but there is no guarantee. It is cleaner to have the caller free the pointer that allocated it and dict_set_bin() returned an error. When dict_set_bin() returned success, the given pointer will be free'd when dict_unref() calls data_destroy(). Many callers of dict_set_bin() already take care of free'ing the pointer on error. The ones that did not, are corrected with this change too. Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966 BUG: 1242280 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/bit-rot-stub: do not allow setxattr and removexattr on bit-rot xattrsRaghavendra Bhat2015-06-272-7/+106
| | | | | | | | | | | | * setxattr and {f}removexattr of versioning, signature and bad-file xattrs are returned with error. Change-Id: Ib423466195d1d8e4c6f80c2906a574e21ed624fb BUG: 1210689 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11389 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bit-rot-stub: deny access to bad objectsRaghavendra Bhat2015-06-274-28/+419
| | | | | | | | | | | | | | | | | * Access to bad objects (especially operations such as open, readv, writev) should be denied to prevent applications from getting wrong data. * Do not allow anyone apart from scrubber to set bad object xattr. * Do not allow bad object xattr to be removed. Change-Id: Ia9185a067233a9f26e3d41d41d11d9a4eb0da827 BUG: 1210689 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11126 Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/bitrot: cleanup, v1Venky Shankar2015-06-251-0/+1
| | | | | | | | | | | | | | | | | | This is a short series of patches (with other cleanups) aimed at cleaning up some of the incorrect assumptions taken in reconfigure() leading to crashes when subvolumes are not fully initialized (as reported here[1] on gluster-devel@). Furthermore, there is some amount of code cleanup to handle disconnection and cleanup up data structure (as part of subsequent patch). [1] http://www.gluster.org/pipermail/gluster-devel/2015-June/045410.html Change-Id: I68ac4bccfbac4bf02fcc31615bd7d2d191021132 BUG: 1231617 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/11147 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* bit-rot : New logging framework for bit-rot log messageMohamed Ashiq2015-06-244-53/+230
| | | | | | | | | | | | Change-Id: I83c494f2bb60d29495cd643659774d430325af0a BUG: 1194640 Signed-off-by: Mohamed Ashiq <ashiq333@gmail.com> Reviewed-on: http://review.gluster.org/10297 Tested-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* features/bitrot: fix fd leak in truncate (stub)Venky Shankar2015-06-161-3/+8
| | | | | | | | | | | | | | | | | | | | | | The need to perform object versioning in the truncate() code path required an fd to reuse existing versioning infrastructure that's used by fd based operations (such as writev(), ftruncate(), etc..). This tempted the use of anonymous fd which was never ever unref()'d after use resulting in fd and/or memory leak depending on the code path taken. Versioning resulted in a dangling file descriptor left open in the filesystem effecting the signing process of a given object (no release() would be trigerred, hence no signing would be performed). On the other hand, cases where the object need not be versioned, the anonymous fd in still ref()'d resulting in memory leak (NOTE: there's no "dangling" file descriptor in this case). Change-Id: I29c3d2af9bbc5cd4b8ddf38954080e3c7a44ba61 BUG: 1227996 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/11077 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* features/bit-rot: check for both inmemory and ondisk stalenessRaghavendra Bhat2015-06-151-12/+131
| | | | | | | | | | | | | | | | | | | | | * Let bit-rot stub check both on disk ongoing version, signed version xattrs and the in memory flags in the inode and then decide whether the inode is stale or not. This information is used by one shot crawler in BitD to decide whether to trigger the sign for the object or skip it. NOTE: The above check should be done only for BitD. For scrubber its still the old way of comparing on disk ongoing version with signed version. * BitD's one shot crawler should not sign zero byte objects if they do not contain signature. (Means the object was just created and nothing was written to it). Change-Id: I6941aefc2981bf79a6aeb476e660f79908e165a8 BUG: 1224611 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/10947 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bit-rot-stub: implement mknod fopRaghavendra Bhat2015-05-311-0/+51
| | | | | | | | | | | | | | | | | | | With the absence of mknod() fop implementation in bitrot stub, further operations that trigger versioning resulted in crashes as they expect the inode context to be valid. Therefore, this patch implements mknod() following similar simantics to fops such as create(). Furthermore, bitrot stub test C program is fixed to stop lying and validate obj versions according to the versioning protocol. Change-Id: If76f252577445d1851d6c13c7e969e864e2183ef BUG: 1221914 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10790 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bitrot: serialize versioningVenky Shankar2015-05-313-32/+174
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current signing interface (fsetxattr()) had couple of issues: One, a signing request (by bitrot daemon) is denied if the version against which an object is to be signed is unequal to the current version of the object (cases where another subsequent modification increments the version). Such request(s) are rejected with EINVAL sent back to the signer resulting in a bunch of errors (in logs) reported by bitrot daemon. Although, the object would be eventaully signed with the version matching the current version, the "lagging" request should be correctly handled. Two, more than one signing request could race against each other with the object getting signed with a version depending on which request ended up last in the race. Although harmless to some extent, such a case could end up marking the object's signature as stale for infinity (if the object is *never* touched) thereby resulting in scrubber skipping the object during verification. This patch fixes these issues by ordering signing request(s) and fixing version comparison checks at the time of signing. Change-Id: I9fa83dfa3be664ba4db61d7f2edc408f4bde77dd BUG: 1221938 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10832 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* build: do not #include "config.h" in each fileNiels de Vos2015-05-293-15/+0
| | | | | | | | | | | | | | | | | | Instead of including config.h in each file, and have the additional config.h included from the compiler commandline (-include option). When a .c file tests for a certain #define, and config.h was not included, incorrect assumtions were made. With this change, it can not happen again. BUG: 1222319 Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10808 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* features/bitrot: reimplement scrubbing frequencyVenky Shankar2015-05-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reimplments existing scrub-frequency mechanism used to schedule scrubber runs. Existing mechanism uses periodic sleeps (waking up periodically on minimum granularity) and performing a number of tracking checks based on counters and sleep times. This patch does away with all the nifty counters and uses timer-wheel to schedule scrub runs. Scheduling changes are peformed by merely calculating the new expiry time and calling mod_timer() [mod_timer_pending() in some cases] making the code more debuggable and easier to follow. This also introduces "hourly" scrubbing tunable as an aid for testing scrubbing during development/testing cycle. One could also implement on-demand scrubbing with ease: by invoking mod_timer() with an expiry of one (1) second, thereby scheduling a scrub run the very next second. Change-Id: I6c7c5f0c6c9f886bf574d88c04cde14b76e60a8b BUG: 1224596 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10893 Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: stub improvements and fixesVenky Shankar2015-05-283-320/+388
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch refactors the signing trigger mechanism used by bitrot daemon as a "catch up" meachanism to sign files which _missed_ signing on the last run either due to bitrot being disabled and enabled again or if bitrot is enabled for a volume with existing data. Existing implementation relies on overloading writev() to trigger signing which just by the looks sounded dangerous and I hated it to the core. This change moves all that business to the setxattr interface thereby keeping the writev path strictly for client IO. Why not use IPC fop to trigger signing? There's a need to access the object's inode to perform various maintainance operations. inode is not _directly_ accessible in the IPC fop (although, it can be found via inode_grep() for the object's GFID - the inode just needs to be pinned in memory, which is the case if there's an active fd on the inode). This patch relies on good old technique of overloading fsetxattr() to do the job instead of using IPC fop. There are some pretty nice cleanups along the lines of memory deallocations, unncessary allocations and redundant ref()ing of structures (such as fd's) provided by this patch. All in all - much improved code navigation. Change-Id: Id93fe90b1618802d1a95a5072517dac342b96cb8 BUG: 1224600 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10942 Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot-stub: versioning of objects in write/truncate fop instead ↵Raghavendra Bhat2015-05-084-323/+973
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of open * This patch brings in the changes where object versioning is done in write and truncate fops instead of tracking them in open and create fops. This model works for both regular and anonymous fds. It also removes the race associated with open calls, create and lookups. This patch follows the below method for object versioning and notifications: Before sending writev on the fd, increase the ongoing version first. This makes anonymous fd write similar to the regular fd write by having the ongoing version increased before doing the write. Do following steps to do versioning: 1) For anonymous fds set the fd context (so that release is invoked) and add the fd context to the list maintained in the inode context. For regular fds the above think would have been done in open itself. 2) Increase the on-disk ongoing version 3) Increase the in memory ongoing version and mark inode as non-dirty 3) Once versioning is successfully done send write operation. If versioning fails, then fail the write fop. 5) In writev_cbk mark inode as modified. Change-Id: I7104391bbe076d8fc49b68745d2ec29a6e92476c BUG: 1207979 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/10233 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Throttle filesystem scrubberVenky Shankar2015-05-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces multithreaded filesystem scrubber based on throttling option configured for a particular volume. The implementation "logically" breaks scanning and scrubbing with the number of scrubber threads auto-configured depending upon the throttle configuration. Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing. This mechanism helps us implement "pause/resume" with ease: all one need to do is to cleanup scrubber threads and let the main scanner thread "wait" untill scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the deamons, where crawl initiates from root directory again, but I guess that's OK). [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ] Subsequent patches would introduce CPU throttling during hash calculation for scrubber. Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10511 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: Token Bucket based throttlingVenky Shankar2015-05-071-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BitRot daemons (signer & scrubber) are disk/cpu hoggers when left running full throttle. Checksum calculations (especially SHA family of hash routines) can be quite CPU intensive. Moreover periodic disk scans performed by scrubber followed by reading data blocks for hash calculation (which is also done by signer) generate lot of heavy IO request(s). This causes interference with actual client operations (be it a regular client or filesystems daemons such as self-heal, etc..) and results in degraded system performance. This patch introduces throttling based on Token Bucket Filtering[1]. It's a well known algorithm for checking (and ensuring) that data transmission conform to defined limits and generally used in packet switched networks. Linux control groups (Cgroups) uses a variant[2] of this algorithm to provide block device IO throttling (cgroup subsys "blkio": blk-iothrottle). So, why not just live with Cgroups? Cgroups is linux specific. We need to have a throttling mechanism for other supported UNIXes. Moreover, having our own implementation gives much more finer control in terms of tuning it for our needs (plus the simplicity of the alogorithm itself). Ideally, throttling should be a part of server stack (either as a separate translator or integrated with io-threads) since that's the point of entry for IO request(s) from *all* client(s). That way one could selectively throttle IO request(s) based on client PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc.. (*actual* clients can run full throttle). This implementation avoids that deliberately (there needs to be a much more smarter queueing mechanism) and throttles CPU usage for hash calculations. This patch is just the infrastructure part with no interfaces exposed to set various throttling values. The tunable selected here (basically hardcoded) avoids 100% CPU usage during hash calculation (with some bursts cycles). We'd need much more intensive test(s) to assign values for various throttling options (lazy/normal/aggressive). [1] https://en.wikipedia.org/wiki/Token_bucket [2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8 BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10307 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot: Follow xattr naming conventionsVenky Shankar2015-05-061-0/+1
| | | | | | | | | | | | | | | | | Instead of "trusted.glusterfs.bit-rot.*" use "trusted.bit-rot.*" NOTE: With this patch, data on existing volumes would be resigned (which should be OK as of now since we do not expect many users as of now :-)) Change-Id: I926c7bca266a9c8f2cb35d57c4d0359aa5cecfa0 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10181 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: Mark versioning fsetxattr as internal fopKotresh HR2015-04-111-5/+9
| | | | | | | | | | | | | | | | | Changelog xlator was capturing bitrot-stub's fsetxattr sent for versioning. Since it was using the same frame as of the create fop, there was inconsistency in fop number and gfid of capturing metadata. So fix is to mark fsetxattr used for versioning as internal and add internal fop filter in changelog_fsetxattr. Change-Id: I51ff468995139838b22bf293a59a0713a92ee7a5 BUG: 1170075 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10148 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Venky Shankar <vshankar@redhat.com>
* features/bitrot-stub: Packed format for version xattrVenky Shankar2015-04-091-1/+1
| | | | | | | | | | | | | Using __attribute__ ((__packed__)) for object signature xattr saves some bytes (7 bytes to be particular) occupied by the extended attribute on-disk as compared to the unpacked format. Change-Id: I91a6a0a54aa60e6fd8c357d72f7601b6ed213f2d BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10161 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot-stub: header file update in noinst_HEADERSVenky Shankar2015-04-081-1/+2
| | | | | | | | | | | Missing "bit-rot-object-version.h" causing devrpm failures. Change-Id: I5af326c5871cc468a10dece4772b29eda06c4fa9 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10160 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* bitrot/scrub: Scrubber fixesVenky Shankar2015-04-083-7/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a handful of problem with scrubber which are detailed below. Scrubber used to skip objects for verification due to missing fd iterface to fetch versioning extended attributes. Similar to the inode interface, an fd based interface in POSIX is now introduced. Moreover, this patch also fixes potential false reporting by scrubber due to: An object gets dirtied and signed when scrubber is busy calculatingobject checksum. This is fixed by caching the signed version when an object is first inspected for stalenes, i.e., during pre-compute stage. This version is used to verify checksum in the post-compute stage when the signatures are compared for possible corruption. Side effect of _not_ sending signature length during signing resulted in "truncated" signature to be set for an object. Now, at the time of signing, the signature length is sent and is used in place of invoking strlen() to get signature length (which could have possible 00s). The signature length itself is not persisted in the signature xattr, but is calculated on-the-fly by substracting the xattr length by the "structure" header size. Some of the log entries are made more meaningful (as and aid for debugging). Change-Id: I938bee5aea6688d5d99eb2640053613af86d6269 BUG: 1207624 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10118 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tests/bitrot-stub: Object versioning test(s)Venky Shankar2015-04-082-16/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces basic object versioning test(s) which is required for bitrot detection to work correctly. Basic test(s) such as opening a file in read-only mode, single open, multiple open()s are covered on FUSE mount _only_ as stub does not support anonymous fds yet. For this reason, the test case disables open-behind. Actual verification is implemented as a C source which makes use of the same on-disk data structures as used by the stub code. The data structures are moved to separate header file which is included by the test script. Such modularization helps in future enhancements to keep the version "data type" opaque and provide handful of APIs version checking (equal/greater/etc..). [ This is just a start and should grow over time as stub is enhanced and codebase matures. ] Change-Id: Ibee20e65a15b56bbdd59fd2703f9305b115aec7a BUG: 1201724 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10140 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bitrot-stub: Enhancement to versioning protocolVenky Shankar2015-04-082-57/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | .. and potential bug fixes / memleak. While assigning initial version to an object, both extended attributes (namely, ongoing version and the default signing version) were persisted. This is optimized to just persist the ongoing version along with safe handling of xattr request(s) in it's absence. This is better than the earlier approach as the two xattr sets were not atomic anyway (allowing a request to sneak in between between two set operations). This also allows to perform sanity checks on objects during lookup()/getxattr(): objects with missing ongoing version but presence of signature are possible candidates of tampering (and catching implementation bugs). There were couple of instances in the code where versioning xattrs were incorrectly removed before in-memory versions were initialized, which have been fixed with this patch. A memory leak in the IPC code path is also fixed. Change-Id: I01c690ccfe7156a883582275f40f79a7c10c0900 BUG: 1207054 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10117 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Avoid conflict between contrib/uuid and system uuidEmmanuel Dreyfus2015-04-041-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* features/bit-rot: filesystem scrubberVenky Shankar2015-03-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scrubber performs signature verification for objects that were signed by signer. This is done by recalculating the signature (using the hash algorithm the object was signed with) and verifying it aginst the objects persisted signature. Since the object could be undergoing IO opretaion at the time of hash calculation, the signature may not match objects persisted signature. Bitrot stub provides additional information about the stalesness of an objects signature (determinted by it's versioning mechanism). This additional bit of information is used by scrubber to determine the staleness of the signature, and in such cases the object is skipped verification (although signature staleness is performed twice: once before initiation of hash calculation and another after it (an object could be modified after staleness checks). The implmentation is a part of the bitrot xlator (signer) which acts as a signer or scrubber based on a translator option. As of now the scrub process is ever running (but has some form of weak throttling mechanism during filesystem scan). Going forward, there needs to be some form of scrub scheduling and IO throttling (during hash calculation) tunables (via CLI). Change-Id: I665ce90208f6074b98c5a1dd841ce776627cc6f9 BUG: 1170075 Original-Author: Raghavendra Bhat <rabhat@redhat.com> Original-Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9914 Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: Implementation of bit-rot xlatorVenky Shankar2015-03-242-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the "Signer" -- responsible for signing files with their checksums upon last file descriptor close (last release()). The event notification facility provided by the changelog xlator is made use of. Moreover, checksums are as of now SHA256 hash of the object data and is the only available hash at this point of time. Therefore, there is no special "what hash to use" type check, although it's does not take much to add various hashing algorithms to sign objects with. Signatures are stored in extended attributes of the objects along with the the type of hashing used to calculate the signature. This makes thing future proof when other hash types are added. The signature infrastructure is provided by bitrot stub: a little piece of code that sits over the POSIX xlator providing interfaces to "get or set" objects signature and it's staleness. Since objects are signed upon receiving release() notification, pre-existing data which are "never" modified would never be signed. To counter this, an initial crawler thread is spawned The crawler scans the entire brick for objects that are unsigned or "missed" signing due to the server going offline (node reboots, crashes, etc..) and triggers an explicit sign. This would also sign objects when bit-rot is enabled for a volume and/or after upgrade. Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b BUG: 1170075 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9711 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Bitrot StubVenky Shankar2015-03-245-0/+1903
Bitrot stub implements object versioning required for identifying signature freshness. More details about versioning is explained as a part of the "bitrot feature documentation" patch. Change-Id: I2ad70d9eb109ba4a12148ab8d81336afda529ad9 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9709 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>