| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
of open
* This patch brings in the changes where object versioning is done in write and
truncate fops instead of tracking them in open and create fops. This model
works for both regular and anonymous fds. It also removes the race associated
with open calls, create and lookups.
This patch follows the below method for object versioning and notifications:
Before sending writev on the fd, increase the ongoing
version first. This makes anonymous fd write similar to the regular
fd write by having the ongoing version increased before doing the
write.
Do following steps to do versioning:
1) For anonymous fds set the fd context (so that release is invoked) and add
the fd context to the list maintained in the inode context.
For regular fds the above think would have been done in open itself.
2) Increase the on-disk ongoing version
3) Increase the in memory ongoing version and mark inode as non-dirty
3) Once versioning is successfully done send write operation. If
versioning fails, then fail the write fop.
5) In writev_cbk mark inode as modified.
> Change-Id: I7104391bbe076d8fc49b68745d2ec29a6e92476c
> BUG: 1207979
> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
> Reviewed-on: http://review.gluster.org/10233
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I4bb86989b5fab02b9ed2950798b1a80e566f1024
BUG: 1220041
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-on: http://review.gluster.org/10722
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces multithreaded filesystem scrubber based
on throttling option configured for a particular volume. The
implementation "logically" breaks scanning and scrubbing with
the number of scrubber threads auto-configured depending upon
the throttle configuration. Scanning (crawling) is left single
threaded (per brick) with entries scrubbed in bulk. On reaching
this "bulk" watermark, scanner waits until entries are scrubbed.
Bricks for a particular volume have a set of thread(s) assigned
for scrubbing, with entries for each brick scrubbed in a round
robin fashion to avoid scrub "stalls" when a brick (out of N
bricks) is under active scrubbing.
This mechanism helps us implement "pause/resume" with ease: all
one need to do is to cleanup scrubber threads and let the main
scanner thread "wait" untill scrubbing is resumed (where the
scrubber thread(s) are spawned again), therefore continuing
where we left off (unless we restart the deamons, where crawl
initiates from root directory again, but I guess that's OK).
[
NOTE:
Throttling is optional for the signer daemon, without which
it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER"
predefined in CFLAGS enables CPU throttling (during checksum
calculation) thereby avoiding high CPU usage.
]
Subsequent patches would introduce CPU throttling during hash
calculation for scrubber.
> Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f
> BUG: 1207020
> Signed-off-by: Venky Shankar <vshankar@redhat.com>
> Reviewed-on: http://review.gluster.org/10511
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I5a125b2d0ac7dafd3e278b7fe4c6c9dd07af76dd
Signed-off-by: Venky Shankar <vshankar@redhat.com>
BUG: 1220041
Reviewed-on: http://review.gluster.org/10720
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BitRot daemons (signer & scrubber) are disk/cpu hoggers when left
running full throttle. Checksum calculations (especially SHA family
of hash routines) can be quite CPU intensive. Moreover periodic
disk scans performed by scrubber followed by reading data blocks
for hash calculation (which is also done by signer) generate lot
of heavy IO request(s). This causes interference with actual client
operations (be it a regular client or filesystems daemons such as
self-heal, etc..) and results in degraded system performance.
This patch introduces throttling based on Token Bucket Filtering[1].
It's a well known algorithm for checking (and ensuring) that data
transmission conform to defined limits and generally used in packet
switched networks. Linux control groups (Cgroups) uses a variant[2]
of this algorithm to provide block device IO throttling (cgroup
subsys "blkio": blk-iothrottle).
So, why not just live with Cgroups?
Cgroups is linux specific. We need to have a throttling mechanism
for other supported UNIXes. Moreover, having our own implementation
gives much more finer control in terms of tuning it for our needs
(plus the simplicity of the alogorithm itself).
Ideally, throttling should be a part of server stack (either as a
separate translator or integrated with io-threads) since that's
the point of entry for IO request(s) from *all* client(s). That
way one could selectively throttle IO request(s) based on client
PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc..
(*actual* clients can run full throttle). This implementation
avoids that deliberately (there needs to be a much more smarter
queueing mechanism) and throttles CPU usage for hash calculations.
This patch is just the infrastructure part with no interfaces
exposed to set various throttling values. The tunable selected
here (basically hardcoded) avoids 100% CPU usage during hash
calculation (with some bursts cycles). We'd need much more
intensive test(s) to assign values for various throttling
options (lazy/normal/aggressive).
[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket
> Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8
> BUG: 1207020
> Signed-off-by: Venky Shankar <vshankar@redhat.com>
> Reviewed-on: http://review.gluster.org/10307
> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
> Tested-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I034ba1095aa3bfc3212a67a63ffb931431b372f6
Signed-off-by: Venky Shankar <vshankar@redhat.com>
BUG: 1220041
Reviewed-on: http://review.gluster.org/10719
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Gaurav Kumar Garg <ggarg@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the "Signer" -- responsible for signing files with their
checksums upon last file descriptor close (last release()).
The event notification facility provided by the changelog xlator
is made use of.
Moreover, checksums are as of now SHA256 hash of the object data
and is the only available hash at this point of time. Therefore,
there is no special "what hash to use" type check, although it's
does not take much to add various hashing algorithms to sign
objects with. Signatures are stored in extended attributes of the
objects along with the the type of hashing used to calculate the
signature. This makes thing future proof when other hash types
are added. The signature infrastructure is provided by bitrot
stub: a little piece of code that sits over the POSIX xlator
providing interfaces to "get or set" objects signature and it's
staleness.
Since objects are signed upon receiving release() notification,
pre-existing data which are "never" modified would never be
signed. To counter this, an initial crawler thread is spawned
The crawler scans the entire brick for objects that are unsigned
or "missed" signing due to the server going offline (node reboots,
crashes, etc..) and triggers an explicit sign. This would also
sign objects when bit-rot is enabled for a volume and/or after
upgrade.
Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b
BUG: 1170075
Original-Author: Raghavendra Bhat <raghavendra@redhat.com>
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9711
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
Bitrot stub implements object versioning required for identifying
signature freshness. More details about versioning is explained
as a part of the "bitrot feature documentation" patch.
Change-Id: I2ad70d9eb109ba4a12148ab8d81336afda529ad9
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9709
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|