| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What:
If dht_open is called on a migrating file after the inode_ctx is set,
subsequent FOPs on that fd do not open the fd on the dst subvol.
This is seen when the open-ftruncate-close sequence is repeatedly
called on a migrating file.
A second call to the sequence described above causes dht_truncate_cbk
to call dht_truncate2 as the dht_inode_ctx was already set by the first
call. As dht_rebalance_in_progress_check is not called, the fd is not
opened on the dst subvol.
On a distributed-replicate volume, this causes AFR to
open the fd using afr_fix_open, but with the wrong flags, causing
posix_ftruncate to fail with EINVAL.
The fix: We require fd specific information to make a decision while
handling migrating files.
Set the fd_ctx to indicate the fd has been opened on the dst subvol
and check if it has been set while processing Phase1/Phase2 checks
in the FOP callback functions.
> Change-Id: I43cdcd8017b4a11e18afdd210469de7cd9a5ef14
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: http://review.gluster.org/12985
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
> Tested-by: Dan Lambright <dlambrig@redhat.com>
Change-Id: I99f8aceec105f16631def06a263f0561954d14b3
BUG: 1293827
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13071
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had run sleep() in the pause tier callback. Blocking within
a synctask is dangerous. The sleep() call does not inform
the synctask scheduler that a thread is no longer running.
It therefore believes it is running. If a second synctask already
exists, it may not be able to run. This occurs if the thread
limit in the pool has been reached.
Note the pool size only grows when a synctask is created, not
when it is moved from wait state to run state, as is the case
when an FOP completes. When the tier is paused during migration,
synctasks already exist waiting for responses to FOPs to the
server with high probability.
The fix is to yield() in the RPC callback, which will place
the synctask into the wait queue and free up a thread for the
FOP callback. A timer wakes the callback after sufficient
time has elapsed for the pause to occur.
This is a backport of 12987.
> Change-Id: I6a947ee04c6e5649946cb6d8207ba17263a67fc6
> BUG: 1267950
> Signed-off-by: Dan Lambright <dlambrig@redhat.com>
> Reviewed-on: http://review.gluster.org/12987
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Change-Id: I9bb7564d57d59abb98e89c8d54c8b1ff4a5fc3f3
BUG: 1274100
Reviewed-on: http://review.gluster.org/13078
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The glusterd process loads the shared libraries of client translators.
This failed for tiering due to a reference to dht_methods which is
defined as a global variable which is not necessary.
The global variable has been removed and this is now a member of
dht_conf and is now initialised in the *_init calls.
> Change-Id: Ifa0a21e3962b5cd8d9b927ef1d087d3b25312953
> Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: http://review.gluster.org/12863
> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
>Tested-by: Dan Lambright <dlambrig@redhat.com>
(cherry picked from commit 96fc7f64da2ef09e82845a7ab97574f511a9aae5)
Change-Id: If3cc908ebfcd1f165504f15db2e3079d97f3132e
BUG: 1288352
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12877
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
back port of http://review.gluster.org/#/c/12376/
After a successful nameless lookup if the directory is not
present on any of the subvol, then we will get the path of
the directory and will recursively send a named lookp on
each parent directory.
This will help particularly for the scenarios like add brick
and attach-tier.
Change-Id: I97f3178adb08b88910f709f0acf1d86c147be017
BUG: 1279095
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12539
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a backport of 12304
Snaps of tiered volumes cannot handle files undergoing migration.
We implement a helper mechanism to "pause" migration. Any files
undergoing migration are aborted. Clean up is done to remove
sticky bits and data at the destination. Migration is restarted
after snap completes.
For testing an internal switch is added. It is not exposed externally.
gluster volume set vol1 tier-pause [true|false]
> Change-Id: Ia85bbf89ac142e9b7e73fcbef98bb9da86097799
> BUG: 1267950
> Signed-off-by: Dan Lambright <dlambrig@redhat.com>
> Reviewed-on: http://review.gluster.org/12304
> Reviewed-by: N Balachandran <nbalacha@redhat.com>
> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Conflicts:
xlators/mgmt/glusterd/src/glusterd-messages.h
Change-Id: I5f039d8d38a4c915bd873969f336b96755a0b8f1
BUG: 1274101
Reviewed-on: http://review.gluster.org/12411
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Backport fix 12039
This fix introduces infrastructure to support different
policies for promotion and demotion.
Currently the tier feature automatically promotes and demotes
files periodically based on access. This is good for testing
but too stringent for most real workloads. It makes it
difficult to fully utilize a hot tier- data will be demoted
before it is touched- its unlikely a 100GB hot SSD will have
all its data touched in a window of time.
A new parameter "mode" allows the user to pick promotion/demotion
polcies.
The "test mode" will be used for *.t and other general testing.
This is the current mechanism.
The "cache mode" introduces watermarks. The watermarks
represent levels of data residing on the hot tier.
"cache mode" policy:
The % the hot tier is full is called P.
Do not promote or demote more than D MB or F files.
A random number [0-100] is called R.
Rules for migration:
if (P < watermark_low) don't demote, always promote.
if (P >= watermark_low) && (P < watermark_hi) demote if R < P; promote if R > P.
if (P > watermark_hi) always demote, don't promote.
gluster volume set {vol} cluster.watermark-hi %
gluster volume set {vol} cluster.watermark-low %
gluster volume set {vol} cluster.tier-max-mb {D}
gluster volume set {vol} cluster.tier-max-files {F}
gluster volume set {vol} cluster.tier-mode {test|cache}
> Change-Id: I157f19667ec95aa1d53406041c1e3b073be127c2
> BUG: 1257911
> Signed-off-by: Dan Lambright <dlambrig@redhat.com>
> Reviewed-on: http://review.gluster.org/12039
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Conflicts:
xlators/cluster/dht/src/dht-rebalance.c
xlators/cluster/dht/src/tier.c
Change-Id: Ibfe6b89563ceab98708325cf5d5ab0997c64816c
BUG: 1270527
Reviewed-on: http://review.gluster.org/12330
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Determine which DHT level is responsible for
handling fops on a file undergoing migration based
on the name of the the linkto xattr set on the file
being migrated and process accordingly.
Change-Id: I82772e39314d4fe7f2ba0dcf22de0c6a374ee139
BUG: 1265892
Signed-off-by: N Balachandran <nbalacha@redhat.com>
> Reviewed-on: http://review.gluster.org/12090
> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
(cherry picked from commit 470869a954c17f32a3ba43ccda7442f82c0da6b2)
Reviewed-on: http://review.gluster.org/12224
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
BUG: 1256702
Change-Id: I0795720cb77a9c77e608f34fbb69574fd2acb542
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/11998
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/12024
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Post remove-brick start till commit phase, the client layout
may not be in sync with disk layout because of lack of lookup.
Hence,a create call may fall on the decommissioned brick.
Solution:
Will acquire a lock on hashed subvol. So that a fix-layout or
selfheal can not step on layout while reading the layout.
Even if we read a layout before remove-brick fix-layout and the
file falls on the decommissioned brick, the file should be
migrated to a new brick as per the fix-layout.
BUG: 1256283
Change-Id: I3ef1adaf20dfb9524396a3648d1a664464eda8c1
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/11260
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/12001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
information.
Without refcounting, we might free up memory while other fops are
still accessing it.
BUG: 1235928
Change-Id: Ia4fa4a651cd6fe2394a0c20cef83c8d2cbc8750f
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/11419
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
When a file is renamed and the (renamed)file's Hashing
falls into a different brick, DHT creates a special file(linkto file)
in the brick(Hashed subvolume) and carries out setattr operation
on that file.
Currently, Changelog records this(setattr) operation in Hashed
subvolume. glusterfind in turn records this operation
as MODIFY operation.
So, there is a NEW entry in Cached subvolume and MODIFY entry
in Hashed subvolume for the same file.
Solution:
Avoid logging setattr operation carried out, by
marking the operation as internal fop using xdata.
In changelog translator, check whether setattr is set
as internal fop and skip accordingly.
Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd
BUG: 1230687
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/11183
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently with commit 4eaaf5 a mixed version cluster would
have issues if lookup-uhashed is set to auto, as older clients
would fail to validate the layouts if newer clients (i.e 3.7 or
upwards) create directories. Also, in a mixed version cluster
rebalance daemon would set commit hash for some subvolumes and
not for the others.
This commit fixes this problem by moving the enabling of the
functionality introduced in the above mentioned commit to a
new dht option. This option also has a op_version of 3_7_1
thereby preventing it from being set in a mixed version
cluster. It brings in the following changes,
- Option can be set only if min version of the cluster is
3.7.1 or more
- Rebalance and mkdir update the layout with the commit hashes
only if this option is set, hence ensuring rebalance works in a
mixed version cluster, and also directories created by newer
clients do not cause layout errors when read by older clients
- This option also supersedes lookup-unhased, to enable the
optimization for lookups more deterministic and not conflict
with lookup-unhashed settings.
Option added is cluster.lookup-optimize, which is a boolean.
Usage: # gluster volume set VOLNAME cluster.lookup-optimize on
Change-Id: Ifd1d4ce3f6438fcbcd60ffbfdbfb647355ea1ae0
BUG: 1225940
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/10976
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stashing additional information in the inode_ctx to help
decide whether the migration information is stale, which could
happen if a file was migrated several times but FOPs only detected
the P1 migration phase. If no FOP detects the P2 phase, the inode
ctx1 is never reset.
We now save the src subvol as well as the dst subvol in the
inode ctx. The src subvol is the subvol on which the FOP was sent
when the mig info was set in the inode ctx. This information is
considered stale if:
1. The subvol on which the current FOP is sent is the same as
the dst subvol in the ctx
2. The subvol on which the current FOP is sent is not the same
as the src subvol in the ctx
This does not handle the case where the same file might have been
renamed such that the src subvol is the same but the dst subvol
is different. However, that is unlikely to happen very often.
Change-Id: I05a2e9b107ee64750c7ca629aee03b03a02ef75f
BUG: 1225809
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10967
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The destination subvol used in the fop2 variants is either stored in
inode-ctx1 or local->cached_subvol. However, it is not guaranteed that
a value stored in these locations before invocation of fop2 is still
present after the invocation as these locations are shared among
different concurrent operations. So, to preserve the atomicity of
"check dst-subvol and invoke fop2 variant if dst-subvol found", we
pass down the dst-subvol to fop2 variant.
This patch also fixes error handling in some fop2 variants.
Change-Id: Icc226228a246d3f223e3463519736c4495b364d2
BUG: 1225809
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/10966
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the patch http://review.gluster.org/#/c/9657
the client pid set by tiering migration was getting over-
written in dht_start_rebalance_task(). Just corrected it
in dht_setxattr() before calling dht_start_rebalance_task()
and removed it from dht_start_rebalance_task().
> http://review.gluster.org/#/c/10502/
> Cherry picked from commit a5fe0f594d41e1a11661d9074bb19e9c2e2c4776
> Change-Id: I37cfa111f83a4e5d498042575c93799f60b49870
> BUG: 1217937
> Signed-off-by: Joseph Fernandes <josferna@redhat.com>
> Reviewed-on: http://review.gluster.org/10502
> Tested-by: Gluster Build System <jenkins@build.gluster.com>
> Reviewed-by: Susant Palai <spalai@redhat.com>
> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10502
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Conflicts:
xlators/cluster/dht/src/dht-common.c
xlators/cluster/dht/src/tier.c
Change-Id: Id513114c9a880c6196162dd4b35bbf1155a8cd09
BUG: 1219027
Reviewed-on: http://review.gluster.org/10609
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The key concept here is to determine whether a directory is "clean" by
comparing its last-known-good topology to the current one for the
volume. These are stored as "commit hashes" on the directory and the
volume root respectively. The volume's commit hash changes whenever a
brick is added or removed, and a fix-layout is done. A directory's
commit hash changes only when a full rebalance (not just fix-layout)
is done on it. If all bricks are present and have a directory
commit hash that matches the volume commit hash, then we can assume
that every file is in its "proper" place. Therefore, if we look for
a file in that proper place and don't find it, we can assume it's not
on any other subvolume and *safely* skip the global (broadcast to all)
lookup.
Change-Id: Id6ce4593ba1f7daffa74cfab591cb45960629ae3
BUG: 1220064
Reviewed-on-master: http://review.gluster.org/#/c/7702/
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/10729
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Back port of http://review.gluster.org/10108
These commands work in a manner analagous to rebalancing when removing a
brick. The existing migration daemon detects "detach start" and switches
to moving data off the hot tier. While in this state all lookups are
directed to the cold tier.
gluster v detach-tier <vol> start
gluster v detach-tier <vol> commit
The status and stop cli commands shall be submitted separately.
>Change-Id: I24fda5cc3ba74f5fb8aa9a3234ad51f18b80a8a0
>BUG: 1205540
>Signed-off-by: Dan Lambright <dlambrig@redhat.com>
>Signed-off-by: root <root@localhost.localdomain>
>Signed-off-by: Dan Lambright <dlambrig@redhat.com>
>Reviewed-on: http://review.gluster.org/10108
>Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Change-Id: I212d748d077fb5870ee84b316c653acbafbea3f7
BUG: 1220047
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/10708
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Throttle value will be "normal" by default. For throttling down,
a thread will be put in to sleep. And for throttling up,
gf_defrag_process_dir will wake up the sleeping threads.
Change-Id: I4892ab14982a1ff305aeb2d8bbd33c79d6877b69
BUG: 1219579
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10526
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/10629
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current patch address two part of the design proposed.
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node
Brief design explanation for the above two points.
1. Rebalance multiple files in parallel:
-------------------------------------
The existing rebalance engine is single threaded. Hence, introduced
multiple threads which will be running parallel to the crawler. The
current rebalance migration is converted to a "Producer-Consumer"
frame work.
Where Producer is : Crawler
Consumer is : Migrating Threads
Crawler: Crawler is the main thread. The job of the crawler is now
limited to fix-layout of each directory and add the files which are
eligible for the migration to a global queue in a round robin manner
so that we will use all the disk resources efficiently. Hence, the
crawler will not be "blocked" by migration process.
Producer: Producer will monitor the global queue. If any file is
added to this queue, it will dqueue that entry and migrate the file.
Currently 20 migration threads are spawned at the beginning of the
rebalance process. Hence, multiple file migration happens in parallel.
2. Crawl only bricks that belong to the current node:
--------------------------------------------------
As rebalance process is spawned per node, it migrates only the files
that belongs to it's own node for the sake of load balancing. But it
also reads entries from the whole cluster, which is not necessary as
readdir hits other nodes.
New Design:
As part of the new design the rebalancer decides the subvols
that are local to the rebalancer node by checking the node-uuid of
root directory prior to the crawler starts. Hence, readdir won't hit
the whole cluster as it has already the context of local subvols and
also node-uuid request for each file can be avoided. This makes the
rebalance process "more scalable".
Change-Id: I6f1b44086a09df8ca23935fd213509c70cc0c050
BUG: 1217381
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10466
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I948f85cb369206ee8ce8b8cd5e48cae9adb971c9
BUG: 1075417
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-on: http://review.gluster.org/9529
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Changed the implementation of marker xattr handling to take just a
function which populates important data that is different from
default 'gauge' values and subvolumes where the call needs to be
wound.
- Removed duplicate code I found while reading the code and moved it to
cluster_marker_unwind. Removed unused structure members.
- Changed dht/afr/stripe implementations to follow the new implementation
- Implemented marker xattr handling for ec.
Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4
BUG: 1200372
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9892
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tier translator shares most of DHT's code. It differs in how
subvolumes are chosen for I/Os, and how file migration (cache promotion
and demotion) is managed. That different functionality is split to either
DHT or tier logic according to the "tier_methods" structure.
A cache promotion and demotion thread is created in a manner
similar to the rebalance daemon. The thread operates a timing
wheel which periodically checks for promotion and demotion candidates
(files). Candidates are queued and then migrated. Candidates must exist on
the same node as the daemon and meet other critera per caching policies.
This patch has two authors (Dan Lambright and Joseph Fernandes). Dan
did the DHT changes and Joe wrote the cache policies. The fix depends on
DHT readidr changes and the database library which have been submitted
separately. Header files in libglusterfs/src/gfdb should be reviewed in
patch 9683.
For more background and design see the feature page [1].
[1]
http://www.gluster.org/community/documentation/index.php/Features/data-classification
Change-Id: Icc26c517ccecf5c42aef039f5b9c6f7afe83e46c
BUG: 1194753
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9724
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
position in the graph rather than relative (local) to a particular
translator.
Encoding the volume in this way allows a single translator to manage
which brick is currently being scanned for directory entries. Using a
single translator minimizes allocated bits in the d_off. It also allows
multiple DHT translators in the same graph to have a common frame of
reference (the graph position) for which brick is being read. Multiple
DHT translators are needed for the Tiering feature.
The fix builds off a previous change (9332) which removed subvolume
encoding from AFR. The fix makes an equivalent change to the EC
translator.
More background can be found in fix 9332 and gluster-dev discussions [1].
DHT and AFR/EC are responsibile (as before) for choosing which brick to
enumerate directory entries in over the readdir lifecycle.
The client translator receiving the readdir fop encodes the dht_t. It
is referred to as the "leaf node" in the graph and corresponds to the
brick being scanned.
When DHT decodes the d_off, it translates the leaf node to a local
subvolume, which represents the next node in the graph leading to
the brick.
Tracking of leaf nodes is done in common utility functions. Leaf nodes
counts and positional information are updated on a graph switch.
[1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html
Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8
BUG: 1190734
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9688
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Don't consider "dir-spread-count" option. This option is not
supported.
* Consider transition to weighted to equal distribution or vice-versa
a valid case for fixing the layout.
Change-Id: I0dcfe555dae9269ce20a41611cfdaa4f96c9e98b
BUG: 1196615
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/9809
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dht-common.h includes a function definition with "inline", but the
function is not declared in the header. Dropping the "inline" compile
directive so that linking against .o files works correctly.
BUG: 1196650
Change-Id: I105be591125b29cd455769b0c4ff22d6e139227d
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/9760
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current layout heal code assumes layout setting is idempotent. This
allowed multiple concurrent healers to set the layout without any
synchronization. However, this is not the case as different healers
can come up with different layout for same directory and making layout
setting non-idempotent. So, we bring in synchronization among healers
to
1. Not to overwrite an ondisk well-formed layout.
2. Refresh the in-memory layout with the ondisk layout if in-memory
layout needs healing and ondisk layout is well formed.
This patch can synchronize
1. among multiple healers.
2. among multiple fix-layouts (which extends layout to consider
added or removed brick)
3. (but) not between healers and fix-layouts. So, the problem of
in-memory stale layouts (not matching with layout ondisk), is not
_completely_ fixed by this patch.
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Change-Id: Ia285f25e8d043bb3175c61468d0d11090acee539
BUG: 1176008
Reviewed-on: http://review.gluster.org/9302
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rather than distribute.migrate-data and work with CLI.
The change makes this work both when it is internally driven and from the
shell. The problem is further described in bugzilla # 1147107.
Change-Id: I4fe04cae661dca25432530ddf5ac6ff2c957d6b3
BUG: 1147107
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9284
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In directory write FOPs, as far as updates to timestamps associated
with parent by DHT is concerned, there are three possibilities:
a) time (in sec) gotten from child of DHT < time (in sec) in inode ctx
b) time (in sec) gotten from child of DHT = time (in sec) in inode ctx
c) time (in sec) gotten from child of DHT > time (in sec) in inode ctx
In case (c), for time in nsecs, it is the value returned by DHT's child
that must be selected. But what DHT_UPDATE_TIME ends up doing is to choose
the maximum of (time in nsec gotten from DHT's child, time in nsec in inode ctx).
Change-Id: I535a600b9f89b8d9b6714a73476e63ce60e169a8
BUG: 1179169
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/9457
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the rename path, we wind the creation of newname hardlink and
linkto file in dst hashed a the same time. If the linkto creation
fails, but the link creation succeeds, we enter the failure code
and cleanup the created newname hardlink.
In the interim if another client looks up newname and finds it as
a hardlink from FUSE, it could send an unlink for oldname instead
of a rename. This combined with the above cleanup code could end
up losing all the files copies, and thereby losing data.
This fix separates these steps into 2 parts, creating the linkto
first and then the link file, so that post link file creation no
failures would cleanup the newname file. If linkto fails then link
is not attempted, thereby not polluting the name space with
newname.
Change-Id: I61da8e906060da16a31ea1076eec2f01fd617f44
BUG: 1130888
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/8570
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I4f243c946f76d440680b651235f925e3d0ebf0fd
BUG: 1130888
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/8523
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I41389ba91951d3e63e617aa32cd0bee848261c72
BUG: 1130888
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/8521
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently whenever dht_lookup_everywhere gets called, if in
dht_lookup_everywhere_cbk, a linkto file is found on non-hashed
subvolume, file is unlinked. But there are cases when this file
is under migration. Under such condition, we should avoid deletion
of file.
When some other rebalance process changes the layout of parent
such that dst_file (w.r.t. migration) falls on non-hashed node,
then may be lookup could have found it as linkto file but just
before unlink, file is under migration or already migrated
In such cased unlink can be avoided.
Race:
-------
If we have two bricks (brick-1 and brick-2) with initial file "a"
under BaseDir which is hashed as well as cached on (brick-1).
Assume "a" hashing gives 44.
Brick-1 Brick-2
Initial Setup: BaseDir/a BaseDir
[1-50] [51-100]
Now add new-brick Brick-3.
1. Rebalance-1 on node Node-1 (Brick-1 node) will reset
the BaseDir Layout.
2. After that it will perform
a) Create linkto file on new-hashed (brick-2)
b) Perform file migration.
1.Rebalance-1 Fixes the base-layout:
Brick-1 Brick-2 Brick-3
--------- ---------- ------------
BaseDir/a BaseDir BaseDir
[1-33] [34-66] [67-100]
2. Only a) is BaseDir/a BaseDir/a(linkto) BaseDir
performed Create linktofile
Now rebalance 2 on node-2 jumped in and it will perform
step 1 and 2-a.
After (rebal-2, step-1), it changes the layout of the BaseDir.
BaseDir/a BaseDir/a(link) BaseDir
[67-100] [1-33] [34-66]
For (rebale-2, step-2), It will perform lookup at Brick-3 as w.r.t new
layout 44 falls for brick-3. But lookup will fail.
So dht_lookup_everywhere gets called.
NOTE: On brick-2 by rebalance-1, a linkto file was created.
Currently that linkto files gets deleted by rebalance-2 lookup as it
is considered as stale linkto file. But with patch if rebalance is
already in progress or rebalance is over, linkto file will not be
unlinked. If rebalance is in progress fd will be open and if rebalance
is over then linkto file wont be set.
Change-Id: I3fee0d28de3c76197325536a9e30099d2413f079
BUG: 1116150
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/8345
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Explanation of Race between rebalance processes:
https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4
STATE 1: BRICK-1
only one brick Cached File
in the system
STATE 2:
Add brick-2 BRICK-1 BRICK-2
STATE 3: Lookup of File on brick-2
by this node's rebalance
will fail because hashed
file is not created yet.
So dht_lookup_everywhere is
about to get called.
STATE 4: As part of lookup
link file at brick-2
will be created.
STATE 5: getxattr to check that
cached file belongs to
this node is done
STATE 6:
dht_lookup_everywhere_cbk detects
the link created by rebalance-1.
It will unlink it.
STATE 7: getxattr at the link
file with "pathinfo" key
will be called will fail
as the link file is deleted
by rebalance on node-2
Fix:
So in the STATE 6, we should avoid the deletion of link file. Every time
dht_lookup_everywhere gets called, lookup will be performed on all the nodes.
So to avoid STATE 6, if linkto file is found, it is not deleted until valid
case is found in dht_lookup_everywhere_done.
Case 1: if linkto file points to cached node, and cached file exists,
uwind with success.
Case 2: if linkto does not point to current cached node, and cached file
exists:
a) Unlink stale link file
b) Create new link file
Case 3: Only linkto file exists:
Delete linkto file
Case 4: Only cached file
Create link file (Handled event without patch)
Case 5: Neither cached nor hashed file is present
Return with ENOENT (handled even without patch)
Change-Id: Ibf53671410d8d613b8e2e7e5d0ec30fc7dcc0298
BUG: 1116150
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/8231
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If two clients try to rename the same file at the same time, we
sometimes end up with *no file at all* in either the old or new
location. That's kind of bad. The culprit seems to be some overly
aggressive cleanup code. AFAICT, based on today's study of the code,
the intent of the changed section is to remove any linkfile we might
have created before the actual rename. However, what we're removing
might not be our extra link. If we're racing with another client that's
also doing a rename, it might be the only remaining link to the user's
data. The solution, which is good enough to pass this test but almost
certainly still not complete, is to be more selective about when we do
this unlink. Now, we only do it if we know that, at some point, we did
in fact create the link without error (notably ENOENT on the source or
EEXIST on the destination) ourselves.
Change-Id: I8d8cce150b6f8b372c9fb813c90be58d69f8eb7b
BUG: 1117851
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/8269
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Calculation of layouts now considers the size of each brick, so that
smaller bricks don't get an "unfair" share of allocations and start
returning ENOSPC while the larger bricks still have plenty of space.
The observation has been made that some clients might get ENOTCONN when
trying to fetch disk-size information, and end up calculating layouts
differently. The following meta-observations can be made.
(1) This scenario is extremely unlikely in configurations with AFR.
(2) The most likely consequence of this scenario is that some files will
be placed sub-optimally by the client with the obsolete (non-weighted)
layout. They'll still be found anyway, so this isn't a show stopper.
(3) Without this patch it's *guaranteed* that some files will be placed
sub-optimally, because any layout that fails to account for brick sizes
is sub-optimal.
(4) We shouldn't be doing fix-layout from two nodes simultaneously
anyway. That's inefficient at best. Any instances of such behavior are
separate bugs, which should be fixed separately.
(5) In the most extreme edge case, two nodes doing weighted and
non-weighted layout fixes could race and end up creating an internally
inconsistent layout. This condition is still transient; it will be
detected and repaired automatically the next time anyone fetches the
layout. (If it's not that's also a preexisting bug that can show up in
other contexts.)
In conclusion, it's not the purpose of this patch to fix bugs elsewhere
in DHT. Its purpose is to make life incrementally better for users who
add new hardware with larger disks etc. than the older equipment. It's
only one part of an ongoing process to improve layout management and
repair, all the way up to support for multiple hash rings or tiering.
Change-Id: I05eb6f9eface9cdaf8622e0260c8c7f29020447f
BUG: 1114680
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/8093
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added a log which logs the new layout which will be used
for the directory self healing
It prints:
a) Subvolume name
b) Error --> Is needed because layout healing depends on
the error and having it in log will help in
debugging
c) Start Starting of the layout range
d) Stop Ending of the layout range
Change-Id: I48c9c697716a899165ed29b737362a75c62e09b3
BUG: 1113066
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/8173
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The function depends on the fact that if quota-deem-statfs option is enabled,
all of the subvolumes send their xdata with quota-deem-statfs flag ON. But,
this may not be true in case of errors in some of the subvolumes.
There is a decision/policy made which assumes quota-deem-statfs to be ON if at
least ONE of the subvolumes sends the flag ON. By this, df reports quota
modified statfs values if *at least ONE* of the bricks sends the
quota-deem-statfs flag ON. This can be visualized with the below "Transition
Diagram/State Machine".
Event: Each Quota deem statfs status from the individual bricks
Action: Decision taken on the calculation of the statvfs received
State: Whether quota deem statfs is ON or OFF (0: OFF, 1: ON)
Input: Event from individual bricks
___ ___
/ \ OFF* / \ (OFF|ON)*
| | | |
\ / ON \ /
-----> 0 ----------------> 1
The below Transition Function depicts the relation between the statfs
calculation based on the events received.
State Event action
-------------------------------------
OFF OFF OFF
OFF ON REPLACE
ON OFF NEGLECT
ON ON COMPARE
Change-Id: I0e8fb7d3945a3ca3dde0bb99de6cd397e27a3162
BUG: 1048786
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/6652
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Currently in the nameless lookup code path, if at the
end of the lookup, even if it detects that layout
anamolies are there, layout healing will not be done as
there is no code to heal it.
So there can be race between mkdir and lookup.
Assume mkdir is going on from some other mount point,
Say, M1. Directories are created on some nodes but layout
is not set yet.
Now from M2, nameless lookup goes, lookup will be success
full as the directory is present on some of the nodes, but
it won't heal layout. Now if create goes after lookup fop,
because layout is absent, file creation will fail.
Fix: Included the code of layout self-heal in the nameless
lookup path. At the end of lookup, layout will be computed
as it would have been in the named lookup, but it will be
set to those node only, where directory is present.
So after that if create fop goes, the probabiliy to get the
subvolume with proper hash-range is high now, so reduces
the race window.
Other: Whenever a directory is created, we have to choose a brick
from which we start allocating layout in a circular fashion.
To calculate this starting brick, I have changed the candidate
from name of the directory to gfid of the directory
But to compute where a given file belongs, we will still
use the name of the file. Hash computed from the name of the
file should belong to any one of the directory-hash-range
Calculation of hash for a file is acting as a consumer and the
setting of directory layout based on gfid is acting as a producer,
which are independent from each other.
Change-Id: I3808c55082cd1b5c72d2c77cbbc063f55aa38bee
BUG: 1095888
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/7493
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Change-Id: I11794eb2adceb88e75864aede450e904431a6273
BUG: 1095888
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/8049
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Moved all relevant DHT gf_log calls to the new logging
framework.
Change-Id: I3af3cfe0416e332774a6c4ff6a091d006c400af2
BUG: 1075611
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/7929
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
on single brick(non first up subvolume).
Problem: If snapshot is taken, when mkdir has succeeded only on
hashed_subvolume, then after restoring snapshot the directory
is not shown on mount point.
Why: dht_readdirp takes only those directory entries in to
account, which are present on first_up_subvolume. Hence, if the
"hashed subvolume" is not same as first_up_subvolume, it wont be listed
on mount point and also not healed.
Solution:
Case 1: (Rebalance not running)If hashed subvolume is NULL or down then
filter in first_up_subvolume. Other wise the corresponding hashed subvolume
will take care of the directory entry.
Case 2: If readdirp_optimize option is turned on then read from first_up_subvol
Change-Id: Idaad28f1c9f688dbfb1a8a3ab8b244510c02365e
BUG: 1092433
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/7599
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
git@forge.gluster.org:~schafdog/glusterfs-core/osx-glusterfs
Working functionality on MacOSX
- GlusterD (management daemon)
- GlusterCLI (management cli)
- GlusterFS FUSE (using OSXFUSE)
- GlusterNFS (without NLM - issues with rpc.statd)
Change-Id: I20193d3f8904388e47344e523b3787dbeab044ac
BUG: 1089172
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Signed-off-by: Dennis Schafroth <dennis@schafroth.com>
Tested-by: Harshavardhana <harsha@harshavardhana.net>
Tested-by: Dennis Schafroth <dennis@schafroth.com>
Reviewed-on: http://review.gluster.org/7503
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In setattr, the inode times may have been explicitly set "back
in time". In such cases, if the inode ctx times are not force
set, then they continue to be higher and continue serving the
higher/older value in future calls to dht_inode_ctx_time_update()
Change-Id: I9cbfa7cf7c4069b0106d1f462de08c5d59bc91b5
BUG: 1083324
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/7378
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
Tested-by: Harshavardhana <harsha@harshavardhana.net>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
Change-Id: I98da41342127b1690d887a5bc025e4c9dd504894
BUG: 1038452
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/6435
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shishir Gowda <gowda.shishir@gmail.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When clients refer to a GFID which does not exist, the errno to
be returned in ESTALE (and not ENOENT). Even though ENOENT might
look "proper" most of the time, as the application eventually expects
ENOENT even if a parent directory does not exist, not returning
ESTALE results in resolvers (FUSE and GFAPI) to not retry resolution
in uncached mode. This can result in spurious ENOENTs during
concurrent path modification operations.
Change-Id: I7a06ea6d6a191739f2e9c6e333a1969615e05936
BUG: 1032894
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6318
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
components other than distribute (like marker to exclude linkfiles
from being accounted) also need awareness of what constitutes a
linkfile. Hence its good to separate out this functionality into
core.
Change-Id: Ib944eeacc991bb1de464c9e73ee409fc7a689ff1
BUG: 1022995
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/6152
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
| |
BUG: 764655
Change-Id: I4d18c9a6c00cb4696645fcb437398562f00b9d24
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/6284
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glfs_zerofill() can be potentially called to zero-out entire file and
hence allow for bigger value of length parameter.
Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63
BUG: 1028673
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-on: http://review.gluster.org/6266
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com>
Tested-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* migrate all the fd's on an inode to newer subvol after rebalance
* use the migration in progress flag in inode, so all the operations
on the inode can make use of it
Change-Id: Ib807a46e927a1062688fc15119c916797c52a350
BUG: 1013456
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.org/5891
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
the specified range. This fop will be useful when a whole file needs to
be initialized with zero (could be useful for zero filled VM disk image
provisioning or during scrubbing of VM disk images).
Client/application can issue this FOP for zeroing out. Gluster server
will zero out required range of bytes ie server offloaded zeroing. In
the absence of this fop, client/application has to repetitively issue
write (zero) fop to the server, which is very inefficient method because
of the overheads involved in RPC calls and acknowledgements.
WRITESAME is a SCSI T10 command that takes a block of data as input and
writes the same data to other blocks and this write is handled
completely within the storage and hence is known as offload . Linux ,now
has support for SCSI WRITESAME command which is exposed to the user in
the form of BLKZEROOUT ioctl. BD Xlator can exploit BLKZEROOUT ioctl to
implement this fop. Thus zeroing out operations can be completely
offloaded to the storage device , making it highly efficient.
The fop takes two arguments offset and size. It zeroes out 'size' number
of bytes in an opened file starting from 'offset' position.
This patch adds zerofill support to the following areas:
- libglusterfs
- io-stats
- performance/md-cache,open-behind
- quota
- cluster/afr,dht,stripe
- rpc/xdr
- protocol/client,server
- io-threads
- marker
- storage/posix
- libgfapi
Client applications can exloit this fop by using glfs_zerofill introduced in
libgfapi.FUSE support to this fop has not been added as there is no system call
for this fop.
Changes from previous version 3:
* Removed redundant memory failure log messages
Changes from previous version 2:
* Rebased and fixed build error
Changes from previous version 1:
* Rebased for latest master
TODO :
* Add zerofill support to trace xlator
* Expose zerofill capability as part of gluster volume info
Here is a performance comparison of server offloaded zeofill vs zeroing
out using repeated writes.
[root@llmvm02 remote]# time ./offloaded aakash-test log 20
real 3m34.155s
user 0m0.018s
sys 0m0.040s
[root@llmvm02 remote]# time ./manually aakash-test log 20
real 4m23.043s
user 0m2.197s
sys 0m14.457s
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;
real 4m28.363s
user 0m0.021s
sys 0m0.025s
[root@llmvm02 remote]# time ./manually aakash-test log 25
real 5m34.278s
user 0m2.957s
sys 0m18.808s
The argument log is a file which we want to set for logging purpose and
the third argument is size in GB .
As we can see there is a performance improvement of around 20% with this
fop.
Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
BUG: 1028673
Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com>
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5327
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
|