With lock migration, requests must be sent to the destination brick
after migration. Once the source brick marks the lock structure as
already migrated, the requests are redirected to the destination
brick by dht_lk2/flush2.
Change-Id: I50b14011c5ab68c34826fb7ba7f8c8d42a68ad97
BUG: 1326085
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/13493
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Change-Id: I48c6f9cdda47503615ba65882acd5eedf0a70c89
BUG: 1326085
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/14024
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Problem: When the promote and demote threads are spawned, query files
are built, and the query file with index 0 is always picked first for
migration. This is unsuitable when the files in that query file are
too big to move in the first cycle; as a result, files in the other
query files always get missed. The query files need to be shuffled so
that the others also get a chance.
Fix: Remember the previous first query file and shift it by one index
before the migration starts.
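A minimal sketch of the rotation in C, assuming the query files are addressed by an index in [0, count) and that the starting index is kept between cycles. The names `struct query_rotation`, `next_start_index()` and `query_file_at()` are hypothetical and are not taken from the tier code.

```c
#include <stddef.h>

/* Hypothetical state kept across promote/demote cycles. */
struct query_rotation {
    size_t last_start; /* index of the query file processed first last cycle */
    int started;       /* 0 until the first cycle has run */
};

/* Return the index of the query file to process first in this cycle.
 * The first cycle starts at 0; every later cycle shifts the start by one,
 * wrapping around, so each query file eventually goes first. */
size_t next_start_index(struct query_rotation *rot, size_t count)
{
    if (count == 0)
        return 0;
    if (!rot->started) {
        rot->started = 1;
        rot->last_start = 0;
    } else {
        rot->last_start = (rot->last_start + 1) % count;
    }
    return rot->last_start;
}

/* Index of the query file visited at position `step` of a cycle that
 * begins at `start`. Precondition: count > 0. */
size_t query_file_at(size_t start, size_t step, size_t count)
{
    return (start + step) % count;
}
```

Starting from the rotated index and wrapping around means that every query file is processed first once every `count` cycles.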
Change-Id: I704947bcf4bab6b20b1179a6d9ae4a15a3d51bd9
BUG: 1330353
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/14068
Tested-by: Joseph Fernandes
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
When DHT has triggered a check for the completion of an ongoing
rebalance, there is a possible race between AFR marking a subvol
as non-readable and DHT updating the cached subvol. In this window,
a write can fail with EIO.
Change-Id: I42638e6d4104c0dbe893d1bc73e1366188458c5d
BUG: 1329503
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/14049
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Change-Id: I0bbc2c2ef115c78393f6570815a5b80316e7e4be
BUG: 1319992
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/11720
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
dht_mkdir ()
{
        first-hashed-subvol = hashed-subvol for "bname" in the in-memory
                              layout of "parent";
        inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any
                 subvol, but we choose first-hashed-subvol arbitrarily");
        {
        begin:
                hashed-subvol = hashed-subvol for "bname" in the in-memory
                                layout of "parent";
                hash-range = extract hash-range from layout of "parent";
                ret = mkdir (parent/bname, hashed-subvol, hash-range);
                if (ret == "hash-value doesn't fall into layout stored on
                           the brick (this error is returned by posix-mkdir)")
                {
                        refresh_parent_layout ();
                        goto begin;
                }
        }
        inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN",
                 "first-hashed-subvol");
        proceed with other parts of dht_mkdir;
}

posix_mkdir (parent/bname, client-hash-range)
{
        disk-hash-range = getxattr (parent, "dht-layout-key");
        if (disk-hash-range != client-hash-range) {
                fail-with-error ("hash-value doesn't fall into layout
                                 stored on the brick");
                return 0;
        }
        continue-with-posix-mkdir;
}
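The retry loop above can be modelled in plain C as a sketch. It assumes a layout is one hash range per subvol, that client and disk layouts have the same number of subvols, and that the brick compares the client's range with the one stored on disk; the inodelk taken around the loop is not modelled. All names (`struct layout`, `hashed_subvol()`, `brick_mkdir()`, `client_mkdir()`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each subvol owns one hash range [start, stop]. */
struct range { uint32_t start, stop; };

struct layout {
    size_t count;
    struct range ranges[4];
};

/* Index of the subvol whose range contains hash, or -1. */
int hashed_subvol(const struct layout *l, uint32_t hash)
{
    for (size_t i = 0; i < l->count; i++)
        if (hash >= l->ranges[i].start && hash <= l->ranges[i].stop)
            return (int)i;
    return -1;
}

/* Brick side (stand-in for posix_mkdir): refuse when the client's view of
 * this brick's range differs from the range stored on disk. */
bool brick_mkdir(const struct layout *disk, int subvol, struct range client)
{
    struct range d = disk->ranges[subvol];
    return d.start == client.start && d.stop == client.stop;
}

/* Client side: retry with a refreshed layout until the brick accepts.
 * Returns the subvol used, or -1 if no subvol accepted the mkdir. */
int client_mkdir(struct layout *cached, const struct layout *disk,
                 uint32_t hash, int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        int sv = hashed_subvol(cached, hash);
        if (sv < 0)
            return -1;
        if (brick_mkdir(disk, sv, cached->ranges[sv]))
            return sv;
        *cached = *disk; /* refresh_parent_layout () */
    }
    return -1;
}
```

With a stale cached layout, the first attempt is rejected by the brick, the layout is refreshed, and the second attempt goes to the subvol that owns the hash on disk.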
Similar changes need to be made for dentry operations such as create,
symlink, link, unlink, rmdir and rename. These will be addressed in
subsequent patches; this patch addresses only the mkdir codepath.
This change breaks the stripe tests, as on some striped subvols the DHT
layout xattrs are not set for some reason, which causes mkdir to fail.
Since striped volumes are always created with DHT, some tests
associated with stripe also fail. So I am making the following test
changes (since stripe is no longer maintained):
* modify ./tests/basic/rpc-coverage.t not to use striped volumes
* mark both tests in tests/bugs/stripe/ as bad tests
Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
BUG: 1323040
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/13885
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
While an rmdir has completed on some non-hashed subvols and is
proceeding to the others, a lookup selfheal can recreate the same
directory on the subvols where the rmdir had already succeeded. The
subsequent deletion of the parent directory then fails with ENOTEMPTY.
To fix this, take a blocking inodelk on the subvols before starting
the rmdir. Selfheal must also take a blocking inodelk before creating
the entry.
Change-Id: I168a195c35ac1230ba7124d3b0ca157755b3df96
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/13528
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
1. Spawn a thread for a background fix-layout for the tier process.
2. Once the fix-layout is completed, a marker xattr is set on the root of
the volume to mark the completion of the background fix-layout, so that
even if the tier process is spawned again, fix-layout will not be
issued if it was completed last time.
3. Note that promotion of legacy files will happen eventually, as
the CTR lookup heal in the fix-layout slowly heals the CTR DB for legacy
files, or as the CTR lookup heal happens due to a name lookup.
4. When a detach tier succeeds in evacuating data from the hot tier, the
marker xattr is removed, so that the next attach tier runs the background
tier fix-layout.
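A sketch of the marker check, assuming a simple boolean in place of the real marker xattr on the volume root and running fix-layout synchronously instead of in a background thread. `struct root_state`, `tier_start()` and `tier_detach_evacuated()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state kept on the volume root. */
struct root_state {
    bool fixlayout_done_marker; /* marker xattr present on the root */
    int fixlayout_runs;         /* how many times fix-layout actually ran */
};

static void run_fix_layout(struct root_state *root)
{
    root->fixlayout_runs++; /* crawl and fix layouts (elided) */
}

/* Tier process start: run fix-layout only when the marker is absent,
 * and set the marker once fix-layout has completed. */
void tier_start(struct root_state *root)
{
    if (root->fixlayout_done_marker)
        return;
    run_fix_layout(root);
    root->fixlayout_done_marker = true;
}

/* Detach tier has evacuated the hot tier: clear the marker so that the
 * next attach tier triggers a fresh background fix-layout. */
void tier_detach_evacuated(struct root_state *root)
{
    root->fixlayout_done_marker = false;
}
```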
What is remaining?
1. Instead of clearing the tiering fix-layout marker xattr at the end of detach start,
clear it during detach commit. The issue is that detach commit is a glusterd operation
and the volume is not mounted in glusterd.
The reason we want to do it in detach commit is that if the admin wants to attach the
same tier again, a background fix-layout will be triggered, which would not be needed.
2. Clear the CTR DB of the cold bricks on detach commit, as it will contain
entries that become stale when the volume is used with CTR off (CTR is switched off only
on detach commit).
Change-Id: Ibe343572e95865325cd0eef4d0b976b626a3c0c5
BUG: 1313228
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/13491
Smoke: Gluster Build System <jenkins@build.gluster.com>
Tested-by: Joseph Fernandes
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Directory size is meaningless. Every filesystem has its own
unpredictable way of increasing or decreasing it, based on internal data
structures and even transient conditions. Some filesystems (e.g. ext4)
never decrease it at all. Others (e.g. btrfs) don't even report it.
Very few programs look at it, and those that do are broken.
Unfortunately, one such program is GNU tar, which will complain when it
sees different values because at different times we got the value from
different DHT subvolumes. To avoid such problems, just report a
constant value.
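A minimal sketch of the idea using POSIX `struct stat`: directory sizes reported by any subvolume are overwritten with one constant, while regular files are left untouched. The constant 4096 and the name `dht_normalize_dir_stat()` are assumptions for illustration; GlusterFS works with its own iatt structure rather than `struct stat`.

```c
#include <sys/stat.h>

/* Assumed constant reported for every directory. */
#define DHT_DIR_STAT_SIZE 4096

/* Normalise a stat result gathered from one DHT subvolume: for
 * directories, replace the subvolume-specific size with a fixed value
 * so that every subvolume reports the same number to tools like tar. */
void dht_normalize_dir_stat(struct stat *st)
{
    if (S_ISDIR(st->st_mode))
        st->st_size = DHT_DIR_STAT_SIZE;
}
```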
Change-Id: Id64ce917c75b5f7ff50cb55b6e997f3b3556e7e3
BUG: 1302948
Original-author: Shyam <srangana@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13770
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
This fix adds a parameter "tier-max_promote_size" to control whether
a file is migrated based on its size. By default the value
is 0, meaning all files are migrated. If it is set to a non-zero
value, files larger than this value won't be moved
in tiered volumes.
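The size check reduces to a small predicate. This sketch assumes sizes in bytes and treats a file exactly at the limit as eligible; `tier_size_allows_migration()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* max_promote_size mirrors the tier-max_promote_size option:
 * 0 means "no limit", otherwise larger files are not migrated. */
bool tier_size_allows_migration(uint64_t file_size, uint64_t max_promote_size)
{
    if (max_promote_size == 0)
        return true;                      /* default: migrate every file */
    return file_size <= max_promote_size; /* larger files stay where they are */
}
```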
Change-Id: Ia6b88e9b2508935bef500d956f9192e59670fe00
BUG: 1313495
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13570
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
cli/src/cli-cmd-parser.c (chenk)
cli/src/cli-xml-output.c (spandit)
cli/src/cli.c (chenk)
libglusterfs/src/common-utils.c (vmallika)
libglusterfs/src/gfdb/gfdb_sqlite3.c (jfernand +1)
rpc/rpc-transport/socket/src/socket.c (?)
xlators/cluster/afr/src/afr-transaction.c (?)
xlators/cluster/dht/src/dht-common.h (srangana +2)
xlators/cluster/dht/src/dht-selfheal.c (srangana +2)
xlators/debug/io-stats/src/io-stats.c (R. Wareing)
xlators/features/barrier/src/barrier.c (vshastry)
xlators/features/bit-rot/src/bitd/bit-rot-scrub.h (vshankar +1)
xlators/features/shard/src/shard.c (kdhananj +1)
xlators/mgmt/glusterd/src/glusterd-ganesha.c (skoduri)
xlators/mgmt/glusterd/src/glusterd-handler.c (atinmu)
xlators/mgmt/glusterd/src/glusterd-op-sm.h (atinmu)
xlators/mgmt/glusterd/src/glusterd-snapshot.c (spandit)
xlators/mgmt/glusterd/src/glusterd-syncop.c (atinmu)
xlators/mgmt/glusterd/src/glusterd-volgen.c (atinmu)
xlators/protocol/client/src/client-messages.h (mselvaga +1)
xlators/storage/bd/src/bd-helper.c (M. Mohan Kumar)
xlators/storage/bd/src/bd.c (M. Mohan Kumar)
xlators/storage/posix/src/posix.c (nbalacha +1)
Change-Id: I85934fbcaf485932136ef3acd206f6ebecde61dd
BUG: 1293133
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/13031
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
A query to the database may take a long time if the database
has many entries. The tier daemon also sends IPC calls to the
bricks, which can run slowly, especially on RHEL6. While it is
possible to track down each such instance, the snapshot
feature should not be affected by database operations; it only
requires that no migration be underway. Therefore it is okay to pause
tiering at any time except when DHT is moving a file. This fix implements
this strategy by monitoring when control passes to DHT to
migrate a file using the GF_XATTR_FILE_MIGRATE_KEY trigger. If control
is not in DHT, the pause operation succeeds.
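A sketch of the pause rule with C11 atomics, assuming a counter of files that DHT is currently migrating for the tier daemon (the requests triggered with GF_XATTR_FILE_MIGRATE_KEY) and a paused flag that stops new migrations from starting. The text does not say what happens when a file is being moved; this sketch simply reports failure in that case. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool tier_paused;       /* set while tiering is paused */
static atomic_int migrations_in_dht;  /* files DHT is moving right now */

/* Called before handing a file to DHT for migration.
 * Returns false (and does not migrate) if tiering is paused. */
bool tier_migration_begin(void)
{
    atomic_fetch_add(&migrations_in_dht, 1);
    if (atomic_load(&tier_paused)) {
        atomic_fetch_sub(&migrations_in_dht, 1); /* back out */
        return false;
    }
    return true;
}

void tier_migration_end(void)
{
    atomic_fetch_sub(&migrations_in_dht, 1);
}

/* Pause succeeds whenever control is not inside DHT moving a file.
 * Sequentially consistent atomics guarantee that either the pause sees
 * the in-flight migration or the migration sees the paused flag. */
bool tier_try_pause(void)
{
    atomic_store(&tier_paused, true);
    if (atomic_load(&migrations_in_dht) != 0) {
        atomic_store(&tier_paused, false);
        return false;
    }
    return true;
}

void tier_resume(void)
{
    atomic_store(&tier_paused, false);
}
```

Database queries and brick IPC never touch the counter, so they never block a pause.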
Change-Id: I21f168b1bd424077ad5f38cf82f794060a1fabf6
BUG: 1287842
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13104
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
What:
If dht_open is called on a migrating file after the inode_ctx is set,
subsequent FOPs on that fd do not open the fd on the dst subvol.
This is seen when the open-ftruncate-close sequence is repeatedly
called on a migrating file.
A second call to the sequence described above causes dht_truncate_cbk
to call dht_truncate2 as the dht_inode_ctx was already set by the first
call. As dht_rebalance_in_progress_check is not called, the fd is not
opened on the dst subvol.
On a distributed-replicate volume, this causes AFR to
open the fd using afr_fix_open, but with the wrong flags, causing
posix_ftruncate to fail with EINVAL.
The fix: We require fd-specific information to make a decision while
handling migrating files.
Set the fd_ctx to indicate the fd has been opened on the dst subvol
and check if it has been set while processing Phase1/Phase2 checks
in the FOP callback functions.
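A sketch of the per-fd decision, assuming the fd context records the subvol on which the fd has already been opened (-1 for none). `struct dht_fd_state`, `need_open_on_dst()` and `mark_opened_on_dst()` are hypothetical names; the real fd_ctx handling in DHT is more involved.

```c
#include <stdbool.h>

/* Hypothetical per-fd state stored in the fd_ctx. */
struct dht_fd_state {
    int opened_on_dst; /* subvol id the fd is open on, or -1 */
};

/* In a FOP callback that detected migration to dst_subvol: the inode ctx
 * may already have been set through another fd, so decide per fd. */
bool need_open_on_dst(const struct dht_fd_state *st, int dst_subvol)
{
    return st->opened_on_dst != dst_subvol;
}

/* Record that this fd has been opened on dst_subvol. */
void mark_opened_on_dst(struct dht_fd_state *st, int dst_subvol)
{
    st->opened_on_dst = dst_subvol;
}
```

In the open-ftruncate-close scenario above, the second sequence uses a new fd whose state is still -1, so it is opened on the destination even though the inode ctx was set by the first sequence.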
Change-Id: I43cdcd8017b4a11e18afdd210469de7cd9a5ef14
BUG: 1284823
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12985
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
We had been calling sleep() in the pause tier callback. Blocking within
a synctask is dangerous: sleep() does not inform the synctask
scheduler that the thread is no longer running, so the scheduler
still believes it is running. If a second synctask already
exists, it may not be able to run. This occurs if the thread
limit in the pool has been reached.
Note that the pool size only grows when a synctask is created, not
when it is moved from the wait state to the run state, as is the case
when a FOP completes. When the tier is paused during migration,
synctasks waiting for responses to FOPs sent to the server already
exist with high probability.
The fix is to yield() in the RPC callback, which will place
the synctask into the wait queue and free up a thread for the
FOP callback. A timer wakes the callback after sufficient
time has elapsed for the pause to occur.
Change-Id: I6a947ee04c6e5649946cb6d8207ba17263a67fc6
BUG: 1267950
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12987
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Files deleted during promotion were not being deleted, as the
files are moving from the hashed to a non-hashed subvol.
On deleting a file that is undergoing promotion,
the unlink call is not sent to the dst file because the
hashed subvol == cached subvol. This causes
the file to reappear once the migration is complete.
This patch also fixes a problem with stale linkfile
deletion.
Change-Id: I4b02a498218c9d8eeaa4556fa4219e91e7fa71e5
BUG: 1282390
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12829
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
glusterd occasionally loads the shared libraries of translators. This
failed for tiering because of a reference to dht_methods, which was
defined as an unnecessary global variable.
The global variable has been removed; dht_methods is now a member of
dht_conf and is initialised in the *_init calls.
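A trimmed-down sketch of the pattern: the methods table is a member of the per-instance configuration, and each translator's init fills in its own function pointers, so no global symbol has to be resolved across shared objects. The member `migration_get_dst_subvol` and both implementations are placeholders for illustration.

```c
/* Hypothetical, trimmed-down methods table. */
struct dht_methods {
    int (*migration_get_dst_subvol)(int cached_subvol);
};

/* The methods now live in the per-instance configuration. */
struct dht_conf {
    struct dht_methods methods;
};

/* Placeholder implementations; real ones would inspect layouts/xattrs. */
static int dht_pick_dst(int cached_subvol) { return cached_subvol + 1; }
static int tier_pick_dst(int cached_subvol) { return cached_subvol == 0 ? 1 : 0; }

/* Called from the DHT translator's init. */
void dht_init_methods(struct dht_conf *conf)
{
    conf->methods.migration_get_dst_subvol = dht_pick_dst;
}

/* Called from the tier translator's init. */
void tier_init_methods(struct dht_conf *conf)
{
    conf->methods.migration_get_dst_subvol = tier_pick_dst;
}
```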
Change-Id: Ifa0a21e3962b5cd8d9b927ef1d087d3b25312953
BUG: 1287842
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12863
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
After a successful nameless lookup, if the directory is not
present on any of the subvols, we get the path of
the directory and recursively send a named lookup on
each parent directory.
This helps particularly in scenarios such as add-brick
and attach-tier.
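A sketch of the path walk in C, assuming an absolute path and that a named lookup is sent on each ancestor from the top down so that parents exist before their children. `list_ancestors()` and `ANCESTOR_MAX_LEN` are hypothetical; the lookups themselves are not shown.

```c
#include <string.h>

#define ANCESTOR_MAX_LEN 256

/* Fill out[] with each ancestor of a path, top down:
 * "/a/b/c" -> "/a", "/a/b", "/a/b/c". A named lookup would be sent on
 * each entry in this order. Returns the number of entries, or -1 if the
 * path or the output array is too small. */
int list_ancestors(const char *path, char out[][ANCESTOR_MAX_LEN], int max)
{
    size_t len = strlen(path);
    int count = 0;

    if (len >= ANCESTOR_MAX_LEN)
        return -1;

    for (size_t i = 1; i <= len; i++) {
        if (path[i] != '/' && path[i] != '\0')
            continue;
        if (path[i - 1] == '/')
            continue;                /* skip empty components ("//", trailing "/") */
        if (count == max)
            return -1;
        memcpy(out[count], path, i); /* prefix before path[i] */
        out[count][i] = '\0';
        count++;
    }
    return count;
}
```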
Change-Id: I64c2118a5ab03bbaa59b0dfc62babdf4472a92a3
BUG: 1272949
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12376
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Snaps of tiered volumes cannot handle files undergoing migration.
We implement a helper mechanism to "pause" migration. Any files
undergoing migration are aborted. Cleanup is done to remove
sticky bits and data at the destination. Migration is restarted
after the snap completes.
For testing, an internal switch is added. It is not exposed externally.
gluster volume set vol1 tier-pause [true|false]
Change-Id: Ia85bbf89ac142e9b7e73fcbef98bb9da86097799
BUG: 1267950
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12304
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
This fix introduces infrastructure to support different
policies for promotion and demotion.
Currently the tier feature automatically promotes and demotes
files periodically based on access. This is good for testing
but too stringent for most real workloads. It makes it
difficult to fully utilize a hot tier: data will be demoted
before it is touched, since it's unlikely that all the data on a
100GB hot SSD will be touched within a window of time.
A new parameter "mode" allows the user to pick promotion/demotion
policies.
The "test mode" will be used for *.t and other general testing.
This is the current mechanism.
The "cache mode" introduces watermarks. The watermarks
represent levels of data residing on the hot tier.
"cache mode" policy:
The % the hot tier is full is called P.
Do not promote or demote more than D MB or F files.
A random number [0-100] is called R.
Rules for migration:
if (P < watermark_low) don't demote, always promote.
if (P >= watermark_low) && (P < watermark_hi) demote if R < P; promote if R > P.
if (P >= watermark_hi) always demote, don't promote.
gluster volume set {vol} cluster.watermark-hi %
gluster volume set {vol} cluster.watermark-low %
gluster volume set {vol} cluster.tier-max-mb {D}
gluster volume set {vol} cluster.tier-max-files {F}
gluster volume set {vol} cluster.tier-mode {test|cache}
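The cache-mode rules can be sketched as a pure function, assuming P and R are integer percentages and R is drawn by the caller so the decision is deterministic for a given draw. This sketch treats P equal to watermark_hi as above the high watermark and does not model the D MB / F files limits. The enum and function names are hypothetical.

```c
#include <stdbool.h>

enum tier_wm { TIER_WM_LOW, TIER_WM_MID, TIER_WM_HI };

/* Classify how full the hot tier is (percent_full is P). */
enum tier_wm tier_watermark(int percent_full, int wm_low, int wm_hi)
{
    if (percent_full < wm_low)
        return TIER_WM_LOW;
    if (percent_full < wm_hi)
        return TIER_WM_MID;
    return TIER_WM_HI;
}

/* Apply the cache-mode rules; rnd is R, a random number in [0, 100]. */
bool tier_should_migrate(bool promote, int percent_full, int wm_low,
                         int wm_hi, int rnd)
{
    switch (tier_watermark(percent_full, wm_low, wm_hi)) {
    case TIER_WM_LOW:
        return promote;          /* always promote, never demote */
    case TIER_WM_MID:
        return promote ? rnd > percent_full : rnd < percent_full;
    case TIER_WM_HI:
    default:
        return !promote;         /* always demote, never promote */
    }
}
```

Between the watermarks, the fuller the hot tier is, the more likely a demotion and the less likely a promotion.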
Change-Id: I157f19667ec95aa1d53406041c1e3b073be127c2
BUG: 1257911
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12039
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Determine which DHT level is responsible for
handling fops on a file undergoing migration based
on the name of the linkto xattr set on the file
being migrated, and process accordingly.
Change-Id: I82772e39314d4fe7f2ba0dcf22de0c6a374ee139
BUG: 1254428
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12090
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
lookup selfheal race
Locking on all subvols before an rmdir is unable to remove all
directory entries. Hence reverting the patch for now.
Change-Id: I31baf2b2fa2f62c57429cd44f3f229c35eff1939
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/12125
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
While an rmdir has completed on some non-hashed subvols and is
proceeding to the others, a lookup selfheal can recreate the same
directory on the subvols where the rmdir had succeeded. The fix is to
take a blocking inodelk on the subvols before starting the rmdir.
Since selfheal requires a lock on all subvols, if an rmdir
is in progress, acquiring the locks will fail, and vice versa.
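A sketch of the all-or-nothing locking described above, using one POSIX mutex per subvol as a stand-in for the inodelk on the directory. It uses the non-blocking `pthread_mutex_trylock()` to show the "acquiring the locks will fail" behaviour and releases any locks already taken on failure; the helper names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Try to lock every subvol. On the first failure, release the locks
 * already taken, so rmdir and selfheal can never both hold a full set. */
bool lock_all_subvols(pthread_mutex_t *locks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pthread_mutex_trylock(&locks[i]) != 0) {
            while (i-- > 0)
                pthread_mutex_unlock(&locks[i]);
            return false;
        }
    }
    return true;
}

void unlock_all_subvols(pthread_mutex_t *locks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pthread_mutex_unlock(&locks[i]);
}
```

Releasing partial acquisitions matters: if a failed attempt kept the locks it had, the other operation could never complete its own set.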
Change-Id: I841a44758c3b88f5e04d1cb73ad36e0cac9fdabb
BUG: 1245065
Signed-off-by: Sakshi <sabansal@redhat.com>
Reviewed-on: http://review.gluster.org/11725
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Change-Id: I8c39ce38e257758e27e11ccaaff4798138203e0c
BUG: 1256243
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/11998
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Problem:
Between remove-brick start and commit, the client layout
may not be in sync with the disk layout because of a lack of lookups.
Hence, a create call may land on the decommissioned brick.
Solution:
Acquire a lock on the hashed subvol so that a fix-layout or
selfheal cannot modify the layout while it is being read.
Even if we read a layout before the remove-brick fix-layout and the
file falls on the decommissioned brick, the file should be
migrated to a new brick as per the fix-layout.
Change-Id: If84a12ec34f981adb2b9b224e80f535cfe5bf9f2
BUG: 1232378
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/11260
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
information.
Without refcounting, we might free up memory while other fops are
still accessing it.
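A generic refcounting sketch with C11 atomics, assuming a heap-allocated object shared between fops: each user takes a reference, and the memory is freed only when the last reference is dropped. The object's contents are left generic, and `struct shared_info`, `info_new()`, `info_ref()` and `info_unref()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted holder for information shared between fops. */
struct shared_info {
    atomic_int refcount;
    int payload;
};

struct shared_info *info_new(int payload)
{
    struct shared_info *info = malloc(sizeof(*info));
    if (!info)
        return NULL;
    atomic_init(&info->refcount, 1); /* the creator holds the first ref */
    info->payload = payload;
    return info;
}

/* A fop takes a reference before using the object. */
struct shared_info *info_ref(struct shared_info *info)
{
    atomic_fetch_add(&info->refcount, 1);
    return info;
}

/* Drop a reference; returns 1 if this call freed the object, 0 otherwise. */
int info_unref(struct shared_info *info)
{
    if (atomic_fetch_sub(&info->refcount, 1) == 1) {
        free(info);
        return 1;
    }
    return 0;
}
```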
BUG: 1235927
Change-Id: Ia4fa4a651cd6fe2394a0c20cef83c8d2cbc8750f
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/11418
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Problem:
When a file is renamed and the renamed file hashes to a different
brick, DHT creates a special file (linkto file) in that brick
(the hashed subvolume) and carries out a setattr operation
on that file.
Currently, changelog records this setattr operation in the hashed
subvolume, and glusterfind in turn records it
as a MODIFY operation.
So there is a NEW entry in the cached subvolume and a MODIFY entry
in the hashed subvolume for the same file.
Solution:
Avoid logging this setattr operation by
marking it as an internal fop using xdata.
In the changelog translator, check whether the setattr is marked
as an internal fop and skip it accordingly.
Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd
BUG: 1230007
Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
Reviewed-on: http://review.gluster.org/11137
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Stashing additional information in the inode_ctx to help
decide whether the migration information is stale, which could
happen if a file was migrated several times but FOPs only detected
the P1 migration phase. If no FOP detects the P2 phase, the inode
ctx1 is never reset.
We now save the src subvol as well as the dst subvol in the
inode ctx. The src subvol is the subvol on which the FOP was sent
when the mig info was set in the inode ctx. This information is
considered stale if:
1. The subvol on which the current FOP is sent is the same as
the dst subvol in the ctx
2. The subvol on which the current FOP is sent is not the same
as the src subvol in the ctx
This does not handle the case where the same file might have been
renamed such that the src subvol is the same but the dst subvol
is different. However, that is unlikely to happen very often.
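The two staleness rules translate directly into a predicate. This sketch identifies subvols by integer ids; `struct mig_info` and `mig_info_is_stale()` are hypothetical names, not the fields actually stored in the inode ctx.

```c
#include <stdbool.h>

/* Hypothetical migration info stashed in the inode ctx. */
struct mig_info {
    int src_subvol; /* subvol the FOP was sent to when the info was set */
    int dst_subvol; /* subvol the file was migrating to */
};

/* Apply rules 1 and 2 to the subvol the current FOP is sent to. */
bool mig_info_is_stale(const struct mig_info *mi, int current_subvol)
{
    if (current_subvol == mi->dst_subvol)
        return true;  /* rule 1: FOP already goes to the recorded dst */
    if (current_subvol != mi->src_subvol)
        return true;  /* rule 2: FOP does not go to the recorded src */
    return false;
}
```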
Change-Id: I05a2e9b107ee64750c7ca629aee03b03a02ef75f
BUG: 1142423
Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/10834
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
The destination subvol used in the fop2 variants is either stored in
inode-ctx1 or local->cached_subvol. However, it is not guaranteed that
a value stored in these locations before invocation of fop2 is still
present after the invocation as these locations are shared among
different concurrent operations. So, to preserve the atomicity of
"check dst-subvol and invoke fop2 variant if dst-subvol found", we
pass down the dst-subvol to fop2 variant.
This patch also fixes error handling in some fop2 variants.
Change-Id: Icc226228a246d3f223e3463519736c4495b364d2
BUG: 1142423
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/10943
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Currently, with commit 4eaaf5, a mixed-version cluster would
have issues if lookup-unhashed is set to auto, as older clients
would fail to validate the layouts if newer clients (i.e. 3.7 or
upwards) create directories. Also, in a mixed-version cluster the
rebalance daemon would set the commit hash for some subvolumes and
not for others.
This commit fixes this problem by moving the enabling of the
functionality introduced in the above-mentioned commit to a
new DHT option. This option has an op_version of 3_7_1,
thereby preventing it from being set in a mixed-version
cluster. It brings in the following changes:
- The option can be set only if the minimum version of the cluster
is 3.7.1 or more
- Rebalance and mkdir update the layout with the commit hashes
only if this option is set, hence ensuring that rebalance works in a
mixed-version cluster and that directories created by newer
clients do not cause layout errors when read by older clients
- This option also supersedes lookup-unhashed, making the enabling
of the lookup optimization more deterministic and avoiding conflicts
with lookup-unhashed settings.
The option added is cluster.lookup-optimize, which is a boolean.
Usage: # gluster volume set VOLNAME cluster.lookup-optimize on
Change-Id: Ifd1d4ce3f6438fcbcd60ffbfdbfb647355ea1ae0
BUG: 1222126
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/10797
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Kaushal M <kaushal@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
Instead of including config.h in each file, have config.h
included from the compiler command line (-include option).
When a .c file tested for a certain #define and config.h had not been
included, incorrect assumptions were made. With this change, that
cannot happen again.
BUG: 1222319
Change-Id: I4f9097b8740b81ecfe8b218d52ca50361f74cb64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10808
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
The key concept here is to determine whether a directory is "clean" by
comparing its last-known-good topology to the current one for the
volume. These are stored as "commit hashes" on the directory and the
volume root respectively. The volume's commit hash changes whenever a
brick is added or removed, and a fix-layout is done. A directory's
commit hash changes only when a full rebalance (not just fix-layout)
is done on it. If all bricks are present and have a directory
commit hash that matches the volume commit hash, then we can assume
that every file is in its "proper" place. Therefore, if we look for
a file in that proper place and don't find it, we can assume it's not
on any other subvolume and *safely* skip the global (broadcast to all)
lookup.
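A sketch of the "clean directory" test, assuming each subvol reports whether the directory exists there and which commit hash it carries. Only when every copy matches the volume commit hash can a miss on the hashed subvol skip the broadcast lookup. The types and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-subvol view of a directory. */
struct dir_commit {
    bool present;         /* directory exists on this subvol */
    uint32_t commit_hash; /* commit hash stored on this copy */
};

/* Clean: present on every subvol, each with the volume's commit hash. */
bool dir_is_clean(const struct dir_commit *dirs, size_t n, uint32_t vol_hash)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++)
        if (!dirs[i].present || dirs[i].commit_hash != vol_hash)
            return false;
    return true;
}

/* After a miss on the hashed subvol: is a global lookup still needed? */
bool need_global_lookup(const struct dir_commit *dirs, size_t n,
                        uint32_t vol_hash)
{
    return !dir_is_clean(dirs, n, vol_hash);
}
```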
Change-Id: Id6ce4593ba1f7daffa74cfab591cb45960629ae3
BUG: 1219637
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/7702
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
The throttle value is "normal" by default. To throttle down,
a thread is put to sleep; to throttle up,
gf_defrag_process_dir wakes up the sleeping threads.
Change-Id: I74d530e3effd6e60e6eec81ccc8ff65789fa9c13
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/10526
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
In the patch http://review.gluster.org/#/c/9657
the client pid set by tiering migration was getting
overwritten in dht_start_rebalance_task(). This patch sets it
in dht_setxattr() before calling dht_start_rebalance_task()
and removes it from dht_start_rebalance_task().
Change-Id: I37cfa111f83a4e5d498042575c93799f60b49870
BUG: 1217937
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/10502
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Susant Palai <spalai@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
The current patch address two part of the design proposed.
1. Rebalance multiple files in parallel
2. Crawl only bricks that belong to the current node
Brief design explanation for the above two points.
1. Rebalance multiple files in parallel:
-------------------------------------
The existing rebalance engine is single threaded. Hence, introduced
multiple threads which will be running parallel to the crawler. The
current rebalance migration is converted to a "Producer-Consumer"
frame work.
Where Producer is : Crawler
Consumer is : Migrating Threads
Crawler: Crawler is the main thread. The job of the crawler is now
limited to fix-layout of each directory and add the files which are
eligible for the migration to a global queue in a round robin manner
so that we will use all the disk resources efficiently. Hence, the
crawler will not be "blocked" by migration process.
Consumer: The consumers (migration threads) monitor the global queue.
When a file is added to the queue, a consumer dequeues that entry and
migrates the file. Currently 20 migration threads are spawned at the
beginning of the rebalance process, so multiple files are migrated in
parallel.
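The producer-consumer split above can be sketched with a POSIX mutex and condition variable. This is a minimal illustration, not the GlusterFS implementation: the names (`mig_queue`, `migrator`, `run_rebalance`), the unbounded linked-list queue and the thread count of 4 are assumptions, the actual file migration is left as a comment, and round-robin placement across subvolumes is omitted.

```c
#include <pthread.h>
#include <stdlib.h>

#define NR_MIGRATORS 4 /* the patch spawns 20; 4 keeps the sketch small */

struct mig_entry {
    const char       *path;
    struct mig_entry *next;
};

struct mig_queue {
    pthread_mutex_t   lock;
    pthread_cond_t    cond;
    struct mig_entry *head, *tail;
    int               crawl_done; /* set once the crawler has finished */
    int               migrated;   /* entries processed by consumers */
};

/* Producer side: append one eligible file and wake a consumer. */
static void mig_queue_push(struct mig_queue *q, const char *path)
{
    struct mig_entry *e = malloc(sizeof(*e));
    if (!e)
        return; /* a real crawler would record the failure */
    e->path = path;
    e->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer: wait for work; exit once the queue is empty and crawl is done. */
static void *migrator(void *arg)
{
    struct mig_queue *q = arg;

    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (!q->head && !q->crawl_done)
            pthread_cond_wait(&q->cond, &q->lock);
        struct mig_entry *e = q->head;
        if (!e) {
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->head = e->next;
        if (!q->head)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        /* migrate_file(e->path) would run here, outside the lock */
        free(e);

        pthread_mutex_lock(&q->lock);
        q->migrated++;
        pthread_mutex_unlock(&q->lock);
    }
}

/* Crawler: start consumers, enqueue files, signal end of crawl, wait. */
static int run_rebalance(const char **files, int nfiles)
{
    struct mig_queue q = { .head = NULL, .tail = NULL };
    pthread_t tids[NR_MIGRATORS];
    int started = 0;

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.cond, NULL);
    for (int i = 0; i < NR_MIGRATORS; i++)
        if (pthread_create(&tids[started], NULL, migrator, &q) == 0)
            started++;

    if (started > 0)
        for (int i = 0; i < nfiles; i++)
            mig_queue_push(&q, files[i]);

    pthread_mutex_lock(&q.lock);
    q.crawl_done = 1;
    pthread_cond_broadcast(&q.cond);
    pthread_mutex_unlock(&q.lock);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&q.cond);
    pthread_mutex_destroy(&q.lock);
    return q.migrated;
}
```

`run_rebalance(files, n)` returns the number of entries the consumers processed (link with `-pthread`). Broadcasting only after the crawl is complete lets idle consumers exit cleanly instead of polling the queue.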
2. Crawl only bricks that belong to the current node:
--------------------------------------------------
Since a rebalance process is spawned per node, it migrates only the
files that belong to its own node, for the sake of load balancing.
However, it still reads entries from the whole cluster, which is
unnecessary because readdir then hits other nodes.
New Design:
As part of the new design, the rebalancer determines which subvols are
local to the rebalancer node by checking the node-uuid of the root
directory before the crawler starts. Hence, readdir no longer hits the
whole cluster, since the rebalancer already knows its local subvols, and
the node-uuid request for each file can also be avoided. This makes the
rebalance process "more scalable".
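A simplified sketch of the local-subvolume check, assuming each subvolume reports exactly one node-uuid for the root directory (a replicated subvolume can report several, which this ignores). `subvol_info` and `find_local_subvols` are hypothetical names, not GlusterFS APIs.

```c
#include <string.h>

/* Hypothetical record: subvolume name plus the node-uuid it reported
 * for the root directory. */
struct subvol_info {
    const char *name;
    const char *node_uuid;
};

/* Store the indices of subvolumes local to this node in local_idx and
 * return how many were found (at most max_local). */
static int find_local_subvols(const struct subvol_info *subvols, int n,
                              const char *my_uuid, int *local_idx,
                              int max_local)
{
    int count = 0;

    for (int i = 0; i < n && count < max_local; i++) {
        if (strcmp(subvols[i].node_uuid, my_uuid) == 0)
            local_idx[count++] = i;
    }
    return count;
}
```

The crawler would then issue readdir only to the subvolumes listed in `local_idx`, which is what removes the per-file node-uuid request.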
Change-Id: I73ed6ff807adea15086eabbb8d9883e88571ebc1
BUG: 1171954
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/9657
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
| |
These commands work in a manner analogous to rebalancing when removing a
brick. The existing migration daemon detects "detach start" and switches
to moving data off the hot tier. While in this state, all lookups are
directed to the cold tier.
gluster v detach-tier <vol> start
gluster v detach-tier <vol> commit
The status and stop CLI commands will be submitted separately.
Change-Id: I24fda5cc3ba74f5fb8aa9a3234ad51f18b80a8a0
BUG: 1205540
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: root <root@localhost.localdomain>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/10108
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: NetBSD Build System
| |
Change-Id: I948f85cb369206ee8ce8b8cd5e48cae9adb971c9
BUG: 1075417
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-on: http://review.gluster.org/9529
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
| |
- Changed the implementation of marker xattr handling to take just a
  function that populates the important data that differs from the
  default 'gauge' values, along with the subvolumes to which the call
  needs to be wound.
- Removed duplicate code found while reading the code and moved it into
  cluster_marker_unwind. Removed unused structure members.
- Changed the dht/afr/stripe implementations to follow the new scheme.
- Implemented marker xattr handling for ec.
Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4
BUG: 1200372
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9892
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
The tier translator shares most of DHT's code. It differs in how
subvolumes are chosen for I/Os and how file migration (cache promotion
and demotion) is managed. That differing functionality is dispatched to
either DHT or tier logic through the "tier_methods" structure.
A cache promotion and demotion thread is created in a manner
similar to the rebalance daemon. The thread operates a timing
wheel that periodically checks for promotion and demotion candidates
(files). Candidates are queued and then migrated. Candidates must exist on
the same node as the daemon and meet other criteria per caching policies.
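A minimal sketch of the two checks mentioned above: which scan is due on a given tick, and whether a candidate is eligible for promotion. It uses a plain modulo-based period check rather than a real timing wheel, and the periods, the `hits` counter and the threshold are hypothetical stand-ins for the actual caching policies; `tier_due` and `promote_eligible` are not GlusterFS functions.

```c
#include <stdbool.h>
#include <string.h>

enum { DO_PROMOTE = 1, DO_DEMOTE = 2 };

/* Which scans are due on this tick of the daemon's timer.
 * A period of 0 disables that scan. */
static int tier_due(unsigned long tick, unsigned long promote_period,
                    unsigned long demote_period)
{
    int work = 0;

    if (promote_period && tick % promote_period == 0)
        work |= DO_PROMOTE;
    if (demote_period && tick % demote_period == 0)
        work |= DO_DEMOTE;
    return work;
}

/* Hypothetical candidate returned by a query of the file database. */
struct candidate {
    const char *path;
    const char *node_uuid; /* node that holds the file */
    unsigned    hits;      /* accesses in the last cycle (assumed metric) */
};

/* Eligible only if the file lives on this daemon's node and is "hot". */
static bool promote_eligible(const struct candidate *c, const char *my_uuid,
                             unsigned min_hits)
{
    return strcmp(c->node_uuid, my_uuid) == 0 && c->hits >= min_hits;
}
```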
This patch has two authors (Dan Lambright and Joseph Fernandes). Dan
did the DHT changes and Joe wrote the cache policies. The fix depends on
DHT readdir changes and the database library, which have been submitted
separately. Header files in libglusterfs/src/gfdb should be reviewed in
patch 9683.
For more background and design see the feature page [1].
[1]
http://www.gluster.org/community/documentation/index.php/Features/data-classification
Change-Id: Icc26c517ccecf5c42aef039f5b9c6f7afe83e46c
BUG: 1194753
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9724
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
| |
position in the graph rather than relative (local) to a particular
translator.
Encoding the volume in this way allows a single translator to manage
which brick is currently being scanned for directory entries. Using a
single translator minimizes allocated bits in the d_off. It also allows
multiple DHT translators in the same graph to have a common frame of
reference (the graph position) for which brick is being read. Multiple
DHT translators are needed for the Tiering feature.
The fix builds off a previous change (9332) which removed subvolume
encoding from AFR. The fix makes an equivalent change to the EC
translator.
More background can be found in fix 9332 and gluster-dev discussions [1].
DHT and AFR/EC are responsible (as before) for choosing which brick to
enumerate directory entries from over the readdir lifecycle.
The client translator receiving the readdir fop encodes the dht_t. It
is referred to as the "leaf node" in the graph and corresponds to the
brick being scanned.
When DHT decodes the d_off, it translates the leaf node to a local
subvolume, which represents the next node in the graph leading to
the brick.
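To illustrate storing a leaf (brick) index in d_off, the sketch below uses a simple multiplicative encoding, `d_off = brick_offset * nr_leaves + leaf`. This is an assumed scheme for illustration and is not necessarily the transform GlusterFS uses; `doff_encode` and `doff_decode` are hypothetical helpers. Translating the decoded leaf to DHT's local subvolume would be a separate table lookup.

```c
#include <stdint.h>

/* Encode (brick_off, leaf) into one d_off; returns -1 if it cannot fit. */
static int doff_encode(uint64_t brick_off, uint32_t leaf, uint32_t nr_leaves,
                       uint64_t *d_off)
{
    if (nr_leaves == 0 || leaf >= nr_leaves ||
        brick_off > (UINT64_MAX - leaf) / nr_leaves)
        return -1;
    *d_off = brick_off * nr_leaves + leaf;
    return 0;
}

/* Recover the brick-local offset and leaf index; nr_leaves must be the
 * same non-zero count that was used for encoding. */
static void doff_decode(uint64_t d_off, uint32_t nr_leaves,
                        uint64_t *brick_off, uint32_t *leaf)
{
    *leaf = (uint32_t)(d_off % nr_leaves);
    *brick_off = d_off / nr_leaves;
}
```

In a scheme like this, encoder and decoder must agree on the leaf count, which is why such counts would need refreshing whenever the graph changes.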
Tracking of leaf nodes is done in common utility functions. Leaf node
counts and positional information are updated on a graph switch.
[1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html
Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8
BUG: 1190734
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9688
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
| |
* Don't consider the "dir-spread-count" option. This option is not
  supported.
* Consider a transition from weighted to equal distribution (or vice
  versa) a valid reason for fixing the layout.
Change-Id: I0dcfe555dae9269ce20a41611cfdaa4f96c9e98b
BUG: 1196615
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/9809
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
| |
dht-common.h includes a function definition with "inline", but the
function is not declared in the header. Drop the "inline" keyword
so that linking against the .o files works correctly.
BUG: 1196650
Change-Id: I105be591125b29cd455769b0c4ff22d6e139227d
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/9760
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
| |
The current layout heal code assumes layout setting is idempotent. This
allowed multiple concurrent healers to set the layout without any
synchronization. However, this is not the case, as different healers
can come up with different layouts for the same directory, making layout
setting non-idempotent. So, we bring in synchronization among healers
to:
1. not overwrite a well-formed on-disk layout, and
2. refresh the in-memory layout with the on-disk layout if the in-memory
   layout needs healing and the on-disk layout is well formed.
This patch can synchronize:
1. among multiple healers,
2. among multiple fix-layouts (which extend the layout to account for
   added or removed bricks),
3. but not between healers and fix-layouts. So the problem of stale
   in-memory layouts (not matching the on-disk layout) is not
   _completely_ fixed by this patch.
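The healer-side rule from points 1 and 2 of the first list can be written as a small decision function. It assumes the caller already holds whatever lock serializes healers of the same directory; the enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a layout heal attempt. */
enum heal_action {
    HEAL_NOTHING,       /* in-memory layout is fine */
    HEAL_REFRESH_INMEM, /* adopt the well-formed on-disk layout */
    HEAL_WRITE_NEW      /* on-disk layout is not well formed: write ours */
};

/* Caller is assumed to hold the lock serializing healers. */
static enum heal_action layout_heal_decide(bool inmem_needs_heal,
                                           bool ondisk_well_formed)
{
    if (!inmem_needs_heal)
        return HEAL_NOTHING;
    if (ondisk_well_formed)
        return HEAL_REFRESH_INMEM; /* never overwrite a good on-disk layout */
    return HEAL_WRITE_NEW;
}
```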
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Change-Id: Ia285f25e8d043bb3175c61468d0d11090acee539
BUG: 1176008
Reviewed-on: http://review.gluster.org/9302
Reviewed-by: N Balachandran <nbalacha@redhat.com>
| |
rather than distribute.migrate-data and work with CLI.
The change makes this work both when it is internally driven and from the
shell. The problem is further described in bugzilla # 1147107.
Change-Id: I4fe04cae661dca25432530ddf5ac6ff2c957d6b3
BUG: 1147107
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9284
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
| |
In directory write FOPs, as far as DHT's updates to the timestamps of
the parent are concerned, there are three possibilities:
a) time (in sec) obtained from DHT's child < time (in sec) in inode ctx
b) time (in sec) obtained from DHT's child = time (in sec) in inode ctx
c) time (in sec) obtained from DHT's child > time (in sec) in inode ctx
In case (c), the nsec value returned by DHT's child must be selected.
But DHT_UPDATE_TIME instead chooses the maximum of (nsec obtained from
DHT's child, nsec in inode ctx).
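A sketch of the intended merge rule, using a hypothetical `struct ts` in place of the inode-ctx fields. Case (c) takes both fields from the child, as required above; keeping the larger nsec in case (b) and leaving the ctx untouched in case (a) are assumptions about the remaining cases.

```c
#include <stdint.h>

/* Hypothetical stand-in for a (sec, nsec) timestamp pair. */
struct ts {
    int64_t sec;
    int64_t nsec;
};

/* Merge a timestamp returned by DHT's child into the cached ctx value. */
static void update_time(struct ts *ctx, const struct ts *child)
{
    if (child->sec > ctx->sec) {          /* case (c): child is newer */
        ctx->sec = child->sec;
        ctx->nsec = child->nsec;          /* child's nsec, not the max */
    } else if (child->sec == ctx->sec) {  /* case (b): same second */
        if (child->nsec > ctx->nsec)
            ctx->nsec = child->nsec;
    }
    /* case (a): child is older, ctx is left unchanged */
}
```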
Change-Id: I535a600b9f89b8d9b6714a73476e63ce60e169a8
BUG: 1179169
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/9457
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
| |
In the rename path, we wind the creation of the newname hardlink and
the linkto file on the destination hashed subvolume at the same time.
If the linkto creation fails but the link creation succeeds, we enter
the failure path and clean up the newly created newname hardlink.
In the interim, if another client looks up newname and finds it to be
a hardlink (from FUSE), it could send an unlink for oldname instead
of a rename. Combined with the cleanup code above, this could end up
removing all copies of the file, thereby losing data.
This fix separates these steps into two parts, creating the linkto
file first and then the link file, so that no failure after the link
file is created cleans up the newname file. If linkto creation fails,
the link is not attempted, so the namespace is not polluted with
newname.
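The new ordering can be illustrated with a toy namespace. `fake_ns`, `create_linkto`, `create_link` and `rename_create_links` are hypothetical stand-ins for the fops DHT actually winds; the point is only that newname is created last and is never removed by a later failure.

```c
#include <errno.h>

/* Hypothetical stand-in for the destination namespace. */
struct fake_ns {
    int fail_linkto; /* force linkto creation to fail */
    int fail_link;   /* force hardlink creation to fail */
    int has_linkto;  /* linkto file present on dst hashed subvol */
    int has_newname; /* newname hardlink present */
};

static int create_linkto(struct fake_ns *ns)
{
    if (ns->fail_linkto)
        return -EIO;
    ns->has_linkto = 1;
    return 0;
}

static int create_link(struct fake_ns *ns)
{
    if (ns->fail_link)
        return -EIO;
    ns->has_newname = 1;
    return 0;
}

/* Step 1: linkto. Step 2: hardlink, only if step 1 succeeded.
 * No path after step 2 removes newname. */
static int rename_create_links(struct fake_ns *ns)
{
    int ret = create_linkto(ns);
    if (ret < 0)
        return ret; /* newname was never created: nothing to clean up */
    return create_link(ns);
}
```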
Change-Id: I61da8e906060da16a31ea1076eec2f01fd617f44
BUG: 1130888
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/8570
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
Change-Id: I4f243c946f76d440680b651235f925e3d0ebf0fd
BUG: 1130888
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/8523
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
Change-Id: I41389ba91951d3e63e617aa32cd0bee848261c72
BUG: 1130888
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/8521
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
Currently, whenever dht_lookup_everywhere gets called, if
dht_lookup_everywhere_cbk finds a linkto file on a non-hashed
subvolume, the file is unlinked. But there are cases when this file
is under migration. Under such conditions, we should avoid deleting
the file.
When another rebalance process changes the layout of the parent
such that dst_file (w.r.t. migration) falls on a non-hashed node,
lookup may find it as a linkto file, but just before the unlink the
file may be under migration or already migrated.
In such cases the unlink can be avoided.
Race:
-------
Suppose we have two bricks (brick-1 and brick-2), with an initial file
"a" under BaseDir that is both hashed and cached on brick-1.
Assume "a" hashes to 44.
               Brick-1      Brick-2
Initial Setup: BaseDir/a    BaseDir
               [1-50]       [51-100]
Now add a new brick, Brick-3.
1. Rebalance-1 on Node-1 (the Brick-1 node) resets
   the BaseDir layout.
2. After that it performs:
   a) create a linkto file on the new hashed subvolume (brick-2)
   b) file migration.
1. Rebalance-1 fixes the base layout:
               Brick-1      Brick-2             Brick-3
               ---------    -----------------   ---------
               BaseDir/a    BaseDir             BaseDir
               [1-33]       [34-66]             [67-100]
2. Only a) is performed (create linkto file):
               BaseDir/a    BaseDir/a(linkto)   BaseDir
Now rebalance-2 on node-2 jumps in and performs
steps 1 and 2-a.
After (rebal-2, step-1), it changes the layout of BaseDir:
               Brick-1      Brick-2             Brick-3
               BaseDir/a    BaseDir/a(linkto)   BaseDir
               [67-100]     [1-33]              [34-66]
For (rebal-2, step-2), it performs a lookup on Brick-3, since under the
new layout 44 falls on Brick-3. But the lookup fails, so
dht_lookup_everywhere gets called.
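To check the arithmetic in the race above, here is a small lookup over the three layouts, using the example's 1-100 hash scale (real DHT layouts span a 32-bit hash space). `struct range` and `hashed_brick` are illustrative names; the function returns the brick's index in the layout array (0 = Brick-1), or -1 for a hole.

```c
/* One inclusive hash range per brick, in brick order. */
struct range {
    unsigned start, stop;
};

/* Index of the brick whose range contains hash, or -1 for a hole. */
static int hashed_brick(const struct range *layout, int n, unsigned hash)
{
    for (int i = 0; i < n; i++)
        if (hash >= layout[i].start && hash <= layout[i].stop)
            return i;
    return -1;
}
```

With hash 44, the initial layout gives Brick-1, rebalance-1's layout gives Brick-2 (where its linkto file is created), and rebalance-2's layout gives Brick-3, which is why rebalance-2's lookup misses and falls back to dht_lookup_everywhere.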
NOTE: A linkto file was created on brick-2 by rebalance-1.
Currently that linkto file gets deleted by rebalance-2's lookup, as it
is considered a stale linkto file. With this patch, if rebalance is
already in progress or is over, the linkto file will not be unlinked:
if rebalance is in progress, an fd will be open on it, and if
rebalance is over, the file will no longer be marked as a linkto file.
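The guard described in the NOTE can be sketched as a predicate. Both inputs are assumptions standing in for however DHT actually detects an open fd and the linkto marking; `linkto_state` and `may_unlink_stale_linkto` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical summary of what lookup learned about a linkto file
 * found on a non-hashed subvolume. */
struct linkto_state {
    bool still_linkto; /* still marked as a DHT linkto file */
    bool has_open_fd;  /* rebalance holds it open: migration running */
};

/* Only a linkto file that is neither being migrated nor already
 * migrated is treated as stale and unlinked. */
static bool may_unlink_stale_linkto(const struct linkto_state *st)
{
    if (st->has_open_fd)
        return false; /* migration in progress */
    if (!st->still_linkto)
        return false; /* migration finished: now a data file */
    return true;
}
```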
Change-Id: I3fee0d28de3c76197325536a9e30099d2413f079
BUG: 1116150
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/8345
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
| |
Explanation of the race between rebalance processes:
https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4
STATE 1: Only one brick in the system.
             BRICK-1
             Cached File
STATE 2: Add brick-2.
             BRICK-1        BRICK-2
STATE 3: Lookup of the file on brick-2 by this node's rebalance fails
         because the hashed file has not been created yet, so
         dht_lookup_everywhere is about to be called.
STATE 4: As part of lookup, a link file is created on brick-2.
STATE 5: getxattr to check that the cached file belongs to this node
         is done.
STATE 6: dht_lookup_everywhere_cbk detects the link file created by
         rebalance-1 and unlinks it.
STATE 7: getxattr with the "pathinfo" key is called on the link file
         and fails, because the link file has been deleted by the
         rebalance on node-2.
Fix:
In STATE 6, we should avoid deleting the link file. Every time
dht_lookup_everywhere gets called, lookup is performed on all the nodes.
So, to avoid STATE 6, a linkto file that is found is not deleted until a
valid case is determined in dht_lookup_everywhere_done:
Case 1: linkto file points to the cached node and the cached file exists:
        unwind with success.
Case 2: linkto file does not point to the current cached node and the
        cached file exists:
        a) unlink the stale link file
        b) create a new link file
Case 3: only the linkto file exists:
        delete the linkto file.
Case 4: only the cached file exists:
        create the link file (handled even without this patch).
Case 5: neither the cached nor the hashed file is present:
        return ENOENT (handled even without this patch).
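The five cases map onto a single decision made once all subvolumes have replied. The enum and `lookup_everywhere_decide` are hypothetical names; only dht_lookup_everywhere_done comes from the message, and the real function also issues the unlink/create fops, which are omitted here.

```c
#include <stdbool.h>

enum lookup_action {
    ACT_UNWIND_SUCCESS, /* case 1 */
    ACT_REPLACE_LINKTO, /* case 2: unlink stale linkto, create new one */
    ACT_DELETE_LINKTO,  /* case 3 */
    ACT_CREATE_LINKTO,  /* case 4 */
    ACT_ENOENT          /* case 5 */
};

/* Decided only after every subvolume has answered, so no linkto file
 * is unlinked from an individual callback. */
static enum lookup_action lookup_everywhere_decide(bool linkto_found,
                                                   bool cached_found,
                                                   bool linkto_points_to_cached)
{
    if (linkto_found && cached_found)
        return linkto_points_to_cached ? ACT_UNWIND_SUCCESS
                                       : ACT_REPLACE_LINKTO;
    if (linkto_found)
        return ACT_DELETE_LINKTO;
    if (cached_found)
        return ACT_CREATE_LINKTO;
    return ACT_ENOENT;
}
```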
Change-Id: Ibf53671410d8d613b8e2e7e5d0ec30fc7dcc0298
BUG: 1116150
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/8231
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
| |
If two clients try to rename the same file at the same time, we
sometimes end up with *no file at all* in either the old or new
location. That's kind of bad. The culprit seems to be some overly
aggressive cleanup code. AFAICT, based on today's study of the code,
the intent of the changed section is to remove any linkfile we might
have created before the actual rename. However, what we're removing
might not be our extra link. If we're racing with another client that's
also doing a rename, it might be the only remaining link to the user's
data. The solution, which is good enough to pass this test but almost
certainly still not complete, is to be more selective about when we do
this unlink. Now, we only do it if we know that, at some point, we did
in fact create the link without error (notably ENOENT on the source or
EEXIST on the destination) ourselves.
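A sketch of the bookkeeping behind "only unlink if we created the link ourselves". `rename_local`, `note_link_result` and `may_cleanup_link` are hypothetical; any error from our own link attempt, including ENOENT or EEXIST, is treated as "not created by us".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-rename bookkeeping. */
struct rename_local {
    bool created_link; /* this client created the extra link itself */
};

/* Record the result of our own link attempt: any error (e.g. ENOENT on
 * the source or EEXIST on the destination) means we did not create it. */
static void note_link_result(struct rename_local *local, int op_errno)
{
    local->created_link = (op_errno == 0);
}

/* Cleanup may remove the link only if this client created it; otherwise
 * it may be the last remaining link to the user's data. */
static bool may_cleanup_link(const struct rename_local *local)
{
    return local->created_link;
}
```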
Change-Id: I8d8cce150b6f8b372c9fb813c90be58d69f8eb7b
BUG: 1117851
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/8269
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>