| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Break tier_migrate_using_query_file() into a more manageable
tier_migrate_link() and helpers.
Change-Id: I5eb2d2cff9e7a2a2da567c3c4c2d53aab195f477
BUG: 1358296
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/14957
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Demote files on priority if hi-watermark has been breached and continue
to demote until the watermark drops below hi-watermark.
Monitor watermark more frequently.
Trigger demotion as soon as hi-watermark is breached.
Add cluster.tier-emergency-demote-query-limit option to limit number
of files returned from the database query for every iteration of
tier_migrate_using_query_file(). If watermark hasn't dropped below
hi-watermark during the first iteration, the next iteration will be
triggered approximately 1 second after tier_demote() returns to the
main tiering loop.
Update changetimerecorder xlator to handle query for emergency demote
mode.
Add tier-ctr-interface.h:
Move tier and ctr interface specific macros and struct definition from
libglusterfs/src/gfdb/gfdb_data_store.h to new header
libglusterfs/src/tier-ctr-interface.h
Change-Id: If56af78c6c81d37529b9b6e65ae606ba5c99a811
BUG: 1366648
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/15158
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
http://review.gluster.org/14085 fixes a "pragma leak" where the
generated rpc/xdr headers have a pair of pragmas that disable these
warnings. With the warnings disabled, many unused variables have
crept into the code base.
And 14085 won't pass its own smoke test until all these warnings are
fixed.
BUG: 1369124
Change-Id: I6feee863927254ae9f59013112bcc297ad09e89b
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/15477
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add events for:
* tier attach and detach
* tier pause and resume
* tier rising and dropping hi and lo watermarks
Update eventskeygen.py with tiering events.
Update cli help with:
* attach: add optional force argument
* detach: make force available as non-optional argument on its own
Change-Id: I43990d3a8742151a4a7889bafa19cb572fe661bd
BUG: 1368336
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/15232
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: As metadata in the database fills up, querying the database
take a long time. As a result, tier migration slows down. To
counteract this, we added a way to enable the compaction methods of
the underlying database. The goal is to reduce the size of the
underlying file by eliminating database fragmentation.
NOTE: There is currently a bug where sometimes a brick will
attempt to activate compaction. This happens even compaction is already
turned on.
The cause is narrowed down to the compact_mode_switch flipping its value.
Changes: libglusterfs/src/gfdb - Added a gfdb function to compact the
underlying database, compact_db() This is a no-op if the database has
no such option.
- Added a compaction function for SQLite3 that does the following
1) Changes the auto_vacuum pragma of the database
2) Compacts the database according to the type of compaction requested
- Compaction type can be changed by changing the macro
GF_SQL_COMPACT_DEF to one of the 4 compaction types in
gfdb_sqlite3.h
It is currently set to GF_SQL_COMPACT_INCR, or incremental
vacuuming.
xlators/cluster/dht/src - Added the following command-line option to
enable SQLite3 compaction.
gluster volume set <vol-name> tier-compact <off|on>
- Added the following command-line option to change the frequency the
hot and cold tier are ordered to compact.
gluster volume set <vol-name> tier-hot-compact-frequency <int>
gluster volume set <vol-name> tier-cold-compact-frequency <int>
- tier daemon periodically sends the (new)
GFDB_IPC_CTR_SET_COMPACT_PRAGMA IPC to the CTR xlator. The IPC
triggers compaction of the database.
The inputs are both gf_boolean_t.
IPC Input:
compact_active: Is compaction currently on for the db.
compact_mode_switched: Did we flip the compaction switch recently?
IPC Output:
0 if the compaction succeeds.
Non-zero otherwise.
xlators/features/changetimerecorder/src/ - When the CTR gets the
compaction IPC, it launches a thread that will perform the
compaction. The IPC ends after the thread is launched. To avoid extra
allocations, the parameters are passed using static variables.
Change-Id: I5e1433becb9eeff2afe8dcb4a5798977bf5ba0dd
Signed-off-by: Diogenes Nunez <dnunez@redhat.com>
Reviewed-on: http://review.gluster.org/15031
Reviewed-by: Milind Changire <mchangir@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
http://review.gluster.org/14085 fixes a/the "leak" - via the
generated rpc/xdr headers - of pragmas that mask these warnings.
However 14085 won't pass the smoke test until all the warnings are
fixed.
Change-Id: I367a737570dd7d2f6cc25f4bf4299d31bb6826aa
BUG: 1369124
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/15242
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add test to fail promotion if estimated block consumption grows
beyond hi watermark.
Skip file migrations until next cycle if tier_get_fs_stat() fails
in tier_migrate_using_query_file()
Change-Id: Ice04572fa739c09109c4433e65965197482a7beb
BUG: 1349284
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: http://review.gluster.org/14780
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Return the correct size of the tiered volume in statfs. It should
be the size of the cold tier, not the sum of the hot and cold tier,
because the hot tier is a cache and not an extension of the volume's
capacity. The number of free blocks, etc is the cold tier's capacity
subtracted by the sum of utilization on the hot and cold tiers. Note
if both tiers are part of the same file system this must be accounted
for as well.
The patch also fixes a pre-existing bug in the DHT/tier
translators. If statfs was taken on a file, the code only calculated
free space on the cached subvolume, not all subvolumes in the replica
group. With the fix, this is corrected, except in the case
where quota is used with the deem-statfs option set to "on".
Change-Id: I2b8bcb4511edf83f12130960aad0a609fcf8f513
BUG: 1339689
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/14536
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "max cycle time" log message was incorrectly logged as
an error. Downgrade it to INFO.
Change-Id: Ia7d074423019fa79443bc6ea694148b7b8da455d
BUG: 1335973
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/14336
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: N Balachandran <nbalacha@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this patch rearrange the code, to add some defence functionality for
pthread_create(), i.e. only on a success on pthread_create() call
pthread_join().
Change-Id: I0836bc950a210574cfdc755a666c6ac5df6ab430
BUG: 1332219
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-on: http://review.gluster.org/14152
Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: When we spawn promote and demote thread, query files
are build. And only query file with index 0 is picked for migration
as the first query file. This may not be suitable for scenarios,
where the file in the query are too big to move in the first cycle,
as a result file in the other query files always get missed. We need to
shuffle so that other query files also get a chance.
Fix: Remember the previous first query file and shift it by one index,
before the migration starts.
Change-Id: I704947bcf4bab6b20b1179a6d9ae4a15a3d51bd9
BUG: 1330353
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/14068
Tested-by: Joseph Fernandes
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix adds a paramater "tier-max_promote_size" to control wether
a file is migrated or not based on its size. By default the value
is 0, meaning all files are migrated. If set to a non-zero
value, files larger than the parameter won't be moved
in tiered volumes.
Change-Id: Ia6b88e9b2508935bef500d956f9192e59670fe00
BUG: 1313495
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13570
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If there are large number of files to be migrated
and by this time if the volume goes down, then the tierd
has to be stopped. But on a huge query file list it keeps
checking for each file before stopping. If the volume comes
up before the old tierd dies then due to the
presence of old tierd new one won't be created. After
the old one completes the task, it dies and the status
ends up as failed.
This patch will check if the status is still running and then let
it continue its work. Else it will stop running the tierd.
Change-Id: I6522a4e2919e84bf502b99b13873795b9274f3cd
BUG: 1315659
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/13646
Tested-by: Dan Lambright <dlambrig@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the parent GFID is a stale entry, the lookup on this parent
fails and this in turn fails the demotion process.
This patch will make the stale entry error to be skipped.
Situation for pargfid to be stale:
Consider a folder from a tar file. Once the tar file is untared
the files in the tar-file will start to demote.
when the demotion is under progress, if we tend to delete
the actual folder, then the files under it which are
undergoing demotion will do a lookup on the parent which was
deleted and become stale entry. This stale entry fails the
Lookup and this will fail the demotion of the other files(not from
tar) that are supposed to be demoted.
Change-Id: I3d47c32c4077526d477a25912b0135bab98b23fc
BUG: 1311178
Signed-off-by: hari gowtham <hgowtham@redhat.com>
Reviewed-on: http://review.gluster.org/13501
Tested-by: hari gowtham <hari.gowtham005@gmail.com>
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently a main loop in tiering spawns promotion and demotion threads,
and does a join to wait for them to complete. When one of the two
threads takes a long time, the main thread waits for it
before exiting the join. It may wait so long the scheduled time
for the other thread is skipped. In the case of demotion, it may be
a long time before another attempt. This patch fixes that by
making the promotion and demotion activities independant. A side
effect of this change is the logic is significantly simplified.
Change-Id: I1196bd4bbfc95e8aa326a9bd4ebf395032369d1c
BUG: 1306852
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13433
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The wrong variable was being checked to determine
the watermark value.
Change-Id: If4c97fa70b772187f1fcbdf5193e077cb356a8b1
BUG: 1303895
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13357
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a bug in the way hardlinks are handled in tiered volumes. Ideally, the tier linkto files on the cold tier to files that are hardlinks to each other on the hot tier, should themselves be hardlinks of each other. As they are not, they end up being files with the same gfid but different names for the cold tier dht, and end up overwriting the cached-subvol information stored in the dht inode-ctx.
Change-Id: Ic658a316836e6a1729cfea848b7d212674b0edd2
BUG: 1305277
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13391
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Each brick on a host will get a separate query file.
2. While reading query record from these query files we
read them in a Round-Robin manner.
3. When an error occurs during migration we rename it to
query file with an time stamp and .err extension for
better debugging.
Change-Id: I27c4285d24fd695d2d5cbd9fd7db3879d277ecc8
BUG: 1302772
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/13293
Smoke: Gluster Build System <jenkins@build.gluster.com>
Tested-by: N Balachandran <nbalacha@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A node which contains only cold bricks and has detected that
the high watermark value has been breached on the hot tier will
never reset the watermark to the correct value. The promotion check
will thus always fail and no promotions will occur from that node.
Change-Id: I0f0804744cd184c263acbea1ee50cd6010a49ec5
BUG: 1303895
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13341
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When iterating the query file during migration, tiering should
break out of the loop once cycle time completes. Otherwise it
may be possible to stay in the loop for a long time. If that
happens updates to files will become stale and have not impact
migration.
Change-Id: Ib60cf74bc84e8646e6a0da21ff04954b1b83c414
BUG: 1301227
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13284
Smoke: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Default value for tier-demote-frequency is 3600 sec to avoid
frequent demotions.
Default value for tier-max-mb is 4000 mb
Default value for tier-max-files is 10000 files
Change-Id: Ie60951c478a7462c425059699ab82511aa13fa0a
BUG: 1300412
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/13270
Smoke: Gluster Build System <jenkins@build.gluster.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tier process watermark calculations were incorrect when the
quota-deem-statfs option was enabled. We now ignore this while
calculating hot tier usage to determine watermark levels.
Change-Id: I308bc24432e2fa5ad1d5703e80fc391433538bbb
BUG: 1301473
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13288
Smoke: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Tested-by: mohammed rafi kc <rkavunga@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A query to the database may take a long time if the database
has many entries. The tier daemon also sends IPC calls to the
bricks which can run slowly, espcially in RHEL6. While it is
possible to track down each such instance, the snapshot
feature should not be affected by database operations. It requires
no migration be underway. Therefore it is okay to pause tiering
at any time except when DHT is moving a file. This fix implements
this strategy by monitoring when control passes to DHT to
migrate a file using the GF_XATTR_FILE_MIGRATE_KEY trigger. If it
is not, the pause operation is successful.
Change-Id: I21f168b1bd424077ad5f38cf82f794060a1fabf6
BUG: 1287842
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13104
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we check the watermarks only before a cycle
begins. We should also check the hot tier's fullness
against the watermarks during the migration so the watermark
is not exceeded as files are promoted.
Change-Id: I2ff87a1c308d64fbdca14bbdf55f3ec3007290ae
BUG: 1293932
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/13103
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added file path/gfid when available to the tier log
messages to make debugging easier.
Change-Id: I22dda329367df2b846dcf254594312c997b66083
BUG: 1273043
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/13114
Reviewed-by: mohammed rafi kc <rkavunga@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What:
If dht_open is called on a migrating file after the inode_ctx is set,
subsequent FOPs on that fd do not open the fd on the dst subvol.
This is seen when the open-ftruncate-close sequence is repeatedly
called on a migrating file.
A second call to the sequence described above causes dht_truncate_cbk
to call dht_truncate2 as the dht_inode_ctx was already set by the first
call. As dht_rebalance_in_progress_check is not called, the fd is not
opened on the dst subvol.
On a distributed-replicate volume, this causes AFR to
open the fd using afr_fix_open, but with the wrong flags, causing
posix_ftruncate to fail with EINVAL.
The fix: We require fd specific information to make a decision while
handling migrating files.
Set the fd_ctx to indicate the fd has been opened on the dst subvol
and check if it has been set while processing Phase1/Phase2 checks
in the FOP callback functions.
Change-Id: I43cdcd8017b4a11e18afdd210469de7cd9a5ef14
BUG: 1284823
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12985
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we are creating data file in a hot subvolume
then we will create a linkfile in cold subvolume.
Linkfile creation happens first. If linkfile creation
was successful and data file creation failed, then
linkfile in cold subvolume will become stale.
This patch will delete the linkfile as well, if data
file creation fails.
Also this code duplicates dht_create to make tier_create
Change-Id: I377a90dad47f288e9576c7323b23cf694a91a7a3
BUG: 1290677
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12948
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had run sleep() in the pause tier callback. Blocking within
a synctask is dangerous. The sleep() call does not inform
the synctask scheduler that a thread is no longer running.
It therefore believes it is running. If a second synctask already
exists, it may not be able to run. This occurs if the thread
limit in the pool has been reached.
Note the pool size only grows when a synctask is created, not
when it is moved from wait state to run state, as is the case
when an FOP completes. When the tier is paused during migration,
synctasks already exist waiting for responses to FOPs to the
server with high probability.
The fix is to yield() in the RPC callback, which will place
the synctask into the wait queue and free up a thread for the
FOP callback. A timer wakes the callback after sufficient
time has elapsed for the pause to occur.
Change-Id: I6a947ee04c6e5649946cb6d8207ba17263a67fc6
BUG: 1267950
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12987
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The option tier-max-files represents the maximum number
of files trasferred by a node in a gives cycle. Fix help message
to reflect the "per node" aspect. The code transferred one
more file per cycle than the given value.
Also change the default values of max file and max bytes to
very large values, effectively we will not throttle migration
unless the administrator requests it via CLI.
Change-Id: Ic2949ed3d8c35afe7c9ae4db72195603cfb2e28f
BUG: 1292671
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12984
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
files deleted during promotion were not deleting as the
files are moving from hashed to non-hashed.
On deleting a file that is undergoing promotion,
the unlink call is not sent to the dst file as the
hashed subvol == cached subvol. This causes
the file to reappear once the migration is complete.
This patch also fixes a problem with stale linkfile
deleting.
Change-Id: I4b02a498218c9d8eeaa4556fa4219e91e7fa71e5
BUG: 1282390
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12829
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Spawn Promotion or Demotion depending if there are any Cold or Hot
bricks present localy.
IF the local HOT brick list is empty dont spawn demote thread.
IF the local COLD brick list is empty dont spawn promote thread.
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Change-Id: I524730e59414dd156c78ec0bd7a3629212697e6e
BUG: 1289578
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12912
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tier process tries to free ipc_ctr_params twice
if the syncop_ipc call in tier_process_ctr_query fails.
ipc_ctr_params is freed when ctr_ipc_in_dict is freed.
But ctr_ipc_out_dict is NULL when syncop_ipc fails, causing
GF_FREE to be called on a non-NULL ipc_ctr_params ptr again.
Change-Id: Ia15f36dfbcd97be5524588beb7caad5cb79efdb4
BUG: 1288995
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12890
Reviewed-by: Joseph Fernandes
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixing memory leak from response dict during a parent
lookup to get the path.
Change-Id: I60c23d0b25e7f763f0e53c40e71ee053aba6d555
BUG: 1288019
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12867
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
Tested-by: Joseph Fernandes
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ignore the status of already migrated files and in the
process don't count.
Change-Id: Idba6402508d51a4285ac96742c6edf797ee51b6a
BUG: 1276141
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12758
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterd occasionally loads shared libraries of translators. This
failed for tiering due to a reference to dht_methods which is defined
as a global variable which is not necessary.
The global variable has been removed and this is now a member of
dht_conf and is now initialised in the *_init calls.
Change-Id: Ifa0a21e3962b5cd8d9b927ef1d087d3b25312953
BUG: 1287842
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12863
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible a file would get migrated in the middle
of a readdir operation. If there are four subvolumes A,B,C,D,
and if readdir reads them in order and reaches subvol B,
then, if a file is moved from D to A, it will not be included
in the readdir output.
This phenonema has pre-existed in DHT migration but is more
apparent in tiering.
When a file is moved off the hashed subvolume a T file is created.
For tiering, we will make the cold subvolume the hashed subvolume.
This will ensure the creation of a T file. Readdir will not skip T
files in the tier translator.
Making the cold subvolume the hashed subvolume ensures the T
files created on promotions or creates will be less likely to
fill the volume.
Creates still put the data on the hot subvolume.
Change-Id: Ifde557d3d0e94a4570ca9f115adee3db2ee75407
BUG: 1281598
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12530
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tail, as in dog chasing its tail. These are the unwrapped
syscalls that have crept in (or were missed) in the previous
patches.
various xlators and other components are invoking system calls
directly instead of using the libglusterfs/syscall.[ch] wrappers.
If not using the system call wrappers there should be a comment
in the source explaining why the wrapper isn't used.
Change-Id: If183487de92fc7cbc47d4c5aa3f3e80eae50b84f
BUG: 1267967
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12589
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Check if detach is running, disallow detach commit if so.
2. Cleanup shutdown of tier daemon on detach: do not rerun fix-layout,
do not send incorrect status back to glusterd.
Change-Id: I97202f748773c1176396a4ffd32a4c7fa9b9c1bc
BUG: 1279637
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12560
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tier translator should only choose candidate files for promotion
from the most recent cycle, not a multiple of the most recent cycles.
Otherwise user observed behavior can be inconsistent. Remove related
test in tier.t that is subject to race condition.
Change-Id: I9ad1523cac00f904097ce468efa6ddd515857024
BUG: 1275524
Signed-off-by: root <root@rhs-cli-15.gdev.lab.eng.bos.redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12480
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Earlier, when the database was queried we used to save
all the queried records in an ASCII format in the query file.
This caused issues like filename having ASCII delimiter and used
to take a lot of space. The tier.c file also had a lot of parsing code.
Here we changed the format of the query file to binary.
All the logic of serialization and formating of query record is done
by libgfdb. Libgfdb provides API,
gfdb_write_query_record() and gfdb_read_query_record(),
which the user i.e tier migrator and CTR xlator can use to
write to and read from query file.
With this binary format we save on disk space i.e reduce to 50% atleast
as we are saving GFID's in binary format 16 bytes and not the string format
which takes 36 bytes + We are not saving path of the file + we are also saving on
ASCII delimiters.
The on disk format of query record is as follows,
+---------------------------------------------------------------------------+
| Length of serialized query record | Serialized Query Record |
+---------------------------------------------------------------------------+
4 bytes Length of serialized query record
|
|
-------------------------------------------------|
|
|
V
Serialized Query Record Format:
+---------------------------------------------------------------------------+
| GFID | Link count | <LINK INFO> |..... | FOOTER |
+---------------------------------------------------------------------------+
16 B 4 B Link Length 4 B
| |
| |
-----------------------------| |
| |
| |
V |
Each <Link Info> will be serialized as |
+-----------------------------------------------+ |
| PGID | BASE_NAME_LENGTH | BASE_NAME | |
+-----------------------------------------------+ |
16 B 4 B BASE_NAME_LENGTH |
|
|
------------------------------------------------------------------------|
|
|
V
FOOTER is a magic number 0xBAADF00D indicating the end of the record.
This also serves as a serialized schema validator.
Change-Id: I9db7416fd421e118dd44eafab8b535caafe50d5a
BUG: 1272207
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12354
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to count replica files for migration counting even though
they were ignore for migration as the replica brick didnt have
the ownership (as per the replication xlator either AFR/EC).
As a result the number of files migrated would show a wrong count,
i.e each replicated file would be counted 1 + number of replica.
This patch ignores such cases.
Change-Id: I91aa352ee3b0a5029790653266e9333f3947d0ac
BUG: 1276141
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12453
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tier query parsing code was using fscanf to read each record.
As space is a delimiter for fscanf, filenames containing spaces
caused the parsing to return unexpected values causing various
issues in the tier process, including crashes due to buffer
overflows.
Change-Id: Ife602cb7ecb158fccbc2c89e4d2959bd97098a87
BUG: 1276562
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: http://review.gluster.org/12469
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
various xlators and other components are invoking system calls
directly instead of using the libglusterfs/syscall.[ch] wrappers.
If not using the system call wrappers there should be a comment
in the source explaining why the wrapper isn't used.
Change-Id: I1f47820534c890a00b452fa61f7438eb2b3f667c
BUG: 1267967
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: http://review.gluster.org/12276
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On fix-layout heal files are scanned. Files found are exist on the hot or cold
subvolume. Those not found in the cold tier would exist on the hot. They
should not be flagged as an error.
Replace INFO with TRACE for common tier migration logs. Frequent migration
was growing the log files too quickly.
On migratation failures, do not acrue files towards cycle limit's budget.
Change-Id: Ie832ee07c43bce5477ae81c939d1fe8416a11615
BUG: 1275383
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12430
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Snaps of tiered volumes cannot handle files undergoing migration.
We implement a helper mechanism to "pause" migration. Any files
undergoing migration are aborted. Clean up is done to remove
sticky bits and data at the destination. Migration is restarted
after snap completes.
For testing an internal switch is added. It is not exposed externally.
gluster volume set vol1 tier-pause [true|false]
Change-Id: Ia85bbf89ac142e9b7e73fcbef98bb9da86097799
BUG: 1267950
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12304
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a bricks are down, promotion/demotion should still be possible.
For example, if an EC brick is down, the other bricks are able to
recover the data and migrate it.
Change-Id: I8e650c640bce22a3ad23d75c363fbb9fd027d705
BUG: 1273215
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12397
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Joseph Fernandes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On a write to a replica volume, we record in all brick's databases an entry.
When the tier daemon runs, it will only move the file if it is the true
owner of the file as defined by the XATTR_NODE_UUID_KEY.
Change-Id: Ib82717f87a3f94f3d0d9f969773de9e88d6aaf22
BUG: 1273043
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12391
Reviewed-by: Joseph Fernandes
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix introduces infrastructure to support different
policies for promotion and demotion.
Currently the tier feature automatically promotes and demotes
files periodically based on access. This is good for testing
but too stringent for most real workloads. It makes it
difficult to fully utilize a hot tier- data will be demoted
before it is touched- its unlikely a 100GB hot SSD will have
all its data touched in a window of time.
A new parameter "mode" allows the user to pick promotion/demotion
polcies.
The "test mode" will be used for *.t and other general testing.
This is the current mechanism.
The "cache mode" introduces watermarks. The watermarks
represent levels of data residing on the hot tier.
"cache mode" policy:
The % the hot tier is full is called P.
Do not promote or demote more than D MB or F files.
A random number [0-100] is called R.
Rules for migration:
if (P < watermark_low) don't demote, always promote.
if (P >= watermark_low) && (P < watermark_hi) demote if R < P; promote if R > P.
if (P > watermark_hi) always demote, don't promote.
gluster volume set {vol} cluster.watermark-hi %
gluster volume set {vol} cluster.watermark-low %
gluster volume set {vol} cluster.tier-max-mb {D}
gluster volume set {vol} cluster.tier-max-files {F}
gluster volume set {vol} cluster.tier-mode {test|cache}
Change-Id: I157f19667ec95aa1d53406041c1e3b073be127c2
BUG: 1257911
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12039
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
version less than 3.7 i.e rhel 6.7
Problem: On RHEL 6.7, we have sqlite version 3.6.2 which doesnt support
WAL journaling mode, as this journaling mode is only available in sqlite 3.7 and above.
As a result we cannot have to progreses concurrently accessing sqlite, without
running into db locks! Well WAL is also need for performace on CTR side.
Solution: This solution is to use CTR db connection for doing queries when WAL mode is
absent. i,e tier migrator will send sync_op ipc calls to CTR, which in turn will
do the query and create/update the query file suggested by tier migrator.
Pending: Well this solution will stop the db locks but the performance is still an issue for CTR.
We are developing an in-Memory Transaction Log (iMeTaL) which will help boost the CTR
performance by doing in memory udpates on the IO path and later flush the updates to
the db in a batch/segment flush.
Change-Id: Ie3149643ded159234b5cc6aa6cf93b9022c2f124
BUG: 1240577
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/12191
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Luis Pabon <lpabon@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases a brick will try to migrate a file that has already
been migrated. This is a legal case, e.g. when both bricks
are replica pairs.
Change-Id: If2578b947014cbbdfb3c6591db9044d6b1d92774
BUG: 1263726
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/12185
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Joseph Fernandes
Tested-by: Gluster Build System <jenkins@build.gluster.com>
|