|  | Commit message | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | There is a low-level security issue with fencing, since one client
can preempt another client's lock.
This patch does not completely eliminate the issue of a misbehaving
client, but it adds a security layer for default use cases
that do not need fencing.
Change-Id: I55cd15f2ed1ae0f2556e3d27a2ef4bc10fdada1c
updates: #466
Signed-off-by: Susant Palai <spalai@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | rm -rf <dir> fails on dirs which contain linkto files
that point to themselves because dht incorrectly thought
that they were cached files after looking them up.
The fix now treats them as invalid linkto files
and deletes them.
Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643
fixes: bz#1667804
Signed-off-by: N Balachandran <nbalacha@redhat.com> | 
| | 
| 
| 
| 
| 
| | Change-Id: Iefe08b136044495f6fa2b092c9e8c833efee1400
fixes: bz#1667905
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Some memory leaks were observed in the gluster get-state volumeoptions
command. This fix resolves the identified leaks.
Change-Id: Ibde5743d1136fa72c531d48bb1b0b5da0c0b82a1
fixes: bz#1667779
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Logically dead code
CID: 1398468
Updates: bz#789278
Change-Id: I8713a0c51777eb64e617d00ab72fd1db4994b6ab
Signed-off-by: Iraj Jamali <ijamali@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | In automatic split-brain resolution, when the favorite child policy
is set to size, split-brain resolution must not work for
directories.
Currently, if a directory is in split brain with both copies
having the same size, the source is selected arbitrarily
and healed.
fixes: bz#1655050
Change-Id: I5739498639c17c89874cc577362e543adab55f5d
Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | 1. cli_req.dict.dict_val,
   It must be freed whether the operation fails or succeeds.
   Fix it the way lookup does: "alloca" the memory before decode.
2. args.xdata.xdata_val,
   It is allocated by "alloca", so free is unneeded.
3. qd_nameless_lookup,
   It only needs the gfid, so a gfs3_lookup_req argument is unneeded.
Change-Id: I746dddf7f3d1465b1885af2644afe0bcf0a5665b
fixes: bz#1656682
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Added functionality to the gluster volume set auth.allow command to
accept CIDR IP addresses. Modified a few functions to isolate the CIDR
feature so that other gluster commands, such as peer probe, cannot
use CIDR-format IPs. The functions are modified in such
a way that they have an option to enable CIDR
format for other gluster commands if required in the future.
updates: bz#1138841
Change-Id: Ie6734002a7078f1820e5df42d404411cce945e8b
Credits: Mohit Agrawal
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com> | 
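The kind of CIDR check that auth.allow needs can be illustrated with a minimal IPv4-only sketch. This is not the glusterd implementation: `cidr_match_ipv4` is a hypothetical helper, built only on the standard `inet_pton()` call, and it assumes both arguments are IPv4 strings.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of glusterd: returns 1 if addr falls inside
 * the IPv4 block given in CIDR notation (e.g. "192.168.1.0/24"), 0 if it
 * does not, and -1 if either argument cannot be parsed. */
static int
cidr_match_ipv4(const char *cidr, const char *addr)
{
    char net[INET_ADDRSTRLEN] = {0};
    const char *slash = strchr(cidr, '/');
    struct in_addr net_addr, host_addr;
    char *end = NULL;
    long prefix;
    uint32_t mask;

    if (!slash || (size_t)(slash - cidr) >= sizeof(net))
        return -1;
    memcpy(net, cidr, (size_t)(slash - cidr));

    prefix = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != '\0' || prefix < 0 || prefix > 32)
        return -1;

    if (inet_pton(AF_INET, net, &net_addr) != 1 ||
        inet_pton(AF_INET, addr, &host_addr) != 1)
        return -1;

    /* A /0 prefix matches everything; shifting a 32-bit value by 32 is
     * undefined, so handle it separately. */
    mask = (prefix == 0) ? 0 : (uint32_t)(0xFFFFFFFFu << (32 - prefix));

    return (ntohl(net_addr.s_addr) & mask) == (ntohl(host_addr.s_addr) & mask);
}
```

Keeping the CIDR parsing in its own function mirrors the isolation described above: commands that must not accept CIDR input simply never call it.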
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | design reference: https://review.gluster.org/#/c/glusterfs-specs/+/21925/
This patch adds the lock preempt support.
Note: The current model stores lock enforcement information as separate
xattr on disk. There is another effort going in parallel to store this
in stat(x) of the file. This patch is self sufficient to add fencing
support. Based on the availability of the stat(x) support either I will
rebase this patch or we can modify the necessary bits post merging this
patch.
Change-Id: If4a42f3e0afaee1f66cdb0360ad4e0c005b5b017
updates: #466
Signed-off-by: Susant Palai <spalai@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Problem: Some functions are not freeing memory allocated by
         xdr_to_generic, so it is leaked
Solution: Call free to avoid the leak
Change-Id: I3524fe2831d1511d378a032f21467edae3850314
fixes: bz#1656682
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> | 
| | 
| 
| 
| 
| 
| | Change-Id: I629698d8ddf6f15428880bdc1501d36bc37b8ebb
fixes: bz#1657607
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Problem: In some places the gluster code calls get_new_dict
         to create a dictionary without taking a reference, so the
         dictionary is leaked at the time of dict_unref
Solution: Call dict_new instead of get_new_dict
updates bz#1650403
Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| | This may be a coding mistake.
Fixes: bz#1665332
Change-Id: Ia8f8dadf4a71579240ff9950b141ca528bd342b3
Signed-off-by: Xiubo Li <xiubli@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Problem: running "gluster get-state glusterd odir /get-state"
resulted in a glusterd crash.
Cause: In the above command the output directory has been specified
without "/" at the end. If "/" is not given at the end, "/" is
appended to the path using "strcat", so no memory has been
allocated for the added "/". When the path is freed, glusterd
crashes because "/" has no memory allocated.
Solution: Instead of concatenating "/" to the output directory, add
it to the output filename.
Change-Id: I5dc00a71e46fbef4d07fe99ae23b36fb60dec1c2
fixes: bz#1665038
Signed-off-by: Sanju Rakonde <srakonde@redhat.com> | 
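The idea of adding the separator while building the filename, rather than appending it to the directory string, might look like the sketch below. `build_state_path` and its parameters are hypothetical and are not taken from the glusterd source; the point is only that the caller-owned directory string is never written to.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<odir>/<filename>" into out without ever
 * modifying odir. Returns 0 on success, -1 if the result would not fit. */
static int
build_state_path(char *out, size_t outsz, const char *odir,
                 const char *filename)
{
    size_t len = strlen(odir);
    /* Only add a separator when odir does not already end with one. */
    const char *sep = (len > 0 && odir[len - 1] == '/') ? "" : "/";
    int n = snprintf(out, outsz, "%s%s%s", odir, sep, filename);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Because the destination size is passed explicitly, an over-long path is reported as an error instead of overrunning a buffer.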
| | 
| 
| 
| 
| 
| 
| 
| 
| | s/QUIESCE/INDEX/
fixes: bz#1665363
Change-Id: I6dc4fde682cedeaa10d870267b8909af1a9449c0
Signed-off-by: Vijay Bellur <vbellur@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| | fixes: bz#1622665
Change-Id: I777d67b1b62c284c62a02277238ad7538eef001e
Signed-off-by: Iraj Jamali <ijamali@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | PROBLEM:
When multiple sharded files are deleted in quick succession, multiple
issues were observed:
1. misleading logs corresponding to a sharded file where while one log
   message said the shards corresponding to the file were deleted
   successfully, this was followed by multiple logs suggesting the very
   same operation failed. This was because of multiple synctasks
   attempting to clean up shards of the same file and only one of them
   succeeding (the one that gets ENTRYLK successfully), and the rest of
   them logging failure.
2. multiple synctasks to do background deletion would be launched, one
   for each deleted file, and all of them could readdir entries from
   .remove_me at the same time and potentially contend for ENTRYLK on
   .shard for each of the entry names. This is undesirable and wasteful.
FIX:
Background deletion will now follow a state machine. In the event that
there are multiple attempts to launch synctask for background deletion,
one for each file deleted, only the first task is launched. And if while
this task is doing the cleanup, more attempts are made to delete other
files, the state of the synctask is adjusted so that it restarts the
crawl even after reaching end-of-directory to pick up any files it may
have missed in the previous iteration.
This patch also fixes an uninitialized lk-owner during syncop_entrylk(),
which was leading to multiple background deletion synctasks entering
the critical section at the same time and to illegal memory access
of the base inode in the second synctask after it was destroyed post shard deletion
by the first synctask.
Change-Id: Ib33773d27fb4be463c7a8a5a6a4b63689705324e
updates: bz#1662368
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> | 
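A rough sketch of the state machine described in the FIX section follows. The state names and the two functions are hypothetical, not the identifiers used in the shard translator, and a real implementation would change the state under a lock.

```c
#include <stdbool.h>

/* Hypothetical states; the names in the shard translator may differ. */
typedef enum {
    BG_DELETE_IDLE,        /* no cleanup task exists */
    BG_DELETE_RUNNING,     /* one task is crawling .remove_me */
    BG_DELETE_RERUN_NEEDED /* files were deleted while the task was crawling */
} bg_delete_state_t;

/* Called once per deleted file. Returns true only when the caller should
 * launch a new cleanup task. */
static bool
bg_delete_request(bg_delete_state_t *state)
{
    switch (*state) {
    case BG_DELETE_IDLE:
        *state = BG_DELETE_RUNNING;
        return true;
    case BG_DELETE_RUNNING:
    case BG_DELETE_RERUN_NEEDED:
        *state = BG_DELETE_RERUN_NEEDED;
        return false;
    }
    return false;
}

/* Called by the task at end-of-directory. Returns true when the crawl must
 * restart to pick up entries added during the previous pass. */
static bool
bg_delete_crawl_done(bg_delete_state_t *state)
{
    if (*state == BG_DELETE_RERUN_NEEDED) {
        *state = BG_DELETE_RUNNING;
        return true;
    }
    *state = BG_DELETE_IDLE;
    return false;
}
```

With this shape, any number of deletions during a crawl collapses into a single extra pass, which is what removes the contention between competing tasks.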
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Problem: The brick crashes when pthread_detach is called
         right after gf_thread_create. If sufficient
         resources are not available on the system,
         pthread_create returns EAGAIN (a positive value), but the
         caller function expects a negative error code in case of failure
Solution: Change the condition in the caller function to avoid the crash
Change-Id: Ifeaa49f809957eb6c33aa9792f5af1b55566756d
fixes: bz#1662906 | 
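The return-value convention behind this crash can be shown with a small sketch. `spawn_detached` is a hypothetical wrapper, not gf_thread_create(); it only demonstrates testing the pthread_create() result against zero before the thread handle is used.

```c
#include <pthread.h>

/* Hypothetical wrapper, not gf_thread_create(): starts fn in a detached
 * thread. Returns 0 on success or a negative errno-style code on failure. */
static int
spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_t tid;
    /* pthread_create() returns 0 on success and a positive error number
     * (such as EAGAIN) on failure, so "ret < 0" would never detect it. */
    int ret = pthread_create(&tid, NULL, fn, arg);

    if (ret != 0)
        return -ret; /* tid is undefined here; do not detach it */

    return -pthread_detach(tid);
}

/* Trivial thread body used for illustration. */
static void *
noop(void *arg)
{
    return arg;
}
```

Converting the positive error number to a negative one at the boundary lets callers that test for negative values keep working.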
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | mem_put() in STACK_UNWIND_STRICT causes a crash if frame->local is not null
as md-cache obtains local from CALLOC.
Changed two occurrences of STACK_UNWIND_STRICT to MDC_STACK_UNWIND as
the latter macro does not rely on STACK_UNWIND_STRICT for cleaning up
frame->local.
fixes: bz#1632503
Change-Id: I1b3edcb9372a164ef73119e99a49e747765d7166
Signed-off-by: Vijay Bellur <vbellur@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch fixes memory leak reported by ASan.
The fix was first merged by
https://review.gluster.org/#/c/glusterfs/+/21805.
But later change was reverted due to this patch
https://review.gluster.org/#/c/glusterfs/+/21178/.
updates: bz#1633930
Change-Id: I1febe121e0be33a637397a0b54d6b78391692b0d
Signed-off-by: Sunny Kumar <sunkumar@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | With this changeset, default value for the AFR client side
heal volume option is set to "off"
fixes: bz#1663102
Change-Id: Ie4016932339c4896487e3e7cb5caca68739b7ba2
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> | 
| | 
| 
| 
| 
| 
| | Change-Id: I2ced288113a369cc6497a77ac1871007df434da4
fixes: bz#1664647
Signed-off-by: Susant Palai <spalai@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | ... in statedump for a better debugging experience.
BEFORE:
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
pid = 13635, owner=2dd2c3a11706dc8c, client=0x7f159012b000,
connection-id=(null), granted at 2018-12-31 14:20:42
connection-id is null above.
AFTER:
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
pid = 10977, owner=b485e33df21bdaa2, client=0x7fa24c01ab90,
connection-id=CTX_ID:68e12340-eed2-4386-bf5e-1f43cf8693d9-GRAPH_ID:0-
PID:10901-HOST:dhcp35-215.lab.eng.blr.redhat.com-PC_NAME:patchy-client-0-
RECON_NO:-0, granted at 2018-12-31 14:33:50
Change-Id: I4608994bacabb558a3be8c1634ee6b1d2d3022e2
fixes: bz#1662679
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | excessive logging
... of the kind
"[2018-12-26 05:22:44.195019] E [MSGID: 133010]
[shard.c:2253:shard_common_lookup_shards_cbk] 0-volume1-shard: Lookup
on shard 785 failed. Base file gfid = cd938e64-bf06-476f-a5d4-d580a0d37416
[No such file or directory]"
shard_common_lookup_shards_cbk() has a specific check to ignore ENOENT errors without
logging them during specific fops. But because background deletion is done in a new
frame (with local->fop being GF_FOP_NULL), the ENOENT check is skipped and the
absence of shards gets logged every time.
To fix this, local->fop is initialized to GF_FOP_UNLINK during background deletion.
Change-Id: I0ca8d3b3bfbcd354b4a555eee520eb0479bcda35
updates: bz#1662368
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | This reverts commit b87c397091bac6a4a6dec4e45a7671fad4a11770.
There seems to be some performance regression with the patch, and hence it is recommended to revert it.
Updates: #325
Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | In lookup, if the file has been marked as bad, then bit-rot-stub
was sending the version and signature xattr values as well in the
response dictionary. This is not needed. Only the bad file marker has
to be sent.
Change-Id: Id59c02e9857577c60849fd28ef657f71e0b15207
fixes: bz#1664122
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | statedump from io-threads lacked information to understand the number of
running threads & number of requests in each priority queue. This patch
addresses that.
Sample statedump output w/ this patch:
current_high_priority_threads=7
current_normal_priority_threads=9
current_low_priority_threads=0
current_least_priority_threads=0
fast_priority_queue_length=32
normal_priority_queue_length=45
Also, changed the wording for least priority queue in
iot_get_pri_meaning().
Change-Id: Ic5f6391a15cc28884383f5185fce1cb52e0d10a5
fixes: bz#1664124
Signed-off-by: Vijay Bellur <vbellur@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| | Change-Id: Ie0fe971e694101aa011d66aa496d0644669c2c5a
Updates: #389
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
Signed-off-by: ShyamsundarR <srangana@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | To avoid use_after_free, reset lease_ctx->timer back to NULL
after the structure has been freed.
Change-Id: Icd213ec809b8af934afdb519c335a4680a1d6cdc
updates: bz#1648768
Signed-off-by: Soumya Koduri <skoduri@redhat.com> | 
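The pattern of clearing a pointer once its target is freed can be sketched as follows. The structures here are hypothetical stand-ins, not the real lease context or timer types.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the lease context and its timer. */
struct demo_timer {
    int fired;
};

struct demo_lease_ctx {
    struct demo_timer *timer;
};

/* Frees the timer and clears the pointer so later code sees NULL instead
 * of a dangling address. Safe to call more than once. */
static void
demo_lease_ctx_cancel_timer(struct demo_lease_ctx *ctx)
{
    free(ctx->timer); /* free(NULL) is a no-op */
    ctx->timer = NULL;
}
```

Any later check of the form `if (ctx->timer)` then behaves correctly instead of dereferencing freed memory.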
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Problem:
https://review.gluster.org/#/c/glusterfs/+/21762/ has migrated
rebalance commands from op-sm framework to mgmt v3 framework.
In a heterogeneous cluster, if rebalance commands follow op-sm
framework, localhost information is not displayed in the
output of "gluster v rebalance <volname> status".
Cause:
Previously without https://review.gluster.org/#/c/glusterfs/+/21762/
rebalance commands were following op-sm framework.
In glusterd_volume_rebalance_use_rsp_dict() current_index variable
keeps track of number/count of peers in trusted storage pool.
In op-sm, glusterd_volume_rebalance_use_rsp_dict() will be called
only for the peers. So the current index should start from 2
assuming local host as node 1.
With the above patch, rebalance commands are following mgmt v3
framework. In mgmt v3, glusterd_volume_rebalance_use_rsp_dict()
is called for all nodes. For localhost it is called from
brick-op function and for peers it is called from brick-op
call back function. So the current index value should start
from 1.
https://review.gluster.org/#/c/glusterfs/+/21762/ has changed the
value of current index to 1. Because of this, in a heterogeneous cluster,
localhost's information is overwritten by one of the peers' information,
and rebalance status will not display localhost's information in
the output.
Solution: assign a value to current index based on an op-version
check.
Change-Id: I2dfba1f007e908cf160acc4a4a5d8ef672572e4d
fixes: bz#1663243
Signed-off-by: Sanju Rakonde <srakonde@redhat.com> | 
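The op-version check in the solution can be pictured with the sketch below. The constant name and its value are placeholders, since the commit message does not say which op-version introduced the mgmt v3 rebalance path, and `rebalance_start_index` is a hypothetical function.

```c
/* Hypothetical op-version at which rebalance moved to mgmt v3; the real
 * constant and its value are not given in the commit message. */
#define DEMO_MGMT_V3_REBALANCE_OP_VERSION 60000

static int
rebalance_start_index(int cluster_op_version)
{
    /* mgmt v3: the aggregation function also runs for localhost, so
     * counting starts at 1. op-sm: it runs only for peers, and localhost
     * is assumed to be node 1, so peers start at 2. */
    return (cluster_op_version >= DEMO_MGMT_V3_REBALANCE_OP_VERSION) ? 1 : 2;
}
```

In a mixed-version cluster the effective op-version stays below the threshold, so the older starting index of 2 applies and localhost's slot is not overwritten.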
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | When we run profile info command, it should display statistics
of all the bricks of the volume. To display information of bricks
which are hosted on peers, we need to aggregate the response from
peers.
For profile info command, all the statistics will be added into
the dictionary in brick-op phase. To aggregate the information from
peers, we need to call glusterd_syncop_aggr_rsp_dict() in brick-op
call back function.
fixes: bz#1663223
Change-Id: I5f5890c3d01974747f829128ab74be6071f4aa30
Signed-off-by: Sanju Rakonde <srakonde@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Add missing unref of req_dict to fix a memory leak in handshake
handling.
Change-Id: I0d8573fc3668c1a0ccc9030e3a096bbe20ed5c36
fixes: bz#1663077
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Defect: CID 1398469- Calling strncpy with a maximum size argument
of 4096 bytes on destination array key of size 4096 bytes might
leave the destination string unterminated.
Fix: Using snprintf instead of strncpy.
updates: bz#789278
Change-Id: I4fdcd0cbf3af8b2ded94603d92d1ceb4112284c4
Signed-off-by: Harpreet Kaur <hlalwani@redhat.com> | 
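The difference between the two calls can be shown in a short sketch. `copy_key` is a hypothetical helper, not the function touched by the patch; it relies only on the standard guarantee that snprintf() NUL-terminates its output whenever the size argument is non-zero.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dstsz) and always
 * leaves dst NUL-terminated. Returns 1 if src was truncated, else 0. */
static int
copy_key(char *dst, size_t dstsz, const char *src)
{
    /* strncpy(dst, src, dstsz) would write no terminator when
     * strlen(src) >= dstsz; snprintf() always terminates if dstsz > 0. */
    int n = snprintf(dst, dstsz, "%s", src);

    return (n < 0 || (size_t)n >= dstsz) ? 1 : 0;
}
```

The return value also lets the caller detect truncation, which strncpy() does not report.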
| | 
| 
| 
| 
| 
| | updates: bz#1622665
Change-Id: I9f3a75ed9be3d90f37843a140563c356830ef945
Signed-off-by: Amar Tumballi <amarts@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Added ternary operator to avoid this issue
Updates: bz#1622665
Change-Id: I163d0628304a0d61249d1d97a4a3d3bee4ba4927
Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com> | 
| | 
| 
| 
| 
| 
| 
| 
| | Attempt to free rsp.dict.dict_val twice
Change-Id: I5dbc50430f59ca8d0c739b0fbe95d71981852889
Updates: bz#1622665
Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | For implementing copy_file_range fop, AFR needs to perform two inodelks in the
same transaction. This patch brings in the necessary structure to make it
easier to do so.
Entry-locks in AFR were already taking multiple entry-locks on different inodes
with the respective basenames. This patch extends the logic in inodelks to use
the same lockee_t structure. This led to the removal of quite a lot of duplicate
code present in afr-lk-common.c, as both locks do the same things except for the
'winding' part.
updates: #536
Change-Id: Ibfce7e3f260bb27b18645152ec680c33866fe0ae
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Squash some ugly warnings, and make the code a little bit simpler by
removing some unneeded goto jumps.
On Ubuntu 16.04 the following warnings were reported by Amudhan:
      CC	   barrier.lo
    barrier.c: In function ‘notify’:
    barrier.c:499:33: warning: switch condition has boolean value [-Wswitch-bool]
                             switch (past) {
                                     ^
    barrier.c: In function ‘reconfigure’:
    barrier.c:565:25: warning: switch condition has boolean value [-Wswitch-bool]
                     switch (past) {
                             ^
Change-Id: Ifb6b75058dff8c789b729c76530a1358d391f4d1
Updates: bz#1193929
Reported-by: Amudhan P <amudhan83@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com> | 
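A switch over a boolean is what triggers -Wswitch-bool, and plain conditionals avoid it. The sketch below assumes a simplified barrier toggle: `barrier_transition` and the action names are hypothetical and do not reproduce the logic in barrier.c.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a barrier option change. */
typedef enum { BARRIER_NOOP, BARRIER_ENABLE, BARRIER_DISABLE } barrier_action_t;

static barrier_action_t
barrier_transition(bool past, bool now)
{
    /* Plain comparisons replace "switch (past) { case true: ... }",
     * which is what triggers -Wswitch-bool. */
    if (past == now)
        return BARRIER_NOOP;
    return now ? BARRIER_ENABLE : BARRIER_DISABLE;
}
```

The conditional form also removes the nested switch and the goto jumps that the nested cases tend to require.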
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch addresses coverity issues with CID 1398470 and 1398475
1398470 - Missing unlock - False positive, Added an annotation to
                           make coverity happy
1398475 - Unused value
Change-Id: I1bb3df0b716690fad8fc52c393c8b2b6c41f7860
updates: bz#789278
Signed-off-by: Sanju Rakonde <srakonde@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| | Replaced "recieve" with "receive".
Change-Id: I58a3d3d4a0093df4743de9fae4d8ff152d4b216c
fixes: bz#1662089
Signed-off-by: N Balachandran <nbalacha@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Rearrange the dht_lookup_cbk code to make
it easier to understand.
Corrected a message in dht_linkfile_create_lookup_cbk
Change-Id: Id41db9ef901732f0410f1c007807362c630218ff
fixes: bz#1590385
Signed-off-by: N Balachandran <nbalacha@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch fixes buffer overflow in
$SRC/xlators/storage/posix/src/posix-inode-fd-ops.c
Memory access at offset 432 overflows "md5_checksum" variable.
SUMMARY: AddressSanitizer: stack-buffer-overflow (/lib64/libasan.so.5+0xb825a)
updates: bz#1633930
Change-Id: I46010a09161d02cdf0c69679a334ec1d3d49cffb
Signed-off-by: Harpreet Kaur <hlalwani@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| | This patch fixes coverity issue possible null dereference.
Change-Id: I93c0847c3d93b29a1e001ed044a63e908c670167
updates: bz#789278
Signed-off-by: Sunny Kumar <sunkumar@redhat.com> | 
| | 
| 
| 
| 
| 
| | updates: bz#789278
Change-Id: I7de800b90a614e3666e965b0cafc70026a844b2d
Signed-off-by: Amar Tumballi <amarts@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | * we shouldn't be using 'local' after DHT_STACK_UNWIND() as it frees
the content of local. Add a 'goto out' or similar logic to handle
the situation.
* fix possible overlook of unref(dict), instead of unref(xdata).
* make coverity happy by re-ordering unref in meta-defaults.
* gfid-access: re-order dictionary allocation so we don't have to
  do an extra unref.
* other obvious errors reported.
updates: bz#789278
Change-Id: If05961ee946b0c4868df19861d7e4a927a2a2489
Signed-off-by: Amar Tumballi <amarts@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | With brick mux, the number of threads increases as the number of
bricks increases. As an initiative to reduce the number of
threads in the brick mux scenario, replace the janitor thread with
synctask infra.
Now close() and closedir() are handled by a separate janitor
thread which is linked with glusterfs_ctx.
Updates #475
Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73
Signed-off-by: Poornima G <pgurusid@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Problem: When trying to convert a plain distribute volume to replica-3
or arbiter type it is failing with ENOTCONN error as the lookup on
the root will fail as there is no quorum.
Fix: Allow lookup on root if it is coming from the ADD_REPLICA_MOUNT
which is used while adding bricks to a volume. It will try to set the
pending xattrs for the newly added bricks to allow the heal to happen
in the right direction and avoid data loss scenarios.
Note: This fix will solve the problem of type conversion only in the
case where the volume was mounted at least once. The conversion of
non-mounted volumes will still fail, since the dht selfheal that tries to
set the directory layout will fail, as it does that with the PID
GF_CLIENT_PID_NO_ROOT_SQUASH set in the frame->root.
Change-Id: Ic511939981dad118cc946754341318b164954b3b
fixes: bz#1655854
Signed-off-by: karthik-us <ksubrahm@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The current implementation of iobuf_pool has two problems:
- prealloc of 12.5MB memory, this limits the scale factor of the gluster
  processes due to RAM requirements
- lock contention, as the current implementation has one global
  iobuf_pool lock. Credit for debugging and addressing the same goes to
  Krutika Dhananjay <kdhananj@redhat.com>. Issue: #410
Hence changing the iobuf implementation to use per thread mem pool.
This may theoretically appear to cause a perf dip as there is no preallocation.
But per thread mem pool will not have significant perf impact as the last
allocated memory is kept alive for subsequent allocs, for some time.
The worst case would be if iobufs requested are of random sizes each time.
The best case is, if we get iobuf request of the same size. From the perf
tests, this patch did not seem to cause any perf decrease.
Note that, with this patch, the rdma performance is going to degrade
drastically. In one of the previous patchsets we had fixes to not
degrade rdma perf, but rdma is not supported and also not tested [1].
Hence the decision was to not have code in rdma that is not tested
and not supported.
[1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html
Updates: #325
Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27
Signed-off-by: Poornima G <pgurusid@redhat.com> | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Currently io-cache invalidates pages falling in the range of a write.
Instead, it can update those pages with the same data so that reads can make use
of the cache.
credits: Xavi Hernandez <xhernandez@redhat.com>
Change-Id: I932bd3da97ddfd464187f3009b1013eb334f00a7
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
updates: bz#1659869 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | With read-after-open being set to yes by default, if open-behind sees
any reads, it'll do an open on backend (and hence flush/release
later). This means with the current order of quick-read and
open-behind, open-behind sees all reads and hence also does open
bringing down performance for small file reads.
Since for small files, reads are absorbed by quick-read, if quick-read
is made a parent of open-behind, ob doesn't witness any reads. For
read-only workloads, this means ob doesn't do any opens (even with
read-after-open yes and use-anonymous-fd no).
Change-Id: I138a42b006d104cff43ee6f07829e39c36f6f234
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
Fixes: bz#1659327 |