| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Change-Id: Iec5ce7f17fbf899f881a58cd20c4c967e3b71668
fixes: bz#1642168
Signed-off-by: Anuradha Talur <atalur@commvault.com>
|
|
|
|
|
|
|
|
|
|
|
| |
GF_LOG_OCCASSIONALLY doesn't log on the first instance rather at every
42nd iterations which isn't effective as in some cases we might not have
the code flow hitting the same log for as many as 42 times and we'd end
up suppressing the log.
Fixes: bz#1694925
Change-Id: Iee293281d25a652b64df111d59b13de4efce06fa
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Shd daemon is per node, which means they create a graph
with all volumes on it. While this is a great for utilizing
resources, it is so good in terms of performance and managebility.
Because self-heal daemons doesn't have capability to automatically
reconfigure their graphs. So each time when any configurations
changes happens to the volumes(replicate/disperse), we need to restart
shd to bring the changes into the graph.
Because of this all on going heal for all other volumes has to be
stopped in the middle, and need to restart all over again.
Solution:
This changes makes shd as a per volume daemon, so that the graph
will be generated for each volumes.
When we want to start/reconfigure shd for a volume, we first search
for an existing shd running on the node, if there is none, we will
start a new process. If already a daemon is running for shd, then
we will simply detach a graph for a volume and reatach the updated
graph for the volume. This won't touch any of the on going operations
for any other volumes on the shd daemon.
Example of an shd graph when it is per volume
graph
-----------------------
| debug-iostat |
-----------------------
/ | \
/ | \
--------- --------- ----------
| AFR-1 | | AFR-2 | | AFR-3 |
-------- --------- ----------
A running shd daemon with 3 volumes will be like-->
graph
-----------------------
| debug-iostat |
-----------------------
/ | \
/ | \
------------ ------------ ------------
| volume-1 | | volume-2 | | volume-3 |
------------ ------------ ------------
Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99
fixes: bz#1659708
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I3bbda719027b45e1289db2e6a718627141bcbdc8
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Commit 6d6a3b2 introduced some unused vars. This patch defines them
within #ifdef DEBUG
Fixes: bz#1580315
Change-Id: I8a332b00c3ffb66689b4b6480c490b9436c17d63
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
dumping the whole inode table detail to screen doesn't solve any
purpose. We should be getting only toplevel details on CLI, and then
if one wants to debug further, then they need to get to 'statedump'
to get full details.
Fixes: bz#1580315
Change-Id: Iaf3de94602f9c76832c3c918ffe2ad13c0b0e448
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
When using ec, there are many messages at brick log as,
[inodelk.c:514:__inode_unlock_lock] 0-test-locks: Matching lock not found for unlock 0-9223372036854775807, lo=68e040a84b7f0000 on 0x7f208c006f78
[MSGID: 115053] [server-rpc-fops_v2.c:280:server4_inodelk_cbk] 0-test-server: 2557439: INODELK <gfid:df4e41be-723f-4289-b7af-b4272b3e880c> (df4e41be-723f-4289-b7af-b4272b3e880c), client: CTX_ID:67d4a7f3-605a-4965-89a5-31309d62d1fa-GRAPH_ID:0-PID:1659-HOST:openfs-node2-PC_NAME:test-client-1-RECON_NO:-28, error-xlator: test-locks [Invalid argument]
Change-Id: Ib164d29ebb071f620a4ca9679c4345ef7c88512a
Updates: bz#1689920
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Multiple shd processes are spawned while starting volumes
in the loop on brick_mux environment.glusterd spawn a process
based on a pidfile and shd daemon is taking some time to
update pid in pidfile due to that glusterd is not able to
get shd pid
Solution: Commit cd249f4cb783f8d79e79468c455732669e835a4f changed
the code to update pidfile in parent for any gluster daemon
after getting the status of forking child in parent.To resolve
the same correct the condition update pidfile in parent only
for glusterd and for rest of the daemon pidfile is updated in
child
Change-Id: Ifd14797fa949562594a285ec82d58384ad717e81
fixes: bz#1684404
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
as commit 073444 is backported to release-5.4 branch, op-version
for this change should 5.4 instead of 6.
fixes: bz#1685120
Change-Id: Id504b9a1446125cea7c6a32117ccc44f28e73aa7
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The structs worm_reten_state_t and read_only_priv_t from read-only.h are
using uint64_t values to store periods of retention and autocommmit.
This seems to be dangerous since in worm-helper.c the function
worm_set_state computes in line 97:
stbuf->ia_atime = time(NULL) + retention_state->ret_period;
stbuf->ia_atime is using int64_t because of the settings of struct
iattr. So if there is a very very high retention period stored, there
is maybe an integer overflow.
What can be the solution? Using int64_t instead if uint64_t may reduce
the probability of the occurance.
Change-Id: Id1e86c6b20edd53f171c4cfcb528804ba7881f65
fixes: bz#1685944
Signed-off-by: David Spisla <david.spisla@iternity.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: commit 5a152a changed the mechansim of computing the
checksum. In heterogeneous cluster, peers are running into
rejected state because we have different cksum computation
mechansims in upgraded and non-upgraded nodes.
Solution: add a check for op-version so that all the nodes
in the cluster follow the same mechanism for computing the
cksum.
Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193
fixes: bz#1685120
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch avoids printing of "invalid argument" unless
loglevel is set to GF_LOG_DEBUG.
fixes : bz#1654021
Change-Id: I0e3d43bc627526f696b12921081342ca9b4a5f84
fixes: bz#1654021
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements a thread pool that is wait-free for adding jobs to
the queue and uses a very small locked region to get jobs. This makes it
possible to decrease contention drastically. It's based on wfcqueue
structure provided by urcu library.
It automatically enables more threads when load demands it, and stops
them when not needed. There's a maximum number of threads that can be
used. This value can be configured.
Depending on the workload, the maximum number of threads plays an
important role. So it needs to be configured for optimal performance.
Currently the thread pool doesn't self adjust the maximum for the
workload, so this configuration needs to be changed manually.
For this reason, the global thread pool has been made optional, so that
volumes can still use the thread pool provided by io-threads.
To enable it for bricks, the following option needs to be set:
config.global-threading = on
This option has no effect if bricks are already running. A restart is
required to activate it. It's recommended to also enable the following
option when running bricks with the global thread pool:
performance.iot-pass-through = on
To enable it for a FUSE mount point, the option '--global-threading'
must be added to the mount command. To change it, an umount and remount
is needed. It's recommended to disable the following option when using
global threading on a mount point:
performance.client-io-threads = off
To enable it for services managed by glusterd, glusterd needs to be
started with option '--global-threading'. In this case all daemons, like
self-heal, will be using the global thread pool.
Currently it can only be enabled for bricks, FUSE mounts and glusterd
services.
The maximum number of threads for clients and bricks can be configured
using the following options:
config.client-threads
config.brick-threads
These options can be applied online and its effect is immediate most of
the times. If one of them is set to 0, the maximum number of threads
will be calcutated as #cores * 2.
Some distributions use a very old userspace-rcu library (version 0.7)
for this reason, some header files from version 0.10 have been copied
into contrib/userspace-rcu and are used if the detected version is 0.7
or older.
An additional change has been made to io-threads to prevent that threads
are started when iot-pass-through is set.
Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b
updates: #532
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Returning any value from socket event handlers to the event sub-system
doesn't make sense since event sub-system cannot handle socket
sub-system errors.
Solution:
Change return type of all socket event handlers to 'void'
Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
Fixes: bz#1651246
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
do all the 'static' tasks outside of locked region.
* hash_dentry() and hash_gfid() are now called outside locked region.
* remove extra __dentry_hash exported in libglusterfs.sym
* avoid checks in locked functions, if the check is done in calling
function.
* implement dentry_destroy(), which handles freeing of dentry separately,
from that of dentry_unset (which takes care of separating dentry from
inode, and table)
Updates: bz#1670031
Change-Id: I584213e0748464bb427fbdef3c4ab6615d7d5eb0
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
| |
We were not properly cleaning self-heal daemon resources
during afr fini. This patch will clean the same.
Change-Id: I597860be6f781b195449e695d871b8667a418d5a
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
During disconnect cleanup, we are not cancelling reconnect
timer, which causes a ref leak each time when a disconnect
happen.
Change-Id: I9d05d1f368d080e04836bf6a0bb018bf8f7b5b8a
updates: bz#1659708
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Only linking of inode to the table, and inserting it in
a list needs to be in locked region.
Updates: bz#1670031
Change-Id: I6ea7e956b80cf2765c2233d761909c4bf9c7253c
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
The setxattr function receives a pointer to raw data, which may not be
null-terminated. When this data needs to be interpreted as a string, an
explicit null termination needs to be added before using the value.
Change-Id: Id110f9b215b22786da5782adec9449ce38d0d563
updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I0f3978d7e603e6e767dc7aa2a23ef35b1f2b43f7
Updates: bz#1193929
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scenarios tested:
* Upgrade the node when there are stripe / tiering and regular
type of volumes are present.
- All volumes are started fine (as the change was not on brick volfile)
- For tier, the functionality may not even work, as changetimerecorder
is not present.
- 'gluster volume info' properly shows as 'NOT SUPPORTED' for stripe and
tier type of volume.
* Upgrade in a rolling upgrade scenario, where an old version is
able to connect to higher master.
- on a normal volume, if the volfile-server was new, the newer client
volfiles needed to have utime xlator conditionally.
- with this one change, all other changes seem to work fine.
Change-Id: Ib2d3b69dafa02b2c695a735b13c1aa70aba07cb8
updates: bz#1635688
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Auto invalidation is necessary when same (meta)data is shared/access
across multiple mounts. However, if (meta)data is not shared, all
relevant I/O goes through the cache of single mount and hence is
coherent with (meta)data on bricks always. So, fuse-auto-invalidation
can be disabled for this case which gives a huge performance boost for
workloads that write data and then immediately read the data they just
wrote.
From glusterfs --help,
<snip>
--auto-invalidation[=BOOL] controls whether fuse-kernel can
auto-invalidate attribute, dentry and page-cache.
Disable this only if same files/directories are
not accessed across two different mounts
concurrently [default: "on"]
</snip>
Details on how disabling auto-invalidation helped to reduce pgbench
init times can be found at [1]. Time taken for pgbench init of scale
8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s)
with auto-invalidations turned off along with other
optimizations. Just disabling auto-invalidation contributed 56%
improvement by reducing the total time taken by 33260s.
[1] https://www.spinics.net/lists/gluster-devel/msg25907.html
Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
updates: bz#1664934
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch creates a specific function to set the thread name using a
string format and a variable argument list, like printf().
This function is used to set the thread name from gf_thread_create(),
which now accepts a variable argument list to create the full name. It's
not necessary anymore to use a local array to build the name of the
thread. This is done automatically.
Change-Id: Idd8d01fd462c227359b96e98699f8c6d962dc17c
Updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Too many logs get printed if dict_ref() and dict_unref() are passed NULL
pointer.
fixes: bz#1671213
Change-Id: I18afd849d64318f68baa7b549ee310dac0e1e786
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
A call to gf_backtrace_save() was done on each context switch of a
synctask. The backtrace is generated writing to the filesystem, so it
can have an important impact on latency.
The generated backtrace was not used anywhere, so it's been removed.
Change-Id: I399a93b932c5b6e981c696c72c3e1ef44710ba52
Updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: When rpc-transport-disconnect happens, server_connection_cleanup_flush_cbk()
is supposed to call rpc_transport_unref() after open-files on
that transport are flushed per transport.But open-fd-count is
maintained in bound_xl->fd_count, which can be incremented/decremented
cumulatively in server_connection_cleanup() by all transport
disconnect paths. So instead of rpc_transport_unref() happening
per transport, it ends up doing it only once after all the files
on all the transports for the brick are flushed leading to
rpc-leaks.
Solution: To avoid races maintain fd_cnt at client instead of maintaining
on brick
Credits: Pranith Kumar Karampuri
Change-Id: I6e8ea37a61f82d9aefb227c5b3ab57a7a36850e6
fixes: bz#1668190
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly, unlock before logging.
In some cases, moved different code that was not needed
to be under lock (for example, taking time, or malloc'ing)
to be executed before taking the lock.
Note: logging might be slightly less accurate in order, since it may
not be done now under the lock, so order of logs is racy. I think
it's a reasonable compromise.
Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Found an issue on concurrent invoke of event handler to the same socket
fd, causing memory corruption. This issue arises after applying commit
"socket: Remove redundant in_lock in incoming message handling" that
removes priv->in_lock to serialize socket read.
The following call sequence describes how concurrent socket event handle
happens.
thread 1 thread 2 thread 3
epoll_wait() return
(slot->in_handler is 0) call select_on_epoll()
and epoll_ctl() on fd
epoll_wait() return
slot->in_handler++
(slot->in_handler is 1)
slot->in_handler++
(slot->in_handler is 2)
call handler() call handler()
Fix this issue by skip invoke of handler if there is already a handler
inprogress.
Change-Id: I437126ac772debcadb00993a948919c931cd607b
updates: bz#1467614
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
|
|
|
|
|
|
|
|
|
|
| |
gf_dirent struct has d_type variable which should check
with DT_DIR istead of IA_IFDIR or IA_IFDIR has to compare
with entry->d_stat.ia_type
Change-Id: Idf1059ce2a590734bc5b6adaad73604d9a708804
updates: bz#1653359
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
We had only changed the log level to DEBUG in release branch earlier.
But considering 90%+ of our deployments happen in same env, we can look
at these specific logs on need basis. With this change, the master
branch will be easier to debug with lesser logs.
Change-Id: I4157a7ec7d5ec9c2948b2bbc1e4cb8317f28d6b8
Updates: bz#1666833
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch helps enable IPv6 connections in the cluster.
The default address-family is IPv4 without using this option explicitly.
When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol
file, the mount command-line also needs to have
-o xlator-option="transport.address-family=inet6" added to it.
This option also gets added to the brick command-line.
Snapshot and gfapi use-cases should also use this option to pass in the
inet6 address-family.
Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270
fixes: bz#1635863
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
|
|
| |
To avoid the warning and preparing for adding writesame support.
Updates: #617
Change-Id: I0710b1e4c240368a9bf52968bddc6e250ae2028d
Signed-off-by: Xiubo Li <xiubli@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added functionality to gluster volume set auth.allow command to
accept CIDR IP addresses. Modified few functions to isolate cidr
feature so that it prevents other gluster commands such as peer
probe to use cidr format ip. The functions are modified in such
a way that they have an option to enable accepting of cidr
format for other gluster commands if required in furture.
updates: bz#1138841
Change-Id: Ie6734002a7078f1820e5df42d404411cce945e8b
Credits: Mohit Agrawal
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
design reference: https://review.gluster.org/#/c/glusterfs-specs/+/21925/
This patch adds the lock preempt support.
Note: The current model stores lock enforcement information as separate
xattr on disk. There is another effort going in parallel to store this
in stat(x) of the file. This patch is self sufficient to add fencing
support. Based on the availability of the stat(x) support either I will
rebase this patch or we can modify the necessary bits post merging this
patch.
Change-Id: If4a42f3e0afaee1f66cdb0360ad4e0c005b5b017
updates: #466
Signed-off-by: Susant Palai <spalai@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In gluster code some of the places it call's get_new_dict
to create a dictionary without taking reference so at the time
of dict_unref it has become a leak
Solution: To resolve the same call dict_new instead of get_new_dict
updates bz#1650403
Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
| |
fixes: bz#1622665
Change-Id: I777d67b1b62c284c62a02277238ad7538eef001e
Signed-off-by: Iraj Jamali <ijamali@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: When quorum count option is updated, the change is not reflected in
the nfs-server.vol file. This is because in get_checksum_for_file(), when the
last part of the file read has size less than buffer size, the read buffer
stores old data value along with correct data value.
Solution: Pass the bytes read instead of fixed buffer size, for calculating
checksum.
Change-Id: I4b641607c8a262961b3f3da0028a54e08c3f8589
fixes: bz#1657744
Signed-off-by: Varsha Rao <varao@redhat.com>
|
|
|
|
|
|
|
|
|
| |
This reverts commit b87c397091bac6a4a6dec4e45a7671fad4a11770.
There seems to be some performance regression with the patch and hence recommended to have it reverted.
Updates: #325
Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09
|
|
|
|
|
|
|
| |
Change-Id: I44dd6ceef0954ae7fc13f920e84d81bbd3f6a774
Updates: #389
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
|
|
|
|
|
| |
updates: bz#1622665
Change-Id: I9f3a75ed9be3d90f37843a140563c356830ef945
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
They were not used at all, just taking space.
I've also marked all those that are not common really, but used
in just one place - they probably should move there (in follow-up
patches)
As a test, I've removed from the stripe xlator unused private
enums and moved one that was in the common list, but only
used in the stripe code, to be a private enum.
Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Change-Id: I1158dc1d259f1fd3f69904336c46c9d83cea799f
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* we shouldn't be using 'local' after DHT_STACK_UNWIND() as it frees
the content of local. Add a 'goto out' or similar logic to handle
the situation.
* fix possible overlook of unref(dict), instead of unref(xdata).
* make coverity happy by re-ordering unref in meta-defaults.
* gfid-access: re-order dictionary allocation so we don't have to
do a extra unref.
* other obvious errors reported.
updates: bz#789278
Change-Id: If05961ee946b0c4868df19861d7e4a927a2a2489
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With brick mux, the number of threads increases as the number of
bricks increases. As an initiative to reduce the number of
threads in brick mux scenario, replacing janitor thread to use
synctask infra.
Now close() and closedir() handle by separate janitor
thread which is linked with glusterfs_ctx.
Updates #475
Change-Id: I0c4aaf728125ab7264442fde59f3d08542785f73
Signed-off-by: Poornima G <pgurusid@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: When trying to convert a plain distribute volume to replica-3
or arbiter type it is failing with ENOTCONN error as the lookup on
the root will fail as there is no quorum.
Fix: Allow lookup on root if it is coming from the ADD_REPLICA_MOUNT
which is used while adding bricks to a volume. It will try to set the
pending xattrs for the newly added bricks to allow the heal to happen
in the right direction and avoid data loss scenarios.
Note: This fix will solve the problem of type conversion only in the
case where the volume was mounted at least once. The conversion of
non mounted volumes will still fail since the dht selfheal tries to
set the directory layout will fail as they do that with the PID
GF_CLIENT_PID_NO_ROOT_SQUASH set in the frame->root.
Change-Id: Ic511939981dad118cc946754341318b164954b3b
fixes: bz#1655854
Signed-off-by: karthik-us <ksubrahm@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation of iobuf_pool has two problems:
- prealloc of 12.5MB memory, this limits the scale factor of the gluster
processes due to RAM requirements
- lock contention, as the current implementation has one global
iobuf_pool lock. Credits for debugging and addressing the same goes to
Krutika Dhananjay <kdhananj@redhat.com>. Issue: #410
Hence changing the iobuf implementation to use per thread mem pool.
This may theoritically appear to cause perf dip as there is no preallocation.
But per thread mem pool will not have significant perf impact as the last
allocated memory is kept alive for subsequent allocs, for some time.
The worst case would be if iobufs requested are of random sizes each time.
The best case is, if we get iobuf request of the same size. From the perf
tests, this patch did not seem to cause any perf decrease.
Note that, with this patch, the rdma performance is going to degrade
drastically. In one of the previous patchsets we had fixes to not
degrade rdma perf, but rdma is not supported and also not tested [1].
Hence the decision was to not have code in rdma that is not tested
and not supported.
[1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html
Updates: #325
Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27
Signed-off-by: Poornima G <pgurusid@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently mem-pool implementation provides api to get from the
mem pool based on the struct type. This is to retain api
compatibility with the old implementation of mem pool. Internally
in the mem pool structure there is a mapping from struct to size
based pools.
In this patch, we are adding new APIs to fetch memory from mem pool,
given a size.
Change-Id: Ib220ee45ebd134a7be8f6482db5a592dbb7b9211
Updates: #325
Signed-off-by: Poornima G <pgurusid@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inode LRU mechanism is moot in fuse xlator (ie. there is no
limit for the LRU list), as fuse inodes are referenced from
kernel context, and thus they can only be dropped on request of
the kernel. This might results in a high number of passive
inodes which are useless for the glusterfs client, causing a
significant memory overhead.
This change tries to remedy this by extending the LRU semantics
and allowing to set a finite limit on the fuse inode LRU.
A brief history of problem:
When gluster's inode table was designed, fuse didn't have any
'invalidate' method, which means, userspace application could
never ask kernel to send a 'forget()' fop, instead had to wait
for kernel to send it based on kernel's parameters. Inode table
remembers the number of times kernel has cached the inode based
on the 'nlookup' parameter. And 'nlookup' field is not used by
no other entry points (like server-protocol, gfapi etc).
Hence the inode_table of fuse module always has to have lru-limit
as '0', which means no limit. GlusterFS always had to keep all
inodes in memory as kernel would have had a reference to it.
Again, the reason for this is, kernel's glusterfs inode reference
was pointer of 'inode_t' structure in glusterfs. As it is a
pointer, we could never free it (to prevent segfault, or memory
corruption).
Solution:
In the inode table, handle the prune case of inodes with 'nlookup'
differently, and call a 'invalidator' method, which in this case is
fuse_invalidate(), and it sends the request to kernel for getting
the forget request.
When the kernel sends the forget, it means, it has dropped all
the reference to the inode, and it will send the forget with the
'nlookup' parameter too. We just need to make sure to reduce the
'nlookup' value we have when we get forget. That automatically
cause the relevant prune to happen.
Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B
fixes: bz#1560969
Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just looked at posix.c and related code and performed
some changes and cleanups. The only important one is #3 below,
but surely the others (#2 and #4) need careful review.
Changes to other files are as they were related to code paths
in posix.c.
I'll send a separate patch for other posix related files.
Main changes:
1. Proper initializtion for parameters, where it made sense.
2. Logged outside the lock in several places.
3. Moved from CALLOC to MALLOC where it made sense.
4. Aligned structures.
5. moved dictionary functions to use _sizen where possible.
(dict_get() -> dict_get_sizen() for example)
Compile-tested only!
Change-Id: Ia84699fb495e06d095339c91c1ba770d1393bb6c
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Remove the options to load old symbol.
* keep only 'xlator_api' symbol from being exported using xlator.sym
* add xlator_api to all the xlators where its missing
NOTE: This covers all the xlators which has at least a test case
to validate its loading. If there is a translator, which doesn't
have any test, then we should probably remove that from codebase.
fixes: #164
Change-Id: Ibcdc8c9844cda6b4463d907a15813745d14c1ebb
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In changelog xlator after destroying listener it call's
unlink to delete changelog socket file but socket file
reference is not cleaned up from process memory
Solution: 1) To cleanup reference completely from process memory
serialize transport cleanup for changelog and then
unlink socket file
2) Brick xlator will notify GF_EVENT_PARENT_DOWN to next
xlator only after cleanup all xprts
Test: To test the same run below steps
1) Setup some volume and enable brick mux
2) kill anyone brick with gf_attach
3) check changelog socket for specific to killed brick
in lsof, it should cleanup completely
fixes: bz#1600145
Change-Id: Iba06cbf77d8a87b34a60fce50f6d8c0d427fa491
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|