| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
| |
I'm not sure why it was there and I did not see any use for it.
In the hope I did not miss anything, I removed it.
Change-Id: I02fa2e8e2a598b488fddbff4c7168dc4a41929b2
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With added check of volume-id during handshake, we can be sure to not
connect with a brick if this gets re-used in another volume. This
prevents any accidental issues which can happen with a stale client
process lurking along.
Also added test case for testing same volume name which would fetch a
different volfile (ie, different bricks, different type), and a
different volume name, but same brick.
For reference:
Currently a client<->server handshake happens in glusterfs through
protocol/client translator (setvolume) to protocol/server using a
dictionary which containes many keys. Rejection happens in server
side if some of the required keys are missing in handshake
dictionary.
Till now, there was no single unique identifier to validate for a
client to tell server if it is actually talking to a corresponding
server. All we look in protocol/client is a key called
'remote-subvolume', which should match with a subvolume name in server
volume file, and for any volume with same brick name (can be present
in same cluster due to recreate), it would be same. This could cause
major issue, when a client was connected to a given brick, in one
volume would be connected to another volume's brick if its
re-created/re-used.
To prevent this behavior, we are now passing along 'volume-id' in
handshake, which would be preserved for the life of client process,
which can prevent this accidental connections.
NOTE: This behavior wouldn't be applicable for user-snapshot enabled
volumes, as snapshotted volume's would have different volume-id.
Fixes: bz#1620580
Change-Id: Ie98286e94ce95ae09c2135fd6ec7d7c2ca1e8095
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
There could be cases (either due to insufficient memory or
corrupted mem-pool) due to which frame creation fails. Bail out
with error in such cases.
Change-Id: I8cc0a5852f6f04d2bac991e4eb79ecb42577da11
Fixes: bz#1748448
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
|
|
|
|
|
|
| |
fixes: #721
Change-Id: I5333540e3c635ccf441cf1f4696e4c8986e38ea8
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were unconditionally cleaning up the grap when we get
child_down followed by parent_down. But this is prone to
race condition when some of the bricks are already disconnected.
In this case, even before the last child down is executed in the
client xlator code,we might have freed the graph. Because the
child_down event is alreadt recevied.
To fix this race, we have introduced a check to see if all client
xlator have cleared thier reconnect chain, and called the child_down
for last time.
Change-Id: I7d02813bc366dac733a836e0cd7b14a6fac52042
fixes: bz#1727329
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
On systems that don't support "timespec_get"(e.g., centos6), it
was using "clock_gettime" with "CLOCK_MONOTONIC" to get unix epoch
time which is incorrect. This patch introduces "timespec_now_realtime"
which uses "clock_gettime" with "CLOCK_REALTIME" which fixes
the issue.
Change-Id: I57be35ce442d7e05319e82112b687eb4f28d7612
Signed-off-by: Kotresh HR <khiremat@redhat.com>
fixes: bz#1743652
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To convert the existing `gf_msg` to `gf_smsg`:
- Define `_STR` of respective Message ID as below(In `*-messages.h`)
#define PC_MSG_REMOTE_OP_FAILED_STR "remote operation failed."
- Change `gf_msg` to use `gf_smsg`. Convert values into fields and
add any missing fields. Note: `errno` and `error` fields will be
added automatically to log message in case errnum is specified.
Example:
gf_smsg(
this->name, // Name or log domain
GF_LOG_WARNING, // Log Level
rsp.op_errno, // Error number
PC_MSG_REMOTE_OP_FAILED, // Message ID
"path=%s", local->loc.path, // Key Value 1
"gfid=%s", loc_gfid_utoa(&local->loc), // Key Value 2
NULL // Log End
);
Key value pairs formatting Help:
gf_slog(
this->name, // Name or log domain
GF_LOG_WARNING, // Log Level
rsp.op_errno, // Error number
PC_MSG_REMOTE_OP_FAILED, // Message ID
"op=CREATE", // Static Key and Value
"path=%s", local->loc.path, // Format for Value
"brick-%d-status=%s", brkidx, brkstatus, // Use format for key and val
NULL // Log End
);
Before:
[2019-07-03 08:16:18.226819] W [MSGID: 114031] [client-rpc-fops_v2.c \
:2633:client4_0_lookup_cbk] 0-gv3-client-0: remote operation failed. \
Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint \
is not connected]
After:
[2019-07-29 07:50:15.773765] W [MSGID: 114031] \
[client-rpc-fops_v2.c:2633:client4_0_lookup_cbk] 0-gv1-client-0: \
remote operation failed. [{path=/f1}, \
{gfid=00000000-0000-0000-0000-000000000000}, \
{errno=107}, {error=Transport endpoint is not connected}]
To add new `gf_smsg`, Add a Message ID in respective `*-messages.h` file
and the follow the steps mentioned above.
Change-Id: I4e7d37f27f106ab398e991d931ba2ac7841a44b1
Updates: #657
Signed-off-by: Aravinda VK <avishwan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: In brick_mux environment sometime brick is crashed while
volume stop/start in a loop.Brick is crashed in janitor task
at the time of accessing priv.If posix priv is cleaned up before
call janitor task then janitor task is crashed.
Solution: To avoid the crash in brick_mux environment introduce a new
flag janitor_task_stop in posix_private and before send CHILD_DOWN event
wait for update the flag by janitor_task_done
Change-Id: Id9fa5d183a463b2b682774ab5cb9868357d139a4
fixes: bz#1730409
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
| |
Change-Id: Id9f5f448db305f3135a1fdca61b1d7ec898c63a4
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Goal: 'libglusterfs' files shouldn't have any dependency outside of
the tree, specially the header files, shouldn't have '#include'
from outside the tree.
Fixes:
* Had to introduce libglusterd so, methods and structures required
for only mgmt/glusterd, and cli/ are separated from 'libglusterfs/'
* Remove rpc/xdr/gen from build, which was used mainly so
dependency for libglusterfs could be properly satisfied.
* Move rpcsvc_auth_data to client_t.h, so all dependencies could
be handled.
Updates: bz#1636297
Change-Id: I0e80243a5a3f4615e6fac6e1b947ad08a9363fce
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the glusterfs fuse client process is unable to
process the invalidate requests quickly enough, the
number of such requests quickly grows large enough
to use a significant amount of memory.
We are now introducing another option to set an upper
limit on these to prevent runaway memory usage.
Change-Id: Iddfff1ee2de1466223e6717f7abd4b28ed947788
Fixes: bz#1732717
Signed-off-by: N Balachandran <nbalacha@redhat.com>
|
|
|
|
|
|
| |
Fixes: bz#1644322
Change-Id: I53e8fa362cd8c7d04fb1c4abb606a9abb642c592
Signed-off-by: Csaba Henk <csaba@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I hit one crash issue when using the libgfapi.
In the libgfapi it will call glfs_poller() --> event_dispatch()
in file api/src/glfs.c:721, and the event_dispatch() is defined
by libgluster locally, the problem is the name of event_dispatch()
is the extremly the same with the one from libevent package form
the OS.
For example, if a executable program Foo, which will also use and
link the libevent and the libgfapi at the same time, I can hit the
crash, like:
kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp
00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000]
The link for Foo is:
lib_foo_LADD = -levent $(GFAPI_LIBS)
It will crash.
This is because the glfs_poller() is calling the event_dispatch() from
the libevent, not the libglsuter.
The gfapi link info :
GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid
If I link Foo like:
lib_foo_LADD = $(GFAPI_LIBS) -levent
It will works well without any problem.
And if Foo call one private lib, such as handler_glfs.so, and the
handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't
and it will dlopen(handler_glfs.so), then the crash will be hit everytime.
The link info will be:
foo_LADD = -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)
I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like:
foo_LADD = $(GFAPI_LIBS) -levent
libhandler_glfs_LIBADD = $(GFAPI_LIBS)
But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS.
And in some cases when the --as-needed link option is added(on many dists
it is added as default), then the crash is back again, the above workaround
won't work.
Fixes: #699
Change-Id: I38f0200b941bd1cff4bf3066fca2fc1f9a5263aa
Signed-off-by: Xiubo Li <xiubli@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
The files which were created before ctime enabled would not
have "trusted.glusterfs.mdata"(stores time attributes) xattr.
Upon fops which modifies either ctime or mtime, the xattr
gets created with latest ctime, mtime and atime, which is
incorrect. It should update only the corresponding time
attribute and rest from backend
Solution:
Creating xattr with values from brick is not possible as
each brick of replica set would have different times.
So create the xattr upon successful lookup if the xattr
is not created
Note To Reviewers:
The time attributes used to set xattr is got from successful
lookup. Instead of sending the whole iatt over the wire via
setxattr, a structure called mdata_iatt is sent. The mdata_iatt
contains only time attributes.
Change-Id: I5e535631ddef04195361ae0364336410a2895dd4
fixes: bz#1593542
Signed-off-by: Kotresh HR <khiremat@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As usleep has been obsoleted, changed all invocations of usleep
to nanosleep. From man 3 usleep:
"4.3BSD, POSIX.1-2001. POSIX.1-2001 declares this function
obsolete; use nanosleep(2) instead. POSIX.1-2008 removes the
specification of usleep()."
Added a helper function gf_nanosleep() to have a single place
for handling edge cases that might arise from the conversion of
usleep to nanosleep and allow the sleep to resume with right
remaining value upon being interrupted.
Fixes: bz#1721686
Change-Id: Ia39ab82c9e0f4669d2c00d4cdf25e38d94ef9f62
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
|
|
|
| |
As Hadoop is no longer supported, dropping code for
handling Hadoop access.
Fixes: bz#1728417
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Change-Id: I8fcf4faacb364f1c9a8abb0c48faec337087f845
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a normal volume, we are updating the pid from a the
process while we do a daemonization or at the end of the
init if it is no-daemon mode. Along with updating the pid
we also lock the file, to make sure that the process is
running fine.
With brick mux, we were updating the pidfile from gluterd
after an attach/detach request.
There are two problems with this approach.
1) We are not holding a pidlock for any file other than parent
process.
2) There is a chance for possible race conditions with attach/detach.
For example, shd start and a volume stop could race. Let's say
we are starting an shd and it is attached to a volume.
While we trying to link the pid file to the running process,
this would have deleted by the thread that doing a volume stop.
Change-Id: I29a00352102877ce09ea3f376ca52affceb5cf1a
Updates: bz#1722541
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
glusterfs-fops.h was moved to rpc/xdr to support compound fops.
(ref: https://review.gluster.org/14032, 2f945b86d3)
This was fine as long as all these header files were in single
include directory after 'install'. With the move to separate out
glusterfs specific header files into another directory inside
/usr/include (ref: https://review.gluster.org/21746, 20ef211cfa),
glusterfs-fops.h file was not in the proper path when an external
.c file tried to include any of glusterfs specific .h file (like
xlator.h).
Now, we have removed compound-fops, with that, none of the enums
declared in glusterfs-fops.h are actually getting used on wire
anymore. Hence, it makes sense to get this to libglusterfs/src
as a single point of definition. With this change, the external
programs can use glusterfs header files.
also remove some enum definitions which are not used in code
anymore.
Updates: bz#1636297
Change-Id: I423c44d3dbe2efc777299c544ece3cb172fc7e44
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two problems have been identified that caused that gluster's memory
usage were twice higher than required.
1. An off by 1 error caused that all objects allocated from the memory
pools were taken from a pool bigger than required. Since each pool
corresponds to a size equal to a power of two, this was wasting half
of the available memory.
2. The header information used for accounting on each memory object was
not taken into consideration when searching for a suitable memory
pool. It was added later when each individual block was allocated.
This made this space "invisible" to memory accounting.
Credits: Thanks to Nithya Balachandran for identifying this problem and
testing this patch.
Fixes: bz#1722802
Change-Id: I90e27ad795fe51ca11c13080f62207451f6c138c
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is a good chance that, the inode on which unref came has already been
zero refed and added to the purge list. This can happen when inode table is
being destroyed (glfs_fini is something which destroys the inode table).
Consider a directory 'a' which has a file 'b'. Now as part of inode table destruction
zero refing of inodes does not happen from leaf to the root. It happens in the order
inodes are present in the list. So, in this example, the dentry of 'b' would have its
parent set to the inode of 'a'. So if 'a' gets zero refed first (as part of
inode table cleanup) and then 'b' has to zero refed, then dentry_unset is called on
the dentry of 'b' and it further goes on to call inode_unref on b's parent which is
'a'. In this situation, GF_ASSERT would be called as the refcount of 'a' has been
already set to zero.
So, return the inode (in the function inode_unref without doing anything) if the
inode table cleanup has already started and inode's refcount is zero.
Change-Id: I91e0a807d5c9ce0daae5a611c38da379fd11076e
fixes: bz#1722546
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes a failure to compile glusterfs with glibc 2.25 (on Gentoo and other systems)
Making all in src
CC libglusterfs_la-dict.lo
In file included from iatt.h:16:0,
from common-utils.h:44,
from dict.c:20:
/usr/include/sys/sysmacros.h:57:45: error: attempt to use poisoned "system"
directly. If you did not intend to use a system-defined macro\n\
^
make[4]: *** [Makefile:959: libglusterfs_la-dict.lo] Error 1
Fixes: bz#1494654
Change-Id: I09b910b5772f52e853f87d81f3923eed9a90f7a1
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
| |
Also fixed some issues on test ec-1468261.t.
Change-Id: If156f86af986d9eed13cdd1f15c5a7214cd11706
Updates: bz#1193929
Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many include statements that are not needed.
A previous more ambitious attempt failed because of *BSD plafrom
(see https://review.gluster.org/#/c/glusterfs/+/21929/ )
Now trying a more conservative reduction.
It does not solve all circular deps that we have, but it
does reduce some of them. There is just too much to handle
reasonably (dht-common.h includes dht-lock.h which includes
dht-common.h ...), but it does reduce the overall number of lines
of include we need to look at in the future to understand and fix
the mess later one.
Change-Id: I550cd001bdefb8be0fe67632f783c0ef6bee3f9f
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch cleans some iovec code and creates two additional helper
functions to simplify management of iovec structures.
iov_range_copy(struct iovec *dst, uint32_t dst_count, uint32_t dst_offset,
struct iovec *src, uint32_t src_count, uint32_t src_offset,
uint32_t size);
This function copies up to 'size' bytes from 'src' at offset
'src_offset' to 'dst' at 'dst_offset'. It returns the number of
bytes copied.
iov_skip(struct iovec *iovec, uint32_t count, uint32_t size);
This function removes the initial 'size' bytes from 'iovec' and
returns the updated number of iovec vectors remaining.
The signature of iov_subset() has also been modified to make it safer
and easier to use. The new signature is:
iov_subset(struct iovec *src, int src_count, uint32_t start, uint32_t size,
struct iovec **dst, int32_t dst_count);
This function creates a new iovec array containing the subset of the
'src' vector starting at 'start' with size 'size'. The resulting
array is allocated if '*dst' is NULL, or copied to '*dst' if it fits
(based on 'dst_count'). It returns the number of iovec vectors used.
A new set of functions to iterate through an iovec array have been
created. They can be used to simplify the implementation of other
iovec-based helper functions.
Change-Id: Ia5fe57e388e23392a8d6cdab17670e337cadd587
Updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way delta_blocks is computed in shard is incorrect, when a file
is truncated to a lower size. The accounting only considers change
in size of the last of the truncated shards.
FIX:
Get the block-count of each shard just before an unlink at posix in
xdata. Their summation plus the change in size of last shard
(from an actual truncate) is used to compute delta_blocks which is
used in the xattrop for size update.
Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53
fixes: bz#1705884
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: While high no. of volumes are configured around 2000
glusterd has bottleneck during handshake at the time
of copying dictionary
Solution: To avoid the bottleneck serialize a dictionary instead
of copying key-value pair one by one
Change-Id: I9fb332f432e4f915bc3af8dcab38bed26bda2b9a
fixes: bz#1711297
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During a graph cleanup, we first sent a PARENT_DOWN and wait for
a child down to ultimately free the xlator and the graph.
In the ec xlator, we cleanup the threads when we get a PARENT_DOWN event.
But a racing event like CHILD_UP or event xl_op may trigger healing threads
after threads cleanup.
So there is a chance that the threads might access a freed private variabe
Change-Id: I252d10181bb67b95900c903d479de707a8489532
fixes: bz#1703948
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of saving each key-value separately, which is slow (
especially as we fflush() after each!), store them all as one
string and write all together.
Implements https://github.com/gluster/glusterfs/issues/629
Change-Id: Ie77a272446b0b6785584b710a4fdd9c613dd9578
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat,.com>
|
|
|
|
|
|
|
|
| |
* libglusterfs/graph-print: remove unused code
updates: bz#1693692
Change-Id: Iae81bb6a3af5911c3da07ab8f1d8f58f27e06905
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Minor code changes (less variables and if statements)
and use dict_get_strn(), since all options are fixed strings.
Similar changes could be done to GF_OPTION_INIT() as well.
Change-Id: I4a523f67183f4c4852a3d4de5e3cac52df68d3cf
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
| |
updates: bz#1693692
Change-Id: If69702990af273be1f38855ba56b3b89fabff167
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
| |
Change-Id: Icbe53e78e9c4f6699c7a26a806ef4b14b39f5019
updates: bz#1642168
Signed-off-by: Anuradha Talur <atalur@commvault.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Try to reduce the number of sprintf() and string copies until we finally
log a log line.
Specifically, do not sprintf separately the timestr string and
do not sprintf/strcpy the appmsgstr separately - just stick it with
the header.
Hoping I did not leak anything or changed the log line formatting.
Also, allocate 4K (GF_LOG_BACKTRACE_SIZE) of memory
dynamically for trace output -
only if trace was actually requested (previously, it was
unconditionally)
In addition, some minor code formatting (unrelated to the above).
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
Change-Id: Id2ccc85f9213a2b1c6eaa4a2f58ce043eac1824f
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some interdependencies between logging and memory management functions
make it impossible to use the logging framework before initializing
memory subsystem because they both depend on Thread Local Storage
allocated through pthread_key_create() during initialization.
This causes a crash when we try to log something very early in the
initialization phase.
To prevent this, several dynamically allocated TLS structures have
been replaced by static TLS reserved at compile time using '__thread'
keyword. This also reduces the number of error sources, making
initialization simpler.
Updates: bz#1193929
Change-Id: I8ea2e072411e30790d50084b6b7e909c7bb01d50
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a translator stops, memory accounting for that translator is not
destroyed (because there could remain memory allocated that references
it), but mutexes that coordinate updates of memory accounting were
destroyed. This caused incorrect memory accounting and even crashes in
debug mode.
This patch also fixes some other things:
* Reduce the number of atomic operations needed to manage memory
accounting.
* Correctly account memory when realloc() is used.
* Merge two critical sections into one.
* Cleaned the code a bit.
Change-Id: Id5eaee7338729b9bc52c931815ca3ff1e5a7dcc8
Updates: bz#1659334
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set log-level
per xlator during reconfigure only for a brick process not for
the client process.
Solution: 1) Change per xlator log-level only if brick_mux is enabled.To make sure
about brick multiplex introudce a flag brick_mux at ctx->cmd_args.
Note: There are two other changes done with this patch
1) Ignore client-log-level option to attach a brick with
already running brick if brick_mux is enabled
2) Add a log to print pid of the running process to make easier
debugging
Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
Fixes: bz#1696046
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was written just before fill_void() call.
Note that there was a possible overflow if the hostname was too long
(unrelated to this patch), but it is now also fixed, as we use a smaller buffer
for the hostname. This, in turn, forces us to check if gethostname() failed
and add explicitly the terminating null to it.
Change-Id: I45fbc0a8e105f1247f3cbf61befac06fabbaea06
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
| |
memdup() and gf_memdup() have the same implementation. Removed one API
as the presence of both can be confusing.
Change-Id: I562130c668457e13e4288e592792872d2e49887e
updates: bz#1193929
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
|
|
|
|
|
|
| |
Change-Id: Iec5ce7f17fbf899f881a58cd20c4c967e3b71668
fixes: bz#1642168
Signed-off-by: Anuradha Talur <atalur@commvault.com>
|
|
|
|
|
|
|
|
|
|
|
| |
GF_LOG_OCCASSIONALLY doesn't log on the first instance rather at every
42nd iterations which isn't effective as in some cases we might not have
the code flow hitting the same log for as many as 42 times and we'd end
up suppressing the log.
Fixes: bz#1694925
Change-Id: Iee293281d25a652b64df111d59b13de4efce06fa
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Shd daemon is per node, which means they create a graph
with all volumes on it. While this is a great for utilizing
resources, it is so good in terms of performance and managebility.
Because self-heal daemons doesn't have capability to automatically
reconfigure their graphs. So each time when any configurations
changes happens to the volumes(replicate/disperse), we need to restart
shd to bring the changes into the graph.
Because of this all on going heal for all other volumes has to be
stopped in the middle, and need to restart all over again.
Solution:
This changes makes shd as a per volume daemon, so that the graph
will be generated for each volumes.
When we want to start/reconfigure shd for a volume, we first search
for an existing shd running on the node, if there is none, we will
start a new process. If already a daemon is running for shd, then
we will simply detach a graph for a volume and reatach the updated
graph for the volume. This won't touch any of the on going operations
for any other volumes on the shd daemon.
Example of an shd graph when it is per volume
graph
-----------------------
| debug-iostat |
-----------------------
/ | \
/ | \
--------- --------- ----------
| AFR-1 | | AFR-2 | | AFR-3 |
-------- --------- ----------
A running shd daemon with 3 volumes will be like-->
graph
-----------------------
| debug-iostat |
-----------------------
/ | \
/ | \
------------ ------------ ------------
| volume-1 | | volume-2 | | volume-3 |
------------ ------------ ------------
Change-Id: Idcb2698be3eeb95beaac47125565c93370afbd99
fixes: bz#1659708
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I3bbda719027b45e1289db2e6a718627141bcbdc8
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
as commit 073444 is backported to release-5.4 branch, op-version
for this change should 5.4 instead of 6.
fixes: bz#1685120
Change-Id: Id504b9a1446125cea7c6a32117ccc44f28e73aa7
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: commit 5a152a changed the mechansim of computing the
checksum. In heterogeneous cluster, peers are running into
rejected state because we have different cksum computation
mechansims in upgraded and non-upgraded nodes.
Solution: add a check for op-version so that all the nodes
in the cluster follow the same mechanism for computing the
cksum.
Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193
fixes: bz#1685120
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements a thread pool that is wait-free for adding jobs to
the queue and uses a very small locked region to get jobs. This makes it
possible to decrease contention drastically. It's based on wfcqueue
structure provided by urcu library.
It automatically enables more threads when load demands it, and stops
them when not needed. There's a maximum number of threads that can be
used. This value can be configured.
Depending on the workload, the maximum number of threads plays an
important role. So it needs to be configured for optimal performance.
Currently the thread pool doesn't self adjust the maximum for the
workload, so this configuration needs to be changed manually.
For this reason, the global thread pool has been made optional, so that
volumes can still use the thread pool provided by io-threads.
To enable it for bricks, the following option needs to be set:
config.global-threading = on
This option has no effect if bricks are already running. A restart is
required to activate it. It's recommended to also enable the following
option when running bricks with the global thread pool:
performance.iot-pass-through = on
To enable it for a FUSE mount point, the option '--global-threading'
must be added to the mount command. To change it, an umount and remount
is needed. It's recommended to disable the following option when using
global threading on a mount point:
performance.client-io-threads = off
To enable it for services managed by glusterd, glusterd needs to be
started with option '--global-threading'. In this case all daemons, like
self-heal, will be using the global thread pool.
Currently it can only be enabled for bricks, FUSE mounts and glusterd
services.
The maximum number of threads for clients and bricks can be configured
using the following options:
config.client-threads
config.brick-threads
These options can be applied online and its effect is immediate most of
the times. If one of them is set to 0, the maximum number of threads
will be calcutated as #cores * 2.
Some distributions use a very old userspace-rcu library (version 0.7)
for this reason, some header files from version 0.10 have been copied
into contrib/userspace-rcu and are used if the detected version is 0.7
or older.
An additional change has been made to io-threads to prevent that threads
are started when iot-pass-through is set.
Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b
updates: #532
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem:
Returning any value from socket event handlers to the event sub-system
doesn't make sense since event sub-system cannot handle socket
sub-system errors.
Solution:
Change return type of all socket event handlers to 'void'
Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
Fixes: bz#1651246
Signed-off-by: Milind Changire <mchangir@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
do all the 'static' tasks outside of locked region.
* hash_dentry() and hash_gfid() are now called outside locked region.
* remove extra __dentry_hash exported in libglusterfs.sym
* avoid checks in locked functions, if the check is done in calling
function.
* implement dentry_destroy(), which handles freeing of dentry separately,
from that of dentry_unset (which takes care of separating dentry from
inode, and table)
Updates: bz#1670031
Change-Id: I584213e0748464bb427fbdef3c4ab6615d7d5eb0
Signed-off-by: Amar Tumballi <amarts@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
The setxattr function receives a pointer to raw data, which may not be
null-terminated. When this data needs to be interpreted as a string, an
explicit null termination needs to be added before using the value.
Change-Id: Id110f9b215b22786da5782adec9449ce38d0d563
updates: bz#1193929
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
|
|
|
|
|
|
| |
Change-Id: I0f3978d7e603e6e767dc7aa2a23ef35b1f2b43f7
Updates: bz#1193929
Signed-off-by: ShyamsundarR <srangana@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Auto invalidation is necessary when same (meta)data is shared/access
across multiple mounts. However, if (meta)data is not shared, all
relevant I/O goes through the cache of single mount and hence is
coherent with (meta)data on bricks always. So, fuse-auto-invalidation
can be disabled for this case which gives a huge performance boost for
workloads that write data and then immediately read the data they just
wrote.
From glusterfs --help,
<snip>
--auto-invalidation[=BOOL] controls whether fuse-kernel can
auto-invalidate attribute, dentry and page-cache.
Disable this only if same files/directories are
not accessed across two different mounts
concurrently [default: "on"]
</snip>
Details on how disabling auto-invalidation helped to reduce pgbench
init times can be found at [1]. Time taken for pgbench init of scale
8000 was 8340s. That will be an improvement of 86% (59280s vs 8340s)
with auto-invalidations turned off along with other
optimizations. Just disabling auto-invalidation contributed 56%
improvement by reducing the total time taken by 33260s.
[1] https://www.spinics.net/lists/gluster-devel/msg25907.html
Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
updates: bz#1664934
|