summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
...
* cluster/ec: skip updating ctx->loc again when ec_fix_open/opendirKinglong Mee2019-07-172-10/+14
| | | | | | | | | | | | | The ec_manager_open/opendir memsets ctx->loc which causes memory/inode leak, and ec_fheal uses ctx->loc out of fd->lock that loc_copy may copy bad data when memset it. This patch skips updating ctx->loc when it is initilizaed. With it, ctx->loc is filled once, and never updated. Change-Id: I3bf5ffce4caf4c1c667f7acaa14b451d37a3550a fixes: bz#1729772 Signed-off-by: Kinglong Mee <mijinlong@horiscale.com>
* cluster/ec: inherit healing from lock when it has infoKinglong Mee2019-07-161-2/+3
| | | | | | | | | If lock has info, fop should inherit healing mask from it. Otherwise, fop cannot inherit right healing when changed_flags is zero. Change-Id: Ife80c9169d2c555024347a20300b0583f7e8a87f fixes: bz#1727081 Signed-off-by: Kinglong Mee <mijinlong@horiscale.com>
* system/posix-acl: update ctx only if iatt is non-NULLHomma2019-07-161-0/+8
| | | | | | | | | We need to safe-guard against possible zero'ing out of iatt structure in acl ctx, which can cause many issues. fixes: bz#1668286 Change-Id: Ie81a57d7453a6624078de3be8c0845bf4d432773 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* dht-common.h: reorder variables to reduce padding.Yaniv Kaul2019-07-151-73/+81
| | | | | | | | | Manually added '-Wpadded' to get warnings on padding, and reordered structs to reduce most of them. Change-Id: I0c505fcb3dfef76399ac9d5d33bfb235354532de updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* Detach iot_worker to release its resourcesLiguang Li2019-07-151-0/+1
| | | | | | | | | | | | When iot_worker terminates, its resources have not been reaped, which will consumes lots of memory. Detach iot_worker to automically release its resources back to the system. fixes: bz#1729107 Change-Id: I71fabb2940e76ad54dc56b4c41aeeead2644b8bb Signed-off-by: Liguang Li <liguang.lee6@gmail.com>
* glusterd: do not mark skip_locking as true for geo-rep operationsSanju Rakonde2019-07-141-2/+7
| | | | | | | | | | | | | | | | | We need to send the commit req to peers in case of geo-rep operations even though it is a no volname operation. In commit phase peers try to set the txn_opinfo which will fail because it is a no volname operation where we don't require a commit phase. We mark skip_locking as true for no volname operations, but we have to give an exception to geo-rep operations, so that they can set txn_opinfo in commit phase. Please refer to detailed RCA at the bug: 1729463 fixes: bz#1729463 Change-Id: I9f2478b12a281f6e052035c0563c40543493a3fc Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* Fix spelling errorsAravinda VK2019-07-144-4/+4
| | | | | | | Fixes: bz#1728554 Change-Id: I88357aed7c14988a12616035c3738c32c09a8f9a Signed-off-by: Patrick Matthäi <pmatthaei@debian.org> Signed-off-by: Aravinda VK <avishwan@redhat.com>
* ibverbs/rdma: remove from buildAmar Tumballi2019-07-131-10/+0
| | | | | | | | | | | | | | | We have proposed about this an year ago, and with recent smoke failures, it looks like the right time to take such call. ref: https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html With this, glusterfs-8.0 wouldn't have rdma feature, and would allow some modularity changes possible with rpc layer (as we would have just 1 transport) Updates: bz#1635688 Change-Id: Ia277dca4d4b1f0cffae20819024a52b075b775e5 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* features/snapview-server: obtain the list of snapshots inside the lockRaghavendra Bhat2019-07-121-1/+1
| | | | | | | | The current list of snapshots from priv->dirents is obtained outside the lock. Change-Id: I8876ec0a38308da5db058397382fbc82cc7ac177 Fixes: bz#1726783
* cluster/ta: Notify the clients only if there are pending healskarthik-us2019-07-124-22/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In case of thin arbiter, before index healer starts crawling the indices at every heal-timeout interval, even if there is nothing to be healed it will send an upcall notification to all the clients to release any AFR_TA_DOM_NOTIFY locks that they hold. SHD will wait for the upcall to return before proceeding with the heal even though there is nothing to be healed. This will also invalidates the cached information about the bricks states on the clients which leads to extra calls on TA from clients for the next reads & writes if needed. This will impact the IO performance. Fix: - Before sending the upcall to the clients, check for any pending heals on TA without taking any locks. - If there is nothing marked bad on TA, then continue with the index crawl to heal any dirty markings present on the files due to any post-op failure. - If there is a brick marked as bad on TA, then take the AFR_TA_DOM_NOTIFY lock on TA from SHD, get the state on TA and continue with the current healing process. Change-Id: Ieb477bc6cb18bbdfd4e7a0453c5ed79b574ec9d6 fixes: bz#1724184 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* cluster/afr: Fix incorrect reporting of gfid & type mismatchkarthik-us2019-07-122-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problems: 1. When checking for type and gfid mismatch, if the type or gfid is unknown because of missing gfid handle and the gfid xattr it will be reported as type or gfid mismatch and the heal will not complete. 2. If the source selected during entry heal has null gfid the same will be sent to afr_lookup_and_heal_gfid(). In this function when we try to assign the gfid on the bricks where it does not exist, we are considering the same gfid and try to assign that on those bricks. This will fail in posix_gfid_set() since the gfid sent is null. Fix: If the gfid sent to afr_lookup_and_heal_gfid() is null choose a valid gfid before proceeding to assign the gfid on the bricks where it is missing. In afr_selfheal_detect_gfid_and_type_mismatch(), do not report type/gfid mismatch if the type/gfid is unknown or not set. Change-Id: Ia06552e4dc4a9f89cb7f5302833604bd21bbf7da fixes: bz#1722507 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* Replace usleep() with nanosleep()Vijay Bellur2019-07-113-4/+4
| | | | | | | | | | | | | | | | | | | As usleep has been obsoleted, changed all invocations of usleep to nanosleep. From man 3 usleep: "4.3BSD, POSIX.1-2001. POSIX.1-2001 declares this function obsolete; use nanosleep(2) instead. POSIX.1-2008 removes the specification of usleep()." Added a helper function gf_nanosleep() to have a single place for handling edge cases that might arise from the conversion of usleep to nanosleep and allow the sleep to resume with right remaining value upon being interrupted. Fixes: bz#1721686 Change-Id: Ia39ab82c9e0f4669d2c00d4cdf25e38d94ef9f62 Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* Remove hadoop related code from the codebaseVijay Bellur2019-07-095-84/+15
| | | | | | | | | As Hadoop is no longer supported, dropping code for handling Hadoop access. Fixes: bz#1728417 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I8fcf4faacb364f1c9a8abb0c48faec337087f845
* glusterd/svc: update pid of mux volumes from the shd processMohammed Rafi KC2019-07-099-35/+114
| | | | | | | | | | | | | | | | | | | | | | | | For a normal volume, we are updating the pid from a the process while we do a daemonization or at the end of the init if it is no-daemon mode. Along with updating the pid we also lock the file, to make sure that the process is running fine. With brick mux, we were updating the pidfile from gluterd after an attach/detach request. There are two problems with this approach. 1) We are not holding a pidlock for any file other than parent process. 2) There is a chance for possible race conditions with attach/detach. For example, shd start and a volume stop could race. Let's say we are starting an shd and it is attached to a volume. While we trying to link the pid file to the running process, this would have deleted by the thread that doing a volume stop. Change-Id: I29a00352102877ce09ea3f376ca52affceb5cf1a Updates: bz#1722541 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* posix: fix Wformat-overflow warningSheetal Pamecha2019-07-091-2/+2
| | | | | | | | warning: ‘%s’ directive argument is null Change-Id: I2ce9560f98a8310886c31384e40c2e101ad2c719 updates: bz#1193929 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* glusterd: fix packed member address warningSheetal Pamecha2019-07-091-2/+3
| | | | | | | | | | warning: taking address of packed member of ‘struct _quota_limits’ may result in an unaligned pointer value. Change-Id: Ib889c99184560f0d24dbcc4fae10b563a2b9b7fd updates: bz#1193929 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* quick-read: rename cache-invalidation key to avoid redundant keysAtin Mukherjee2019-07-082-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | With group-metadata-cache group profile settings performance.cache-invalidation option when turned on enables both md-cache and quick-read xlator's cache-invalidation feature. While the intent of the group-metadata-cache is to set md-cache xlator's cache-invalidation feature, quick-read xlator also gets affected due to the same. While md-cache feature and it's profile existed since release-3.9, quick-read cache-invalidation was introduced in release-4 and due to this op-version mismatch on any cluster which is >= glusterfs-4 when this group profile is applied it breaks backward compatibility with the old clients. The proposed fix here is to rename the key in quick-read to 'quick-read-cache-invalidation' so that both these features have distinct identification. While this brings in by itself a backward compatibility challenge where this feature is enabled in an existing cluster and when the same is upgraded to a version where this change exists, it will lead to an unidentified old key. But as a workaround we can always ask users upgrading to release-7 version to turn off this option, upgrade the cluster and turn it back on with the new key. This needs to be documented once the patch is accepted. Fixes: bz#1698042 Change-Id: I30422ba6496208e21191a8d78ad29b2e21078664 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* gnfs: use strcpy to prevent memory overflowXie Changlong2019-07-081-1/+1
| | | | | | fixes: bz#1727248 Change-Id: Iea289032a8feecf2945668d3fb44a6a53089fdea Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
* glusterd.h: align structsYaniv Kaul2019-07-081-64/+62
| | | | | | | | Move some variables around to reduce compiler padding. Change-Id: I9cd53ce257fb169270c295ac60d58d4a6a950d4f updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* glusterd: Show the correct brick status in get-stateMohit Agrawal2019-07-043-2/+37
| | | | | | | | | | | | | Problem: get-state does not show correct brick status if brick status is not Started, it always shows started if any value is set brickinfo->status Solution: Check the value of brickinfo->status to show correct status in get-state Change-Id: I12a79619024c2cf59f338220d144f2f034059b3b fixes: bz#1726906 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* glusterd: don't log a warning message for tier-enabled keySanju Rakonde2019-07-041-2/+5
| | | | | | | | | | | | | We are logging a warning message saying unknown-key for tier-enabled kay. although the tier xlator is deprecated, this key is left behind for handling the peer rejection issues in a heterogeneous cluster. We need not to log if this key is not found/recognised. updates: bz#1193929 Change-Id: Ia68661898a618f99a240ca8d8a124ff6a65ebe9d Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* features/snapview-server: use the same volfile server for gfapi optionsRaghavendra Bhat2019-07-032-4/+42
| | | | | | | | | | | snapview server xlator makes use of "localhost" as the volfile server while initing the new glfs instance to talk to a snapshot. While localhost is fine, better use the same volfile server that was used to start the snapshot daemon containing the snapview-server xlator. Change-Id: I4485d39b0e3d066f481adc6958ace53ea33237f7 fixes: bz#1725211 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/dht: Fixed a memleak in dht_rename_cbkN Balachandran2019-07-021-11/+33
| | | | | | | | | Fixed a memleak in dht_rename_cbk when creating a linkto file. Change-Id: I705adef3cb79e33806520fc2b15558e90e2c211c fixes: bz#1722698 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* glusterfs-fops: fix the modularityAmar Tumballi2019-07-021-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs-fops.h was moved to rpc/xdr to support compound fops. (ref: https://review.gluster.org/14032, 2f945b86d3) This was fine as long as all these header files were in single include directory after 'install'. With the move to separate out glusterfs specific header files into another directory inside /usr/include (ref: https://review.gluster.org/21746, 20ef211cfa), glusterfs-fops.h file was not in the proper path when an external .c file tried to include any of glusterfs specific .h file (like xlator.h). Now, we have removed compound-fops, with that, none of the enums declared in glusterfs-fops.h are actually getting used on wire anymore. Hence, it makes sense to get this to libglusterfs/src as a single point of definition. With this change, the external programs can use glusterfs header files. also remove some enum definitions which are not used in code anymore. Updates: bz#1636297 Change-Id: I423c44d3dbe2efc777299c544ece3cb172fc7e44 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* core: use multiple servers while mounting a volume using ipv6Sanju Rakonde2019-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | According to man page: mount -t glusterfs [-o <options>] <server1>,<server2>, <server3>,..<serverN>:/<volname>[/<subdir>] <mount_point> When we try mount -t glusterfs 52:54:00:23:5a:b6,52:54:00:a0:be:ed:/ta-vol1 /mnt/ta we see the below msg in log: [2019-06-17 11:41:36.085809] I [MSGID: 100030] [glusterfsd.c:2867:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 7dev (args: /usr/local/sbin/glusterfs --process-name fuse --volfile-server=52:54:00:23:5a --volfile-id=/ta-vol1 /mnt/ta) With this change, we'll able to give multiple volfile servers while mounting the volume using ipv6. After the change, I see the following in log: [2019-06-17 12:00:21.183658] I [MSGID: 100030] [glusterfsd.c:2867:main] 0-/usr/local/sbin/glusterfs: Started running /usr/local/sbin/glusterfs version 7dev (args: /usr/local/sbin/glusterfs --process-name fuse --volfile-server=52:54:00:23:5a:b6 --volfile-server=52:54:00:a0:be:ed --volfile-id=/ta-vol1 /mnt/ta) fixes: bz#1719290 credits: Aga <aga_1990@hotmail.com> Change-Id: Icf89bea3ba15d8374ef428aeb59f2ef55ad544ec Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: fix clang scan defectsAtin Mukherjee2019-07-012-3/+4
| | | | | | | | | | Fixes following: https://build.gluster.org/job/clang-scan/744/clangScanBuildBugs/browse/report-dd8d31.html#EndPath https://build.gluster.org/job/clang-scan/744/clangScanBuildBugs/browse/report-89a1cd.html#EndPath Updates: bz#1622665 Change-Id: Ibd201fb2ca54ae7ae3fed8a8d87815358b614349 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd/thin-arbiter: Thin-arbiter integration with GD1Vishal Pandey2019-06-289-23/+730
| | | | | | | | | | | | | | | | | | | | | | | | gluster volume create <VOLNAME> replica 2 thin-arbiter 1 <host1>:<brick1> <host2>:<brick2> <thin-arbiter-host>:<path-to-store-replica-id-file> [force] The changes have been made in a way that the last brick in the bricks list will be treated as the thin-arbiter. GD1 will be manipulated to consider replica count to be as 2 and continue creating the volume like any other replica 2 volume but since thin-arbiter volumes need ta-brick client xlator entries for each subvolume in fuse volfile, volfile generation is modified in a way to inject these entries seperately in the volfile for every subvolume. Few more additions - 1- Save the volinfo with new fields ta_bricks list and thin_arbiter_count. 2- Introduce a new option client.ta-brick-port to add remote-port to ta-brick xlator entry in fuse volfiles. The option can be set using the following CLI syntax - gluster volume set <VOLNAME> client.ta-brick-port <PORTNO.> 3- Volume Info will contain a Thin-Arbiter-path entry to distinguish from other replicate volumes. Change-Id: Ib434e2313b29716f32476c6c211d282c4ef39406 Updates #687 Signed-off-by: Vishal Pandey <vpandey@redhat.com>
* graph/shd: Use top down approach while cleaning xlatorMohammed Rafi KC2019-06-279-1/+12
| | | | | | | | | | | | | | We were cleaning xlator from botton to top, which might lead to problems when upper xlators trying to access the xlator object loaded below. One such scenario is when fd_unref happens as part of the fini call which might lead to calling the releasedir to lower xlator. This will lead to invalid mem access Change-Id: I8a6cb619256fab0b0c01a2d564fc88287c4415a0 Updates: bz#1716695 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brickRaghavendra G2019-06-271-4/+12
| | | | | | | | | | | | | | | | | | | | | | | Two reasons: * ping responses from glusterd may not be relevant for Halo replication. Instead, it might be interested in only knowing whether the brick itself is responsive. * When a brick is killed, propagating GF_EVENT_CHILD_PING of ping response from glusterd results in GF_EVENT_DISCONNECT spuriously propagated to parent xlators. These DISCONNECT events are from the connections client establishes with glusterd as part of its reconnect logic. Without GF_EVENT_CHILD_PING, the last event propagated to parent xlators would be the first DISCONNECT event from brick and hence subsequent DISCONNECTS to glusterd are not propagated as protocol/client prevents same event being propagated to parent xlators consecutively. propagating GF_EVENT_CHILD_PING for ping responses from glusterd would change the last_sent_event to GF_EVENT_CHILD_PING and hence protocol/client cannot prevent subsequent DISCONNECT events Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1716979 Change-Id: I50276680c52f05ca9e12149a3094923622d6eaef
* posix : add posix_set_ctime() in posix_ftruncate()Jiffin Tony Thottan2019-06-271-0/+2
| | | | | | Change-Id: I0cb5320fea71306e0283509ae47024f23874b53b fixes: bz#1723761 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* graph/shd: Use glusterfs_graph_deactivate to free the xl recMohammed Rafi KC2019-06-271-0/+3
| | | | | | | | | | | | We were using glusterfs_graph_fini to free the xl rec from glusterfs_process_volfp as well as glusterfs_graph_cleanup. Instead we can use glusterfs_graph_deactivate, which is does fini as well as other common rec free. Change-Id: Ie4a5f2771e5254aa5ed9f00c3672a6d2cc8e4bc1 Updates: bz#1716695 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* posix: modify storage.reserve option to take size and percentSheetal Pamecha2019-06-265-72/+39
| | | | | | | | | | | * reverting changes made in https://review.gluster.org/#/c/glusterfs/+/21686/ * Now storage.reserve can take value in percent or bytes fixes: bz#1651445 Change-Id: Id4826210ec27991c55b17d1fecd90356bff3e036 Signed-off-by: Sheetal Pamecha <spamecha@redhat.com>
* [RFC] change get_real_filename implementation to use ENOATTR instead of ENOENTMichael Adam2019-06-262-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | get_real_filename is implemented as a virtual extended attribute to help Samba implement the case-insensitive but case preserving SMB protocol more efficiently. It is implemented as a getxattr call on the parent directory with the virtual key of "get_real_filename:<entryname>" by looking for a spelling with different case for the provided file/dir name (<entryname>) and returning this correct spelling as a result if the entry is found. Originally (05aaec645a6262d431486eb5ac7cd702646cfcfb), the implementation used the ENOENT errno to return the authoritative answer that <entryname> does not exist in any case folding. Now this implementation is actually a violation or misuse of the defined API for the getxattr call which returns ENOENT for the case that the dir that the call is made against does not exist and ENOATTR (or the synonym ENODATA) for the case that the xattr does not exist. This was not a problem until the gluster fuse-bridge was changed to do map ENOENT to ESTALE in 59629f1da9dca670d5dcc6425f7f89b3e96b46bf, after which we the getxattr call for get_real_filename returned an ESTALE instead of ENOENT breaking the expectation in Samba. It is an independent problem that ESTALE should not leak out to user space but is intended to trigger retries between fuse and gluster. But nevertheless, the semantics seem to be incorrect here and should be changed. This patch changes the implementation of the get_real_filename virtual xattr to correctly return ENOATTR instead of ENOENT if the file/directory being looked up is not found. The Samba glusterfs_fuse vfs module which takes advantage of the get_real_filename over a fuse mount will receive a corresponding change to map ENOATTR to ENOENT. Without this change, it will still work correctly, but the performance optimization for nonexisting files is lost. On the other hand side, this change removes the distinction between the old not-implemented case and the implemented case. So Samba changed to treat ENOATTR like ENOENT will not work correctly any more against old servers that don't implement get_real_filename. I.e. existing files will be reported as non-existing Change-Id: I971b427ab8410636d5d201157d9af70e0d075b67 fixes: bz#1722977 Signed-off-by: Michael Adam <obnox@samba.org>
* locks: enable notify-contention by defaultXavi Hernandez2019-06-261-1/+1
| | | | | | | | This patch enables the lock contention notification by default. Change-Id: I10131b026a7cb09fc7c93e1e6c8549988c1d7751 Fixes: bz#1717754 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* glusterd: fix use-after-free of a dict_tXavi Hernandez2019-06-261-1/+1
| | | | | | | | | | | | | A dict was passed to a function that calls dict_unref() without taking any additional reference. Given that the same dict is also used after the function returns, this was causing a use-after-free situation. To fix the issue, we simply take an additional reference before calling the function. Fixes: bz#1723890 Change-Id: I98c6b76b08fe3fa6224edf281a26e9ba1ffe3017 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* glusterd: conditionally clear txn_opinfo in stage opAtin Mukherjee2019-06-251-2/+10
| | | | | | | | | ...otherwise this leads to a crash when volume status is run on a heterogeneous mode. Fixes: bz#1723658 Change-Id: I0d39f412b2e5e9d3ef0a3462b90b38bb5364b09d Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: Can't run rebalance due to long unix socketMohit Agrawal2019-06-251-51/+9
| | | | | | | | | | | | | Problem: glusterd populate unix socket file name based on volname and if volname is lengthy socket system call's are failed due to breach maximum length is defined in the kernel. Solution:Convert unix socket name to hash to resolve the issue Change-Id: I5072e8184013095587537dbfa4767286307fff65 fixes: bz#1720566 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* glusterd: ignore user.* options from compatibility check in brick muxAtin Mukherjee2019-06-251-0/+3
| | | | | | | | | user.* options are just custom and they don't contribute anything in terms of determining the volume compatibility in brick multiplexing Fixes: bz#1723402 Change-Id: Ic7e0181ab72993d29cab345cde64ae1340bf4faf Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* lcov: add more tests to glfsxmp-coverageAmar Tumballi2019-06-251-9/+0
| | | | | | | | | * found a bug with quiesce fallocate() - fixed. * found a bug with cloudsync part of code in posix - fixed updates: bz#1693692 Change-Id: I4f315ffebb612de072ae08761b8cd0f47714080a Signed-off-by: Amar Tumballi <amarts@redhat.com>
* glusterd.h: remove unneeded macros or move them to their users.Yaniv Kaul2019-06-257-204/+162
| | | | | | | | | | Some macros were not used, so removed. Some macros were quite local, so moved to the respective users. Some macros simplified (removed an allocation here and there) Change-Id: Ifaf1aff15a78f105b1549ab8053378933b35df43 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* glusterd/shd: Change shd logfile to a unique nameMohammed Rafi KC2019-06-246-33/+39
| | | | | | | | | | | | | | | | With the shd mux changes, shd was havinga a logfile with volname of the first started volume. This was creating a lot confusion, as other volumes data is also logging to a logfile which has a different vol name. With this changes the logfile will be changed to a unique name ie "/var/log/glusterfs/glustershd.log". This was the same logfile name before the shd mux Change-Id: I2b94c1f0b2cf3c9493505dddf873687755a46dda fixes: bz#1721601 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* shd/mux: Fix race between mux_proc unlink and stopMohammed Rafi KC2019-06-241-0/+3
| | | | | | | | | | | | | | There is a small race window, where we have a shd proc without having a connection. That is when we stopped the last shd running on a process. The list was removed outside of a lock just after stopping the process. So there is a window where we stopped the process, but the shd proc list contains the entry. Change-Id: Id82a82509e5cd72acac24e8b7b87197626525441 fixes: bz#1722541 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* cluster/ec: Prevent double pre-op xattropsPranith Kumar K2019-06-221-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Race: Thread-1 Thread-2 1) Does ec_get_size_version() to perform pre-op fxattrop as part of write-1 2) Calls ec_set_dirty_flag() in ec_get_size_version() for write-2. This sets dirty[] to 1 3) Completes executing ec_prepare_update_cbk leading to ctx->dirty[] = '1' 4) Takes LOCK(inode->lock) to check if there are any flags and sets dirty-flag because lock->waiting_flag is 0 now. This leads to fxattrop to increment on-disk dirty[] to '2' At the end of the writes the file will be marked for heal even when it doesn't need heal. Fix: Perform ec_set_dirty_flag() and other checks inside LOCK() to prevent dirty[] to be marked as '1' in step 2) above Updates bz#1593224 Change-Id: Icac2ab39c0b1e7e154387800fbededc561612865 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* posix/ctime: Fix ctime upgrade issueKotresh HR2019-06-211-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: On a EC volume, during upgrade from the older version where ctime feature is not enabled(or not present) to the newer version where the ctime feature is available (enabled default), the self heal hangs and doesn't complete. Cause: The ctime feature has both client side code (utime) and server side code (posix). The feature is driven from client. Only if the client side sets the time in the frame, should the server side sets the time attributes in xattr. But posix setattr/fseattr was not doing that. When one of the server nodes is updated, since ctime is enabled by default, it starts setting xattr on setattr/fseattr on the updated node/brick. On a EC volume the first two updated nodes(bricks) are not a problem because there are 4 other bricks with consistent data. However once the third brick is updated, the new attribute(mdata xattr) will cause an inconsistency on metadata on 3 bricks, which prevents the file to be repaired. Fix: Don't create mdata xattr with utimes/utimensat system call. Only update if already present. Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c fixes: bz#1720201 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* WORM-Xlator: Avoid performing fsetxattr if fd is NULLDavid Spisla2019-06-211-0/+7
| | | | | | | | | | | If worm_create_cbk receives an error (op_ret == -1) fd will be NULL and therefore performing fsetxattr would lead to a segfault and the brick process crashes. To avoid this we allow setting fsetxattr only if op_ret >= 0 . If an error happens we explicitly unwind Change-Id: Ie7f8a198add93e5cd908eb7029cffc834c3b58a6 fixes: bz#1717757 Signed-off-by: David Spisla <david.spisla@iternity.com>
* ec-heal: check file's gfid when deleting stale nameKinglong Mee2019-06-201-1/+11
| | | | | | | | | | | | A name-less lookup does not contain parent's stat, It is hard to check the lookuped file is at the right path. This patch changes to a name lookup, and check file's gfid with expected gfid. If the gfid is different, mark it estale. fixes: bz#1702131 Change-Id: I2de20b10d680eed1e2fb1d3830b3b3dec4520dbf Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
* afr/read: Implement latency based read child selectionMohammed Rafi KC2019-06-203-27/+98
| | | | | | | | | | | | | | | | | Network latency is an important factor selecting a read subvolume. So this patch is adding two new policy. 1) We measure the latency of a child during a GF_DUMP rpc call. Then use this latency to pick a read subvol having the least latency. 2) Second one is an hybrid mode where it calculates the effective latency by multiplying outstanding pending read request and latency, and choose the least one. Change-Id: Ia49c8a08ab61f7dcdad8b8950aa4d338e7accf97 fixes: #520 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* posix: fix crash in posix_cs_set_stateSusant Palai2019-06-202-3/+9
| | | | | | Fixes: bz#1721474 Change-Id: Ic2a53fa3d1e9e23424c6898e0986f80d52c5e3f6 Signed-off-by: Susant Palai <spalai@redhat.com>
* encryption/crypt: remove from volume fileAmar Tumballi2019-06-202-34/+0
| | | | | | | | | | | | | | The feature is not supported and is moved out of the codebase from glusterfs-5.x release. Doesn't make sense to keep the code to support it. For those who want to upgrade from an version supporting it to higher version, please do a 'gluster volume reset $VOL encryption reset' and then continue with the upgrade process. updates: bz#1648169 Change-Id: I8cf822c0d7195940bd37f6af2432a3cac68d44d1 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* md-cache: only update generation for inode at upcall and NULL statKinglong Mee2019-06-191-20/+36
| | | | | | | | | | | | | | | 1. For parallel writes from nfs-ganesha, two fops with two generations, but the fops reply maybe returned disordered. 2. The inode md-cache timeout should not increase conf->generation. With this patch, 1, Fop only gets generation from inode md-cache or conf, does not increase it. 2. The generation is increased at upcall invalidate, estal/enoent error invalidate, reply with zeroed out stat from write-behind. Change-Id: I897ecaa143fd18bc024c1948c7d1a6f831fd53da Updates: bz#1683594 Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>