path: root/tests
* socket/ssl: fix crl handling (Milind Changire, 2019-03-19; 1 file changed, -2/+11)
  Problem: Just setting the path to the CRL directory in socket_init() wasn't working.
  Solution: Use the special API to retrieve the X509_VERIFY_PARAM and set the CRL
  checking flags explicitly. Also, setting the CRL checking flags is a pain, since the
  connection is declared failed if any CRL isn't found in the designated file or
  directory. A comment has been added to the code accordingly.
  Change-Id: I8a8ed2ddaf4b5eb974387d2f7b1a85c1ca39fe79
  fixes: bz#1687326
  Signed-off-by: Milind Changire <mchangir@redhat.com>
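  A hedged sketch of how CRL checking is typically wired up from the CLI side; the
  ssl.crl-path option name and the c_rehash step are assumptions about the usual
  OpenSSL CRL-directory layout, not details taken from this commit:
    # Hash the CRL directory so OpenSSL can locate CRLs by issuer hash
    # (directory and volume name below are placeholders).
    c_rehash /etc/ssl/glusterfs-crl
    # Point the volume at the CRL directory (ssl.crl-path is an assumed option name).
    gluster volume set myvol ssl.crl-path /etc/ssl/glusterfs-crl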
* tests/bug-844688.t: test bug-844688.t is failing on master (Mohammed Rafi KC, 2019-03-13; 1 file changed, -11/+32)
  Test case bug-844688.t is failing quite frequently on master. The test checks for the
  existence of the call stack and the frame creation time, but there is a chance that at
  a given point in time the stack count becomes zero, so doing the check inside
  EXPECT_WITHIN makes more sense.
  Change-Id: Id2ede7f6fdcb5f016f52c5c0557ce6ac510d4e96
  updates: bz#1688116
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
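  As a rough illustration of the pattern the fix switches to (the helper function and the
  expected value below are hypothetical stand-ins, not the actual test's code): EXPECT
  asserts a value once, while EXPECT_WITHIN re-runs the check until it matches or the
  timeout expires.
    #!/bin/bash
    . $(dirname $0)/../../include.rc

    # Hypothetical helper: reports whether any call frames are currently present
    # (1 if present, 0 if not). In the real test this would parse a statedump.
    function frames_present {
            echo "1"
    }

    # EXPECT checks once and can race with a momentarily-empty stack;
    # EXPECT_WITHIN retries until the expected value appears or the timeout
    # (in seconds; PROCESS_UP_TIMEOUT comes from include.rc) expires.
    EXPECT_WITHIN $PROCESS_UP_TIMEOUT "1" frames_present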
* test: Fix a missing '$' symbol (Mohammed Rafi KC, 2019-03-13; 1 file changed, -1/+1)
  While checking a test case using EXPECT_WITHIN, the argument was missing a '$' symbol
  to mark the token as a bash variable.
  Change-Id: I5b9150acdea000b29e94cfb01d975c77f5ece3e5
  fixes: bz#1688116
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* cluster/afr: Send truncate on arbiter brick from SHD (karthik-us, 2019-03-11; 2 files changed, -1/+39)
  Problem: In an arbiter volume configuration SHD will not send any writes onto the
  arbiter brick even if there is a data pending marker for the arbiter brick. If we have
  an arbiter setup on the geo-rep master and there are data pending markers for the files
  on the arbiter brick, SHD will not mark any data changelog during healing. While
  syncing the data from master to slave, if the arbiter brick is considered ACTIVE, there
  is a chance that the slave will miss some data. If the arbiter brick is being newly
  added or replaced, there is a chance of the slave missing all the data during sync.
  Fix: If there is a data pending marker for the arbiter brick, send a truncate on the
  arbiter brick during heal, so that it records the truncate as the data transaction in
  the changelog.
  Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec
  fixes: bz#1686568
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
* fuse lock interrupt: fix flock_interrupt.t (Csaba Henk, 2019-03-05; 1 file changed, -5/+5)
  updates: bz#1193929
  Change-Id: I347de62755100cd69e3cf341434767ae23fd1ba4
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* tests/dht: Remove hardcoded brick paths (N Balachandran, 2019-02-18; 2 files changed, -3/+7)
  The tests assumed that the file is created on a particular brick. This need not be the
  case in all scenarios, so the assumption has been removed.
  Change-Id: Id420f43d7f72d983a7c6f16ea8fed273d46c4824
  updates: bz#1672480
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* core: implement a global thread pool (Xavi Hernandez, 2019-02-18; 2 files changed, -0/+108)
  This patch implements a thread pool that is wait-free for adding jobs to the queue and
  uses a very small locked region to get jobs. This makes it possible to decrease
  contention drastically. It's based on the wfcqueue structure provided by the urcu
  library.
  It automatically enables more threads when load demands it, and stops them when not
  needed. There's a maximum number of threads that can be used; this value can be
  configured. Depending on the workload, the maximum number of threads plays an important
  role, so it needs to be configured for optimal performance. Currently the thread pool
  doesn't self-adjust the maximum for the workload, so this configuration needs to be
  changed manually. For this reason, the global thread pool has been made optional, so
  that volumes can still use the thread pool provided by io-threads.
  To enable it for bricks, the following option needs to be set:
    config.global-threading = on
  This option has no effect if bricks are already running; a restart is required to
  activate it. It's recommended to also enable the following option when running bricks
  with the global thread pool:
    performance.iot-pass-through = on
  To enable it for a FUSE mount point, the option '--global-threading' must be added to
  the mount command. To change it, an umount and remount is needed. It's recommended to
  disable the following option when using global threading on a mount point:
    performance.client-io-threads = off
  To enable it for services managed by glusterd, glusterd needs to be started with the
  option '--global-threading'. In this case all daemons, like self-heal, will be using
  the global thread pool.
  Currently it can only be enabled for bricks, FUSE mounts and glusterd services.
  The maximum number of threads for clients and bricks can be configured using the
  following options:
    config.client-threads
    config.brick-threads
  These options can be applied online and their effect is immediate most of the time. If
  one of them is set to 0, the maximum number of threads will be calculated as
  #cores * 2.
  Some distributions use a very old userspace-rcu library (version 0.7). For this reason,
  some header files from version 0.10 have been copied into contrib/userspace-rcu and are
  used if the detected version is 0.7 or older.
  An additional change has been made to io-threads to prevent threads from being started
  when iot-pass-through is set.
  Change-Id: I09d19e246b9e6d53c6247b29dfca6af6ee00a24b
  updates: #532
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
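  Putting the knobs above together, a minimal sketch of enabling the global thread pool
  on a test volume (the volume name, server and mount path are placeholders; the option
  names are the ones listed in the commit message):
    # Enable the global thread pool on the bricks (requires a brick restart)
    # and bypass io-threads so the two pools don't stack.
    gluster volume set myvol config.global-threading on
    gluster volume set myvol performance.iot-pass-through on

    # Optional tuning: cap the thread count per side; 0 means "2 x number of cores".
    gluster volume set myvol config.brick-threads 16
    gluster volume set myvol config.client-threads 16

    # On the client, disable client io-threads and mount with global threading.
    gluster volume set myvol performance.client-io-threads off
    glusterfs --global-threading --volfile-server=server1 --volfile-id=myvol /mnt/myvol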
* Fix compilation for fops-sanity.c (Pranith Kumar K, 2019-02-14; 1 file changed, -0/+1)
  Without this patch the following error is seen:
    ....
    warning: implicit declaration of function 'makedev' [-Wimplicit-function-declaration]
    ret = mknod("cspecial", S_IFCHR | S_IRWXU | S_IRWXG, makedev(2, 3));
                                                         ^~~~~~~
    /usr/bin/ld: /tmp/ccIVwT46.o: in function `path_based_fops':
    /home/pk/workspace/gerrit-repo/tests/basic/fops-sanity.c:478: undefined reference to `makedev'
    ....
  updates: bz#1676797
  Change-Id: I8a17c38fdfd458dd2dc75f4c7e2bf20ce555a042
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* tests/dht: Stop volume before unmounting bricks (N Balachandran, 2019-02-13; 1 file changed, -1/+7)
  The bricks are loopback devices. Unmounting them before the cleanup, while the volume
  is still running, leads to "target is busy" messages.
  Change-Id: Ia808c2c9580273e1bf0595ecf53c210847c44577
  fixes: bz#1676736
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* Bump up timeout for tests on AWS (Nigel Babu, 2019-02-07; 3 files changed, -3/+4)
  Fixes: bz#1672727
  Change-Id: I2b9be45f199f6436b858536c6f49be85902217f0
  Signed-off-by: Nigel Babu <nigelb@redhat.com>
* glusterd: manage upgrade to current master (Amar Tumballi, 2019-02-04; 3 files changed, -3/+0)
  Scenarios tested:
  * Upgrade the node when stripe / tiering and regular types of volumes are present.
    - All volumes start fine (as the change was not in the brick volfile).
    - For tier, the functionality may not even work, as changetimerecorder is not
      present.
    - 'gluster volume info' properly shows 'NOT SUPPORTED' for stripe and tier type
      volumes.
  * Upgrade in a rolling-upgrade scenario, where an old version is able to connect to a
    newer master.
    - On a normal volume, if the volfile server was new, the newer client volfiles needed
      to have the utime xlator conditionally.
    - With this one change, all other changes seem to work fine.
  Change-Id: Ib2d3b69dafa02b2c695a735b13c1aa70aba07cb8
  updates: bz#1635688
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* mount/fuse: expose auto-invalidation as a mount option (Raghavendra Gowdappa, 2019-02-02; 1 file changed, -0/+3)
  Auto-invalidation is necessary when the same (meta)data is shared/accessed across
  multiple mounts. However, if (meta)data is not shared, all relevant I/O goes through
  the cache of a single mount and hence is always coherent with the (meta)data on the
  bricks. So, fuse auto-invalidation can be disabled for this case, which gives a huge
  performance boost for workloads that write data and then immediately read the data
  they just wrote.
  From glusterfs --help:
  <snip>
  --auto-invalidation[=BOOL]   controls whether fuse-kernel can auto-invalidate
                               attribute, dentry and page-cache. Disable this only if
                               same files/directories are not accessed across two
                               different mounts concurrently [default: "on"]
  </snip>
  Details on how disabling auto-invalidation helped to reduce pgbench init times can be
  found at [1]. Time taken for pgbench init of scale 8000 was 8340s. That is an
  improvement of 86% (59280s vs 8340s) with auto-invalidation turned off along with other
  optimizations. Just disabling auto-invalidation contributed a 56% improvement by
  reducing the total time taken by 33260s.
  [1] https://www.spinics.net/lists/gluster-devel/msg25907.html
  Change-Id: I0ed730dba9064bd9c576ad1800170a21e100e1ce
  Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
  updates: bz#1664934
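  A single-mount workload would disable it at mount time, roughly like this (server,
  volume and mount path are placeholders, and the exact boolean spelling accepted by the
  option is assumed):
    # Only safe when the same files/directories are NOT accessed from more
    # than one mount concurrently, as the help text above warns.
    glusterfs --auto-invalidation=no --volfile-server=server1 --volfile-id=myvol /mnt/myvol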
* cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelog (Ashish Pandey, 2019-02-01; 1 file changed, -6/+16)
  If a fop to create an entry fails on one of the data bricks, we mark the pending
  changelog on the entry on the brick for which it was successful. This is done as part
  of the post-op phase to make sure that the entry gets healed even if it gets renamed to
  some other path where its parent was not marked as bad.
  As it happens as part of post-op, we should consult the thin-arbiter to check whether
  the brick which was successful is the good brick or not. This will avoid split-brain
  and other issues.
  Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
  updates: bz#1662264
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* readdir-ahead: do not zero-out iatt in fop cbk (Ravishankar N, 2019-01-31; 1 file changed, -0/+23)
  ...when ctime is zero. ia_type and ia_gfid always need to be non-zero for things to
  work correctly.
  Problem: Commit c9bde3021202f1d5c5a2d19ac05a510fc1f788ac zeroed out the iatt buffer in
  the cbks of modification fops before unwinding if the ctime in the buffer was zero.
  This was causing the fops to fail: noticeable when AFR's 'consistent-metadata' option
  was enabled. (AFR zeros out the ctime when the option is set. See commit
  4c4624c9bad2edf27128cb122c64f15d7d63bbc8.)
  Fixes:
  - Do not zero out the ia_type and ia_gfid of the iatt buf under any circumstance.
  - Also, fixed _rda_inode_ctx_update_iatts() to always update these values from the
    incoming buf when ctime is zero. Otherwise we end up with zero ia_type and ia_gfid
    the first time the function is called *and* the incoming buf has ctime set to zero.
  fixes: bz#1670253
  Reported-by: Michael Hanselmann <public@hansmi.ch>
  Change-Id: Ib72228892d42c3513c19fc6dfb543f2aa3489eca
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* features/shard: Ref shard inode while adding to fsync list (Krutika Dhananjay, 2019-01-24; 1 file changed, -0/+29)
  PROBLEM: A lot of the earlier changes in the management of shards in the lru and fsync
  lists assumed that if a given shard exists in the fsync list, it must be part of the
  lru list as well. This was found to be not true.
  Consider this: a file is FALLOCATE'd to a size which would make the number of
  participant shards greater than the lru list size. In this case, some of the resolved
  shards that are to participate in this fop will be evicted from the lru list to give
  way to the rest of the shards. And once FALLOCATE completes, these shards are added to
  the fsync list, but without a ref. After the fop completes, these shard inodes are
  unref'd and destroyed while their inode ctxs are still part of the fsync list. Now when
  an FSYNC is called on the base file and the fsync list traversed, the client crashes
  due to illegal memory access.
  FIX: Hold a ref on the shard inode when adding to the fsync list as well, and unref
  under the following conditions:
  1. when the shard is evicted from the lru list
  2. when the base file is fsync'd
  3. when the shards are deleted
  Change-Id: Iab460667d091b8388322f59b6cb27ce69299b1b2
  fixes: bz#1669077
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* tests: run nfs tests only if --enable-gnfs is provided (Amar Tumballi, 2019-01-24; 57 files changed, -1/+117)
  Fixes: bz#1665358
  Change-Id: Idbf88ec3ac683733b32c313377eeb72f2819bf0d
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* tests/bug-brick-mux-restart: add extra information (Amar Tumballi, 2019-01-24; 1 file changed, -1/+12)
  ...so that we can understand more about process memory and thread consumption.
  With this, we will also be able to understand more about the process details with
  brick-mux.
  updates: bz#1193929
  Change-Id: I147a3e3814fc37dfb635217d0a0f0184fae0994f
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* afr: not resolve splitbrains when copies are of same size (Iraj Jamali, 2019-01-22; 1 file changed, -0/+55)
  Automatic split-brain resolution with size as the policy must not resolve split-brains
  when the copies are of the same size. The fix determines whether the sizes of the
  copies are the same and returns -1 in that case.
  updates: bz#1655052
  Change-Id: I3d8e8b4d7962b070ed16c3ee02a1e5a926fd5eab
  Signed-off-by: Iraj Jamali <ijamali@redhat.com>
* locks/fencing: Add a security knob for fencing (Susant Palai, 2019-01-22; 3 files changed, -0/+40)
  There is a low-level security issue with fencing, since one client can preempt another
  client's lock. This patch does not completely eliminate the issue of a client
  misbehaving, but it certainly adds a security layer for the default use cases that do
  not need fencing.
  Change-Id: I55cd15f2ed1ae0f2556e3d27a2ef4bc10fdada1c
  updates: #466
  Signed-off-by: Susant Palai <spalai@redhat.com>
* cluster/dht: Delete invalid linkto files in rmdir (N Balachandran, 2019-01-22; 1 file changed, -0/+63)
  rm -rf <dir> fails on dirs which contain linkto files that point to themselves because
  dht incorrectly thought that they were cached files after looking them up. The fix now
  treats them as invalid linkto files and deletes them.
  Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643
  fixes: bz#1667804
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* afr: Splitbrain with size as policy must not resolve for directory (Sheetal Pamecha, 2019-01-21; 1 file changed, -0/+55)
  In automatic split-brain resolution, when the favorite-child policy is set to size,
  split-brain resolution must not work for directories. Currently, if a directory is in
  split-brain with both copies having the same size, the source is selected arbitrarily
  and healed.
  fixes: bz#1655050
  Change-Id: I5739498639c17c89874cc577362e543adab55f5d
  Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
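  For reference, the policy this patch (and the same-size patch above) guards against is
  the one selected via the existing favorite-child-policy option, roughly as below
  (volume name is a placeholder):
    # Let self-heal pick the biggest copy as the split-brain winner --
    # the two fixes above make sure this is skipped for directories and
    # for copies of equal size.
    gluster volume set myvol cluster.favorite-child-policy size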
* core: Feature added to accept CidrIp in auth.allow (Rinku Kothiya, 2019-01-18; 1 file changed, -0/+25)
  Added functionality to the 'gluster volume set auth.allow' command to accept CIDR IP
  addresses. Modified a few functions to isolate the CIDR feature so that it prevents
  other gluster commands, such as peer probe, from using CIDR-format IPs. The functions
  are modified in such a way that they have an option to enable accepting the CIDR format
  for other gluster commands if required in future.
  updates: bz#1138841
  Change-Id: Ie6734002a7078f1820e5df42d404411cce945e8b
  Credits: Mohit Agrawal
  Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
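  With this in place, auth.allow can take a subnet alongside plain addresses, along these
  lines (volume name and addresses are placeholders):
    # Allow a whole subnet plus one extra host to mount the volume.
    gluster volume set myvol auth.allow 192.168.10.0/24,10.0.0.5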
* lock: Add fencing support (Susant Palai, 2019-01-17; 3 files changed, -0/+319)
  Design reference: https://review.gluster.org/#/c/glusterfs-specs/+/21925/
  This patch adds the lock preempt support.
  Note: The current model stores lock enforcement information as a separate xattr on
  disk. There is another effort going on in parallel to store this in the stat(x) of the
  file. This patch is self-sufficient to add fencing support. Based on the availability
  of the stat(x) support, either I will rebase this patch or we can modify the necessary
  bits after merging this patch.
  Change-Id: If4a42f3e0afaee1f66cdb0360ad4e0c005b5b017
  updates: #466
  Signed-off-by: Susant Palai <spalai@redhat.com>
* core: Resolve dict_leak at the time of destroying graph (Mohit Agrawal, 2019-01-14; 1 file changed, -0/+112)
  Problem: In some places the gluster code calls get_new_dict to create a dictionary
  without taking a reference, so at the time of dict_unref it becomes a leak.
  Solution: To resolve the same, call dict_new instead of get_new_dict.
  updates: bz#1650403
  Change-Id: I3ccbbf5af07079a4fa09aad2cd0458c8625b2f06
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* libglusterfs/common-utils.c: Fix buffer size for checksum computation (Varsha Rao, 2019-01-11; 1 file changed, -0/+35)
  Problem: When the quorum count option is updated, the change is not reflected in the
  nfs-server.vol file. This is because in get_checksum_for_file(), when the last part of
  the file read is smaller than the buffer size, the read buffer stores old data along
  with the correct data.
  Solution: Pass the number of bytes read instead of the fixed buffer size when
  calculating the checksum.
  Change-Id: I4b641607c8a262961b3f3da0028a54e08c3f8589
  fixes: bz#1657744
  Signed-off-by: Varsha Rao <varao@redhat.com>
* performance/md-cache: Fix a crash when statfs caching is enabled (Vijay Bellur, 2019-01-11; 1 file changed, -0/+24)
  mem_put() in STACK_UNWIND_STRICT causes a crash if frame->local is not null, as
  md-cache obtains local from CALLOC. Changed two occurrences of STACK_UNWIND_STRICT to
  MDC_STACK_UNWIND, as the latter macro does not rely on STACK_UNWIND_STRICT for cleaning
  up frame->local.
  fixes: bz#1632503
  Change-Id: I1b3edcb9372a164ef73119e99a49e747765d7166
  Signed-off-by: Vijay Bellur <vbellur@redhat.com>
* tests: increase the timeout for distribute bug 1117851.t (Amar Tumballi, 2019-01-11; 1 file changed, -0/+2)
  The test is borderline at 200 seconds and, every now and then, randomly takes a little
  more time and fails the whole regression run. Better to keep the timeout high, so we
  don't 'randomly' fail regression tests.
  updates: bz#1193929
  Change-Id: Ib0d3a9f7a75ee44446ec6da5e0510cccf83eecaa
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
* cluster/afr: Disable client side heals in AFR by default. (Sunil Kumar Acharya, 2019-01-10; 10 files changed, -1/+30)
  With this changeset, the default value for the AFR client-side heal volume option is
  set to "off".
  fixes: bz#1663102
  Change-Id: Ie4016932339c4896487e3e7cb5caca68739b7ba2
  Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
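  Users who depended on the old behaviour can turn client-side heals back on per volume;
  the option names below are AFR's long-standing heal switches and are listed here as an
  assumption, since the commit message does not name them (volume name is a placeholder):
    # Re-enable client-side healing (now off by default).
    gluster volume set myvol cluster.data-self-heal on
    gluster volume set myvol cluster.metadata-self-heal on
    gluster volume set myvol cluster.entry-self-heal on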
* gfapi: update returned/callback pre/post attributes to glfs_stat (ShyamsundarR, 2019-01-07; 1 file changed, -3/+6)
  Change-Id: Ie0fe971e694101aa011d66aa496d0644669c2c5a
  Updates: #389
  Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
  Signed-off-by: ShyamsundarR <srangana@redhat.com>
* gfapi: new api glfs_statx as linux's statx (ShyamsundarR, 2019-01-07; 2 files changed, -0/+214)
  Change-Id: I44dd6ceef0954ae7fc13f920e84d81bbd3f6a774
  Updates: #389
  Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
  Signed-off-by: ShyamsundarR <srangana@redhat.com>
* cluster/ta: Check number/type of locks held on ta file (Ashish Pandey, 2018-12-27; 1 file changed, -0/+68)
  Change-Id: Iec47856ce2819e7d7d38a60279602e53ba45858d
  updates: bz#1624332
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* tests: Brick is getting OOM in ./tests/bugs/core/bug-1432542-mpx-restart-crash.t (Mohit Agrawal, 2018-12-21; 1 file changed, -2/+3)
  The test case tests/bugs/core/bug-1432542-mpx-restart-crash.t creates 20 2x3 volumes
  after enabling brick_mux. At the time of creating the last volume, the brick gets
  OOM-killed because brick memory consumption has increased from its previous level due
  to these patches: https://review.gluster.org/#/c/glusterfs/+/19997/ and
  https://review.gluster.org/#/c/glusterfs/+/20362/.
  To avoid the OOM, reduce NUM_VOLS to 15 so that brick consumption is reduced.
  Change-Id: Ib98b47a3db6b990ff22c7e57396d51e7fef5c7e8
  fixes: bz#1661214
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* tests: Fix zero-flag.t script (Krutika Dhananjay, 2018-12-19; 1 file changed, -1/+1)
  The default value of shard-block-size was changed from 4MB to 64MB some time back. The
  script "fallocate"s a 6MB file and expects it to have 1 shard under .shard. This worked
  when the shard-block-size was 4MB. With the default value now at 64MB, file "file1"
  won't have any shards under .shard, and the stat on the 1st shard's path fails with
  ENOENT. Changed the script to explicitly set shard-block-size to 4MB.
  Change-Id: I7f1785922287d16d74c95fa57cbbe12e6e66e4f7
  fixes: bz#1656264
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
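  In test-framework terms, the fix boils down to pinning the shard size before creating
  the file, something like the sketch below ($CLI, $V0 and $M0 are the framework's usual
  stand-ins for the gluster CLI, the volume and the mount; the exact lines are
  illustrative, not the patch itself):
    # Pin the shard size so a 6MB fallocate yields one block under .shard.
    TEST $CLI volume set $V0 features.shard-block-size 4MB
    TEST fallocate -l 6M $M0/file1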
* cluster/afr: Allow lookup on root if it is from ADD_REPLICA_MOUNT (karthik-us, 2018-12-18; 1 file changed, -0/+95)
  Problem: When trying to convert a plain distribute volume to replica-3 or arbiter type,
  it fails with an ENOTCONN error, as the lookup on the root fails because there is no
  quorum.
  Fix: Allow lookup on root if it is coming from the ADD_REPLICA_MOUNT, which is used
  while adding bricks to a volume. It will try to set the pending xattrs for the newly
  added bricks to allow the heal to happen in the right direction and avoid data loss
  scenarios.
  Note: This fix solves the problem of type conversion only in the case where the volume
  was mounted at least once. The conversion of non-mounted volumes will still fail, since
  the dht self-heal that tries to set the directory layout does so with the PID
  GF_CLIENT_PID_NO_ROOT_SQUASH set in frame->root.
  Change-Id: Ic511939981dad118cc946754341318b164954b3b
  fixes: bz#1655854
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
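  The conversion that exercises this path is the usual add-brick volume-type change,
  roughly as below (volume name and brick paths are placeholders):
    # Grow a single-brick distribute volume into replica 3 with an arbiter;
    # this is the operation that used to fail with ENOTCONN.
    gluster volume add-brick myvol replica 3 arbiter 1 \
            server2:/bricks/myvol-b2 server3:/bricks/myvol-b3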
* iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool (Poornima G, 2018-12-18; 1 file changed, -0/+2)
  The current implementation of iobuf_pool has two problems:
  - prealloc of 12.5MB memory; this limits the scale factor of the gluster processes due
    to RAM requirements
  - lock contention, as the current implementation has one global iobuf_pool lock
  Credits for debugging and addressing the same go to Krutika Dhananjay
  <kdhananj@redhat.com>.
  Issue: #410
  Hence changing the iobuf implementation to use a per-thread mem pool. This may
  theoretically appear to cause a perf dip as there is no preallocation, but the
  per-thread mem pool will not have a significant perf impact, as the last allocated
  memory is kept alive for subsequent allocs for some time. The worst case would be if
  the iobufs requested are of random sizes each time; the best case is if we get iobuf
  requests of the same size. From the perf tests, this patch did not seem to cause any
  perf decrease.
  Note that, with this patch, rdma performance is going to degrade drastically. In one of
  the previous patchsets we had fixes to not degrade rdma perf, but rdma is not supported
  and also not tested [1]. Hence the decision was to not have code in rdma that is not
  tested and not supported.
  [1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html
  Updates: #325
  Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27
  Signed-off-by: Poornima G <pgurusid@redhat.com>
* fuse: SETLKW interrupt (Csaba Henk, 2018-12-14; 1 file changed, -0/+33)
  Use the (f)getxattr based clearlocks interface to interrupt a pending lock request.
  updates: #465
  Change-Id: I4e91a4d8791fc688fed400a02de4c53487e61be2
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse: add --lru-limit option (Amar Tumballi, 2018-12-14; 1 file changed, -0/+42)
  The inode LRU mechanism is moot in the fuse xlator (i.e. there is no limit for the LRU
  list), as fuse inodes are referenced from kernel context, and thus they can only be
  dropped on request of the kernel. This might result in a high number of passive inodes
  which are useless for the glusterfs client, causing a significant memory overhead.
  This change tries to remedy this by extending the LRU semantics and allowing a finite
  limit to be set on the fuse inode LRU.
  A brief history of the problem: When gluster's inode table was designed, fuse didn't
  have any 'invalidate' method, which means a userspace application could never ask the
  kernel to send a 'forget()' fop; instead it had to wait for the kernel to send it based
  on the kernel's parameters. The inode table remembers the number of times the kernel
  has cached the inode based on the 'nlookup' parameter, and the 'nlookup' field is not
  used by any other entry points (like server-protocol, gfapi, etc.). Hence the inode
  table of the fuse module always had to have lru-limit set to '0', which means no limit,
  and GlusterFS always had to keep all inodes in memory as the kernel would have had a
  reference to them. Again, the reason for this is that the kernel's glusterfs inode
  reference was a pointer to the 'inode_t' structure in glusterfs; as it is a pointer, we
  could never free it (to prevent a segfault or memory corruption).
  Solution: In the inode table, handle the prune case of inodes with 'nlookup'
  differently, and call an 'invalidator' method, which in this case is fuse_invalidate();
  it sends the request to the kernel to get the forget request. When the kernel sends the
  forget, it means it has dropped all references to the inode, and it will send the
  forget with the 'nlookup' parameter too. We just need to make sure to reduce the
  'nlookup' value we have when we get the forget. That automatically causes the relevant
  prune to happen.
  Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B
  fixes: bz#1560969
  Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
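  A client would opt into the cap at mount time, along these lines (the limit value,
  server, volume and mount path are placeholders):
    # Cap the fuse inode LRU so passive inodes get invalidated back to the
    # kernel instead of accumulating without bound.
    glusterfs --lru-limit=65536 --volfile-server=server1 --volfile-id=myvol /mnt/myvol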
* [geo-rep]: Worker still ACTIVE after killing bricks (Mohit Agrawal, 2018-12-13; 2 files changed, -0/+110)
  Problem: In the changelog xlator, after destroying the listener it calls unlink to
  delete the changelog socket file, but the socket file reference is not cleaned up from
  process memory.
  Solution:
  1) To clean up the reference completely from process memory, serialize transport
     cleanup for changelog and then unlink the socket file.
  2) The brick xlator will notify GF_EVENT_PARENT_DOWN to the next xlator only after
     cleaning up all xprts.
  Test: To test the same, run the steps below:
  1) Set up some volume and enable brick mux.
  2) Kill any one brick with gf_attach.
  3) Check in lsof for the changelog socket specific to the killed brick; it should be
     cleaned up completely.
  fixes: bz#1600145
  Change-Id: Iba06cbf77d8a87b34a60fce50f6d8c0d427fa491
  Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* copy_file_range support in GlusterFS (Raghavendra Bhat, 2018-12-12; 2 files changed, -0/+257)
  * libglusterfs changes to add the new fop
  * Fuse changes: changes in the fuse bridge xlator to receive and send responses
  * posix changes to perform the op on the backend filesystem
  * protocol and rpc changes for sending and receiving the fop
  * gfapi changes for performing the fop
  * tools: a glfs-copy-file-range tool for testing the copy_file_range fop
    - Although copy_file_range support has been added to the upstream fuse kernel module,
      no release has yet been made of a kernel which contains the support; it is expected
      to come in the upcoming release of linux-4.20. So, as of now, executing the
      copy_file_range fop on a fuse-based filesystem results in the fuse kernel module
      sending a read on the source fd and a write on the destination fd. Therefore a
      small gfapi-based tool has been written to be able to test the copy_file_range fop.
      This tool is similar (in functionality) to the example program given in the
      copy_file_range man page.
      So, running regular copy_file_range on a fuse mount point and running the
      gfapi-based glfs-copy-file-range tool gives some idea about how fast
      copy_file_range (or reflink) can be. On the local machine this was the result
      obtained:
        mount -t glusterfs workstation:new /mnt/glusterfs
        [root@workstation ~]# cd /mnt/glusterfs/
        [root@workstation glusterfs]# ls
        file
        [root@workstation glusterfs]# cd
        [root@workstation ~]# time /tmp/a.out /mnt/glusterfs/file /mnt/glusterfs/new
        real 0m6.495s
        user 0m0.000s
        sys  0m1.439s
        [root@workstation ~]# time glfs-copy-file-range $(hostname) new /tmp/glfs.log /file /rrr
        OPEN_SRC: opening /file is success
        OPEN_DST: opening /rrr is success
        FSTAT_SRC: fstat on /rrr is success
        copy_file_range successful
        real 0m0.309s
        user 0m0.039s
        sys  0m0.017s
      This tool needs the following arguments:
      1) hostname
      2) volume name
      3) log file path
      4) source file path (relative to the gluster volume root)
      5) destination file path (relative to the gluster volume root)
      "glfs-copy-file-range <hostname> <volume> <log file path> <source> <destination>"
    - A testcase has been added as well to run the glfs-copy-file-range tool
  * io-stats changes to capture the fop for profiling
  * NOTE:
    - Added a conditional check to see whether the copy_file_range syscall is available
      or not. If not, then return ENOSYS.
    - Added a conditional check for the kernel minor version in fuse_kernel.h and
      fuse-bridge when referring to copy_file_range. The kernel minor version is kept as
      it is, i.e. 24. Increment it in future when there is a kernel release which
      contains the support for the copy_file_range fop in the fuse kernel module.
  * The document which contains a writeup on this enhancement can be found at
    https://docs.google.com/document/d/1BSILbXr_knynNwxSyyu503JoTz5QFM_4suNIh2WwrSc/edit
  Change-Id: I280069c814dd21ce6ec3be00a884fc24ab692367
  updates: #536
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/afr: Do not update read_subvol in inode_ctx after rename/link fop (karthik-us, 2018-12-12; 1 file changed, -0/+40)
  Since rename/link fops on a file will not change any data in it, they should not update
  the read_subvol values in the inode_ctx, which indicate the data- and metadata-readable
  subvols for that file. The old read_subvol values should be retained even after the
  rename/link operations.
  Change-Id: I068044a426823a566f5bea8aa063cd689199d6dd
  fixes: bz#1657783
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
* geo-rep: Make slave volume read-only (by default) (Harpreet Kaur, 2018-12-07; 5 files changed, -0/+19)
  Added a command to set the "features.read-only" option to a default value of "on" for
  the slave volume. Changes are made in
  $SRC/extras/hook-scripts/S56glusterd-geo-rep-create-post.sh for root geo-rep and in
  $SRC/geo-replication/src/set_geo_rep_pem_keys.sh for non-root geo-rep.
  Fixes: bz#1654187
  Change-Id: I15beeae3506f3f6b1dcba0a5c50b6344fd468c7c
  Signed-off-by: Harpreet Kaur <hlalwani@redhat.com>
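  The option the hook scripts now apply is the ordinary read-only switch on the slave
  volume, equivalent to running the following by hand (slave volume name is a
  placeholder):
    # Reject writes from regular clients on the geo-rep slave volume.
    gluster volume set slavevol features.read-only on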
* libglusterfs: Move devel headers under glusterfs directory (ShyamsundarR, 2018-12-05; 2 files changed, -4/+4)
  libglusterfs devel package headers are referenced in code using include semantics for a
  program. While this works, it can be done better, especially when dealing with
  out-of-tree xlator builds or out-of-tree devel package usage in general.
  Towards this, the following changes are done:
  - moved all devel headers under a glusterfs directory
  - included these headers using system header notation <> in all code outside of
    libglusterfs
  - included these headers using own program notation "" within libglusterfs
  This change, although big, just moves the headers around and makes their inclusion
  correct when including them from other sources. This helps us correctly include
  libglusterfs includes without namespace conflicts.
  Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
  Updates: bz#1193929
  Signed-off-by: ShyamsundarR <srangana@redhat.com>
* protocol/server: support server.all-squash (Xie Changlong, 2018-12-05; 1 file changed, -0/+72)
  We still use gnfs on our side, so do a little work to support server.all-squash. Just
  like server.root-squash, it's also a volume-wide option. Also see bz#1285126.
    $ gluster volume set <VOLNAME> server.all-squash on
  Note: If you enable server.root-squash and server.all-squash at the same time, only
  server.all-squash works. Please refer to the following table:
    +----------------+-----------------+------------------------------+
    |                | all_squash      | no_all_squash                |
    +----------------+-----------------+------------------------------+
    | root_squash    | anonuid/anongid | anonuid/anongid for root,    |
    |                |                 | useruid/usergid for non-root |
    +----------------+-----------------+------------------------------+
    | no_root_squash | anonuid/anongid | useruid/usergid              |
    +----------------+-----------------+------------------------------+
  Updates: bz#1285126
  Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
  Signed-off-by: Xue Chuanyu <xuechuanyu@cmss.chinamobile.com>
  Change-Id: Iea043318fe6e9a75fa92b396737985062a26b47e
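  A rough sketch of how this pairs with the existing squash controls; the anon UID/GID
  knobs are assumed here to be the pre-existing server.anonuid/server.anongid options,
  and the values and volume name are placeholders:
    # Map every incoming UID/GID to the anonymous account, not just root's.
    gluster volume set myvol server.all-squash on
    # Pick which anonymous UID/GID the squashed requests map to.
    gluster volume set myvol server.anonuid 65534
    gluster volume set myvol server.anongid 65534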
* tests/geo-rep: Mask failure of geo-rep arbiter test (Kotresh HR, 2018-12-05; 3 files changed, -21/+21)
  Comment out the particular test which is failing arbitrarily. Also changed the code to
  differentiate error cases. There could be some race because of which it's failing
  arbitrarily; this will be debugged and fixed in a separate patch.
  Change-Id: I925df6421737d7a9abd9446a9d85029b4285ad2c
  updates: bz#1193929
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* tests: Mark tests/bugs/shard/zero-flag.t bad (Atin Mukherjee, 2018-12-05; 1 file changed, -0/+1)
  Change-Id: I2f4ca470c6666584e0feb129ab712f06772a86c2
  Updates: bz#1656264
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* geo-rep: Fix syncing of files with non-ascii filenames (Kotresh HR, 2018-12-04; 1 file changed, -1/+20)
  Problem: Creation of files/directories with non-ascii names fails to sync to the slave.
  It crashes with the below traceback on the slave:
    ...
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 118, in worker
      res = getattr(self.obj, rmeth)(*in_data[2:])
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 709, in entry_ops
      [ESTALE, EINVAL, EBUSY])
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap
      return call(*arg)
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 83, in lsetxattr
      cls.raise_oserr()
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 38, in raise_oserr
      raise OSError(errn, os.strerror(errn))
    OSError: [Errno 12] Cannot allocate memory
  Cause: The length calculation arguments passed to blob creation were computed before
  encoding; hence it was failing in the gfid-access layer.
  Fix: It appears that calculating the length properly fixes this issue, but it would
  cause issues in other places in 'python2' and not in 'python3'. Encoding and decoding
  each required string to make geo-rep compatible with both 'python2' and 'python3' is a
  nightmare and is not foolproof. Hence the 'python2' code is kept as is, without
  encode/decode, and encode/decode is applied only to 'python3'.
  Added non-ascii filename tests to the regression suite.
  fixes: bz#1650893
  Change-Id: I35cfaf848e07b1a0b5cb93c01b98b472f08271a6
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* afr: assign gfid during name heal when no 'source' is present. (Ravishankar N, 2018-12-03; 1 file changed, -0/+149)
  Problem: If the parent dir is in split-brain or has dirty xattrs set, and the file has
  its gfid missing on one of the bricks, then name heal won't assign the gfid.
  Fix: Use the brick we select the gfid from as the 'source'.
  Note: The problem was found while trying to debug a split-brain issue on Cynthia Zhou's
  setup.
  updates: bz#1637249
  Change-Id: Id088d4f0fb017aa35122de426654194e581ed742
  Reported-by: Cynthia Zhou <cynthia.zhou@nokia-sbell.com>
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* tests/geo-rep: Add Arbiter volume test case (Harpreet Kaur, 2018-11-28; 2 files changed, -0/+439)
  Added geo-rep regression tests with Arbiter volume.
  Fixes: bz#1653565
  Change-Id: Id99523c1f1d3d301fbe871aa0641d9ae4ed7b8d7
  Signed-off-by: Harpreet Kaur <hlalwani@redhat.com>
* cluster/afr: Add test for thin-arbiter feature (Ashish Pandey, 2018-11-26; 1 file changed, -0/+51)
  Test: Check success/failure of write fop while different bricks/ta process are down.
  Change-Id: I3c376935df93ebf1f794c964bd19bc1280d91c59
  updates: bz#1624332
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* gfapi: Offload callback notifications to synctask (Soumya Koduri, 2018-11-26; 2 files changed, -0/+374)
  Upcall notifications are received from the server via epoll, and the same thread is
  used to forward these notifications to the application. This may lead to a deadlock and
  hang in the following scenario: consider that, as part of handling these callbacks, the
  application has to do some operations which involve sending I/Os to the gfapi stack,
  which in turn have to wait for epoll threads to receive the response. This may lead to
  a deadlock if all the epoll threads are waiting to complete these callback
  notifications.
  To address this, instead of using the epoll thread itself, make use of a synctask to
  send those notifications to the application.
  Change-Id: If614e0d09246e4279b9d1f40d883a32a39c8fd90
  updates: bz#1648768
  Signed-off-by: Soumya Koduri <skoduri@redhat.com>