path: root/xlators
Commit message | Author | Age | Files | Lines
* posix: APIs in posix to get and set time attributes | Kotresh HR | 2018-05-06 | 8 | -21/+614
  This is part of the effort to provide consistent time attributes (ctime, atime, mtime) for an object across distribute and replica sets. This patch contains the APIs to set and get these attributes, both on disk and in the inode context.
  Credits: Rafi KC <rkavunga@redhat.com>
  Updates: #208
  Change-Id: I5d3cba53eef90ac252cb8299c0da42ebab3bde9f
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* afr: Add lease() fop | Poornima G | 2018-05-05 | 3 | -0/+157
  Change-Id: Ied047dd5ee44e9d5a5d3db214826f7df30332ef9
  updates: #350
  BUG: 1319992
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* mount,fuse: make fuse dumping available as mount option | Csaba Henk | 2018-05-04 | 1 | -0/+7
  Updates: bz#1193929
  Change-Id: I4dd4d0e607f89650ebb74b893b911b554472826d
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* fuse: add support for kernel writeback cache | Csaba Henk | 2018-05-04 | 4 | -4/+81
  - Added a kernel-writeback-cache command line and xlator option for requesting use of the kernel's writeback cache in FUSE_INIT (see [1]).
  - Added an attr-times-granularity command line and xlator option through which the granularity of the {a,m,c}time values we support in stat (attr) data can be indicated to the kernel. This avoids divergence of the attr times between kernel and userspace that could otherwise occur with writeback-cache, while still maintaining the maximum time precision the FUSE server is capable of (see [2]).
  - Handle the FATTR_CTIME flag in FUSE_SETATTR, which indicates the presence of ctime in the setattr payload. Currently we cannot associate arbitrary ctimes with files on the backend, so we just touch them to update their ctimes to the current time. Having ctimes in the setattr payload is also a side effect of writeback cache (see [3] and [4]).
  [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d99ff8, "fuse: Turn writeback cache on"
  [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e27c9d3, "fuse: fuse: add time_gran to INIT_OUT"
  [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e18bda, "fuse: add .write_inode"
  [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9e13f, "fuse: allow ctime flushing to userspace"
  Updates: #435
  Change-Id: Id174c8e0c815c4456c35f8c53e41a6a507d91855
  Signed-off-by: Csaba Henk <csaba@redhat.com>
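The attr-times-granularity idea can be illustrated with a small C sketch. It assumes the granularity is expressed in nanoseconds (as the kernel's FUSE_INIT `time_gran` field is) and that times are rounded down to a multiple of it; `truncate_to_gran` is a hypothetical helper, not a GlusterFS or kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: round a timestamp down to a multiple of gran_ns
 * nanoseconds (1 <= gran_ns <= 1000000000). */
static struct timespec truncate_to_gran(struct timespec ts, uint32_t gran_ns)
{
    assert(gran_ns >= 1 && gran_ns <= 1000000000u);
    /* With gran_ns == 1e9 this clears tv_nsec, i.e. whole seconds. */
    ts.tv_nsec -= ts.tv_nsec % (long)gran_ns;
    return ts;
}
```

If the FUSE server reports times already truncated to the granularity it announced, the values the kernel caches under writeback-cache and the values userspace returns later have no extra precision to disagree on.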
* feature/leases : fixing bugs found while testing glfs_test.t | Jiffin Tony Thottan | 2018-05-04 | 3 | -14/+56
  Change-Id: Iee8f431601ecda184108a079f665e05902b0f78b
  updates: #350
  Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* features/bitrot: print the path of the corrupted objects | Raghavendra Bhat | 2018-05-04 | 4 | -6/+250
  Currently "gluster volume bitrot <volume name> scrub status" lists the corrupted objects (files, as of now), but only by their gfids, so one has to run getfattr, find, etc. to get the actual path of those objects before removing them. This change attempts to print the path of those files wherever possible:
  * Try to get the path using the on-disk gfid2path xattr.
  * If that fails, fall back to the in-memory path (provided the object has its dentry properly created and linked in the inode table of the brick where the corrupted object is present).
  So the gfid-to-path resolution is a soft resolution, i.e. based on the inode and dentry cache in the brick's memory. If the path cannot be obtained from the inode table either, only the gfid is printed.
  Change-Id: Ie9a30307f43a49a2a9225821803c7d40d231de68
  fixes: bz#1570962
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
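The fallback order for resolving a corrupted object's path can be sketched in C as follows. The resolver callbacks and the names `resolve_display_path`, `xattr_missing` and `itable_hit` are hypothetical stand-ins for the gfid2path xattr lookup and the brick's inode-table lookup, not the bitrot scrubber's actual API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A resolver writes a path into buf and returns 0, or returns -1. */
typedef int (*path_resolver_fn)(const char *gfid, char *buf, size_t len);

/* Try the on-disk xattr first, then the in-memory inode table, and
 * finally fall back to printing the bare gfid. */
static void resolve_display_path(const char *gfid, path_resolver_fn from_xattr,
                                 path_resolver_fn from_itable,
                                 char *out, size_t len)
{
    if (from_xattr != NULL && from_xattr(gfid, out, len) == 0)
        return;
    if (from_itable != NULL && from_itable(gfid, out, len) == 0)
        return;
    snprintf(out, len, "%s", gfid);
}

/* Demo stubs standing in for the real lookups. */
static int xattr_missing(const char *gfid, char *buf, size_t len)
{
    (void)gfid; (void)buf; (void)len;
    return -1;                      /* e.g. no gfid2path xattr on the file */
}

static int itable_hit(const char *gfid, char *buf, size_t len)
{
    (void)gfid;
    snprintf(buf, len, "/dir/corrupted-file");
    return 0;                       /* dentry found in the inode table */
}
```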
* glusterd: enable self-heal in daemons | Ravishankar N | 2018-05-04 | 3 | -14/+0
  ...like rebalance, quota and tier, because that seems to be the consensus (see BZ).
  Change-Id: I912336a12f4e33ea4ec55f804df403fab0dc89fc
  BUG: 1536024
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* posix: Avoid changelog retries for geo-rep | Mohit Agrawal | 2018-05-03 | 1 | -0/+33
  Problem: geo-rep is slow to migrate directories from the master volume to the slave volume due to a large number of changelog retries.
  Solution: Update the condition in posix_getxattr to ignore MDS_INTERNAL_XATTR, as posix already ignores other internal xattrs.
  BUG: 1571069
  Change-Id: I4d91ec73e5b1ca1cb3ecf0825ab9f49e261da70e
  fixes: bz#1571069
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* Don't use hardcoded /sbin, /usr/bin etc. paths. Fixes #1450546 | Niklas Hambüchen | 2018-05-03 | 3 | -18/+6
  Instead, rely on programs being in PATH, as gluster already does in many places across its code base.
  Change-Id: Id21152fe42f5b67205d8f1571b0656c4d5f74246
  BUG: 1450546
  Signed-off-by: Niklas Hambuechen <mail@nh2.me>
* cluster/dht: unwind if dht_selfheal_dir_mkdir returns an error | Raghavendra G | 2018-05-03 | 1 | -1/+5
  If dht_selfheal_dir_mkdir returns an error, the cbk passed to dht_selfheal_directory is not invoked. So the current code path leaves a frame that is never unwound, resulting in a fop that hangs forever.
  Change-Id: I422308b8a34a074301ca46b029ffe676f5e0f66c
  fixes: bz#1574305
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* protocol/server : unwind as per op version | Ashish Pandey | 2018-05-03 | 2 | -3/+10
  Change-Id: Id6717640ac14881b490e512c4682e45ffffa7f5b
  fixes: bz#1570538
  BUG: 1570538
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* glusterd/geo-rep: Fix UNUSED_VALUE coverity issue | Varsha Rao | 2018-05-03 | 2 | -12/+12
  The return value of glusterd_get_local_brickpaths was unused because it is reinitialized outside the if block, so add a goto statement. Also change the if condition to check for the failure case, where the return value is -1 and path_list is NULL.
  Change-Id: I6b47d7751263f704bd69a6452a7e71bfcf226d49
  updates: bz#789278
  Signed-off-by: Varsha Rao <varao@redhat.com>
* core/various: python3 compat, prepare for python2 -> python3 | Kaleb S. KEITHLEY | 2018-05-02 | 11 | -190/+204
  See https://review.gluster.org/#/c/19788/.
  Use the print function from __future__.
  Change-Id: If5075d8d9ca9641058fbc71df8a52aa35804cda4
  updates: #411
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* glusterd: Fix for memory leak in get-state detail | Sanju Rakonde | 2018-05-01 | 1 | -1/+8
  Fixes: bz#1573066
  Change-Id: I76fe3bdde7351736b32eb3d6c4cc5f8f276257ed
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* dht: gf_defrag_settle_hash should ignore ENOENT and ESTALE error | Susant Palai | 2018-04-30 | 1 | -1/+8
  Problem: A directory deletion can happen just before gf_defrag_settle_hash, which internally does a setxattr operation on a directory.
  Solution: Ignore ENOENT and ESTALE errors.
  Fixes: bz#1572581
  Change-Id: I2f91809f3b5e02976c4c3a5a596406a8b2f8f6f2
  Signed-off-by: Susant Palai <spalai@redhat.com>
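A minimal C sketch of this error filtering, assuming a helper that receives the setxattr return value and errno; `settle_hash_result` is a hypothetical name, not the actual code in gf_defrag_settle_hash.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical post-processing of the setxattr issued while settling a
 * directory's layout hash: ret < 0 means failure, op_errno holds the cause.
 * Returns 0 if rebalance may carry on, -1 for a real failure. */
static int settle_hash_result(int ret, int op_errno)
{
    if (ret >= 0)
        return 0;
    /* The directory was deleted concurrently: nothing left to settle. */
    if (op_errno == ENOENT || op_errno == ESTALE)
        return 0;
    return -1;
}
```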
* cluster/afr: shd changes for thin arbiter | karthik-us | 2018-04-30 | 1 | -0/+184
  Updates #352
  Change-Id: I1bbb3c652ba33cec6aa37f3700370674077fb17d
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
* afr: initial changes for thin arbiter | Ravishankar N | 2018-04-30 | 6 | -8/+229
  1. Create thin arbiter index file during mount.
  2. Set pending marker in thin arbiter id file in case of failure.
  Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82
  updates: #352
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* server/resolver: don't trust inode-table for RESOLVE_NOT | Raghavendra G | 2018-04-30 | 1 | -4/+23
  There have been known races between fops that add a dentry (like lookup, create, mknod etc.) and fops that remove a dentry (like rename, unlink, rmdir etc.), due to which stale dentries are left behind in the inode table even though the dentry doesn't exist on the backend. For example, consider a lookup (parent/bname) and an unlink (parent/bname) racing in the following order:
  * lookup hits storage/posix and finds that the dentry exists
  * unlink removes the dentry on storage/posix
  * unlink reaches protocol/server, where the dentry (parent/bname) is unlinked from the inode
  * lookup reaches protocol/server and creates a dentry (parent/bname) on the inode
  Now we have a stale dentry (parent/bname) associated with the inode in the itable. This situation is bad for fops like link, create etc. which invoke the resolver with type RESOLVE_NOT. These fops fail with EEXIST even though there is no such dentry on the backend fs. This issue can be solved in two ways:
  * Enable the "dentry fop serializer" xlator [1]:
    # gluster volume set <VOLNAME> features.sdfs on
  * Make sure the resolver does a lookup on the backend when it finds a dentry in the itable, and validates the state of the itable:
    - If the dentry is not found, unlink the stale dentries from the itable and continue with the fop.
    - If the dentry is found, fail the fop with EEXIST.
  This patch implements the second solution, as sdfs is not enabled by default in the brick xlator stack. Once sdfs is enabled by default, this patch can be reverted.
  [1] https://github.com/gluster/glusterfs/issues/397
  Change-Id: Ia8bb0cf97f97cb0e72639bce8aadb0f6d3f4a34a
  updates: bz#1543279
  BUG: 1543279
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
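The second solution (verify a cached dentry against the backend before failing with EEXIST) can be sketched with POSIX `fstatat`. This assumes the parent directory is available as an open directory fd and that the caller already knows whether the inode table holds a dentry for the name; `resolve_not_check` and its verdict enum are hypothetical, not the protocol/server resolver's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

enum resolve_not_verdict {
    RESOLVE_NOT_PROCEED,  /* no cached dentry: continue with create/link */
    RESOLVE_NOT_STALE,    /* cached dentry is stale: unlink it, then continue */
    RESOLVE_NOT_EEXIST,   /* name really exists on backend: fail with EEXIST */
    RESOLVE_NOT_ERROR     /* the backend check itself failed */
};

/* 'cached' says whether the inode table already has a dentry for 'name'
 * under the parent directory opened as 'parent_fd'. */
static enum resolve_not_verdict
resolve_not_check(int parent_fd, const char *name, bool cached)
{
    struct stat st;

    if (!cached)
        return RESOLVE_NOT_PROCEED;
    if (fstatat(parent_fd, name, &st, AT_SYMLINK_NOFOLLOW) == 0)
        return RESOLVE_NOT_EEXIST;
    return errno == ENOENT ? RESOLVE_NOT_STALE : RESOLVE_NOT_ERROR;
}
```

The extra backend lookup is only paid when the inode table claims the name exists, so fops on genuinely new names keep their old cost.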
* libglusterfs: Capture the dict response in syncop_xattrop_cbk | karthik-us | 2018-04-27 | 3 | -5/+9
  Problem: Currently it is not possible to capture the xattr values which are set on the bricks by calling syncop_(f)xattrop, because the response dict is not assigned to any of the dictionaries.
  Fix: In the xattrop callback, capture the response dict and send it back to the caller if it is requested.
  Change-Id: I9de9bcd97d6008091c9b060bcca3676cb9ae8ef9
  fixes: bz#1572076
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
* feature/thin-arbiter: Implement thin-arbiter translator | Ashish Pandey | 2018-04-25 | 7 | -1/+773
  Updates #352
  Change-Id: I3d8caa6479dc8e48bec62a09b056971bb061f0cf
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* performance/md-cache: purge cache on ENOENT/ESTALE errors | Raghavendra G | 2018-04-25 | 1 | -87/+438
  Otherwise, the next lookup could be served from the cache and succeed, which is wrong. This can affect the retry logic of the VFS when it receives an ESTALE.
  Change-Id: Iad8e564d666aa4172823343f19a60c11e4416ef6
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Fixes: bz#1566303
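A toy C model of the purge: a small gfid-keyed cache whose entry is dropped when a fop reply carries ENOENT or ESTALE, so a later lookup misses the cache and goes to the bricks. The fixed-size array and the `md_*` helpers are hypothetical simplifications, not md-cache's data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MD_SLOTS 8

/* Toy stand-in for md-cache's per-inode cached metadata. */
struct md_entry {
    char gfid[37];   /* canonical 36-char UUID string + NUL */
    bool valid;
};

static struct md_entry md_cache[MD_SLOTS];

static struct md_entry *md_find(const char *gfid)
{
    for (int i = 0; i < MD_SLOTS; i++)
        if (md_cache[i].valid && strcmp(md_cache[i].gfid, gfid) == 0)
            return &md_cache[i];
    return NULL;
}

/* Cache metadata for gfid; returns false if the cache is full. */
static bool md_store(const char *gfid)
{
    if (md_find(gfid) != NULL)
        return true;
    for (int i = 0; i < MD_SLOTS; i++) {
        if (!md_cache[i].valid) {
            snprintf(md_cache[i].gfid, sizeof md_cache[i].gfid, "%s", gfid);
            md_cache[i].valid = true;
            return true;
        }
    }
    return false;
}

/* Hook for every fop reply: ENOENT/ESTALE mean the cached view is wrong,
 * so drop it instead of serving the next lookup from cache. */
static void md_on_reply(const char *gfid, int op_ret, int op_errno)
{
    if (op_ret < 0 && (op_errno == ENOENT || op_errno == ESTALE)) {
        struct md_entry *e = md_find(gfid);
        if (e != NULL)
            e->valid = false;
    }
}
```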
* cluster/afr: Keep child-up until ping-event | Pranith Kumar K | 2018-04-25 | 3 | -25/+40
  Problem: Consider two bricks, brick-A and brick-B, where brick-A is within halo-max-latency and brick-B is beyond halo-max-latency, with both the halo-min and halo-max replicas set to 1. In this case, brick-A comes online and its ping latency is updated. When brick-B comes online, there are 2 up bricks, so the code tries to find the brick with the worst latency to mark it down. Since brick-B had just come online, it always had a latency of '0', so brick-A used to be marked offline and brick-B would end up being the one online even when brick-A is better suited.
  Fix: Consider the latency of a just-up child to be HALO_MAX_LATENCY, so that until its ping latency is known, the just-up brick is treated as the worst child. Also keep ping-latency at -1 until child-up during initialization.
  BUG: 1567881
  fixes bz#1567881
  Change-Id: I148262fe505468190f0eb99225d0f6d57cdb6f04
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
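The selection rule can be sketched in C: among up children, pick the one with the worst latency, treating a child whose ping latency is not yet known (represented here as a negative value) as having HALO_MAX_LATENCY. The value 99999 and the name `worst_up_child` are illustrative assumptions, not AFR's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative value only; stands in for HALO_MAX_LATENCY. */
#define HALO_MAX_LATENCY 99999L

/* Returns the index of the up child to mark down (worst latency), or -1
 * if no child is up. A negative latency means "not pinged yet" and is
 * treated as HALO_MAX_LATENCY, so a just-up child is considered the worst
 * candidate until it has been measured. */
static int worst_up_child(const int *up, const long *latency, size_t n)
{
    int worst = -1;
    long worst_lat = -1;

    for (size_t i = 0; i < n; i++) {
        if (!up[i])
            continue;
        long lat = latency[i] < 0 ? HALO_MAX_LATENCY : latency[i];
        if (lat > worst_lat) {
            worst_lat = lat;
            worst = (int)i;
        }
    }
    return worst;
}
```

With the old behaviour, where the just-up child reported 0, the measured brick-A would always be picked as the worst.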
* features/shard: Add option to barrier parallel lookup and unlink of shards | Krutika Dhananjay | 2018-04-23 | 2 | -28/+89
  Also move the common parallel unlink callback for GF_FOP_TRUNCATE and GF_FOP_FTRUNCATE into a separate function.
  Change-Id: Ib0f90a5f62abdfa89cda7bef9f3ff99f349ec332
  updates: bz#1568521
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/dht: Fix dht_rename lock order | N Balachandran | 2018-04-23 | 1 | -18/+47
  Fixed dht_order_rename_lock to use the same inodelk ordering as that of the dht selfheal locks (dictionary order of lock subvolumes).
  Change-Id: Ia3f8353b33ea2fd3bc1ba7e8e777dda6c1d33e0d
  fixes: bz#1568348
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
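Consistent lock ordering can be sketched with `qsort` and `strcmp`: every code path sorts its lock requests by subvolume name before acquiring them, so two fops never take the same pair of locks in opposite orders. `struct lock_req` and `order_locks` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one inodelk to be taken during rename. */
struct lock_req {
    const char *subvol_name;   /* subvolume on which the lock is taken */
    const char *role;          /* e.g. "src-hashed", "dst-cached" */
};

static int cmp_by_subvol(const void *a, const void *b)
{
    const struct lock_req *x = a;
    const struct lock_req *y = b;
    return strcmp(x->subvol_name, y->subvol_name);
}

/* Sort requests into dictionary order of subvolume names. If rename and
 * directory self-heal both acquire locks in this order, they cannot end
 * up waiting on each other in a cycle. */
static void order_locks(struct lock_req *reqs, size_t n)
{
    qsort(reqs, n, sizeof reqs[0], cmp_by_subvol);
}
```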
* server/auth: add option for strict authentication | Mohammed Rafi KC | 2018-04-20 | 6 | -12/+81
  When this option is enabled, we check for a matching username and password; if none is found, the connection is rejected. This also does a checksum validation of the volfile.
  The option is invalid when SSL/TLS is in use, in which case the SSL/TLS certificate user name is used to validate and hence authorize the right user. This expects TLS allow rules to be set up correctly rather than the default *.
  This option is not settable; as a result, it cannot be enabled for volumes using the CLI. It is used with the shared storage volume, to restrict access to that volume in non-SSL/TLS environments to the gluster peers only.
  Tested:
    ./tests/bugs/protocol/bug-1321578.t
    ./tests/features/ssl-authz.t
    - Ran tests on volumes with and without strict auth checking (the brick volfile needed to be edited to test, or rather to enable, the option)
    - Ran tests on volumes to ensure existing mounts are disconnected when we enable strict checking
  Change-Id: I2ac4f0cfa5b59cc789cc5a265358389b04556b59
  fixes: bz#1568844
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Signed-off-by: ShyamsundarR <srangana@redhat.com>
* shared storage: Prevent mounting shared storage from non-trusted client | Mohammed Rafi KC | 2018-04-20 | 1 | -0/+21
  The gluster shared storage is a volume used for internal storage by various features, including ganesha, geo-rep and snapshot. This volume should therefore not be exposed to clients, as it is a special volume for internal use. With this fix, no non-trusted volfile is generated for the shared storage volume.
  Change-Id: I8ffe30ae99ec05196d75466210b84db311611a4c
  fixes: bz#1568844
  BUG: 1568844
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
* server: fix unresolved symbols by moving them to libglusterfs | Mohit Agrawal | 2018-04-20 | 1 | -1/+1
  Problem: The glusterd2 build fails due to undefined symbols (xlator_mem_cleanup, glusterfsd_ctx) in server.so.
  Solution: Resolve this with two changes:
  1) Move the xlator_mem_cleanup code from glusterfsd-mgmt.c to xlator.c so that it is part of libglusterfs.so.
  2) Replace glusterfsd_ctx with this->ctx, because the symbol glusterfsd_ctx is not part of server.so.
  BUG: 1544090
  Change-Id: Ie5e6fba9ed458931d08eb0948d450aa962424ae5
  fixes: bz#1544090
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster/afr: Need heal-timeout to be configured as low as 5 seconds | Pranith Kumar K | 2018-04-20 | 1 | -1/+1
  In Halo replication, there are pending heals more often than not. It makes sense to give users the capability to configure it as low as 5 seconds.
  BUG: 1569489
  fixes bz#1569489
  Change-Id: I451c1975827f66398b903f659c981ef3121d5376
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* features/bitrot: show the corresponding brick for the corrupted objects | Raghavendra Bhat | 2018-04-20 | 1 | -3/+8
  Currently, the "gluster volume bitrot <volume name> scrub status" command shows the corrupted objects of a node, but not the brick each corrupted object belongs to. Showing the brick of the corrupted object helps in situations where a node hosts multiple bricks of a volume.
  Change-Id: I7fbdea1e0072b9d3487eb10757468bc02d24df21
  fixes: bz#1569198
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* gluster: Sometimes Brick process is crashed at the time of stopping brick | Mohit Agrawal | 2018-04-19 | 9 | -61/+190
  Problem: Sometimes the brick process crashes when a brick is stopped while brick mux is enabled.
  Solution: The brick process was crashing because RPC connections were not cleaned up properly while brick mux was enabled. In this patch, after sending the GF_EVENT_CLEANUP notification to the xlator (server), the server waits for all RPC client connections of that xlator to be destroyed. Once the RPC connections of all clients associated with that brick are destroyed in server_rpc_notify, xlator_mem_cleanup is called for the brick xlator as well as all its child xlators. To avoid races at cleanup time, two new flags, cleanup_starting and call_cleanup, are introduced in each xlator.
  BUG: 1544090
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
  Note: All test cases were run in a separate build (https://review.gluster.org/#/c/19700/) with the same patch after forcefully enabling brick mux; all test cases passed.
  Change-Id: Ic4ab9c128df282d146cf1135640281fcb31997bf
  updates: bz#1544090
* glusterd: volume inode/fd status broken with brick mux | hari gowtham | 2018-04-19 | 5 | -44/+59
  Problem: The inode/fd values were populated from the ctx received from the server xlator. Without brickmux, each brick process hosted a single brick of the volume, so searching the server xlator and populating from it worked. With brickmux, a number of bricks can be confined to a single process. These bricks can also belong to different volumes (if the max-bricks-per-process option is used). If they belong to different volumes, using the server xlator to populate the values causes problems.
  Fix: Use the brick to validate and populate the inode/fd status.
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Change-Id: I2543fa5397ea095f8338b518460037bba3dfdbfd
  fixes: bz#1566067
* features/shard: Make operations on internal directories generic | Krutika Dhananjay | 2018-04-18 | 2 | -92/+206
  Change-Id: Iea7ad2102220c6d415909f8caef84167ce2d6818
  updates: bz#1568521
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* fuse: do fd_resolve in fuse_getattr if fd is received | Susant Palai | 2018-04-18 | 2 | -7/+10
  Problem: With the current code, after a graph switch the old fd is received in fuse_getattr. Since it is associated with the old inode, it does not have the inode ctx of the xlators in the new graph. Hence dht errored out saying "no layout" for the fstat call, which caused the EINVAL.
  Solution: If an fd is passed, initialize and resolve the fd before carrying on with getattr.
  Test case:
  - Created a single-brick distributed volume
  - Started untar
  - Added a new brick
  Without this fix, untar used to abort with an ERROR.
  Change-Id: I5805c463fb9a04ba5c24829b768127097ff8b9f9
  fixes: bz#1566207
  Signed-off-by: Susant Palai <spalai@redhat.com>
* glusterd: update listen-backlog value to 1024 | Milind Changire | 2018-04-18 | 1 | -1/+1
  Update the default value of listen-backlog to 1024 to reflect the changes in socket.c. This keeps the actual implementation in socket.c and the help text in glusterd-volume-set.c consistent.
  Change-Id: If04c9e0bb5afb55edcc7ca57bbc10922b85b7075
  fixes: bz#1564600
  Signed-off-by: Milind Changire <mchangir@redhat.com>
* cluster/afr: Make sure latency-arg is passed to afr | Pranith Kumar K | 2018-04-18 | 1 | -0/+2
  xlator_notify doesn't pass on the extra arguments that come in the input function, so the XLATOR_NOTIFY macro should be used instead to pass the extra arguments to the function.
  BUG: 1567881
  fixes bz#1567881
  Change-Id: Ic15b6c446638cbacf3149693147a754219037c47
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr: fixes to afr-eager locking | Ravishankar N | 2018-04-18 | 1 | -0/+2
  1. If pre-op fails on all bricks, set lock->release to true in afr_handle_lock_acquire_failure so that the GF_ASSERT in afr_unlock() does not crash.
  2. Added a missing 'return' after handling pre-op failure in afr_transaction_perform_fop(), fixing a use-after-free issue.
  Change-Id: If0627a9124cb5d6405037cab3f17f8325eed2d83
  fixes: bz#1561129
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* Revert "storage/posix: add pgfid in readdirp if needed" | Nigel Babu | 2018-04-18 | 1 | -38/+8
  This reverts commit d206fab73f6815c927a84171ee9361c9b31557b1.
  Change-Id: I5b43fdcf916bc844437c9d60f6957bc40936e3c2
  Updates: bz#1560319
  Signed-off-by: Nigel Babu <nigelb@redhat.com>
* fuse: retire statvfs tweak | Csaba Henk | 2018-04-16 | 1 | -13/+0
  The fuse xlator used to override the filesystem block size of the storage backend to indicate its preferences. Now we retire this tweak and pass on what we get from the backend. This fixes the anomaly reported in the referenced BUG.
  For more background, see the following email, which was sent out to the gluster-devel and gluster-users mailing lists to gauge whether anyone sees any use for this tweak:
  http://lists.gluster.org/pipermail/gluster-devel/2018-March/054660.html
  http://lists.gluster.org/pipermail/gluster-users/2018-March/033775.html
  No one vetoed its removal, and it got an endorsement:
  http://lists.gluster.org/pipermail/gluster-devel/2018-March/054686.html
  BUG: 1523219
  Change-Id: I3b7111d3037a1b91a288c1589f407b2c48d81bfa
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* cluster/dht: Handle file migrations when brick down | N Balachandran | 2018-04-13 | 1 | -5/+51
  The decision as to which node would migrate a file was based on the gfid of the file. Files were divided among the nodes for the replica/disperse set. However, if a brick was down when rebalance started, the nodeuuids would be saved as NULL and a set of files would not be migrated.
  Now, if the nodeuuid is NULL, the first non-null entry in the set is the node responsible for migrating the file.
  Change-Id: I72554c107792c7d534e0f25640654b6f8417d373
  fixes: bz#1564198
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
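The NULL-nodeuuid fallback can be sketched in C as below. The byte-sum mapping from a gfid to a node index is an assumption for illustration only (this message does not describe DHT's actual hash), and `node_should_migrate`, `uuid16` and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char uuid16[16];   /* raw 16-byte UUID */

static bool uuid16_is_null(const unsigned char *u)
{
    static const uuid16 zero;       /* all-zero UUID */
    return memcmp(u, zero, sizeof zero) == 0;
}

/* Illustrative gfid -> index mapping (byte sum modulo n). */
static size_t gfid_to_index(const unsigned char *gfid, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof(uuid16); i++)
        sum += gfid[i];
    return sum % n;
}

/* Returns true if node 'self' should migrate the file. 'nodes' holds the
 * node UUIDs of the replica/disperse set; an all-zero entry means that
 * brick was down when rebalance started. */
static bool node_should_migrate(const unsigned char *gfid, uuid16 *nodes,
                                size_t n, const unsigned char *self)
{
    if (n == 0)
        return false;

    size_t idx = gfid_to_index(gfid, n);
    if (uuid16_is_null(nodes[idx])) {
        /* Fall back to the first non-null entry so the file is not skipped. */
        idx = 0;
        while (idx < n && uuid16_is_null(nodes[idx]))
            idx++;
        if (idx == n)
            return false;           /* no known node at all */
    }
    return memcmp(nodes[idx], self, sizeof(uuid16)) == 0;
}
```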
* core/build/various: python3 compat, prepare for python2 -> python3 | Kaleb S. KEITHLEY | 2018-04-12 | 7 | -6/+8
  Note 1) We're not supposed to be using #!/usr/bin/env python, see https://fedoraproject.org/wiki/Packaging:Guidelines?rd=Packaging/Guidelines#Shebang_lines
  Note 2) We're also not supposed to be using #!/usr/bin/python, see https://fedoraproject.org/wiki/Changes/Avoid_usr_bin_python_in_RPM_Build#Quick_Opt-Out
  The previous patch (https://review.gluster.org/19767) tried to do too much in one patch, so it was abandoned.
  This patch does two things:
  1) minor cleanup of configure(.ac) to explicitly use python2
  2) change all the shebang lines to #!/usr/bin/python2 and add them where they were missing, based on warnings emitted during rpmbuild.
  In a follow-up patch, python2 will eventually be changed to python3. Before that, python2-isms (e.g. print, string.join(), etc.) need to be converted to python3. Some of those can be rewritten in version-agnostic python, e.g. print statements become print() with "from __future__ import print_function". The python 2to3 utility will be used for some of those. Also, Aravinda has given guidance on changes in the comments on the first patch.
  updates: #411
  Change-Id: I471730962b2526022115a1fc33629fb078b74338
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* cluster/dht: Wind open to all subvols | N Balachandran | 2018-04-11 | 1 | -10/+5
  dht_opendir should wind the open to all subvols whether or not local->subvols is set. This is because dht_readdirp winds the calls to all subvols.
  Change-Id: I67a96b06dad14a08967c3721301e88555aa01017
  updates: bz#1564198
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* xlators/performance: Add pass-through option | Varsha Rao | 2018-04-11 | 9 | -10/+139
  Add a pass-through option to the performance translators. Set the option in GF_OPTION_INIT() and GF_OPTION_RECONF().
  Updates: #304
  Change-Id: If1537450147d154905831e36f7162a32866d7ad6
  Signed-off-by: Varsha Rao <varao@redhat.com>
* posix: reserve option behavior is not correct while using fallocate | Mohit Agrawal | 2018-04-11 | 2 | -0/+11
  Problem: The storage.reserve option does not work correctly when disk space is allocated through fallocate.
  Solution: posix_disk_space_check_thread_proc calls posix_disk_space_check at 5-second intervals to monitor disk space and sets a flag in the posix priv. Within that 5-second window, a user can create a big file with fallocate that reaches the posix reserve limit, and no error is shown on the terminal even though the limit has been reached. To resolve this, call posix_disk_space for every fallocate fop instead of relying on the thread that runs every 5 seconds.
  BUG: 1560411
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
  Change-Id: I39ba9390e2e6d084eedbf3bcf45cd6d708591577
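A per-fallocate check can be sketched with POSIX `statvfs`, assuming the reserve is configured as a percentage of the brick filesystem's total size. `reserve_check` is a hypothetical helper, not posix_disk_space_check: it simply rejects a request if the free space left after the allocation would fall below the reserve.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Returns 0 if allocating len bytes on the filesystem holding brick_path
 * leaves at least reserve_pct percent of its total size free, -ENOSPC if
 * it would eat into the reserve, or -errno if statvfs fails. */
static int reserve_check(const char *brick_path, uint64_t len,
                         unsigned reserve_pct)
{
    struct statvfs sv;

    if (statvfs(brick_path, &sv) != 0)
        return -errno;

    uint64_t total = (uint64_t)sv.f_blocks * sv.f_frsize;
    uint64_t avail = (uint64_t)sv.f_bavail * sv.f_frsize;
    uint64_t reserved = total / 100 * reserve_pct;

    if (len > avail || avail - len < reserved)
        return -ENOSPC;
    return 0;
}
```

Checking synchronously costs one `statvfs` per fallocate, which is the trade-off for closing the window a periodic thread leaves open.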
* storage/posix: add pgfid in readdirp if needed | Kinglong Mee | 2018-04-10 | 1 | -8/+38
  Change-Id: I6745428fd9d4e402bf2cad52cee8ab46b7fd822f
  fixes: bz#1560319
  Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* posix: check file state before continuing with fops | Susant Palai | 2018-04-10 | 5 | -16/+756
  In the context of Cloudsync: a data-modification fop (e.g. a write) landing in POSIX on the assumption that the file is local, while the file is actually remote, can be dangerous. Of course, we don't want to take an inodelk for every read/write operation to check the archival status or to coordinate with an upload or download of the file.
  To avoid the inodelk, we check the status of the file in POSIX itself before resuming the fop. This helps us avoid the races mentioned above.
  For example, if a write reaches POSIX for a file that is actually remote, POSIX can check the status of the file and learn that it is remote. It can error out with the status "remote", and the cloudsync xlator will retry the same operation once it has finished downloading the file.
  This patch includes the setxattr changes to do the post-processing of an upload, i.e. truncating the file and setting the remote xattr "trusted.glusterfs.cs.remote" to indicate that the file is REMOTE.
  Each file has no xattr if it is LOCAL, one remote xattr if it is REMOTE, and a combination of the REMOTE and DOWNLOADING xattrs while it is being downloaded. There is healing logic for these xattrs to recover from crash inconsistencies.
  Fixes: #387
  Change-Id: Ie93c2d41aa8d6a798a39bdbef9d1669f057e5fdb
  Signed-off-by: Susant Palai <spalai@redhat.com>
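The xattr-to-state mapping and the POSIX-side gate can be sketched in C, assuming the caller has already checked which xattrs exist on the backend file. The state numbering follows the cloudsync overview (1 LOCAL, 2 REMOTE, 3 DOWNLOADING); treating "downloading without remote" as invalid is an assumption, and all names here are hypothetical. EREMOTE is available on Linux but is not part of the POSIX errno list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum cs_state {
    CS_INVALID = -1,   /* assumption: inconsistent xattrs that need healing */
    CS_LOCAL = 1,
    CS_REMOTE = 2,
    CS_DOWNLOADING = 3,
};

/* none -> LOCAL, remote only -> REMOTE, remote + downloading -> DOWNLOADING */
static enum cs_state cs_state_from_xattrs(bool has_remote, bool has_downloading)
{
    if (!has_remote)
        return has_downloading ? CS_INVALID : CS_LOCAL;
    return has_downloading ? CS_DOWNLOADING : CS_REMOTE;
}

/* Gate run in POSIX before a data-modifying fop: only LOCAL files proceed;
 * anything else is unwound with EREMOTE so the client can download first. */
static int cs_check_before_fop(enum cs_state st)
{
    return st == CS_LOCAL ? 0 : -EREMOTE;
}
```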
* cluster/dht: act as passthrough for renames on single child DHT | Raghavendra G | 2018-04-10 | 1 | -7/+15
  The various synchronization present in dht_rename while handling directories and files is necessary only if we have more than one child.
  Change-Id: Ie21ad419125504ca2f391b1ae2e5c1d166fee247
  fixes: bz#1563511
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
* experimental/cloudsync: Download xlator for archival feature | Susant Palai | 2018-04-10 | 15 | -2/+2414
  spec-files: https://review.gluster.org/#/c/18854/
  Overview:
  * Cloudsync maintains three file states in its inode-ctx, i.e. 1 - LOCAL, 2 - REMOTE, 3 - DOWNLOADING.
  * A data-modifying fop is allowed only if the state is LOCAL. If the state is REMOTE or DOWNLOADING, the client will download the file, or wait for a download initiated by another client to finish.
  * Multiple downloads and uploads from different clients are synchronized by inodelk.
  * In POSIX, a state check is done (part of a different commit) before allowing the fop to continue. If the state is remote/downloading, the fop is unwound with EREMOTE. The client will then download the file and continue with the fop again.
  * Basic algo for a fop (let's say a write fop):
    - If LOCAL -> resume fop
    - If REMOTE ->
      - INODELK
      - STAT (this gets the state and heals it if needed)
      - DOWNLOAD
      - resume fop
  Note:
  * Developers will need to write plugins for download, based on the remote store they choose. In phase 1, support will be added for one remote store per volume. In future, more options for multiple remote stores will be explored.
  TODOs:
  - Implement stat/lookup/readdirp to return size info from xattr
  - Make plugins configurable
  - Implement unlink fop
  - Add metrics collection
  - Add sharding support
  Design Contributions:
  Aravinda V K <avishwan@redhat.com>
  Amar Tumballi <amarts@redhat.com>
  Ram Ankireddypalle <areddy@commvault.com>
  Susant Palai <spalai@redhat.com>
  updates: #387
  Change-Id: Iddf711ee7ab4e946ae3e472ff62791a7b85e6d4b
  Signed-off-by: Susant Palai <spalai@redhat.com>
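The EREMOTE retry path from the overview (POSIX refuses the fop, the client downloads, then resumes) can be modelled in C with a toy file structure. Locking and the stat-based state healing are reduced to comments, and `struct cs_file`, `posix_write_gate`, `cs_download_stub` and `cs_write` are hypothetical names rather than the cloudsync xlator's API.

```c
#include <assert.h>
#include <errno.h>

enum cs_state { CS_LOCAL = 1, CS_REMOTE = 2, CS_DOWNLOADING = 3 };

/* Toy model of one file: its cloudsync state plus counters so the flow
 * can be observed. */
struct cs_file {
    enum cs_state state;
    int downloads;   /* times the simulated download ran */
    int writes;      /* writes that actually reached the data */
};

/* Server-side gate: POSIX refuses to modify a file that is not LOCAL. */
static int posix_write_gate(struct cs_file *f)
{
    if (f->state != CS_LOCAL)
        return -EREMOTE;
    f->writes++;
    return 0;
}

/* In the described design this runs under an inodelk: INODELK, STAT
 * (refresh and heal the state), DOWNLOAD, unlock. Here it only re-checks
 * the state and marks the file LOCAL after a simulated download. */
static void cs_download_stub(struct cs_file *f)
{
    if (f->state == CS_LOCAL)
        return;          /* another client already finished the download */
    f->downloads++;
    f->state = CS_LOCAL;
}

/* Client-side write: try the fop; on EREMOTE, download and resume it. */
static int cs_write(struct cs_file *f)
{
    int ret = posix_write_gate(f);
    if (ret == -EREMOTE) {
        cs_download_stub(f);
        ret = posix_write_gate(f);
    }
    return ret;
}
```

Re-checking the state after taking the lock is what lets a second client skip a download another client already completed.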
* quota: allow writes when with EINVAL on pgfid isnot exist | Kinglong Mee | 2018-04-09 | 1 | -0/+21
  The NFS client gets "Invalid argument" when writing a file through nfs-ganesha.
  1. With quota disabled, the NFS client mounts the nfs-ganesha share and does 'll' in the testing directory.
  2. Enable quota. getfattr output for testfile92:
     getfattr: Removing leading '/' from absolute path names
     trusted.gfid=0xe2edaac0eca8420ebbbcba7e56bbd240
     trusted.gfid2path.b3250af8fa558e66=0x39663134343566662d653530332d343831352d396635312d3236633565366332633137642f7465737466696c653932
     trusted.glusterfs.quota.9f1445ff-e503-4815-9f51-26c5e6c2c17d.contri.3=0x00000000000002000000000000000001
     Note: testfile92 has no trusted.pgfid xattr.
  3. Restart the glusterfs volume with "gluster volume stop/start gvtest".
  4. echo somedata > testfile92
  5. ll testfile92
     -rw-r--r-- 1 root root 0 Mar 6 21:43 testfile92
  BUG: 1560319
  Change-Id: Iaa4dd1e891c99069fb85b7b11bb0482cbf2303b1
  fixes: bz#1560319
  Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* features/index: Choose different base file on EMLINK error | Pranith Kumar K | 2018-04-06 | 1 | -18/+9
  Change-Id: I4648816af908539efdc2528608aa2ebf7f0d0e2f
  fixes: bz#1559004
  BUG: 1559004
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/ec: Turn ON the stripe-cache option by default | Ashish Pandey | 2018-04-06 | 1 | -1/+1
  Change-Id: I0a290396c30c635b13ee73004d20259efb76a954
  fixes: bz#1563945
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>