summaryrefslogtreecommitdiffstats
path: root/xlators/mgmt/glusterd/src/glusterd-volume-set.c
Commit message (Collapse)AuthorAgeFilesLines
* glusterd: remove experimental xlator options from glusterd-volume-set.cSanju Rakonde2019-02-271-20/+0
| | | | | | | | | | | experimental xlators have been removed from the codebase. But we missed to remove the options related to experimental xlators from the codebase. This patch removes those options. fixes: bz#1683506 Change-Id: I3fa7e14c6cd8ebde5cebc8d2b0cb2409bf37c1ae Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit 5cddd4d758014fe116d9c130632eada2ecded88c)
* performance/md-cache: introduce an option to control invalidation of inodesRaghavendra Gowdappa2019-02-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Explicit invalidation by calling inode_invalidate is necessary when same (meta)data is shared/access across multiple mounts. Without an explicit inode_invalidate call, caches in the mount which didn't witness writes wouldn't be aware of changes as writes wouldn't have passed through them. However, if (meta)data is not shared, all relevant I/O goes through the cache of single mount and hence is coherent with (meta)data on bricks always. So, explicit inode invalidation can be disabled for this case which gives a huge performance boost for workloads that write data and then immediately read the data they just wrote. Note that otherwise, local writes (which pass through the cache) will change ctime and cause unnecessary invalidations. The name of the option that controls this behavior is "performance.global-cache-invalidation". This option is global and it purges caches both in glusterfs and kernel stack for native FUSE mounts. For non-native FUSE mounts, it purges cache only from glusterfs stack. This option is effective only when performance.stat-prefetch is on. Note that there is a similar option "performance.cache-invalidation", but the scope of that option is limited to quick-read and md-cache. Change-Id: I462bb4b65ff9aae1f6ba76f50b1f2f94fb10323b Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> updates: bz#1674364 (cherry picked from commit 2b5aa4489de2017a03bcb6ec8986286f0c76a670)
* features/sdfs: disable by defaultAmar Tumballi2019-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the feature enabled, some of the performance testing results, specially those which create millions of small files, got approximately 4x regression compared to version before enabling this. On master without this patch: 765 creates/sec On master with this patch : 3380 creates/sec Also there seems to be regression caused by this in 'ls -l' workload. On master without this patch: 3030 files/sec On master with this patch : 16610 files/sec This is a feature added to handle multiple clients parallely operating (specially those which race for file creates with same name) on a single namespace/directory. Considering that is < 3% of Gluster's usecase right now, it makes sense to disable the feature by default, so we don't penalize the default users who doesn't bother about this usecase. Also note that the client side translators, specially, distribute, replicate and disperse already handle the issue upto 99.5% of the cases without SDFS, so it makes sense to keep the feature disabled by default. Credits: Shyamsunder <srangana@redhat.com> for running the tests and getting the numbers. Change-Id: Iec49ce1d82e621e9db25eb633fcb1d932e74f4fc Updates: bz#1670031 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* locks/fencing: Add a security knob for fencingSusant Palai2019-01-221-0/+8
| | | | | | | | | | | | | There is a low level security issue with fencing since one client can preempt another client's lock. This patch does not completely eliminate the issue of a client misbehaving, but certainly it adds a security layer for default use cases that does not need fencing. Change-Id: I55cd15f2ed1ae0f2556e3d27a2ef4bc10fdada1c updates: #466 Signed-off-by: Susant Palai <spalai@redhat.com>
* performance/ob: make open-behind as a child of quick-readRaghavendra Gowdappa2018-12-181-9/+7
| | | | | | | | | | | | | | | | | With read-after-open being set to yes by default, if open-behind sees any reads, it'll do an open on backend (and hence flush/release later). This means with the current order of quick-read and open-behind, open-behind sees all reads and hence also does open bringing down performance for small file reads. Since for small files, reads are absorbed by quick-read, if quick-read is made a parent of open-behind, ob doesn't witness any reads. For read-only workloads, this means ob doen't do any opens (even with read-after-open yes and use-anonymous-fd no). Change-Id: I138a42b006d104cff43ee6f07829e39c36f6f234 Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com> Fixes: bz#1659327
* selinux/glusterd : add "features.selinux" to glusterd-volume-set.cJiffin Tony Thottan2018-12-171-0/+9
| | | | | | Fixes: bz#1659868 Change-Id: I38675ba4d47c8ba7f94cfb4734692683ddb3dcfd Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* glusterd: fix get_mux_limit_per_process to read default valueAtin Mukherjee2018-12-071-1/+1
| | | | | | | | | | | get_mux_limit_per_process () reads the global option dictionary and in case it doesn't find out a key, assumes that cluster.max-bricks-per-process option isn't configured however the default value should be picked up in such case. Change-Id: I35dd8da084adbf59793d58557e818d8e6c17f9f3 Fixes: bz#1656951 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* protocol/server: support server.all-squashXie Changlong2018-12-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | We still use gnfs on our side, so do a little work to support server.all-squash. Just like server.root-squash, it's also a volume wide option. Also see bz#1285126 $ gluster volume set <VOLNAME> server.all-squash on Note: If you enable server.root-squash and server.all-squash at the same time, only server.all-squash works. Please refer to following table +---------------+-----------------+---------------------------+ | |all_squash | no_all_squash | +-------------------------------------------------------------+ | | |anonuid/anongid for root | |root_squash |anonuid/anongid |useruid/usergid for no-root| +-------------------------------------------------------------+ |no_root_squash |anonuid/anongid |useruid/usergid | +-------------------------------------------------------------+ Updates bz#1285126 Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com> Signed-off-by: Xue Chuanyu <xuechuanyu@cmss.chinamobile.com> Change-Id: Iea043318fe6e9a75fa92b396737985062a26b47e
* glusterd: make max-bricks-per-process default value to 250Atin Mukherjee2018-11-251-1/+1
| | | | | | Change-Id: Ia2c6a10e2b76a4aa8bd4ea97e5ce33bdc813942e Fixes: bz#1652118 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* core: create a constant for default network timeoutXavi Hernandez2018-11-231-2/+0
| | | | | | | | | | A new constant named GF_NETWORK_TIMEOUT has been defined and all references to the hard-coded timeout of 42 seconds have been replaced with this constant. Change-Id: Id30f5ce4f1230f9288d9e300538624bcf1a6da27 fixes: bz#1652852 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* core: Retrieving the value of "client.ssl" option, before SSL is set up, failsSheetal Pamecha2018-11-201-0/+2
| | | | | | | | Added a default value "off" for (client|server).ssl fixes: bz#1651059 Change-Id: I3d9c80093ac471d9d770fbd6c67f945491cf726e Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
* ctime: Enable ctime feature by defaultKotresh HR2018-11-111-3/+3
| | | | | | | | | | | | | | | | | | | | This patch does following. 1. Enable ctime feature by default. 2. Earlier, to enable the ctime feature, two options needed to be enabled a. gluster vol set <volname> utime on b. gluster vol set <volname> ctime on This is inconvenient from the usability point of view. Hence changed it to following single option a. gluster vol set <volname> ctime on fixes: bz#1624724 Change-Id: I04af0e5de1ea6126c58a06ba8a26e22f9f06344e Signed-off-by: Kotresh HR <khiremat@redhat.com>
* tiering: remove the translator from build and glusterdAmar Tumballi2018-11-021-723/+0
| | | | | | | | | | | | | Based on the proposal to remove few features as they are not actively maintained [1], removing tier translator from the build. Also make sure there are no regression tests involving tiering feature are present. [1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html Change-Id: I2c177f711f9b54b7b24e1a13525ff3132bd9a9c5 updates: bz#1642807 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* stripe: remove the translator from build and glusterdAmar Tumballi2018-10-311-43/+0
| | | | | | | | | | | | | | | | Based on the proposal to remove few features as they are not actively maintained [1], removing stripe translator from the build. Also make sure there are no regression tests involving stripe translator. [1] https://lists.gluster.org/pipermail/gluster-users/2018-July/034400.html Note that this patch aims at removing the translator from build, and a followup patch is needed to remove the code from repository. Updates: bz#1364707 Change-Id: I235b305338f138e29e9f30cba65bc0dadbebbbd5 Signed-off-by: Amar Tumballi <amarts@redhat.com>
* core: auth.ssl-allow has no option descriptionHarpreet Kaur Lalwani2018-10-301-1/+5
| | | | | | | | Added a description for auth.ssl-allow Change-Id: I50cd7c738007c3d7a1b333dae62dbb5e46a7ee67 fixes: bz#1643349 Signed-off-by: Harpreet Kaur Lalwani <hlalwani@redhat.com>
* ctime: Provide noatime optionKotresh HR2018-09-251-0/+8
| | | | | | | | | | | | | | | | | Most of the applications are {c|m}time dependant and very few are atime dependant. So provide noatime option to not update atime when ctime feature is enabled. Also this option has to be enabled with ctime feature to avoid unnecessary self heal. Since AFR/EC reads data from single subvolume, atime is only updated in one subvolume triggering self heal. updates: bz#1593538 Change-Id: I085fb33c882296545345f5df194cde7b6cbc337e Signed-off-by: Kotresh HR <khiremat@redhat.com>
* glusterd: Update op-version from 4.2 to 5.0ShyamsundarR2018-09-131-9/+9
| | | | | | | | | | | | Post changing the max op-version to 4.2, after release 4.1 branching, the decision was to go with increasing release numbers. Thus this needs to change to 5.0. This commit addresses the above change. Fixes: bz#1628664 Change-Id: Ifcc0c6da90fdd51e4eceea40749511110a432cce Signed-off-by: ShyamsundarR <srangana@redhat.com>
* Land part 2 of clang-format changesGluster Ant2018-09-121-3583/+3403
| | | | | Change-Id: Ia84cc24c8924e6d22d02ac15f611c10e26db99b4 Signed-off-by: Nigel Babu <nigelb@redhat.com>
* features/shard: Make lru limit of inode list configurableKrutika Dhananjay2018-07-231-0/+6
| | | | | | | | | | | | | | | Currently this lru limit is hard-coded to 16384. This patch makes it configurable to make it easier to hit the lru limit and enable testing of different cases that arise when the limit is reached. The option is features.shard-lru-limit. It is by design allowed to be configured only in init() but not in reconfigure(). This is to avoid all the complexity associated with eviction of least recently used shards when the list is shrunk. Change-Id: Ifdcc2099f634314fafe8444e2d676e192e89e295 updates: bz#1605056 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* All: run codespell on the code and fix issues.Yaniv Kaul2018-07-221-4/+4
| | | | | | | | | | | | Please review, it's not always just the comments that were fixed. I've had to revert of course all calls to creat() that were changed to create() ... Only compile-tested! Change-Id: I7d02e82d9766e272a7fd9cc68e51901d69e5aab5 updates: bz#1193929 Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
* glusterd: Introduce daemon-log-level cluster wide optionAtin Mukherjee2018-07-031-0/+6
| | | | | | | | | | | | This option, applicable to the node level daemons can be very helpful in controlling the log level of these services. Please note any daemon which is started prior to setting the specific value of this option (if not INFO) will need to go through a restart to have this change into effect. Change-Id: I7f6d2620bab2b094c737f5cc816bc093e9c9c4c9 fixes: bz#1597473 Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* features/shard: Perform shards deletion in the backgroundKrutika Dhananjay2018-06-201-0/+5
| | | | | | | | | | | | | | | A synctask is created that would scan the indices from .shard/.remove_me, to delete the shards associated with the gfid corresponding to the index bname and the rate of deletion is controlled by the option features.shard-deletion-rate whose default value is 100. The task is launched on two accounts: 1. when shard receives its first-ever lookup on the volume 2. when a rename or unlink deleted an inode Change-Id: Ia83117230c9dd7d0d9cae05235644f8475e97bc3 updates: bz#1568521 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
* performance/quick-read: provide an invalidation based on ctimeRaghavendra G2018-06-181-0/+6
| | | | | | | | | | | | | | | | | | | | Quick-read by default uses mtime to identify changes to file data. However there are applications like rsync which explicitly set mtime making it unreliable for the purpose of identifying change in file content. Since ctime also changes when content of a file changes and it cannot be set explicitly, it becomes suitable for identifying staleness of cached data. This option makes quick-read to prefer ctime over mtime to validate its cache. However, using ctime can result in false positives as ctime changes with just attribute changes like permission without changes to file data. So, use this option only when mtime is not reliable. credits to Kotresh Hiremath Ravishankar <khiremat@redhat.com> for suggestion on using ctime instead of mtime. Change-Id: Ib3ae39a3252b2876c8ffe81f471d02a87190e9b9 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Updates: bz#1591621
* cloudsync: Adding s3 plugin for cloudsyncSusant Palai2018-05-301-1/+21
| | | | | | | | | | | | | | | | | | This is a plugin which provides an interface to retrive files from amazon-s3 which are archived in to s3. Users need to give the above information for cloudsync to retrieve the file from s3. TODO: 1- A separate commit in to developer-guide will detail about the usage of this plugin in more detail. 2- Need to create target file in aws-bucket with "gfid" names. Helps avoiding name collisions. Change-Id: I2e4a586f4e3f86164de9178e37673a07f317e7d9 Updates: #387 Signed-off-by: Susant Palai <spalai@redhat.com>
* sdfs: enable by defaultAmar Tumballi2018-05-241-2/+1
| | | | | | | | also provide an option for pass-through to enable/disable xlator fixes: #421 Change-Id: Ie30a91ad09620db62ab07b797e23123fd1200d1f Signed-off-by: Amar Tumballi <amarts@redhat.com>
* features/cloudsync: Make plugins configurableSusant Palai2018-05-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch brings the configuration option for plugins. For new plugins, an entry has to be created in to cs_plugin structure e.g. struct cs_plugin plugins[] = { { .name = "amazons3", .library = "libamazons3.so", .description = "amazon s3 store." }, {.name = NULL}, }; Library field describes the name of the shared library for the plugin. To configure plugin type "feature.cloudsync-storetype" option need to be set to the remote-store type. e.g. gluster volume set VOLNAME cloudsync-storetype amazons3. This should be same as the ".name" field in cs_plugin structure. cs_init will pick this up in run time to load the plugin. Change-Id: I2cec10b206f71ac4e71d472631a3a5badf278b59 fixes: bz#1576842 Signed-off-by: Susant Palai <spalai@redhat.com>
* glusterd/ctime: Provide option to enable/disable ctime featureKotresh HR2018-05-061-0/+5
| | | | | | Updates: #208 Change-Id: If6f52b9b1b5b823ad64faeed662e96ceb848c54c Signed-off-by: Kotresh HR <khiremat@redhat.com>
* glusterd/utime: glusterd utime changesKotresh HR2018-05-061-0/+9
| | | | | | | | | Load utime xlator in the client side just after (below) performance xlators. Updates: #208 Change-Id: Ie15f156943fa8e7dac7050e5479c906da747b568 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* glusterd: update listen-backlog value to 1024Milind Changire2018-04-181-1/+1
| | | | | | | | | | | | Update default value of listen-backlog to 1024 to reflect the changes in socket.c This keeps the actual implementation in socket.c and the help text in glusterd-volume-set.c consistent Change-Id: If04c9e0bb5afb55edcc7ca57bbc10922b85b7075 fixes: bz#1564600 Signed-off-by: Milind Changire <mchangir@redhat.com>
* xlators/performance: Add pass-through optionVarsha Rao2018-04-111-1/+36
| | | | | | | | | | Add pass-through option in performance traslators. Set the option in GF_OPTION_INIT() and GF_OPTION_RECONF() Updates: #304 Change-Id: If1537450147d154905831e36f7162a32866d7ad6 Signed-off-by: Varsha Rao <varao@redhat.com>
* experimental/cloudsync: Download xlator for archival featureSusant Palai2018-04-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | spec-files: https://review.gluster.org/#/c/18854/ Overview: * Cloudsync maintains three file states in it's inode-ctx i.e 1 - LOCAL, 2 - REMOTE, 3 - DOWNLOADING. * A data modifying fop is allowed only if the state is LOCAL. If the state is REMOTE or DOWNLOADING, client will download or wait for the download to finish initiated by other client. * Multiple download and upload from different clients are synchronized by inodelk. * In POSIX a state check is done (part of different commit)before allowing the fop to continue. If the state is remote/downloading the fop is unwound with EREMOTE. The client will then download the file and continue with the fop again. * Basic Algo for fop (let's say write fop): - If LOCAL -> resume fop - If REMOTE -> - INODELK - STAT (this gets state and heal the state if needed) - DOWNLOAD - resume fop Note: * Developers will need to write plugins for download, based on the remote store they choose. In phase-1, support will be added for one remote store per volume. In future, more options for multiple remote stores will be explored. TODOs: - Implement stat/lookup/readdirp to return size info from xattr - Make plugins configurable - Implement unlink fop - Add metrics collection - Add sharding support Design Contributions: Aravinda V K <avishwan@redhat.com> Amar Tumballi <amarts@redhat.com> Ram Ankireddypalle <areddy@commvault.com> Susant Palai <spalai@redhat.com> updates: #387 Change-Id: Iddf711ee7ab4e946ae3e472ff62791a7b85e6d4b Signed-off-by: Susant Palai <spalai@redhat.com>
* perfomance/io-threads: Add option to disable client disconnect featureVarsha Rao2018-02-281-0/+5
| | | | | | | | | | | | | > Add options to disable new features > Commit ID: c071992e8d > https://review.gluster.org/#/c/18291/ > By Michael Goulet <mgoulet@fb.com> This patch is required to forward port io-threads namespace patch. Updates: #401 Change-Id: Ice477fdf4b8934f9fac0b4a2f6c93db97429a586 Signed-off-by: Varsha Rao <varao@redhat.com>
* write-behind: Make aggregate size configurablePoornima G2018-02-261-0/+6
| | | | | | | | | | | | | | | | | | | | | | | Currently the aggregate size is by default 128K (page size). From performance perspective small number of large writes is faster than large number of small writes, especially in EC volumes. But identifying the right aggregate size depends on multiple factors like the memcpy overhead, network overhead etc. On local machine, combining 128k writes to 1M writes for EC volumes yielded 30% improvement. As a part of this patch, aggregate size is just made configurable and page_size is modified accordingly. Raghavendra Gowdappa had suggested that, while aggregating writes we should get rid of memcpy of large write size, and instead add the pointer to existinf vector, will be doing it as a part of another patch. Also, in EC volumes, the vectors are merged into one vector, so even if we save memcopy in write_behind, EC would anyways do memcopy for merging vectors into one vector. Updates: #364 Change-Id: Ib67294b8577bea14dde1c84cd271012ecea99f09 Signed-off-by: Poornima G <pgurusid@redhat.com>
* performance/io-threads: Add threads to priority based stagnant queuesVarsha Rao2018-02-221-0/+5
| | | | | | | | | | | | | > performance/io-threads: Add watchdog to cover up a possible thread leak > Commit ID: 8b6804f75c > https://review.gluster.org/#/c/18239/ > By Shreyas Siravara <sshreyas@fb.com> This patch is required to forward port io-threads namespace patch. Updates: #401 Change-Id: Id057c34a2abb9fc6dfb4afcd5c7bbbfe5693bbb8 Signed-off-by: Varsha Rao <varao@redhat.com>
* xlators/features/namespace: Add namespace xlator and link into brick graphVarsha Rao2018-02-211-0/+11
| | | | | | | | | | | | | | | | | | | | | The following release-3.8-fb branch patch is upstreamed: > features/namespace: Add namespace xlator and link into brick graph > Commit ID: dbd30776f26e > https://review.gluster.org/#/c/18041/ > By Michael Goulet <mgoulet@fb.com> Changes in this patch: Removes extra config.h and namespace.h file in namespace.c Adds default_getspec_cbk to libglusterfs.sym Rename dict_for_each to dict_foreach_inline Remove fd.h header file stack.h Add test case for truncate, open and symlink This patch is required to forward port io-threads namespace patch. Updates: #401 Change-Id: Ib88c95b89eecee9b8957df8a4c8712c899c761d1 Signed-off-by: Varsha Rao <varao@redhat.com>
* posix/afr: handle backward compatibility for rchecksum fopRavishankar N2018-02-191-0/+7
| | | | | | | | | Added a volume option 'fips-mode-rchecksum' tied to op version 4. If not set, rchecksum fop will use MD5 instead of SHA256. updates: #230 Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2 Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* cluster/dht: avoid overwriting client writes during migrationSusant Palai2018-02-021-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | For more details on this issue see https://github.com/gluster/glusterfs/issues/308 Solution: This is a restrictive solution where a file will not be migrated if a client writes to it during the migration. This does not check if the writes from the rebalance and the client actually do overlap. If dht_writev_cbk finds that the file is being migrated (PHASE1) it will set an xattr on the destination file indicating the file was updated by a non-rebalance client. Rebalance checks if any other client has written to the dst file and aborts the file migration if it finds the xattr. updates gluster/glusterfs#308 Change-Id: I73aec28bc9dbb8da57c7425ec88c6b6af0fbc9dd Signed-off-by: Susant Palai <spalai@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com>
* quiesce, gfproxy: Implement failover across multiple gfproxy nodesPoornima G2018-01-301-0/+21
| | | | | | Updates: #242 Change-Id: I767e574a26e922760a7130bd209c178d74e8cf69 Signed-off-by: Poornima G <pgurusid@redhat.com>
* core: add some examples of site.h usageJeff Darcy2018-01-301-1/+1
| | | | | Change-Id: I6ce574a593eda8f3a6b2fc8969b5edf7c250b61c Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* protocol: Remove lock recovery logic from client and serverAnoop C S2018-01-291-24/+0
| | | | | | Change-Id: I27f5e1e34fe3eac96c7dd88e90753fb5d3d14550 BUG: 1272030 Signed-off-by: Anoop C S <anoopcs@redhat.com>
* dentry fop serializer: added new server side xlator for dentry fop serializationSakshi Bansal2018-01-241-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problems addressed by this xlator : [1]. To prevent race between parallel mkdir,mkdir and lookup etc. Fops like mkdir/create, lookup, rename, unlink, link that happen on a particular dentry must be serialized to ensure atomicity. Another possible case can be a fresh lookup to find existance of a path whose gfid is not set yet. Further, storage/posix employs a ctime based heuristic 'is_fresh_file' (interval time is less than 1 second of current time) to check fresh-ness of file. With serialization of these two fops (lookup & mkdir), we eliminate the race altogether. [2]. Staleness of dentries This causes exponential increase in traversal time for any inode in the subtree of the directory pointed by stale dentry. Cause : Stale dentry is created because of following two operations: a. dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod, symlink, create and b. dentry unlinking due to various operations like rmdir, rename, unlink. The reason is __inode_link uses __is_dentry_cyclic, which explores all possible path to avoid cyclic link formation during inode linkage. __is_dentry_cyclic explores stale-dentry(ies) and its all ancestors which is increases traversing time exponentially. Implementation : To acheive this all fops on dentry must take entry locks before they proceed, once they have acquired locks, they perform the fop and then release the lock. Some documentation from email conversation: [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html With this patch, the feature is optional, enable it by running: `gluster volume set $volname features.sdfs enable` Also the feature is tested for a month without issues in the experiemental branch for all the regression. Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b Fixes: #397 BUG: 1304962 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* md-cache: Implement dynamic configuration of xattr list for cachingPoornima G2018-01-221-1/+8
| | | | | | | | | | | | | | | Currently, the list of xattrs that md-cache can cache is hard coded in the md-cache.c file, this necessiates code change and rebuild everytime a new xattr needs to be added to md-cache xattr cache list. With this patch, the user will be able to configure a comma seperated list of xattrs to be cached by md-cache Updates #297 Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e Signed-off-by: Poornima G <pgurusid@redhat.com>
* cluster/afr: Adding option to take full file lockkarthik-us2018-01-191-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In replica 3 volumes there is a possibilities of ending up in split brain scenario, when multiple clients writing data on the same file at non overlapping regions in parallel. Scenario: - Initially all the copies are good and all the clients gets the value of data readables as all good. - Client C0 performs write W1 which fails on brick B0 and succeeds on other two bricks. - C1 performs write W2 which fails on B1 and succeeds on other two bricks. - C2 performs write W3 which fails on B2 and succeeds on other two bricks. - All the 3 writes above happen in parallel and fall on different ranges so afr takes granular locks and all the writes are performed in parallel. Since each client had data-readables as good, it does not see file going into split-brain in the in_flight_split_brain check, hence performs the post-op marking the pending xattrs. Now all the bricks are being blamed by each other, ending up in split-brain. Fix: Have an option to take either full lock or range lock on files while doing data transactions, to prevent the possibility of ending up in split brains. With this change, by default the files will take full lock while doing IO. If you want to make use of the old range lock change the value of "cluster.full-lock" to "no". Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962 BUG: 1535438 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* locks: added inodelk/entrylk contention upcall notificationsXavier Hernandez2018-01-161-0/+20
| | | | | | | | | | | | | | The locks xlator now is able to send a contention notification to the current owner of the lock. This is only a notification that can be used to improve performance of some client side operations that might benefit from extended duration of lock ownership. Nothing is done if the lock owner decides to ignore the message and to not release the lock. For forced release of acquired resources, leases must be used. Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
* glusterd: fix up volume option flagsCsaba Henk2018-01-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | In glusterd volfile generation code options should be ornamented with the VOLOPT_FLAG_* flags. However, some are ornamented with OPT_FLAG_* flags (which are to be used in xlator context). The impact is: the OPT_FLAG_* that occurs is OPT_FLAG_CLIENT_OPT, which has the same value as VOLOPT_FLAG_XLATOR_OPT, so what was meant is "option affects clients" and what was there means "option enables/disables xlators". Because of this semantic shift, op version might be incorrectly calculated for volumes and clients. (At this point it's a theoretical possibility. Actual occurrence might depend on connecting client & server versions; it's also possible that there exists a proof of concept scenario but it's irrealistic.) This commit eliminates the OPT_FLAG_* occurrences from glusterd code, and replaces them with the appropriate VOLOPT_FLAG_* flags. Change-Id: Ia4e6fbac738d5a8d889c0f5561c4dea6783250b1 Signed-off-by: Csaba Henk <csaba@redhat.com>
* mgmt/glusterd: Adding validation for setting quorum-countkarthik-us2017-12-291-5/+40
| | | | | | | | | | | In a replicated volume it was allowing to set the quorum-count value between the range [1 - 2147483647]. This patch adds validation for allowing only maximum of replica_count number of quorum-count value to be set on a volume. Change-Id: I13952f3c6cf498c9f2b91161503fc0fba9d94898 BUG: 1529515 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* write-behind: Allow trickling-writes to be configurableCsaba Henk2017-12-131-0/+12
| | | | | | | | | | | | | | This is the undisputed/trivial part of Shreyas' patch he attached to https://bugzilla.redhat.com/1364740 (of which the current bug is a clone). We need more evaluation for the page_size and window_size bits before taking them on. Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9 BUG: 1428060 Co-authored-by: Shreyas Siravara <sshreyas@fb.com> Signed-off-by: Csaba Henk <csaba@redhat.com>
* quick-read: Integrate quick read with upcall and increase cache timePoornima G2017-12-131-0/+12
| | | | | | | Fixes : #261 Co-author: Subha sree Mohankumar <smohanku@redhat.com> Change-Id: Ie9dd94e86459123663b9b200d92940625ef68eab Signed-off-by: Poornima G <pgurusid@redhat.com>
* md-cache: Cache statfs callsShreyas Siravara2017-12-121-0/+6
| | | | | | | | | | | | | | | Summary: - This gives md-cache to cache statfs calls - You can turn it on or off via 'gluster vol set groot performance.md-cache-statfs <on|off>' Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4 BUG: 1523295 Signature: t1:4652632:1488581841:111cc01efe83c71f1e98d075abb10589c4574705 Reviewed-on: https://review.gluster.org/18228 Reviewed-by: Shreyas Siravara <sshreyas@fb.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
* nfs: Enable multi-core epoll support in gNFSdShreyas Siravara2017-12-081-0/+6
| | | | | | Change-Id: Ie8a7b1ba04b0e83f5ec7a09f9d181fe59be479ca BUG: 1522847 Signed-off-by: Shreyas Siravara <sshreyas@fb.com>