path: root/xlators/mgmt/glusterd/src/glusterd-volume-set.c
Commit message (Author, Age, Files, Lines)
* cluster/dht: avoid overwriting client writes during migration (Susant Palai, 2018-02-06, 1 file, -0/+8)
  For more details on this issue, see https://github.com/gluster/glusterfs/issues/308
  Solution: This is a restrictive solution: a file will not be migrated if a client writes to it during the migration. It does not check whether the writes from the rebalance process and the client actually overlap. If dht_writev_cbk finds that the file is being migrated (PHASE1), it sets an xattr on the destination file indicating that the file was updated by a non-rebalance client. Rebalance checks whether any other client has written to the destination file and aborts the file migration if it finds the xattr.
  updates gluster/glusterfs#308
  Change-Id: I73aec28bc9dbb8da57c7425ec88c6b6af0fbc9dd
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  (cherry picked from commit 545a7ce6762a1b3a7b989b43a9d18b5b1b299df0)
* core: remove experimental xlators and associated tests (Kaleb S. KEITHLEY, 2018-01-31, 1 file, -22/+0)
  experimental xlators removed from 4.0
  > Cherry picked from 4231c40973c60999f5ef759db450d25e129ef6ba:
  > Reviewed-on: https://review.gluster.org/17953
  > Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
  Signed-off-by: ShyamsundarR <srangana@redhat.com>
  Change-Id: I34419ce22ca09b7626b8f9382c377a614fd9fed8
  BUG: 1539842
* protocol: Remove lock recovery logic from client and server (Anoop C S, 2018-01-31, 1 file, -24/+0)
  Change-Id: I27f5e1e34fe3eac96c7dd88e90753fb5d3d14550
  BUG: 1540438
  Signed-off-by: Anoop C S <anoopcs@redhat.com>
  (cherry picked from commit 3e78ea991b213422fc423ff94994e1eb295569c7)
* dentry fop serializer: added new server side xlator for dentry fop serialization (Sakshi Bansal, 2018-01-30, 1 file, -0/+10)
  Problems addressed by this xlator:
  [1] Races between parallel mkdir/mkdir, mkdir/lookup, etc. Fops such as mkdir/create, lookup, rename, unlink and link that happen on a particular dentry must be serialized to ensure atomicity. Another possible case is a fresh lookup to find the existence of a path whose gfid is not set yet. Further, storage/posix employs a ctime-based heuristic, 'is_fresh_file' (the file's ctime is less than 1 second before the current time), to check the freshness of a file. Serializing these two fops (lookup and mkdir) eliminates the race altogether.
  [2] Staleness of dentries. This causes an exponential increase in traversal time for any inode in the subtree of the directory pointed to by the stale dentry.
  Cause: A stale dentry is created by the following two operations:
  a. dentry creation due to inode_link, done during operations like lookup, mkdir, create, mknod and symlink, and
  b. dentry unlinking due to various operations like rmdir, rename and unlink.
  The reason is that __inode_link uses __is_dentry_cyclic, which explores all possible paths to avoid forming a cyclic link during inode linkage. __is_dentry_cyclic explores the stale dentry (or dentries) and all of its ancestors, which increases traversal time exponentially.
  Implementation: To achieve this, all fops on a dentry must take entry locks before they proceed; once they have acquired the locks, they perform the fop and then release the lock.
  Some documentation from the email conversation:
  [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html
  [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html
  With this patch the feature is optional; enable it by running:
  `gluster volume set $volname features.sdfs enable`
  The feature has also been tested for a month without issues in the experimental branch, across all the regression tests.
  Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
  Fixes: #397
  Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* md-cache: Implement dynamic configuration of xattr list for caching (Poornima G, 2018-01-22, 1 file, -1/+8)
  Currently, the list of xattrs that md-cache can cache is hard-coded in md-cache.c, which necessitates a code change and rebuild every time a new xattr needs to be added to the md-cache xattr cache list. With this patch, the user can configure a comma-separated list of xattrs to be cached by md-cache.
  Updates #297
  Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e
  Signed-off-by: Poornima G <pgurusid@redhat.com>
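The commit does not name the new option or show how the list is matched. As a minimal sketch, assuming exact matching against a plain comma-separated string, a lookup could look like this (`xattr_in_cache_list` is a hypothetical helper, not md-cache's API):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` appears as an exact entry in
 * the comma-separated list `csv` (e.g. "user.foo,security.bar").
 * Whitespace around entries is not trimmed in this sketch. */
bool xattr_in_cache_list(const char *csv, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = csv;

    while (p != NULL && *p != '\0') {
        const char *comma = strchr(p, ',');
        size_t tok_len = comma ? (size_t)(comma - p) : strlen(p);

        /* Compare lengths first so "user.fo" does not match "user.foo". */
        if (tok_len == name_len && strncmp(p, name, name_len) == 0)
            return true;

        p = comma ? comma + 1 : NULL;
    }
    return false;
}
```

Whitespace trimming and wildcard patterns are not modelled; the real md-cache parser may differ on both.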
* cluster/afr: Adding option to take full file lock (karthik-us, 2018-01-19, 1 file, -0/+6)
  Problem: In replica 3 volumes there is a possibility of ending up in a split-brain scenario when multiple clients write data to the same file at non-overlapping regions in parallel.
  Scenario:
  - Initially all the copies are good, and all the clients get the value of data readables as all good.
  - Client C0 performs write W1, which fails on brick B0 and succeeds on the other two bricks.
  - C1 performs write W2, which fails on B1 and succeeds on the other two bricks.
  - C2 performs write W3, which fails on B2 and succeeds on the other two bricks.
  - All three writes happen in parallel and fall on different ranges, so AFR takes granular locks and all the writes are performed in parallel. Since each client had data-readables as good, it does not see the file going into split-brain in the in_flight_split_brain check, and hence performs the post-op marking the pending xattrs. Now every brick is blamed by the others, ending up in split-brain.
  Fix: Add an option to take either a full lock or a range lock on files while doing data transactions, to prevent the possibility of ending up in split-brain. With this change, files take a full lock by default while doing IO. To use the old range lock instead, set "cluster.full-lock" to "no".
  Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
  BUG: 1535438
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
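The scenario can be reproduced with a toy model. The 3x3 "who blames whom" matrix below is a simplification standing in for AFR's pending xattrs (not their real on-disk layout), and `afr_model`, `afr_post_op` and `afr_is_split_brain` are hypothetical names. After the three parallel writes, every brick is blamed by another, so no copy is left as a clean source:

```c
#include <stdbool.h>

#define NBRICKS 3

/* pending[i][j] != 0 means brick i holds a pending marker blaming brick j.
 * Simplified model of AFR's pending xattrs, for illustration only. */
typedef struct {
    int pending[NBRICKS][NBRICKS];
} afr_model;

/* A write failed on brick `failed` and succeeded elsewhere: in the post-op,
 * every brick where it succeeded records a marker blaming the failed one. */
void afr_post_op(afr_model *m, int failed)
{
    for (int i = 0; i < NBRICKS; i++)
        if (i != failed)
            m->pending[i][failed] = 1;
}

/* A brick that no other brick blames can still act as a source.
 * If no such brick exists, every copy is accused: split-brain. */
bool afr_is_split_brain(const afr_model *m)
{
    for (int j = 0; j < NBRICKS; j++) {
        bool blamed = false;

        for (int i = 0; i < NBRICKS; i++)
            if (i != j && m->pending[i][j])
                blamed = true;

        if (!blamed)
            return false; /* brick j is still a clean source */
    }
    return true;
}
```

The fix in this commit takes a full-file lock by default, so the three writes are no longer performed in parallel and this interleaving is avoided.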
* locks: added inodelk/entrylk contention upcall notifications (Xavier Hernandez, 2018-01-16, 1 file, -0/+20)
  The locks xlator is now able to send a contention notification to the current owner of the lock. This is only a notification that can be used to improve the performance of some client-side operations that might benefit from an extended duration of lock ownership. Nothing is done if the lock owner decides to ignore the message and not release the lock. For forced release of acquired resources, leases must be used.
  Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
* glusterd: fix up volume option flags (Csaba Henk, 2018-01-07, 1 file, -3/+3)
  In the glusterd volfile generation code, options should be ornamented with the VOLOPT_FLAG_* flags. However, some are ornamented with OPT_FLAG_* flags (which are meant for the xlator context).
  The impact: the OPT_FLAG_* flag that occurs is OPT_FLAG_CLIENT_OPT, which has the same value as VOLOPT_FLAG_XLATOR_OPT, so what was meant is "option affects clients" and what was there means "option enables/disables xlators". Because of this semantic shift, the op-version might be incorrectly calculated for volumes and clients. (At this point it is a theoretical possibility. Actual occurrence might depend on the connecting client and server versions; a proof-of-concept scenario may also exist, but it is unrealistic.)
  This commit eliminates the OPT_FLAG_* occurrences from glusterd code and replaces them with the appropriate VOLOPT_FLAG_* flags.
  Change-Id: Ia4e6fbac738d5a8d889c0f5561c4dea6783250b1
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* mgmt/glusterd: Adding validation for setting quorum-count (karthik-us, 2017-12-29, 1 file, -5/+40)
  In a replicated volume, quorum-count could be set to any value in the range [1 - 2147483647]. This patch adds validation so that the quorum-count set on a volume can be at most replica_count.
  Change-Id: I13952f3c6cf498c9f2b91161503fc0fba9d94898
  BUG: 1529515
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
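A minimal sketch of the range check described above, assuming the old lower bound of 1 is kept (`validate_quorum_count` is a hypothetical name, not glusterd's actual validator):

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical validator: quorum-count must lie in [1, replica_count].
 * The value is taken as int64_t so out-of-range input such as
 * 2147483647 is rejected rather than silently accepted. */
int validate_quorum_count(int64_t value, int replica_count)
{
    if (value < 1 || value > replica_count)
        return -EINVAL;
    return 0;
}
```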
* write-behind: Allow trickling-writes to be configurable (Csaba Henk, 2017-12-13, 1 file, -0/+12)
  This is the undisputed/trivial part of the patch Shreyas attached to https://bugzilla.redhat.com/1364740 (of which the current bug is a clone). The page_size and window_size bits need more evaluation before we take them on.
  Change-Id: Iaa0b9a69d35e522b77a52a09acef47460e8ae3e9
  BUG: 1428060
  Co-authored-by: Shreyas Siravara <sshreyas@fb.com>
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* quick-read: Integrate quick read with upcall and increase cache time (Poornima G, 2017-12-13, 1 file, -0/+12)
  Fixes: #261
  Co-author: Subha sree Mohankumar <smohanku@redhat.com>
  Change-Id: Ie9dd94e86459123663b9b200d92940625ef68eab
  Signed-off-by: Poornima G <pgurusid@redhat.com>
* md-cache: Cache statfs calls (Shreyas Siravara, 2017-12-12, 1 file, -0/+6)
  Summary:
  - This allows md-cache to cache statfs calls.
  - You can turn it on or off via 'gluster vol set groot performance.md-cache-statfs <on|off>'
  Change-Id: I664579e3c19fb9a6cd9d7b3a0eae061f70f4def4
  BUG: 1523295
  Signature: t1:4652632:1488581841:111cc01efe83c71f1e98d075abb10589c4574705
  Reviewed-on: https://review.gluster.org/18228
  Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
* nfs: Enable multi-core epoll support in gNFSd (Shreyas Siravara, 2017-12-08, 1 file, -0/+6)
  Change-Id: Ie8a7b1ba04b0e83f5ec7a09f9d181fe59be479ca
  BUG: 1522847
  Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
* storage/posix: Add limit to number of hard links (Shreyas Siravara, 2017-12-08, 1 file, -0/+5)
  Summary: Too many hard links blow up btrfs by exceeding the max xattr size (the pgfid is recorded for each hard link). Add a limit to prevent this explosion.
  > Reviewed-on: https://review.gluster.org/18232
  > Reviewed-by: Shreyas Siravara <sshreyas@fb.com>
  Fixes gluster/glusterfs#370
  Signed-off-by: ShyamsundarR <srangana@redhat.com>
  Change-Id: I614a247834fb8f2b2743c0c67d11cefafff0dbaa
* storage/posix: options to override umask (Subha sree Mohankumar, 2017-12-02, 1 file, -1/+21)
  Options "create-mask" and "create-directory-mask" are added to remove the mode bits set on a file or directory when it is created. The default value of these options is 0777.
  Options "force-create-mode" and "force-create-directory" set the default permission for a file or directory irrespective of the client's umask. The default value of these options is 0000.
  Command to set an option:
  volume set <volume name> storage.<option-name> <value>
  Valid values range from 0000 to 0777.
  Updates #301
  Change-Id: Ia33d13f2117202ca55a056c747ccc3674eb8bae1
  Signed-off-by: Subha sree Mohankumar <smohanku@redhat.com>
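The commit does not spell out how the two pairs of options combine. A plausible reading, modelled on Samba's similarly named "create mask" and "force create mode" parameters and stated here only as an assumption, is to AND the requested permission bits with the mask and then OR in the forced bits. `apply_create_masks` is a hypothetical helper:

```c
#include <sys/types.h>

/* Assumed semantics (not taken from the posix xlator source):
 *   1. clear every permission bit that is not present in create_mask;
 *   2. then force on every bit present in force_mode.
 * With the defaults (create_mask 0777, force_mode 0000) the requested
 * mode is returned unchanged. Only permission bits are considered. */
mode_t apply_create_masks(mode_t requested, mode_t create_mask, mode_t force_mode)
{
    return (requested & create_mask) | force_mode;
}
```

For directories, the same function would be applied with the "create-directory-mask" and "force-create-directory" values.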
* posix: Convert posix_fs_health_check asynchronously to save timestamp (Mohit Agrawal, 2017-12-01, 1 file, -0/+5)
  Problem: Sometimes the posix_fs_health_check thread is blocked on a write/read call when the backend device is deleted abruptly.
  Solution: Convert the code to update the timestamp asynchronously.
  BUG: 1501132
  Change-Id: Id68ea6a572bf68fbf437e1d9be5221b63d47ff9c
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster/ec: Modify OP_VERSION to 4.0.0 for stripe cache option (Ashish Pandey, 2017-11-29, 1 file, -1/+1)
  Change-Id: I991eaeb979497a1bf056b5871284274f959f36f2
  BUG: 1471753
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* posix: Change GD_OP_VERSION to 3_13_0 from 3_12_0 for storage.reserve (Mohit Agrawal, 2017-11-29, 1 file, -1/+1)
  Problem: Change GD_OP_VERSION to 3_13_0 from 3_12_0 for option storage.reserve
  Solution: The feature was actually merged in the 3.13.0 branch, so GD_OP_VERSION needs to change from 3_12_0 to 3_13_0.
  BUG: 1518508
  Change-Id: I3890a3e921847d896465ce456fee003efaeb0c61
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* features/worm: new config option to manage deletion of Worm files. (Vishal Pandey, 2017-11-20, 1 file, -0/+8)
  Add a new configuration option, worm-files-deletable, to file-level Worm in order to control the behaviour of Worm files upon deletion.
  Steps to test:
  1. Add all the configuration options to a volume to activate file-level Worm.
  2. Option features.worm-files-deletable is set to 1 by default.
  3. Create a new file and wait for the retention time to expire.
  4. After the retention time expires, do a truncate, rename, unlink, link or write to put the file in Worm state.
  5. Then run `rm -f filename`.
  6. The file is successfully removed.
  7. Repeat from step 2 with features.worm-files-deletable set to 0. This time the deletion should not succeed.
  Change-Id: Ibc89861ee296e065330b93a9f9606be5da40af31
  BUG: 1508898
  Signed-off-by: Vishal Pandey <vishpandey2014@gmail.com>
* cluster/ec: Fix op-version for disperse.other-eager-lock (Xavier Hernandez, 2017-11-14, 1 file, -1/+1)
  The op-version used for the new option was wrong. It has been set to 3.13.0.
  Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf
  BUG: 1502610
  Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: Keep last written strip in in-memory cache (Ashish Pandey, 2017-11-10, 1 file, -0/+6)
  Problem: Consider an EC volume with configuration 4 + 2. The stripe size for this would be 512 * 4 = 2048 bytes; that is, 2048 bytes of user data are stored in one stripe. Let's say 2048 + 512 = 2560 bytes have already been written to this volume, so 512 bytes are in the second stripe. Now, if there are sequential writes at offset 2560 with a size of 1 byte, we have to read the whole stripe, encode it with the 1 byte and then write it back again. The next write, at offset 2561 with a size of 1 byte, will again READ-MODIFY-WRITE the whole stripe. This causes bad performance because of the many READ requests travelling over the network. Some tools and scenarios generate this kind of load without users being aware of it, for example fio and zip.
  Solution: One possible solution is to keep the last stripe in memory. This way we need not read it again, and we save a READ fop going over the network. In the example above, we have to keep the last 2048 bytes (maximum) in memory per file.
  Change-Id: I3f95e6fc3ff81953646d374c445a40c6886b0b85
  BUG: 1471753
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
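The arithmetic in the example can be checked with a small sketch. It assumes 512 bytes per data brick per stripe (as in the 4 + 2 example) and models the new cache as a single hypothetical entry holding the start offset of the last written stripe; none of these names are EC's real identifiers:

```c
#include <stdbool.h>
#include <stdint.h>

#define EC_CHUNK_SIZE 512u /* bytes per data brick per stripe, as in the example */

uint64_t ec_stripe_size(unsigned data_bricks)
{
    return (uint64_t)EC_CHUNK_SIZE * data_bricks;
}

/* Start offset of the stripe that contains `offset`. */
uint64_t ec_stripe_start(uint64_t offset, uint64_t stripe)
{
    return offset - offset % stripe;
}

/* A write needs read-modify-write unless it starts and ends on
 * stripe boundaries. */
bool ec_needs_rmw(uint64_t offset, uint64_t len, uint64_t stripe)
{
    return offset % stripe != 0 || (offset + len) % stripe != 0;
}

/* Hypothetical one-entry cache of the last written stripe. */
struct ec_stripe_cache {
    bool valid;
    uint64_t start;
};

/* For a write confined to a single stripe: must the RMW cycle READ the
 * stripe over the network, or can it reuse the cached copy? */
bool ec_needs_remote_read(const struct ec_stripe_cache *c, uint64_t offset,
                          uint64_t len, uint64_t stripe)
{
    if (!ec_needs_rmw(offset, len, stripe))
        return false; /* full-stripe write: nothing to read */
    return !(c->valid && c->start == ec_stripe_start(offset, stripe));
}
```

The 1-byte write at offset 2561 falls in the stripe starting at 2048, so once that stripe is cached no network READ is needed.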
* glusterd: Changed default op-version for some options (ShyamsundarR, 2017-11-06, 1 file, -4/+4)
  As 3.13 is branched at a point that includes the features changed by this commit, their minimum supported op-versions should also change to 3.13.
  Change-Id: I7ef8eccc13a16f93939c1edbff9508d1e167c5e4
  BUG: 1509412
  Signed-off-by: ShyamsundarR <srangana@redhat.com>
* cluster/ec: create eager-lock option for non-regular files (Xavier Hernandez, 2017-11-05, 1 file, -0/+5)
  A new option is added to allow independent configuration of eager locking for regular files and non-regular files.
  Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60
  BUG: 1502610
  Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: Allow parallel writes in EC if possible (Pranith Kumar K, 2017-10-24, 1 file, -0/+6)
  Problem: EC currently sends one modification fop after another, so if some of the disks become slow for a while, the wait time for the writes queued behind them becomes really bad.
  Fix: Allow parallel writes when possible. This requires three changes:
  1) Each fop now carries the range it will be updating.
  2) Xattrop is changed to handle parallel xattrop requests, some of which would be modifying just the dirty xattr.
  3) Fops that refer to size now take locks and update the locks.
  Fixes #251
  Change-Id: Ibc3c15372f91bbd6fb617f0d99399b3149fa64b2
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* glusterd: documenting server.allow-insecure (Sanju Rakonde, 2017-10-18, 1 file, -1/+1)
  Problem: "server.allow-insecure" is invisible in gluster volume set help.
  Fix: "server.allow-insecure" is defined as NO_DOC type; changing it to DOC type solves the problem.
  Change-Id: I327f1e4c1684ff846deb8b7df07d4d8a09073274
  BUG: 1503424
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* gfproxyd: Let glusterd manage gfproxy daemon (Poornima G, 2017-10-18, 1 file, -3/+8)
  Updates: #242
  BUG: 1428063
  Change-Id: Iaaf2edf99b2ecc75f6d30762c752a6d445c1c826
  Signed-off-by: Poornima G <pgurusid@redhat.com>
* gfproxy: Introduce new server-side daemon called GFProxy (Shreyas Siravara, 2017-10-10, 1 file, -0/+5)
  Summary: Adds a new server-side daemon called gfproxyd and a new FUSE client called gfproxy-client.
  Updates: #242
  BUG: 1428063
  Change-Id: I83210098d3a381922bc64fed1110ae1b76e6519f
  Tested-by: Shreyas Siravara <sshreyas@fb.com>
  Reviewed-by: Kevin Vigor <kvigor@fb.com>
  Signed-off-by: Shreyas Siravara <sshreyas@fb.com>
  Signed-off-by: Poornima G <pgurusid@redhat.com>
* glusterd: spelling errors reported by Debian maintainer (Kaleb S. KEITHLEY, 2017-09-04, 1 file, -3/+3)
  Reported-by: "Patrick Matthäi" <pmatthaei@debian.org>
  Change-Id: I0dd6b7d88ddf3c98e8083b75f8dd848babcfd30a
  BUG: 1487840
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: https://review.gluster.org/18185
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* debug/delay-gen: Implement delay-generation feature (Pranith Kumar K, 2017-08-31, 1 file, -0/+23)
  Background: I was working on a customer issue where the disks were sometimes responding only after several seconds. It was very difficult to recreate the issue in our labs, so I had to come up with this feature.
  Requirements: We need an xlator which can delay x% of ops for y microseconds. We should be able to enable delays for specific fops.
  This feature is modeled after error-gen. Most of the logic is borrowed from that xlator. This is a minimal implementation of the feature that satisfies the requirements I had. Maybe in the future, with more requirements and a better understanding of the problem, we can improve upon this implementation.
  Here are the commands and what they do:
  Enable delay-gen (similar to how error-gen is enabled on the brick side):
  - gluster volume set <volname> delay-gen posix
  Set the percentage of fops that need to be delayed (default is 10%):
  - gluster volume set <volname> delay-gen.delay-percentage 50
  Set the delay in microseconds (default is 100000):
  - gluster volume set <volname> delay-gen.delay-duration 500000
  Set comma-separated fops to be delayed (default is all fops):
  - gluster v set r2 delay-gen.enable read,write
  Fixes #257
  Change-Id: Ib547bd39cc024c9cdb63754d21e3aa62fc9d6473
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: https://review.gluster.org/17591
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
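A deterministic sketch of the two decisions described above: whether an op falls within the delayed percentage, and whether its fop name appears in the comma-separated list. The caller is assumed to supply a uniformly random roll in [0, 100); treating an unset list as "all fops" follows the stated default, but the function names are hypothetical, not delay-gen's code:

```c
#include <stdbool.h>
#include <string.h>

#define DELAY_GEN_DEFAULT_PERCENT     10u     /* documented default */
#define DELAY_GEN_DEFAULT_DURATION_US 100000u /* documented default */

/* `roll` is assumed to be drawn uniformly from [0, 100) by the caller;
 * passing it in keeps the decision deterministic and testable. */
bool delay_gen_should_delay(unsigned percent, unsigned roll)
{
    return roll < percent;
}

/* Checks a comma-separated fop list such as "read,write".
 * A NULL or empty list means every fop is eligible. */
bool delay_gen_fop_enabled(const char *list, const char *fop)
{
    if (list == NULL || *list == '\0')
        return true;

    size_t n = strlen(fop);

    for (const char *p = list; p != NULL;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len == n && strncmp(p, fop, n) == 0)
            return true;
        p = comma ? comma + 1 : NULL;
    }
    return false;
}
```

Over rolls 0 to 99, a percentage of 50 delays exactly 50 ops, which is the behaviour `delay-gen.delay-percentage 50` asks for on average.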
* feature/posix: Enabled gfid2path by default (Kotresh HR, 2017-08-29, 1 file, -0/+1)
  Enable the gfid2path feature by default. Basic performance tests were carried out and show no significant degradation. The results are updated in the issue.
  Updates: #139
  Change-Id: I5f1949a608d0827018ef9d548d5d69f3bb7744fd
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: https://review.gluster.org/17950
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Aravinda VK <avishwan@redhat.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: Add new fields to volume_options struct (Kaushal M, 2017-08-29, 1 file, -172/+172)
  The new fields are required to enable equivalent volume set and volgen features, and some additional features, in GD2. GD2 does not use a hard-coded volume options map like GD1, but builds one by reading the options tables directly from the xlators.
  The new fields introduced into the volume options struct include the following:
  - op-version - version(s) the option was introduced in
  - deprecated - version(s) the option was deprecated in
  - flags - flags for the option (settable, client, global, force, doc etc.)
  - tags - descriptive tags that apply to this option, can be used to group options
  - validate_fn - custom option validation function
  Enums for the currently available flags have also been defined. To avoid naming clashes, the flag enums in GD1 have been renamed.
  Updates #302
  Change-Id: Ic7e08aef9e051beb47e8dc17d7f7be211aed308a
  Signed-off-by: Kaushal M <kaushal@redhat.com>
  Reviewed-on: https://review.gluster.org/18059
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
* logging: localtime logging, cmdline, volume set option (Kaleb S. KEITHLEY, 2017-08-03, 1 file, -1/+6)
  Despite the fact that appliances generally use UTC, some users really want log entries in localtime.
  fixes gluster/glusterfs#272
  feature page: https://review.gluster.org/17807
  Change-Id: I5fbf2c3eedd9eb128fb3f851dd67b2f4081c8bba
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: https://review.gluster.org/16911
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* storage/posix: Add virtual xattr to fetch path from gfid (Kotresh HR, 2017-07-28, 1 file, -0/+5)
  The gfid2path infra stores "pargfid/bname" as an xattr value for each non-directory entry. Hard links have a separate xattr each. This xattr key is internal and is not exposed to applications. A virtual xattr is exposed for applications to fetch the path from the gfid.
  Internal xattr: trusted.gfid2path.<xxhash>
  Virtual xattr: glusterfs.gfidtopath
  getfattr -h -n glusterfs.gfidtopath /<aux-mnt>/.gfid/<gfid>
  If there are hard links, it returns all the paths separated by ':'.
  A volume set option is introduced to change the delimiter to a string of at most 7 characters.
  gluster vol set gfid2path-separator ":::"
  Updates: #139
  Change-Id: Ie3b0c3fd8bd5333c4a27410011e608333918c02a
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: https://review.gluster.org/17785
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
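A sketch of how the returned paths could be joined with a configurable separator of at most 7 characters. `gfid2path_sep_valid` and `gfid2path_join` are hypothetical helpers, and rejecting an empty separator is an added assumption:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GFID2PATH_SEP_MAX 7 /* maximum separator length stated in the commit */

bool gfid2path_sep_valid(const char *sep)
{
    size_t n = strlen(sep);
    return n > 0 && n <= GFID2PATH_SEP_MAX;
}

/* Joins `count` paths with `sep` into `out` (capacity `cap` bytes).
 * Returns the joined length, or -1 if the result would not fit. */
int gfid2path_join(const char *const *paths, size_t count, const char *sep,
                   char *out, size_t cap)
{
    size_t used = 0;
    size_t sep_len = strlen(sep);

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        size_t path_len = strlen(paths[i]);
        size_t need = path_len + (i > 0 ? sep_len : 0);

        if (used + need + 1 > cap) /* +1 for the terminating NUL */
            return -1;
        if (i > 0) {
            memcpy(out + used, sep, sep_len);
            used += sep_len;
        }
        memcpy(out + used, paths[i], path_len);
        used += path_len;
        out[used] = '\0';
    }
    return (int)used;
}
```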
* posix: Needs to reserve disk space to prevent the brick from getting full (Mohit Agrawal, 2017-07-25, 1 file, -0/+4)
  Problem: Currently there is no option available in the posix xlator to keep the disk from getting full.
  Solution: Introduce a new option, storage.reserve, in the posix xlator to configure a disk threshold. The posix xlator spawns a thread to update the disk space status in the posix private structure, and every posix fop checks that flag before starting the operation. If the flag value is 1, the fop sets op_errno to ENOSPC and jumps to its out label.
  BUG: 1471366
  Change-Id: I98287cd409860f4c754fc69a332e0521bfb1b67e
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
  Reviewed-on: https://review.gluster.org/17780
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
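A minimal sketch of the flag-based design above. It assumes storage.reserve is a percentage of the brick's total size (the commit only says "disk threshold"); the struct and function names are hypothetical, and the background thread is reduced to a function that refreshes the flag:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the posix private struct. */
struct posix_priv_model {
    unsigned reserve_percent; /* assumed unit: percent of total size */
    bool disk_full;           /* refreshed periodically by a background thread */
};

/* What the background thread would do on each pass: mark the disk full
 * once free space drops below the reserved share. (free_bytes * 100 can
 * overflow only for volumes larger than about 180 PB.) */
void posix_refresh_disk_status(struct posix_priv_model *priv,
                               uint64_t free_bytes, uint64_t total_bytes)
{
    priv->disk_full = total_bytes > 0 &&
        free_bytes * 100 < (uint64_t)priv->reserve_percent * total_bytes;
}

/* The cheap check each fop performs before starting: no statfs call,
 * just a read of the cached flag. */
int posix_fop_precheck(const struct posix_priv_model *priv)
{
    return priv->disk_full ? -ENOSPC : 0;
}
```

Keeping the statfs work in one thread means the per-fop cost is a single flag read, at the price of the flag lagging slightly behind the real free space.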
* glusterd: Set default value for cluster.max-bricks-per-process to 0 (Samikshan Bairagya, 2017-07-19, 1 file, -4/+15)
  When brick multiplexing is enabled and "cluster.max-bricks-per-process" isn't explicitly set, multiplexing happens without any limit. But the default value set for that tunable is 1, which is confusing. This commit sets the default value to 0, and prevents the user from setting this value to 1 when brick multiplexing is enabled. The default value of 0 denotes that brick multiplexing can happen without any limit on the number of bricks per process.
  Change-Id: I4647f7bf5837d520075dc5c19a6e75bc1bba258b
  BUG: 1472417
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: https://review.gluster.org/17819
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
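A sketch of the two rules described above: 0 means no limit, and 1 is rejected while brick multiplexing is enabled. Rejecting negative values is an added assumption, and both function names are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical validator for cluster.max-bricks-per-process. */
int validate_max_bricks_per_process(long value, bool brick_mux_enabled)
{
    if (value < 0)
        return -EINVAL; /* assumption: negative limits are meaningless */
    if (value == 1 && brick_mux_enabled)
        return -EINVAL; /* a limit of 1 would defeat multiplexing */
    return 0;
}

/* Can a brick process that already hosts `bricks_in_process` bricks
 * take one more? A limit of 0 means "no limit". */
bool brick_process_can_accept(unsigned bricks_in_process, unsigned max_bricks)
{
    return max_bricks == 0 || bricks_in_process < max_bricks;
}
```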
* glusterd: Add description field to global options for brick-mux (Samikshan Bairagya, 2017-07-17, 1 file, -2/+12)
  Currently the "cluster.brick-multiplex" and "cluster.max-bricks-per-process" options do not show anything in the description field when gluster volume set help is called. This commit adds the description fields for these 2 options.
  Change-Id: I3d162c61fa2774dd994f046e305d457f0fd43192
  BUG: 1471790
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: https://review.gluster.org/17790
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Gaurav Yadav <gyadav@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* core: miscellaneous cleanup (Kaleb S. KEITHLEY, 2017-07-14, 1 file, -2/+2)
  Clean up things that I tripped over while doing other changes.
  1) Fix the mishmash of random spacing in struct decls in glusterfs.h. Not technically a problem, just ugly to look at.
  2) Replace open-coded string constants with existing #define constants. A disaster waiting to happen.
  3) Use sys_access() instead of sys_stat() or sys_lstat() to test for the simple existence of a file. Why copy dozens of bytes from kernel to user space that aren't going to be used by anything?
  There are probably more instances like these.
  Change-Id: I28089bef4cc93d5e4e4213045fb1a2649d110f82
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: https://review.gluster.org/17769
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Prashanth Pai <ppai@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* storage/posix: New gfid2path infra (Kotresh HR, 2017-07-10, 1 file, -0/+5)
  With this infra, a new xattr is stored on each entry creation as below.
  trusted.gfid2path.<xxhash> = <pargfid>/<basename>
  If there are hard links, multiple xattrs will be present.
  Fops which are impacted: create, mknod, link, symlink, rename, unlink
  Option to enable: gluster vol set <VOLNAME> storage.gfid2path on
  Updates: #139
  Change-Id: I369974cd16703c45ee87f82e6c2ff5a987a6cc6a
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  Reviewed-on: https://review.gluster.org/17488
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Aravinda VK <avishwan@redhat.com>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
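A sketch of building the key/value pair shown above. The commit does not say what input is hashed; this sketch hashes the value, and it substitutes 64-bit FNV-1a for xxhash so that it needs no external library, so its keys will not match those on real bricks. `fnv1a64` and `gfid2path_xattr` are hypothetical helpers:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in hash: 64-bit FNV-1a. The real key uses xxhash (per the commit). */
uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */

    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Builds the value "<pargfid>/<basename>" and the key
 * "trusted.gfid2path.<16 hex digits>". Returns 0 on success,
 * -1 if either buffer is too small. */
int gfid2path_xattr(const char *pargfid, const char *basename,
                    char *key, size_t key_cap, char *val, size_t val_cap)
{
    int n = snprintf(val, val_cap, "%s/%s", pargfid, basename);

    if (n < 0 || (size_t)n >= val_cap)
        return -1;

    n = snprintf(key, key_cap, "trusted.gfid2path.%016" PRIx64, fnv1a64(val));
    if (n < 0 || (size_t)n >= key_cap)
        return -1;
    return 0;
}
```

Because each (parent, name) pair hashes to its own key, a hard link in another directory produces a second, distinct xattr, which is how multiple xattrs coexist on one inode.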
* glusterd: Introduce option to limit no. of muxed bricks per process (Samikshan Bairagya, 2017-07-10, 1 file, -0/+38)
  This commit introduces a new global option that can be set to limit the number of multiplexed bricks in one process.
  Usage: `# gluster volume set all cluster.max-bricks-per-process <value>`
  If this option is not set, multiplexing will for now happen with no limit; i.e. a brick process will have as many bricks multiplexed to it as possible. In other words, the current multiplexing behaviour won't change if this option isn't set to any value.
  This commit also introduces a brick process instance that contains information about brick processes, such as the number of bricks handled by the process (which is 1 in non-multiplexing cases), the list of bricks, and the port number, which also serves as a unique identifier for each brick process instance. The brick process list is maintained in 'glusterd_conf_t'.
  Updates: #151
  Change-Id: Ib987d14ab0a4f6034dac01b73a4b2839f7b0b695
  Signed-off-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-on: https://review.gluster.org/17469
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* core: assorted typos and spelling mistakes from Debian lintian (Kaleb S. KEITHLEY, 2017-07-03, 1 file, -4/+5)
  Plus minor readability improvements.
  Reported-by: pmatthaei@debian.org
  Change-Id: I5393819a2fc9f240a19811143bb57b127df717cf
  BUG: 1466785
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: https://review.gluster.org/17660
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* debug/io-stats: Provide option to select stats output format (Krutika Dhananjay, 2017-06-15, 1 file, -0/+5)
  ... as opposed to hardcoding it to "json" always.
  Change-Id: I5e79473a514373145ad764f24bb6219a6983a4c6
  BUG: 1458197
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: https://review.gluster.org/17451
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* nl-cache: Fix a possible crash and stale cache (Poornima G, 2017-06-13, 1 file, -1/+0)
  Issue1: Consider the following sequence of operations:
  ...
  nlc_ctx = nlc_ctx_get (inode i1)
  ....... -> nlc_clear_cache (i1) gets called as a part of nlc_invalidate or any other callers
  ...
  GF_FREE (ii nlc_ctx)
  LOCK (nlc_ctx->lock); -> This will result in a crash, as the ctx was freed in nlc_clear_cache.
  Issue2:
  lookup on dir1/file1 results in ENOENT
  add cache to dir1 at time T1
  ....
  CHILD_DOWN at T2
  lookup on dir1/file2 results in ENOENT
  add cache to dir1, but the cache time is still T1
  lookup on dir1/file2 - should have been served from the cache, but the cache time is T1 < T2, hence the cache is considered invalid.
  So, after CHILD_DOWN the right thing would be to clear the cache and restart caching on that inode.
  Solution: Do not free nlc_ctx in nlc_clear_cache, but only in inode_forget(). The fixes for issue1 and issue2 are interleaved, hence they are sent as a single patch.
  Change-Id: I83d8ed36c049a93567c6d7e63d045dc14ccbb397
  BUG: 1458539
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: https://review.gluster.org/17453
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* protocol/server: make listen backlog value as configurable (Mohammed Rafi KC, 2017-06-08, 1 file, -0/+56)
  Problem: When we call listen from protocol/server, we pass a hard-coded value of 10 if no value is given manually. With multiplexing, especially when glusterd restarts, all clients may try to connect to the server at the same time, which will overflow the queue, and the kernel will complain about the errors.
  Solution: This patch introduces a volume set command to make the backlog value configurable. This patch also changes the default backlog value from 10 to 128. This change only applies to sockets listening from protocol.
  Example: gluster volume set <volname> transport.listen-backlog 1024
  Note:
  1 The brick has to be restarted for this value to take effect.
  2 This change won't be reflected in glusterd, or in other xlators which call listen. If you need it there, you have to add this option to the volfile.
  Change-Id: I0c5a2bbf28b5db612f9979e7560e05dd82b41477
  BUG: 1456405
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: https://review.gluster.org/17411
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
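A sketch of a configurable listen backlog, assuming an unset or unparsable option falls back to the new default of 128. `resolve_listen_backlog` and `open_listener` are hypothetical helpers built on the standard POSIX socket calls, not the protocol/server code:

```c
#include <arpa/inet.h>
#include <limits.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define DEFAULT_LISTEN_BACKLOG 128 /* new default per the commit (was 10) */

/* Resolve a transport.listen-backlog style string. NULL (unset) or
 * invalid input falls back to the default in this sketch. */
int resolve_listen_backlog(const char *opt)
{
    char *end;
    long v;

    if (opt == NULL)
        return DEFAULT_LISTEN_BACKLOG;
    v = strtol(opt, &end, 10);
    if (end == opt || *end != '\0' || v <= 0 || v > INT_MAX)
        return DEFAULT_LISTEN_BACKLOG;
    return (int)v;
}

/* Open a TCP listener on 127.0.0.1 (ephemeral port) using that backlog.
 * Returns the socket fd, or -1 on error. */
int open_listener(const char *backlog_opt)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0); /* let the kernel pick a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, resolve_listen_backlog(backlog_opt)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On Linux, the backlog passed to listen() is silently capped at net.core.somaxconn, so a value such as 1024 only takes full effect if that sysctl is at least as large.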
* core: fix spelling errors | Kaleb S. KEITHLEY | 2017-06-02 | 1 | -3/+3
| Fixes for various minor spelling errors and typos.
|
| Reported-by: Patrick Matthäi <pmatthaei@debian.org>
| Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5
| BUG: 1457808
| Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
| Reviewed-on: https://review.gluster.org/17442
| NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
| CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
| Reviewed-by: Niels de Vos <ndevos@redhat.com>
| Smoke: Gluster Build System <jenkins@build.gluster.org>
* rda, glusterd: Change the max of rda-cache-limit to INFINITY | Poornima G | 2017-05-21 | 1 | -0/+31
| Issue:
| Before this patch, the max value of rda-cache-limit was 1GB. When
| parallel-readdir is enabled, there are many instances of readdir-ahead,
| hence the rda-cache-limit depends on the number of instances. E.g.: on a
| volume with distribute count 4, rda-cache-limit with parallel-readdir
| enabled will be 4GB instead of 1GB.
|
| Consider the following sequence of operations:
| - Enable parallel-readdir
| - Set rda-cache-limit to, let's say, 3GB
| - Disable parallel-readdir. This results in one instance of readdir-ahead,
|   and the rda-cache-limit max will be back to 1GB, but the current value
|   is 3GB, hence the mount will stop working as 3GB > max 1GB.
|
| Solution:
| To fix this, we could limit the cache to 1GB even when parallel-readdir
| is enabled. But there is no need to limit the cache to 1GB; it can be
| increased if the system has enough resources. Hence getting rid of the
| rda-cache-limit max value is more apt.
|
| If we just change the rda-cache-limit max to INFINITY, we will break
| older (<3.11) clients when rda-cache-limit is set to > 1GB (as the older
| clients still expect a value < 1GB). To safely change the max value of
| rda-cache-limit to INFINITY, add a check in glusterd to verify that all
| the clients are at 3.11 or later if the value exceeds 1GB.
|
| Change-Id: Id0cdda3b053287b659c7bf511b13db2e45b92032
| BUG: 1446516
| Signed-off-by: Poornima G <pgurusid@redhat.com>
| Reviewed-on: https://review.gluster.org/17338
| Smoke: Gluster Build System <jenkins@build.gluster.org>
| Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
| NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
| CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
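The glusterd-side check can be sketched as a staging validator. `validate_rda_cache_limit` is hypothetical, and encoding client versions as GlusterFS-style op-versions (3.11.0 as 31100) is an assumption about how the clients' versions would be compared.

```c
#include <stddef.h>
#include <stdint.h>

#define ONE_GB (1024ULL * 1024 * 1024)
/* Assumed op-version encoding: 3.11.0 -> 31100. */
#define OP_VERSION_3_11_0 31100

/* Hypothetical staging check: any value up to 1GB is accepted; above
 * 1GB, every connected client must be at 3.11 or later, because older
 * clients still enforce the old 1GB maximum.
 * Returns 0 to accept, -1 to reject. */
int validate_rda_cache_limit(uint64_t limit, const int *client_op_versions,
                             size_t n_clients)
{
    if (limit <= ONE_GB)
        return 0;
    for (size_t i = 0; i < n_clients; i++)
        if (client_op_versions[i] < OP_VERSION_3_11_0)
            return -1;
    return 0;
}
```

Rejecting at staging time keeps a single old client from mounting with a value it cannot parse, at the cost of blocking large limits until every client is upgraded.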
* glusterd: remove useless options from glusterd's volume set table | Zhou Zhengping | 2017-05-17 | 1 | -13/+0
| These options cause the brick logs to complain:
| _log_if_unknown_option] 0-patchy-quota: option 'timeout' is not recognized
| _log_if_unknown_option] 0-patchy-server: option 'ping-timeout' is not recognized
|
| Change-Id: Ida2add13f792736a4e52bfaf38d1169309283a3f
| BUG: 1449008
| Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
| Reviewed-on: https://review.gluster.org/17213
| Smoke: Gluster Build System <jenkins@build.gluster.org>
| NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
| CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
| Reviewed-by: Prashanth Pai <ppai@redhat.com>
| Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* Tier: Watermark check for hi and low value being equal | hari gowtham | 2017-05-08 | 1 | -2/+3
| Problem:
| Both the low and hi watermarks can be set to the same value, as the check
| missed the case of them being equal.
|
| Fix:
| Add a check for the hi and low values being equal, along with the check
| for the low value being higher than the hi value.
|
| Change-Id: Ia235163aeefdcb2a059e2e58a5cfd8fb7f1a4c64
| BUG: 1447960
| Signed-off-by: hari gowtham <hgowtham@redhat.com>
| Reviewed-on: https://review.gluster.org/17175
| Smoke: Gluster Build System <jenkins@build.gluster.org>
| Tested-by: hari gowtham <hari.gowtham005@gmail.com>
| Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
| Reviewed-by: Milind Changire <mchangir@redhat.com>
| NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
| CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
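The corrected validation reduces to a single strict comparison. `validate_watermarks` below is a hypothetical stand-in for the glusterd check, shown only to make the boundary case explicit.

```c
#include <stdio.h>

/* Hypothetical watermark validator (values in percent).
 * The fix: reject low >= hi, not only low > hi, so equal
 * watermarks are refused. Returns 0 if valid, else -1 with a message. */
int validate_watermarks(int low, int hi, char *err, size_t errlen)
{
    if (low >= hi) {
        snprintf(err, errlen,
                 "lower watermark (%d) must be less than higher watermark (%d)",
                 low, hi);
        return -1;
    }
    return 0;
}
```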
* SELinux : implementation of SELinux translator | Manikandan Selvaganesh | 2017-05-03 | 1 | -0/+11
| This patch implements part of the SELinux translator, to support setting
| SELinux contexts on files in a glusterfs volume.
|
| URL: https://github.com/gluster/glusterfs-specs/blob/master/accepted/SELinux-client-support.md
|
| Change-Id: Id8916bd8e064ccf74ba86225ead95f86dc5a1a25
| BUG: 1318100
| Fixes: #55
| Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
| Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
| Signed-off-by: Niels de Vos <ndevos@redhat.com>
| Reviewed-on: https://review.gluster.org/13762
| Smoke: Gluster Build System <jenkins@build.gluster.org>
| NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
| CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
| Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com>
| Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Halo Replication feature for AFR translator | Kevin Vigor | 2017-05-02 | 1 | -0/+33
| Summary:
| Halo Geo-replication is a feature which allows Gluster or NFS clients to
| write locally to their region (as defined by a latency "halo" or
| threshold, if you like), and have their writes asynchronously propagate
| from their origin to the rest of the cluster. Clients can also write
| synchronously to the cluster simply by specifying a halo-latency which
| is very large (e.g. 10 seconds), which will include all bricks.
|
| In other words, it allows clients to decide at mount time whether they
| want synchronous or asynchronous IO into a cluster, and the cluster can
| support both of these modes for any number of clients simultaneously.
|
| There are a few new volume options due to this feature:
|   halo-shd-latency: The threshold below which self-heal daemons will
|     consider children (bricks) connected.
|   halo-nfsd-latency: The threshold below which NFS daemons will consider
|     children (bricks) connected.
|   halo-latency: The threshold below which all other clients will consider
|     children (bricks) connected.
|   halo-min-replicas: The minimum number of replicas which are to be
|     enforced regardless of the latency specified in the above 3 options.
|     If the number of children falls below this threshold, the next best
|     (chosen by latency) shall be swapped in.
|
| New FUSE mount options:
|   halo-latency & halo-min-replicas: As described above.
|
| This feature, combined with multi-threaded SHD support (D1271745),
| results in some pretty cool geo-replication possibilities.
|
| Operational Notes:
| - Global consistency is guaranteed for synchronous clients; this is
|   provided by the existing entry-locking mechanism.
| - Asynchronous clients, on the other hand, are merely consistent within
|   their region. Writes & deletes will be protected via entry-locks as
|   usual, preventing concurrent writes into files which are undergoing
|   replication. Read operations, on the other hand, should never block.
| - Writes are allowed from _any_ region and propagated from the origin to
|   all other regions. The takeaway is that care should be taken to ensure
|   multiple writers do not write the same files, which would result in a
|   gfid split-brain requiring resolution via split-brain policies
|   (majority, mtime & size). The recommended method for preventing this is
|   using the nfs-auth feature to define which region for each share has RW
|   permissions; tiers not in the origin region should have RO perms.
|
| TODO:
| - Synchronous clients (including the SHD) should choose clients from
|   their own region as preferred sources for reads. Most of the plumbing
|   is in place for this via the child_latency array.
| - Better GFID split-brain handling & better dent type split-brain
|   handling (i.e. create a trash can and move the offending files into it).
| - Tagging in addition to latency as a means of defining which children
|   you wish to synchronously write to.
|
| Test Plan:
| - The usual suspects: clang, gcc w/ address sanitizer & valgrind
| - Prove tests
|
| Reviewers: jackl, dph, cjh, meyering
| Reviewed By: meyering
| Subscribers: ethanr
| Differential Revision: https://phabricator.fb.com/D1272053
| Tasks: 4117827
| Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
| BUG: 1428061
| Signed-off-by: Kevin Vigor <kvigor@fb.com>
| Reviewed-on: http://review.gluster.org/16099
| Reviewed-on: https://review.gluster.org/16177
| Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
| Smoke: Gluster Build System <jenkins@build.gluster.org>
| NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
| CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
| Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
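The halo-latency / halo-min-replicas rule can be sketched as a selection over children sorted by latency. `halo_child_t` and `halo_select` are hypothetical and not AFR's real structures or functions; representing an unreachable child with a negative latency is also an assumption.

```c
#include <stdlib.h>

/* Hypothetical per-child (brick) state. */
typedef struct {
    int id;
    long latency_ms; /* measured latency; negative means unreachable */
    int up;          /* output: 1 if the child is treated as connected */
} halo_child_t;

static int by_latency(const void *a, const void *b)
{
    const halo_child_t *x = a, *y = b;
    return (x->latency_ms > y->latency_ms) - (x->latency_ms < y->latency_ms);
}

/* Marks reachable children with latency below halo_latency as up. If that
 * yields fewer than min_replicas, the next-best children by latency are
 * swapped in. Sorts `children` in place. Returns the number marked up. */
int halo_select(halo_child_t *children, size_t n, long halo_latency,
                size_t min_replicas)
{
    qsort(children, n, sizeof(*children), by_latency);
    size_t up = 0;
    for (size_t i = 0; i < n; i++) {
        long lat = children[i].latency_ms;
        int reachable = lat >= 0;
        /* Sorted ascending: below-threshold children come first, then
         * extras are admitted only while min_replicas is unmet. */
        children[i].up = reachable && (lat < halo_latency || up < min_replicas);
        up += (size_t)children[i].up;
    }
    return (int)up;
}
```

Passing a very large `halo_latency` marks every reachable child up, which corresponds to the synchronous mode described in the Summary.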
* cluster/dht: Make rebalance throttle option tuned by number | Susant Palai | 2017-04-29 | 1 | -4/+28
| The current rebalance throttle options (lazy/normal/aggressive) may not
| always be sufficient for throttling. In our recent tests, we observed
| that on certain setups, the normal and aggressive modes behaved
| similarly, both consuming full disk bandwidth. In cases like this, the
| admin should be able to tune it down (or up) depending on the need.
|
| Along with the old throttle configurations, the thread count can now be
| tuned by number, e.g.:
|     gluster v set vol-name cluster.rebal-throttle 5
| The admin can tune it up/down between 0 and the number of cores
| available.
|
| Note: For heterogeneous servers, validation will fail on the old server
| if a number is given for the throttle configuration. The message looks
| something like this:
| "volume set: failed: Staging failed on vm2. Error: cluster.rebal-throttle should be {lazy|normal|aggressive}"
|
| Test: Manual test by logging the active thread count after reconfiguring
| the throttle option.
| testcase: tests/basic/distribute/throttle-rebal.t
|
| Change-Id: I46e3cde546900307831028b344ecf601fd9b02c3
| BUG: 1438370
| Signed-off-by: Susant Palai <spalai@redhat.com>
| Reviewed-on: https://review.gluster.org/16980
| NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
| Smoke: Gluster Build System <jenkins@build.gluster.org>
| CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
| Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
| Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
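A sketch of validating the extended `cluster.rebal-throttle` value: the legacy keywords, or an explicit thread count in the range [0, ncores] given in the commit message. `parse_rebal_throttle` and the enum are hypothetical; how each keyword maps to a thread count is deliberately left out, since this commit does not specify it.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    THROTTLE_LAZY,
    THROTTLE_NORMAL,
    THROTTLE_AGGRESSIVE,
    THROTTLE_COUNT,   /* explicit thread count, stored in *count */
    THROTTLE_INVALID
} throttle_mode_t;

/* Hypothetical parser: accepts lazy|normal|aggressive, or an integer in
 * [0, ncores]. *count is written only when THROTTLE_COUNT is returned. */
throttle_mode_t parse_rebal_throttle(const char *value, long ncores,
                                     long *count)
{
    if (strcmp(value, "lazy") == 0)
        return THROTTLE_LAZY;
    if (strcmp(value, "normal") == 0)
        return THROTTLE_NORMAL;
    if (strcmp(value, "aggressive") == 0)
        return THROTTLE_AGGRESSIVE;

    errno = 0;
    char *end = NULL;
    long n = strtol(value, &end, 10);
    if (end == value || *end != '\0' || errno != 0 || n < 0 || n > ncores)
        return THROTTLE_INVALID;
    *count = n;
    return THROTTLE_COUNT;
}
```

`ncores` could come from `sysconf(_SC_NPROCESSORS_ONLN)`. An older server that only knows the keyword branch would reject "5", which matches the staging failure quoted in the Note above.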