summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/ec/src/ec.c
Commit message (Collapse)AuthorAgeFilesLines
* cluster/ec: Fix op-version for disperse.other-eager-lockXavier Hernandez2017-11-161-1/+1
| | | | | | | | | | | | | The op-version used for the new option was wrong. It has been set to 3.13.0. >Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf >BUG: 1502610 >Signed-off-by: Xavier Hernandez <jahernan@redhat.com> Change-Id: I88fbd7834e4a8018c8906303e734c251e90be8cf BUG: 1512460 Signed-off-by: Xavier Hernandez <jahernan@redhat.com>
* cluster/ec: create eager-lock option for non-regular filesXavier Hernandez2017-11-161-12/+25
| | | | | | | | | | | | | A new option is added to allow independent configuration of eager locking for regular files and non-regular files. >Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60 >BUG: 1502610 >Signed-off-by: Xavier Hernandez <jahernan@redhat.com> Change-Id: I8f80e46d36d8551011132b15c0fac549b7fb1c60 BUG: 1512460 Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* cluster/ec: Implement DISCARD FOP for ECSunil Kumar Acharya2017-10-251-1/+2
| | | | | | | | | | | Updates #254 This code change implements DISCARD FOP support for EC. BUG: 1461018 Change-Id: I09a9cb2aa9d91ec27add4f422dc9074af5b8b2db Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
* cluster/ec: Allow parallel writes in EC if possiblePranith Kumar K2017-10-241-21/+30
| | | | | | | | | | | | | | | | | | Problem: Ec at the moment sends one modification fop after another, so if some of the disks become slow, for a while then the wait time for the writes that are waiting in the queue becomes really bad. Fix: Allow parallel writes when possible. For this we need to make 3 changes. 1) Each fop now has range parameters they will be updating. 2) Xattrop is changed to handle parallel xattrop requests where some would be modifying just dirty xattr. 3) Fops that refer to size now take locks and update the locks. Fixes #251 Change-Id: Ibc3c15372f91bbd6fb617f0d99399b3149fa64b2 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* ec: Increase notification in all the casesAshish Pandey2017-06-281-31/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: "gluster v heal <volname> info" is taking long time to respond when a brick is down. RCA: Heal info command does virtual mount. EC wait for 10 seconds, before sending UP call to upper xlator, to get notification (DOWN or UP) from all the bricks. Currently, we are increasing ec->xl_notify_count based on the current status of the brick. So, if a DOWN event notification has come and brick is already down, we are not increasing ec->xl_notify_count in ec_handle_down. Solution: Handle DOWN even as notification irrespective of what is the current status of brick. Change-Id: I0acac0db7ec7622d4c0584692e88ad52f45a910f BUG: 1464091 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: https://review.gluster.org/17606 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Implement FALLOCATE FOP for ECSunil Kumar Acharya2017-05-231-2/+3
| | | | | | | | | | | | | | | FALLOCATE file operations is not implemented in the existing EC code. This change set implements it for EC. BUG: 1448293 Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/15200 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: return all node uuids from all subvolumesXavier Hernandez2017-05-171-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | EC was retuning the UUID of the brick with smaller value. This had the side effect of not evenly balancing the load between bricks on rebalance operations. This patch modifies the common functions that combine multiple subvolume values into a single result to take into account the subvolume order and, optionally, other subvolumes that could be damaged. This makes easier to add future features where brick order is important. It also makes possible to easily identify the originating brick of each answer, in case some brick will have an special meaning in the future. Change-Id: Iee0a4da710b41224a6dc8e13fa8dcddb36c73a2f BUG: 1366817 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: https://review.gluster.org/17297 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Implement self-heal-window_size optionSunil Kumar Acharya2017-04-251-0/+14
| | | | | | | | | | | | | | | | Fix implements the heal window size option for EC. This option control the maximum size of read/write operation carried out in self-heal process. BUG: 1441491 Change-Id: I6c0ef65c9ca18b0828f91b319d4f52ac5b77d0d8 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/17098 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Don't mark dirty on entry/meta ops in query-infoPranith Kumar K2017-03-071-3/+2
| | | | | | | | | | | | | | | | | | | | | We wanted to mark dirty for metadata/entry operations whenever query-info is set and info is not yet there because we are anyway sending xattrop over the network. But this is causing 25% regression from 3.8.8 so removing this optimization Also fixed two small issues that we didn't find in the previous patch 1) reconfigure failure was sending return value 0 for optimistic-changelog 2) ec->optimistic_changelog was set to true even before OPTION_INIT BUG: 1408809 Change-Id: Iabb0b64bd4d3623688790e4b67e5c20b4da977a1 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16865 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Introduce optimistic changelog in ECPranith Kumar K2017-03-041-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Fix to https://bugzilla.redhat.com/show_bug.cgi?id=1316873 has made changes to set dirty flag before every update fop, data or metadata, and unset it after successful operation. That makes some of the fops very slow such as entry operations or metadata operations. Solution: File data operations are the only operation which take some time and setting dirty flag before a fop and unsetting it after serves the purpose as probability of failure of a fop is high when the time duration is more. For all the other operations, set dirty flag at the end of the fop, if any brick is down and need heal. Providing following option to choose between high performance or better heal marking for metadata and entry fops. Set/Unset dirty flag for every update fop at the start of the fop. If ON, this option impacts performance of entry operations or metadata operations as it will set dirty flag at the start and unset it at the end of ALL update fop. If OFF and all the bricks are good, dirty flag will be set at the start only for file fops For metadata and entry fops dirty flag will not be set at the start, if all the bricks are good. This does not impact performance for metadata operations and entry operation but has a very small window to miss marking entry as dirty in case it is required to be healed. Thanks to Xavi and Ashish for the design Picked the .t file from Ashish' patch https://review.gluster.org/16298 BUG: 1408809 Change-Id: I3ce860063f0e2901e50754dcfc3e4ed22daf819f Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16821 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Xavier Hernandez <xhernandez@datalab.es> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: fix selinux issues with mmap()Xavier Hernandez2017-02-021-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EC uses mmap() to create a memory area for the dynamic code. Since the code is created on the fly and executed when needed, this region of memory needs to have write and execution privileges. This combination is not allowed by default by selinux. To solve the problem a file is used as a backend storage for the dynamic code and it's mapped into two distinct memory regions, one with write access and the other one with execution access. This approach is the recommended way to create dynamic code by a program in a more secure way, and selinux allows it. Additionally selinux requires that the backend file be stored in a directory marked with type bin_t to be able to map it in an executable area. To satisfy this condition, GLUSTERFS_LIBEXECDIR has been used. This fix also changes the error check for mmap(), that was done incorrectly (it checked against NULL instead of MAP_FAILED), and it also correctly propagates the error codes and makes sure they aren't silently ignored. Change-Id: I71c2f88be4e4d795b6cfff96ab3799c362c54291 BUG: 1402661 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: https://review.gluster.org/16405 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-301-10/+9
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Invalidations in disperse volume should not update the statPoornima G2017-01-051-0/+13
| | | | | | | | | | | | | | | | | | | | | | Issue: In disperse volume, the file is present across bricks, hence the stat from one brick doesn't carry the valid size of the file. Therefore the upcall from one brick updating the md-cache results in wrong size being updated. Fix: If the notification is cache invalidation then, indicate md-cache that the attributes is invalid. BUG: 1410375 Change-Id: Id89d2283478e70b62b435a8891fffc86d2be8cb2 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16329 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Fix lk-owner set race in ec_unlockPranith Kumar K2016-12-131-4/+4
| | | | | | | | | | | | | | | | | | | | | | Problem: Rename does two locks. There is a case where when it tries to unlock it sends xattrop of the directory with new version, callback of these two xattrops can be picked up by two separate epoll threads. Both of them will try to set the lk-owner for unlock in parallel on the same frame so one of these unlocks will fail because the lk-owner doesn't match. Fix: Specify the lk-owner which will be set on inodelk frame which will not be over written by any other thread/operation. BUG: 1402710 Change-Id: I666ffc931440dc5253d72df666efe0ef1d73f99a Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/16074 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UPPoornima G2016-11-211-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently these are few events related to child_up/down: GF_EVENT_CHILD_UP : Issued when any of the protocol client connects. GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec GF_EVENT_CHILD_DOWN : Issued when any of the protocol client disconnects. These events get modified at the dht/afr/ec layers. Here is a brief on the same. DHT: - All the subvolumes reported once, and atleast one child came up, then GF_EVENT_CHILD_UP is issued - connect GF_EVENT_CHILD_UP is issued - disconnect GF_EVENT_CHILD_MODIFIED is issued - All the subvolumes disconnected, GF_EVENT_CHILD_DOWN is issued AFR: - First subvolume came up, then GF_EVENT_CHILD_UP is issued - Subsequent subvolumes coming up, results in GF_EVENT_CHILD_MODIFIED - Any of the subvolumes go down, then GF_EVENT_SOME_CHILD_DOWN is issued - Last up subvolume goes down, then GF_EVENT_CHILD_DOWN is issued Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP, GF_EVENT_CHILD_MODIFIED was issued by afr/dht when any of the subvolumes go up or down. Now with md-cache changes, there is a necessity to differentiate between child up and down. Hence, introducing GF_EVENT_SOME_DESCENDENT_DOWN/UP and getting rid of GF_EVENT_CHILD_MODIFIED. [1] http://review.gluster.org/12573 Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58 BUG: 1396038 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15764 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* cluster/ec: Implement heal info with lockAshish Pandey2016-10-111-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: Currently heal info command prints all the files/directories if the index for the file/directory is present in .glusterfs/indices folder. After implementing patch http://review.gluster.org/#/c/13733/ indices of the file which is going through update fop will also be present in .glusterfs/indices even if the fop is successful on all the brick. At this time if heal info command is being used, it will also display this file which is actually healthy and does not require any heal. Solution: Take lock on a file corresponding to the indices and inspect xattrs to decide if the file needs heal or not. Change-Id: I6361e2813ece369be12d02e74816df4eddb81cfa BUG: 1366815 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/15543 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* ec: Implement ipc fopPoornima G2016-09-251-1/+9
| | | | | | | | | | | | | | | The ipc will be wound to all the bricks, but for it to be successfull, the fop should succeed on minimum number of bricks. Change-Id: I3f8cb6a349e87bafd0773583def9d4e3765aa140 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15387 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Add support for hardware accelerationXavier Hernandez2016-09-081-9/+44
| | | | | | | | | | | | | | | | | | | | | | | This patch implements functionalities for fast encoding/decoding using hardware support. Currently optimized x86_64, SSE and AVX is added. Additionally this patch implements a caching mecanism for inverse matrices to reduce computation time, as well as a new method for computing the inverse that takes quadratic time instead of cubic. Finally some unnecessary memory copies have been eliminated to further increase performance. Change-Id: I26c75f26fb4201bd22b51335448ea4357235065a BUG: 1289922 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/12837 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Add events for EC translatorAshish Pandey2016-09-071-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch will generates events in following cases which will be consumed by new event framework. Consider an EC volume with (K+M) configuration K = Data bricks M = Redundancy bricks 1- EVENT_EC_MIN_BRICKS_NOT_UP - When minimum "K" number of bricks, required for any ec fop, are not up. 2- EVENT_EC_MIN_BRICKS_UP When minimum "K" number of bricks, required for any ec fop, are up. Change-Id: I0414b8968c39740a171e5aa14b087afd524d574f BUG: 1371470 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/15348 Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Do multi-threaded self-healPranith Kumar K2016-08-241-0/+25
| | | | | | | | | | | BUG: 1368451 Change-Id: I5d6b91d714ad6906dc478a401e614115c89a8fbb Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15083 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ashish Pandey <aspandey@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* cluster/ec: Restrict the launch of replace brick healAshish Pandey2016-06-061-1/+2
| | | | | | | | | | | | | | | | | | | | Problem: When features.cache-invalidation is ON, a lot of ec_notify function gets called which leads to launch of too many heals. This leads to no heal completion, which causes accumulation of heals. Solution: ec_launch_replace_heal should not be launch for every event. Replace brick will trigger a child up event and then only this heal function should be called. Change-Id: I57b44c6a279d57230daea1d93229be6069245b7d BUG: 1342796 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/14649 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Add/Modify description for eager-lock optionAshish Pandey2016-06-031-2/+12
| | | | | | | | | | | | | | | | | | | This patch provides description for disperse.eager-lock option for disperse volume. It also modifies the description for cluster.eager-lock option to indicate that this option is only for replica volume. Change-Id: Ie73298947fcaaa6aaf825978bc2d27ceaff386d2 BUG: 1327171 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13999 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Provide an option to enable/disable eager lockAshish Pandey2016-03-151-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If a fop takes lock, and completes its operation, it waits for 1 second before releasing the lock. However, If ec find any lock contention within this time period, it release the lock immediately before time expires. As we take lock on first brick, for few operations, like read, it might happen that discovery of lock contention might take long time and can degrades the performance. Solution: Provide an option to enable/disable eager lock. If eager lock is disabled, lock will be released as soon as fop completes. gluster v set <VOLUME NAME> disperse.eager-lock on gluster v set <VOLUME NAME> disperse.eager-lock off Change-Id: I000985a787eba3c190fdcd5981dfbf04e64af166 BUG: 1314649 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13605 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Automate heal for replace brickAshish Pandey2016-02-081-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: After a replace brick command, newly added brick does not contain data which existed on old brick. Solution: Do getxattr after initialization of all the bricks. This will trigger heal for brick root as soon as it finds the version mismatch on newly added brick. Removing tests from ec-new-entry.t which were required to simulate automation of heal after replace brick. Change-Id: I08e3dfa565374097f6c08856325ea77727437e11 BUG: 1304686 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13353 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: add seek() FOPXavier Hernandez2016-02-051-1/+11
| | | | | | | | | | | | BUG: 1220173 Change-Id: Iaa23ba81df4ee78ddaab1f96b3d926a563b4bb3d Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11494 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/ec: Create this->itable in all casesPranith Kumar K2016-01-281-0/+4
| | | | | | | | | | | | | | | | | | | Problem: glfsheal operates based on mount's volfile which doesn't have iamshd flag due to which this->itable is NULL, this leads to "inode not found" logs in glfsheal logs. Fix: Ec only allocates itable with 10 inodes, so allocating this->itable in all cases in init. Change-Id: I01d3c05e93a17007a4716a2d6f392d2aa306a34b BUG: 1294743 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/13112 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Implement gfid-hash read-policyPranith Kumar K2015-10-091-1/+38
| | | | | | | | | | | | | | Add a policy in ec to performs reads from same bricks as long as they are good. Based on the gfid of the file/directory it determines the bricks to be considered for reading. Change-Id: Ic97b5c54c086a28b5e07a330a4fd448551b49376 BUG: 1261260 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12133 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Minimize usage of EIO errorXavier Hernandez2015-07-281-2/+2
| | | | | | | | | | Change-Id: I82e245615419c2006a2d1b5e94ff0908d2f5e891 BUG: 1245276 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/11741 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/ec: Handle race between unlock-timer, new lockPranith Kumar K2015-07-231-30/+3
| | | | | | | | | | | | | | | | | | | | | | | | Problem: New lock could come at the time timer is on the way to unlock. This was leading to crash in timer thread because thread executing new lock can free up the timer_link->fop and then timer thread will try to access structures already freed. Fix: If the timer event is fired, set lock->release to true and wait for unlock to complete. Thanks to Xavi and Bhaskar for helping in confirming that this race is the RC. Thanks to Kritika for pointing out and explaining how Avati's patch can be used to fix this bug. Change-Id: I45fa5470bbc1f03b5f3d133e26d1e0ab24303378 BUG: 1243187 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11670 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* ec : Implement check for the cluster.heal-timeout valuesAshish Pandey2015-07-141-0/+10
| | | | | | | | | | | | | | | | | | | for disperse volume. Problem : User can set wrong values for cluster.heal-timeout using cli. Any value like string, negative numbers or 0 could be set. Solution : Check the value given by user. It should be numerical, non negative and within the range. Change-Id: I5184ef1a11bb2c225f42ac9ccb1ba680a86cfe09 BUG: 1239037 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/11573 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Make background healing optional behaviorPranith Kumar K2015-07-061-3/+47
| | | | | | | | | | | Provide options to control number of active background heal count and qlen. Change-Id: Idc2419219d881f47e7d2e9bbc1dcdd999b372033 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11473 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Add throttling in background healingPranith Kumar K2015-07-011-0/+2
| | | | | | | | | | | | | | - 8 parallel heals can happen. - 128 heals will wait for their turn - Heals will be rejected if 128 heals are already waiting. Change-Id: I2e99bf064db7bce71838ed9901a59ffd565ac390 BUG: 1237381 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11471 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* ec: Porting messages to new logging frameworkNandaja Varma2015-06-261-25/+41
| | | | | | | | | Change-Id: Ia05ae750a245a37d48978e5f37b52f4fb0507a8c BUG: 1194640 Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com> Reviewed-on: http://review.gluster.org/10465 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Wind unlock fops at all costPranith Kumar K2015-06-101-5/+21
| | | | | | | | | | | | | | | | | Problem: While files are being created if more than redundancy number of bricks go down, then unlock for these fops do not go to the bricks. This will lead to stale locks leading to hangs. Fix: Wind unlock fops at all costs. Change-Id: I50a87e8b4d6d2dde5bf7405b82e3aeecd95ad00e BUG: 1220348 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11152 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Correctly cleanup delayed locksXavier Hernandez2015-05-201-21/+76
| | | | | | | | | | | | | | | | | When a delayed lock is pending, a graph switch doesn't correctly terminate it. This means that the update of version and size xattrs is lost, causing EIO errors. This patch handles GF_EVENT_PARENT_DOWN event to correctly finish pending udpdates before completing the graph switch. Change-Id: I394f3b8d41df8d83cdd36636aeb62330f30a66d5 BUG: 1188145 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/10787 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Fix dictionary compare functionPranith Kumar K2015-05-041-1/+5
| | | | | | | | | | | | | | | | | | | | If both dicts are NULL then equal. If one of the dicts is NULL but the other has only ignorable keys then also they are equal. If both dicts are non-null then check if for each non-ignorable key, values are same or not. value_ignore function is used to skip comparing values for the keys which must be present in both the dictionaries but the value could be different. geo-rep's stime xattr doesn't need to be present in list xattr but when getxattr comes on stime xattr even if there aren't enough responses with the xattr we should still give out an answer which is maximum of the stimes available. Change-Id: I8de2ceaa2db785b797f302f585d88e73b154167d BUG: 1207712 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10078 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Fix readdir de-itransformPranith Kumar K2015-04-111-0/+46
| | | | | | | | | | | | | | | | | | | Problem: gf_deitransform returns the glbal client-id in the complete graph. So except for the first disperse subvolume under dht, all the other disperse subvolumes will return a client-id greater than ec->nodes, so readdir will always error out in those subvolumes. Fix: Get the client subvolume whose client-id matches the client-id returned by gf_deitransform of offset. Change-Id: I26aa17504352d48d7ff14b390b62f49d7ab2d699 BUG: 1209113 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10165 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* cluster/ec: Implement heal info for ecPranith Kumar K2015-03-301-0/+27
| | | | | | | | | | | | This also lists the files that are on-going I/O, which will be fixed later. Change-Id: Ib3f60a8b7e8798d068658cf38eaef2a904f9e327 BUG: 1203581 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10020 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* libxlator: Change marker xattr handling interfacePranith Kumar K2015-03-251-0/+23
| | | | | | | | | | | | | | | | | | | | | - Changed the implementation of marker xattr handling to take just a function which populates important data that is different from default 'gauge' values and subvolumes where the call needs to be wound. - Removed duplicate code I found while reading the code and moved it to cluster_marker_unwind. Removed unused structure members. - Changed dht/afr/stripe implementations to follow the new implementation - Implemented marker xattr handling for ec. Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4 BUG: 1200372 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9892 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Add self-heal-daemon command handlersPranith Kumar K2015-03-091-21/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the changes required in ec xlator to handle index/full heal. Index healer threads: Ec xlator start an index healer thread per local brick. This thread keeps waking up every minute to check if there are any files to be healed based on the indices kept in index directory. Whenever child_up event comes, then also this index healer thread wakes up and crawls the indices and triggers heal. When self-heal-daemon is disabled on this particular volume then the healer thread keeps waiting until it is enabled again to perform heals. Full healer threads: Ec xlator starts a full healer thread for the local subvolume provided by glusterd to perform full crawl on the directory hierarchy to perform heals. Once the crawl completes the thread exits if no more full heals are issued. Changed xl-op prefix GF_AFR_OP to GF_SHD_OP to make it more generic. Change-Id: Idf9b2735d779a6253717be064173dfde6f8f824b BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9787 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Wait for all bricks to notify before notifying parentPranith Kumar K2015-02-021-14/+34
| | | | | | | | | | | | | This is to prevent spurious heals that can result in self-heal. Change-Id: I0b27c1c1fc7a58e2683cb1ca135117a85efcc6c9 BUG: 1179180 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9523 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Handle CHILD UP/DOWN in all casesPranith Kumar K2015-01-281-104/+132
| | | | | | | | | | | | | | | | | | | | | Problem: When all the bricks are down at the time of mounting the volume, then mount command hangs. Fix: 1. Ignore all CHILD_CONNECTING events comming from subvolumes. 2. On timer expiration (without enough up or down childs) send CHILD_DOWN. 3. Once enough up or down subvolumes are detected, send the appropriate event. When rest of the subvols go up/down without changing the overall ec-up/ec-down send CHILD_MODIFIED to parent subvols. Change-Id: Ie0194dbadef2dce36ab5eb7beece84a6bf3c631c BUG: 1179180 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9396 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mgmt/glusterd: Implement Volume heal enable/disablePranith Kumar K2015-01-201-6/+28
| | | | | | | | | | | | | | | | | | For volumes with replicate, disperse xlators, self-heal daemon should do healing. This patch provides enable/disable functionality for the xlators to be part of self-heal-daemon. Replicate already had this functionality with 'gluster volume set cluster.self-heal-daemon on/off'. But this patch makes it uniform for both types of volumes. Internally it still does 'volume set' based on the volume type. Change-Id: Ie0f3799b74c2afef9ac658ef3d50dce3e8072b29 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9358 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* cluster/ec: Handle internal xattr get/setPranith Kumar K2015-01-081-30/+95
| | | | | | | | | | | | | | | | | | | | Problem: Internal xattrs of EC like trusted.ec.size/config/version can be modified by users and that can lead to misbehavior in EC. Fix: Don't let the user modify the xattrs. Hide these xattrs in getfattr outputs. Change-Id: I39cec96ae12826b506b496fda7da74201015fd75 BUG: 1178688 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9385 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org> Tested-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* ec: Fix incorrect value of EC_MAX_NODESXavier Hernandez2014-12-041-1/+1
| | | | | | | | | | | | | EC_MAX_NODES was incorrectly calculated. Now the value if computed as the minimum between the theoretical maximum and the limit imposed by the Galois Field. Change-Id: I75a8345147f344f051923d66be2c10d405370c7b BUG: 1167419 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9193 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Change licenseXavier Hernandez2014-12-031-16/+6
| | | | | | | | | | Change-Id: Iae90ade2421898417b53dec0417a610cf306c44b BUG: 1168167 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9201 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Add state dump supportXavier Hernandez2014-10-031-0/+30
| | | | | | | | | | | | Change-Id: I4504f3050674dde217e79af28cb4d2b5370fe2d5 BUG: 1148010 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8891 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org> Tested-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Optimize read/write performanceXavier Hernandez2014-09-151-1/+8
| | | | | | | | | | | | | | | | | | This patch significantly improves performance of read/write operations on a dispersed volume by reusing previous inodelk/ entrylk operations on the same inode/entry. This reduces the latency of each individual operation considerably. Inode version and size are also updated when needed instead of on each request. This gives an additional boost. Change-Id: I4b98d5508c86b53032e16e295f72a3f83fd8fcac BUG: 1122586 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8369 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* ec: Do not destroy inode context on inode invalidationXavier Hernandez2014-09-111-28/+13
| | | | | | | | | | | | | | Currently there is no need to handle inode invalidation requests, so this callback has been removed. Change-Id: I0ac2e47679bf62b1493e0403178305923bc036e8 BUG: 1126932 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/8420 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Added erasure code translatorXavier Hernandez2014-07-111-0/+904
Change-Id: I293917501d5c2ca4cdc6303df30cf0b568cea361 BUG: 1118629 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/7749 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>