summaryrefslogtreecommitdiffstats
path: root/tests/bugs
Commit message (Collapse)AuthorAgeFilesLines
* glusterd: Add option to get all volume options through get-state CLISamikshan Bairagya2017-07-251-0/+18
| | | | | | | | | | | | | | | | | | | | This commit makes the get-state CLI capable to returning the values for all volume options for all volumes. This is similar to what you get when you issue a `gluster volume get <volname> all` command. This is the new usage for the get-state CLI: # gluster get-state [<daemon>] [[odir </path/to/output/dir/>] \ [file <filename>]] [detail|volumeoptions] Fixes: #277 Change-Id: Ice52d936a5a389c6fa0ba5ab32416a65cdfde46d Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17858 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Gaurav Yadav <gyadav@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* tests: replace brick failure shouldn't corrupt volfilesRaghavendra Talur2017-07-241-0/+31
| | | | | | | | | | | | | | This is a test to present the known issue. It will be skipped as it has the known issue marker. Change-Id: Id6fa5d323abe0bc76a58cd92cb8e52fcde41b49b BUG: 1473026 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: https://review.gluster.org/17828 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* storage/posix: Don't allow gfid/volume-id xattr to be removedPranith Kumar K2017-07-183-0/+145
| | | | | | | | | | | | | | | | | | | | | | | Problem: Bulk xattr removal doesn't check if the xattrs that are coming in xdata have gfid/volume-id xattrs, so there is potential for bulkremovexattr removing gfid/volume-id. I also observed that bulkremovexattr is not available for fremovexattr. Fix: Do proper checks in bulk removexattr to remove gfid/volume-id. Refactor [f]removexattr to reduce the differences. BUG: 1470489 Change-Id: Ia845b31846a149500111c0996646e648f72cdce6 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/17765 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/ec: Non-disruptive upgrade on EC volume failsSunil Kumar Acharya2017-07-141-0/+1
| | | | | | | | | | | | | | | | | | | | Problem: Enabling optimistic changelog on EC volume was not handling node down scenarios appropriately resulting in volume data inaccessibility. Solution: Update dirty xattr appropriately on good bricks whenever nodes are down. This would fix the metadata information as part of heal and thus ensures data accessibility. BUG: 1468261 Change-Id: I08b0d28df386d9b2b49c3de84b4aac1c729ac057 Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com> Reviewed-on: https://review.gluster.org/17703 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: mark non sources as sinks in metadata healRavishankar N2017-07-131-0/+64
| | | | | | | | | | | | | | | | | | Problem: In a 3 way replica, when the source brick does not have pending xattrs for the sinks, but the 2 sinks blame each other, metadata heal was not happpening because we were not setting all non-sources as sinks. Fix: Mark all non-sources as sinks, like it is done in data and entry heal. Change-Id: I534978940f5087302e307fcc810a48ffe898ce08 BUG: 1468279 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17717 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Introduce option to limit no. of muxed bricks per processSamikshan Bairagya2017-07-101-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit introduces a new global option that can be set to limit the number of multiplexed bricks in one process. Usage: `# gluster volume set all cluster.max-bricks-per-process <value>` If this option is not set then multiplexing will happen for now with no limitations set; i.e. a brick process will have as many bricks multiplexed to it as possible. In other words the current multiplexing behaviour won't change if this option isn't set to any value. This commit also introduces a brick process instance that contains information about brick processes, like the number of bricks handled by the process (which is 1 in non-multiplexing cases), list of bricks, and port number which also serves as an unique identifier for each brick process instance. The brick process list is maintained in 'glusterd_conf_t'. Updates: #151 Change-Id: Ib987d14ab0a4f6034dac01b73a4b2839f7b0b695 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17469 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* features/shard: Remove ctx from LRU in shard_forgetPranith Kumar K2017-06-301-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: There is a race when the following two commands are executed on the mount in parallel from two different terminals on a sharded volume, which leads to use-after-free. Terminal-1: while true; do dd if=/dev/zero of=file1 bs=1M count=4; done Terminal-2: while true; do cat file1 > /dev/null; done In the normal case this is the life-cycle of a shard-inode 1) Shard is added to LRU when it is first looked-up 2) For every operation on the shard it is moved up in LRU 3) When "unlink of the shard"/"LRU limit is hit" happens it is removed from LRU But we are seeing a race where the inode stays in Shard LRU even after it is forgotten which leads to Use-after-free and then some memory-corruptions. These are the steps: 1) Shard is added to LRU when it is first looked-up 2) For every operation on the shard it is moved up in LRU Reader-handler Truncate-handler 1) Reader handler needs shard-x to be read. 1) Truncate has just deleted shard-x 2) In shard_common_resolve_shards(), it does inode_resolve() and that leads to a hit in LRU, so it is going to call __shard_update_shards_inode_list() to move the inode to top of LRU 2) shard-x gets unlinked from the itable and inode_forget(inode, 0) is called to make sure the inode can be purged upon last unref 3) when __shard_update_shards_inode_list() is called it finds that the inode is not in LRU so it adds it back to the LRU-list Both these operations complete and call inode_unref(shard-x) which leads to the inode getting freed and forgotten, even when it is in Shard LRU list. When more inodes are added to LRU, use-after-free will happen and it leads to undefined behaviors. Fix: I see that the inode can be removed from LRU even by the protocol layers like gfapi/gNFS when LRU limit is reached. So it is better to add a check in shard_forget() to remove itself from LRU list if it exists. BUG: 1466037 Change-Id: Ia79c0c5c9d5febc56c41ddb12b5daf03e5281638 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/17644 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* index: Do not proceed with init if brick is not mountedRavishankar N2017-06-193-0/+16
| | | | | | | | | | | | | | | | | | | | | ..or else when a volume start force is given, we end up creating /brick-path/.glusterfs/indices folder and various subdirs under it and eventually starting the brick process. As a part of this patch, glusterd_get_index_basepath() is added in glusterd, who will then use it to create the basepath during volume-create, add-brick, replace-brick and reset-brick. It also uses this function to set the 'index-base' xlator option for the index translator. Change-Id: Id018cf3cb6f1e2e35b5c4cf438d1e939025cb0fc BUG: 1457202 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/17426 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* upcall: Update the access time in missing casesPoornima G2017-06-091-0/+36
| | | | | | | | | | | | | | | | | | Issue: In fops like rename, link, unlink etc, the parent dirrs' client access time was not being updated. And in fops like create, link, symlink etc. the new file/dirs' client access time was not updated. Solution: Update the client access time for both parent and new entry. Change-Id: Id9f63583216ae857f6251dca15797ac66fa85430 BUG: 1458127 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17450 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* core: fix spelling errorsKaleb S. KEITHLEY2017-06-021-1/+1
| | | | | | | | | | | | | | fixes for various minor spelling errors and typos Reported-by: Patrick Matthäi <pmatthaei@debian.org> Change-Id: Ic1be36f82e3d822bbdc9559878bd79520fc0fcd5 BUG: 1457808 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: https://review.gluster.org/17442 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org>
* glusterfs: Not able to mount running volume after enable brick mux and ↵Mohit Agrawal2017-05-311-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | stopped any volume Problem: After enabled brick mux if any volume has down and then try ot run mount with running volume , mount command is hung. Solution: After enable brick mux server has shared one data structure server_conf for all associated subvolumes.After down any subvolume in some ungraceful manner (remove brick directory) posix xlator sends GF_EVENT_CHILD_DOWN event to parent xlatros and server notify updates the child_up to false in server_conf.When client is trying to communicate with server through mount it checks conf->child_up and it is FALSE so it throws message "translator are not yet ready". From this patch updated structure server_conf to save child_up status for xlator wise. Another improtant correction from this patch is cleanup threads from server side xlators after stop the volume. BUG: 1453977 Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17356 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* features/shard: Handle offset in appending writesPranith Kumar K2017-05-272-0/+211
| | | | | | | | | | | | | | | | | | | | | When a file is opened with append, all writes are appended at the end of file irrespective of the offset given in the write syscall. This needs to be considered in shard size update function and also for choosing which shard to write to. At the moment shard piggybacks on queuing from write-behind xlator for ordering of the operations. So if write-behind is disabled and two parallel appending-writes come both of which can increase the file size beyond shard-size the file will be corrupted. BUG: 1455301 Change-Id: I9007e6a39098ab0b5d5386367bd07eb5f89cb09e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/17387 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* libglusterfs : Fix crash in glusterd while peer probingGaurav Yadav2017-05-261-0/+25
| | | | | | | | | | | | | | | | | | | | | | glusterd crashes when port is being set explcitly to a range which is outside greater than short data type range. Eg. sysctl net.ipv4.ip_local_reserved_ports="49152-49156" In above case glusterd crashes while parsing the port. With this fix glusterd will be able to handle port range between INT_MIN to INT_MAX Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1 BUG: 1454418 Signed-off-by: Gaurav Yadav <gyadav@redhat.com> Reviewed-on: https://review.gluster.org/17359 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* nl-cache: In case of nameless operations do not cachePoornima G2017-05-221-0/+25
| | | | | | | | | | | | | | | | | | Issue: In nameless lookup/other fops, parent inode will be NULL, when we try to add the cache to the NULL inode, it causes a crash. Hence handle the scenario of nameless fops, and do not cache/serve the nameless fops. Change-Id: I3b90f882ac89e6aaf3419db89e6f890797f37700 BUG: 1451588 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17316 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* rda, glusterd: Change the max of rda-cache-limit to INFINITYPoornima G2017-05-212-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: The max value of rda-cache-limit is 1GB before this patch. When parallel-readdir is enabled, there will be many instances of readdir-ahead, hence the rda-cache-limit depends on the number of instances. Eg: On a volume with distribute count 4, rda-cache-limit when parallel-readdir is enabled, will be 4GB instead of 1GB. Consider a followinf sequence of operations: - Enable parallel readdir - Set rda-cache-limit to lets say 3GB - Disable parallel-readdir, this results in one instance of readdir-ahead and the rda-cache-limit will be back to 1GB, but the current value is 3GB and hence the mount will stop working as 3GB > max 1GB. Solution: To fix this, we can limit the cache to 1GB even when parallel-readdir is enabled. But there is no necessity to limit the cache to 1GB, it can be increased if the system has enough resources. Hence getting rid of the rda-cache-limit max value is more apt. If we just change the rda-cache-limit max to INFINITY, we will render older(<3.11) clients broken, when the rda-cache-limit is set to > 1GB (as the older clients still expect a value < 1GB). To safely change the max value of rda-cache-limit to INFINITY, add a check in glusted to verify all the clients are > 3.11 if the value exceeds 1GB. Change-Id: Id0cdda3b053287b659c7bf511b13db2e45b92032 BUG: 1446516 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17338 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Don't spawn new glusterfsds on node reboot with brick-muxSamikshan Bairagya2017-05-181-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | With brick multiplexing enabled, upon a node reboot new bricks were not being attached to the first spawned brick process even though there wasn't any compatibility issues. The reason for this is that upon glusterd restart after a node reboot, since brick services aren't running, glusterd starts the bricks in a "no-wait" mode. So after a brick process is spawned for the first brick, there isn't enough time for the corresponding pid file to get populated with a value before the compatibilty check is made for the next brick. This commit solves this by iteratively waiting for the pidfile to be populated in the brick compatibility comparison stage before checking if the brick process is alive. Change-Id: Ibd1f8e54c63e4bb04162143c9d70f09918a44aa4 BUG: 1451248 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17307 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: remove useless options from glusterd's volume set tableZhou Zhengping2017-05-171-10/+10
| | | | | | | | | | | | | | | | These options will cause brick's log complains: _log_if_unknown_option] 0-patchy-quota: option 'timeout' is not recognized _log_if_unknown_option] 0-patchy-server: option 'ping-timeout' is not recognized Change-Id: Ida2add13f792736a4e52bfaf38d1169309283a3f BUG: 1449008 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: https://review.gluster.org/17213 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* tests/gfapi:Adding testcase to check handling of "." and ".."Mohammed Rafi KC2017-05-122-0/+165
| | | | | | | | | | | | | | | | Adding a testcase to check the proper handling of "." and ".." in gfapi path. The patch which fix the issue is https://review.gluster.org/#/c/17177 Change-Id: I5c9cceade30f7d8a3b451b5f34f1cf9815729c4a BUG: 1447266 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/17216 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Make reset-brick work correctly if brick-mux is onSamikshan Bairagya2017-05-101-0/+79
| | | | | | | | | | | | | | | | | | | Reset brick currently kills of the corresponding brick process. However, with brick multiplexing enabled, stopping the brick process would render all bricks attached to it unavailable. To handle this correctly, we need to make sure that the brick process is terminated only if brick-multiplexing is disabled. Otherwise, we should send the GLUSTERD_BRICK_TERMINATE rpc to the respective brick process to detach the brick that is to be reset. Change-Id: I69002d66ffe6ec36ef48af09b66c522c6d35ac58 BUG: 1446172 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17128 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* afr: fixes to quorum-type in afr_priv_dump()Ravishankar N2017-05-101-0/+47
| | | | | | | | | | | | | | | | | | Include the 'none' option as well in the output. This fixes the bug in commit 335555d256d444f4952ce239168f72b393370f01. Also added a test-case. This is a Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: I479a14ae69ecae5a03e85e73ed50c19b483df603 BUG: 1448804 Reviewed-on: https://review.gluster.org/17215 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: socketfile & pidfile related fixes for brick multiplexing featureMohit Agrawal2017-05-093-0/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: While brick-muliplexing is on after restarting glusterd, CLI is not showing pid of all brick processes in all volumes. Solution: While brick-mux is on all local brick process communicated through one UNIX socket but as per current code (glusterd_brick_start) it is trying to communicate with separate UNIX socket for each volume which is populated based on brick-name and vol-name.Because of multiplexing design only one UNIX socket is opened so it is throwing poller error and not able to fetch correct status of brick process through cli process. To resolve the problem write a new function glusterd_set_socket_filepath_for_mux that will call by glusterd_brick_start to validate about the existence of socketpath. To avoid the continuous EPOLLERR erros in logs update socket_connect code. Test: To reproduce the issue followed below steps 1) Create two distributed volumes(dist1 and dist2) 2) Set cluster.brick-multiplex is on 3) kill glusterd 4) run command gluster v status After apply the patch it shows correct pid for all volumes BUG: 1444596 Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/17101 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* Fixes quota aux mount failureSanoj Unnikrishnan2017-05-088-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The aux mount is created on the first limit/remove_limit/list command and it remains until volume is stopped / deleted / (quota is disabled) , where we do a lazy unmount. If the process is uncleanly terminated, then the mount entry remains and we get (Transport disconnected) error on subsequent attempts to run quota list/limit-usage/remove commands. Second issue, There is also a risk of inadvertent rm -rf on the /var/run/gluster causing data loss for the user. Ideally, /var/run is a temp path for application use and should not cause any data loss to persistent storage. Solution: 1) unmount the aux mount after each use. 2) clean stale mount before mounting, if any. One caveat with doing mount/unmount on each command is that we cannot use same mount point for both list and limit commands. The reason for this is that list command needs mount to be accessible in cli after response from glusterd, So it could be unmounted by a limit command if executed in parallel (had we used same mount point) Hence we use separate mount points for list and limit commands. Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0 BUG: 1433906 Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com> Reviewed-on: https://review.gluster.org/16938 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* afr: don't do a post-op on a brick if op failedRavishankar N2017-04-181-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In afr-v2, self-blaming xattrs are not there by design. But if the FOP failed on a brick due to an error other than ENOTCONN (or even due to ENOTCONN, but we regained connection before postop was wound), we wind the post-op also on the failed brick, leading to setting self-blaming xattrs on that brick. This can lead to undesired results like healing of files in split-brain etc. Fix: If a fop failed on a brick on which pre-op was successful, do not perform post-op on it. This also produces the desired effect of not resetting the dirty xattr on the brick, which is how it should be because if the fop failed on a brick, there is no reason to clear the dirty bit which actually serves as an indication of the failure. Change-Id: I5f1caf4d1b39f36cf8093ccef940118638caa9c4 BUG: 1438255 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16976 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* dht: Add readdir-ahead in rebalance graph if parallel-readdir is onPoornima G2017-04-182-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: The value of linkto xattr is generally the name of the dht's next subvol, this requires that the next subvol of dht is not changed for the life time of the volume. But with parallel readdir enabled, the readdir-ahead loaded below dht, is optional. The linkto xattr for first subvol, when: - parallel readdir is enabled : "<volname>-readdir-head-0" - plain distribute volume : "<volname>-client-0" - distribute replicate volume : "<volname>-afr-0" The value of linkto xattr is "<volname>-readdir-head-0" when parallel readdir is enabled, and is "<volname>-client-0" if its disabled. But the dht_lookup takes care of healing if it cannot identify which linkto subvol, the xattr points to. In dht_lookup_cbk, if linkto xattr is found to be "<volname>-client-0" and parallel readdir is enabled, then it cannot understand the value "<volname>-client-0" as it expects "<volname>-readdir-head-0". In that case, dht_lookup_everywhere is issued and then the linkto file is unlinked and recreated with the right linkto xattr. The issue is when parallel readdir is enabled, mount point accesses the file that is currently being migrated. Since rebalance process doesn't have parallel-readdir feature, it expects "<volname>-client-0" where as mount expects "<volname>-readdir-head-0". Thus at some point either the mount or rebalance will fail. Solution: Enable parallel-readdir for rebalance as well and then do not allow enabling/disabling parallel-readdir if rebalance is in progress. Change-Id: I241ab966bdd850e667f7768840540546f5289483 BUG: 1436090 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17056 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Make rebalance honor min-free-diskSusant Palai2017-04-131-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | test: Manual created files of size 1K on 2 brick(of size 1GB) setup . added a brick of size 16GB. set min-free-disk to 12GB(so that first two bricks won't receive any files). removed one of the 1st brick of size 1GB. Logs from test: [2017-04-12 08:52:08.196484] W [MSGID: 0] [dht-rebalance.c:895:__dht_check_free_space] 0-test1-dht: Write will cross min-free-disk for file - /tile32 on subvol - test1-client-1. Looking for new subvol. [2017-04-12 08:52:08.196904] I [MSGID: 0] [dht-rebalance.c:925:__dht_check_free_space] 0-test1-dht: new target found - test1-client-2 for file - /tile32 - Post migration we have two files. The new destination (/brick/1) has the data file [root@vm1 ~]# ll /brick/1/tile32 -rw-r--r--. 2 root root 0 Apr 12 14:22 /brick/1/tile32 - On the old target the linkto file is there with linkto xattr pointing to /brick/1 [root@vm1 ~]# ll /tmp/2/tile32 ---------T. 2 root root 1000 Apr 12 14:22 /tmp/2/tile32 [root@vm1 ~]# getfattr -m . -de text /tmp/2/tile32 getfattr: Removing leading '/' from absolute path names security.selinux="unconfined_u:object_r:user_tmp_t:s0" trusted.gfid="����:Aс�#�/'b2" trusted.glusterfs.dht.linkto="test1-client-2" Marking ./tests/features/worm_sh.t as bad test. Reason being, this patch failed on master branch as well and it has nothing to do with rebalance/remove-brick. BUG: 1441508 Change-Id: I90bae251cda3d957a49cdceda90cd08311a392fb Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: https://review.gluster.org/17034 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: Add client details to get-state outputSamikshan Bairagya2017-04-121-16/+17
| | | | | | | | | | | | | | | | | | | | | | | | This commit optionally adds client details corresponding to the locally running bricks to the get-state output. Since getting the client details involves sending RPC requests to the respective local bricks, this is a relatively more costly operation. These client details would be added to the get-state output only if the get-state command is invoked with the 'detail' option. This commit therefore also changes the get-state CLI usage. The modified usage is as follows: # gluster get-state [<daemon>] [[odir </path/to/output/dir/>] \ [file <filename>]] [detail] Change-Id: I42cd4ef160f9e96d55a08a10d32c8ba44e4cd3d8 BUG: 1431183 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/17003 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tests: remove tests/bugs/core/bug-1421590-brick-mux-reuse-ports.tAtin Mukherjee2017-04-121-60/+0
| | | | | | | | | | | | | | | | | | | | | | | | | bug-1421590-brick-mux-reuse-ports.t seems to be a bad test to me and here is my reasoning: This test tries to check if the ports are reused or not. When a volume is restarted, by the time glusterd tries to allocate a new port to the one of the brick processes of the volume there is no guarantee that the older port will be allocated given the kernel might take some extra time to free up the port between this time frame. From https://build.gluster.org/job/regression-test-burn-in/2932/console we can clearly see that post restart of the volume, glusterd allocated port 49153 & 49155 for brick1 & brick2 respectively but the test was expecting the ports to be matched with 49155 & 49156 which were allocated before the volume was restarted. Change-Id: Id887bf28445261d4de04fc7502e58057659c9512 BUG: 1441035 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/17033 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* geo-rep: filter out xtime attribute during getxattrSaravanakumar Arumugam2017-04-112-3/+4
| | | | | | | | | | | | | | | | | | | | | | georep gsyncd's xtime needs to filtered irrespective of any process access. This way, we can avoid (unnecessarily)syncing xtime attribute to slave, which may raise permission denied errors. test case modified to check for xtime xattr only in backend. Change-Id: I2390b703048d5cc747d91fa2ae884dc55de58669 BUG: 1353952 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: https://review.gluster.org/14880 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kotresh HR <khiremat@redhat.com> Tested-by: Kotresh HR <khiremat@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: Add validation for options rda-cache-limit rda-request-sizePoornima G2017-04-111-0/+30
| | | | | | | | | | | | | | | | | Currently when prarallel readdir is enabled, setting any junk value to rda-cache-limit and rda-request-size succeeds. This is because of bug in the special handling of these options. Fixing the same in this patch Change-Id: I902cd9ac9134c158ab6f8aea4b001254a03547bd BUG: 1439640 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/17008 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* TIER/TESTS: improving regression test for tierhari gowtham2017-04-101-73/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test files that were marked as bad test were checked and updated for centos. The tests that had issue were fixed. Tests that aren't needed anymore are removed. REASON: tests/basic/tier/tier-file-create.t This test checks one line after creating a tiered volume (which is done in every tier test). So this line is moved along with other test in tier and the file is deleted. tests/bugs/tier/bug-1286974.t This bug checks for the tier as a task and tier has been moved from a task to service as a part of the tier as a service patch https://review.gluster.org/#/c/13365/ So it is removed from bad tests. tests/basic/tier/record-metadata-heat.t This test had a bug and has been fixed. tests/basic/tier/bug-1214222-directories_missing_after_attach_tier.t tests/basic/tier/fops-during-migration.t tests/basic/tier/tier-snapshot.t tests/basic/tier/tier_lookup_heal.t These test seem to work fine on centos now. Change-Id: I05537f4bbb91584410177ce43543897eff8761a1 BUG: 1421600 Signed-off-by: hari gowtham <hgowtham@redhat.com> Reviewed-on: https://review.gluster.org/16605 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
* cli/auth : auth.allow and auth.reject does not accept FQDN/host nameMohit Agrawal2017-04-101-0/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem : At the time of set FQDN name to "auth.allow/auth.reject" through gluster cli,it does not accept FQDN/host name. Solution: Condition needs to be update in verify_host_name and gf_auth to accept FQDN/host name. Fix : Change the condition to accept FQDN/host Name. To verify the patch followed below procedure 1) Try to set FQDN name for auth.allow or auth.reject parameter gluster v set myvol auth.reject <fqdn name> It gives error "fqdn-name" is not a valid internet-address-list 2) After apply the patch it does not give any error. 3) To verify auth.allow/reject try to mount volume on some client. Change-Id: Ieb76cbb93d43323fd29c7ca04efe3790edb4281b BUG: 1321578 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> Reviewed-on: https://review.gluster.org/15086 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd : Disallow peer detach if snapshot bricks exist on itGaurav Yadav2017-03-311-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem : - Deploy gluster on 2 nodes, one brick each, one volume replicated - Create a snapshot - Lose one server - Add a replacement peer and new brick with a new IP address - replace-brick the missing brick onto the new server (wait for replication to finish) - peer detach the old server - after doing above steps, glusterd fails to restart. Solution: With the fix detach peer will populate an error : "N2 is part of existing snapshots. Remove those snapshots before proceeding". While doing so we force user to stay with that peer or to delete all snapshots. Change-Id: I3699afb9b2a5f915768b77f885e783bd9b51818c BUG: 1322145 Signed-off-by: Gaurav Yadav <gyadav@redhat.com> Reviewed-on: https://review.gluster.org/16907 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* glusterd: hold off volume deletes while still restarting bricksJeff Darcy2017-03-302-0/+96
| | | | | | | | | | | | | | | | We need to do this because modifying the volume/brick tree while glusterd_restart_bricks is still walking it can lead to segfaults. Without waiting we could accidentally "slip in" while attach_brick has released big_lock between retries and make such a modification. Change-Id: I30ccc4efa8d286aae847250f5d4fb28956a74b03 BUG: 1432542 Signed-off-by: Jeff Darcy <jeff@pl.atyp.us> Reviewed-on: https://review.gluster.org/16927 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* protocol : fix auth-allow regressionAtin Mukherjee2017-03-301-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | One of the brick multiplexing patches (commit 1a95fc3) had some changes in gf_auth () & server_setvolume () functions which caused auth-allow feature to be broken. mount doesn't succeed even if it's part of the auth-allow list. This fix does the following: 1. Reintroduce the peer-info data back in gf_auth () so that fnmatch has valid input and it can decide on the result. 2. config-params dict should capture key values pairs for all the bricks in case brick multiplexing is on. In case brick multiplexing isn't enabled, then config-params should carry attributes from protocol/server such that all rpc auth related attributes stay in tact in the dictionary. Change-Id: I007c4c6d78620a896b8858a29459a77de8b52412 BUG: 1433815 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/16920 Tested-by: Jeff Darcy <jeff@pl.atyp.us> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us> Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
* cluster/afr: Undo pending xattrs only on the up brickskarthik-us2017-03-271-0/+89
| | | | | | | | | | | | | | | | | | | | | Problem: While doing conservative merge, even if a brick is down, it will reset the pending xattr on that. When that brick comes up, as part of the heal, it will consider this brick as the source and removes the entries on the other bricks, which leads to data loss. Fix: Undo pending only for the bricks which are up. Change-Id: I18436fa0bb1faa5f60531b357dea3f6b20446303 BUG: 1433571 Signed-off-by: karthik-us <ksubrahm@redhat.com> Reviewed-on: https://review.gluster.org/16913 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* rpc: bump up conn->cleanup_gen in rpc_clnt_reconnect_cleanupAtin Mukherjee2017-03-201-0/+14
| | | | | | | | | | | | | | | | | | | | | | | Commit 086436a introduced generation number (cleanup_gen) to ensure that rpc layer doesn't end up cleaning up the connection object if application layer has already destroyed it. Bumping up cleanup_gen was done only in rpc_clnt_connection_cleanup (). However the same is needed in rpc_clnt_reconnect_cleanup () too as with out it if the object gets destroyed through the reconnect event in the application layer, rpc layer will still end up in trying to delete the object resulting into double free and crash. Peer probing an invalid host/IP was the basic test to catch this issue. Change-Id: Id5332f3239cb324cead34eb51cf73d426733bd46 BUG: 1433578 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/16914 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Milind Changire <mchangir@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* tests: Fix tests/bugs/distribute/bug-1161311.tShreyas Siravara2017-03-061-2/+8
| | | | | | | | | | | | | | | | | Summary: - Bigger buffer size made rebalance faster and broke the test. - We made the file bigger so rebalance takes longer. BUG: 1428058 Change-Id: I7acfec5611b28bdae0a9d9fc03eb104659a5563a Signed-off-by: Shreyas Siravara <sshreyas@fb.com> Author: Shreyas Siravara <sshreyas@fb.com> Reviewed-on: https://review.gluster.org/16802 Tested-by: Vijay Bellur <vbellur@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* core: Clean up pmap registry up correctly on volume/brick stopSamikshan Bairagya2017-02-271-0/+55
| | | | | | | | | | | | | | | | | | | | | | | This commit changes the following: 1. In glusterfs_handle_terminate, send out individual pmap signout requests to glusterd for every brick. 2. Add another parameter to glusterfs_mgmt_pmap_signout function to pass the brickname that needs to be removed from the pmap registry. 3. Make sure pmap_registry_search doesn't break out from the loop iterating over the list of bricks per port if the first brick entry corresponding to a port is whitespaced out. 4. Make sure the pmap registry entries are removed for other daemons like snapd. Change-Id: I69949874435b02699e5708dab811777ccb297174 BUG: 1421590 Signed-off-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-on: https://review.gluster.org/16689 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Gaurav Yadav <gyadav@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* TESTS/TIER: bug-1303028-Rebalance-glusterd-rpc-connection-issue.thari gowtham2017-02-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | PROBLEM: spurious failure of the test. CAUSE: the function "rebalance_run_time" calculates the total time the tier has been running for. this being a test case, the run time of tier can be 0 and when the function adds up zero it results in zero. and thus it starts to fail. FIX: Give it some time for the function to add up the values. Signed-off-by: hari gowtham <hgowtham@redhat.com> Change-Id: Ie270f3f3c8942081cca85dc49ef8fec76f3a261a BUG: 1425743 Reviewed-on: https://review.gluster.org/16711 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: hari gowtham <hari.gowtham005@gmail.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* marker: Fix inode value in loc, in setxattr fopPoornima G2017-02-201-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On recieving a rename fop, marker_rename() stores the, oldloc and newloc in its 'local' struct, once the rename is done, the xtime marker(last updated time) is set on the file, but sending a setxattr fop. When upcall receives the setxattr fop, the loc->inode is NULL and it crashes. The loc->inode can be NULL only in one valid case, i.e. in rename case where the inode of new loc can be NULL. Hence, marker should have filled the inode of the new_loc before issuing a setxattr. marker_rename_cbk was already fixed in a previous commit. Fixing marker_rename_done to send valid inode in this commit. Also in upcall check for NULL inode so that there is no crash. Change-Id: I3ed2a05118fed3367dfe3251ce4477310cb480d0 BUG: 1422776 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16633 Reviewed-by: Kotresh HR <khiremat@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: take conn->lock around operations on conn->reconnectJeff Darcy2017-02-181-0/+25
| | | | | | | | | | | | | | | | Failure to do this could lead to a race in which a timer would be removed twice concurrently, corrupting the timer list (because gf_timer_call_cancel has no internal protection against this) and possibly causing a crash. Change-Id: Ic1a8b612d436daec88fd6cee935db0ae81a47d5c BUG: 1421721 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/16662 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: keep snapshot bricks separate from regular onesJeff Darcy2017-02-101-13/+23
| | | | | | | | | | | | | | | | | | | | | | | The problem here is that a volume's transport options can change, but any snapshots' bricks don't follow along even though they're now incompatible (with respect to multiplexing). This was causing the USS+SSL test to fail. By keeping the snapshot bricks separate (though still potentially multiplexed with other snapshot bricks including those for other volumes) we can ensure that they remain unaffected by changes to their parent volumes. Also fixed various issues with how the test waits (or more precisely didn't) for various events to complete before it continues. Change-Id: Iab4a8a44fac5760373fac36956a3bcc27cf969da BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/16544 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: Avra Sengupta <asengupt@redhat.com>
* afr: all children of AFR must be up to resolve s-brainRavishankar N2017-02-091-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | Problem: The various split-brain resolution policies (favorite-child-policy based, CLI based and mount (get/setfattr) based) attempt to resolve split-brain even when not all bricks of replica are up. This can be a problem when say in a replica 3, the only good copy is down and the other 2 bricks are up and blame each other (i.e. split-brain). We end up healing the file in such a case and allow I/O on it. Fix: A decision on whether the file is in split-brain or not must be taken only if we are able to examine the afr xattrs of *all* bricks of a given replica. Change-Id: Icddb1268b380005799990f5379ef957d84639ef9 BUG: 1417522 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: https://review.gluster.org/16476 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: ignore return code of glusterd_restart_bricksAtin Mukherjee2017-02-091-0/+40
| | | | | | | | | | | | | | | | | | | | | When GlusterD is restarted on a multi node cluster, while syncing the global options from other GlusterD, it checks for quorum and based on which it decides whether to stop/start a brick. However we handle the return code of this function in which case if we don't want to start any bricks the ret will be non zero and we will end up failing the import which is incorrect. Fix is just to ignore the ret code of glusterd_restart_bricks () Change-Id: I37766b0bba138d2e61d3c6034bd00e93ba43e553 BUG: 1420637 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/16574 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Samikshan Bairagya <samikshan@gmail.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* tests : turn off nfs.disable in bug-1238706-daemons-stop-on-peer-cleanup.tAtin Mukherjee2017-02-071-0/+2
| | | | | | | | | | | | | | To validate this test and remove it from the list of bad tests, turn off nfs.disable option so that nfs daemon can come up. Change-Id: I8146c2d7f72ac53cac7e395dbb9e819d729eb6a9 BUG: 1257792 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/16514 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: double-check brick liveness for remove-brick validationJeff Darcy2017-02-021-2/+4
| | | | | | | | | | | | | | | | Same problem as https://review.gluster.org/#/c/16509/ in a different place. Tests detach bricks without glusterd's knowledge, so glusterd's internal brick state is out of date and we have to re-check (via the brick's pidfile) as well. BUG: 1385758 Change-Id: I169538c1c62d72a685a49d57ef65fb6c3db6eab2 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/16529 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tests: use kill_brick instead of kill -9Jeff Darcy2017-02-027-9/+14
| | | | | | | | | | | | | | The system actually handles this OK, but with multiplexing the result of killing the whole process is not what some tests assumed. Change-Id: I89ebf0039ab1369f25b0bfec3710ec4c13725915 BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/16528 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* core: run many bricks within one glusterfsd processJeff Darcy2017-01-3011-77/+95
| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work. Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process. Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb BUG: 1385758 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/14763 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: daemon restart logic should adhere server side quorumAtin Mukherjee2017-01-271-0/+57
| | | | | | | | | | | | | | | | Just like brick processes, other daemon services should also follow the same logic of quorum checks to see if a particular service needs to come up if glusterd is restarted or the incoming friend add/update request is received (in glusterd_restart_bricks () function) Change-Id: I54a1fbdaa1571cc45eed627181b81463fead47a3 BUG: 1383893 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: https://review.gluster.org/15626 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com>
* gfapi: create statedump when glusterd requests itNiels de Vos2017-01-242-7/+101
| | | | | | | | | | | | | | | | | | | | | When GlusterD sends the STATEDUMP procedure to the libgfapi client, the client checks if it matches the PID that should take the statedump. If so, it will do a statedump for the glfs_t that is connected to this mgmt connection. BUG: 1169302 Change-Id: I70d6a1f4f19d525377aebc8fa57f51e513b92d84 See-also: http://review.gluster.org/9228 Signed-off-by: Poornima G <pgurusid@redhat.com> [ndevos: separated patch from 9228] Reviewed-on: https://review.gluster.org/16415 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com>