summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
...
* cluster/afr : Do not copy dict when it is NULLAnuradha2015-05-131-1/+1
| | | | | | | | | | | | | | | | In afr_lookup_xattr_req_prepare(), dict_copy was done even though source dict was NULL. Change-Id: I85a5d2823ba021e7f78c1ce13402a0f16b08cb51 BUG: 1220332 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10755 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* dht/rebalance: Change log_level to DEBUGSusant Palai2015-05-121-2/+3
| | | | | | | | | | Change-Id: I646367581d8ee8a9e5966ee302b19273a0c780ff BUG: 1220329 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/10756 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* uss : implement statfs fop for snapdMohammed Rafi KC2015-05-102-0/+112
| | | | | | | | | | | | | | | snapview-client and snapview-server doesnot have statfs fop implemented Change-Id: I2cdd4c5784414b0549a01af9a28dbc723b7cdc67 BUG: 1176837 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10358 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: make lookup-unhashed=auto do something actually usefulJeff Darcy2015-05-1010-88/+664
| | | | | | | | | | | | | | | | | | | | | | | | The key concept here is to determine whether a directory is "clean" by comparing its last-known-good topology to the current one for the volume. These are stored as "commit hashes" on the directory and the volume root respectively. The volume's commit hash changes whenever a brick is added or removed, and a fix-layout is done. A directory's commit hash changes only when a full rebalance (not just fix-layout) is done on it. If all bricks are present and have a directory commit hash that matches the volume commit hash, then we can assume that every file is in its "proper" place. Therefore, if we look for a file in that proper place and don't find it, we can assume it's not on any other subvolume and *safely* skip the global (broadcast to all) lookup. Change-Id: Id6ce4593ba1f7daffa74cfab591cb45960629ae3 BUG: 1219637 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/7702 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli/tiering: volume info should display details about tierMohammed Rafi KC2015-05-106-3/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | >> gluster volume info patchy Volume Name: patchy Type: Tier Volume ID: 8bf1a1ca-6417-484f-821f-18973a7502a8 Status: Created Number of Bricks: 8 Transport-type: tcp Hot Tier : Hot Tier Type : Replicate Number of Bricks: 1 x 2 = 2 Brick1: hostname:/home/brick30 Brick2: hostname:/home/brick31 Cold Bricks: Cold Tier Type : Disperse Number of Bricks: 1 x (4 + 2) = 6 Brick3: hostname:/home/brick20 Brick4: hostname:/home/brick21 Brick5: hostname:/home/brick23 Brick6: hostname:/home/brick24 Brick7: hostname:/home/brick25 Brick8: hostname:/home/brick26 Change-Id: I7b9025af81263ebecd641b4b6897b20db8b67195 BUG: 1212400 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10339 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bitrot: scrubber should crawl based on the scrubber frequency valueGaurav Kumar Garg2015-05-103-5/+192
| | | | | | | | | | | | | | | Currently scrubber is crawling all the files continuously. It should crawl files based on the scrubber frequency which user have set. By default scrubber crawling frequency value will be biweekly. Change-Id: I5762a92c1e700134cfe4283d1f631904adbfe31d BUG: 1208131 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10602 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* CTR/Libgfdb: Log typo fixJoseph Fernandes2015-05-101-1/+1
| | | | | | | | | | | | | | Log typo fix for CTR Xlator and Libgfdb Change-Id: I8d7208b9756199f8dc0a72771a3c953b4fa00f3f BUG: 1215896 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10668 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* cli/tiering: display hot tier, and cold tier separatelyMohammed Rafi KC2015-05-093-0/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cli commands display the brick information without a way to distinguish hot tier, and cold tier. This patch will change all the cli related output, without changing the corresponding xml output. This patch will change following things >> gluster volume info Volume Name: patchy Type: Tier Volume ID: 7745d367-811a-4fe9-a500-d04e7afa94bf Status: Created Number of Bricks: 3 x 2 = 6 Transport-type: tcp Hot Bricks: Brick1: hostname:/home/brick21 Brick2: hostname:/home/brick20 Cold Bricks: Brick3: hostname:/home/brick19 Brick4: hostname:/home/brick16 Brick5: hostname:/home/brick17 Brick6: hostname:/home/brick18 >>gluster volume status Status of volume: patchy Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Hot Bricks: Brick hostname:/home/brick21 49152 0 Y 4690 Brick hostname:/home/brick20 49153 0 Y 4707 Cold Bricks: Brick hostname:/home/brick19 49154 0 Y 4724 Brick hostname:/home/brick16 49155 0 Y 4741 Brick hostname:/home/brick17 49156 0 Y 4758 Brick hostname:/home/brick18 49157 0 Y 4775 NFS Server on localhost 2049 0 Y 4793 Task Status of Volume patchy ------------------------------------------------------------------------------ There are no active volume tasks >>gluster volume status pathy detail Status of volume: patchy Hot Bricks: ------------------------------------------------------------------------------ Brick : Brick hostname:/home/brick21 TCP Port : 49162 RDMA Port : 0 Online : Y Pid : 22677 File System : ext4 Device : /dev/mapper/luks-cd077c56-42ba-44b1-8195-f214b9bc990c Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 127.3GB Total Disk Space : 165.4GB Inode Count : 11026432 Free Inodes : 10998043 ------------------------------------------------------------------------------ Brick : Brick hostname:/home/brick20 TCP Port : 49161 RDMA Port : 0 Online : Y Pid : 22660 File System : ext4 Device : /dev/mapper/luks-cd077c56-42ba-44b1-8195-f214b9bc990c Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 127.3GB Total Disk Space : 165.4GB Inode Count : 11026432 Free Inodes : 10998043 Cold Bricks: ------------------------------------------------------------------------------ Brick : Brick hostname:/home/brick19 TCP Port : 49157 RDMA Port : 0 Online : Y Pid : 22501 File System : ext4 Device : /dev/mapper/luks-cd077c56-42ba-44b1-8195-f214b9bc990c Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 127.3GB Total Disk Space : 165.4GB Inode Count : 11026432 Free Inodes : 10998043 ------------------------------------------------------------------------------ Brick : Brick hostname:/home/brick16 TCP Port : 49158 RDMA Port : 0 Online : Y Pid : 22518 File System : ext4 Device : /dev/mapper/luks-cd077c56-42ba-44b1-8195-f214b9bc990c Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 127.3GB Total Disk Space : 165.4GB Inode Count : 11026432 Free Inodes : 10998043 ------------------------------------------------------------------------------ Brick : Brick hostname:/home/brick17 TCP Port : 49159 RDMA Port : 0 Online : Y Pid : 22535 File System : ext4 Device : /dev/mapper/luks-cd077c56-42ba-44b1-8195-f214b9bc990c Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 127.3GB Total Disk Space : 165.4GB Inode Count : 11026432 Free Inodes : 10998043 ------------------------------------------------------------------------------ Brick : Brick hostname:/home/brick18 TCP Port : 49160 RDMA Port : 0 Online : Y Pid : 22552 File System : ext4 Device : /dev/mapper/luks-cd077c56-42ba-44b1-8195-f214b9bc990c Mount Options : rw,seclabel,relatime,data=ordered Inode Size : 256 Disk Space Free : 127.3GB Total Disk Space : 165.4GB Inode Count : 11026432 Free Inodes : 10998043 Change-Id: I7d584eb8782129c12876cce2ba8ffba6c0a620bd BUG: 1206546 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10328 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* bitrot: Volfile generation should not proceed if node doesn't have any brick.Gaurav Kumar Garg2015-05-092-104/+105
| | | | | | | | | | | | | | | | | | | glusterd crashes when bitrot is enabled on a distributed volume from a node which doesn't host a brick. While generating volfile glusterd should check number of brick on that node. If node doesn't have any brick then graph generation for bitrot and scrubber should not proceed further. Change-Id: I2158113e20e93738cde2a22fd73f0ae6b22aae9e BUG: 1219784 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10664 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* geo-rep: Update Not Started to Created in code and docAravinda VK2015-05-091-1/+1
| | | | | | | | | | | | "Not Started" status is now "Created", replaced "Not Started" string in code and doc. Change-Id: If7d606c2cc8156e41291e7eebe9d0da4ad7ac28d Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1219937 Reviewed-on: http://review.gluster.org/10698 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com>
* ec: Fix failures with missing filesXavier Hernandez2015-05-095-283/+164
| | | | | | | | | | | | | | | | | | | | | | | | | | When a file does not exist on a brick but it does on others, there could be problems trying to access it because there was some loc_t structures with null 'pargfid' but 'name' was set. This forced inode resolution based on <pargfid>/name instead of <gfid> which would be the correct one. To solve this problem, 'name' is always set to NULL when 'pargfid' is not present. Another problem was caused by an incorrect management of errors while doing incremental locking. The only allowed error during an incremental locking was ENOTCONN, but missing files on a brick can be returned as ESTALE. This caused an EIO on the operation. This patch doesn't care of errors during an incremental locking. At the end of the operation it will check if there are enough successfully locked bricks to continue or not. Change-Id: I9360ebf8d819d219cea2d173c09bd37679a6f15a BUG: 1176062 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9407 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd/tiering : cksum mismatch for tiered volumeMohammed Rafi KC2015-05-091-2/+2
| | | | | | | | | | | | | | | | | | | Once we updated the volinfo from orginator node, the hot type was overwritten with volume type. Then the same dictionary was sent to peer node to perform the commit of attach-tier, that will cause hot type to replace with volume type, eventually end up in cksum mismatch Change-Id: I402dceb4d672d0b3a7b91a92f52c1057050dbedc BUG: 1215660 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Conflicts: xlators/mgmt/glusterd/src/glusterd-brick-ops.c Reviewed-on: http://review.gluster.org/10406 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: use reference counting for mem_acct structuresJeff Darcy2015-05-092-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When freeing memory, our memory-accounting code expects to be able to dereference from the (previously) allocated block to its owning translator. However, as we have already found once in option validation and twice in logging, that translator might itself have been freed and the dereference attempt causes on of our daemons to crash with SIGSEGV. This patch attempts to fix that as follows: * We no longer embed a struct mem_acct directly in a struct xlator, but instead allocate it separately. * Allocated memory blocks now contain a pointer to the mem_acct instead of the xlator. * The mem_acct structure contains a reference count, manipulated in both the normal and translator allocate/free code using atomic increments and decrements. * Because it's now a separate structure, we can defer freeing the mem_acct until its reference count reaches zero (either way). * Some unit tests were disabled, because they embedded their own copies of the implementation for what they were supposedly testing. Life's too short to spend time fixing tests that seem designed to impede progress by requiring a certain implementation as well as behavior. Change-Id: Id929b11387927136f78626901729296b6c0d0fd7 BUG: 1211749 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/10417 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterd: Invoking daemon reconfigure functions correctlyanand2015-05-092-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | Problem : Scrub and bitd reconfigure functions were not invoking if quota is not enabled. Reason : In glusterd_svcs_reconfigure, if quota is not enabled then it is returning in the middle of the function without calling bitd and scrub reconfigure functions. Fix : If quota is not enable on volume, skip quota reconfigure and continue for other daemon reconfigure. This patch also address the state dump issue for bitd and quotad (logs the scrub and bitd info into state dump). Change-Id: I39ea004b70c95543c08496245be595b3ea044a29 BUG: 1219355 Signed-off-by: anand <anekkunt@redhat.com> Reviewed-on: http://review.gluster.org/10622 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* cluster/dht: change log level of developer logs to DEBUGVijay Bellur2015-05-091-2/+2
| | | | | | | | | | | | | A few log messages in dht directory self heal at log level INFO are useful only for developers and these logs tend to casue excessive logs in our log files. Hence moving the log level of such logs to DEBUG. Change-Id: I8a543f4ddeb5c20b2978a0f7b18d8baccc935a54 BUG: 1217949 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/10281 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* geo-rep: rename handling in dht volume(changelog changes)Saravanakumar Arumugam2015-05-081-38/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Background: Glusterfs changelogs are stored in each brick, which records the changes happened in that brick. Georep will run in all the nodes of master and processes changelogs "independently". Processing changelogs is in brick level, but all the fops will be replayed on "slave mount" point. Problem: With a DHT volume, in changelog "internal fops" are NOT recorded. For Rename case, Rename is recorded in "hashed" brick changelog. (DHT's internal fops like creating linkto file, unlink is NOT recorded). This lead us to inconsistent rename operations. For example, Distribute volume created with Two bricks B1, B2. //Consider master volume mounted @ /mnt/master and following operations executed: cd /mnt/master touch f1 // f1 falls on B1 Hash mv f1 f2 // f2 falls on B2 Hash // Here, Changelogs are recorded as below: @B1 CREATE f1 @B2 RENAME f1 f2 Here, race exists between Brick B1 and B2, say B2 will get executed first. Source file f1 itself is "NOT PRESENT", so it will go ahead and create f2 (Current implementation). We have this problem When rename falls in another brick and file is unlinked in Master. Similar kind of issue exists in following case too(multiple rename): CREATE f1 RENAME f1 f2 RENAME f2 f1 Solution: Instead of carrying out "changelogging" at "HASHED volume", carry out at the "CACHED volume". This way we have rename operations carried out where actual files are present. So,Changelog recorded as : @B1 CREATE f1 RENAME f1 f2 Note: This patch is dependent on dht changes from this patch. http://review.gluster.org/10410/ changelog related changes are separated out for review. In changelog, xdata passed from DHT is considered as : 1. In case of unlink (internal operation as part of rename), xdata value is set , it is considered as RENAME and recorded accordingly. 2. In case of rename (Hash and Cache different), xdata value is NOT set, recording rename operation is SKIPPED. BUG: 1141379 Change-Id: Ifca675e6d4ef8c4e3b7ef4a7ec85de8b3a38dc08 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/10220 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Kotresh HR <khiremat@redhat.com>
* cli/tiering: Enhance cli output for tieringMohammed Rafi KC2015-05-082-1/+10
| | | | | | | | | | | | | | Fix for handling cli output for attach-tier and detach-tier Change-Id: I4d17f4b09612754fe1b8cec6c2e14927029b9678 BUG: 1211562 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10284 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: Fix buffer overflow in snprintfKotresh HR2015-05-081-8/+8
| | | | | | | | | | Change-Id: Ie7e7c6028c7bffe47e60a2e93827e0e8767a3d66 BUG: 1219894 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10687 Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
* geo-rep: Fix corrupt gsyncd outputAravinda VK2015-05-081-2/+2
| | | | | | | | | | | | When gsyncd fails with Python traceback, glusterd fails parsing gsyncd output and shows error. BUG: 1219937 Change-Id: Ic32fd897c49a5325294a6588351b539c6e124338 Signed-off-by: Aravinda VK <avishwan@redhat.com> Reviewed-on: http://review.gluster.org/10694 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* tiering: Remove unwanted check for tier type volumeMohammed Rafi KC2015-05-082-2/+3
| | | | | | | | | | Change-Id: I2def61ebf348558e5f6a138265e3329d9a5407a3 BUG: 1216898 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10494 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System
* nfs.c nfs3.c: port log messages to a new frameworkHari Gowtham2015-05-083-375/+585
| | | | | | | | | | Change-Id: I9ddb90d66d3ad3adb2916c0c949834794ee7bdf3 BUG: 1194640 Signed-off-by: Hari Gowtham <hari.gowtham005@gmail.com> Reviewed-on: http://review.gluster.org/10216 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Meghana M <mmadhusu@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* features/bit-rot-stub: versioning of objects in write/truncate fop instead ↵Raghavendra Bhat2015-05-086-339/+1101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of open * This patch brings in the changes where object versioning is done in write and truncate fops instead of tracking them in open and create fops. This model works for both regular and anonymous fds. It also removes the race associated with open calls, create and lookups. This patch follows the below method for object versioning and notifications: Before sending writev on the fd, increase the ongoing version first. This makes anonymous fd write similar to the regular fd write by having the ongoing version increased before doing the write. Do following steps to do versioning: 1) For anonymous fds set the fd context (so that release is invoked) and add the fd context to the list maintained in the inode context. For regular fds the above think would have been done in open itself. 2) Increase the on-disk ongoing version 3) Increase the in memory ongoing version and mark inode as non-dirty 3) Once versioning is successfully done send write operation. If versioning fails, then fail the write fop. 5) In writev_cbk mark inode as modified. Change-Id: I7104391bbe076d8fc49b68745d2ec29a6e92476c BUG: 1207979 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/10233 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: add counter support for tiered volumesDan Lambright2015-05-085-1/+79
| | | | | | | | | | | | | | | | This fix adds support to view the number of promoted or demoted files from the cli. The mechanism is isolmorphic to checking the status of volumes being rebalanced. gluster volume rebalance <vol> tier status Change-Id: I1b11ca27355ceec36c488967c23531202030e205 BUG: 1213063 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10292 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/changelog: Fixing compilation issueAravinda VK2015-05-082-2/+5
| | | | | | | | | | | | This issue introduced due to manual rebase. Change-Id: I0589f4a0a1270190340f419b8022d6483bcf853d Signed-off-by: Aravinda VK <avishwan@redhat.com> BUG: 1219479 Reviewed-on: http://review.gluster.org/10685 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com>
* cluster/ec: Change meaning of trusted.ec.dirtyPranith Kumar K2015-05-087-208/+555
| | | | | | | | | | | | | | | | - With this change, the xattr will represent if the file needs to be healed or not. It will have different values for data/entry and metadata changes. - inode ref leaks and dict_set_dynstr related leaks fixed - Added support for trylock/lock based on heal-cmd execution or not in data heal. - Made fixes to pass regression runs Change-Id: I9d8def4c2badde18a76b7898816fecfac113737a BUG: 1215265 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10385 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: data heal implementation for ecPranith Kumar K2015-05-083-8/+788
| | | | | | | | | | | | | | | | | | | | | | | | | Data self-heal: 1) Take inode lock in domain 'this->name:self-heal' on 0-0 range (full file), So that no other processes try to do self-heal at the same time. 2) Take inode lock in domain 'this->name' on 0-0 range (full file), 3) perform fxattrop+fstat and get the xattrs on all the bricks 3) Choose the brick with ec->fragment number of same version as source 4) Truncate sinks 5) Unlock lock taken in 2) 5) For each block take full file lock, Read from sources write to the sinks, Unlock 6) Take full file lock and see if the file is still sane copy i.e. File didn't become unusable while the bricks are offline. Update mtime to before healing 7) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr 8) unlock lock acquired in 6) 9) unlock lock acquired in 1) Change-Id: I6f4d42cd5423c767262c9d7bb5ca7767adb3e5fd BUG: 1215265 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10384 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: metadata/name/entry heal implementation for ecPranith Kumar K2015-05-082-0/+1058
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Metadata self-heal: 1) Take inode lock in domain 'this->name' on 0-0 range (full file) 2) perform lookup and get the xattrs on all the bricks 3) Choose the brick with highest version as source 4) Setattr uid/gid/permissions 5) removexattr stale xattrs 6) Setxattr existing/new xattrs 7) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr 8) unlock lock acquired in 1) Entry self-heal: 1) take directory lock in domain 'this->name:self-heal' on 'NULL' to prevent more than one self-heal 2) we take directory lock in domain 'this->name' on 'NULL' 3) Perform lookup on version, dirty and remember the values 4) unlock lock acquired in 2) 5) readdir on all the bricks and trigger name heals 6) xattrop with -ve values of 'dirty' and difference of highest and its own version values for version xattr 7) unlock lock acquired in 1) Name heal: 1) Take 'name' lock in 'this->name' on 'NULL' 2) Perform lookup on 'name' and get stat and xattr structures 3) Build gfid_db where for each gfid we know what subvolumes/bricks have a file with 'name' 4) Delete all the stale files i.e. the file does not exist on more than ec->redundancy number of bricks 5) On all the subvolumes/bricks with missing entry create 'name' with same type,gfid,permissions etc. 6) Unlock lock acquired in 1) Known limitation: At the moment with present design, it conservatively preserves the 'name' in case it can not decide whether to delete it. this can happen in the following scenario: 1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A) 2) rename d1/f1 -> d2/f2 is performed but the rename is successful only on one of the bricks (Lets say B) 3) Now name self-heal on d1 and d2 would re-create the file on both d1 and d2 resulting in d1/f1 and d2/f2. Because we wanted to prevent data loss in the case above, the following scenario is not healable, i.e. it needs manual intervention: 1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A) 2) We have two hard links: d1/a, d2/b and another file d3/c even before the brick went down 3) rename d3/c -> d2/b is performed 4) Now name self-heal on d2/b doesn't heal because d2/b with older gfid will not be deleted. One could think why not delete the link if there is more than 1 hardlink, but that leads to similar data loss issue I described earlier: Scenario: 1) we have 3=2+1 (bricks: A, B, C) ec volume and 1 brick is down (Lets say A) 2) We have two hard links: d1/a, d2/b 3) rename d1/a -> d3/c, d2/b -> d4/d is performed and both the operations are successful only on one of the bricks (Lets say B) 4) Now name self-heal on the 'names' above which can happen in parallel can decide to delete the file thinking it has 2 links but after all the self-heals do unlinks we are left with data loss. Change-Id: I3a68218a47bb726bd684604efea63cf11cfd11be BUG: 1213358 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10298 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: Avoid creation of empty changelogsSaravanakumar Arumugam2015-05-083-23/+154
| | | | | | | | | | | | | | | | | | An empty changelog when rolled over gets unlinked and indexed with a modified path-name in htime file. The modification is "changelog" not "CHANGELOG" in basename of the empty changelog file. Change-Id: I77fd0b48b5c33c245418f5ac7a9756f08ece24d9 BUG: 1208470 Signed-off-by: Ajeet Jha <ajha@redhat.com> Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/9572 Tested-by: NetBSD Build System Reviewed-by: Aravinda VK <avishwan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kotresh HR <khiremat@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: Fixing dereference after null checkarao2015-05-081-13/+38
| | | | | | | | | | | | | | CID: 1223229 The 'loc' ptr is not checked before dereferencing, which is handled. Change-Id: Icf668150bde190e6f1b9f58a038099338516efe8 BUG: 789278 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/9666 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/bitrot: Scrubber pause/resumeVenky Shankar2015-05-083-9/+58
| | | | | | | | | | | | | | | | | | | | | | With logical scan/scrub split, pausing filesystem scrubber is an override to the thread throttling mechanism, which effectively throttles "down" number of scrubber threads to zero. This causes scanner to wait until threads are spawned again (when resumed) thereby continuing where it left off (since the file tree walk stack is effectively preserved when the main scanner thread is waiting for scrubbers to consume scanned entries). The only catch is when scrubber daemon restarts: file tree walk stack is lost and scrubbing initiates from root. This is probably OK for now (can be changed later to persist parent directory information before entering pause state). Change-Id: I5109a749b7fccd0f5367765078f46e6522dd32a1 BUG: 1208131 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10521 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* tiering: Do not allow some operations on tiered volumeMohammed Rafi KC2015-05-075-0/+111
| | | | | | | | | | | | | | | | | Some operations like add-brick,remove-brick,rebalance, replace-brick are not supported on tiered volume. But there is no code level check for this. This patch will allow to do the same Change-Id: I12689f4e902cf0cceaf6f7f29c71057305024977 BUG: 1205624 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/10349 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* features/bitrot: Throttle filesystem scrubberVenky Shankar2015-05-075-51/+712
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces multithreaded filesystem scrubber based on throttling option configured for a particular volume. The implementation "logically" breaks scanning and scrubbing with the number of scrubber threads auto-configured depending upon the throttle configuration. Scanning (crawling) is left single threaded (per brick) with entries scrubbed in bulk. On reaching this "bulk" watermark, scanner waits until entries are scrubbed. Bricks for a particular volume have a set of thread(s) assigned for scrubbing, with entries for each brick scrubbed in a round robin fashion to avoid scrub "stalls" when a brick (out of N bricks) is under active scrubbing. This mechanism helps us implement "pause/resume" with ease: all one need to do is to cleanup scrubber threads and let the main scanner thread "wait" untill scrubbing is resumed (where the scrubber thread(s) are spawned again), therefore continuing where we left off (unless we restart the deamons, where crawl initiates from root directory again, but I guess that's OK). [ NOTE: Throttling is optional for the signer daemon, without which it runs full throttle. However, passing "-DBR_RATE_LIMIT_SIGNER" predefined in CFLAGS enables CPU throttling (during checksum calculation) thereby avoiding high CPU usage. ] Subsequent patches would introduce CPU throttling during hash calculation for scrubber. Change-Id: I5701dd6cd4dff27ca3144ac5e3798a2216b39d4f BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10511 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/shard: Implement [f]truncate fopsKrutika Dhananjay2015-05-073-309/+804
| | | | | | | | | | | | | | To-Do: * Make ftruncate work even in the absence of path * Aggregate and update ia_blocks appropriately when a file is truncated to a lower size. Change-Id: Ifd24c2f5e80d2c3bc921261f5481251df8948126 BUG: 1207615 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/10631 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* nfs: allocate the auth_cache->cache_dict on auth_cache_init()Niels de Vos2015-05-071-14/+10
| | | | | | | | | | | | | | | | It seems possible that auth_cache->cache_dict is not always allocated before it is accessed. Instead of allocating the dict upon the 1st access, just create it in auth_cache_init(). Change-Id: I00e60522478b433cb0aae0c1f0948eac544dfd2b URL: http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10710 BUG: 1143880 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10600 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Tested-by: NetBSD Build System
* glupy: fix tuntime search path and python module directory layoutEmmanuel Dreyfus2015-05-073-3/+11
| | | | | | | | | | | | | | | | | | | | 1) The glupy.so xlator should embed the runtime search path for the python libraries. Unfortunately, python-config does not gives the appprioate flags, therefore we need to also use pkg-config to obtain them 2) Fix the glupy python module directory layout so that python can import the module without problem That two fixes seems to let glupy.t pass on NetBSD again. BUG: 1129939 Change-Id: I397aa726ab8bf7d91fa0d6d870a30910a5f4a5d9 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10616 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Restore build on non Linux systemsEmmanuel Dreyfus2015-05-072-4/+7
| | | | | | | | | | | | | | | | | | | This change broke the build on NetBSD, FreeBSD, and MacOS X: http://review.gluster.org/10526/ We restore the build with two fixes: - Use POSIX-compliant sysconf(_SC_NPROCESSORS_ONLN) to get the number of processors, instead of Linux specific get_nprocs(). That let us remove Linux-specific #include <sys/sysinfo.h> - Only define MAX() if it is not already defined. NetBSD defines it in <sys/param.h> which is already included BUG: 1129939 Change-Id: I62341c670598670e47ea2f69ab94864f96588b18 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10652 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: Token Bucket based throttlingVenky Shankar2015-05-076-9/+432
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BitRot daemons (signer & scrubber) are disk/cpu hoggers when left running full throttle. Checksum calculations (especially SHA family of hash routines) can be quite CPU intensive. Moreover periodic disk scans performed by scrubber followed by reading data blocks for hash calculation (which is also done by signer) generate lot of heavy IO request(s). This causes interference with actual client operations (be it a regular client or filesystems daemons such as self-heal, etc..) and results in degraded system performance. This patch introduces throttling based on Token Bucket Filtering[1]. It's a well known algorithm for checking (and ensuring) that data transmission conform to defined limits and generally used in packet switched networks. Linux control groups (Cgroups) uses a variant[2] of this algorithm to provide block device IO throttling (cgroup subsys "blkio": blk-iothrottle). So, why not just live with Cgroups? Cgroups is linux specific. We need to have a throttling mechanism for other supported UNIXes. Moreover, having our own implementation gives much more finer control in terms of tuning it for our needs (plus the simplicity of the alogorithm itself). Ideally, throttling should be a part of server stack (either as a separate translator or integrated with io-threads) since that's the point of entry for IO request(s) from *all* client(s). That way one could selectively throttle IO request(s) based on client PIDs (frame->root->pid), e.g., self-heal daemon, bitrot, etc.. (*actual* clients can run full throttle). This implementation avoids that deliberately (there needs to be a much more smarter queueing mechanism) and throttles CPU usage for hash calculations. This patch is just the infrastructure part with no interfaces exposed to set various throttling values. The tunable selected here (basically hardcoded) avoids 100% CPU usage during hash calculation (with some bursts cycles). We'd need much more intensive test(s) to assign values for various throttling options (lazy/normal/aggressive). [1] https://en.wikipedia.org/wiki/Token_bucket [2] http://en.wikipedia.org/wiki/Token_bucket#Hierarchical_token_bucket Change-Id: Icc49af80eeab6adb60166d0810e69ef37cfe2fd8 BUG: 1207020 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10307 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* snapshot: Handshake with glusterd is not properMohammed Rafi KC2015-05-071-20/+127
| | | | | | | | | | | | | | | | | | | | If a snap is activated or deactivated, when a node is down, it is not retrieving the data properly during the handshake of glusterd With this patch, a version check will made when a glusterd is started running. If there is a mismach in version, then peers will exchange the healed data. Change-Id: I8bd2a347723db2194d3fa73295878b4dd2e9be5d BUG: 1122377 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/9664 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Avra Sengupta <asengupt@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Kaushal M <kaushal@redhat.com>
* stripe: set ENOENT when a READ hits EOFNiels de Vos2015-05-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | The NFS-server sets EOF only in the READ reply when op_errno is set to ENOENT. Xlators are expected to set op_errno to ENOENT when EOF is reached, op_ret will contain the number of bytes returned by the READ. When an NFS-client (like VMware ESXi) do a READ that exceeds the size of the file, errno should be set to EOF and the return value contains the number of bytes that are read (from the requested offset, until the end of the file). Not setting EOF on a correct short READ, can result in errors on the NFS-client. This is not an issue with the Linux NFS-client (or VFS). Linux is smart enough to not try to read more bytes than the file contains. BUG: 1209298 Change-Id: Ib15538744908a6001d729288d3e18a432d19050b Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10142 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
* dht/rebalance: Throttle rebalanceSusant Palai2015-05-074-4/+169
| | | | | | | | | | | | Throttle value will be "normal" by default. For throttling down, a thread will be put in to sleep. And for throttling up, gf_defrag_process_dir will wake up the sleeping threads. Change-Id: I74d530e3effd6e60e6eec81ccc8ff65789fa9c13 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/10526 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/afr : Prevent inode-evict during split-brain resolutionAnuradha2015-05-075-58/+303
| | | | | | | | | | | | | | | | | | | | | 1) Provided setfattr command to set timeout for split-brain choice. 2) If split-brain inspection/resolution is being done from the mount for a file, ref the inode when split-brain-choice is set. This inode will be unconditionally unref-ed after timeout seconds set by the user/default otherwise. 3) Updated the doc and testcase to reflect the changes. Change-Id: I15c9037dee28855f21e680e7e3632e1f48dba4e1 BUG: 1209104 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/10134 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* bitrot/glusterd: Bitrot scrub pause/resume should give proper errorGaurav Kumar Garg2015-05-071-6/+40
| | | | | | | | | | | | | | bitrot scrubber paused/resume command should give proper error messages if scrubber already pause/resume and user again try to perform same operation on a volume. Change-Id: I01ad69c80f03b177535a4e5f1c95ab7709a804b0 BUG: 1210684 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10209 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* glusterd: Use generation number to find peerinfo in RPC notificationsKaushal M2015-05-075-8/+53
| | | | | | | | | | | | | | | | | | The generation number for each peerinfo object is unique. It can be used to find the exact peerinfo object, which is required for peer RPC notifications. Using hostname and uuid matching to find peerinfos can return incorrect peerinfos to be returned in certain cases like multi network peer probe. This could cause updates to happen to incorrect peerinfos. Change-Id: Ia0aada8214fd6d43381e5afd282e08d53a277251 BUG: 1215018 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/10495 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: remove replace brick with data migration support form cli/glusterdGaurav Kumar Garg2015-05-0710-1830/+94
| | | | | | | | | | | | | | | | Replace-brick operation with data migration support have been deprecated from gluster. With this fix replace brick command will support only one commad gluster volume replace-brick <VOLNAME> <SOURCE-BRICK> <NEW-BRICK> {commit force} Change-Id: Ib81d49e5d8e7eaa4ccb5830cfec2bc081191b43b BUG: 1094119 Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com> Reviewed-on: http://review.gluster.org/10101 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* features/changelog: Version support for Changelog ParserAravinda VK2015-05-062-11/+70
| | | | | | | | | | | | | | | | | | As optional feature, during unlink, full path will be recorded. Changelog Version number to be bumped up to 1.2. With this patch, parser checks the version number before parsing and handles accordingly. Change-Id: Ic1ad98259c39e417029a08e26a1d4b467817e65a BUG: 1214561 Signed-off-by: Aravinda VK <avishwan@redhat.com> Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/10166 Reviewed-by: Saravanakumar Arumugam <sarumuga@redhat.com> Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* cluster/ec: add separate versions for data/entry, metadataAshish Pandey2015-05-068-29/+104
| | | | | | | | | | | | | | | | | | | Adding 64 bits in "version" key of extended attributes. First 64 bits (Left) represents Data version. Last 64 bits (right) represents Meta Data version. Note: 3.7 and 3.6 version ec can't co-exist with this change because xattrop in 3.6 will fail with ERANGE as the buffer passed to it will be '8' bytes where as the value will be 16 bytes in 3.7. Where as 3.7 version clients can work with old version files. For upgrades we need to tell users to complete heals and then upgrade BUG: 1215265 Change-Id: Ib85114680cb7e75b8371c984d9f7b6401c1ffb93 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/10312 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* quota/marker: turn off inode quotas by defaultvmallika2015-05-069-37/+213
| | | | | | | | | | | | | | | | | | | | | inode quota is a new feature implemented in glusterfs-3.7 if quota is enabled in the older version and is upgraded to a new version, we can hit setxattr spike during self-heal of inode quotas. So, when a quota is enabled, turn off inode-quotas with a xlator option. With this patch, we still account for inode quotas but only when a write operation is performed for a particular file. User will be able to query inode quotas once the Inode-quota xlator option is enabled. Change-Id: I52fb28bf7024989ce7bb08ac63a303bf3ec1ec9a BUG: 1209430 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/10152 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* NFS-Ganesha : Locking global options fileMeghana Madhusudhan2015-05-064-17/+84
| | | | | | | | | | | | | | | | | | | | | Global option gluster features.ganesha enable writes into the global 'option' file. The snapshot feature also writes into the same file. To handle concurrent multiple transactions correctly, a new lock has to be introduced on this file. Every operation using this file needs to contest for the new lock type. Change-Id: Ia8a324d2a466717b39f2700599edd9f345b939a9 BUG: 1200254 Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com> Reviewed-on: http://review.gluster.org/10130 Reviewed-by: Avra Sengupta <asengupt@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Tested-by: NetBSD Build System Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* ctr/xlator: Named lookup heal of pre-existing files, before ctr was ON.Joseph Fernandes2015-05-066-4/+945
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The CTR xlator records file meta (heat/hardlinks) into the data. This works fine for files which are created after ctr xlator is switched ON. But for files which were created before CTR xlator is ON, CTR xlator is not able to record either of the meta i.e heat or hardlinks. Thus making those files immune to promotions/demotions. Solution: The solution that is implemented in this patch is do ctr-db heal of all those pre-existent files, using named lookup. For this purpose we use the inode-xlator context variable option in gluster. The inode-xlator context variable for ctr xlator will have the following, a. A Lock for the context variable b. A hardlink list: This list represents the successful looked up hardlinks. These are the scenarios when the hardlink list is updated: 1) Named-Lookup: Whenever a named lookup happens on a file, in the wind path we copy all required hardlink and inode information to ctr_db_record structure, which resides in the frame->local variable. We dont update the database in wind. During the unwind, we read the information from the ctr_db_record and , Check if the inode context variable is created, if not we create it. Check if the hard link is there in the hardlink list. If its not there we add it to the list and send a update to the database using libgfdb. Please note: The database transaction can fail(and we ignore) as there already might be a record in the db. This update to the db is to heal if its not there. If its there in the list we ignore it. 2) Inode Forget: Whenever an inode forget hits we clear the hardlink list in the inode context variable and delete the inode context variable. Please note: An inode forget may happen for two reason, a. when the inode is delete. b. the in-memory inode is evicted from the inode table due to cache limits. 3) create: whenever a create happens we create the inode context variable and add the hardlink. The database updation is done as usual by ctr. 4) link: whenever a hardlink is created for the inode, we create the inode context variable, if not present, and add the hardlink to the list. 5) unlink: whenever a unlink happens we delete the hardlink from the list. 6) mknod: same as create. 7) rename: whenever a rename happens we update the hardlink in list. if the hardlink was not present for updation, we add the hardlink to the list. What is pending: 1) This solution will only work for named lookups. 2) We dont track afr-self-heal/dht-rebalancer traffic for healing. Change-Id: Ia4bbaf84128ad6ce8c3ddd70bcfa82894c79585f BUG: 1212037 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/10370 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* nfs : fix for coredump caused by export/netgroup feature in the regressionJiffin Tony Thottan2015-05-061-0/+1
| | | | | | | | | Change-Id: Idaae234b9e81c40040393e748db1f61363a48ed0 BUG: 1211913 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> Reviewed-on: http://review.gluster.org/10250 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>