summaryrefslogtreecommitdiffstats
path: root/xlators/cluster
Commit message (Collapse)AuthorAgeFilesLines
* cluster/ec: Use fd when appropriate for updating size/versionPranith Kumar K2015-03-271-3/+9
| | | | | | | | | | Change-Id: I5d3aca101c8cdda406d31d06c40404fa6a2b7170 BUG: 1192378 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9995 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* libxlator: Change marker xattr handling interfacePranith Kumar K2015-03-258-188/+88
| | | | | | | | | | | | | | | | | | | | | - Changed the implementation of marker xattr handling to take just a function which populates important data that is different from default 'gauge' values and subvolumes where the call needs to be wound. - Removed duplicate code I found while reading the code and moved it to cluster_marker_unwind. Removed unused structure members. - Changed dht/afr/stripe implementations to follow the new implementation - Implemented marker xattr handling for ec. Change-Id: Ib0c3626fe31eb7c8aae841eabb694945bf23abd4 BUG: 1200372 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9892 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Fixing build of xlator/cluster/dht/src/dht-rebalance.c when tiering is disabledJoseph Fernandes2015-03-252-4/+8
| | | | | | | | | | | | | 1) Removed unnecessary include tier.h in dht-rebalance.c 2) tier xlator will only compile when tiering is enabled in configure.ac Change-Id: Ia21aa9ff403506dc898a83236e9e2d382a0594da BUG: 1204604 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/9973 Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Sachin Pandit <spandit@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/bit-rot: Implementation of bit-rot xlatorVenky Shankar2015-03-241-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the "Signer" -- responsible for signing files with their checksums upon last file descriptor close (last release()). The event notification facility provided by the changelog xlator is made use of. Moreover, checksums are as of now SHA256 hash of the object data and is the only available hash at this point of time. Therefore, there is no special "what hash to use" type check, although it's does not take much to add various hashing algorithms to sign objects with. Signatures are stored in extended attributes of the objects along with the the type of hashing used to calculate the signature. This makes thing future proof when other hash types are added. The signature infrastructure is provided by bitrot stub: a little piece of code that sits over the POSIX xlator providing interfaces to "get or set" objects signature and it's staleness. Since objects are signed upon receiving release() notification, pre-existing data which are "never" modified would never be signed. To counter this, an initial crawler thread is spawned The crawler scans the entire brick for objects that are unsigned or "missed" signing due to the server going offline (node reboots, crashes, etc..) and triggers an explicit sign. This would also sign objects when bit-rot is enabled for a volume and/or after upgrade. Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b BUG: 1170075 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9711 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Refactor ec-dir-writePranith Kumar K2015-03-242-670/+158
| | | | | | | | | | | | - Also fixed iatt_combine to go over all the valid iatts Change-Id: I1d52d705ed0437f602357acde3e479cedb748681 BUG: 1199767 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9827 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* build: tier.h does not need to include sys/xattr.hNiels de Vos2015-03-231-1/+0
| | | | | | | | | | | | | FreeBSD does not have sys/xattr.h, including it in tier.h breaks building on FreeBSD. There is nothing in tier.h that seems to require definitions from the sys/xattr.h header, just remove it. BUG: 1194753 Change-Id: If970272a0ce7728e0f18e5ae026880688ac31408 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9965 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Sachin Pandit <spandit@redhat.com>
* build: pass SQLITE_CFLAGS while building dht files that include tier.hNiels de Vos2015-03-231-2/+1
| | | | | | | | | | | | | | | | Building on FreeBSD has been broken by http://review.gluster.org/9724 which introduces the cluster/tiering xlator. The CFLAGS passed to the compiler do not include the path where sqlite3.h can be found. In fact, an attempt was made to pass the flags on, but a later variable overwrite these again. BUG: 1194753 Change-Id: I1c890fa9a0d82492726306fe6b03bd50ca985e31 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9964 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Sachin Pandit <spandit@redhat.com>
* cluster/dht: Add tier translator.Dan Lambright2015-03-2112-31/+1300
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tier translator shares most of DHT's code. It differs in how subvolumes are chosen for I/Os, and how file migration (cache promotion and demotion) is managed. That different functionality is split to either DHT or tier logic according to the "tier_methods" structure. A cache promotion and demotion thread is created in a manner similar to the rebalance daemon. The thread operates a timing wheel which periodically checks for promotion and demotion candidates (files). Candidates are queued and then migrated. Candidates must exist on the same node as the daemon and meet other critera per caching policies. This patch has two authors (Dan Lambright and Joseph Fernandes). Dan did the DHT changes and Joe wrote the cache policies. The fix depends on DHT readidr changes and the database library which have been submitted separately. Header files in libglusterfs/src/gfdb should be reviewed in patch 9683. For more background and design see the feature page [1]. [1] http://www.gluster.org/community/documentation/index.php/Features/data-classification Change-Id: Icc26c517ccecf5c42aef039f5b9c6f7afe83e46c BUG: 1194753 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9724 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr : enable inspection & resolution of files in split-brainAnuradha2015-03-195-33/+336
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part 2/2 patch to enable users analyze and resolve split-brain. This patch enables : 1) Users to inspect the files in data and metadata split-brain. 2) Resolve the split-brain. Both using a series of setfattr commands. Consider a volume "test" with 2 bricks. 1) To inspect a file f1: setfattr -n replica.split-brain-choice -v test-client-0 f1 After the execution of this command, if no read_subvol is found, reads will be served from test-client-0 (corresponding to brick-0). 2) To resolve split-brain : setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1 Execution of this command will lead to the resolution of data and metadata split-brain with subvol mentioned in the command (test-client-0 here) as the source and the rest as sink. Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9743 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/quota : Introducing inode quotavmallika2015-03-182-52/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ========================================================================== Inode quota ========================================================================== = Currently, the only way to retrieve the number of files/objects in a = = directory or volume is to do a crawl of the entire directory/volume. = = This is expensive and is not scalable. = = = = The proposed mechanism will provide an easier alternative to determine = = the count of files/objects in a directory or volume. = = = = The new mechanism proposes to store count of objects/files as part of = = an extended attribute of a directory. Each directory's extended = = attribute value will indicate the number of files/objects present = = in a tree with the directory being considered as the root of the tree. = = = = The count value can be accessed by performing a getxattr(). = = Cluster translators like afr, dht and stripe will perform aggregation = = of count values from various bricks when getxattr() happens on the key = = associated with file/object count. = A new interface is introduced: ------------------------------ limit-objects : limit the number of inodes at directory level list-objects : list the directories where the limit is set remove-objects : remove the limit from the directory ========================================================================== CLI COMMAND: gluster volume quota <volname> limit-objects <path> <number> [<percent>] * <number> is a hard-limit for number of objects limitation for path "<path>" If hard-limit is exceeded, creation of file/directory is no longer permitted. * <percent> is a soft-limit for number of objects creation for path "<path>" If soft-limit is exceeded, a warning is issued for each creation. CLI COMMAND: gluster volume quota <volname> remove-objects [path] ========================================================================== CLI COMMAND: gluster volume quota <volname> list-objects [path] ... Sample output: ------------------ Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? ------------------------------------------------------------------------ -------------------------------------- /dir 10 80% 10 0 Yes Yes ========================================================================== [root@snapshot-28 dir]# ls a b file11 file12 file13 file14 file15 file16 file17 [root@snapshot-28 dir]# touch a1 touch: cannot touch `a1': Disk quota exceeded * Nine files are created in directory "dir" and directory is included in * the count too. Hence the limit "10" is reached and further file creation fails ========================================================================== Note: We have also done some re-factoring in cli for volume name validation. New function cli_validate_volname is created ========================================================================== Change-Id: I1823497de4f790a2a20ebb1770293472ea33ee2b BUG: 1190108 Signed-off-by: Sachin Pandit <spandit@redhat.com> Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9769 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Change the subvolume encoding in d_off to be a "global"Dan Lambright2015-03-189-176/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | position in the graph rather than relative (local) to a particular translator. Encoding the volume in this way allows a single translator to manage which brick is currently being scanned for directory entries. Using a single translator minimizes allocated bits in the d_off. It also allows multiple DHT translators in the same graph to have a common frame of reference (the graph position) for which brick is being read. Multiple DHT translators are needed for the Tiering feature. The fix builds off a previous change (9332) which removed subvolume encoding from AFR. The fix makes an equivalent change to the EC translator. More background can be found in fix 9332 and gluster-dev discussions [1]. DHT and AFR/EC are responsibile (as before) for choosing which brick to enumerate directory entries in over the readdir lifecycle. The client translator receiving the readdir fop encodes the dht_t. It is referred to as the "leaf node" in the graph and corresponds to the brick being scanned. When DHT decodes the d_off, it translates the leaf node to a local subvolume, which represents the next node in the graph leading to the brick. Tracking of leaf nodes is done in common utility functions. Leaf nodes counts and positional information are updated on a graph switch. [1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8 BUG: 1190734 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9688 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* dht: suggest to add more bricks when min-free-disk is exceeded.Humble Devassy Chirammal2015-03-182-3/+3
| | | | | | | | | | Change-Id: I628fbd99c2478fcb8bb6e5be55e43467f25227bf BUG: 1165870 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9879 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Lalatendu Mohanty <lmohanty@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/afr: Make read child match check in afr optionalKrutika Dhananjay2015-03-183-0/+19
| | | | | | | | | | | | | | | Having this particular check which was introduced by commit c78998c39f0857ea7aacba360632c148afc54a55 causes a drop in performance in readdirp. So the behavior is made configurable with this patch. Change-Id: I2858fc18b3539df7aa6d3f489e0d5cfaeb8a9b3c BUG: 1202669 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9917 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: remove stale index entriesRavishankar N2015-03-173-4/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: During pre-op phase, the index xlator 1. Creates the entry inside .glusterfs/indices/xattrop 2. Winds the xattrop fop to posix to mark dirty/pending changelogs. If the brick crashes after 1, the xattrop entry becomes stale and never gets removed by shd during subsequent crawls because there is nothing to heal (changelogs are zero). Though the stale entry does not get displayed in the output of 'heal info' command, it nevertheless stays there forever unless a new write transaction is performed on the file. Fix: During index self-heal if afr xattrs are found to be clean (indicated by ret value of 2 on a call to afr_shd_selfheal(), send a dummy post-op with all 0s for the xattr values, which makes the index xlator to unlink the stale entry. Change-Id: I02cb2bc937f2e3f3f3cb35d67b006664dc7ef919 BUG: 1190069 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9714 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Convert quota size from n/w to host order before useKrutika Dhananjay2015-03-121-0/+4
| | | | | | | | | | | | Change-Id: I3e4fe15716556441546fcd62b8ac2833869b21cf BUG: 1200670 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9853 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/ec: Add self-heal-daemon command handlersPranith Kumar K2015-03-098-39/+719
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the changes required in ec xlator to handle index/full heal. Index healer threads: Ec xlator start an index healer thread per local brick. This thread keeps waking up every minute to check if there are any files to be healed based on the indices kept in index directory. Whenever child_up event comes, then also this index healer thread wakes up and crawls the indices and triggers heal. When self-heal-daemon is disabled on this particular volume then the healer thread keeps waiting until it is enabled again to perform heals. Full healer threads: Ec xlator starts a full healer thread for the local subvolume provided by glusterd to perform full crawl on the directory hierarchy to perform heals. Once the crawl completes the thread exits if no more full heals are issued. Changed xl-op prefix GF_AFR_OP to GF_SHD_OP to make it more generic. Change-Id: Idf9b2735d779a6253717be064173dfde6f8f824b BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9787 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Handle getxattr of quota-size keyPranith Kumar K2015-03-093-61/+103
| | | | | | | | | | | | | Afr needs to query QUOTA_SIZE_KEY from all the subvolumes and return the value which is maximum of the readable bricks. Change-Id: Ibb9064c8652aea0d984796e7a06f8adca72aa971 BUG: 1199431 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9820 Reviewed-by: Anuradha Talur <atalur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
* cluster/dht: create request dictionary if necessary during refreshRaghavendra G2015-03-061-10/+17
| | | | | | | | | layout. Change-Id: I5a5d793c86ee5de345608eede5618e4e6c02af9f BUG: 1195668 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/9733
* cluster/afr: Implementation of quorum-readsPranith Kumar K2015-03-055-2/+32
| | | | | | | | | | | Provide a way of disabling reads when quorum is not met. Change-Id: Ic4f57c2b87a0b8514600759de3a7a47e217fe3b5 BUG: 1187885 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9543 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Allow heal on name less locPranith Kumar K2015-03-053-14/+39
| | | | | | | | | | | | loc->parent may not always be populated. Even in those cases, self-heal should happen if it can be completed using nameless loc. Change-Id: I8871fc811bec8b881ae7fb09dcd202c6693b9877 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9717 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* testing: Switch to cmocka the successor of cmockery2Niels de Vos2015-03-053-18/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | This uses https://cmocka.org/ as the unit testing framework. With this change, unit testing is made optional as well. We assume there is no cmocka available while building. cmocka will be enabled by default later on. For now, to build with cmocka run: $ ./configure --enable-cmocka This change is based on the work of Andreas (replacing cmockery2 with cmocka) and Kaleb (make cmockery2 an optional build dependency). The only modifications I made, are additional #defines in unittest.h for making sure the unit tests function as expected. Change-Id: Iea4cbcdaf09996b49ffcf3680c76731459cb197e BUG: 1067059 Merged-change: http://review.gluster.org/9762/ Signed-off-by: Andreas Schneider <asn@samba.org> Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Change-Id: Ia2e955481c102d5dce17695a9205395a6030e985 Reviewed-on: http://review.gluster.org/9738 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* xlators/dht : divide by zero coverity fixManikandan Selvaganesh2015-03-051-1/+2
| | | | | | | | | | | | | CID:1226163. BUG: 789278 Change-Id: Ie31d65da236d7029784defad963672b2ded2676a BUG:1192435 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9563 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: fixes to should_layout_fix logicRaghavendra G2015-03-052-12/+53
| | | | | | | | | | | | | | * Don't consider "dir-spread-count" option. This option is not supported. * Consider transition to weighted to equal distribution or vice-versa a valid case for fixing the layout. Change-Id: I0dcfe555dae9269ce20a41611cfdaa4f96c9e98b BUG: 1196615 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/9809 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/afr: Do not increment healed_count if no healing was performedKrutika Dhananjay2015-03-047-67/+92
| | | | | | | | | | | | | | | | | | | | | | | | PROBLEM: When file modifications are happening while index heal is launched, index healer could pick up entries which appeared in indices/xattrop transiently during the course of the operations on the mount point, and do not really need any heal. This will cause index healer to keep doing index-heal in a loop as long as it finds this entry, by believing that it did successfully heal some gfids even when it didn't. FIX: afr_selfheal() now returns a 1 to indicate that it did not (need to) heal a given gfid. afr_shd_selfheal() will not increment healed_count whenever afr_selfheal() returns a 1. Change-Id: I0d97e11392a032a852e8c6508f691300ef0e5b98 BUG: 1194305 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9713 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glfs_fini: Clean up all the resources allocated in glfs_new.Poornima G2015-03-041-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially even after calling glfs_fini(), all the threads created during init and many other resources like memory pool, iobuf pool, event pool and other memory allocs were not being freed. With this patch these resources are freed in glfs_fini(). The two thumb rules followed in this patch are: - The threads are not killed, they are made to exit voluntarily, once the queued tasks are completed. The main thread waits for the other threads to exit. - Free the memory pools and destroy the graphs only after all the other threads are stopped, so that there are less chances of hitting access after free. Resources freed and its order: 1. Destroy the inode table of all the graphs - Call forget on all the inodes. This will not be required when the cleanup during graph switch is implemented to perform inode table destroy. 2. Deactivate the current graph, call fini of all the xlators. 3. Syncenv destroy - Join the synctask threads and cleanup syncenv resources Sets the destroy mode, complete the existing synctasks, then join the synctask threads. After entering the destroy mode, -if a new synctask is submitted, it fails. -if syncenv_new() is called, it will end up creating new threads, but this is called only during init. 4. Poller thread destroy Register an event handler which sets the destroy mode for the poller. Once the poller is done processing all the events, it exits. 5. Tear down the logging framework The log file is closed and the log level is set to none, after this point no log messages appear either in log file or in stderr. 6. Destroy the timer thread Set the destroy bit, once the pending timer events are processed the timer thread exits. Note: Log infrastructure should be shutdown before destroying the timer thread as gf_log uses timers. 7. Destroy the glusterfs_ctx_t For all the graphs(active and passive), free graph, xlator structs and few other lists. Free the memory pools - iobuf pool, event pool, dict, logbuf pool, stub mem pool, stack mem pool, frame mem pool. Few things not addressed in this patch: 1. rpc_transport object not destroyed, the PARENT_DOWN should have destroyed this object but has not, needs to be addressed as a part of different patch 2. Each xlator fini should clean up the local pool allocated by its xlator. Needs to be addresses as a part of different patch. 3. Each xlator should implement forget to free its inode_ctx. Needs to be addresses as a part of different patch. 3. Few other leaks reported by valgrind. 4. fd and fd contexts The numbers: The resource usage by the test case in this patch: Without the fix, Memory: ~3GB; Threads: ~81 With this fix, Memory: 300MB; Threads: 1(main thread) Change-Id: I96b9277541737aa8372b4e6c9eed380cb871e7c2 BUG: 1093594 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/7642 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/dht: Fixes to should_fix_layout logicRaghavendra G2015-03-031-6/+38
| | | | | | | | | | | | | | | | | * With recent introduction of locking in self-heal codepath, fix layout was not allowed to progress during remove-brick. This patch fixes the issue. * dht_should_fix_layout also considers "dir-spread-count" option if set, to determine whether we should proceed with fix-layout or not. Change-Id: Icd96986f7af705744131d62e7f1456114ac1ee53 BUG: 1196615 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/9764 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com>
* dht : logically dead code removedManikandan Selvaganesh2015-03-032-12/+4
| | | | | | | | | | | | | CID :1124378 1124401 Change-Id: Ib48e4a8d3fb12c4e0323a3946afb46eeb3926984 BUG: 789278 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/9584 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* libglusterfs: Moved common functions as utils in syncop/common-utilsPranith Kumar K2015-02-273-138/+11
| | | | | | | | | | | | | | | These will be used by both afr and ec. Moved syncop_dirfd, syncop_ftw, syncop_dir_scan functions also into syncop-utils.c Change-Id: I467253c74a346e1e292d36a8c1a035775c3aa670 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9740 Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Propagate an event only after hearing the same from all subvolumesAnoop C S2015-02-261-0/+3
| | | | | | | | | | | | | | | | | In dht_notify(), we propagate each event without checking whether all subvolumes have reported the same event earlier. As a result separate events are being forwarded for each dht-subvolume. This change is to make sure that we propagate a particular event only if all other subvolumes have already reported the same event once earlier. Change-Id: I6c73fa105e967f29648af9e2030f91a94f2df130 BUG: 1176543 Signed-off-by: Anoop C S <achiraya@redhat.com> Reviewed-on: http://review.gluster.org/9322 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht: fix for dht_lock_count() compile errorNiels de Vos2015-02-262-2/+2
| | | | | | | | | | | | | | | dht-common.h includes a function definition with "inline", but the function is not declared in the header. Dropping the "inline" compile directive so that linking against .o files works correctly. BUG: 1196650 Change-Id: I105be591125b29cd455769b0c4ff22d6e139227d Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9760 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: serialize execution of dht_discover_complete andRaghavendra G2015-02-251-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | STACK_DESTROY (frame). In the current code, dht_discover_complete can be invoked because of: 1. attempt_unwind is true 2. we are processing reply from the last subvolume In scenario 1, following race is possible: T1: calls dht_frame_return. T2: calls dht_frame_return. This happens to be last call and hence it invokes dht_discover_complete, goes ahead and destroys frame T1: since attempt_unwind is true, calls dht_discover_complete. However, since frame is already freed, call to dht_discover_complete can result in a crash. The fix is to make sure that destruction of the frame is done only by the thread executing dht_discover_complete. Change-Id: I45765b90c4a9d0af0b33f8911b564d99e12d099e BUG: 1195120 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/9729 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com>
* cluster/afr : provide split-brain info by using getxattrAnuradha2015-02-253-0/+140
| | | | | | | | | | | | | | | | | | | | | | This patch is one part to enable users analyze and resolve split-brain. Problem : To know if a file is in data/metadata split-brain Solution : Performing "getfattr -n afr.split-brain-status <path-to-file>" from the mount provides this information. Also provides the list of afr children to analyse to get more information. Change-Id: I4d9b429794759a906371416cb84c84a212e2c7b9 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9633 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* ec: Add trusted.ec.dirty xattrXavier Hernandez2015-02-238-121/+297
| | | | | | | | | | | | | | | | | This xattr will be incremented before each data modifying operation and decremented after it. This will add the possibility to detect partially updated writes and refuse them on reads. It will also be useful for interacting with index xlator and have a way to heal dispersed files from the self-heal daemon. Change-Id: Ie644a8dd074ae0f254c809c5863bdb030be5486a BUG: 1190581 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9607 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: synchronize with other concurrent healers while healing layout.Raghavendra G2015-02-205-41/+562
| | | | | | | | | | | | | | | | | | | | | | | | | | Current layout heal code assumes layout setting is idempotent. This allowed multiple concurrent healers to set the layout without any synchronization. However, this is not the case as different healers can come up with different layout for same directory and making layout setting non-idempotent. So, we bring in synchronization among healers to 1. Not to overwrite an ondisk well-formed layout. 2. Refresh the in-memory layout with the ondisk layout if in-memory layout needs healing and ondisk layout is well formed. This patch can synchronize 1. among multiple healers. 2. among multiple fix-layouts (which extends layout to consider added or removed brick) 3. (but) not between healers and fix-layouts. So, the problem of in-memory stale layouts (not matching with layout ondisk), is not _completely_ fixed by this patch. Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ia285f25e8d043bb3175c61468d0d11090acee539 BUG: 1176008 Reviewed-on: http://review.gluster.org/9302 Reviewed-by: N Balachandran <nbalacha@redhat.com>
* Storage/posix : Adding error checks in path formationNithya Balachandran2015-02-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Renaming directories can cause the size of the buffer required for posix_handle_path to increase between the first call, which calculates the size, and the second call which forms the path in the buffer allocated based on the size calculated in the first call. The path created in the second call overflows the allocated buffer and overwrites the stack causing the brick process to crash. The fix adds a buffer size check to prevent the buffer overflow. It also checks and returns an error if the posix_handle_path call is unable to form the path instead of working on the incomplete path, which is likely to cause subsequent calls using the path to fail with ELOOP. Preventing buffer overflow and handling errors BUG: 1113960 Change-Id: If3d3c1952e297ad14f121f05f90a35baf42923aa Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/9289 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* cluster/dht: Fix dht_link to follow files under migrationShyam2015-02-171-19/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if a file is under migration, a hardlink to that file is lost post migration of the file. This is due to the fact that the hard link is created against the cached subvol of the source and as the source is under migration, it shifts to a linkto file post migration. Thus losing the hardlink. This change follows the stat information that triggers a phase1/2 detection for a file under migration, to create the link on the new subvol that the source file is migrating to. Thereby preserving the hard link post migration. NOTES: The test case added create a ~1GB file, so that we can catch the file during migration, smaller files may not capture this state and the test may fail. Even if migration of the file fails, we would only be left with stale linkto files on the subvol that the source was migrating to, which is not a problem. This change would create a double linkto, i.e new target hashed subvol would point to old source cached subol, which would point to the real cached subvol. This double redirection although not handled directly in DHT, works as lookup searches everywhere on hitting linkto files. The downside is that it never heals the new target hashed subvol linkto file, which is another bug to be resolved (does not cause functional impact). Change-Id: I871e6885b15e65e05bfe70a0b0180605493cb534 BUG: 1161311 Signed-off-by: Shyam <srangana@redhat.com> Reviewed-on: http://review.gluster.org/9105 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: susant palai <spalai@redhat.com> Reviewed-by: venkatesh somyajulu <vsomyaju@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Modify dht-rebalance.c to use trusted.distribute.migrate-dataDan Lambright2015-02-133-5/+25
| | | | | | | | | | | | | | | rather than distribute.migrate-data and work with CLI. The change makes this work both when it is internally driven and from the shell. The problem is further described in bugzilla # 1147107. Change-Id: I4fe04cae661dca25432530ddf5ac6ff2c957d6b3 BUG: 1147107 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9284 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* cluster/afr: Enable auto-quorum for replicate with odd number of bricksPranith Kumar K2015-02-091-8/+16
| | | | | | | | | | | Change-Id: I908934f1f22cf7d2d0ceccc0dedf28a69861997f BUG: 1187885 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9517 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* cluster/afr: Re-introduce heal-timeout optionPranith Kumar K2015-02-063-1/+14
| | | | | | | | | | | Change-Id: I87484c810006a92ed7489284b6d74e9b0aecae80 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9598 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: Provide syncop_ftw and syncop_dir_scan utilsPranith Kumar K2015-02-061-257/+117
| | | | | | | | | | | | | | | | | ftw provides file tree walk. dir_scan does just a readdir not readdirp. Also changed Afr's self-heal-daemon's crawling functions to use this. These utils will be used by ec in future to do proactive/full healing. Change-Id: I05715ddb789592c1b79a71e98f1e8cc29aac5c26 BUG: 1177601 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9485 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* ec: Special handling of anonymous fdXavier Hernandez2015-02-051-11/+11
| | | | | | | | | | | | | | | | | | | | Anonymous file descriptors need to be handled specially because they can be used in some non standard ways (i.e. an anonymous fd can be used without having been opened). This caused NFS to fail on some operations because ec always expected to have a previous successful opendir call (from patch http://review.gluster.org/9098/). This patch treats all anonymous fd as opened on all subvolumes. Change-Id: I09dbbce2ffc1ae3a5bcbb328bed55b84f4f0b9f8 BUG: 1187474 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9513 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Fix parent read subvol selection policy in lookupKrutika Dhananjay2015-02-041-1/+1
| | | | | | | | | | | | | | | | | When lookup has succeeded on multiple subvols of AFR (including the read child of the parent dir) and all of them are "readable", ideally the call must be unwound with postparent from the parent's read child. But that is not the case, due to a bug introduced in the commit c78998c39f0857ea7aacba360632c148afc54a55. This patch fixes the issue. Change-Id: I83b0c26494a5d0bdbc30fcbe974fbdb6f7e9c84a BUG: 1179169 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9569 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/ec: Wait for all bricks to notify before notifying parentPranith Kumar K2015-02-021-14/+34
| | | | | | | | | | | | | This is to prevent spurious heals that can result in self-heal. Change-Id: I0b27c1c1fc7a58e2683cb1ca135117a85efcc6c9 BUG: 1179180 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9523 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* Compare key with GF_XATTR_LOCKINFO_KEY with length of GF_XATTR_LOCKINFO_KEY ↵Dennis Schafroth2015-02-021-1/+1
| | | | | | | | | | | | | | | | instead of length of 0 BUG: 1187952 Change-Id: I0a97c553e85a0f9260ab01d4b48c64831bf67c18 Signed-off-by: Dennis Schafroth <dennis@schafroth.com> Reviewed-on: http://review.gluster.org/9518 Reviewed-by: Joe Julian <joe@julianfamily.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: When parent and entry read subvols are different, set ↵Krutika Dhananjay2015-02-022-23/+94
| | | | | | | | | | | | | | | entry->inode to NULL That way a lookup would be forced on the entry, and its attributes will always be selected from its read subvol. Change-Id: Iaba25e2cd5f83e983fc8b1a1f48da3850808e6b8 BUG: 1179169 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9477 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr : Change in heal info split-brain commandAnuradha2015-01-302-5/+12
| | | | | | | | | | | | | Implementation of heal info split-brain command with glfs-heal. Change-Id: I233eb790de6eb5468a4cbb12a1cef0f97db2a1d2 BUG: 1183019 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9459 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: Don't write to sparse regions of sink.Ravishankar N2015-01-301-2/+39
| | | | | | | | | | | | | | | | | | | | Problem: When data-self-heal-algorithm is set to 'full', shd just reads from source and writes to sink. If source file happened to be sparse (VM workloads), we end up actually writing 0s to the corresponding regions of the sink causing it to lose its sparseness. Fix: If the source file is sparse, and the data read from source and sink are both zeros for that range, skip writing that range to the sink. Change-Id: I787b06a553803247f43a40c00139cb483a22f9ca BUG: 1166020 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/9480 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* doc: Fix typos in stripe xlator src, no code changes presentHumble Chirammal2015-01-281-5/+5
| | | | | | | | | Change-Id: Icec4e6b0fe7845667b59e13e261734a5f69eba7e BUG: 1075417 Signed-off-by: Humble Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/8264 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/ec: Handle CHILD UP/DOWN in all casesPranith Kumar K2015-01-282-104/+134
| | | | | | | | | | | | | | | | | | | | | Problem: When all the bricks are down at the time of mounting the volume, then mount command hangs. Fix: 1. Ignore all CHILD_CONNECTING events comming from subvolumes. 2. On timer expiration (without enough up or down childs) send CHILD_DOWN. 3. Once enough up or down subvolumes are detected, send the appropriate event. When rest of the subvols go up/down without changing the overall ec-up/ec-down send CHILD_MODIFIED to parent subvols. Change-Id: Ie0194dbadef2dce36ab5eb7beece84a6bf3c631c BUG: 1179180 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/9396 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* ec: Fix posix compliance failuresXavier Hernandez2015-01-286-87/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch solves some problems that caused dispersed volumes to not pass posix smoke tests: * Problems in open/create with O_WRONLY Opening files with -w- permissions using O_WRONLY returned an EACCES error because internally O_WRONLY was replaced with O_RDWR. * Problems with entrylk on renames. When source and destination were the same, ec tried to acquire the same entrylk twice, causing a deadlock. * Overwrite of a variable when reordering locks. On a rename, if the second lock needed to be placed at the beggining of the list, the 'lock' variable was overwritten and later its timer was cancelled, cancelling the incorrect one. * Handle O_TRUNC in open. When O_TRUNC was received in an open call, it was blindly propagated to child subvolumes. This caused a discrepancy between real file size and the size stored into trusted.ec.size xattr. This has been solved by removing O_TRUNC from open and later calling ftruncate. Change-Id: I20c3d6e1c11be314be86879be54b728e01013798 BUG: 1161886 Signed-off-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-on: http://review.gluster.org/9420 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>