summaryrefslogtreecommitdiffstats
path: root/libglusterfs
Commit message (Collapse)AuthorAgeFilesLines
...
* libglusterfs: Implementation of sync lock as recursive lock to avoid crash.anand2015-04-302-54/+146
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem : In glusterd,we are using big lock which is implemented based on sync task frame work for thread synchronization and rcu lock for data consistency. sync task frame work swap the threads if there is no worker poll threads available,due to this rcu lock and rcu unlock was happening in different threads (urcu-bp will not allow this),resulting into glusterd crash. fix : To avoid releasing the sync lock(big lock) in between rcu critical section,implemented sync lock as recursive lock. More details: link : http://www.spinics.net/lists/gluster-devel/msg14632.html Change-Id: I2b56c1caf3f0470f219b1adcaf62cce29cdc6b88 BUG: 1216942 Signed-off-by: anand <anekkunt@redhat.com> Reviewed-on: http://review.gluster.org/10285 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System Reviewed-by: Vijay Bellur <vbellur@redhat.com> (cherry picked from commit ada6b3a8800867934af57a57d5312f5a5d8374f0) Reviewed-on: http://review.gluster.org/10432 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* timer: Fix use after free issuePranith Kumar K2015-04-261-1/+5
| | | | | | | | | | BUG: 1215173 Change-Id: Ic0fcfa009d1e4828ffd245154cd3e042a1b2bc2e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10368 Tested-by: NetBSD Build System Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: avoid crashes in gf_msg dup-detection codeJeff Darcy2015-04-221-0/+13
| | | | | | | | | | | | | | | | | Use global_xlator for allocations so that we don't try to free objects belonging to an already-deleted translator (which will crash). Change-Id: Ie72a546e7770cf5cb8a8370e22448c8d09e3ab37 BUG: 1214220 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/10319 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/10330 Tested-by: NetBSD Build System
* quota/disperse: handle inode quotas in xattr aggregatevmallika2015-04-142-7/+25
| | | | | | | | | | | | | | | | | | | | with the inode quota feature, quota size is now increased from 64bit to 192bits which contains values of 'file size', 'file count' and 'dir count' This change in quota size xattr needs to be handled in disperse xattr aggregation Signed-off-by: vmallika <vmallika@redhat.com> Change-Id: I5fd28aa9f5b8b6cba83a98360236417a97ac16ee BUG: 1207967 Reviewed-on: http://review.gluster.org/10112 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Sachin Pandit <spandit@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: Replace transaction peers listsKaushal M2015-04-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Transaction peer lists were used in GlusterD to peers belonging to a transaction. This was needed to prevent newly added peers performing partial transactions, which could be incorrect. This was accomplished by creating a seperate transaction peers list at the beginning of every transaction. A transaction peers list referenced the peerinfo data structures of the peers which were present at the beginning of the transaction. RCU protection of peerinfos referenced by the transaction peers list is a hard problem and difficult to do correctly. To have proper RCU protection of peerinfos, the transaction peers lists have been replaced by an alternative method to identify peers that belong to a transaction. The alternative method is to the global peers list along with generation numbers to identify peers that should belong to a transaction. This change introduces a global peer list generation number, and a generation number for each peerinfo object. Whenever a peerinfo object is created, the global generation number is bumped, and the peerinfos generation number is set to the bumped global generation. With the above changes, the algorithm to identify peers belonging to a transaction with RCU protection is as follows, - At the beginning of a transaction, the current global generation number is saved - To identify if a peers belonging to the transaction, - Start a RCU read critical section - For each peer in the global peers list, - If the peers generation number is not greater than the saved generation number, continue with the action on the peer - End the RCU read critical section The above algorithm guarantees that, - The peer list is not modified when a transaction is iterating through it - The transaction actions are only done on peers that were present when the transaction started But, as a transaction could iterate over the peers list multiple times, the algorithm cannot guarantee that same set of peers will be selected every time. A peer could get deleted between two iterations of the list within a transaction. This problem existed with transaction peers list as well, but unlike before now it will not lead to invalid memory access and potential crashes. This problem will be addressed seprately. This change was developed on the git branch at [1]. This commit is a combination of the following commits on the development branch. 52ded5b Add timespec_cmp 44aedd8 Add create timestamp to peerinfo 7bcbea5 Fix some silly mistakes 13e3241 Add start time to opinfo 17a6727 Use timestamp comparisions to identify xaction peers instead of a xaction peer list 3be05b6 Correct check for peerinfo age 70d5b58 Use read-critical sections for peer list iteration ba4dbca Use peerinfo timestamp checks in op-sm instead of xaction peer list d63f811 Add more peer status checks when iterating peers list in glusterd-syncop 1998a2a Timestamp based peer list traversal of mgmtv3 xactions f3c1a42 Remove transaction peer lists b8b08ee Remove unused labels 32e5f5b Remove 'npeers' usage a075fb7 Remove 'npeers' from mgmt-v3 framework 12c9df2 Use generation number instead of timestamps. 9723021 Remove timespec_cmp 80ae2c6 Remove timespec.h include a9479b0 Address review comments on 10147/4 [1]: https://github.com/kshlm/glusterfs/tree/urcu Change-Id: I9be1033525c0a89276f5b5d83dc2eb061918b97f BUG: 1205186 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/10147 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-by: Anand Nekkunti <anekkunt@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* build: make contrib/uuid dependency optionalNiels de Vos2015-04-108-37/+110
| | | | | | | | | | | | | | | | | | | On Linux systems we should use the libuuid from the distribution and not bundle and statically link the contrib/uuid/ bits. libglusterfs/src/compat-uuid.h has been introduced and should become an abstraction layer for different UUID APIs. Non-Linux operating systems should implement their compatibility layer there. Once all operating systems have an implementation in compat-uuid.h, we can remove contrib/uuid/ from the repository completely. Change-Id: I345e5357644be2521685e00358bb8c83c4ea0577 BUG: 1206587 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10129 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* quota/cli: validate quota hard-limit optionvmallika2015-04-102-16/+53
| | | | | | | | | | | | | | | | | | | Quota hard-limit is supported only upto: 9223372036854775807 (int 64) In CLI, it is allowed to set the value upto 16384PB (unsigned int 64), this is not a valid value as the xattrop for quota accounting and the quota enforcer operates on a signed int64 limit value. This patches fixes the problem in CLI and allows user to set the hard-limit value only from range 0 - 9223372036854775807 Change-Id: Ifce6e509e1832ef21d3278bacfa5bd71040c8cba BUG: 1206432 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/10022 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: avoid possibility of crash when ctx is nullHumble Devassy Chirammal2015-04-081-0/+3
| | | | | | | | | | | | | | ctx is passed to gf_log_inject_timer_event() and pass through to __gf_log_inject_timer_event() where the struct members are getting dereferenced, and can cause crash if the passed ctx is null. This patch avoids the issue. Change-Id: I153dbb5d3744898429139e3d40bb4f0e9093632a BUG: 1208118 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/10102 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* libglusterfs/syncop: Add xdata to all syncop callsRaghavendra Talur2015-04-083-169/+509
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for xdata in both the request and response path of syncops. Few calls like lookup already had the support; have renamed variables in few places to maintain uniformity. xdata passed downwards is known as xdata_in and xdata passed upwards is known as xdata_out. There is an old patch by Jeff Darcy at http://review.gluster.org/#/c/8769/3 which does the same for some selected calls. It also brings in xdata support at gfapi level. xdata support at gfapi level would be introduced in subsequent patches. Change-Id: I340e94ebaf2a38e160e65bc30732e8fe1c532dcc BUG: 1158621 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9859 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* nfs : introducing logging framework for nfs translatorManikandan Selvaganesh2015-04-081-0/+4
| | | | | | | | | | Change-Id: I3a47cdd06595c87da8e822d11683d68b43c11cda BUG: 1194640 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/9945 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* ctr : Fix for heating of files during promotion/demotionJoseph Fernandes2015-04-084-19/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix will solve the heating of the files during the promotion or demotion. Promotion: ~~~~~~~~~ When a file gets promoted it get the current time stamp during creation only, but following writes or reads during the migration wont heat the file. Demotion: ~~~~~~~~ When a file gets demoted it get the wind/unwind time stamp is set to zero. The following writes or reads during the migration wont heat the file. What is remaining ? ~~~~~~~~~~~~~~~~~ Bug 1209129 ( https://bugzilla.redhat.com/show_bug.cgi?id=1209129 ) Inspite of this fix there is still a issue remaining, i.e the heat of the file is not keep intact during a internal rebalance activity i.e a rebalance within a tier. Change-Id: I01e82dc226355599732d40e699062cee7960b0a5 BUG: 1207867 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10080 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* bitrot/scrub: Scrubber fixesVenky Shankar2015-04-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a handful of problem with scrubber which are detailed below. Scrubber used to skip objects for verification due to missing fd iterface to fetch versioning extended attributes. Similar to the inode interface, an fd based interface in POSIX is now introduced. Moreover, this patch also fixes potential false reporting by scrubber due to: An object gets dirtied and signed when scrubber is busy calculatingobject checksum. This is fixed by caching the signed version when an object is first inspected for stalenes, i.e., during pre-compute stage. This version is used to verify checksum in the post-compute stage when the signatures are compared for possible corruption. Side effect of _not_ sending signature length during signing resulted in "truncated" signature to be set for an object. Now, at the time of signing, the signature length is sent and is used in place of invoking strlen() to get signature length (which could have possible 00s). The signature length itself is not persisted in the signature xattr, but is calculated on-the-fly by substracting the xattr length by the "structure" header size. Some of the log entries are made more meaningful (as and aid for debugging). Change-Id: I938bee5aea6688d5d99eb2640053613af86d6269 BUG: 1207624 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/10118 Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: fix tier.c problems found prior to feature freezeDan Lambright2015-04-062-1/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch resolves tiering translator issues taken from the list in bug 1203776. These issues have been selected to be fixed first. The rest will be fixed in a subsequent patch (or are not a problem). 3. Replace hardcoded #defines of promote/demote file names 6. Use loc_wipe() in migrate_using_query_file() 9. Only promote/demote files on the same node on which they reside. 14. Replace calloc with GF_CALLOC in tier.c and ensure freeing done properly. 15. Handle if parse_query_str fails 22. Only load gfdb library on server side, remove SQL references from client. Change-Id: I6563b11e58ab2e4c6b1ce44db755781ad6d930fb BUG: 1203776 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9987 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* Avoid conflict between contrib/uuid and system uuidEmmanuel Dreyfus2015-04-047-34/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glusterfs relies on Linux uuid implementation, which API is incompatible with most other systems's uuid. As a result, libglusterfs has to embed contrib/uuid, which is the Linux implementation, on non Linux systems. This implementation is incompatible with systtem's built in, but the symbols have the same names. Usually this is not a problem because when we link with -lglusterfs, libc's symbols are trumped. However there is a problem when a program not linked with -lglusterfs will dlopen() glusterfs component. In such a case, libc's uuid implementation is already loaded in the calling program, and it will be used instead of libglusterfs's implementation, causing crashes. A possible workaround is to use pre-load libglusterfs in the calling program (using LD_PRELOAD on NetBSD for instance), but such a mechanism is not portable, nor is it flexible. A much better approach is to rename libglusterfs's uuid_* functions to gf_uuid_* to avoid any possible conflict. This is what this change attempts. BUG: 1206587 Change-Id: I9ccd3e13afed1c7fc18508e92c7beb0f5d49f31a Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/10017 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* iobuf: Do not call __iobuf_ref directlyRaghavendra Talur2015-04-021-2/+2
| | | | | | | | | | | | | | | | | | iobuf_get will be creating the iobuf and hence lock is not necessary to increment ref. However, it is a good practice to call iobuf_ref instead of __iobuf_ref so that we have a single point to get refs and this can be used later to do mem accounting etc. Change-Id: I1fd328c3c463c23fd5f6df505ccb5c86f6207f28 BUG: 1199075 Signed-off-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-on: http://review.gluster.org/9812 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* libgfdb: fix possible illegal memory access (CID 1288820)Michael Adam2015-04-021-1/+2
| | | | | | | | | | | | | | | | | | Coverity CID 1288820 strncpy executed with a limit equal to the target array size potentially leaves the target string not null terminated. Make sure the copied string is a valid 0 terminated string. Change-Id: I39ff6a64ca5b9e30562226dd34c5b06267b75b87 BUG: 789278 Signed-off-by: Michael Adam <obnox@samba.org> Reviewed-on: http://review.gluster.org/10063 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Joseph Fernandes <josferna@redhat.com> Tested-by: Joseph Fernandes <josferna@redhat.com>
* function gf_string2bytesize_range should handle 'xB' byte valuesvmallika2015-03-312-1/+2
| | | | | | | | | | | Change-Id: I208289aae2423e4bb015cf33bafd2a961e1c3fc6 BUG: 1197593 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9779 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Sachin Pandit <spandit@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs : dead code fix.Manikandan Selvaganesh2015-03-311-1/+0
| | | | | | | | | | | | | CID : 1124884 Change-Id: I3332e844a01c1432f1d80a6acda7a87e8b01801c BUG: 789278 Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com> Reviewed-on: http://review.gluster.org/9677 Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/ec: Implement heal info for ecPranith Kumar K2015-03-301-1/+1
| | | | | | | | | | | | This also lists the files that are on-going I/O, which will be fixed later. Change-Id: Ib3f60a8b7e8798d068658cf38eaef2a904f9e327 BUG: 1203581 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/10020 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* Gfdb Query Fix and Volume option fixJoseph Fernandes2015-03-301-2/+2
| | | | | | | | | | | | | | 1) Query fix in find_changed_with_freq() 2) Volume option typo fix for write_freq_threshold and read_freq_threshold Change-Id: I38e154818178aab412b2d7b2914cd29acef66ffb BUG: 1207343 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/10050 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* core : free up mem_acct.rec in xlator_destroyAtin Mukherjee2015-03-301-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: We've observed that glusterd was OOM killed after some minutes when volume set command was run in a loop. Analysis: Initially the suspection was in glusterd code, but a deep dive into the codebase revealed that while validating all the options as part of graph reconfiguration at the time of freeing up the xlator object its one of the member mem_acct is left over which causes memory leak. Solution: Free up xlator's mem_acct.rec in xlator_destroy () Change-Id: Ie9e7267e1ac4ab7b8af6e4d7c6660dfe99b4d641 BUG: 1201203 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/9862 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* mem-pool: invalidate memory on GF_FREE to aid debuggingNiels de Vos2015-03-302-54/+103
| | | | | | | | | | | | | | | | | | | | | | | | Debugging where memory gets free'd with help from overwriting the memory before it is free'd with some structures (repeatedly). The struct mem_invalid starts with a magic value (0xdeadc0de), followed by a pointer to the xlator, the mem-type. the size of the GF_?ALLOC() requested area and the baseaddr pointer to what GF_?ALLOC() returned. With these details, and the 'struct mem_header' that is placed when calling GF_?ALLOC(), it is possible to identify overruns and possible use-after-free. A memory dump (core) or running with a debugger is needed to read the surrounding memory of corrupt structures. This additional memory invalidation/poisoning needs to be enabled by passing --enable-debug to ./configure. Change-Id: I9f5f37dc4b5b59142adefc90897d32e89be67b82 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/10019 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* libgfapi: revamp glfs_new functionHumble Devassy Chirammal2015-03-301-5/+7
| | | | | | | | | | | | | | | | | Current glfs_new() function is not flexible enough to error out and destroy the struct members or objects initialized just before the error path/condition. This make the structs or objects to continue or left out with partially recorded data in fs and ctx structs and cause crashes/issues later in the code path. This patch avoid the issue. Change-Id: Ie4514b82b24723a46681cc7832a08870afc0cb28 BUG: 1202492 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9903 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* glusterd: Maintain local xaction_peer list for op-smAtin Mukherjee2015-03-261-0/+1
| | | | | | | | | | | | | | | http://review.gluster.org/9269 addresses maintaining local xaction_peers in syncop and mgmt_v3 framework. This patch is to maintain local xaction_peers list for op-sm framework as well. Change-Id: Idd8484463fed196b3b18c2df7f550a3302c6e138 BUG: 1204727 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/9972 Reviewed-by: Anand Nekkunti <anekkunt@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* features/bit-rot: filesystem scrubberVenky Shankar2015-03-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scrubber performs signature verification for objects that were signed by signer. This is done by recalculating the signature (using the hash algorithm the object was signed with) and verifying it aginst the objects persisted signature. Since the object could be undergoing IO opretaion at the time of hash calculation, the signature may not match objects persisted signature. Bitrot stub provides additional information about the stalesness of an objects signature (determinted by it's versioning mechanism). This additional bit of information is used by scrubber to determine the staleness of the signature, and in such cases the object is skipped verification (although signature staleness is performed twice: once before initiation of hash calculation and another after it (an object could be modified after staleness checks). The implmentation is a part of the bitrot xlator (signer) which acts as a signer or scrubber based on a translator option. As of now the scrub process is ever running (but has some form of weak throttling mechanism during filesystem scan). Going forward, there needs to be some form of scrub scheduling and IO throttling (during hash calculation) tunables (via CLI). Change-Id: I665ce90208f6074b98c5a1dd841ce776627cc6f9 BUG: 1170075 Original-Author: Raghavendra Bhat <rabhat@redhat.com> Original-Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9914 Tested-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/bit-rot: Implementation of bit-rot xlatorVenky Shankar2015-03-2411-0/+276
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the "Signer" -- responsible for signing files with their checksums upon last file descriptor close (last release()). The event notification facility provided by the changelog xlator is made use of. Moreover, checksums are as of now SHA256 hash of the object data and is the only available hash at this point of time. Therefore, there is no special "what hash to use" type check, although it's does not take much to add various hashing algorithms to sign objects with. Signatures are stored in extended attributes of the objects along with the the type of hashing used to calculate the signature. This makes thing future proof when other hash types are added. The signature infrastructure is provided by bitrot stub: a little piece of code that sits over the POSIX xlator providing interfaces to "get or set" objects signature and it's staleness. Since objects are signed upon receiving release() notification, pre-existing data which are "never" modified would never be signed. To counter this, an initial crawler thread is spawned The crawler scans the entire brick for objects that are unsigned or "missed" signing due to the server going offline (node reboots, crashes, etc..) and triggers an explicit sign. This would also sign objects when bit-rot is enabled for a volume and/or after upgrade. Change-Id: I1d9a98bee6cad1c39c35c53c8fb0fc4bad2bf67b BUG: 1170075 Original-Author: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9711 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: check and act based on gf_malloc result.Humble Devassy Chirammal2015-03-241-0/+18
| | | | | | | | | | Change-Id: If54f4be7db8b6f98e65570b09c07251e21ebae15 BUG: 1194640 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9837 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* Bitrot StubVenky Shankar2015-03-242-0/+18
| | | | | | | | | | | | | Bitrot stub implements object versioning required for identifying signature freshness. More details about versioning is explained as a part of the "bitrot feature documentation" patch. Change-Id: I2ad70d9eb109ba4a12148ab8d81336afda529ad9 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9709 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: Add inode context merge callbackVenky Shankar2015-03-243-1/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Certain translators may require to update the inode context of an already linked inode before unwinding the call to the client. Normally, such a case in encountered during parallel operations when a fresh inode is chosen at call (wind) time. In the callback path, one of inodes is successfully linked in the inode table, thereby the other inodes being thrown away (and the inode pointers for these calls being pointed to the linked inode). Translators which may have strict dependency on the correct value in the inode context would get stale values in inode context. This patch introduces a new callback which provides gives translators an opportunity to "patch" their respective inode contexts. Note that, as of now, this callback is only invoked during create()s unwind path. Although this might needed to be done for all dentry fops and lookup, but let that be done as an when required (bitrot stub requires this *only* for create()). Change-Id: I6cd91c2af473c44d1511208060d3978e580c67a6 BUG: 1170075 Original-Author: Raghavendra Bhat <rabhat@redhat.com> Original-Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9913 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: Make use of IPC fopVenky Shankar2015-03-241-0/+5
| | | | | | | | | | | | | Translators which wish to send event notifications can send "down" an IPC FOP with op_type as GF_IPC_TARGET_CHANGELOG and xdata carrying event structures (changelog_event_t). Change-Id: I0e5f8c9170161c186f0e58d07105813e34e18786 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9775 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Add tier translator.Dan Lambright2015-03-214-2/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tier translator shares most of DHT's code. It differs in how subvolumes are chosen for I/Os, and how file migration (cache promotion and demotion) is managed. That different functionality is split to either DHT or tier logic according to the "tier_methods" structure. A cache promotion and demotion thread is created in a manner similar to the rebalance daemon. The thread operates a timing wheel which periodically checks for promotion and demotion candidates (files). Candidates are queued and then migrated. Candidates must exist on the same node as the daemon and meet other critera per caching policies. This patch has two authors (Dan Lambright and Joseph Fernandes). Dan did the DHT changes and Joe wrote the cache policies. The fix depends on DHT readidr changes and the database library which have been submitted separately. Header files in libglusterfs/src/gfdb should be reviewed in patch 9683. For more background and design see the feature page [1]. [1] http://www.gluster.org/community/documentation/index.php/Features/data-classification Change-Id: Icc26c517ccecf5c42aef039f5b9c6f7afe83e46c BUG: 1194753 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9724 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* build: pass the correct CFLAGS and LDFLAGS when linking (to) libgfdbNiels de Vos2015-03-191-2/+2
| | | | | | | | | Change-Id: Id9a7d0f457d9759ab7d0a52a4000b5ae36d211f8 BUG: 1194753 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9946 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/afr : enable inspection & resolution of files in split-brainAnuradha2015-03-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part 2/2 patch to enable users analyze and resolve split-brain. This patch enables : 1) Users to inspect the files in data and metadata split-brain. 2) Resolve the split-brain. Both using a series of setfattr commands. Consider a volume "test" with 2 bricks. 1) To inspect a file f1: setfattr -n replica.split-brain-choice -v test-client-0 f1 After the execution of this command, if no read_subvol is found, reads will be served from test-client-0 (corresponding to brick-0). 2) To resolve split-brain : setfattr -n replica.split-brain-heal-finalize -v test-client-0 f1 Execution of this command will lead to the resolution of data and metadata split-brain with subvol mentioned in the command (test-client-0 here) as the source and the rest as sink. Change-Id: Ia20f3ee5abd3119e3d54fcc599f1e55ac65fd179 BUG: 1191396 Signed-off-by: Anuradha <atalur@redhat.com> Reviewed-on: http://review.gluster.org/9743 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* Adding ChangeTimeRecorder(CTR) Xlator to GlusterFSJoseph Fernandes2015-03-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ********************************************************************** ChangeTimeRecorder(CTR) Xlator | ********************************************************************** ChangeTimeRecorder(CTR) is server side xlator(translator) which sits just above posix xlator. The main role of this xlator is to record the access/write patterns on a file residing the brick. It records the read(only data) and write(data and metadata) times and also count on how many times a file is read or written. This xlator also captures the hard links to a file(as its required by data tiering to move files). CTR Xlator is the consumer of libgfdb. To Enable/Disable CTR Xlator: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gluster volume set <volume-name> features.ctr-enabled {on/off} To Enable/Disable Frequency Counter Recording in CTR Xlator: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gluster volume set <volume-name> features.record-counters {on/off} Change-Id: I5d3cf056af61ac8e3f8250321a27cb240a214ac2 BUG: 1194753 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/9935 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterfsd: add "print-netgroups" and "print-exports" commandNiels de Vos2015-03-182-0/+5
| | | | | | | | | | | | | | | | | | | | | | | NFS now has the ability to use a separate file for "netgroups" and "exports". An administrator should have the ability to check the validity of the files before applying the configuration. The "glusterfsd" command now has the following additional arguments that can be used to check the configuration: --print-netgroups: Validate the netgroups file and print it out --print-exports: Validate the exports file and print it out BUG: 1143880 Change-Id: I24c40d50110d49d8290f9fd916742f7e4d0df85f URL: http://www.gluster.org/community/documentation/index.php/Features/Exports_Netgroups_Authentication Original-author: Shreyas Siravara <shreyas.siravara@gmail.com> CC: Richard Wareing <rwareing@fb.com> CC: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9365 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* contrib/timer-wheel: import linux kernel timer-wheelVenky Shankar2015-03-182-2/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch imports timer-wheel[1] algorithm from the linux kernel (~/kernel/time/timer.c) with some modifications. Timer-wheel is an efficent way to track millions of timers for expiry. This is a variant of the simple but RAM heavy approach of having a list (timer bucket) for every future second. Timer-wheel categorizes every future second into a logarithmic array of arrays. This is done by splitting the 32 bit "timeout" value into fixed "sliced" bits, thereby each category has a fixed size array to which buckets are assigned. A classic split would be 8+6+6+6 (used in this patch) which results in 256+64+64+64 == 512 buckets. Therefore, the entire 32 bit futuristic timeouts have been mapped into 512 buckets. [ NOTE: There are other possible splits, such as "8+8+8+8", but this patch sticks to the widely used and tested default. ] Therfore, the first category "holds" timers whose expiry range is between 1..256, the next cateogry holds 257..16384, third category 16385..1048576 and so on. When timers are added, unless it's in the first category, timers with different timeouts could end up in the same bucket. This means that the timers are "partially sorted" -- sorted in their highest bits. The expiry code walks the first array of buckets and exprires any pending timers (1..256). Next, at time value 257, timers in the first bucket of the second array is "cascaded" onto the first category and timers are placed into respective buckets according to the thier timeout values. Cascading "brings down" the timers timeout to the coorect bucket of their respective category. Therefore, timers are sorted by their highest bits of the timeout value and then by the lower bits too. [1] https://lwn.net/Articles/152436/ Change-Id: I1219abf69290961ae9a3d483e11c107c5f49c4e3 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9707 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs/rot-buffs: rotational buffersVenky Shankar2015-03-185-14/+650
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces rotational buffers aiming at the classic multiple producer and multiple consumer problem. A fixed set of buffer list is allocated during initialization, where each list consist of a list of buffers. Each buffer is an iovec pointing to a memory region of fixed allocation size. Multiple producers write data to these buffers. A buffer list starts with a single buffer (iovec) and allocates more when required (although this can be preallocatd in multiples of k). rot-buffs allow multiple producers to write data parallely with a bit of extra cost of taking locks. Therefore, it's much suited for large writes. Multiple producers are allowed to write in the buffer parallely by "reserving" write space for selected number of bytes and returning pointer to the start of the reserved area. The write size is selected by the producer before it starts the write (which is often known). Therefore, the write itself need not be serialized -- just the space reservation needs to be done safely. The other part is when a consumer kicks in to consume what has been produced. At this point, a buffer list switch is performed. The "current" buffer list pointer is safely pointed to the next available buffer list. New writes are now directed to the just switched buffer list (the old buffer list is now considered out of rotation). Note that the old buffer still may have producers in progress (pending writes), so the consumer has to wait till the writers are drained. Currently this is the slow path for producers (write completion) and needs to be improved. Currently, there is special handling for cases where the number of consumers match (or exceed) the number of producers, which could result in writer starvation. In this scenario, when a consumers requests a buffer list for consumption, a check is performed for writer starvation and consumption is denied until at least another buffer list is ready of the producer for writes, i.e., one (or more) consumer(s) completed, thereby putting the buffer list back in rotation. [ NOTE: I've not performance tested this producer-consumer model yet. It's being used in changelog for event notification. The list of buffers (iovecs) are directly passed to RPC layer. ] Change-Id: I88d235522b05ab82509aba861374a2312bff57f2 BUG: 1170075 Signed-off-by: Venky Shankar <vshankar@redhat.com> Reviewed-on: http://review.gluster.org/9706 Tested-by: Vijay Bellur <vbellur@redhat.com>
* features/quota : Introducing inode quotavmallika2015-03-185-2/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ========================================================================== Inode quota ========================================================================== = Currently, the only way to retrieve the number of files/objects in a = = directory or volume is to do a crawl of the entire directory/volume. = = This is expensive and is not scalable. = = = = The proposed mechanism will provide an easier alternative to determine = = the count of files/objects in a directory or volume. = = = = The new mechanism proposes to store count of objects/files as part of = = an extended attribute of a directory. Each directory's extended = = attribute value will indicate the number of files/objects present = = in a tree with the directory being considered as the root of the tree. = = = = The count value can be accessed by performing a getxattr(). = = Cluster translators like afr, dht and stripe will perform aggregation = = of count values from various bricks when getxattr() happens on the key = = associated with file/object count. = A new interface is introduced: ------------------------------ limit-objects : limit the number of inodes at directory level list-objects : list the directories where the limit is set remove-objects : remove the limit from the directory ========================================================================== CLI COMMAND: gluster volume quota <volname> limit-objects <path> <number> [<percent>] * <number> is a hard-limit for number of objects limitation for path "<path>" If hard-limit is exceeded, creation of file/directory is no longer permitted. * <percent> is a soft-limit for number of objects creation for path "<path>" If soft-limit is exceeded, a warning is issued for each creation. CLI COMMAND: gluster volume quota <volname> remove-objects [path] ========================================================================== CLI COMMAND: gluster volume quota <volname> list-objects [path] ... Sample output: ------------------ Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? ------------------------------------------------------------------------ -------------------------------------- /dir 10 80% 10 0 Yes Yes ========================================================================== [root@snapshot-28 dir]# ls a b file11 file12 file13 file14 file15 file16 file17 [root@snapshot-28 dir]# touch a1 touch: cannot touch `a1': Disk quota exceeded * Nine files are created in directory "dir" and directory is included in * the count too. Hence the limit "10" is reached and further file creation fails ========================================================================== Note: We have also done some re-factoring in cli for volume name validation. New function cli_validate_volname is created ========================================================================== Change-Id: I1823497de4f790a2a20ebb1770293472ea33ee2b BUG: 1190108 Signed-off-by: Sachin Pandit <spandit@redhat.com> Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9769 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Adding Libgfdb to GlusterFSJoseph Fernandes2015-03-1811-1/+4259
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ************************************************************************* Libgfdb | ************************************************************************* Libgfdb provides abstract mechanism to record extra/rich metadata required for data maintenance, such as data tiering/classification. It provides consumer with API for recording and querying, keeping the consumer abstracted from the data store used beneath for storing data. It works in a plug-and-play model, where data stores can be plugged-in. Presently we have plugin for Sqlite3. In the future will provide recording and querying performance optimizer. In the current implementation the schema of metadata is fixed. Schema: ~~~~~~ GF_FILE_TB Table: ~~~~~~~~~~~~~~~~~ This table has one entry per file inode. It holds the metadata required to make decisions in data maintenance. GF_ID (Primary key) : File GFID (Universal Unique IDentifier in the namespace) W_SEC, W_MSEC : Write wind time in sec & micro-sec UW_SEC, UW_MSEC : Write un-wind time in sec & micro-sec W_READ_SEC, W_READ_MSEC : Read wind time in sec & micro-sec UW_READ_SEC, UW_READ_MSEC : Read un-wind time in sec & micro-sec WRITE_FREQ_CNTR INTEGER : Write Frequency Counter READ_FREQ_CNTR INTEGER : Read Frequency Counter GF_FLINK_TABLE: ~~~~~~~~~~~~~~ This table has all the hardlinks to a file inode. GF_ID : File GFID (Composite Primary Key)``| GF_PID : Parent Directory GFID (Composite Primary Key) |-> Primary Key FNAME : File Base Name (Composite Primary Key)__| FPATH : File Full Path (Its redundant for now, this will go) W_DEL_FLAG : This Flag is used for crash consistancy, when a link is unlinked. i.e Set to 1 during unlink wind and during unwind this record is deleted LINK_UPDATE : This Flag is used when a link is changed i.e rename. Set to 1 when rename wind and set to 0 in rename unwind Libgfdb API: ~~~~~~~~~~~ Refer libglusterfs/src/gfdb/gfdb_data_store.h Change-Id: I2e9fbab3878ce630a7f41221ef61017dc43db11f BUG: 1194753 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/9683 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/dht: Change the subvolume encoding in d_off to be a "global"Dan Lambright2015-03-185-1/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | position in the graph rather than relative (local) to a particular translator. Encoding the volume in this way allows a single translator to manage which brick is currently being scanned for directory entries. Using a single translator minimizes allocated bits in the d_off. It also allows multiple DHT translators in the same graph to have a common frame of reference (the graph position) for which brick is being read. Multiple DHT translators are needed for the Tiering feature. The fix builds off a previous change (9332) which removed subvolume encoding from AFR. The fix makes an equivalent change to the EC translator. More background can be found in fix 9332 and gluster-dev discussions [1]. DHT and AFR/EC are responsibile (as before) for choosing which brick to enumerate directory entries in over the readdir lifecycle. The client translator receiving the readdir fop encodes the dht_t. It is referred to as the "leaf node" in the graph and corresponds to the brick being scanned. When DHT decodes the d_off, it translates the leaf node to a local subvolume, which represents the next node in the graph leading to the brick. Tracking of leaf nodes is done in common utility functions. Leaf nodes counts and positional information are updated on a graph switch. [1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8 BUG: 1190734 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/9688 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* libgfapi, timer: Fix a crash seen in timer when glfs_fini was invoked.Poornima G2015-03-171-5/+11
| | | | | | | | | | | | | | | | | | | | | | The crash is seen when, glfs_init failed for some reason and glfs_fini was called for cleaning up the partial initialization. The fix is in two folds: 1. In timer store and restore the THIS, previously it was being overwritten. 2. In glfs_free_from_ctx() and glfs_fini() check for NULL before destroying. Change-Id: If40bf69936b873a1da8e348c9d92c66f2f07994b BUG: 1202290 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/9895 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Quota/marker : Support for inode quotavmallika2015-03-175-1/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the only way to retrieve the number of files/objects in a directory or volume is to do a crawl of the entire directory/volume. This is expensive and is not scalable. The new mechanism proposes to store count of objects/files as part of an extended attribute of a directory. Each directory's extended attribute value will indicate the number of files/objects present in a tree with the directory being considered as the root of the tree. Currently file usage is accounted in marker by doing multiple FOPs like setting and getting xattrs. Doing this with STACK WIND and UNWIND can be harder to debug as involves multiple callbacks. In this code we are replacing current mechanism with syncop approach as syncop code is much simpler to follow and help us implement inode quota in an organized way. Change-Id: Ibf366fbe07037284e89a241ddaff7750fc8771b4 BUG: 1188636 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Sachin Pandit <spandit@redhat.com> Reviewed-on: http://review.gluster.org/9567 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Vijay Bellur <vbellur@redhat.com>
* every/where: add GF_FOP_IPC for inter-translator communicationJeff Darcy2015-03-1711-4/+170
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several features - e.g. encryption, erasure codes, or NSR - involve multiple cooperating translators which sometimes need a "private" means of communication amongst themselves. Historically we've used virtual or synthetic xattrs, but that's not very elegant and clutters up the getxattr/setxattr path which must also handle real xattr requests. This new fop should address that. The only argument is an int32_t "op" which should be recognized by the target translator. It is recommended that translators using these feature follow some convention regarding the ops that they define, to avoid conflicts. Using a hash of the target translator's type string as a base for a series of ops would probably be a good start. Any other information can be passed in both directions using xdata. The default behavior for this fop, as with any other, is to pass through to FIRST_CHILD. That makes use of this fop "transparent" to other translators that were written before it existed, but it also means that it only really works with pass-through translators. If a routing translator (such as DHT) or a fan-out translator (such as AFR) is involved, the IPC might not reach its intended destination unless those translators are modified to forward IPC fops along all paths. If an IPC gets all the way to storage/posix it is considered an error, much like an uncaught exception. We don't actually *do* anything in that case, but we do log it send back an EOPNOTSUPP error. This makes the "unrecognized opcode" condition distinguishable from the "no IPC support" condition (which would yield an RPC error instead) so clients can probe for the presence of a handler for their own favorite opcode and either use that or use old-school xattrs depending on the result. BUG: 1158628 Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: I84af1b17babe5b30ec03ecf027ae37d09b873968 Reviewed-on: http://review.gluster.org/8812 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* Upcall: New xlator to store various states and send cbk eventsSoumya Koduri2015-03-173-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | Framework on the server-side, to handle certain state of the files accessed and send notifications to the clients connected. A generic and extensible framework, used to maintain states in the glusterfsd process for each of the files accessed (including the clients info doing the fops) and send notifications to the respective glusterfs clients incase of any change in that state. This patch handles "Inode Update/Invalidation" upcall event. Feature page: URL: http://www.gluster.org/community/documentation/index.php/Features/Upcall-infrastructure Below link has a writeup which explains the code changes done - URL: https://soumyakoduri.wordpress.com/2015/02/25/glusterfs-understanding-upcall-infrastructure-and-cache-invalidation-support/ Change-Id: Ie3d724be9a3419fcf18901a753e8ec2df2ac802f BUG: 1200262 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/9535 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* inode: 'this' has been set unwantedly.Humble Devassy Chirammal2015-03-171-5/+2
| | | | | | | | | | | | | | | | | Problem: CC libglusterfs_la-inode.lo inode.c: In function 'inode_table_destroy': inode.c:1630:19: warning: variable 'this' set but not used [-Wunused-but-set-variable] xlator_t *this = NULL; Change-Id: If4b37ab896ee0a309826d4be48c6599d6ec2710b Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com> Reviewed-on: http://review.gluster.org/9846 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anoop C S <achiraya@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Poornima G <pgurusid@redhat.com>
* xlator: avoiding possibility of a crash if (xl->ctx) is NULL.Humble Devassy Chirammal2015-03-151-0/+3
| | | | | | | | | | Change-Id: I41acd9970bef04bb16cd4d8532a84a95d5fb642a BUG: 1199003 Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>. Reviewed-on: http://review.gluster.org/9810 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* gNFS: Export / Netgroup authentication on Gluster NFS mountNiels de Vos2015-03-152-0/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * Parses linux style export file/netgroups file into a structure that can be lookedup. * This parser turns each line into a structure called an "export directory". Each of these has a dictionary of hosts and netgroups which can be looked up during the mount authentication process. (See Change-Id Ic060aac and I7e6aa6bc) * A string beginning withan '@' is treated as a netgroup and a string beginning without an @ is a host. (See Change-Id Ie04800d) * This parser does not currently support all the options in the man page ('man exports'), but we can easily add them. BUG: 1143880 URL: http://www.gluster.org/community/documentation/index.php/Features/Exports_Netgroups_Authentication Change-Id: I181e8c1814d6ef3cae5b4d88353622734f0c0f0b Original-author: Shreyas Siravara <shreyas.siravara@gmail.com> CC: Richard Wareing <rwareing@fb.com> CC: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8758 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* nfs: more fine grained authentication for the MOUNT protocolNiels de Vos2015-03-152-0/+70
| | | | | | | | | | | | | | | | The /etc/exports format for NFS-exports (see Change-Id I7e6aa6b) allows a more fine grained control over the authentication. This change adds the functions and structures that will be used in by Change-Id I181e8c1. BUG: 1143880 Change-Id: Ic060aac7c52d91e08519b222ba46383c94665ce7 Original-author: Shreyas Siravara <shreyas.siravara@gmail.com> CC: Richard Wareing <rwareing@fb.com> CC: Jiffin Tony Thottan <jthottan@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/9362 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: assign lk_owner for the newly created framevmallika2015-03-141-0/+2
| | | | | | | | | | | | | | | | syncop_inodelk doesn't work properly as lk_owner is not set in the frame created by 'synctask_create'. There is a possibility that more than one thread can acquire inode lock with syncop_inodelk Change-Id: I8193edb0d24b3a6e3a3f6a0c5d7ab5a1be8e7daf BUG: 1188636 Signed-off-by: vmallika <vmallika@redhat.com> Reviewed-on: http://review.gluster.org/9858 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Replace pipe2 with pipe.Poornima G2015-03-101-1/+16
| | | | | | | | | | | | | | pipe2() doesn't works on Linux kernel version < 2.6.27 and glibc < version 2.9. Hence replacing it with pipe(), so that the build will not fail on Centos5. Change-Id: If17aed0d51466cd7528cf8dde0edfa28b68139e5 BUG: 1200255 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/9844 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>