summaryrefslogtreecommitdiffstats
path: root/libglusterfs
Commit message (Collapse)AuthorAgeFilesLines
* core: utilize mempool for frame->local allocationsAmar Tumballi2012-02-214-5/+11
| | | | | | | | | | | | | | | in each translator, which uses 'frame->local', we are using GF_CALLOC/GF_FREE, which would be costly considering the number of allocation happening in a lifetime of 'fop'. It would be good to utilize the mem pool framework for xlator's local structures, so there is no allocation overhead. Change-Id: Ida6e65039a24d9c219b380aa1c3559f36046dc94 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 765336 Reviewed-on: http://review.gluster.com/2772 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* rpc/clnt: handle PARENT_DOWN event appropriatelyAmar Tumballi2012-02-211-0/+1
| | | | | | | | | | Change-Id: I4644e944bad4d240d16de47786b9fa277333dba4 BUG: 767862 Signed-off-by: Raghavendra G <raghavendra@gluster.com> Signed-off-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.com/2735 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* fuse-bridge: Handle graph-switch.Raghavendra G2012-02-214-2/+109
| | | | | | | | | | | | | | | | | The purpose of this patch is to let protocol/client know when its transports can be disconnected, without application running on gluster mount noticing any effects of graph switch. In order to do this, we migrate all fds and blocked locks to new graph. Once this migration is complete and there are no in-transit frames as viewed by fuse-bridge, we send a PARENT_DOWN event to its children. protocol/client on receiving this event, can disconnect up its transports. Change-Id: Idcea4bc43e23fb077ac16538b61335ebad84ba16 BUG: 767862 Reviewed-on: http://review.gluster.com/2734 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Add commands to see self-heald opsPranith Kumar K2012-02-203-3/+14
| | | | | | | | | Change-Id: Id92d3276e65a6c0fe61ab328b58b3954ae116c74 BUG: 763820 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2775 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: Self-heald, Index integrationPranith Kumar K2012-02-202-0/+62
| | | | | | | | | Change-Id: Ic68eb00b356a6ee3cb88fe2bde50374be7a64ba3 BUG: 763820 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2749 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* syncop: Multi-processor support in syncenvAnand Avati2012-02-202-67/+216
| | | | | | | | | | | | | | | | | | | This patch introduces: - multithreading of syncop processors permitting synctasks to be executed concurrently if the runqueue has many tasks. - Auto scaling of syncop processors based on runqueue length. - Execute a synctask (synctask_new) in a blocking way if callback function is set NULL. The return value of the syncfn will be the return value of synctask_new() Change-Id: Iff369709af9adfd07be3386842876a24e1a5a9b5 BUG: 763820 Reviewed-on: http://review.gluster.com/443 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* storage/posix: Add xattr for gfid2pathPranith Kumar K2012-02-201-0/+1
| | | | | | | | | Change-Id: I1fe987d255bf50e8433043749b482b67554a0ac3 BUG: 763820 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2774 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* features/index: Index translator implementationPranith Kumar K2012-02-201-0/+3
| | | | | | | | | | Change-Id: If8a11ecbdd010f64fb4409add5751080f4b59086 BUG: 763820 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/2722 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* NLM - Network Lock Manger V4Krishna Srinivas2012-02-202-7/+44
| | | | | | | | | Change-Id: Ic31b8bb10a28408da2a623f4ecc0c60af01c64af BUG: 795421 Signed-off-by: Krishna Srinivas <ksriniva@redhat.com> Reviewed-on: http://review.gluster.com/2711 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: Implement circular buffer and event historyRaghavendra Bhat2012-02-2011-10/+435
| | | | | | | | | | | | | | | | | Implement circular buffer framework, so that it can be used by other components such as event history management. And event history is implemented which can be used by xlator to dump some information to a file (such as information of some structure etc). Through statedump, history of each xlator can be dumped. An option called history should be given to the statedump command. Change-Id: I7c5e8f6bd1018584eaee856e933e7c4b94c6709c BUG: 795419 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/2769 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* protocol/client,server: fcntl lock self healing.Mohammed Junaid2012-02-206-8/+552
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently(with out this patch), on a disconnect the server cleans up the transport which inturn closes the fd's and releases the locks acquired on those fd's by that client. On a reconnect, client just reopens the fd's but doesn't reacquire the locks. The application that had previously acquired the locks still is under the assumption that it is the owner of those locks which might have been granted to other clients(if they request) by the server leading to data corruption. This patch allows the client to reacquire the fcntl locks (held on the fd's) during client-server handshake. * The server identifies the client via process-uuid-xl (which is a combination of uuid and client-protocol name, it is assumed to be unique) and lk-version number. * The client maintains a list of process-uuid-xl, lk-version pair for each accepted connection. On a connect, the server traverses the list for a matching pair, if a matching pair is not found the the server returns lk-version with value 0, else it returns the lk-version it has in store. * On a disconnect, the server and client enter grace period, and on the completion of the grace period, the client bumps up its lk-version number (which means, it will reacquire the locks the next time) and the server will distroy the connection. If reconnection happens within the grace period, the server will find the matching (process-uuid-xl, lk-version) pair in its list which guarantees that the fd's and there corresponding locks are still valid for this client. Configurable options: To set grace-timeout, the following options are option server.grace-timeout value option client.grace-timeout value To enable or disable the lk-heal, option lk-heal [on|off] gluster volume set command can be used to configurable options Change-Id: Id677ef1087b300d649f278b8b2aa0d94eae85ed2 BUG: 795386 Signed-off-by: Mohammed Junaid <junaid@redhat.com> Reviewed-on: http://review.gluster.com/2766 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* glusterd: auth allow enhancementsRajesh Amaravathi2012-02-201-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * PROBLEM: When address-based authentication is enabled on a volume, the gNfs server, self-heal daemon (shd), and other operations such as quota, rebalance, replace-brick and geo-replication either stop working or the services are not started if all the peers' ipv{4,6} addresses or hostnames are not added in the "set auth.allow" operation, breaking the functionality of several operations. E.g: volume vol in a cluster of two peers: /mnt/brick1 in 192.168.1.4 /mnt/brick2 in 192.168.1.5 option auth.allow 192.168.1.6 (allow connection requests only from 192.168.1.6) This will disrupt the nfs servers on 192.168.1.{4,5}. brick server processes reject connection requests from both nfs servers (on 4,5), because the peer addresses are not in the auth.allow list. Same holds true for local mounts (on peer machines), self-heal daemon, and other operations which perform a glusterfs mount on one of the peers. * SOLUTION: Login-based authentication (username/password pairs, henceforth referred to as "keys") for gluster services and operations. These *per-volume* keys can be used to by-pass the addr-based authentication, provided none of the peers' addresses are put in the auth.reject list, to enable gluster services like gNfs, self-heal daemon and internal operations on volumes when auth.allow option is exercised. * IMPLEMENTATION: 1. Glusterd generates keys for each volume and stores it in memory as well as in respective volfiles. A new TRUSTED-FUSE volfile is generated which is fuse volfile + keys in protocol/client, and is named trusted-<volname>-fuse.vol. This is used by all local mounts. ANY local mount (on any peer) is granted the trusted-fuse volfile instead of fuse volfile via getspec. non-local mounts are NOT granted the trusted fuse volfile. 2. The keys generated for the volume is written to each server volfile telling servers to allow users with these keys. 3. NFS, self-heal daemon and replace-brick volfiles are updated with the volume's authentication keys. 4. The keys are NOT written to fuse volfiles for obvious reasons. 5. The ownership of volfiles and logfiles is restricted to root users. 6. Merging two identical definitions of peer_info_t in auth/addr and rpc-lib, throwing away the one in auth/addr. 7. Code cleanup in numerous places as appropriate. * IMPORTANT NOTES: 1. One SHOULD NOT put any of the peer addresses in the auth.reject list if one wants any of the glusterd services and features such as gNfs, self-heal, rebalance, geo-rep and quota. 2. If one wants to use username/password based authentication to volumes, one shall append to the server, nfs and shd volfiles, the keys one wants to use for authentication, *while_retaining those_generated_by_glusterd*. See doc/authentication.txt file for details. Change-Id: Ie0331d625ad000d63090e2d622fe1728fbfcc453 BUG: 789942 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/2733 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/dht: Support for hardlink rebalance when decommissioningshishir gowda2012-02-193-2/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The support for hardlink rebalance is only available for decommissioning of a node. this can be triggered in two ways 1. remove-brick start 2. if decommission node value is set in vol file, then a normal rebalance command The way we handle it is- if (nlink > 1) do * if src file doesnt have linkto xattr * mark src's linkto to the dst * else * perform a link on the dst * do a look up * if nlinks = dst.nlinks * migrate data * else * continue crawling done Signed-off-by: shishir gowda <shishirng@gluster.com> Change-Id: If43b5524b872fd1413e9f7aa7f436cb244e30d8d BUG: 763844 Reviewed-on: http://review.gluster.com/2737 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* cluster/dht: Rebalance will be a new glusterfs processshishirng2012-02-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | rebalance will not use any maintainance clients. It is replaced by syncops, with the volfile. Brickop (communication between glusterd<->glusterfs process) is used for status and stop commands. Dept-first traversal of dir is maintained, but data is migrated as and when encounterd. fix-layout (dir) do Complete migrate-data of dir fix-layout (subdir) done Rebalance state is saved in the vol file, for restart-ability. A disconnect event and pidfile state determine the defrag-status Signed-off-by: shishirng <shishirng@gluster.com> Change-Id: Iec6c80c84bbb2142d840242c28db3d5f5be94d01 BUG: 763844 Reviewed-on: http://review.gluster.com/2540 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* cli: Fix for statedump crashing gluster processesKaushal M2012-02-171-8/+4
| | | | | | | | | | | | | | | | 1. Fixes the bug in statedump causing the gluster process to crash when an unknown option was given in the 'glusterdump.*.options' file. 2. Also fixes cli, making it send full statedump option strings even when only partial option strings are given in 'volume statedump' command. 3. Minor change to order of operations during statedump to allow option parsing errors to be written to the dump file. Change-Id: Ic878cbca4dbf46b83fba0fd88fcb3c03f05ae46d BUG: 772586 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/2706 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* posix: handle some internal behavior in posix_mknod()Amar Tumballi2012-02-161-0/+2
| | | | | | | | | | | | | | | | | | | | assume a case of link() systemcall, which is handled in distribute by creating a 'linkfile' in hashed subvolume, if the 'oldloc' is present in different subvolume. we have same 'gfid' for the linkfile as that of file for consistency. Now, a file with multiple hardlinks, we may end up with 'hardlinked' linkfiles. dht create linkfile using 'mknod()' fop, and as now posix_mknod() is not equipped to handle this situation. this patch fixes the situation by looking at the 'internal' key set in the dictionary to differentiate the call which originates from inside with regular system calls. Change-Id: Ibff7c31f8e0c8bdae035c705c93a295f080ff985 BUG: 763844 Signed-off-by: Amar Tumballi <amar@gluster.com> Reviewed-on: http://review.gluster.com/2755 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* locks: Added a getxattr interface to clear locks on a given inode.Krishnan Parthasarathi2012-02-141-0/+1
| | | | | | | | | | | | | | | | getxattr returns a summary of no. of inodelks/entrylks cleared. cmd_structure: trusted.glusterfs.clrlk.t<type>.k<kind>[.{range|basename}] where, type = "inode"| "entry"| "posix" kind = "granted"| "blocked" | "all" range = off,a-b, where a, b = 'start', 'len' from offset 'off' Change-Id: I8a771530531030a9d4268643bc6823786ccb51f2 BUG: 789858 Signed-off-by: Krishnan Parthasarathi <kp@gluster.com> Reviewed-on: http://review.gluster.com/2551 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: add an extra flag to readv()/writev() APIAmar Tumballi2012-02-147-37/+53
| | | | | | | | | | | | needed to implement a proper handling of open flag alterations using fcntl() on fd. Change-Id: Ic280d5db6f1dc0418d5c439abb8db1d3ac21ced0 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 782265 Reviewed-on: http://review.gluster.com/2723 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol: code cleanupAmar Tumballi2012-02-141-0/+45
| | | | | | | | | | | make dict serialize and unserialization code a macro Change-Id: I459c77c6c1f54118c6c94390162670f4159b9690 BUG: 764890 Signed-off-by: Amar Tumballi <amar@gluster.com> Reviewed-on: http://review.gluster.com/2742 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: log xlator options in initRajesh Amaravathi2012-02-091-1/+1
| | | | | | | | | | | | * The options (default as well as explicitly set) for each xlator are logged at DEBUG log-level Change-Id: I757e206bf06ef5dc60a3255e2377a821c284b6f1 BUG: 767087 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/2647 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cli, protocol/server : improve validation for the option auth.(allow/reject)Kaushal M2012-02-052-25/+127
| | | | | | | | | | | | | | | | | | cli now checks validity of address list given for 'volume set auth.*' Server xlator checks addresses supplied to auth.(allow/reject) option including wildcards for correctness in case volfile is manually edited. Original patch done by shylesh@gluster.com Original patch is at http://patches.gluster.com/patch/7566/ Change-Id: Icf52d6eeef64d6632b15aa90a379fadacdf74fef BUG: 764197 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/306 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* protocol/client : prevent client from reconnecting when serverKaushal M2012-02-033-1/+4
| | | | | | | | | | | | | | | | | | authentication fails This prevents the client from trying to reconnect on server authentication failure. Reconnecting on authentcation failure causes hung mounts on unauthorised clients. This patch fixes this problem. Also, mount.glusterfs script unmounts mount-point on mount failure to prevent hung mounts. Change-Id: I5615074d27948077bad491a38cecae1b7f5159fb BUG: 765240 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/398 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* features/marker,quota: Incorporate changes to support nameless lookupRaghavendra Bhat2012-02-021-0/+4
| | | | | | | | | | Change-Id: Ic5f00a9891bd835ebee5a3e103ef0f75d0b7fc25 BUG: 783925 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/2702 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cli: Extend "volume status" with statedump infoKaushal M2012-01-277-6/+518
| | | | | | | | | | | | | | | | | This patch enhances and extends the "volume status" command with information obtained from the statedump of the bricks of volumes. Adds new status types : clients, inode, fd, mem, callpool The new syntax of "volume status" is, #gluster volume status [all|{<volname> [<brickname>] [misc-details|clients|inode|fd|mem|callpool]}] Change-Id: I8d019718465bbc3de727653a839de7238f45da5c BUG: 765495 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/2637 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kp@gluster.com>
* common-utils: Add support for solaris, bsd and mac os/xHarshavardhana2012-01-261-20/+19
| | | | | | | | | | | | get_mem_size() is more standardized now. Change-Id: I8e3dc29df0a64a5eb8eea4fd3965b268cb1a85c2 BUG: 772808 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/2618 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Add format/printf/attribute to log declsKaleb KEITHLEY2012-01-261-5/+10
| | | | | | | | | | | | | | | This enables compile-time checking of printf-style format checking Reason for doing it this way: N/A Description of test cases: N/A Change-Id: I9e26a5dceef5b545b9434b1d418c3d1193b4ef9a Signed-off-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.com/2693 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* complete the implementation of missing 'f**xattr()' fopsAmar Tumballi2012-01-252-0/+30
| | | | | | | | | | | | | | in debug/* and cluster/* translators and a syncop_fsetxattr() added a test case for testing the working of 'f-fop()' on fuse mount. Change-Id: I0c2aeeb30a0fb382ef2495cca1e66b00abaffd35 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 766571 Reviewed-on: http://review.gluster.com/802 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: add 'fremovexattr()' fopAmar Tumballi2012-01-2511-0/+216
| | | | | | | | | | | so operations can be done on fd for extended attribute removal Change-Id: Ie026f1b53793aeb4ae33e96ea5408c7a97f34bf6 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 766571 Reviewed-on: http://review.gluster.com/778 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: get xattrs also as part of readdirpAmar Tumballi2012-01-259-35/+56
| | | | | | | | | | | | | readdirp_req() call sends a dict_t * as an argument, which contains all the xattr keys for which the entries got in readdirp_rsp() are having xattr value filled dictionary. Change-Id: I8b7e1290740ea3e884e67d19156ce849227167c0 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 765785 Reviewed-on: http://review.gluster.com/771 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: change lk-owner as a 1k bufferAmar Tumballi2012-01-248-21/+186
| | | | | | | | | | | | | so, NLM can send the lk-owner field directly to the locks translators, while doing the same effort, also enabled sending maximum of 500 aux gid over protocol. Change-Id: I87c2514392748416f7ffe21d5154faad2e413969 Signed-off-by: Amar Tumballi <amar@gluster.com> BUG: 767229 Reviewed-on: http://review.gluster.com/779 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* pump: move internal pump xattrs out of trusted domainRajesh Amaravathi2012-01-231-0/+7
| | | | | | | | | | | | | | | | | | | * the trusted.glusterfs.pump.{start|pause|commit|status|abort} xattrs have been moved out of trusted domain. This enables separation of xattrs used as gluster-internal commands (handled by pump) for replace-brick, which are not set in the back-end, from xattrs set on the replace-brick source and destinations bricks. * macros definitions from pump.h and glusterd.h, #defining these xattrs have been merged and put into libglusterfs/src/glusterfs.h Change-Id: I87b8bfbf045aa140f5d3f0c9baa9b2e79f87b67b BUG: 783049 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/2663 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: GFID filehandle based backend and anonymous FDsAnand Avati2012-01-205-113/+220
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. What -------- This change introduces an infrastructure change in the filesystem which lets filesystem operation address objects (inodes) just by its GFID. Thus far GFID has been a unique identifier of a user-visible inode. But in terms of addressability the only mechanism thus far has been the backend filesystem path, which could be derived from the GFID only if it was cached in the inode table along with the entire set of dentry ancestry leading up to the root. This change essentially decouples addressability from the namespace. It is no more necessary to be aware of the parent directory to address a file or directory. 2. Why ------- The biggest use case for such a feature is NFS for generating persistent filehandles. So far the technique for generating filehandles in NFS has been to encode path components so that the appropriate inode_t can be repopulated into the inode table by means of a recursive lookup of each component top-down. Another use case is the ability to perform more intelligent self-healing and rebalancing of inodes with hardlinks and also to detect renames. A derived feature from GFID filehandles is anonymous FDs. An anonymous FD is an internal USABLE "fd_t" which does not map to a user opened file descriptor or to an internal ->open()'d fd. The ability to address a file by the GFID eliminates the need to have a persistent ->open()'d fd for the purpose of avoiding the namespace. This improves NFS read/write performance significantly eliminating open/close calls and also fixes some of today's limitations (like keeping an FD open longer than necessary resulting in disk space leakage) 3. How ------- At each storage/posix translator level, every file is hardlinked inside a hidden .glusterfs directory (under the top level export) with the name as the ascii-encoded standard UUID format string. For reasons of performance and scalability there is a two-tier classification of those hardlinks under directories with the initial parts of the UUID string as the directory names. For directories (which cannot be hardlinked), the approach is to use a symlink which dereferences the parent GFID path along with basename of the directory. The parent GFID dereference will in turn be a dereference of the grandparent with the parent's basename, and so on recursively up to the root export. 4. Development --------------- 4a. To leverage the ability to address an inode by its GFID, the technique is to perform a "nameless lookup". This means, to populate a loc_t structure as: loc_t { pargfid: NULL parent: NULL name: NULL path: NULL gfid: GFID to be looked up [out parameter] inode: inode_new () result [in parameter] } and performing such lookup will return in its callback an inode_t populated with the right contexts and a struct iatt which can be used to perform an inode_link () on the inode (without a parent and basename). The inode will now be hashed and linked in the inode table and findable via inode_find(). A fundamental change moving forward is that the primary fields in a loc_t structure are now going to be (pargfid, name) and (gfid) depending on the kind of FOP. So far path had been the primary field for operations. The remaining fields only serve as hints/helpers. 4b. If read/write is to be performed on an inode_t, the approach so far has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost in performance in the inbuilt NFS server. 5. Misc ------- The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent with the rest of the codebase. Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f BUG: 781318 Reviewed-on: http://review.gluster.com/669 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core/setxattr: prevent users from setting glusterfs xattrsRajesh Amaravathi2012-01-141-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | * Each xlator prevents the user from setting glusterfs-internal xattrs like trusted.gfid by handling it in respective setxattr functions. The speacial case of trusted.gfid is handled in fuse (Not in posix because posix_setxattr is used to set gfid). * For xlators which did not define setxattr and/or fsetxattr, the functions have been implemented with appropriate checks. xlator | fops-added _______________|__________________________ | 1. afr | fsetxattr 2. stripe | setxatrr and fsetxattr 3. quota | setxattr and fsetxattr Change-Id: Ib62abb7067415b23a708002f884d30e8866fbf48 BUG: 765487 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/685 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* cluster/distribute: dht_aggregate() fix a logic error before xattr comparisonsHarshavardhana2012-01-121-3/+6
| | | | | | | | | Change-Id: I20f015263bed9851225005d5f41a5d518bd22592 BUG: 769691 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/2557 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* common-utils: Cleanup old stuff and standardizeHarshavardhana2012-01-121-192/+51
| | | | | | | | | | Change-Id: If15058c3a1cc4070d1ebbabe37e012a70353310e BUG: 769691 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/2635 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cli: volume status enhancementRajesh Amaravathi2012-01-122-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Support "gluster volume status (all)" option to display all volumes' status. * On option "detail" appended to "gluster volume status *", amount of storage free, total storage, and backend filesystem details like inode size, inode count, free inodes, fs type, device name of each brick is displayed. * One can also obtain [detailed]status of only one brick. * Format of the enhanced volume status command is: "gluster volume status [all|<vol>] [<brick>] [detail]" * Some generic functions have been added to common-utils: skipword get_nth_word These functions enable parsing and fetching of words in a sentence. glusterd_get_brick_root (in glusterd) These are self explanatory. Change-Id: I6f40c1e19810f8504cd3b1786207364053d82e28 BUG: 765464 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/777 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* core/stack: perform locked windsAnand Avati2012-01-101-16/+35
| | | | | | | | | | | | | In configurations like pump, where there is a cluster translator on top of io-threads, there are situations where two concurrent stack-winds can be performed on the same call stack in multiple threads. This patch holds locks during the call frame list manipulation Change-Id: I51539210dc8101f7a80cf9bc103b5eff0c86dc9f BUG: 765522 Reviewed-on: http://review.gluster.com/774 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* lib/mount-broker: move utility functions to common-utilsRajesh Amaravathi2012-01-052-0/+26
| | | | | | | | | | | | | functions skipwhite and nwstrtail have been moved from mount-broker to common-utils library for general use. Change-Id: I9cfefb28bbfcf5d0bd37e35865ff3f3b7923fc53 BUG: 765464 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.com/2560 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com> Reviewed-by: Csaba Henk <csaba@gluster.com>
* libglusterfs: Add boundary conditions to data_to_int8() function.Harshavardhana2012-01-031-2/+16
| | | | | | | | | | Change-Id: Iff50d44568895a75fc8743a10346990f764cf8fa BUG: 769692 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/2556 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* libglusterfs: set inode path to be NULL in case of errorsPranith Kumar K2011-12-221-0/+2
| | | | | | | | | | | | | | | Some of the functions calling inode_path, __inode_path are assuming the path to be set to NULL in case of errors. Instead of fixing it in the calling functions, fixing it in the __inode_path function. Change-Id: I77736a2700d3c2c9732a536bcf2a398fe626d54e BUG: 765430 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/810 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* transport/rdma: Add attr_timeout, attr_retry_cnt, attr_rnr_retryHarshavardhana2011-12-202-0/+31
| | | | | | | | | | | | as configurable options Change-Id: Ifc4710f149be35979746ddfbfb4181638601bc64 BUG: 766040 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/766 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* protocol/client: Be strict about gfids in fop reqPranith Kumar K2011-12-151-0/+8
| | | | | | | | | | Change-Id: I7508ab3a93329bb6a679801fddfcd0e5b0c7c134 BUG: 765198 Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Reviewed-on: http://review.gluster.com/770 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* iobuf: fix a crash in iobuf when statedump is takenv3.3.0qa15Raghavendra Bhat2011-12-071-1/+1
| | | | | | | | | | | With the previous patch this change was missed. Change-Id: If536cef3fa423415eaa4104e6c3e5e72c5d0a22d BUG: 3854 Signed-off-by: Raghavendra Bhat <raghavendrabhat@gluster.com> Reviewed-on: http://review.gluster.com/775 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* libglusterfs/iobuf: have fixed number of arenasAmar Tumballi2011-12-072-105/+163
| | | | | | | | | | | | | | * so overall memory usage will be in limit. * the array is hard-coded, need to improve upon this. * need more benchmarking to tune the proper values to the array * fixed the issue of pruning of arenas. Change-Id: I38a8ffab37378c25d78f77a2d412b1b8935c67d3 BUG: 3474 Signed-off-by: Amar Tumballi <amar@gluster.com> Reviewed-on: http://review.gluster.com/543 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/distribute: Assert checks at known locationsHarshavardhana2011-12-062-0/+31
| | | | | | | | | | | and new function dict_get_ptr_and_len(). Change-Id: I653a1cc8123baa36d750250d02721aa98b196f38 BUG: 3158 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/744 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* bz 3826, fix for parallel make in fedora build systemKaleb KEITHLEY2011-12-011-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | builds of glusterfs in the fedora build system often fail due to a race condition between running yacc and starting to compile the y.tab.c produced by yacc We found that the previous fix would still occasionally trip the race condition. This revised patch changes the automake Makefile.am to generate the parser files without incurring the race condition. An extra dimension of the problem is that the tarball from http://download.gluster.com/pub/gluster/glusterfs/3.2/3.2.5/glusterfs-3.2.5.tar.gz contains files that you don't get when you clone from the git repo (e.g. libgluster/src/{graph.lex.c,y.tab.c}, and all the Makefile.in files) so build issues on fedora build systems do not manifest themselves on jenkins and vice versa. This works on jenkins, the fedora build system, and my f16 vm/guest machines. (Finding the right combination that works on all three was an exercise to say the least. I'm open to other suggestions for avoiding the race condition.) Run autogen.sh to (re)generate the Makefile.ins. Then run configure to produce all Makefiles, followed by `make -j X` where X>1 see also https://bugzilla.redhat.com/show_bug.cgi?id=756510 BUG: 3826 Change-Id: Iaeecb59c61a77bf3927da18253c83cf5ffed4254 Reviewed-on: http://review.gluster.com/765 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@gluster.com>
* contrib/uuid: Make sure that uuid_types.h are generated per system specific.Harshavardhana2011-11-301-1/+4
| | | | | | | | | | | | | Just the same way e2fsprogs maintains. This avoids unnecessary problems for different architectures. Change-Id: I3911998373756707996afb7b926ec0780ea18b81 BUG: 3833 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/764 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* bz 3826, fix for parallel make in fedora build systemKaleb KEITHLEY2011-11-301-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | builds of glusterfs in the fedora build system often fail due to a race condition between running yacc and starting to compile the y.tab.c produced by yacc This patch changes the automake Makefile.am to generate the parser files without incurring the race condition This works on jenkins, the fedora build system, and my f16 vm/guest machines. (Finding the right combination that works on all three was an exercise to say the least. I'm open to other suggestions for avoiding the race condition.) Run autogen.sh to (re)generate the Makefile.ins. Then run configure to produce all Makefiles, followed by `make -j X` where X>1 see also https://bugzilla.redhat.com/show_bug.cgi?id=756510 BUG: 3826 Change-Id: I06ba0d0a1d59f0f44c0dd2cd9d227ca08d99e205 Reviewed-on: http://review.gluster.com/763 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@gluster.com>
* Let NetBSD use its recently added Linux xattr APIEmmanuel Dreyfus2011-11-241-7/+7
| | | | | | | | | Change-Id: Ibd365e8d83c6faf631df7cb99ec62440496fcbdf BUG: 2923 Reviewed-on: http://review.gluster.com/230 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@gluster.com>
* usleep(3) shall not be called with argument higher than 1sEmmanuel Dreyfus2011-11-231-1/+2
| | | | | | | | | Change-Id: Ied0a2fedb3b7604f6abbf0a4aa7f71e43a5ea568 BUG: 2923 Reviewed-on: http://review.gluster.com/595 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Anand Avati <avati@gluster.com>