summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* rpc: fix up mountbroker xdr defs and regenerate headersCsaba Henk2011-09-153-30/+42
| | | | | | | | | | Change-Id: I8a88d2b9228c3614ee7cbaf48782a419e6aee0f6 BUG: 3482 Reported-by: Krishnan Parthasarathi <kp@gluster.com> Reviewed-on: http://review.gluster.com/432 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kp@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Proactive self heal process implementationPranith Kumar K2011-09-1426-525/+1602
| | | | | | | | Change-Id: I96db0d94566ceabf1649f890318363f738c06553 BUG: 2458 Reviewed-on: http://review.gluster.com/403 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* dict: add reset function which empties the dictPranith Kumar K2011-09-142-0/+25
| | | | | | | | Change-Id: I267c81a129197534fb318671eafb76e144a15c8c BUG: 2458 Reviewed-on: http://review.gluster.com/402 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* debug/io-stats: Allow multiple children in graphPranith Kumar K2011-09-141-2/+2
| | | | | | | | Change-Id: Ie4fb75d8000ff95daa8bf9f6757926822de28a65 BUG: 2458 Reviewed-on: http://review.gluster.com/401 Reviewed-by: Vijay Bellur <vijay@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* features/quota: explicitly create xattrs in marker_create_cbkRaghavendra G2011-09-142-3/+11
| | | | | | | | | | | | | | - the earlier approach of creating quota related xattrs through side-effect of updating size and contribution values won't work, since when no contribution xattr is present, the updation process treats contribution value as zero and hence will be equal to size of freshly created files Change-Id: If9b2063b1ac3a4cf50d3fe2c81e907bc8eccb677 BUG: 3531 Reviewed-on: http://review.gluster.com/385 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Mohammed Junaid <junaid@gluster.com>
* features/quota: implement mknod fop.Raghavendra G2011-09-141-0/+142
| | | | | | | | Change-Id: If8f2a0bb635160ee78f35787ee9f8a4db87ae8ac BUG: 3531 Reviewed-on: http://review.gluster.com/384 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Mohammed Junaid <junaid@gluster.com>
* glusterfsd: log the package version just after log initAmar Tumballi2011-09-141-0/+5
| | | | | | | | | | | helps getting output of 'glusterfs --version' from the users while debugging any issues/bugs. Change-Id: I4f15aca143c1e403c70409378afc9ed7a678b286 BUG: 2346 Reviewed-on: http://review.gluster.com/415 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* libglusterfs/common-utils: resolve_ip6() to take AI_ADDRCONFIGAmar Tumballi2011-09-141-0/+1
| | | | | | | | | | | | AI_ADDRCONFIG flag is needed for 'getaddrinfo()' call as hint so that while resolving a hostname, ip list will be taken from proper configured address family Change-Id: Iad6067ad64444d3930d5be593ca819a8de5fc0c1 BUG: 3548 Reviewed-on: http://review.gluster.com/414 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* GlusterFS Hadoop specific DSL for mountbrokerVenky Shankar2011-09-134-12/+66
| | | | | | | | Change-Id: Ie379992bdea0974c8c5e1a4d7bc3e87cefe0d256 BUG: 3539 Reviewed-on: http://review.gluster.com/404 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* glusterd rebalance: fix minor issuesAmar Tumballi2011-09-132-35/+42
| | | | | | | | | | | | | | | there were bugs introduced due to parallelizing rebalance op. * argument to dict_set_str () should be static as for the life of dict * uuid_utoa() output should not be considered as static * overloading 'volinfo->defrag' in other nodes is a overkill, just KISS Change-Id: I43d00c8e22beb2dd5c5f9824552f7337543b2255 BUG: 2112 Reviewed-on: http://review.gluster.com/407 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* support for de-commissioning a node using 'remove-brick'Amar Tumballi2011-09-1322-177/+855
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to achieve this, we now create volume-file with 'decommissioned-nodes' option in distribute volume, then just perform the rebalance set of operations (with 'force' flag set). now onwards, the 'remove-brick' (with 'start' option) operation tries to migrate data from removed bricks to existing bricks. 'remove-brick' also supports similar options as of replace-brick. * (no options) -> works as 'force', will have the current behavior of remove-brick, ie., no data-migration, volume changes. * start (starts remove-brick with data-migration/draining process, which takes care of migrating data and once complete, will commit the changes to volume file) * pause (stop data migration, but keep the volume file intact with extra options whatever is set) * abort (stop data-migration, and fall back to old configuration) * commit (if volume is stopped, commits the changes to volumefile) * force (stops the data-migration and commits the changes to volume file) Change-Id: I3952bcfbe604a0952e68b6accace7014d5e401d3 BUG: 1952 Reviewed-on: http://review.gluster.com/118 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* mgmt/glusterd: check the availability of fuse for few glusterd operationsKaushik BV2011-09-135-0/+76
| | | | | | | | | Change-Id: I410cc6a86c32637566e5498f69f46cb40322e7fb BUG: 2715 Reviewed-on: http://review.gluster.com/364 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* mgmt/glusterd: fail glusterd if gsyncd does not behave as expectedKaushik BV2011-09-131-10/+26
| | | | | | | | Change-Id: Ic54220328f15c579dcf441de2aad8620751a97ef BUG: 2744 Reviewed-on: http://review.gluster.com/331 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Csaba Henk <csaba@gluster.com>
* socket: provide proper arguments to getaddrinfoAmar Tumballi2011-09-131-1/+1
| | | | | | | | | | | | | | | | | ----- from 'man getaddrinfo' : If hints.ai_flags includes the AI_ADDRCONFIG flag, then IPv4 addresses are returned in the list pointed to by res only if the local system has at least one IPv4 address configured, and IPv6 addresses are only returned if the local system has at least one IPv6 address configured. ----- Change-Id: Ie30344daf1bb9d41ac58741b38e83af35cd8b5e9 BUG: 2456 Reviewed-on: http://review.gluster.com/405 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* distribute rebalance: handle the open file migrationAmar Tumballi2011-09-1217-1657/+3040
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Complexity involved: To migrate a file with open fd, we have to notify the other client process which has the open fd, and make sure the write()s happening on that fd is properly synced to the migrated file. Once the migration is complete, the client process which has open-fd should get notified and it should start performing all the operations on the new subvolume, instead of earlier cached volume. How to solve the notification part: We can overload the 'postbuf' attribute in the _cbk() function to understand if a file is 'under-migration' or 'migration-complete' state. (This will be something similar to deciding whether a file is DHT-linkfile by its 'mode'). Overall change includes below mentioned major changes: 1. dht_linkfile is decided by only 2 factors (mode(01000), xattr(trusted.glusterfs.dht.linkto)), instead of earlier 3 factors (size==0) 2. in linkfile self-heal part (in 'dht_lookup_everywhere_cbk()'), don't delete a linkfile if there is a open-fd on it. It means, there may be a migration in progress. 3. if a file's revalidate fails with ENOENT, it may be due to file migration, and hence need a lookup_everywhere() 4. There will be 2 phases of file-migration. -> Phase 1: Migration in progress * The source data file will have SGID and STICKY bit set in its mode. * The source data file will have a 'linkto' xattr pointing the destination. * Destination file will have mode set to '01000', and 'linkto' xattr set to itself. -> Phase 2: File migration Complete * The source data file will have mode '01000', and will be 'truncated' to size 0. * The destination file will have inherited mode from the source. (without sgid and sticky bit) and its 'linkto' attribute will be removed. 4. Changes in distribute to work smoothly with a file which is in migration / got migrated. The 'fops' are divided into 3 categories, inode-read, inode-write and others. inode-read fops need to handle only 'phase 2' notification, where as, the inode-write fops need to handle both 'phase 1' and phase2. The inode-write operations will be done on source file, and if any of 'file-migration' procedures are detected in _cbk(), then the operations should be performed on the destination too. when a phase-2 is detected, then the inode-ctx itself should be changed to represent a new layout. With these changes, the open file migration will work smoothly with multiple clients. Change-Id: I512408463814e650f34c62ed009bf2101d016fd6 BUG: 3071 Reviewed-on: http://review.gluster.com/209 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* geo-rep: partial support for unprivileged gsyncd via mountbrokerCsaba Henk2011-09-125-46/+197
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | gsyncd: - mounting code is split to a direct and a mountbroker based backend - option gluster-command gone - new options: gluster-params, gluster-cli-options, mountbroker - mountbroker mount backend is used if either a mountbroker label is given through the mountbroker option, or if gsyncd is unprivileged; in this case the username is used as label - have gluster cli invocations log to stderr so that we don't hit a permission issue with the logfiles glusterd: - do gsyncd pre-config with new options - add option geo-replication-log-group, so if that specified geo-rep logfile directories are given to that group (and thus members of the given group can do logging there) This is just WIP as geo-rep relies on trusted extended attributes and those are not accessible for unprivileged users. Even if we solved this issue, glusterd security settings are too coarse, so that if we made it possible for an unprivileged gsyncd to operate, we would open up too far. Change-Id: Icd520b58cbadccea3fad7c0f437b99de1e22db14 BUG: 2825 Reviewed-on: http://review.gluster.com/399 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* glusterd / cli: mount-broker serviceCsaba Henk2011-09-1213-3/+1377
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mountbroker is configured in glusterd volfile through a DSL which is restriced enough to be able to appear in the role of the value of a volfile knob. Basically the DSL describes set-theorical requirements against the option set which is sent by the cli (in the hope of getting a mount with these options). If the requirements meet and the volume id and the uid who is to "own" the mount can be unambigously deduced from the given request, glusterd does the mount with the given parameters. The use case of geo-replication is sugared by means of volume options which then generate a complete mount-broker option set. Demo: - add the following option to your glusterd volfile: option mountbroker-root /tmp/mbr option mountbroker.fool EQL(volfile-id=pop*|user-map-root=*|volfile-server=localhost)&MEET(user-map-root=john|user-map-root=jane) - before starting glusterd, create /tmp/mbr owned by root with mode 0755 - with cli, do $ gluster system:: mount fool volfile-id=pop33 user-map-root=jane volfile-server=localhost - on succesful completion (volume pop33 exists and is started, jane is a valid username), the mount path will be echoed to you - you can get rid of the mount by $ gluster system:: umount <mount-path> Change-Id: I629cf64add0a45500d05becc3316f67cdb5b42ff BUG: 3482 Reviewed-on: http://review.gluster.com/128 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* add --user-map-root optionCsaba Henk2011-09-125-0/+42
| | | | | | | | | | | | | | | | | | This makes client fake that given user is a superuser, by changing FUSE requests coming with uid of user so that uid is set to 0. User can be given in numeric form, in which case it's treated as an uid directly, or else it's tried to be resolved to an uid with getpwnam(3). Implies --acl. Change-Id: I2d5a3d3e178be7ffdf22b46a56f33a7eeaaa7fe1 BUG: 3242 Reviewed-on: http://review.gluster.com/127 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cli: add --log-{file,level} optionsCsaba Henk2011-09-122-16/+28
| | | | | | | | | | | | Apart from diagnostic purposes, it's needed when cli is ran by unprivileged user who most likely has no write access to the canonical log file. Change-Id: Ib9d1a31711966ff1efe2592fbc0a911820cf8ee3 BUG: 3242 Reviewed-on: http://review.gluster.com/95 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cli: fix option parsing bug which implied that only a single option can be usedCsaba Henk2011-09-121-0/+2
| | | | | | | | Change-Id: I89467d00030f4714568ef63650ecef0aef1bf753 BUG: 3242 Reviewed-on: http://review.gluster.com/94 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cli: reimplement "--version" without excess bloatCsaba Henk2011-09-121-0/+5
| | | | | | | | Change-Id: I787d241e72b6a6c43db96c220d68fe369bb700a4 BUG: 3242 Reviewed-on: http://review.gluster.com/93 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Revert "cli: Only admin should run gluster CLI"Csaba Henk2011-09-121-5/+0
| | | | | | | | | | This reverts commit 8d64ca70b4c2467d4ed8c76a9eae385abdebd7a7. Change-Id: Id2c6b491c7dd133510380f576ad59655a01d6121 BUG: 3242 Reviewed-on: http://review.gluster.com/92 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Revert "cli: gluster --version implementation"Csaba Henk2011-09-121-59/+15
| | | | | | | | | | | | | | This reverts commit e93b270e8d09fc9d36a39b22987d3a172197e73b. Conflicts: cli/src/cli.c Change-Id: Ibc0f6f752f713b15afc8b5458be5045ecb21235e BUG: 3242 Reviewed-on: http://review.gluster.com/91 Reviewed-by: Vijay Bellur <vijay@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gsyncd: python3 compat fixesCsaba Henk2011-09-124-5/+36
| | | | | | | | | | | Also add __codecheck script which can verify if source is OK at the syntactical level with a given Python interpreter. Change-Id: Ieff34bcd3efd1cdc0e8f9a510c05488f35897bbe BUG: 1570 Reviewed-on: http://review.gluster.com/320 Reviewed-by: Kaushik BV <kaushikbv@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: fix cleaning up of runner objectCsaba Henk2011-09-121-2/+1
| | | | | | | | | | | in lack of that, if geo-rep component is not installed, glusterd got a zombie child Change-Id: Ic4a2a4ffc943de68dd02db76a32b1618821ddf56 BUG: 2744 Reviewed-on: http://review.gluster.com/317 Reviewed-by: Kaushik BV <kaushikbv@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* rpc: Need to keep ref when handling cbk in call_bail.Krishnan Parthasarathi2011-09-121-0/+5
| | | | | | | | Change-Id: Ic944599cfa2696a0ae77105185d49fb510ce6fec BUG: 3511 Reviewed-on: http://review.gluster.com/361 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* glusterd: free the allocated string to avoid memory leakRaghavendra Bhat2011-09-122-56/+24
| | | | | | | | Change-Id: I520abf3c57a15be8bb7dd1e92ad0b049ef5c8970 BUG: 3341 Reviewed-on: http://review.gluster.com/394 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* protocol/client: avoid code duplication in fd based operationsRaghavendra Bhat2011-09-112-340/+42
| | | | | | | | | Change-Id: I012f78bac8ba82333628c59ef51d5e5f43d05ac7 BUG: 3158 Reviewed-on: http://review.gluster.com/329 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* features/marker: unref the local incase of errors before unwindingRaghavendra Bhat2011-09-111-3/+5
| | | | | | | | Change-Id: I4dcad7ddf84bf98b4b7f4a0e407a418426674280 BUG: 2784 Reviewed-on: http://review.gluster.com/299 Reviewed-by: Vijay Bellur <vijay@gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* mgmt/glusterd: volume set help-xml format changeVijay Bellur2011-09-111-1/+1
| | | | | | | | Change-Id: I503364c855d52605e301f4d3c205af6c9fc0e1df BUG: 3366 Reviewed-on: http://review.gluster.com/380 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com>
* features/marker-quota: Prefix the function names with mq (marker-quota).Junaid2011-09-095-310/+310
| | | | | | | | | | | This is to fix to bug marker translator and quota translator cannot co-exist in same process. Change-Id: I9f132b663f03641f4f2c7e168df8400adbc5570f BUG: 3020 Reviewed-on: http://review.gluster.com/381 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cluster/afr: perform self-heal with least priorityPranith Kumar K2011-09-091-0/+7
| | | | | | | | Change-Id: Id8a1dffa3c3200234ad154d1749278a2d7c7021b BUG: 3502 Reviewed-on: http://review.gluster.com/336 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* glusterd rebalance: make co-operate with all other 'op'Amar Tumballi2011-09-094-56/+380
| | | | | | | | | | | | that way, we can share the rebalance state with other peers and can prevent confusion/conflicts when multiple rebalances are done by different peers. Change-Id: I24159e69332644718df7314f6f1da7fce9ff740e BUG: 2112 Reviewed-on: http://review.gluster.com/343 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* cli: 'gluster create volume' - improve option parsingAmar Tumballi2011-09-091-126/+93
| | | | | | | | | | now 'replica' 'stripe' and 'transport' options can be given in any order Change-Id: Ied992ae55e86028bd4f2d662ebd246db138d4548 BUG: 3521 Reviewed-on: http://review.gluster.com/370 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* features/marker-quota: Perform xattr related operations with root ↵Junaid2011-09-092-6/+39
| | | | | | | | | | permissions in rename fop. Change-Id: Id9ac1ecdd9753377c9eb24464f51dcbdc0cd2821 BUG: 3194 Reviewed-on: http://review.gluster.com/367 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* performance/io-threads: treat -ve pid as request for fop with least priorityPranith Kumar K2011-09-081-63/+325
| | | | | | | | Change-Id: Ib6730a708f008054fbd379889a0f6dd3b051b6ad BUG: 3502 Reviewed-on: http://review.gluster.com/335 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: Make data selfheal trigger to be configurable.Pranith Kumar K2011-09-0811-112/+217
| | | | | | | | | | | | | | | | | | | | By default, lookup triggers data self-heal but that is not the preferred way of operating replicated volumes. We would like the data self heals to be triggered in open instead. Number of back-ground self-heals allowed is 16 and lookups block until self-heal is completed. We want to prevent blocking in fops. We can not make lookups independent of self-heal frames because when there are gfid conflicts the decision of which file is correct is determined in self-heal phase. So in afr, lookup self-heal is going to guarantee name space consistency and open/fd fops will take responsibility for data consistency, these are non blocking. The user needs to set the option cluster.data-self-heal "open" for this behavior. Change-Id: If9463cdb9ebac114708558ec13bbca0270acd659 BUG: 3503 Reviewed-on: http://review.gluster.com/334 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* posix-acl: configurable super user IDAnand Avati2011-09-082-7/+61
| | | | | | | | | | | | In configurations with a uid mapper, super user ID could be mapped to a non-zero value. Hence making it configurable in access control would be necessary for proper super-user semantics. Change-Id: I51e8e0395680e9b96a99657a0af547659bd9affe BUG: 2815 Reviewed-on: http://review.gluster.com/332 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: eager locking of FD writesAnand Avati2011-09-084-58/+182
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a change in the way write transactions hold a lock which optimizes the case of sequential writes from a single writer. Lock phase of a transaction has two sub-phases. First is an attempt to acquire locks in parallel by broadcasting non-blocking lock requests. If lock aquistion fails on any server, then the held locks are unlocked and revert to a blocking locked mode sequentially on one server after another. The change in this patch is to make the initial broadcasting lock request attempt to acquire lock on the entire file. If this fails, we revert back to the sequential "regional" blocking lock as before. In the case where such an "eager" lock is granted in the non-blocking phase, it gives rise to an opportunity for optimization. i.e, if the next write transaction on the same FD arrives before the unlock phase of the first transaction, it "takes over" the full file lock. Similarly if yet another transaction arrives before the unlock phase of the "optimized" transaction, that in turn "takes over" the lock as well. The actual unlock now happens at the end of the last "optimzed" transaction. Any operation which arrives before the unlock phase of the previous transaction is a potential candidate to become an "optimized" transaction. In cases where the previous transaction had aquired lock as a "regional" blocking lock, and the next transaction comes in before its unlock phase, then it would not be an "optimized" transaction. Implied assumption ------------------ Since two or more transactions can now operate within the same large lock, there is a possibility that overlapping transactions can arrive at oppoosite orders on the servers. However in the larger picture this is not possible as write-behind already ensures that no two overlapping writes on an inode are in transit at the same time. Overlapping writes across clients are not a problem as they compete at locks anyways. Theoretical benefits and potential harms ---------------------------------------- In case of a single writer: The benefits are large for sequential writes. In the best case the entire file write can happen with just one lock and unlock per server, provided writes are coming in fast enough and getting pipelined by write-behind soon enough (which is usually the case). If the writes are not coming in fast enough, then the optimization "kicks in" for only those subsets of writes which are close enough to get "piggybacked". For random writes the benefits are the same as well. In any case the overall performance is better than or equal to the performance without this optimization for a single writer. In case of multiple writers: When multiple writers are not writing concurrently, there is no negative performance impact. When multiple writers are writing concurrently to the same region, there is no negative impact either, as they were previously getting arbitrated at the locks translator too. In the case of multiple writers writing to different regions concurrently, there will be an increased number of "failovers" from failed parallel non-blocking to sequential blocking regional locks. This above "worst case" has a simple workaround that as soon as we detect > 1 open-fd-count in lookup xattr, we can disable this optimization on those fds. Beneficial side-effects ----------------------- There is another similar optimization in AFR for changelogs which goes by the name of "changelog-piggybacking". That works in a similar way where pending flags get 'taken over' or 'piggybacked' by the next transaction if its 'pre-op' phase kicks in before the 'post-op' phase of the previous transaction. It has been observed that this changelog-piggybacking optimization gives a saving of about ~55% savings of xattr calls hitting the wire, measured across various types of network interfaces. The side effect of this eager-lock optimization is that it gives an almost 100% saving of xattr calls by making the optimistic-changelog work much more efficiently as it gives a wider overlap of the xattr phases of two consecutive transactions. Change-Id: I41c02eb3b64c14c68ef66a344610ec3f024cd59d BUG: 3409 Reviewed-on: http://review.gluster.com/240 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* storage/posix: posix getxattr log enhancementRajesh Amaravathi2011-09-081-4/+4
| | | | | | | | | | Now the key is logged with getxattr failure. Change-Id: I96a9234cf138ae0922dc403e2fddcd4df0d89df8 BUG: 3283 Reviewed-on: http://review.gluster.com/373 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* mount/nfs: Gluster nfs crashes with subdirectory mountRajesh Amaravathi2011-09-081-1/+4
| | | | | | | | | | | | | Glusterfs used to crash trying to dereference a NULL pointer. Also, in mnt3_resolve_export_subdir, volume name was prefixed to sub directory exported, resulting in mount fail of sub directory. Fixed both issues. Change-Id: I746f0c244b4cbf03033d73ac3e40518762d76385 BUG: 3481 Reviewed-on: http://review.gluster.com/323 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* Save the mode flags set by the application when ACLs are in usePavan T C2011-09-081-1/+2
| | | | | | | | | | | | | | | | | While inheriting the ACLs from a directory that has default ACLs, make sure that the mode flags set by the application are saved. It is required to inherit only the Read, Write and Execute permissions while leaving the others viz. setuid, setgid and sticky bit untouched hence honouring the requests made by the application during create operations (mknod, mkdir et al). For a description of the problem, root cause and evaluation, refer: http://bugs.gluster.com/show_bug.cgi?id=3522 Change-Id: I994077fb321a35d8254f0cc5a7de99a17ec40c47 BUG: 3522 Reviewed-on: http://review.gluster.com/368 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* gsyncd: do the homework, document _everything_Csaba Henk2011-09-089-17/+483
| | | | | | | | Change-Id: I559e6a0709b8064cfd54c693e289c741f9c4c4ab BUG: 1570 Reviewed-on: http://review.gluster.com/319 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushik BV <kaushikbv@gluster.com>
* nfs3: Resolve entry vs. hash conflict at same dir depthShehjar Tikoo2011-09-072-15/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intro Note ========== The current code in hard fh resolution takes the first-match approach, i.e. which ever dirent either matches the hash or matches the gfid first is the one chosen as the result for the next step of fh resolution. In the latter case, i.e., dirent matches the gfid, we the next step is to conclude the fh resolution by returning the entry whose gfid matched. In the former, i.e., the hash matches the dirent, we choose the hash-matching dirent as the next directory to descend into, for searching the file to be operated upon. Problem ======= When performing hard fh resolution, there can be a situation where: o the hash of the primary entry,i.e. the entry we're looking for and the hash of another sibling directory, match. Note the use of "sibling", meaning both the primary entry and the hash matching one are in the same directory, i.e., their filehandle.hashcount will be same. o the sibling directory is encountered first during the dir search. Because of the current code described in "Intro", we'll end up descending into the sibling directory even though the correct behaviour is to ignore this and wait till we encounter the primary entry in the same parent directory. Once we end up descending into this sibling directory, the directory depth validation check fails. The check fails because it notices that the resolution is attempting to open a directory that is deeper in the fs tree than the file we're looking for. When this check fails, we return an ESTALE. So basically, a false-positive results in an estale to Specsfs. This is not a theoretical situation. Me and Avati saw this on specsfs test where sfs created terabytes-sized file system for its tests. The number of files was so huge in a single directory that the hashes of two entries ended up colliding. Change-Id: I4a6df11d326a67a507b1cd716c2c8e00b5a858a4 BUG: 3510 Reviewed-on: http://review.gluster.com/357 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shehjar Tikoo <shehjart@gluster.com>
* Eliminate many "var set but not used" warnings with newer gcc.Jeff Darcy2011-09-0751-479/+2
| | | | | | | | | | | | | | | | This fixes ~200 such warnings, but leaves three categories untouched. (1) Rpcgen code. (2) Macros which set variables in the outer (calling function) scope. (3) Variables which are set via function calls which may have side effects. Change-Id: I6554555f78ed26134251504b038da7e94adacbcd BUG: 2550 Reviewed-on: http://review.gluster.com/371 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* glusterd: send the 'stripe-count' value to peer during handshakeAmar Tumballi2011-09-071-0/+17
| | | | | | | | | | | without which, if a peer is added after volume of type 'stripe-replica' is created, it won't be reflected in the newly added peer. Change-Id: I77ee6aa3f33994bd4c6dbfefd853cc7e7491c1db BUG: 3523 Reviewed-on: http://review.gluster.com/369 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* glusterd: fail the 'peer probe' if the first connect attempt failsAmar Tumballi2011-09-071-4/+48
| | | | | | | | | | | so 'gluster peer probe' command doesn't hang till timeout (120s), instead it will send the proper error msg to client. Change-Id: I398fa16d526f869f1d27eeb57aeb7ee4451fbecd BUG: 1852 Reviewed-on: http://review.gluster.com/342 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* modify to the way we used XDR definitions files (.x files)Amar Tumballi2011-09-0750-4137/+957
| | | | | | | | | | | | | | | | | | | | | | | | Earlier: step 1: copy the existing <xdr>.x files to /tmp step 2: generate '.[ch]' files using 'rpcgen <xdr>.x' step 3: check diff with the to the existing files, add only your part of changes back to the original file. (ignore other changes). step 4: there is another file to write wrapper functions to convert structures to/from XDR buffers, update it with your new structure. step 5: use these wrapper functions in the newly written procedures. step 6: commit :-| Now: step 1: update (mostly adding only) the <xdr>.x file step 2: run '<path-to-src>/extras/generate-xdr-files.sh <xdr>.x' command step 3: implement rpc procedure to handle the request/response. step 4: commit :-) Change-Id: I219f9159fc980438c86e847c6b030be96e595ea2 BUG: 3488 Reviewed-on: http://review.gluster.com/341 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* glusterfs protocol: bring in variable sized iobuf supportAmar Tumballi2011-09-079-198/+297
| | | | | | | | | | | is a step towards reducing glusterfs memory footprint. should also help a bit in overall performance. Change-Id: I074d5813602b2c960d59562e792b3dc6e43d2f42 BUG: 3475 Reviewed-on: http://review.gluster.com/322 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* cluster/afr: Prevent double big lock when data self-heal loops are not spawnedPranith Kumar K2011-09-062-7/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The steps in normal data self heal: 1) take big lock by self-heal frame. Get the xattrs/stat to decide source, sink information. 2) spawn loop frames which perform self-heal by taking small locks on the file. Every time a new lock is taken and the old lock is released. 3) Before releasing the final small lock a big lock is taken by the self-heal frame, and unlock on small-lock. Erasing of the pending xattrs happen then the big unlock happen and that is the end of the data self-heal. When a data self-heal is needed for a file and the fop that triggers the self-heal is open with O_TRUNC. Fuse sends open then an explicit truncate for this. Open triggers the self-heal but by the time it tries to spawn the loops the file size is truncated to 0, so no loops are formed. These are the steps: 1) Take big lock by self-heal frame. Get the xattrs/stat to decide source, sink information. 2) loop frames are not spawned. The big lock is not released. 3) One more big lock is taken by the same self-heal frame, Erasing of the pending xattrs etc happen, now it does two big unlocks, but after the first unlock, the information on which the locks were performed is forgotten, so the next unlock becomes a no-op. So there is a stale big lock on that file preventing further writes. As a fix, if the loops are not spawned, use the previous big lock to perform the rest of the operations needed in completing the data self-heal. No need to have one more big lock. Change-Id: Id03171269594e447b2b6d1331e362d83bd1e3430 BUG: 3506 Reviewed-on: http://review.gluster.com/339 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>