summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht/src/dht-helper.c
Commit message (Collapse)AuthorAgeFilesLines
* support for nano second resolution for mtime,ctime,atime attributes.krishna2012-02-091-3/+15
| | | | | | | | | Change-Id: Id5078f270d0fec280b53d4aa7b16bbaf42a2df05 BUG: 784095 Signed-off-by: krishna <ksriniva@redhat.com> Reviewed-on: http://review.gluster.com/2730 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: GFID filehandle based backend and anonymous FDsAnand Avati2012-01-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. What -------- This change introduces an infrastructure change in the filesystem which lets filesystem operation address objects (inodes) just by its GFID. Thus far GFID has been a unique identifier of a user-visible inode. But in terms of addressability the only mechanism thus far has been the backend filesystem path, which could be derived from the GFID only if it was cached in the inode table along with the entire set of dentry ancestry leading up to the root. This change essentially decouples addressability from the namespace. It is no more necessary to be aware of the parent directory to address a file or directory. 2. Why ------- The biggest use case for such a feature is NFS for generating persistent filehandles. So far the technique for generating filehandles in NFS has been to encode path components so that the appropriate inode_t can be repopulated into the inode table by means of a recursive lookup of each component top-down. Another use case is the ability to perform more intelligent self-healing and rebalancing of inodes with hardlinks and also to detect renames. A derived feature from GFID filehandles is anonymous FDs. An anonymous FD is an internal USABLE "fd_t" which does not map to a user opened file descriptor or to an internal ->open()'d fd. The ability to address a file by the GFID eliminates the need to have a persistent ->open()'d fd for the purpose of avoiding the namespace. This improves NFS read/write performance significantly eliminating open/close calls and also fixes some of today's limitations (like keeping an FD open longer than necessary resulting in disk space leakage) 3. How ------- At each storage/posix translator level, every file is hardlinked inside a hidden .glusterfs directory (under the top level export) with the name as the ascii-encoded standard UUID format string. For reasons of performance and scalability there is a two-tier classification of those hardlinks under directories with the initial parts of the UUID string as the directory names. For directories (which cannot be hardlinked), the approach is to use a symlink which dereferences the parent GFID path along with basename of the directory. The parent GFID dereference will in turn be a dereference of the grandparent with the parent's basename, and so on recursively up to the root export. 4. Development --------------- 4a. To leverage the ability to address an inode by its GFID, the technique is to perform a "nameless lookup". This means, to populate a loc_t structure as: loc_t { pargfid: NULL parent: NULL name: NULL path: NULL gfid: GFID to be looked up [out parameter] inode: inode_new () result [in parameter] } and performing such lookup will return in its callback an inode_t populated with the right contexts and a struct iatt which can be used to perform an inode_link () on the inode (without a parent and basename). The inode will now be hashed and linked in the inode table and findable via inode_find(). A fundamental change moving forward is that the primary fields in a loc_t structure are now going to be (pargfid, name) and (gfid) depending on the kind of FOP. So far path had been the primary field for operations. The remaining fields only serve as hints/helpers. 4b. If read/write is to be performed on an inode_t, the approach so far has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost in performance in the inbuilt NFS server. 5. Misc ------- The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent with the rest of the codebase. Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f BUG: 781318 Reviewed-on: http://review.gluster.com/669 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: remove 'ino' variable from 'inode_t' structureAmar Tumballi2011-11-161-3/+2
| | | | | | | | Change-Id: I0f078d1753db65d2f2e0380d1b0450c114cf40dd BUG: 3518 Reviewed-on: http://review.gluster.com/522 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* support for de-commissioning a node using 'remove-brick'Amar Tumballi2011-09-131-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to achieve this, we now create volume-file with 'decommissioned-nodes' option in distribute volume, then just perform the rebalance set of operations (with 'force' flag set). now onwards, the 'remove-brick' (with 'start' option) operation tries to migrate data from removed bricks to existing bricks. 'remove-brick' also supports similar options as of replace-brick. * (no options) -> works as 'force', will have the current behavior of remove-brick, ie., no data-migration, volume changes. * start (starts remove-brick with data-migration/draining process, which takes care of migrating data and once complete, will commit the changes to volume file) * pause (stop data migration, but keep the volume file intact with extra options whatever is set) * abort (stop data-migration, and fall back to old configuration) * commit (if volume is stopped, commits the changes to volumefile) * force (stops the data-migration and commits the changes to volume file) Change-Id: I3952bcfbe604a0952e68b6accace7014d5e401d3 BUG: 1952 Reviewed-on: http://review.gluster.com/118 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* distribute rebalance: handle the open file migrationAmar Tumballi2011-09-121-6/+410
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Complexity involved: To migrate a file with open fd, we have to notify the other client process which has the open fd, and make sure the write()s happening on that fd is properly synced to the migrated file. Once the migration is complete, the client process which has open-fd should get notified and it should start performing all the operations on the new subvolume, instead of earlier cached volume. How to solve the notification part: We can overload the 'postbuf' attribute in the _cbk() function to understand if a file is 'under-migration' or 'migration-complete' state. (This will be something similar to deciding whether a file is DHT-linkfile by its 'mode'). Overall change includes below mentioned major changes: 1. dht_linkfile is decided by only 2 factors (mode(01000), xattr(trusted.glusterfs.dht.linkto)), instead of earlier 3 factors (size==0) 2. in linkfile self-heal part (in 'dht_lookup_everywhere_cbk()'), don't delete a linkfile if there is a open-fd on it. It means, there may be a migration in progress. 3. if a file's revalidate fails with ENOENT, it may be due to file migration, and hence need a lookup_everywhere() 4. There will be 2 phases of file-migration. -> Phase 1: Migration in progress * The source data file will have SGID and STICKY bit set in its mode. * The source data file will have a 'linkto' xattr pointing the destination. * Destination file will have mode set to '01000', and 'linkto' xattr set to itself. -> Phase 2: File migration Complete * The source data file will have mode '01000', and will be 'truncated' to size 0. * The destination file will have inherited mode from the source. (without sgid and sticky bit) and its 'linkto' attribute will be removed. 4. Changes in distribute to work smoothly with a file which is in migration / got migrated. The 'fops' are divided into 3 categories, inode-read, inode-write and others. inode-read fops need to handle only 'phase 2' notification, where as, the inode-write fops need to handle both 'phase 1' and phase2. The inode-write operations will be done on source file, and if any of 'file-migration' procedures are detected in _cbk(), then the operations should be performed on the destination too. when a phase-2 is detected, then the inode-ctx itself should be changed to represent a new layout. With these changes, the open file migration will work smoothly with multiple clients. Change-Id: I512408463814e650f34c62ed009bf2101d016fd6 BUG: 3071 Reviewed-on: http://review.gluster.com/209 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Change Copyright current yearPranith Kumar K2011-08-101-1/+1
| | | | | | | | Change-Id: I2d10f2be44f518f496427f257988f1858e888084 BUG: 3348 Reviewed-on: http://review.gluster.com/200 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* LICENSE: s/GNU Affero General Public/GNU General Public/Pranith Kumar K2011-08-061-3/+3
| | | | | | | | Change-Id: I3914467611e573cccee0d22df93920cf1b2eb79f BUG: 3348 Reviewed-on: http://review.gluster.com/182 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* core: fill 'ia_ino' from 'ia_gfid' in 'storage/posix' to preserve same ino ↵Amar Tumballi2011-06-161-2/+1
| | | | | | | | | | | | | | | number take the least significant 64bit from gfid and assign it to 'ia_ino', hence for a given file (or directory), the 'ia_ino' number is always same, and we need not worry about the 'itransform' in 'cluster/*' translators. Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 3042 (inode number should be constant on storage) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3042
* DHT:first_up_subvol should be the one up the longestshishir gowda2011-05-311-3/+9
| | | | | | | | Signed-off-by: shishir gowda <shishirng@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 2684 (Dir missing from mount point) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2684
* cluster/dht: log enhancementsAmar Tumballi2011-03-171-5/+0
| | | | | | | | | Signed-off-by: Shishir Gowda <shishirng@gluster.com> Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 2346 (Log message enhancements in GlusterFS - phase 1) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
* cluster/dht: whitespace cleanupAmar Tumballi2011-03-171-150/+150
| | | | | | | | | | also fill tabs by spaces (untabify), and indent the code Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 2346 (Log message enhancements in GlusterFS - phase 1) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
* cluster/dht: Perform self-heal as rootPranith K2011-02-081-31/+0
| | | | | | | | Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 2370 (cluster/afr: Perform self-heal as root) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2370
* Fix DHT getxattr for directoriesshishir gowda2010-11-031-0/+4
| | | | | | | | | | | | When a heal on the directory or layout changes, the user xattrs do not get healed in dht. The current fix sends the getxattr call n all the subvolumes, aggregates it and sends the response Signed-off-by: shishir gowda <shishirng@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 1991 (distribute directory self-heal does not copy user extended attributes) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1991
* Copyright changesVijay Bellur2010-10-111-1/+1
| | | | | | | | Signed-off-by: Vijay Bellur <vijay@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 971 (dynamic volume management) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
* distribute: check for 'conf' before dereferencing itAmar Tumballi2010-10-051-1/+18
| | | | | | | | Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 1806 () URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1806
* Change GNU GPL to GNU AGPLPranith K2010-10-041-3/+3
| | | | | | | | Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 1388 () URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1388
* remove 'gen' from iatt/protocol structuresAmar Tumballi2010-09-141-1/+0
| | | | | | | | Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 971 (dynamic volume management) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
* gfid: changes in distribute to handle uuids in iatt structureAnand Avati2010-09-041-0/+2
| | | | | | | | | | Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com> Signed-off-by: Anand V. Avati <avati@amp.gluster.com> Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 971 (dynamic volume management) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
* gfid: change in create() prototype to have params dictionary with uuid in itAnand Avati2010-09-041-0/+5
| | | | | | | | | Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com> Signed-off-by: Anand V. Avati <avati@amp.gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 971 (dynamic volume management) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
* dht enhancementsAmar Tumballi2010-07-271-0/+63
| | | | | | | | | | | | | | * create(filename@<distribute-volume-name>-<subvol-name>), will create the file in corresponding subvol of dht. * same for unlink() same logic can be extended to other fops if there is a need Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 1200 (enhance distribute with hooks so lot of debugging can be done online) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1200
* Memory accounting changesVijay Bellur2010-04-231-4/+5
| | | | | | | | | | | Memory accounting Changes. Thanks to Vinayak Hegde and Csaba Henk for their contributions. Signed-off-by: Vijay Bellur <vijay@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 329 (Replacing memory allocation functions with mem-type functions) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=329
* iatt: changes across the codebaseAnand V. Avati2010-03-161-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - libglusterfs -- call-stub -- inode -- protocol - libglusterfsclient - cluster/replicate - cluster/{dht,nufa,switch} - cluster/unify - cluster/HA - cluster/map - cluster/stripe - debug/error-gen - debug/trace - debug/io-stats - encryption/rot-13 - features/filter - features/locks - features/path-converter - features/quota - features/trash - mount/fuse - performance/io-threads - performance/io-cache - performance/quick-read - performance/read-ahead - performance/stat-prefetch - performance/symlink-cache - performance/write-behind - protocol/client - protocol/server - storage-posix Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 361 (GlusterFS 3.0 should work on Mac OS/X) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=361
* distribute: perform self-heal as rootShehjar Tikoo2010-03-041-0/+30
| | | | | | | | | | | prerserve original frame uid and gid and perform self-heal by changing uid/gid to root (0) temporarily. Signed-off-by: Shehjar Tikoo <shehjart@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 597 (miscellaneous fixes for xlators to work well with NFS xlator) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=597
* distribute: Respect end-of-dir on readdir only for last subvolShehjar Tikoo2010-03-041-0/+21
| | | | | | | | | | | | | | | | We need to ensure that only the last subvolume's end-of-directory notification is respected so that directory reading does not stop before all subvolumes have been read. That could happen because the posix for each subvolume sends a ENOENT on end-of-directory but in distribute we're not concerned only with a posix's view of the directory but the aggregated namespace' view of the directory, which is maintained by distribute. Signed-off-by: Shehjar Tikoo <shehjart@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 597 (miscellaneous fixes for xlators to work well with NFS xlator) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=597
* dht: unlink stale linkfiles in rmdir to prevent ENOTEMPTYAnand Avati2010-02-221-3/+43
| | | | | | | | | | | | Thanks to He Xaobing <allreol@gmail.com>, this patch is inspired by http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=188#c2 Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 188 ([ glusterfs 2.0.6rc2 ] - "Directory not empty" on rm -rf) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=188
* cluster/dht: Check for pointers before de-referencing in dht_stat_merge()Vijay Bellur2009-12-161-0/+3
| | | | | | | | Signed-off-by: Vijay Bellur <vijay@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 463 (Crash in dht_stat_merge ()) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=463
* distribute,nufa: layout handling changesAnand V. Avati2009-10-161-3/+18
| | | | | | | | | | changes to make revalidate not fail but instead perform fresh lookup and swap inode context (layout) safely Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 315 (generation number support) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=315
* Changed occurrences of Z Research to Gluster.Vijay Bellur2009-10-071-1/+1
| | | | Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
* dht_stat_merge - use the highest uid when ambiguousAnand Avati2009-08-041-2/+3
| | | | | | | | | When directories on different subvolumes have different ownerships, use the highest uid/gid till self-heal resolves the inconsistency Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 191 (random Permission denied errors) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=191
* rename dht_first_up_child to dht_first_up_subvolAnand V. Avati2009-06-261-2/+2
| | | | Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
* log message cleanup in distributeAnand V. Avati2009-04-241-2/+2
| | | | Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
* updated copyright header to extend copyright upto 2009Basavanagowda Kanur2009-02-261-1/+1
| | | | | | updated copyright header to include 2009. Signed-off-by: Anand V. Avati <avati@amp.gluster.com>
* Added all filesVikas Gorur2009-02-181-0/+326