summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht/src/dht-selfheal.c
Commit message (Collapse)AuthorAgeFilesLines
* cluster/dht: assign layout onto missing directories tooAnand Avati2013-09-101-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | The current self-healing algorithm is ignoring missing directories for assigning new layout. When lookup() is racing against mkdir() or when self-healing a half-done mkdir(), the layout assignment split must happen based on the final number of directories, and not the currently existing number of directories (because we finish mkdir() of missing directories before hash layout assignment). Without this fix, concurrent mkdir() and lookup() will step on each others feet, create a messed up layout on disk, and end up with different in-memory layouts. Once two clients have different in-memory layouts, creation of subdirectory will not arbitrate on the same hashed subvolume and will result in GFID mismatch of the sub-directory. Change-Id: Ia47acad67c265060405984c822b4d37512b9dbb3 BUG: 907072 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5871 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/distribute: Ignore non-participating subvols for layout checksshishir gowda2013-04-111-3/+65
| | | | | | | | | | | | | | | | | | | | | | | | | Backporting fix http://review.gluster.org/#/c/4668/ When subvols-per-directory is < available subvols, then there are layouts which are not populated. This leads to incorrect identification of holes or overlaps. We need to ignore layouts, which have err == 0, and start == stop. In the current scenario (start == stop == 0). Additionally, in layout-merge, treat missing xattrs as err = 0. In case of missing layouts, anomalies will reset them. For any other valid subvoles, err != 0 in case of layouts being zeroed out. Also reverted back dht_selfheal_dir_xattr, which does layout calculation only on subvols which have errors. BUG: 921408 Change-Id: I75a8edcb92af5b53b3253c9addd7a812e9242836 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4800 Reviewed-by: Amar Tumballi <amarts@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/distribute: Fix layout overlaps due to spread-count in selfheal pathshishir gowda2013-03-091-50/+12
| | | | | | | | | | | | | | | We needed to zero out the layout range, before we re-calculate the range. When spread-count is issued, we would end up with stale ranges in the layout. Replaced dht_selfheal_dir_xattr with dht_fix_dir_xattr, which correctly resets the un-used (after re-cal) layouts. Change-Id: I1a900d15df07335f59356bd23182ccec34381ab2 BUG: 884455 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4648 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: better layout-optimization algorithmJeff Darcy2013-02-071-22/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This method deals with the case where swapping might gain a bigger overlap for the xlator currently under consideration, but sacrifices even more from the xlator we're swapping with. For example: A = 0x00000000 - 0x44444443 (new 0x00000000 - 0x55555554) B = 0x44444444 - 0x77777776 (new 0x55555555 - 0xaaaaaaa9) C = 0x77777777 - 0xffffffff (new 0xaaaaaaaa - 0xffffffff) Here, the new range for B has a bigger overlap with the old C than with the old B (0x33333333 vs. 0x22222222 to be precise) so looking only at that might lead us to swap. However, such a swap turns the new C's overlap from 0x55555556 (vs. old C) to *zero* (vs. old B). In other words, we've gained 0x11111111 for B but lost 0x55555556 for C, so it's a bad idea. The new algorithm accounts for all effects of the swap, so it not only avoids bad swaps but can make some good ones that would have been missed previously. For example, if swapping a range X with a later range Y would not increase the overlap for X we would previously have skipped it even if the swap would increase Y's overlap without affecting X's. This is the normal case when we're adding a new brick (which initially has zero overlap with any old range) so finding more good swaps is probably even more important than avoiding bad ones. Also, the logic in dht_overlap_calc was completely broken before, causing integer overflows instead of providing correct values, so no matter what higher-level algorithm was in place the GIGO effect would have resulted in bad decisions. Change-Id: If61ed513cfcb931916c6b51da293e3efbaaf385f BUG: 853258 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/3908 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: get_layout should account only available subvolsv3.4.0qa7shishir gowda2013-01-231-4/+3
| | | | | | | | | | | | | | | | The earlier logic used to check if (layout-spread-count <= subvol_cnt - decommissioned bricks). With this if a subvol was down, and layout-spread was > upsubvols, a mkdir ended up creating holes in the layout. The fix is to consider only the combination of subvols which are usable (not down or not decommissioned). Change-Id: I61ad3bcaf4589f5a75f7887cfa595c98311ae3bb BUG: 902610 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4412 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/distribute: re-set layouts to prevent overlapsshishir gowda2012-12-111-8/+38
| | | | | | | | | | | | | | | | | When subvols-per-directory option is used, with bricks addition/removal the layouts might get distributed to other subvols, which were not part of the layout before. We need to clean up layouts on old subvolumes, to prevent overlaps. Also, we need to make sure if layout-cnt is never less than subvolume-cnt. Change-Id: I00994a092ca0c99aedcc41bd9412d43460f88a04 BUG: 884455 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4281 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: fail fix-layout if any of the subvol is downshishir gowda2012-11-291-23/+20
| | | | | | | | | | | | | | | | If any subvolume is down, and a layout is re-written and hash values change, entry names in the downed subvol can be reused in the other subvol which got the same hash range. when the downed subvol is brought back up, duplicate entried might appear Also separated handling of ENOSPC and ENOTCONN error. Change-Id: I5ed93990425a4cee70df2dab7c7c119fdc87ad56 BUG: 860663 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4000 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Heal dir uid/gidshishir gowda2012-11-291-0/+47
| | | | | | | | | | | | Identify mismatching uid/gid in lookup, and trigger a syncop heal. uid/gid of subvol with latest ctime is trusted (local->prebuf). Change-Id: Ib5c4bc438e7f4b1f33080e73593f40f400e997f0 BUG: 862967 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/3964 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: improve dht_fix_layout_of_directory for better re-assignmentAnand Avati2012-09-121-145/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jeff Darcy wrote: > AFAICT, the fix-layout code doesn't do the same rotation that the > new-directory code does. Therefore, the new bricks always claim > completely predictable hash ranges for every directory, leading to > either a 0-1-2-3 pattern or a 1-0-2-3 pattern. In other words, a > file whose hash falls into the second quarter of the range will always > be assigned to brick 2, and a file whose hash falls into the fourth > quarter will always be assigned to brick 3. The rest will be split > according to the original pattern. Put still another way, instead of > same-named files in different directories being spread across N bricks, > they might be spread across only two bricks (bad) or totally > concentrated on one brick (worse) regardless of N. The current dht_fix_layout_of_directory() code, in an attempt to maximize overlap of new layout with existing layout (to minimize movement of data) fails to do a good job of randomizing new assignment even when it could do a better job. In an example where we expand from 2 nodes to 4 nodes, the current possibilities are limited in the following way - (theoretical hash range: 00 - 99) OLD 1 ----- server1: 00 - 49 server2: 50 - 99 NEW 1 ----- server1: 00 - 24 server2: 50 - 74 server3: 25 - 49 server4: 75 - 99 OLD 2 ----- server1: 50 - 99 server2: 00 - 49 NEW 2 ------ server1: 50 - 74 server2: 00 - 24 server3: 25 - 49 server4: 75 - 99 The above shows that when add-brick from 2 bricks to 4 bricks, server3 and server4 always get the _same_ hash range no matter what the original hash range assignment was. The fix in this patch is first do the standard new directory assignment to a directory (with rotation etc.) and then do the reassignment to maximize overlap. This way newly added servers still get random ranges and existing servers have a probability of getting either of the quarters which were part of its half previously. The same principles hold for all add-brick from M to M+N. Change-Id: I0cbbf3bfa334645728072d66aaaa80120d0b295f BUG: 853258 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/3883 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* remove useless if-before-free (and free-like) functionsJim Meyering2012-07-131-4/+2
| | | | | | | | | | | | See comments in http://bugzilla.redhat.com/839925 for the code to perform this change. Signed-off-by: Jim Meyering <meyering@redhat.com> BUG: 839925 Change-Id: I10e4ecff16c3749fe17c2831c516737e08a3205a Reviewed-on: http://review.gluster.com/3661 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* distribute: support user-specified layouts.Jeff Darcy2012-05-311-1/+7
| | | | | | | | | | | | | The new type is DHT_HASH_TYPE_DM_USER=1 (on disk in network byte order) and we treat it the same as DHT_HASH_TYPE_DM except that we don't stomp on it during rebalance. Change-Id: I893571a9b89577acdea2fe868915b18d3663fd77 BUG: 807312 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.com/3004 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* license: dual license under GPLV2 and LGPLV3+Kaleb KEITHLEY2012-05-101-14/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that the license was not changed in any of the following: .../argp-standalone/... .../booster/... .../cli/... .../contrib/... .../extras/... .../glusterfsd/... .../glusterfs-hadoop/... .../mod_clusterfs/... .../scheduler/... .../swift/... The license was not changed in any of the non-building xlators. The license was not changed in any of the xlators that seemed — to me — to be clearly server-side only, e.g. protocol/server Note too that copyright was changed along with the license; I did not change the copyright in files where the license did not change. If you find any errors or ommissions please don't hesitate to let me know. The complete list of files with the license change is: libglusterfs/src/byte-order.h libglusterfs/src/call-stub.c libglusterfs/src/call-stub.h libglusterfs/src/checksum.c libglusterfs/src/checksum.h libglusterfs/src/circ-buff.c libglusterfs/src/circ-buff.h libglusterfs/src/common-utils.c libglusterfs/src/common-utils.h libglusterfs/src/compat-errno.c libglusterfs/src/compat-errno.h libglusterfs/src/compat.c libglusterfs/src/compat.h libglusterfs/src/daemon.c libglusterfs/src/daemon.h libglusterfs/src/defaults.c libglusterfs/src/defaults.h libglusterfs/src/dict.c libglusterfs/src/dict.h libglusterfs/src/event-history.c libglusterfs/src/event-history.h libglusterfs/src/event.c libglusterfs/src/event.h libglusterfs/src/fd-lk.c libglusterfs/src/fd-lk.h libglusterfs/src/fd.c libglusterfs/src/fd.h libglusterfs/src/gf-dirent.c libglusterfs/src/gf-dirent.h libglusterfs/src/globals.c libglusterfs/src/globals.h libglusterfs/src/glusterfs.h libglusterfs/src/graph-print.c libglusterfs/src/graph-utils.h libglusterfs/src/graph.c libglusterfs/src/hashfn.c libglusterfs/src/hashfn.h libglusterfs/src/iatt.h libglusterfs/src/inode.c libglusterfs/src/inode.h libglusterfs/src/iobuf.c libglusterfs/src/iobuf.h libglusterfs/src/latency.c libglusterfs/src/latency.h libglusterfs/src/list.h libglusterfs/src/lkowner.h libglusterfs/src/locking.h libglusterfs/src/logging.c libglusterfs/src/logging.h libglusterfs/src/mem-pool.c libglusterfs/src/mem-pool.h libglusterfs/src/mem-types.h libglusterfs/src/options.c libglusterfs/src/options.h libglusterfs/src/rbthash.c libglusterfs/src/rbthash.h libglusterfs/src/run.c libglusterfs/src/run.h libglusterfs/src/scheduler.c libglusterfs/src/scheduler.h libglusterfs/src/stack.c libglusterfs/src/stack.h libglusterfs/src/statedump.c libglusterfs/src/statedump.h libglusterfs/src/syncop.c libglusterfs/src/syncop.h libglusterfs/src/syscall.c libglusterfs/src/syscall.h libglusterfs/src/timer.c libglusterfs/src/timer.h libglusterfs/src/trie.c libglusterfs/src/trie.h libglusterfs/src/xlator.c libglusterfs/src/xlator.h libglusterfsclient/src/libglusterfsclient-dentry.c libglusterfsclient/src/libglusterfsclient-internals.h libglusterfsclient/src/libglusterfsclient.c libglusterfsclient/src/libglusterfsclient.h rpc/rpc-lib/src/auth-glusterfs.c rpc/rpc-lib/src/auth-null.c rpc/rpc-lib/src/auth-unix.c rpc/rpc-lib/src/protocol-common.h rpc/rpc-lib/src/rpc-clnt.c rpc/rpc-lib/src/rpc-clnt.h rpc/rpc-lib/src/rpc-transport.c rpc/rpc-lib/src/rpc-transport.h rpc/rpc-lib/src/rpcsvc-auth.c rpc/rpc-lib/src/rpcsvc-common.h rpc/rpc-lib/src/rpcsvc.c rpc/rpc-lib/src/rpcsvc.h rpc/rpc-lib/src/xdr-common.h rpc/rpc-lib/src/xdr-rpc.c rpc/rpc-lib/src/xdr-rpc.h rpc/rpc-lib/src/xdr-rpcclnt.c rpc/rpc-lib/src/xdr-rpcclnt.h rpc/rpc-transport/rdma/src/name.c rpc/rpc-transport/rdma/src/name.h rpc/rpc-transport/rdma/src/rdma.c rpc/rpc-transport/rdma/src/rdma.h rpc/rpc-transport/socket/src/name.c rpc/rpc-transport/socket/src/name.h rpc/rpc-transport/socket/src/socket.c rpc/rpc-transport/socket/src/socket.h xlators/cluster/afr/src/afr-common.c xlators/cluster/afr/src/afr-dir-read.c xlators/cluster/afr/src/afr-dir-read.h xlators/cluster/afr/src/afr-dir-write.c xlators/cluster/afr/src/afr-dir-write.h xlators/cluster/afr/src/afr-inode-read.c xlators/cluster/afr/src/afr-inode-read.h xlators/cluster/afr/src/afr-inode-write.c xlators/cluster/afr/src/afr-inode-write.h xlators/cluster/afr/src/afr-lk-common.c xlators/cluster/afr/src/afr-mem-types.h xlators/cluster/afr/src/afr-open.c xlators/cluster/afr/src/afr-self-heal-algorithm.c xlators/cluster/afr/src/afr-self-heal-algorithm.h xlators/cluster/afr/src/afr-self-heal-common.c xlators/cluster/afr/src/afr-self-heal-common.h xlators/cluster/afr/src/afr-self-heal-data.c xlators/cluster/afr/src/afr-self-heal-entry.c xlators/cluster/afr/src/afr-self-heal-metadata.c xlators/cluster/afr/src/afr-self-heal.h xlators/cluster/afr/src/afr-self-heald.c xlators/cluster/afr/src/afr-self-heald.h xlators/cluster/afr/src/afr-transaction.c xlators/cluster/afr/src/afr-transaction.h xlators/cluster/afr/src/afr.c xlators/cluster/afr/src/afr.h xlators/cluster/afr/src/pump.c xlators/cluster/afr/src/pump.h xlators/cluster/dht/src/dht-common.c xlators/cluster/dht/src/dht-common.h xlators/cluster/dht/src/dht-diskusage.c xlators/cluster/dht/src/dht-hashfn.c xlators/cluster/dht/src/dht-helper.c xlators/cluster/dht/src/dht-inode-read.c xlators/cluster/dht/src/dht-inode-write.c xlators/cluster/dht/src/dht-layout.c xlators/cluster/dht/src/dht-linkfile.c xlators/cluster/dht/src/dht-mem-types.h xlators/cluster/dht/src/dht-rebalance.c xlators/cluster/dht/src/dht-rename.c xlators/cluster/dht/src/dht-selfheal.c xlators/cluster/dht/src/dht.c xlators/cluster/dht/src/nufa.c xlators/cluster/dht/src/switch.c xlators/cluster/stripe/src/stripe-helpers.c xlators/cluster/stripe/src/stripe-mem-types.h xlators/cluster/stripe/src/stripe.c xlators/cluster/stripe/src/stripe.h xlators/features/index/src/index-mem-types.h ¹ xlators/features/index/src/index.c ¹ xlators/features/index/src/index.h ¹ xlators/performance/io-cache/src/io-cache.c xlators/performance/io-cache/src/io-cache.h xlators/performance/io-cache/src/ioc-inode.c xlators/performance/io-cache/src/ioc-mem-types.h xlators/performance/io-cache/src/page.c xlators/performance/io-threads/src/io-threads.c xlators/performance/io-threads/src/io-threads.h xlators/performance/io-threads/src/iot-mem-types.h xlators/performance/md-cache/src/md-cache-mem-types.h xlators/performance/md-cache/src/md-cache.c xlators/performance/quick-read/src/quick-read-mem-types.h xlators/performance/quick-read/src/quick-read.c xlators/performance/quick-read/src/quick-read.h xlators/performance/read-ahead/src/page.c xlators/performance/read-ahead/src/read-ahead-mem-types.h xlators/performance/read-ahead/src/read-ahead.c xlators/performance/read-ahead/src/read-ahead.h xlators/performance/symlink-cache/src/symlink-cache.c xlators/performance/write-behind/src/write-behind-mem-types.h xlators/performance/write-behind/src/write-behind.c xlators/protocol/auth/addr/src/addr.c ¹ xlators/protocol/auth/login/src/login.c ¹ xlators/protocol/client/src/client-callback.c xlators/protocol/client/src/client-handshake.c xlators/protocol/client/src/client-helpers.c xlators/protocol/client/src/client-lk.c xlators/protocol/client/src/client-mem-types.h xlators/protocol/client/src/client.c xlators/protocol/client/src/client.h xlators/protocol/client/src/client3_1-fops.c ¹ Copyright only, license reverted to original Change-Id: If560e826c61b6b26f8b9af7bed6e4bcbaeba31a8 BUG: 820551 Signed-off-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.com/3304 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* dht: Log message fix gfid's to gfidsshylesh2012-04-241-1/+1
| | | | | | | | | Change-Id: I9ad67b6ded05992d1b56ca843f685c4fe2b15b71 BUG: 815186 Signed-off-by: shylesh <shmohan@redhat.com> Reviewed-on: http://review.gluster.com/3217 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* core: coverity issues fixedAmar Tumballi2012-04-231-0/+3
| | | | | | | | | | | | this is not a complete set of issues getting fixed. Will address other issues in another patch. Change-Id: Ib01c7b11b205078cc4d0b3f11610751e32d14b69 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 789278 Reviewed-on: http://review.gluster.com/3145 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* cluster/dht : Acl fix for distribute directory selfhealKaushal M2012-03-261-2/+43
| | | | | | | | | | | | Send acl xattrs, if present in the xattrs returned during lookup, during directory self-heal. Change-Id: I5337bbd3f3963aeed500a8a552e5f6713089b53e BUG: 764787 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.com/737 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: adding extra data for fopsAmar Tumballi2012-03-221-8/+9
| | | | | | | | | | | | | with this change, the xlator APIs will have a dictionary as extra argument, which is passed between all the layers. This can be utilized for overloading in some of the operations. Change-Id: I58a8186b3ef647650280e63f3e5e9b9de7827b40 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 782265 Reviewed-on: http://review.gluster.com/2960 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/dht: Prevent crash in dir xattr selfhealshishir gowda2012-03-191-0/+5
| | | | | | | | | Change-Id: I2ca4ab2f8e8611e7b2ac9ed2edc2e304727259dc BUG: 804280 Signed-off-by: shishir gowda <shishirng@gluster.com> Reviewed-on: http://review.gluster.com/2970 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/distribute: fix log level of few messagesAmar Tumballi2012-03-071-4/+4
| | | | | | | | | Change-Id: Ibdeeb705e75a94bb96a1ae259be32ddd2ca5fec8 Signed-off-by: Amar Tumballi <amarts@redhat.com> BUG: 797715 Reviewed-on: http://review.gluster.com/2885 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* core: GFID filehandle based backend and anonymous FDsAnand Avati2012-01-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. What -------- This change introduces an infrastructure change in the filesystem which lets filesystem operation address objects (inodes) just by its GFID. Thus far GFID has been a unique identifier of a user-visible inode. But in terms of addressability the only mechanism thus far has been the backend filesystem path, which could be derived from the GFID only if it was cached in the inode table along with the entire set of dentry ancestry leading up to the root. This change essentially decouples addressability from the namespace. It is no more necessary to be aware of the parent directory to address a file or directory. 2. Why ------- The biggest use case for such a feature is NFS for generating persistent filehandles. So far the technique for generating filehandles in NFS has been to encode path components so that the appropriate inode_t can be repopulated into the inode table by means of a recursive lookup of each component top-down. Another use case is the ability to perform more intelligent self-healing and rebalancing of inodes with hardlinks and also to detect renames. A derived feature from GFID filehandles is anonymous FDs. An anonymous FD is an internal USABLE "fd_t" which does not map to a user opened file descriptor or to an internal ->open()'d fd. The ability to address a file by the GFID eliminates the need to have a persistent ->open()'d fd for the purpose of avoiding the namespace. This improves NFS read/write performance significantly eliminating open/close calls and also fixes some of today's limitations (like keeping an FD open longer than necessary resulting in disk space leakage) 3. How ------- At each storage/posix translator level, every file is hardlinked inside a hidden .glusterfs directory (under the top level export) with the name as the ascii-encoded standard UUID format string. For reasons of performance and scalability there is a two-tier classification of those hardlinks under directories with the initial parts of the UUID string as the directory names. For directories (which cannot be hardlinked), the approach is to use a symlink which dereferences the parent GFID path along with basename of the directory. The parent GFID dereference will in turn be a dereference of the grandparent with the parent's basename, and so on recursively up to the root export. 4. Development --------------- 4a. To leverage the ability to address an inode by its GFID, the technique is to perform a "nameless lookup". This means, to populate a loc_t structure as: loc_t { pargfid: NULL parent: NULL name: NULL path: NULL gfid: GFID to be looked up [out parameter] inode: inode_new () result [in parameter] } and performing such lookup will return in its callback an inode_t populated with the right contexts and a struct iatt which can be used to perform an inode_link () on the inode (without a parent and basename). The inode will now be hashed and linked in the inode table and findable via inode_find(). A fundamental change moving forward is that the primary fields in a loc_t structure are now going to be (pargfid, name) and (gfid) depending on the kind of FOP. So far path had been the primary field for operations. The remaining fields only serve as hints/helpers. 4b. If read/write is to be performed on an inode_t, the approach so far has been to: fd_create(), STACK_WIND(open, fd), fd_bind (in callback) and then perform STACK_WIND(read, fd) etc. With anonymous fds now you can do fd_anonymous (inode), STACK_WIND (read, fd). This results in great boost in performance in the inbuilt NFS server. 5. Misc ------- The inode_ctx_put[2] has been renamed to inode_ctx_set[2] to be consistent with the rest of the codebase. Change-Id: Ie4629edf6bd32a595f4d7f01e90c0a01f16fb12f BUG: 781318 Reviewed-on: http://review.gluster.com/669 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* glusterfs: An effort to fix all the spell mistakes and typoHarshavardhana2011-11-161-1/+1
| | | | | | | | | | | | | | | in the entire glusterfs codebase. This patch fixes many of spell mistakes and typo in the entire glusterfs codebase and all supported modules. Change-Id: I83238a41aa08118df3cf4d1d605505dd3cda35a1 BUG: 3809 Signed-off-by: Harshavardhana <fharshav@redhat.com> Reviewed-on: http://review.gluster.com/731 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amar@gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* support for de-commissioning a node using 'remove-brick'Amar Tumballi2011-09-131-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to achieve this, we now create volume-file with 'decommissioned-nodes' option in distribute volume, then just perform the rebalance set of operations (with 'force' flag set). now onwards, the 'remove-brick' (with 'start' option) operation tries to migrate data from removed bricks to existing bricks. 'remove-brick' also supports similar options as of replace-brick. * (no options) -> works as 'force', will have the current behavior of remove-brick, ie., no data-migration, volume changes. * start (starts remove-brick with data-migration/draining process, which takes care of migrating data and once complete, will commit the changes to volume file) * pause (stop data migration, but keep the volume file intact with extra options whatever is set) * abort (stop data-migration, and fall back to old configuration) * commit (if volume is stopped, commits the changes to volumefile) * force (stops the data-migration and commits the changes to volume file) Change-Id: I3952bcfbe604a0952e68b6accace7014d5e401d3 BUG: 1952 Reviewed-on: http://review.gluster.com/118 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vijay@gluster.com>
* Eliminate many "var set but not used" warnings with newer gcc.Jeff Darcy2011-09-071-2/+0
| | | | | | | | | | | | | | | | This fixes ~200 such warnings, but leaves three categories untouched. (1) Rpcgen code. (2) Macros which set variables in the outer (calling function) scope. (3) Variables which are set via function calls which may have side effects. Change-Id: I6554555f78ed26134251504b038da7e94adacbcd BUG: 2550 Reviewed-on: http://review.gluster.com/371 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* Change Copyright current yearPranith Kumar K2011-08-101-1/+1
| | | | | | | | Change-Id: I2d10f2be44f518f496427f257988f1858e888084 BUG: 3348 Reviewed-on: http://review.gluster.com/200 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* LICENSE: s/GNU Affero General Public/GNU General Public/Pranith Kumar K2011-08-061-3/+3
| | | | | | | | Change-Id: I3914467611e573cccee0d22df93920cf1b2eb79f BUG: 3348 Reviewed-on: http://review.gluster.com/182 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@gluster.com>
* storage/posix: handle dictionary being NULL in a functionAmar Tumballi2011-08-031-1/+1
| | | | | | | | | | | also print a warning message if dictionary is NULL, while sending a mkdir request in distribute self-heal. Change-Id: Ib9cac6ed1635203802f089986f8acb1ce416265d BUG: 3215 Reviewed-on: http://review.gluster.com/136 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <shishirng@gluster.com>
* cluster/distribute: while fixing layout, consider ENOSPC errorsAmar Tumballi2011-07-261-4/+4
| | | | | | | | | | | | | in case of layout 'creation', layout->err == ENOSPC should be ignored where as, while layout 'fixing' we should consider what was already present in the layout. Change-Id: Ifb613b41065813c9f1202e65e94b4b0282766d11 BUG: 2258 Signed-off-by: Amar Tumballi <amar@gluster.com> Reviewed-on: http://review.gluster.com/15 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shishir Gowda <shishirng@gluster.com>
* cluster/distribute: bring in directory-spread-count optionAmar Tumballi2011-07-141-0/+3
| | | | | | | | | | global spread count option is given through volume file Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 2257 (enhance distribute to control the spread count (ie, control the hashing range)) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2257
* cluster/distribute: handle layout overlaps while giving a new fixAmar Tumballi2011-07-141-37/+282
| | | | | | | | Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 2258 (enhance gluster volume rebalance) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2258
* core: fill 'ia_ino' from 'ia_gfid' in 'storage/posix' to preserve same ino ↵Amar Tumballi2011-06-161-3/+0
| | | | | | | | | | | | | | | number take the least significant 64bit from gfid and assign it to 'ia_ino', hence for a given file (or directory), the 'ia_ino' number is always same, and we need not worry about the 'itransform' in 'cluster/*' translators. Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand Avati <avati@gluster.com> BUG: 3042 (inode number should be constant on storage) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=3042
* cluster/dht: log enhancementsAmar Tumballi2011-03-171-16/+19
| | | | | | | | | Signed-off-by: Shishir Gowda <shishirng@gluster.com> Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 2346 (Log message enhancements in GlusterFS - phase 1) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
* cluster/dht: whitespace cleanupAmar Tumballi2011-03-171-365/+365
| | | | | | | | | | also fill tabs by spaces (untabify), and indent the code Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 2346 (Log message enhancements in GlusterFS - phase 1) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2346
* cluster/dht: restore attrs of dirs in self-healPranith K2011-02-071-3/+63
| | | | | | | | Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 2371 (dht: Set owners of directories after performing self-heal as root) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2371
* Revert "distribute: Return ESTALE when dir selfheal finds no fix"Anand V. Avati2011-02-041-6/+2
| | | | This reverts commit a4c948aca6058049523e31acf33ce5770f8693ad.
* Copyright changesVijay Bellur2010-10-111-1/+1
| | | | | | | | Signed-off-by: Vijay Bellur <vijay@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 971 (dynamic volume management) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
* Change GNU GPL to GNU AGPLPranith K2010-10-041-3/+3
| | | | | | | | Signed-off-by: Pranith Kumar K <pranithk@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 1388 () URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1388
* distribute: don't update the inode's gfid directlyAmar Tumballi2010-09-231-7/+16
| | | | | | | | Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 1680 () URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1680
* distribute: while selfhealing directory, send proper gfid in dictAmar Tumballi2010-09-221-3/+16
| | | | | | | | | | | | | * this was the root cause for having layout mismatches in case of add-brick, because the gfid of directories on newly added brick was always mismatching, which caused many operation on that particular brick fail. Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 1629 (files missing during add-brick) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1629
* distribute: Return ESTALE when dir selfheal finds no fixShehjar Tikoo2010-09-221-2/+6
| | | | | | | | Signed-off-by: Shehjar Tikoo <shehjart@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 1641 (Distribute must return error when dir selfheal has no fix) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1641
* gfid: changes in mkdir() prototype to have params dictionary with uuid in itAnand Avati2010-09-041-1/+2
| | | | | | | | | Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com> Signed-off-by: Anand V. Avati <avati@amp.gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 971 (dynamic volume management) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=971
* cluster/dht: check for op_ret in dht_selfheal_dir_mkdir_cbk ()Vijay Bellur2010-09-021-8/+13
| | | | | | | | Signed-off-by: Vijay Bellur <vijay@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 1503 (segfault in distribute during failover testing) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1503
* fixes in 'gluster volume rebalance'Amar Tumballi2010-08-311-1/+1
| | | | | | | | Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Vijay Bellur <vijay@dev.gluster.com> BUG: 1481 () URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1481
* Remove dead variables.Sachidananda2010-08-021-18/+8
| | | | | | | | Signed-off-by: Sachidananda Urs <sac@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 1065 () URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1065
* NULL dereference fixes in code base after running with 'clang'Amar Tumballi2010-07-021-1/+1
| | | | | | | | | | | | * 212 logical (NULL deref/divide by zero) errors reduced to 28 (27 of them in contrib/ and lex part of codebase, 1 is invalid) * 11 API errors reduced to 0 Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 966 (NULL check for avoiding NULL dereferencing of pointers..) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=966
* dht: don't overwrite the layout after the subvolume expansionAmar Tumballi2010-04-291-1/+1
| | | | | | | | Signed-off-by: Amar Tumballi <amar@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 864 (dht self-heal overwrites layout causing 'missing' files.) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=864
* Memory accounting changesVijay Bellur2010-04-231-1/+1
| | | | | | | | | | | Memory accounting Changes. Thanks to Vinayak Hegde and Csaba Henk for their contributions. Signed-off-by: Vijay Bellur <vijay@gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 329 (Replacing memory allocation functions with mem-type functions) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=329
* iatt: changes across the codebaseAnand V. Avati2010-03-161-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - libglusterfs -- call-stub -- inode -- protocol - libglusterfsclient - cluster/replicate - cluster/{dht,nufa,switch} - cluster/unify - cluster/HA - cluster/map - cluster/stripe - debug/error-gen - debug/trace - debug/io-stats - encryption/rot-13 - features/filter - features/locks - features/path-converter - features/quota - features/trash - mount/fuse - performance/io-threads - performance/io-cache - performance/quick-read - performance/read-ahead - performance/stat-prefetch - performance/symlink-cache - performance/write-behind - protocol/client - protocol/server - storage-posix Signed-off-by: Anand V. Avati <avati@blackhole.gluster.com> Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 361 (GlusterFS 3.0 should work on Mac OS/X) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=361
* distribute,nufa: layout handling changesAnand V. Avati2009-10-161-3/+3
| | | | | | | | | | changes to make revalidate not fail but instead perform fresh lookup and swap inode context (layout) safely Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 315 (generation number support) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=315
* shuffle hash layouts on directoriesAnand Avati2009-10-151-2/+44
| | | | | | | | | | | allow for hash layouts to be written differently for different directories to give a better spread for same filenames across directories Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 324 (distribute does not spread files of the same name among all servers) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=324
* Changed occurrences of Z Research to Gluster.Vijay Bellur2009-10-071-1/+1
| | | | Signed-off-by: Anand V. Avati <avati@dev.gluster.com>
* distribute: NFS-friendly logic changesShehjar Tikoo2009-10-011-0/+7
| | | | | | | Signed-off-by: Anand V. Avati <avati@dev.gluster.com> BUG: 145 (NFSv3 related additions to 2.1 task list) URL: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=145