summaryrefslogtreecommitdiffstats
path: root/xlators/cluster/dht
Commit message (Collapse)AuthorAgeFilesLines
* tier/ctr: CTR DB named lookup heal of cold tier during attach tierJoseph Fernandes2015-10-101-2/+128
| | | | | | | | | | | | | | | | | | | | | Heal hardlink in the db for already existing data in the cold tier during attach tier. i.e during fix layout do lookup to files in the cold tier. CTR xlator on the brick/server side does db update/insert of the hardlink on a namelookup. Currently the namedlookup is done synchronous to the fixlayout that is triggered by attach tier. This is not performant, adding more time to fixlayout. The performant approach is record the hardlinks on a compressed datastore and then do the namelookup asynchronously later, giving the ctr db eventual consistency Change-Id: I4ffc337fffe7d447804786851a9183a51b5044a9 BUG: 1252586 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/11828 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: add watermarks and policy driverDan Lambright2015-10-105-100/+456
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fix introduces infrastructure to support different policies for promotion and demotion. Currently the tier feature automatically promotes and demotes files periodically based on access. This is good for testing but too stringent for most real workloads. It makes it difficult to fully utilize a hot tier- data will be demoted before it is touched- its unlikely a 100GB hot SSD will have all its data touched in a window of time. A new parameter "mode" allows the user to pick promotion/demotion polcies. The "test mode" will be used for *.t and other general testing. This is the current mechanism. The "cache mode" introduces watermarks. The watermarks represent levels of data residing on the hot tier. "cache mode" policy: The % the hot tier is full is called P. Do not promote or demote more than D MB or F files. A random number [0-100] is called R. Rules for migration: if (P < watermark_low) don't demote, always promote. if (P >= watermark_low) && (P < watermark_hi) demote if R < P; promote if R > P. if (P > watermark_hi) always demote, don't promote. gluster volume set {vol} cluster.watermark-hi % gluster volume set {vol} cluster.watermark-low % gluster volume set {vol} cluster.tier-max-mb {D} gluster volume set {vol} cluster.tier-max-files {F} gluster volume set {vol} cluster.tier-mode {test|cache} Change-Id: I157f19667ec95aa1d53406041c1e3b073be127c2 BUG: 1257911 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12039 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* tier/ctr: Solution for db locks for tier migrator and ctr using sqlite ↵Joseph Fernandes2015-10-083-16/+297
| | | | | | | | | | | | | | | | | | | | | | | | | | | version less than 3.7 i.e rhel 6.7 Problem: On RHEL 6.7, we have sqlite version 3.6.2 which doesnt support WAL journaling mode, as this journaling mode is only available in sqlite 3.7 and above. As a result we cannot have to progreses concurrently accessing sqlite, without running into db locks! Well WAL is also need for performace on CTR side. Solution: This solution is to use CTR db connection for doing queries when WAL mode is absent. i,e tier migrator will send sync_op ipc calls to CTR, which in turn will do the query and create/update the query file suggested by tier migrator. Pending: Well this solution will stop the db locks but the performance is still an issue for CTR. We are developing an in-Memory Transaction Log (iMeTaL) which will help boost the CTR performance by doing in memory udpates on the IO path and later flush the updates to the db in a batch/segment flush. Change-Id: Ie3149643ded159234b5cc6aa6cf93b9022c2f124 BUG: 1240577 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12191 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Luis Pabon <lpabon@redhat.com>
* dht/rebalance: fix layout and dict leaksSusant Palai2015-10-062-0/+11
| | | | | | | | | Change-Id: Ib3911dfa1f950ff9decbe249ad798e97226dd06d BUG: 1266877 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12295 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht/rebalance: fix mem-leak in migration code pathSusant Palai2015-10-011-5/+21
| | | | | | | | | Change-Id: I37faf983fc02996541f3d96a17cb2a2c2cdb6781 BUG: 1266877 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12235 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd, dht: volume set for use-readdirp in dhtPranith Kumar K2015-10-011-0/+3
| | | | | | | | | | | | Change-Id: Icab246b1d02808864d878d949fa56f9f889b538a BUG: 1265677 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/12221 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* build: export minimum symbols from xlators for correct resolutionKaleb S. KEITHLEY2015-09-245-5/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've been lucky that we haven't had any symbol collisions until now. Now we have a collision between the snapview-client's svc_lookup() and libntirpc's svc_lookup() with nfs-ganesha's FSAL_GLUSTER and libgfapi. As a short term solution all the snapview-client's FOP methods were changed to static scope. See http://review.gluster.org/11805. This works in snapview-client because all the FOP methods are defined in a single source file. This solution doesn't work for other xlators with FOP methods defined in multiple source files. To address this we link with libtool's '-export-symbols $symbol-file' (a wrapper around `ld --version-script ...` --- on linux anyway) and only export the minimum required symbols from the xlator sharedlib. N.B. the libtool man page says that the symbol file should be named foo.sym, thus the rename of *.exports to *.sym. While foo.exports worked, we will follow the documentation. Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> BUG: 1248669 Change-Id: I1de68b3e3be58ae690d8bfb2168bfc019983627c Reviewed-on: http://review.gluster.org/11814 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/tier: Handle FOPs on files being migratedN Balachandran2015-09-226-87/+475
| | | | | | | | | | | | | | | | Determine which DHT level is responsible for handling fops on a file undergoing migration based on the name of the the linkto xattr set on the file being migrated and process accordingly. Change-Id: I82772e39314d4fe7f2ba0dcf22de0c6a374ee139 BUG: 1254428 Signed-off-by: N Balachandran <nbalacha@redhat.com> Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12090 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier/dht: unlink fails after lookup in a directoryMohammed Rafi KC2015-09-171-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unlink fails with invalid argument for files that are being present on cold tier, before attaching. All of the fops will be hashed to hot_tier after attach-tier (unless explicitly set the "rule" option). Lookups sent to directory, will eventually search the directory using readdirp, and will populate inode_ctx for the inodes based on the output, in respective dht_xlators. So the readdirp will populate inodes_ctx for the files (that is already present in volume before attaching) in cold-dht only because it got the entries from the cold-tier. So when an unlink comes on such an inode, the lookup associated with the unlink will be send as a re validate request to cold-tier only, since already a lookup was performed on the inode, and the new lookup will succeed. So from the unlink of dht, it will hash to cold-tier but the cached_subvol will be cold, since there is a mismatch in hash and cach , it chose hashed subvolume and will sent the fop to hot dht, and the fops fail with EINVAL from the hot-dht since it does not have inode_ctx stored for that inode (because, no lookup was performed from hot-dht). Change-Id: Ib7c14a9297a22d615f7a890a060be4809b5a745a BUG: 1236032 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11675 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier do not flag migration error on already migrated fileDan Lambright2015-09-161-15/+13
| | | | | | | | | | | | | | In some cases a brick will try to migrate a file that has already been migrated. This is a legal case, e.g. when both bricks are replica pairs. Change-Id: If2578b947014cbbdfb3c6591db9044d6b1d92774 BUG: 1263726 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12185 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Joseph Fernandes Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/tier: Fixed a crash in tieringNithya Balachandran2015-09-161-2/+2
| | | | | | | | | | | | | | | An incorrect check was causing the arguments to the promote thread to be cleared before the thread was done with them. This caused the process to crash when it tried to dereference a NULL pointer. Change-Id: I8348309ef4dad33b7f648c7a2c2703487e401269 BUG: 1263204 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12179 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Joseph Fernandes
* dht: reverting changes that takes lock on all subvols to prevent rmdir vs ↵Sakshi2015-09-145-331/+84
| | | | | | | | | | | | | | | lookup selfheal race Locking on all subvols before an rmdir is unable to remove all directory entries. Hence reverting the patch for now. Change-Id: I31baf2b2fa2f62c57429cd44f3f229c35eff1939 BUG: 1245065 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12125 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht : Propagate op_errno on failureNithya Balachandran2015-09-131-0/+1
| | | | | | | | | | | | | Fixed issue where dht_selfheal_layout_lock_cbk does not propagate the op_errno. Change-Id: I0b968339db65d2969e36e64407eeb724cc6516bd BUG: 1262438 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12165 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht/cluster: Avoid crash if local is NULLSusant Palai2015-09-131-5/+17
| | | | | | | | | | | | | | This patch addresses crash handling if local is NULL. In addition to that, we were not unwinding if no lock is taken in dht_linkfile_create_cbk(create/mknod). This patch handles that also. Change-Id: Ibcff317f10d60e7865fd7ffb9479b3af53c9ef17 BUG: 1260051 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12160 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier fix bug with sql includes introduced by 12031Dan Lambright2015-09-112-3/+3
| | | | | | | | | | | | We accidentally introduced a bug where client translators have a dependency on sql. This broke freebsd smoke tests. Fix is to abstract from the client those dependencies. Change-Id: I7152573a489bacc8f32e6eb139f9ff4408288f5b BUG: 1260730 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/12155 Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dht/remove-brick: Avoid data loss for hard link migrationSusant Palai2015-09-091-6/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If the hashed subvol of a file has reached cluster.min-free-disk, for a create opertaion a linkto file will be created on the hashed and the data file will be created on some other brick. For creation of the linkfile we populate the dictionary with linkto key and value as the cached subvol. After successful linkto file creation, the linkto-key-value pair is not deleted form the dictionary and hence, the data file will also have linkto xattr which points to itself.This looks something like this. client-0 client-1 -------T file rwx------file linkto.xattr=client-1 linkto.xattr=client-1 Now coming to the data loss part. Hardlink migration highly depend on this linkto xattr on the data file. This value should be the new hashed subvol of the first hardlink encountered post fix-layout. But when it tries to read the linkto xattr it gets the same target as where it is sitting. Now the source and destination are same for migration. At the end of migration the source file is truncated and deleted, which in this case is the destination and also the only data file it self resulting in data loss. Change-Id: I36b1d105752bd9467757ecf3f103b45c666783d6 BUG: 1260051 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/12105 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: NULL dereferencing causes crashMohammed Rafi KC2015-09-081-2/+2
| | | | | | | | | | | | | | If linkfile_create is failed for some reason, then we are trying to dereference a null variable Change-Id: I3c6ff3715821b9b993d1bab7b90167de2861e190 BUG: 1260147 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12106 Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier/ctr: Solving DB Lock issue due to write contention from db connectionsJoseph Fernandes2015-09-083-44/+105
| | | | | | | | | | | | | | | | | | | | | | | Problem: The DB on the brick is been accessed by CTR, for write and tier migrator, for read and write. The write from tier migrator is reseting the heat counters after a cycle. Since we are using sqlite, two connections trying to write would cause a db lock contention. As a result CTR used to fail to update the db. Solution: Using the same db connection of CTR for reseting the heat counters. 1) Introducted a new IPC FOP for CTR 2) After the query do a ipc syncop to the underlying client xlator associated to the brick. 3) CTR in brick will catch the IPC FOP and cleat the heat counters. Change-Id: I53306bfc08dcdba479deb4ccc154896521336150 BUG: 1260730 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/12031 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: avoid filling /var/run with tiering filesDan Lambright2015-09-021-4/+28
| | | | | | | | | | | | | | We failed to delete old promote/demote workfiles in /var/run. This fix removes the <pid> postfix so there will be only a single pair of files. Change-Id: Ib9aafe7b4a9d4b0c05cf03a94cc1057a423a27d2 BUG: 1253970 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11931 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com>
* build: Fix build on Mac OS X, booleanKaleb S. KEITHLEY2015-09-011-2/+1
| | | | | | | | | | | | | | bool and true conflict with clang macros in clang on Mac OS X, possibly with newer (?) versions of clang on Linux Change-Id: Ia8c56ae68b4ebffb99b0684ac72d68ec50eaa7fa BUG: 1249391 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/11816 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* all: reduce "inline" usageJeff Darcy2015-09-015-18/+11
| | | | | | | | | | | | | | | | | | | | | | | | | There are three kinds of inline functions: plain inline, extern inline, and static inline. All three have been removed from .c files, except those in "contrib" which aren't our problem. Inlines in .h files, which are overwhelmingly "static inline" already, have generally been left alone. Over time we should be able to "lower" these into .c files, but that has to be done in a case-by-case fashion requiring more manual effort. This part was easy to do automatically without (as far as I can tell) any ill effect. In the process, several pieces of dead code were flagged by the compiler, and were removed. Change-Id: I56a5e614735c9e0a6ee420dab949eac22e25c155 BUG: 1245331 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/11769 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* cluster/dht: Don't set posix acls on linkto filesNithya Balachandran2015-08-311-0/+34
| | | | | | | | | | | | | | | | | | | | Posix acls on a linkto file change the file's permission bits and cause DHT to treat it as a non-linkto file.This happens on the migration failure of a file on which posix acls were set. The fix prevents posix acls from being set on a linkto file and copies them across only after a file has been successfully migrated. Change-Id: Iccf7ff6fba49fe05d691d9b83bf76a240848b212 BUG: 1247563 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/12025 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* fd: Do fd_bind on successful openPranith Kumar K2015-08-283-0/+6
| | | | | | | | | | | | | | | - fd_unref should decrement fd->inode->fd_count only if it is present in the inode's fd list. - successful open/opendir should perform fd_bind. Change-Id: I81dd04f330e2fee86369a6dc7147af44f3d49169 BUG: 1207735 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11044 Reviewed-by: Anoop C S <anoopcs@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht : lock on subvols to prevent lookup vs rmdir raceSakshi2015-08-275-84/+331
| | | | | | | | | | | | | | | | | There is a possibility that while an rmdir is completed on some non-hashed subvol and proceeding to others. A lookup selfheal can recreate the same directory on those subvols for which the rmdir had succeeded. The fix is to take a blocking inodelk on the subvols before starting rmdir. Since selfheal requires lock on all subvols, if an rmdir is in progess acquiring locks will fail and vice versa. Change-Id: I841a44758c3b88f5e04d1cb73ad36e0cac9fdabb BUG: 1245065 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/11725 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: avoid mknod on decommissioned brickSusant Palai2015-08-252-35/+334
| | | | | | | | | | Change-Id: I8c39ce38e257758e27e11ccaaff4798138203e0c BUG: 1256243 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/11998 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht: block/handle create op falling to decommissioned brickSusant Palai2015-08-235-57/+455
| | | | | | | | | | | | | | | | | | | | | | | Problem: Post remove-brick start till commit phase, the client layout may not be in sync with disk layout because of lack of lookup. Hence,a create call may fall on the decommissioned brick. Solution: Will acquire a lock on hashed subvol. So that a fix-layout or selfheal can not step on layout while reading the layout. Even if we read a layout before remove-brick fix-layout and the file falls on the decommissioned brick, the file should be migrated to a new brick as per the fix-layout. Change-Id: If84a12ec34f981adb2b9b224e80f535cfe5bf9f2 BUG: 1232378 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/11260 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/tier : Use dht_* versions for xlator_fopsN Balachandran2015-08-181-16/+28
| | | | | | | | | | | | | The tier xlator was using the default_* versions for some xlator_fops. Changed to use the dht_* versions for all xlator_fops Change-Id: I8252fb3911b8a48a55e9eee42b89bd66bbacf799 BUG: 1254451 Signed-off-by: N Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/11948 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht: return non NULL xattr,xdata for ret >= 0Susant Palai2015-08-131-2/+2
| | | | | | | | | | Change-Id: I4a3dd8c00894ceeed4af77df2d960f372281a03b BUG: 1235989 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/11409 Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht/tier :rename fails with EBUSYMohammed Rafi KC2015-08-131-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the files was in hot tier and the look up was done already, then hashed and cached subvolume will be hot-tier. Once the file is moved from hot-tier to cold-tier, then subsequent lookup will send a revalidate lookup to hot-tier and it will find out that the file was actually moved and there is only link in the cached subvolume. So dht will return an ESTALE to fuse. Upon receiving ESTALE for a lookup, fuse will create a new inode and sent a fresh lookup. This lookup will be successful, and it will locate the file properly. Then fuse try to link the inode, but the older inode was already there in inmemory inode cache with same gfid and that is also shared with fuse kernal. So inode_link will return the older ionode itself. So the subsequent rename fop will come to gluster with the older inode. From dht_rename, we will take a lock on the inode and after successful inodelk on inode dht will send lookup before creating a link. this lookup will again find out that the file is a link file, and then dht will think that file is migrating/migrated in the mean time, and will send EBUSY. Change-Id: Ib3a01e5b1d7f64514b04bb6234026d049f082679 BUG: 1248306 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11768 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht : updating return value for layout set functionSakshi2015-08-121-2/+2
| | | | | | | | | | | Change-Id: I7fd89e00b418391afe0a13c2033919c979cc8bbb BUG: 789278 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/10869 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier/libgfdb : Setting Freq counters of un-selected files to zeroJoseph Fernandes2015-08-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Change Time Recorder increments the write/read frequency counters on a read or write of a file, if the "features.record-counters" is "on". It is the responsibility of the tiering migrator to reset these counters to zero for un-selected files to reset them to zero as frequency counters are function of promotion/Demotion cycles. If the counters are not set to zero then, 1) the counters may overflow in the DB 2) The file may be wrongly promoted or demoted. This fix will reset the freq counters of un-selected files to zero after promotion/demotion frequency. Change-Id: Ideea2c76a52d421a7e67c37fb0c823f552b3da7a BUG: 1242504 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/11648 Tested-by: Joseph Fernandes Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: fix demotion when cold tier is ECDan Lambright2015-08-121-0/+2
| | | | | | | | | | | | | | | We did not set the gfid in the loc structure in tier demotion. EC has a sanity check which fails FOPs when the loc gfid mismatches with the file attribute. When the FOP failed demotion was aborted. Change-Id: I69022c9ccb135b86e1feea93b01801b6a4100509 BUG: 1251121 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11855 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: mohammed rafi kc <rkavunga@redhat.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* cluster/dht: Reset source file mode bits on migration failureNithya Balachandran2015-08-121-3/+95
| | | | | | | | | | | | | | | DHT rebalance uses the sgid and sticky bits to indicate that a file is being migrated. These were not removed if the file migration failed. The fix resets these bits to the original values. Change-Id: I9801bfc0bd80c0800251ccd66c1c91a51cffd909 BUG: 1236512 Signed-off-by: Nithya Balachandran <nbalacha@redhat.com> Reviewed-on: http://review.gluster.org/11454 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht/tiering : create new dictionary during migrationMohammed Rafi KC2015-08-061-2/+10
| | | | | | | | | | | | | To avoid setting wrong xattr during creating link file Change-Id: Iad8de3521eae17e510035ed42e3e01933d647096 BUG: 1250828 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11838 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dict: dict_set_bin() should never free the pointer on errorNiels de Vos2015-07-242-0/+4
| | | | | | | | | | | | | | | | | | | | | | dict_set_bin() is handling the pointer that it passed inconsistently. Depending on the errors that can occur, the pointer passed to the dict can be free'd, but there is no guarantee. It is cleaner to have the caller free the pointer that allocated it and dict_set_bin() returned an error. When dict_set_bin() returned success, the given pointer will be free'd when dict_unref() calls data_destroy(). Many callers of dict_set_bin() already take care of free'ing the pointer on error. The ones that did not, are corrected with this change too. Change-Id: I39a4f7ebc0cae6d403baba99307d7ce408f25966 BUG: 1242280 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11638 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* dht: send lookup even for fd based operations during rebalanceRavishankar N2015-07-221-23/+30
| | | | | | | | | | | | | | | | | | | | | Problem: dht_rebalance_inprogress_task() was not sending lookups to the destination subvolume for a file undergoing writes during rebalance. Due to this, afr was not able to populate the read_subvol and failed the write with EIO. Fix: Send lookup for fd based operations as well. Thanks to Raghavendra G for helping with the RCA. Change-Id: I638c203abfaa45b29aa5902ffd76e692a8212a19 BUG: 1244165 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/11713 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* features/shard: Use xattrop (as opposed to setxattr) for updates to size xattrKrutika Dhananjay2015-07-151-2/+2
| | | | | | | | | | Change-Id: Icd8984976812bb47ae7129426f6c1aa9393b3ab9 BUG: 1232391 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/11467 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/tier: fixes for migration over ec as cold tierDan Lambright2015-07-132-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | An opendir is done in rebalance. The graph constructed when EC is used in tiering may have no local volumes (if all the hot volumes are on one node and all the others on another node). Previously the opendir only sent fops down the local subvolumes for migration. They must be sent down both the hot and cold subvolumes for tiering. When setxattr2() received a NULL subvolume; this dereferenced an uninitialized variable. When a lookup is done during creation of the destination file, the xattr dict is "polluted" with virtual xattrs. These cause subsequent xattrs in the new file to not be written by posix. They are required by EC. The inode gfid for "entry_loc" in gf_defrag_migrate_single_file() was not initialized. This made underlying translators think the gfid was 0, and failed migration. Change-Id: I6ccda8ca8e43485b9b354341bbfcb302496f632c BUG: 1236212 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11433 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* syncop: Include iatt to 'syncop_link' argsSoumya Koduri2015-07-101-1/+1
| | | | | | | | | | | | | | Include iatt to 'syncop_link' args to fetch proper attributes of the newly linked inode. Signed-off-by: Soumya Koduri <skoduri@redhat.com> Change-Id: If6b92961bd7a89add3791ed3a9b494087348b492 BUG: 1241788 Reviewed-on: http://review.gluster.org/11611 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* cluster/dht: use refcount to manage memory used to store migrationRaghavendra G2015-07-012-21/+46
| | | | | | | | | | | | | | | | information. Without refcounting, we might free up memory while other fops are still accessing it. BUG: 1235927 Change-Id: Ia4fa4a651cd6fe2394a0c20cef83c8d2cbc8750f Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/11418 Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/tier: stop tier migration after graph switchDan Lambright2015-06-261-0/+16
| | | | | | | | | | | | | | | | | On a graph switch, a new xlator and private structures are created. The tier migration daemon must stop using the old xlator and private structures and begin using the new ones. Otherwise, when RPCs arrive (such as counter queries from glusterd), the new xlator will be consulted but it will not have up to date information. The fix detects a graph switch and exits the daemon in this case. Typical graph switches for the tier case would be turning off performance translators. Change-Id: Ibfbd4720dc82ea179b77c81b8f534abced21e3c8 BUG: 1226005 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Reviewed-on: http://review.gluster.org/11372
* tier/ctr: Ignore creation of T file and Ctr Lookup heal improvememntsJoseph Fernandes2015-06-261-3/+2
| | | | | | | | | | | | | | | | | | 1) Ignore creation of T file in ctr_mknod 2) Ignore lookup for T file in ctr_lookup 3) Ctr_lookup: a. If the gfid and pgfid in empty dont record b. Decreased log level for multiple heal attempts c. Inode/File heal happens after an expiry period, which is configurable. d. Hardlink heal happens after an expiry period, which is configurable. Change-Id: Id8eb5092e78beaec22d05f5283645081619e2452 BUG: 1235269 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/11334 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tiering/rebalance: tier daemon stopped with out updating statusMohammed Rafi KC2015-06-251-3/+0
| | | | | | | | | | | | | | | | When a subvol goes down, tier daemon stopped immediately, and the status shows as "Progressing". With this change, with respect to tier xlator, when a subvol goes offline it will update the status as failed. Change-Id: I9f722ed0d35cda8c7fc1a7e75af52222e2d0fdb7 BUG: 1227803 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11068 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* dht: Adding log messages to the new logging frameworkarao2015-06-2315-342/+975
| | | | | | | | | | | | | Change-Id: Ib3bb61c5223f409c23c68100f3fe884918d2dc3f BUG: 1194640 Signed-off-by: arao <arao@redhat.com> Reviewed-on: http://review.gluster.org/10021 Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Joseph Fernandes Tested-by: Joseph Fernandes Reviewed-by: Dan Lambright <dlambrig@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* dht : Error value check before performing rebalance completeSakshi2015-06-191-5/+13
| | | | | | | | | | | | | Change-Id: I7a0cd288d16f27b887c7820162efdbe99a039d95 BUG: 1188242 Signed-off-by: Sakshi <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/11097 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/dht: Fix Null pointer dereference while loggingRaghavendra G2015-06-181-8/+8
| | | | | | | | | Change-Id: I1ea358b83267b0bcdf654ce18fe881fd4a6bf08d BUG: 1233139 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/11313 Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* cluster/dht: Prevent use after free bugPranith Kumar K2015-06-171-1/+3
| | | | | | | | | Change-Id: I2d1f5bb2dd27f6cea52c059b4ff08ca0fa63b140 BUG: 1231425 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/11209 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* tier/dht: Fixing non atomic promotion/demotion w.r.t to frequency periodJoseph Fernandes2015-06-111-40/+59
| | | | | | | | | | | | | | | | | | | | | This fixes the ping-pong issue i.e files getting demoted immediately after promition, caused by off-sync promotion/demotion processes. The solution is do promotion/demotion refering to the system time. To have the fix working all the file serving nodes should have thier system time synchronized with each other either manually or using a NTP Server. NOTE: The ping-pong issue can re-appear even with this fix, if the admin have different promotion freq period and demotion freq period, but this would be under the control of the admin. Change-Id: I1b33a5881d0cac143662ddb48e5b7b653aeb1271 BUG: 1218717 Signed-off-by: Joseph Fernandes <josferna@redhat.com> Reviewed-on: http://review.gluster.org/11110 Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* cluster/tier: account for reordered layoutsDan Lambright2015-06-112-14/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For a tiered volume the cold subvolume is always at a fixed position in the graph. DHT's layout array, on the other hand, may have the cold subvolume in either the first or second index, therefore code cannot make any assumptions. The fix searches the layout for the correct position dynamically rather than statically. The bug manifested itself in NFS, in which a newly attached subvolume had not received an existing directory. This case is a "stale entry" and marked as such in the layout for that directory. The code did not see this, because it looked at the wrong index in the layout array. The fix also adds the check for decomissioned bricks, and fixes a problem in detach tier related to starting the rebalance process: we never received the right defrag command and it did not get directed to the tier translator. Change-Id: I77cdf9fbb0a777640c98003188565a79be9d0b56 BUG: 1214289 Signed-off-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> Reviewed-by: Joseph Fernandes <josferna@redhat.com> Reviewed-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/11092
* features/changelog: Avoid setattr fop logging during renameSaravanakumar Arumugam2015-06-113-23/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: When a file is renamed and the (renamed)file's Hashing falls into a different brick, DHT creates a special file(linkto file) in the brick(Hashed subvolume) and carries out setattr operation on that file. Currently, Changelog records this(setattr) operation in Hashed subvolume. glusterfind in turn records this operation as MODIFY operation. So, there is a NEW entry in Cached subvolume and MODIFY entry in Hashed subvolume for the same file. Solution: Avoid logging setattr operation carried out, by marking the operation as internal fop using xdata. In changelog translator, check whether setattr is set as internal fop and skip accordingly. Change-Id: I21b09afb5a638b88a4ccb822442216680b7b74fd BUG: 1230007 Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com> Reviewed-on: http://review.gluster.org/11137 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>