path: root/xlators
Commit message | Author | Age | Files | Lines
...
* tier/shd: create shd volfile for tiering | Mohammed Rafi KC | 2015-10-11 | 3 | -20/+262
  Currently the shd graph starts only for a replicate or disperse volume. With tiering, however, the volume type is tier, so shd must be started if either the cold or the hot tier is compatible with shd.
  Change-Id: Ic689746ac7d2fc6a9eccdabd8518dc9139829de2
  BUG: 1261276
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/11962
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* tier/ctr: CTR DB named lookup heal of cold tier during attach tier | Joseph Fernandes | 2015-10-10 | 3 | -6/+133
  Heal hardlinks in the db for data that already exists in the cold tier during attach tier, i.e. during fix-layout do a lookup on the files in the cold tier. The CTR xlator on the brick/server side updates or inserts the hardlink in the db on a named lookup.
  Currently the named lookup is done synchronously with the fix-layout that is triggered by attach tier. This is not performant and adds more time to fix-layout. The performant approach is to record the hardlinks in a compressed datastore and then do the named lookup asynchronously later, giving the ctr db eventual consistency.
  Change-Id: I4ffc337fffe7d447804786851a9183a51b5044a9
  BUG: 1252586
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/11828
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier: add watermarks and policy driver | Dan Lambright | 2015-10-10 | 6 | -113/+578
  This fix introduces infrastructure to support different policies for promotion and demotion. Currently the tier feature automatically promotes and demotes files periodically based on access. This is good for testing but too stringent for most real workloads. It makes it difficult to fully utilize a hot tier: data will be demoted before it is touched, since it is unlikely that a 100GB hot SSD will have all its data touched in a window of time.
  A new parameter "mode" allows the user to pick promotion/demotion policies.
  The "test mode" will be used for *.t and other general testing. This is the current mechanism.
  The "cache mode" introduces watermarks. The watermarks represent levels of data residing on the hot tier.
  "cache mode" policy:
  The % the hot tier is full is called P. Do not promote or demote more than D MB or F files. A random number [0-100] is called R.
  Rules for migration:
    if (P < watermark_low) don't demote, always promote.
    if (P >= watermark_low) && (P < watermark_hi) demote if R < P; promote if R > P.
    if (P > watermark_hi) always demote, don't promote.
  gluster volume set {vol} cluster.watermark-hi %
  gluster volume set {vol} cluster.watermark-low %
  gluster volume set {vol} cluster.tier-max-mb {D}
  gluster volume set {vol} cluster.tier-max-files {F}
  gluster volume set {vol} cluster.tier-mode {test|cache}
  Change-Id: I157f19667ec95aa1d53406041c1e3b073be127c2
  BUG: 1257911
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/12039
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
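The "cache mode" migration rules above can be sketched as a small decision function. This is only an illustration of the rules as the commit message states them, not the tier translator's code: the function name `migration_decision` and its parameters are hypothetical, the D MB and F file caps are not modelled, and the case P == watermark_hi (which the message leaves unspecified) is assumed here to behave like P > watermark_hi.

```python
import random


def migration_decision(percent_full, watermark_low, watermark_hi, rand=None):
    """Return (promote, demote) for one candidate file in "cache mode".

    percent_full  -- P, how full the hot tier is (0-100)
    rand          -- R, a random number in [0, 100]; drawn if not supplied
    All names here are hypothetical; only the rules come from the commit text.
    """
    r = random.uniform(0, 100) if rand is None else rand

    if percent_full < watermark_low:
        # Below the low watermark: always promote, never demote.
        return True, False
    if percent_full < watermark_hi:
        # Between the watermarks: the fuller the hot tier, the more likely
        # a demotion (R < P) and the less likely a promotion (R > P).
        return r > percent_full, r < percent_full
    # Assumption: P == watermark_hi is treated like P > watermark_hi.
    return False, True
```

Passing `rand` explicitly makes the behaviour deterministic, for example `migration_decision(50, 20, 80, rand=30)` demotes because R < P.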
* cluster/ec: Implement gfid-hash read-policy | Pranith Kumar K | 2015-10-09 | 4 | -10/+81
  Add a policy in ec to perform reads from the same bricks as long as they are good. The bricks to be considered for reading are determined from the gfid of the file/directory.
  Change-Id: Ic97b5c54c086a28b5e07a330a4fd448551b49376
  BUG: 1261260
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/12133
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
* features/shard: Regulate memory consumption by individual shards' inode_t objects | Krutika Dhananjay | 2015-10-08 | 2 | -18/+141
  The shard translator now maintains a constant-size lru list of the inodes associated with individual shards, and makes sure that at no point does the number of these inodes exceed the configured limit. This keeps the memory consumed by the thousands of shards of every large file from exploding.
  Change-Id: I5e60eea5dcf3130257fb431ca70cfaba53cae7f3
  BUG: 1252263
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/12254
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
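The idea of a bounded lru list of shard inodes can be illustrated with a short sketch. It is not the shard translator's implementation: the class `ShardInodeLRU`, its `touch` method and the string shard identifiers are hypothetical, and the real code works on `inode_t` objects rather than Python values. The sketch evicts the least recently used entry before inserting a new one, so the count never exceeds the limit.

```python
from collections import OrderedDict


class ShardInodeLRU:
    """Bounded least-recently-used table of shard inodes (illustrative only)."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self._inodes = OrderedDict()  # oldest entry first

    def __len__(self):
        return len(self._inodes)

    def touch(self, shard_id, inode):
        """Record a use of shard_id and return the evicted (id, inode) pairs."""
        evicted = []
        if shard_id in self._inodes:
            # Already tracked: mark it as the most recently used entry.
            self._inodes.move_to_end(shard_id)
        else:
            # Make room first so the table never grows past the limit.
            while len(self._inodes) >= self.limit:
                evicted.append(self._inodes.popitem(last=False))
        self._inodes[shard_id] = inode
        return evicted
```

A caller would release whatever `touch` returns; in this sketch that is simply a list of the evicted pairs.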
* tiering/glusterd: keep afr/ec xlators name constant | Mohammed Rafi KC | 2015-10-08 | 3 | -30/+112
  afr uses the translator name for locking, so it is mandatory to keep the afr/ec xlator names constant across graph changes. Currently, when a tier is attached, afr names are suffixed with either hot or cold, which breaks this constraint.
  Change-Id: I3699dcdaa8190bab3ba81cbc01e8fa126d37ba0d
  BUG: 1261276
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/12134
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* feature/quota: Make message-id for quota start from 120000 | Susant Palai | 2015-10-08 | 1 | -23/+23
  Change-Id: I2076fcab51f4ecc529dffd89ca6ee9eb99d80f09
  BUG: 1265531
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/12218
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* quota: fix crash in quota_fallocate | vmallika | 2015-10-08 | 1 | -0/+2
  The list head was not initialized and the brick was crashing with fallocate. This patch fixes the issue.
  Change-Id: I9757b88eab61054892f0fe3de63af2683cd4fef7
  BUG: 1269754
  Signed-off-by: vmallika <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/12314
  Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* tier/ctr: Solution for db locks for tier migrator and ctr using sqlite version less than 3.7 i.e rhel 6.7 | Joseph Fernandes | 2015-10-08 | 5 | -53/+599
  Problem: On RHEL 6.7, we have sqlite version 3.6.2, which doesn't support WAL journaling mode, as this journaling mode is only available in sqlite 3.7 and above. As a result we cannot have two processes concurrently accessing sqlite without running into db locks. WAL is also needed for performance on the CTR side.
  Solution: Use the CTR db connection for doing queries when WAL mode is absent, i.e. the tier migrator will send sync_op ipc calls to CTR, which in turn will do the query and create/update the query file suggested by the tier migrator.
  Pending: This solution will stop the db locks, but performance is still an issue for CTR. We are developing an in-Memory Transaction Log (iMeTaL) which will help boost CTR performance by doing in-memory updates on the IO path and later flushing the updates to the db in a batch/segment flush.
  Change-Id: Ie3149643ded159234b5cc6aa6cf93b9022c2f124
  BUG: 1240577
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Signed-off-by: Joseph Fernandes <josferna@redhat.com>
  Reviewed-on: http://review.gluster.org/12191
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Luis Pabon <lpabon@redhat.com>
* fuse: resolve complete path after a graph switch | Mohammed Rafi KC | 2015-10-08 | 4 | -19/+156
  If a graph switch has happened as part of an attach-tier, there is a chance that fops are hashed to a newly added brick before fix-layout. This causes ongoing I/O to fail.
  This patch resolves the path after a graph switch by sending recursive lookups to the parent directories. Those lookups will help to heal the directory.
  Change-Id: Ia2bb4b43a21e5cc6875ba1205628744c3f0ce4e5
  BUG: 1263549
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/12184
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
* xlators: add JSON FOP statistics dumps every N seconds | Richard Wareing | 2015-10-08 | 5 | -106/+502
  Summary:
  - Adds a thread to the io-stats translator which dumps out statistics every N seconds, where N is configurable by an option called "diagnostics.stats-dump-interval"
  - Thread cleanly starts/stops when translator is unloaded
  - Updates macros to use "Atomic Builtins" (e.g. Intel CPU extensions) so that counters are updated with memory barriers instead of locks. This should reduce overhead and prevent any deadlock bugs due to lock contention.
  Test Plan:
  - Test on development machine
  - Run prove -v tests/basic/stats-dump.t
  Change-Id: If071239d8fdc185e4e8fd527363cc042447a245d
  BUG: 1266476
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: http://review.gluster.org/12209
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Avra Sengupta <asengupt@redhat.com>
* cluster/afr: Handle stack reset failures | Pranith Kumar K | 2015-10-07 | 2 | -0/+8
  When all the bricks go down in the middle of a self-heal, afr_local_init in AFR_STACK_RESET fails because all the bricks are down, so local remains NULL for the frame. This leads to crashes, as the failure is not handled in either entry or data self-heal.
  Change-Id: I71a02f161f2c4dbfdc8bb7f2a6f32807191ed253
  BUG: 1269470
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/12309
  Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd/add-brick: change add-brick implementation to v3 framework | Mohammed Rafi KC | 2015-10-07 | 2 | -17/+134
  The add-brick commit happens first on the local node and then on the peers. As part of the local-host commit, glusterd sends the updated volfiles to the clients connected to the local host even before the commit happens on the peers. If any of the newly added bricks is hosted by a peer, that brick won't be started yet when a client (connected to the local host) tries to send fops.
  By changing to the v3 framework we can send post-validate ops after the commit operation, which lets us send the volfile fetch request only after the commits have completed on all nodes.
  Change-Id: Ib7312e01143326128c010c11fc2ed206f37409ad
  BUG: 1263549
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/12237
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* quota: use copy_frame when creating new frame during quota_check_limit | vmallika | 2015-10-06 | 1 | -2/+1
  DHT re-balance sets the frame root PID to a value < 0, and quota_check_limit skips enforcement if this PID is less than 0. When creating a new frame for quota_check_limit we need to use copy_frame instead of create_frame, so that all auth information is copied from the original frame.
  Change-Id: Ib3b4a3744f8b0d72a8bc32826f6edae836d6faed
  BUG: 1267812
  Signed-off-by: vmallika <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/12265
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht/rebalance: fix layout and dict leaks | Susant Palai | 2015-10-06 | 2 | -0/+11
  Change-Id: Ib3911dfa1f950ff9decbe249ad798e97226dd06d
  BUG: 1266877
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/12295
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* cluster/ec : Mark new entry changelog in entry self-heal | Ashish Pandey | 2015-10-06 | 3 | -8/+80
  Problem: When a new entry is created, the dirty mark xattrs are not created. As a result a full heal has to be performed even when there are only partial failures.
  Solution: Mark the new entry changelog in self-heal.
  PS: Also fixed erasing of dirty markers when no data heal is required.
  BUG: 1254121
  Signed-off-by: Ashish Pandey <aspandey@redhat.com>
  Change-Id: I156e3d3201afa77efe118e1aaace1d91c90a9613
  Reviewed-on: http://review.gluster.org/11938
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/shard: Use the xattr rsp dict to pick shard xattrs in xattrop cbk | Krutika Dhananjay | 2015-10-05 | 1 | -1/+1
  The change http://review.gluster.org/#/c/11938/ makes a fix in the posix translator which, without this patch, would cause sharding to fail fops after xattrop.
  Change-Id: If096965b319f393608b0f763402b9b90acb61492
  BUG: 1268796
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/12300
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* server/protocol: option for dynamic authorization of client permissions | Prasanna Kumar Kalever | 2015-10-04 | 4 | -4/+61
  Problem: Assume a gluster volume is already mounted (for gfapi: the client transport connection is already established). If somebody now changes the volume permissions, say *.allow | *.reject for a client, gluster should allow or terminate the client connection immediately, based on the fresh set of volume options. In the existing scenario there is neither an option to set this behaviour nor is any action taken until the volume is remounted manually.
  Solution: Introduce a 'dynamic-auth' option (default: on). If 'dynamic-auth' is 'on', gluster performs dynamic authentication to allow or terminate the client transport connection immediately in response to *.allow | *.reject volume set options. Thus, if the volume permissions have changed for a particular client (say the client is added to the auth.reject list), its transport connection to the gluster volume is terminated immediately.
  Change-Id: I6243a6db41bf1e0babbf050a8e4f8620732e00d8
  BUG: 1245380
  Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
  Reviewed-on: http://review.gluster.org/12229
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd: validate function for replica volume options | Sakshi | 2015-10-01 | 1 | -12/+42
  Change-Id: I5b4a28db101e9f7e07f4b388c7a2594051c9e8dd
  BUG: 1265479
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/12215
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* dht/rebalance: fix mem-leak in migration code path | Susant Palai | 2015-10-01 | 1 | -5/+21
  Change-Id: I37faf983fc02996541f3d96a17cb2a2c2cdb6781
  BUG: 1266877
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/12235
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* storage/posix: Reduce number of getxattrs for internal xattrs | Pranith Kumar K | 2015-10-01 | 1 | -4/+53
  Most of the gluster internal xattrs don't exceed 256 bytes, so try getxattr with a buffer of ~256 bytes first. If it returns ERANGE, fall back to the old way: getxattr with a NULL 'buf' to find the length, then getxattr with an allocated 'buf' to fill in the data. This saves a lot of getxattr calls.
  Change-Id: I716d484bc9ba67a81d0cedb5ee3e72a5ba661f6d
  BUG: 1265893
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/12240
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
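The "guess a small buffer first, fall back on ERANGE" pattern described above can be sketched as follows. This is an illustration under assumptions, not the posix translator's code: `read_xattr`, the `FakeXattrs` stand-in and its `size`/`fetch` methods are hypothetical and merely imitate the C getxattr behaviour (a call with too small a buffer fails with ERANGE). The sketch also ignores the race where a value grows between the size query and the second fetch.

```python
import errno


class FakeXattrs:
    """Hypothetical stand-in for a store with C-like getxattr semantics."""

    def __init__(self, values):
        self.values = values
        self.calls = 0  # counts simulated getxattr calls

    def size(self, path, name):
        # Models getxattr(path, name, NULL, 0): returns the value length.
        self.calls += 1
        return len(self.values[name])

    def fetch(self, path, name, bufsize):
        # Models getxattr with a buffer of bufsize bytes.
        self.calls += 1
        value = self.values[name]
        if len(value) > bufsize:
            raise OSError(errno.ERANGE, "buffer too small")
        return value


def read_xattr(store, path, name, guess=256):
    """Read an xattr with one call in the common case, three otherwise."""
    try:
        return store.fetch(path, name, guess)
    except OSError as err:
        if err.errno != errno.ERANGE:
            raise
    # Old path: ask for the length, then fetch with an exact-size buffer.
    needed = store.size(path, name)
    return store.fetch(path, name, needed)
```

In this sketch a value that fits the guessed buffer costs one simulated call instead of two, while an oversized value costs three instead of two, which pays off only if most values are small, as the commit message says they are.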
* glusterd, dht: volume set for use-readdirp in dht | Pranith Kumar K | 2015-10-01 | 2 | -0/+9
  Change-Id: Icab246b1d02808864d878d949fa56f9f889b538a
  BUG: 1265677
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/12221
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
  Reviewed-by: Kaushal M <kaushal@redhat.com>
* quota/marker: marker code cleanup | vmallika | 2015-09-30 | 3 | -2350/+3
  marker has been re-factored with the syncop approach; remove the unused old code.
  Change-Id: I36e670e63b6c166db5e64d3149d2978981e2f7c2
  BUG: 1240581
  Signed-off-by: vmallika <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/11560
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* protocol/client: Remove dead code from client_rpc_notify | Anoop C S | 2015-09-28 | 3 | -20/+9
  Normally GF_EVENT_CHILD_UP is dispatched after the client handshake. But we have some dead code in client_rpc_notify which is assumed to do the same on receiving RPC_CLNT_CONNECT. This dispatch depends on whether "disable-handshake" is enabled or not. Since we require a client handshake every time we have a connect, this check for "disable-handshake" is invalid and no longer required. Moreover, this option is never handled in any of the translators.
  Change-Id: Ic862d6ac08cd3b18cf231f50140cd00e84e52ca0
  BUG: 1227667
  Signed-off-by: Anoop C S <anoopcs@redhat.com>
  Reviewed-on: http://review.gluster.org/12170
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* posix: xattrop 'GF_XATTROP_ADD_ARRAY_WITH_DEFAULT' implementation | vmallika | 2015-09-28 | 1 | -10/+118
  Implementation of xattrop types:
    GF_XATTROP_ADD_ARRAY_WITH_DEFAULT
    GF_XATTROP_ADD_ARRAY64_WITH_DEFAULT
  These operations are similar to 'GF_XATTROP_ADD_ARRAY', except that they add a default value if the xattr is missing or its value is zero on disk.
  One use-case of this operation is in inode-quota. When a new directory is created, its default dir_count should be set to 1. So when an xattrop is performed to set the inode xattrs, it should account for the initial dir_count of 1 if the xattrs are not present.
  Here is the usage of this operation. Value required in xdata for each key:
    struct array {
        int32_t newvalue_1;
        int32_t newvalue_2;
        ...
        int32_t newvalue_n;
        int32_t default_1;
        int32_t default_2;
        ...
        int32_t default_n;
    };
  or
    struct array {
        int32_t value_1;
        int32_t value_2;
        ...
        int32_t value_n;
    } data[2];
    fill data[0] with new value to add
    fill data[1] with default value
  xattrop GF_XATTROP_ADD_ARRAY_WITH_DEFAULT:
    for i from 1 to n {
        if (xattr (dest_i) is zero or not set in the disk)
            dest_i = newvalue_i + default_i
        else
            dest_i = dest_i + newvalue_i
    }
  Value in xdata after xattrop is successful:
    struct array {
        int32_t dest_1;
        int32_t dest_2;
        ...
        int32_t dest_n;
    };
  Change-Id: Ic6a08473e99fd98299a839d4d8416081a7534efd
  BUG: 1243946
  Signed-off-by: vmallika <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/11702
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
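The per-element rule in the pseudocode above can be written out as a small runnable sketch. It only mirrors the arithmetic described in the commit message: the function name `add_array_with_default` is hypothetical, a missing xattr is represented by `None`, and the fixed-width int32/int64 encoding and byte order of the real on-disk values are not modelled.

```python
def add_array_with_default(on_disk, new_values, defaults):
    """Apply the ADD_ARRAY_WITH_DEFAULT rule element by element.

    on_disk     -- current xattr values, or None if the xattr is not set
    new_values  -- values to add (data[0] in the commit message)
    defaults    -- default values (data[1] in the commit message)
    """
    n = len(new_values)
    if len(defaults) != n:
        raise ValueError("new_values and defaults must have the same length")
    if on_disk is None:
        # A missing xattr is treated the same as all-zero values.
        on_disk = [0] * n
    if len(on_disk) != n:
        raise ValueError("on-disk array has a different length")

    result = []
    for dest, new, default in zip(on_disk, new_values, defaults):
        if dest == 0:
            result.append(new + default)  # zero or unset: start from default
        else:
            result.append(dest + new)     # otherwise a plain addition
    return result
```

For the inode-quota use-case, a new directory with no xattr and a default dir_count of 1 in the second slot would give `add_array_with_default(None, [5, 0], [0, 1]) == [5, 1]`; the slot layout used here is made up for the example.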
* storage/posix: Prevent extra handle-path | Pranith Kumar K | 2015-09-28 | 1 | -12/+2
  In readdirp_fill we already have the path of the file/directory, so there is no need to construct the handle-path again. This saves two lstat calls and at least two readlink calls per directory.
  Change-Id: I8d1b2afeda3e053265a243d4e9a101192f5f509e
  BUG: 1265893
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
  Reviewed-on: http://review.gluster.org/12222
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* features/shard: Port log messages to new framework | Krutika Dhananjay | 2015-09-27 | 4 | -93/+339
  Change-Id: Iac01e6a89a0d0c37a12a5e47f17f7ced85a31590
  BUG: 1265516
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/12217
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* build: export minimum symbols from xlators for correct resolution | Kaleb S. KEITHLEY | 2015-09-24 | 64 | -64/+246
  We've been lucky that we haven't had any symbol collisions until now. Now we have a collision between the snapview-client's svc_lookup() and libntirpc's svc_lookup() with nfs-ganesha's FSAL_GLUSTER and libgfapi.
  As a short term solution all the snapview-client's FOP methods were changed to static scope. See http://review.gluster.org/11805. This works in snapview-client because all the FOP methods are defined in a single source file. This solution doesn't work for other xlators with FOP methods defined in multiple source files.
  To address this we link with libtool's '-export-symbols $symbol-file' (a wrapper around `ld --version-script ...`, on linux anyway) and only export the minimum required symbols from the xlator sharedlib.
  N.B. the libtool man page says that the symbol file should be named foo.sym, thus the rename of *.exports to *.sym. While foo.exports worked, we will follow the documentation.
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
  BUG: 1248669
  Change-Id: I1de68b3e3be58ae690d8bfb2168bfc019983627c
  Reviewed-on: http://review.gluster.org/11814
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: soumya k <skoduri@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
* features/shard: Performance improvements in IO path - Part 2 | Krutika Dhananjay | 2015-09-22 | 1 | -0/+80
  This is change 2/2 of the performance improvements for sharding. The changes are with respect to maintaining up-to-date values of file attributes in [f]stat, [f]setattr, link, and [f]truncate codepaths.
  Change-Id: Ia3ce4664fb33be869e4dc76494adbe9c314cc098
  BUG: 1258905
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/12138
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* features/shard: Performance improvements in IO path | Krutika Dhananjay | 2015-09-22 | 2 | -70/+233
  This is patch 1/2 of the performance improvement work for sharding in the IO path.
  What this patch does: The primary use-case targeted by sharding, VM store, is a single-writer workload. So instead of performing a lookup on the base file every time to gather the size and block count from the backend in reads, writes and truncate, the size and block count are now also cached in the inode ctx and kept up-to-date after every inode write.
  TO-DO: Make changes in rename, link, unlink, [f]setattr and [f]stat to keep the relevant iatt members up-to-date in the inode ctx.
  Change-Id: Ica87d020dabc3a3dbccec814b26b01d6a629ff4d
  BUG: 1258905
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/12126
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* glusterfsd : newly added brick receives fops only after it is started | Sakshi | 2015-09-22 | 1 | -1/+4
  When new bricks are added in the middle of an ongoing fop like 'rm', the volfile changes without waiting for the newly added bricks to get a port. Fops are sent to all bricks and may fail on some with ENOTCONN, as these bricks may not have a port yet. This patch ensures that the volfile change happens only after all the bricks have a port.
  Change-Id: I7ed2413475f80d0cc8849fed33036ade8d75a191
  BUG: 1233151
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/11342
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
  Tested-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/tier: Handle FOPs on files being migrated | N Balachandran | 2015-09-22 | 6 | -87/+475
  Determine which DHT level is responsible for handling fops on a file undergoing migration, based on the name of the linkto xattr set on the file being migrated, and process accordingly.
  Change-Id: I82772e39314d4fe7f2ba0dcf22de0c6a374ee139
  BUG: 1254428
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/12090
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd : check if all bricks are started before performing remove-brick | Sakshi | 2015-09-22 | 1 | -1/+10
  Change-Id: Ie9e24e037b7a39b239a7badb983504963d664324
  BUG: 1225716
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/10954
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd/utils: glusterd_copy_file does not truncate target file | Rajesh Joseph | 2015-09-22 | 1 | -1/+1
  The glusterd_copy_file function copies a source file to a target. If the target file already exists and is bigger than the source file, the copy can leave a corrupted file. The target file should be truncated before the source content is copied.
  Signed-off-by: Rajesh Joseph <rjoseph@redhat.com>
  Change-Id: Ie973f3e9fa06309ded6f69dcde41e1b60b3e028e
  BUG: 1261482
  Reviewed-on: http://review.gluster.org/12141
  Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
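The failure mode described above, where stale bytes of a larger pre-existing target survive a copy, is avoided by opening the target with O_TRUNC. The sketch below shows that pattern in general terms; it is not glusterd_copy_file itself, and the function name `copy_file`, the chunk size and the 0o644 mode are assumptions made for the example.

```python
import os


def copy_file(source, target, chunk_size=64 * 1024):
    """Copy source to target, truncating any existing target first."""
    # O_TRUNC discards old content, so a previously larger target cannot
    # leave trailing bytes behind after the copy.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    with open(source, "rb") as src:
        fd = os.open(target, flags, 0o644)
        with os.fdopen(fd, "wb") as dst:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                dst.write(chunk)
```

Without O_TRUNC, writing 5 bytes over a 30-byte target would leave the last 25 bytes of the old content in place.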
* Tier/cli: Change detach-tier commit force to detach-tier force | Mohammed Rafi KC | 2015-09-22 | 1 | -1/+1
  The current detach-tier cli command supports commit force. This is deprecated in favour of force, so the new syntax is:
    volume detach-tier <VOLNAME> <start|stop|status|commit|force>
  Change-Id: Ie86dfd72341078c0a1be94767f523730911312ef
  BUG: 1261862
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/12151
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* marker: don't account destination linkto-file during internal migration | vmallika | 2015-09-22 | 4 | -47/+52
  During a DHT re-balance operation, quota accounts for the destination file. The problems with accounting for this destination file are:
  1) Migration is an internal operation, yet 'quota list' shows more usage on the CLI; this returns to the normal numbers once the migration is complete.
  2) If the usage is close to the limit set, we can get 'Disk Quota Exceeded' errors in the I/O path during file migration.
  Solution: do not account for the usage of the destination file during migration. At the end of the migration, reduce the size at the source directory and account for the migrated destination file.
  We assume that there is sufficient disk space in the back-end. The DHT migrator should make sure that there is sufficient disk space before it starts the migration process.
  Change-Id: Ie3cfe3e4ab5241c2a127ba0edc599a053d30c3a0
  BUG: 1260545
  Signed-off-by: vmallika <vmallika@redhat.com>
  Reviewed-on: http://review.gluster.org/12113
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* Tiering: change in status for remove brick and rebalance | hari gowtham | 2015-09-21 | 4 | -12/+28
  When we trigger a detach tier start on a tier vol, it shows in the volume status task as "remove brick" instead of "Detach tier".

  Status of volume: vol1
  Gluster process                             TCP Port  RDMA Port  Online  Pid
  ------------------------------------------------------------------------------
  Hot Bricks:
  Brick 10.70.42.171:/data/gluster/hbr1       49154     0          Y       25098
  Cold Bricks:
  Brick 10.70.42.171:/data/gluster/p1         49152     0          Y       25101
  Brick 10.70.42.171:/data/gluster/p2         49153     0          Y       25112
  NFS Server on localhost                     N/A       N/A        N       N/A

  Task Status of Volume vol1
  ------------------------------------------------------------------------------
  Task                 : Tier migrate
  ID                   : e11d5a3d-b1ae-4c3f-8f95-b28993c60939
  Status               : in progress

  Status of volume: vol1
  Gluster process                             TCP Port  RDMA Port  Online  Pid
  ------------------------------------------------------------------------------
  Hot Bricks:
  Brick 10.70.42.171:/data/gluster/hbr1       49154     0          Y       25098
  Cold Bricks:
  Brick 10.70.42.171:/data/gluster/p1         49152     0          Y       25101
  Brick 10.70.42.171:/data/gluster/p2         49153     0          Y       25112
  NFS Server on localhost                     N/A       N/A        N       N/A

  Task Status of Volume vol1
  ------------------------------------------------------------------------------
  Task                 : Detach tier
  ID                   : 76d700b1-5bbd-43ed-95fd-1640b2b4af31
  Status               : completed

  Change-Id: I4bd3b340d4e700e8afed00e1478b8a8b54dfe2e2
  BUG: 1261837
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
  Reviewed-on: http://review.gluster.org/12149
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* Tier/glusterd: Do not allow attach-tier if remove-brick is not committed | Mohammed Rafi KC | 2015-09-18 | 1 | -0/+24
  If there is a pending remove-brick task when a tier is being attached, attach-tier should not be allowed. Since add/remove brick is not supported on a tiered volume, we would not be able to commit the pending remove-brick after attaching the tier.
  Change-Id: Ib434e2e6bc75f0908762f087ad1ca711e6b62818
  BUG: 1261819
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/12148
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier/glusterd: volume status failed after detach start | Mohammed Rafi KC | 2015-09-18 | 1 | -3/+4
  Volume status fails after detach start is triggered on a tiered volume. This is because the brick count was set incorrectly in the rebal dictionary.
  Change-Id: I6a472bf2653a07522416699420161f2fb1746aef
  BUG: 1261757
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/12146
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier/dht: unlink fails after lookup in a directory (Mohammed Rafi KC, 2015-09-17, 1 file, -14/+17)

  unlink fails with invalid argument for files that were already present
  on the cold tier before attaching.

  All fops will be hashed to hot_tier after attach-tier (unless the
  "rule" option is explicitly set). Lookups sent to a directory will
  eventually search the directory using readdirp, and will populate
  inode_ctx for the inodes based on the output, in the respective
  dht_xlators. So readdirp will populate inode_ctx for the files (those
  already present in the volume before attaching) in cold-dht only,
  because it got the entries from the cold tier.

  So when an unlink comes on such an inode, the lookup associated with
  the unlink will be sent as a revalidate request to the cold tier only,
  since a lookup was already performed on the inode, and the new lookup
  will succeed. From the unlink of dht, the file will hash to the hot
  tier but the cached_subvol will be cold. Since there is a mismatch
  between the hashed and cached subvolumes, it chooses the hashed
  subvolume and sends the fop to hot dht, and the fop fails with EINVAL
  from hot-dht since it does not have inode_ctx stored for that inode
  (because no lookup was performed from hot-dht).

  Change-Id: Ib7c14a9297a22d615f7a890a060be4809b5a745a
  BUG: 1236032
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/11675
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
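The failure mode described above can be modelled with a toy example. It reproduces only the symptom (a fop sent to a subvolume that never populated inode context), not the change made by the patch. The `Subvol` class and `demo` function are hypothetical; real DHT state is far richer.

```python
import errno


class Subvol:
    """Toy subvolume: it only serves inodes it has context for."""

    def __init__(self, name):
        self.name = name
        self.files = set()
        self.inode_ctx = set()

    def readdirp(self):
        # Listing a directory populates context for the entries found here.
        self.inode_ctx |= self.files
        return sorted(self.files)

    def unlink(self, path):
        if path not in self.inode_ctx:
            return -errno.EINVAL  # no context stored for this inode
        self.files.discard(path)
        self.inode_ctx.discard(path)
        return 0


def demo():
    hot, cold = Subvol("hot"), Subvol("cold")
    cold.files.add("/dir/f")   # file existed before attach-tier
    hot.readdirp()
    cold.readdirp()            # only cold learns about /dir/f
    hashed, cached = hot, cold
    via_hashed = hashed.unlink("/dir/f")  # goes to the wrong subvolume
    via_cached = cached.unlink("/dir/f")  # subvolume that holds the file
    return via_hashed, via_cached
```

In this model, sending the unlink to the hashed (hot) subvolume yields `-EINVAL`, while the cached (cold) subvolume can serve it.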
* Tiering:Changing error message as detach-tier instead of "remove-brick" (hari gowtham, 2015-09-16, 1 file, -4/+13)

  Change-Id: Id93424a08f601a8d7540d96a47ed2b0497d4a631
  BUG: 1263177
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Reviewed-on: http://review.gluster.org/12177
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: Dan Lambright <dlambrig@redhat.com>
* cluster/tier do not flag migration error on already migrated file (Dan Lambright, 2015-09-16, 1 file, -15/+13)

  In some cases a brick will try to migrate a file that has already been
  migrated. This is a legal case, e.g. when both bricks are replica
  pairs.

  Change-Id: If2578b947014cbbdfb3c6591db9044d6b1d92774
  BUG: 1263726
  Signed-off-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-on: http://review.gluster.org/12185
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Joseph Fernandes
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
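One way to picture the "already migrated" case is an idempotent move: the second attempt finds the file at its destination and reports that as a non-error. The sketch below assumes two dictionaries standing in for tiers. The `migrate` function and its return strings are hypothetical and not taken from the tier xlator.

```python
def migrate(path, source, dest):
    """Move `path` from `source` to `dest`, tolerating a repeat attempt."""
    if path in source:
        dest[path] = source.pop(path)
        return "migrated"
    if path in dest:
        # Another brick (e.g. the replica partner) got there first.
        return "already-migrated"
    raise FileNotFoundError(path)


# Two replica bricks both try to promote the same file.
cold = {"/f": b"data"}
hot = {}
first = migrate("/f", cold, hot)
second = migrate("/f", cold, hot)
```

Here `first` is `"migrated"` and `second` is `"already-migrated"`. Only a file missing from both tiers is treated as an error.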
* cluster/tier: Fixed a crash in tiering (Nithya Balachandran, 2015-09-16, 1 file, -2/+2)

  An incorrect check was causing the arguments to the promote thread to
  be cleared before the thread was done with them. This caused the
  process to crash when it tried to dereference a NULL pointer.

  Change-Id: I8348309ef4dad33b7f648c7a2c2703487e401269
  BUG: 1263204
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/12179
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Joseph Fernandes
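The lifetime rule behind this fix (arguments handed to a thread must stay valid until the thread is done) can be sketched in Python as below. This is an analogy only: the names `promote` and `run_promotion` are hypothetical, and the actual change was to a check in the C code, which is not shown in this log.

```python
import threading


def promote(args, results):
    # The worker reads its arguments; they must remain valid until it ends.
    results.append(list(args["files"]))


def run_promotion(files):
    args = {"files": files}
    results = []
    worker = threading.Thread(target=promote, args=(args, results))
    worker.start()
    worker.join()   # wait for the worker before releasing its arguments
    args.clear()    # safe now: the thread no longer uses them
    return results[0]
```

Clearing `args` before `join()` would be the Python analogue of the bug: the worker could observe missing arguments.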
* afr: perform replace-brick in a synctask (Ravishankar N, 2015-09-15, 4 files, -14/+73)

  Problem:
  replace-brick setxattr is not performed inside a synctask. This can
  lead to hangs if the setxattr is executed by the epoll thread, as the
  epoll thread will be waiting for replies to come, whereas the epoll
  thread is itself the thread that needs to epoll_ctl for reading from
  the socket and listen.

  Fix:
  Move replace-brick to synctask to prevent the epoll thread hang.

  This patch is in line with the fix performed in
  http://review.gluster.org/#/c/12163/

  Change-Id: I6a71038bb7819f9e98f7098b18a6cee34805868f
  BUG: 1262345
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
  Reviewed-on: http://review.gluster.org/12169
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
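The hang described here is a self-wait: the thread that delivers replies blocks waiting for a reply. The toy model below uses a single dispatcher thread in place of the epoll thread and a plain worker thread in place of a synctask. All names (`Loop`, `blocking_setxattr`, and so on) are hypothetical, and the timeout exists only so the demonstration terminates instead of hanging.

```python
import queue
import threading


class Loop:
    """Toy single-threaded reply dispatcher (stands in for the epoll thread)."""

    def __init__(self):
        self.events = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            fn = self.events.get()
            if fn is None:
                break
            fn()

    def submit(self, fn):
        self.events.put(fn)

    def stop(self):
        self.events.put(None)
        self.thread.join()


def blocking_setxattr(loop, timeout=0.2):
    """Send a request and block until the loop thread delivers the reply."""
    reply = threading.Event()
    loop.submit(reply.set)             # the reply arrives via the loop thread
    return reply.wait(timeout)         # False means the reply never arrived


def run_on_loop_thread(loop):
    out = queue.Queue()
    loop.submit(lambda: out.put(blocking_setxattr(loop)))
    return out.get()


def run_in_worker(loop):
    out = queue.Queue()
    worker = threading.Thread(target=lambda: out.put(blocking_setxattr(loop)))
    loop.submit(worker.start)          # the loop only starts the worker
    return out.get()
```

Calling `run_on_loop_thread` times out because the loop thread is busy waiting and cannot deliver the reply. `run_in_worker` succeeds because the blocking wait happens off the loop thread.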
* BZ 789278: Coverity bug fixes for logically dead code (Akhil Bhansali, 2015-09-15, 1 file, -29/+0)

  Removing the logically dead code as reported by the Coverity tool run
  on GlusterFS.

  The code changes only remove logically dead code, hence the testcases
  were not run.

  CIDs fixed range from 1292652 to 1292663 in sequence.

  Signed-off-by: Akhil Bhansali <bhansaliakhil@gmail.com>
  Change-Id: I05b35f744c89b5e49b6322635c7a0d367ef10abb
  BUG: 789278
  Reviewed-on: http://review.gluster.org/12150
  Reviewed-by: Anoop C S <anoopcs@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht: reverting changes that takes lock on all subvols to prevent rmdir vs lookup selfheal race (Sakshi, 2015-09-14, 5 files, -331/+84)

  Locking on all subvols before an rmdir is unable to remove all
  directory entries. Hence reverting the patch for now.

  Change-Id: I31baf2b2fa2f62c57429cd44f3f229c35eff1939
  BUG: 1245065
  Signed-off-by: Sakshi <sabansal@redhat.com>
  Reviewed-on: http://review.gluster.org/12125
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr : get split-brain-status in a synctask (Anuradha Talur, 2015-09-14, 6 files, -22/+103)

  On executing `getfattr -n replica.split-brain-status <file>` on mount,
  there is a possibility that the mount hangs. To avoid this hang,
  fetch the split-brain-status of a file in synctask.

  Change-Id: I87b781419ffc63248f915325b845e3233143d385
  BUG: 1262345
  Signed-off-by: Anuradha Talur <atalur@redhat.com>
  Reviewed-on: http://review.gluster.org/12163
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
* tier/glusterd : Disable subvol match check during detach tier (Mohammed Rafi KC, 2015-09-14, 1 file, -5/+13)

  For tiering, the user is not allowed to choose which bricks to detach,
  so we don't need to check whether the subvols match for the bricks.

  Change-Id: I7e777ccc1aa261f652f9b158718fcd55185c7794
  BUG: 1261741
  Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
  Reviewed-on: http://review.gluster.org/12145
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* cluster/dht : Propagate op_errno on failure (Nithya Balachandran, 2015-09-13, 1 file, -0/+1)

  Fixed issue where dht_selfheal_layout_lock_cbk does not propagate the
  op_errno.

  Change-Id: I0b968339db65d2969e36e64407eeb724cc6516bd
  BUG: 1262438
  Signed-off-by: Nithya Balachandran <nbalacha@redhat.com>
  Reviewed-on: http://review.gluster.org/12165
  Tested-by: NetBSD Build System <jenkins@build.gluster.org>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
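The class of bug named here (a callback that records failure but drops the error code) can be illustrated with a small sketch. It is not the body of dht_selfheal_layout_lock_cbk: the `layout_lock_cbk` function and the dictionary used for per-call state are hypothetical.

```python
import errno


def layout_lock_cbk(local, op_ret, op_errno):
    """Record the outcome of a lock request in the per-call state."""
    if op_ret < 0:
        local["op_ret"] = -1
        # Without this assignment the caller sees a failure with errno 0.
        local["op_errno"] = op_errno
    return local


state = layout_lock_cbk({"op_ret": 0, "op_errno": 0}, -1, errno.ENOTCONN)
```

With the error code carried through, the code that eventually unwinds the call can report why the lock failed rather than a bare failure.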
* dht/cluster: Avoid crash if local is NULL (Susant Palai, 2015-09-13, 1 file, -5/+17)

  This patch addresses crash handling if local is NULL. In addition to
  that, we were not unwinding if no lock is taken in
  dht_linkfile_create_cbk(create/mknod). This patch handles that also.

  Change-Id: Ibcff317f10d60e7865fd7ffb9479b3af53c9ef17
  BUG: 1260051
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/12160
  Reviewed-by: N Balachandran <nbalacha@redhat.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>