path: root/xlators/mgmt
* glusterd: volume inode/fd status broken with brick mux (hari gowtham, 2018-04-19, 2 files, -0/+7)

  Problem: The values for inode/fd were populated from the ctx received
  from the server xlator. Without brick mux, every brick process served
  a single brick from a single volume, so searching via the server
  xlator and populating from it worked. With brick mux, a number of
  bricks can be confined to a single process, and these bricks can be
  from different volumes too (if the max-bricks-per-process option is
  used). If they are from different volumes, populating via the server
  xlator gives wrong results.

  Fix: Use the brick itself to validate and populate the inode/fd
  status.

  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Change-Id: I2543fa5397ea095f8338b518460037bba3dfdbfd
  fixes: bz#1566067
* glusterd: update listen-backlog value to 1024 (Milind Changire, 2018-04-18, 1 file, -1/+1)

  Update the default value of listen-backlog to 1024 to reflect the
  changes in socket.c. This keeps the actual implementation in socket.c
  and the help text in glusterd-volume-set.c consistent.

  Change-Id: If04c9e0bb5afb55edcc7ca57bbc10922b85b7075
  fixes: bz#1564600
  Signed-off-by: Milind Changire <mchangir@redhat.com>
* xlators/performance: Add pass-through option (Varsha Rao, 2018-04-11, 1 file, -1/+36)

  Add a pass-through option in the performance translators. Set the
  option in GF_OPTION_INIT() and GF_OPTION_RECONF().

  Updates: #304
  Change-Id: If1537450147d154905831e36f7162a32866d7ad6
  Signed-off-by: Varsha Rao <varao@redhat.com>
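  A rough sketch of how such an option is typically wired up with these
  macros; the assumption that the flag lives directly on the xlator
  object (this->pass_through) is mine, not taken from the patch:

      /* sketch: init () / reconfigure () of a performance xlator */
      int32_t
      init (xlator_t *this)
      {
              /* this->pass_through assumed for illustration */
              GF_OPTION_INIT ("pass-through", this->pass_through, bool, out);
              return 0;
      out:
              return -1;
      }

      int
      reconfigure (xlator_t *this, dict_t *options)
      {
              GF_OPTION_RECONF ("pass-through", this->pass_through, options,
                                bool, out);
              return 0;
      out:
              return -1;
      }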
* experimental/cloudsync: Download xlator for archival feature (Susant Palai, 2018-04-10, 2 files, -1/+18)

  spec-files: https://review.gluster.org/#/c/18854/

  Overview:
  * Cloudsync maintains three file states in its inode-ctx, i.e.
    1 - LOCAL, 2 - REMOTE, 3 - DOWNLOADING.
  * A data-modifying fop is allowed only if the state is LOCAL. If the
    state is REMOTE or DOWNLOADING, the client will download the file or
    wait for the download initiated by another client to finish.
  * Multiple downloads and uploads from different clients are
    synchronized by inodelk.
  * In POSIX a state check is done (part of a different commit) before
    allowing the fop to continue. If the state is REMOTE/DOWNLOADING the
    fop is unwound with EREMOTE. The client will then download the file
    and continue with the fop again.
  * Basic algorithm for a fop (say, a write fop):
    - If LOCAL -> resume fop
    - If REMOTE ->
      - INODELK
      - STAT (this gets the state and heals it if needed)
      - DOWNLOAD
      - resume fop

  Note:
  * Developers will need to write plugins for download, based on the
    remote store they choose. In phase-1, support will be added for one
    remote store per volume. In future, more options for multiple remote
    stores will be explored.

  TODOs:
  - Implement stat/lookup/readdirp to return size info from xattr
  - Make plugins configurable
  - Implement unlink fop
  - Add metrics collection
  - Add sharding support

  Design Contributions:
  Aravinda V K <avishwan@redhat.com>
  Amar Tumballi <amarts@redhat.com>
  Ram Ankireddypalle <areddy@commvault.com>
  Susant Palai <spalai@redhat.com>

  updates: #387
  Change-Id: Iddf711ee7ab4e946ae3e472ff62791a7b85e6d4b
  Signed-off-by: Susant Palai <spalai@redhat.com>
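  The state machine above lends itself to a small sketch. This is a
  standalone illustration of the fop gate, not cloudsync code; all names
  (cs_state, cs_gate_fop and the stub helpers) are invented:

      /* illustration of the LOCAL/REMOTE/DOWNLOADING gate for a write fop */
      typedef enum { CS_LOCAL = 1, CS_REMOTE = 2, CS_DOWNLOADING = 3 } cs_state;

      static int cs_inodelk (void)   { return 0; }         /* stub: take inodelk    */
      static int cs_download (void)  { return 0; }         /* stub: plugin download */
      static cs_state cs_stat (void) { return CS_REMOTE; } /* stub: re-read state   */

      static int
      cs_gate_fop (cs_state state)
      {
              if (state == CS_LOCAL)
                      return 0;            /* resume fop directly */
              if (cs_inodelk () != 0)
                      return -1;
              if (cs_stat () != CS_LOCAL && cs_download () != 0)
                      return -1;           /* e.g. unwind with EREMOTE */
              return 0;                    /* downloaded; resume fop */
      }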
* glusterd: show brick online after port registration (Atin Mukherjee, 2018-04-05, 1 file, -2/+3)

  The gluster-block project needs a dependency check to see if all the
  bricks are online before bringing up the relevant gluster-block
  services. While the patch https://review.gluster.org/#/c/19785/
  attempts to write that script, a brick should be marked online only
  when the pmap_signin is completed. This is perfectly fine without
  brick multiplexing, but with brick multiplexing this patch still
  doesn't eliminate the race completely, as the attach_req call is
  asynchronous and glusterd immediately marks the port as registered.

  Change-Id: I81db54b88f7315e1b24e0234beebe00de6429f9d
  Fixes: bz#1563273
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: mark port_registered to true for all running bricks with brick mux (Atin Mukherjee, 2018-04-05, 2 files, -0/+3)

  glusterd maintains a boolean flag 'port_registered' which is used to
  determine if a brick has completed its portmap sign-in process. This
  flag is (re)set on pmap_signin and pmap_signout events.

  In case of brick multiplexing this flag is the identifier to determine
  if the very first brick with which the process was spawned has
  completed its sign-in process. However, on a glusterd restart, when a
  brick is already identified as running, glusterd does a
  pmap_registry_bind to ensure its portmap table is updated, but this
  flag isn't set. That is fine in the non-brick-multiplex case, but it
  causes an issue if the very first brick which came up as part of the
  process is replaced: the subsequent brick attach will fail. One way to
  reproduce this is to create and start a volume, remove the first
  brick, and then add-brick a new one. The add-brick operation will take
  a very long time, and afterwards the volume status will show every
  brick except the new one as down.

  Solution: set brickinfo->port_registered to true for all the running
  bricks when brick multiplexing is enabled.

  Change-Id: Ib0662d99d0fa66b1538947fd96b43f1cbc04e4ff
  Fixes: bz#1560957
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
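  A hedged sketch of the shape of that fix; the helper names are
  assumptions, only brickinfo->port_registered is taken from the commit
  message:

      /* on glusterd restart: a brick already running under brick mux has
         already signed in once, so trust that instead of waiting for a
         pmap_signin that will never come again */
      if (is_brick_mx_enabled () && glusterd_is_brick_started (brickinfo))
              brickinfo->port_registered = _gf_true;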
* glusterd: fix txn_opinfo memory leak (Atin Mukherjee, 2018-04-04, 3 files, -9/+25)

  For transactions where there's no volname involved (e.g. gluster v
  status), the originator node initiates with the staging phase, which
  means that in the op-sm no unlock event is triggered; this resulted in
  a txn_opinfo dictionary leak.

  Credits: cynthia.zhou@nokia-sbell.com
  Change-Id: I92fffbc2e8e1b010f489060f461be78aa2b86615
  Fixes: bz#1550339
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: honour localtime-logging for all the daemons (Atin Mukherjee, 2018-04-03, 5 files, -0/+30)

  Change-Id: I97a70d29365b0a454241ac5f5cae56d93eefd73a
  Fixes: bz#1563334
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: setting mgmt_v3_timer->timer to NULL after deleting mgmt_v3_timer (Sanju Rakonde, 2018-04-02, 1 file, -1/+0)

  We were setting mgmt_v3_timer->timer to NULL after mgmt_v3_timer was
  deleted, which is unnecessary, so the statement is removed. This issue
  was caught while running glusterd with ASAN.

  Change-Id: Ied1f91590a2c64ec1af36d4de9c3febd6cf94bb9
  Fixes: bz#1562907
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* Revert "glusterd: handling brick termination in brick-mux"Sanju Rakonde2018-03-294-55/+25
| | | | | | | | | | | | | This reverts commit a60fc2ddc03134fb23c5ed5c0bcb195e1649416b. This commit was causing multiple tests to time out when brick multiplexing is enabled. With further debugging, it's found that even though the volume stop transaction is converted into mgmt_v3 to allow the remote nodes to follow the synctask framework to process the command, there are other callers of glusterd_brick_stop () which are not synctask based. Change-Id: I7aee687abc6bfeaa70c7447031f55ed4ccd64693 updates: bz#1545048
* glusterd: changing the op-version of volume stop mgmt v3 (Kaleb S. KEITHLEY, 2018-03-28, 1 file, -3/+3)

  Make the log message describe the actual test.

  Change-Id: I1ea7300a6b186032a65236492d6d2a6eef0ab983
  fixes: bz#1560441
  Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* glusterd: handling brick termination in brick-mux (Sanju Rakonde, 2018-03-28, 4 files, -25/+55)

  Problem: There's a race between the last glusterfs_handle_terminate()
  response sent to glusterd and the kill that happens immediately if the
  terminated brick is the last brick.

  Solution: When it is the last brick for the brick process, instead of
  glusterfsd killing itself, glusterd will kill the process in case of
  brick multiplexing. The gf_attach utility is changed accordingly.

  Change-Id: I386c19ca592536daa71294a13d9fc89a26d7e8c0
  fixes: bz#1545048
  BUG: 1545048
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: changing the op-version of volume stop mgmt v3 (Sanju Rakonde, 2018-03-27, 1 file, -1/+1)

  Change-Id: Iefc5a00d36436b23181871fa365f27b8d90cff0a
  fixes: bz#1560441
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* glusterd: Implementing volume stop in mgmt v3 (Sanju Rakonde, 2018-03-26, 2 files, -1/+66)

  Change-Id: I8f9c594cf56331d54eb4884335699744685ef20d
  fixes: bz#1560441
  Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* georep: Pause/Resume of geo-replication with wrong user (Sunny Kumar, 2018-03-20, 1 file, -0/+23)

  While performing pause/resume on geo-replication with the wrong user
  (a user other than the one you set up), the command always returned
  success. This further led to snapshot creation failure, as it detected
  an active geo-replication session.

  Change-Id: I6e96e8dd3e861348b057475387f0093cb903ae88
  BUG: 1550936
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* glusterd: TLS verification fails while using intermediate CA (Mohit Agrawal, 2018-03-19, 1 file, -0/+3)

  Problem: TLS verification fails while using an intermediate CA if
  management SSL is enabled.

  Solution: There are two main causes of the TLS verification failing:
  1) not calling the SSL API to set the certificate depth;
  2) the current code does not allow setting the certificate depth while
     management SSL is enabled.
  After applying this patch, to set the certificate depth the user needs
  to set the option transport.socket.ssl-cert-depth <depth> in
  /var/lib/glusterd/secure-access instead of in
  /etc/glusterfs/glusterd.vol. At the time secure_mgmt is set in the
  ctx, the value of cert-depth is checked and saved in the ctx. If the
  user does not provide any value for cert-depth, the default value of 1
  is used.

  BUG: 1555154
  Change-Id: I89e9a9e1026e37efb5c20f9ec62b1989ef644f35
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
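  On the OpenSSL side, the knob the first point refers to is presumably
  the verify depth; a minimal sketch (the config value shown is an
  example, not from the patch):

      /* e.g. /var/lib/glusterd/secure-access could carry:
         option transport.socket.ssl-cert-depth 3            */
      #include <openssl/ssl.h>

      static void
      configure_depth (SSL_CTX *ssl_ctx, int cert_depth)
      {
              /* without this call, chains longer than the library default
                 (i.e. with intermediate CAs) can fail verification */
              SSL_CTX_set_verify_depth (ssl_ctx, cert_depth);
      }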
* glusterd: glusterd crash in gd_mgmt_v3_unlock_timer_cbk (Gaurav Yadav, 2018-03-15, 1 file, -1/+0)

  Freeing the same pointer twice inside gd_mgmt_v3_unlock_timer_cbk was
  causing glusterd to crash.

  Change-Id: I9147241d995780619474047b1010317a89b9965a
  BUG: 1550339
* glusterd: volume get fixes for client-io-threads & quorum-type (Ravishankar N, 2018-03-07, 5 files, -7/+52)

  1. If a replica volume created on glusterfs-3.8 was upgraded to
     glusterfs-3.12, `gluster vol get volname client-io-threads`
     displayed 'on' even though it wasn't, and the xlator wasn't loaded
     on the client graph. This was due to removing certain checks in
     glusterd_get_default_val_for_volopt as a part of commit
     47604fad4c2a3951077e41e0c007ceb979bb2c24. Fix it.

  2. Also, as a part of the op-version bump-up, client-io-threads was
     being loaded on the clients during volfile regeneration. Prevent it.

  3. AFR assumes quorum-type to be auto in newly created replica 3 (odd
     replica in general) volumes, but `gluster vol get quorum-type`
     displays 'none'. Fix it.

  Change-Id: I19e586361ed1065c70fb378533d3b4dac1095df9
  BUG: 1545056
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* glusterd: memory leak in mgmt_v3 lock functionality (Gaurav Yadav, 2018-03-06, 1 file, -0/+7)

  In order to take care of a stale lock issue, a timer was introduced in
  the mgmt_v3 lock. This timer was not freeing its memory, which
  introduced this leak. With this fix, memory cleanup in locking is
  handled properly.

  Change-Id: I2e1ce3ebba3520f7660321f3d97554080e4e22f4
  BUG: 1550339
  Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
* performance/io-threads: Add option to disable client disconnect feature (Varsha Rao, 2018-02-28, 1 file, -0/+5)

  > Add options to disable new features
  > Commit ID: c071992e8d
  > https://review.gluster.org/#/c/18291/
  > By Michael Goulet <mgoulet@fb.com>

  This patch is required to forward port the io-threads namespace patch.

  Updates: #401
  Change-Id: Ice477fdf4b8934f9fac0b4a2f6c93db97429a586
  Signed-off-by: Varsha Rao <varao@redhat.com>
* write-behind: Make aggregate size configurable (Poornima G, 2018-02-26, 1 file, -0/+6)

  Currently the aggregate size is by default 128K (page size). From a
  performance perspective, a small number of large writes is faster than
  a large number of small writes, especially in EC volumes. But
  identifying the right aggregate size depends on multiple factors like
  the memcpy overhead, network overhead etc. On a local machine,
  combining 128K writes into 1M writes for EC volumes yielded a 30%
  improvement.

  As a part of this patch, the aggregate size is just made configurable
  and page_size is modified accordingly.

  Raghavendra Gowdappa had suggested that, while aggregating writes, we
  should get rid of the memcpy of the large write size and instead add
  the pointer to the existing vector; that will be done as a part of
  another patch. Also, in EC volumes the vectors are merged into one
  vector, so even if we save the memcpy in write-behind, EC would anyway
  do a memcpy for merging vectors into one vector.

  Updates: #364
  Change-Id: Ib67294b8577bea14dde1c84cd271012ecea99f09
  Signed-off-by: Poornima G <pgurusid@redhat.com>
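  For reference, a gluster xlator usually exposes such a knob through
  its volume_options table; a hedged sketch of what the new entry could
  look like (the key name and default are assumptions, not quoted from
  the patch):

      /* sketch of a write-behind option-table entry */
      { .key = {"aggregate-size"},
        .type = GF_OPTION_TYPE_SIZET,
        .default_value = "128KB",
        .description = "Aggregate writes until this size before winding "
                       "them down to the server."
      },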
* performance/io-threads: Add threads to priority based stagnant queues (Varsha Rao, 2018-02-22, 1 file, -0/+5)

  > performance/io-threads: Add watchdog to cover up a possible thread leak
  > Commit ID: 8b6804f75c
  > https://review.gluster.org/#/c/18239/
  > By Shreyas Siravara <sshreyas@fb.com>

  This patch is required to forward port the io-threads namespace patch.

  Updates: #401
  Change-Id: Id057c34a2abb9fc6dfb4afcd5c7bbbfe5693bbb8
  Signed-off-by: Varsha Rao <varao@redhat.com>
* glusterd: compare uuid instead of hostname while finding compatible brick (Atin Mukherjee, 2018-02-21, 1 file, -1/+1)

  If this is not done, bricks created with different IPs/hostnames will
  not be compatible with brick multiplexing.

  Change-Id: I508eb59b0632df4b48466cca411c7ec6cc6bd577
  BUG: 1547068
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* xlators/features/namespace: Add namespace xlator and link into brick graph (Varsha Rao, 2018-02-21, 2 files, -1/+38)

  The following release-3.8-fb branch patch is upstreamed:

  > features/namespace: Add namespace xlator and link into brick graph
  > Commit ID: dbd30776f26e
  > https://review.gluster.org/#/c/18041/
  > By Michael Goulet <mgoulet@fb.com>

  Changes in this patch:
  - Remove extra config.h and namespace.h includes in namespace.c
  - Add default_getspec_cbk to libglusterfs.sym
  - Rename dict_for_each to dict_foreach_inline
  - Remove fd.h header file from stack.h
  - Add test cases for truncate, open and symlink

  This patch is required to forward port the io-threads namespace patch.

  Updates: #401
  Change-Id: Ib88c95b89eecee9b8957df8a4c8712c899c761d1
  Signed-off-by: Varsha Rao <varao@redhat.com>
* build: add --without-server option (Niels de Vos, 2018-02-19, 1 file, -0/+5)

  With Gluster 4.0 we will not provide the server components for EL6 and
  older. At one point Gluster 4.x will get GlusterD2, which requires
  Golang tools in the distribution; EL6 does not contain these at the
  moment.

  With this change, it is possible to `./configure --without-server`,
  which prevents building glusterd and the xlators for the bricks.
  Building RPMs can pass `--without server` and the glusterfs-server
  sub-package will not be created.

  Change-Id: I97f5ccf9f2c76e60d9af83915fc59fae57ad6d25
  BUG: 1074947
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
* posix/afr: handle backward compatibility for rchecksum fop (Ravishankar N, 2018-02-19, 1 file, -0/+7)

  Added a volume option 'fips-mode-rchecksum' tied to op-version 4. If
  not set, the rchecksum fop will use MD5 instead of SHA256.

  updates: #230
  Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
* glusterd: fix tier-enabled flag op-version check (Atin Mukherjee, 2018-02-13, 1 file, -2/+2)

  The tier-enabled flag in the volinfo structure was introduced in 3.10;
  however, writing this value to the glusterd store was done with a
  wrong op-version check, which results in a volume checksum failure
  during upgrades.

  Change-Id: I4330d0c4594eee19cba42e2cdf49a63f106627d4
  BUG: 1544600
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd/snapshot: fix the compare snap logic (Atin Mukherjee, 2018-02-10, 1 file, -0/+4)

  In one particular case in commit cb0339f, after removing the old snap
  the new snap version wasn't being written, and this resulted in one of
  the tests failing spuriously.

  Change-Id: I3e83435fb62d6bba3bbe227e40decc6ce37ea77b
  BUG: 1540607
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: import volumes in separate synctask (Atin Mukherjee, 2018-02-09, 6 files, -70/+340)

  With brick multiplexing, to attach a brick to an existing brick
  process the prerequisite is for the compatible brick to finish its
  initialization and portmap sign-in, and hence the thread might have to
  go to sleep and context-switch the synctask to allow the brick process
  to communicate with glusterd. In the normal code path this works fine,
  as glusterd_restart_bricks () is launched through a separate synctask.

  In case there's a mismatch of the volume when glusterd restarts,
  glusterd_import_friend_volume is invoked, which then tries to call
  glusterd_start_bricks () from the main thread, which may eventually
  land in a similar situation. Since this is not done through a separate
  synctask, the first brick will never get its turn to finish all of its
  handshaking, and as a consequence all the bricks will fail to get
  attached to it.

  Solution: Execute import volume and glusterd restart bricks in
  separate synctasks. Importing snaps also had to be done through a
  synctask, as the parent volume needs to be available for the snap
  import functionality to work.

  Change-Id: I290b244d456afcc9b913ab30be4af040d340428c
  BUG: 1540607
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
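  In glusterd, moving work onto a synctask boils down to a
  synctask_new () call; a hedged sketch (the task function name is an
  assumption, not necessarily the one the patch adds):

      /* run the friend-volume import in its own synctask so it can
         yield while the brick process finishes its handshake */
      ret = synctask_new (this->ctx->env,
                          glusterd_import_friend_volumes_synctask,
                          NULL,        /* completion callback */
                          NULL,        /* frame               */
                          peer_data);  /* opaque argument     */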
* performance/io-threads: expose io-thread queue depths (Varsha Rao, 2018-02-08, 1 file, -1/+1)

  The following release-3.8-fb branch patch is upstreamed:

  > io-stats: Expose io-thread queue depths
  > Commit ID: 69509ee7d2
  > https://review.gluster.org/#/c/18143/
  > By Shreyas Siravara <sshreyas@fb.com>

  Changes in this patch:
  - Replace iot_pri_t with gf_fop_pri_t
  - Replace IOT_PRI_{HI, LO, NORMAL, MAX, LEAST} with
    GF_FOP_PRI_{HI, LO, NORMAL, MAX, LEAST}
  - Use dict_unref() instead of dict_destroy()

  This patch is required to forward port the io-threads namespace patch.

  Updates: #401
  Change-Id: I1b47a63185a441a30fbc423ca1015df7b36c2518
  Signed-off-by: Varsha Rao <varao@redhat.com>
* glusterd/store: handle the case of fsid being set to 0 (Amar Tumballi, 2018-02-05, 1 file, -0/+19)

  Generally this would happen when a system gets upgraded from a version
  which doesn't have fsid details to a version with fsid values. Without
  this change, after an upgrade people would see reduced 'df' output,
  causing a lot of confusion.

  Debugging Credits: Nithya B <nbalacha@redhat.com>
  Change-Id: Id718127ddfb69553b32770b25021290bd0e7c49a
  BUG: 1517260
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
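  A hedged guess at the shape of such a handler: on startup, if the
  stored fsid is 0, re-derive it from the brick path (the field and
  function names here are assumptions):

      #include <sys/statvfs.h>

      /* heal a missing fsid left behind by a pre-fsid version */
      static void
      heal_brick_fsid (glusterd_brickinfo_t *brickinfo)
      {
              struct statvfs buf;

              if (brickinfo->statfs_fsid != 0)
                      return;
              if (statvfs (brickinfo->path, &buf) == 0)
                      brickinfo->statfs_fsid = buf.f_fsid;
      }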
* cluster/dht: avoid overwriting client writes during migration (Susant Palai, 2018-02-02, 1 file, -0/+8)

  For more details on this issue see
  https://github.com/gluster/glusterfs/issues/308

  Solution: This is a restrictive solution where a file will not be
  migrated if a client writes to it during the migration. It does not
  check whether the writes from the rebalance and the client actually
  overlap. If dht_writev_cbk finds that the file is being migrated
  (PHASE1), it will set an xattr on the destination file indicating that
  the file was updated by a non-rebalance client. Rebalance checks
  whether any other client has written to the dst file and aborts the
  file migration if it finds the xattr.

  updates gluster/glusterfs#308
  Change-Id: I73aec28bc9dbb8da57c7425ec88c6b6af0fbc9dd
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
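  Conceptually the protocol is just a marker on the destination file; a
  standalone sketch using plain xattr syscalls (the xattr name here is
  invented for illustration, not the one the patch uses):

      #include <sys/xattr.h>

      #define MARK_XATTR "trusted.example.client-write"  /* invented name */

      /* client side (writev_cbk path): flag the in-migration dst file */
      static int
      mark_client_write (const char *dst_path)
      {
              return setxattr (dst_path, MARK_XATTR, "1", 1, 0);
      }

      /* rebalance side: abort the migration if any client wrote to dst */
      static int
      client_wrote_to_dst (const char *dst_path)
      {
              return getxattr (dst_path, MARK_XATTR, NULL, 0) >= 0;
      }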
* glusterd: optimize glusterd import volumes code path (Atin Mukherjee, 2018-01-31, 1 file, -5/+7)

  In case a version mismatch was detected for one of the volumes,
  glusterd was ending up updating all the volumes, which is overkill.

  Change-Id: I6df792db391ce3a1697cfa9260f7dbc3f59aa62d
  BUG: 1539510
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* quiesce, gfproxy: Implement failover across multiple gfproxy nodes (Poornima G, 2018-01-30, 2 files, -0/+32)

  Updates: #242
  Change-Id: I767e574a26e922760a7130bd209c178d74e8cf69
  Signed-off-by: Poornima G <pgurusid@redhat.com>
* core: add some examples of site.h usage (Jeff Darcy, 2018-01-30, 1 file, -1/+1)

  Change-Id: I6ce574a593eda8f3a6b2fc8969b5edf7c250b61c
  Signed-off-by: Jeff Darcy <jdarcy@fb.com>
* protocol: Remove lock recovery logic from client and server (Anoop C S, 2018-01-29, 1 file, -24/+0)

  Change-Id: I27f5e1e34fe3eac96c7dd88e90753fb5d3d14550
  BUG: 1272030
  Signed-off-by: Anoop C S <anoopcs@redhat.com>
* glusterd: add profile_enabled flag in get-state (Atin Mukherjee, 2018-01-25, 4 files, -22/+27)

  Change-Id: I09f348ed7ae6cd481f8c4d8b4f65f2f2f6aad84e
  BUG: 1537364
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: process pmap sign in only when port is marked as free (Atin Mukherjee, 2018-01-25, 1 file, -0/+15)

  Because of a race in the volume start code path, caused by friend
  handshaking on volumes with quorum enabled, we might end up in a
  situation where glusterd starts a brick, gets a disconnect, and then
  immediately tries to start the same brick instance based on another
  friend update request. Then, even if the process doesn't come up for
  the very first brick, a sign-in event gets sent at the end, and we end
  up having two duplicate portmap entries for the same brick. Since
  brick start marks the previous port as free, it is better to treat a
  sign-in request as a no-op if the corresponding port type is marked as
  free.

  Change-Id: I995c348c7b6988956d24b06bf3f09ab64280fc32
  BUG: 1537362
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
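  The guard the last sentence describes is small; a hedged sketch with
  an invented helper name:

      /* in the pmap sign-in handler: a sign-in for a port that brick
         start has already marked free is stale, so ignore it */
      if (pmap_port_is_marked_free (pmap, port))   /* invented helper */
              return 0;   /* no-op instead of a duplicate entry */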
* dentry fop serializer: added new server side xlator for dentry fop serialization (Sakshi Bansal, 2018-01-24, 2 files, -0/+36)

  Problems addressed by this xlator:

  [1]. Races between parallel mkdir, mkdir and lookup, etc.

  Fops like mkdir/create, lookup, rename, unlink and link that happen on
  a particular dentry must be serialized to ensure atomicity. Another
  possible case is a fresh lookup to find the existence of a path whose
  gfid is not set yet. Further, storage/posix employs a ctime based
  heuristic 'is_fresh_file' (interval time is less than 1 second of
  current time) to check the freshness of a file. With serialization of
  these two fops (lookup & mkdir), we eliminate the race altogether.

  [2]. Staleness of dentries

  This causes an exponential increase in traversal time for any inode in
  the subtree of the directory pointed to by the stale dentry.

  Cause: A stale dentry is created by two kinds of operations:
  a. dentry creation due to inode_link, done during operations like
     lookup, mkdir, create, mknod and symlink, and
  b. dentry unlinking due to various operations like rmdir, rename and
     unlink.

  The reason is that __inode_link uses __is_dentry_cyclic, which
  explores every possible path to avoid cyclic link formation during
  inode linkage. __is_dentry_cyclic explores stale dentries and all of
  their ancestors, which increases traversal time exponentially.

  Implementation: To achieve this, all fops on a dentry must take entry
  locks before they proceed; once they have acquired the locks, they
  perform the fop and then release the lock.

  Some documentation from email conversation:
  [1] http://www.gluster.org/pipermail/gluster-devel/2015-December/047314.html
  [2] http://www.gluster.org/pipermail/gluster-devel/2015-August/046428.html

  With this patch the feature is optional; enable it by running:
  `gluster volume set $volname features.sdfs enable`

  The feature was also tested for a month without issues in the
  experimental branch for all the regressions.

  Change-Id: I6e80ba3cabfa6facd5dda63bd482b9bf18b6b79b
  Fixes: #397
  BUG: 1304962
  Signed-off-by: Sakshi Bansal <sabansal@redhat.com>
  Signed-off-by: Amar Tumballi <amarts@redhat.com>
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* md-cache: Implement dynamic configuration of xattr list for caching (Poornima G, 2018-01-22, 1 file, -1/+8)

  Currently the list of xattrs that md-cache can cache is hard coded in
  the md-cache.c file; this necessitates a code change and rebuild every
  time a new xattr needs to be added to the md-cache xattr cache list.
  With this patch, the user will be able to configure a comma separated
  list of xattrs to be cached by md-cache.

  Updates #297
  Change-Id: Ie35ed607d17182d53f6bb6e6c6563ac52bc3132e
  Signed-off-by: Poornima G <pgurusid@redhat.com>
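  Parsing such a comma separated option is the core of the change; a
  standalone sketch (not md-cache code, all names invented):

      #include <stdio.h>
      #include <string.h>

      /* split "user.foo,user.bar" style option values in place */
      static void
      load_xattr_list (char *csv)
      {
              char *save = NULL;
              for (char *tok = strtok_r (csv, ",", &save); tok;
                   tok = strtok_r (NULL, ",", &save))
                      printf ("caching xattr: %s\n", tok); /* add to list */
      }

      int
      main (void)
      {
              char opt[] = "user.foo,security.ima";
              load_xattr_list (opt);
              return 0;
      }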
* cluster/afr: Adding option to take full file lock (karthik-us, 2018-01-19, 1 file, -0/+6)

  Problem: In replica 3 volumes there is a possibility of ending up in a
  split brain scenario when multiple clients write data to the same file
  at non overlapping regions in parallel.

  Scenario:
  - Initially all the copies are good and all the clients get the value
    of data readables as all good.
  - Client C0 performs write W1 which fails on brick B0 and succeeds on
    the other two bricks.
  - C1 performs write W2 which fails on B1 and succeeds on the other two
    bricks.
  - C2 performs write W3 which fails on B2 and succeeds on the other two
    bricks.

  All three writes above happen in parallel and fall on different
  ranges, so afr takes granular locks and the writes are performed in
  parallel. Since each client had data-readables as good, none of them
  sees the file going into split brain in the in_flight_split_brain
  check, and hence each performs the post-op, marking the pending
  xattrs. Now all the bricks are being blamed by each other, ending up
  in split brain.

  Fix: Have an option to take either a full lock or a range lock on
  files during data transactions, to prevent the possibility of ending
  up in split brain. With this change, by default files take a full lock
  during IO. To use the old range-lock behaviour, set the value of
  "cluster.full-lock" to "no".

  Change-Id: I7893fa33005328ed63daa2f7c35eeed7c5218962
  BUG: 1535438
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
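  In POSIX byte-range locking terms a "full lock" is simply a lock of
  length 0 from offset 0; a small standalone illustration of the two
  shapes (not afr code, the range values are examples):

      #include <fcntl.h>

      /* full-file lock: l_len == 0 means "from l_start to EOF" */
      struct flock full_lk  = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                                .l_start = 0, .l_len = 0 };

      /* range lock covering only the bytes this write touches */
      struct flock range_lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                                .l_start = 4096, .l_len = 4096 };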
* locks: added inodelk/entrylk contention upcall notifications (Xavier Hernandez, 2018-01-16, 1 file, -0/+20)

  The locks xlator is now able to send a contention notification to the
  current owner of the lock. This is only a notification; it can be used
  to improve the performance of some client side operations that might
  benefit from an extended duration of lock ownership. Nothing is done
  if the lock owner decides to ignore the message and not release the
  lock. For forced release of acquired resources, leases must be used.

  Change-Id: I7f1ad32a0b4b445505b09908a050080ad848f8e0
  Signed-off-by: Xavier Hernandez <xhernandez@datalab.es>
* glusterd: get-state memory leak fix (Atin Mukherjee, 2018-01-08, 1 file, -3/+12)

  Change-Id: Ic4fcf2087f295d3dade944efb8fd08f7e2d7d516
  BUG: 1531149
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: fix up volume option flags (Csaba Henk, 2018-01-07, 1 file, -3/+3)

  In the glusterd volfile generation code, options should be ornamented
  with the VOLOPT_FLAG_* flags. However, some are ornamented with
  OPT_FLAG_* flags (which are to be used in xlator context).

  The impact: the OPT_FLAG_* that occurs is OPT_FLAG_CLIENT_OPT, which
  has the same value as VOLOPT_FLAG_XLATOR_OPT, so what was meant is
  "option affects clients" and what was there means "option
  enables/disables xlators". Because of this semantic shift, the
  op-version might be incorrectly calculated for volumes and clients.
  (At this point it's a theoretical possibility. Actual occurrence might
  depend on the connecting client & server versions; it's also possible
  that there exists a proof of concept scenario, but it's unrealistic.)

  This commit eliminates the OPT_FLAG_* occurrences from glusterd code
  and replaces them with the appropriate VOLOPT_FLAG_* flags.

  Change-Id: Ia4e6fbac738d5a8d889c0f5561c4dea6783250b1
  Signed-off-by: Csaba Henk <csaba@redhat.com>
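  To make the mixup concrete, a hedged sketch of the two flag namespaces
  colliding on the same numeric value (the bit values are invented for
  illustration; only the equality of the first two is claimed above):

      /* xlator-context flags (illustrative values) */
      #define OPT_FLAG_CLIENT_OPT     (1 << 0)  /* option affects clients   */

      /* glusterd volgen flags */
      #define VOLOPT_FLAG_XLATOR_OPT  (1 << 0)  /* enables/disables xlators */
      #define VOLOPT_FLAG_CLIENT_OPT  (1 << 1)  /* the flag that was meant  */

      /* using OPT_FLAG_CLIENT_OPT in a volgen table therefore silently
         reads as VOLOPT_FLAG_XLATOR_OPT, i.e. the semantic shift above */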
* glusterd: connect to an existing brick process when quorum status is NOT_APPLICABLE_QUORUM (Atin Mukherjee, 2018-01-05, 8 files, -11/+37)

  First of all, this patch reverts commit 635c1c3, as the same is
  causing a regression with bricks not coming up on time when a node is
  rebooted. This patch tries to fix the problem in a different way by
  just trying to connect to an existing running brick when the quorum
  status is not applicable.

  Change-Id: I0efb5901832824b1c15dcac529bffac85173e097
  BUG: 1509845
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* tests: Enable geo-rep test cases (Kotresh HR, 2018-01-05, 2 files, -5/+9)

  This patch re-enables the geo-rep test cases. Along with that, it does
  the following optimizations:
  1. Use EXPECT_WITHIN instead of sleep
  2. Clean up the geo-rep ssh key after the test
  3. Change gverify.sh and S56glusterd-geo-rep-create-post.sh to use the
     given ssh identity file for geo-rep create
  4. Make gluster-command-dir configurable and introduce
     slave-gluster-command-dir, which point to the parent directory of
     the gluster binaries on the master and slave respectively

  Change-Id: Ia7696278d9dd3ba04224dcd7c3564088ca970b04
  BUG: 1480491
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* glusterd: Nullify pmap entry for bricks belonging to same port (Atin Mukherjee, 2018-01-03, 1 file, -1/+1)

  Commit 30e0b86 tried to address all the stale port issues glusterd had
  when a brick is abruptly killed. For the brick multiplexing case,
  because of a bug the portmap entry was not getting removed. This patch
  addresses the same.

  Change-Id: Ib020b967a9b92f1abae9cab9492f0cacec59aaa1
  BUG: 1530281
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* mgmt/glusterd: Adding validation for setting quorum-count (karthik-us, 2017-12-29, 1 file, -5/+40)

  In a replicated volume it was possible to set the quorum-count to any
  value in the range [1 - 2147483647]. This patch adds validation so
  that the quorum-count value set on a volume can be at most the replica
  count.

  Change-Id: I13952f3c6cf498c9f2b91161503fc0fba9d94898
  BUG: 1529515
  Signed-off-by: karthik-us <ksubrahm@redhat.com>
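  The check itself is a one-liner; a standalone sketch of the intended
  bounds (the function name is invented):

      /* quorum-count must be in [1, replica_count] */
      static int
      validate_quorum_count (int quorum_count, int replica_count)
      {
              if (quorum_count < 1 || quorum_count > replica_count)
                      return -1;   /* reject, e.g. with EINVAL */
              return 0;
      }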
* snapshot: after brick reset/replace snapshot creation fails (Sunny Kumar, 2017-12-29, 3 files, -27/+39)

  Problem: After brick reset/replace, snapshot creation fails.

  Solution: During brick reset/replace, when we validate and aggregate
  dictionary data from another node, the 'mount_dir' value, which is
  critical for snapshot creation, was being rewritten to NULL.

  Change-Id: Iabefbfcef7d8ac4cbd2a241e821c0e51492c093e
  BUG: 1512451
  Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
* rchecksum/fips: Replace MD5 usage to enable fips support (Kotresh HR, 2017-12-21, 1 file, -1/+1)

  rchecksum uses MD5, which is not FIPS compliant; hence SHA256 is used
  instead.

  Updates: #230
  Change-Id: I7fad016fcc2a9900395d0da919cf5ba996ec5278
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
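  With OpenSSL this kind of change is a one-shot call swap; a minimal
  standalone sketch of the before/after (not the actual gluster diff):

      #include <openssl/md5.h>
      #include <openssl/sha.h>

      static void
      checksum_block (const unsigned char *buf, size_t len)
      {
              /* before: MD5, not FIPS compliant */
              unsigned char md5sum[MD5_DIGEST_LENGTH];
              MD5 (buf, len, md5sum);

              /* after: SHA256 */
              unsigned char sha[SHA256_DIGEST_LENGTH];
              SHA256 (buf, len, sha);
      }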