Commit message | Author | Age | Files | Lines
* doc: Added release notes for 5.7 [v5.7] | hari gowtham | 2019-07-02 | 1 | -0/+26
    Fixes: bz#1697986
    Change-Id: I6dc17424665431957152761eaec7b6b2226ae1f0
    Signed-off-by: hari gowtham <hgowtham@redhat.com>
* cluster/ec: honor contention notifications for partially acquired locks | Xavi Hernandez | 2019-06-28 | 2 | -1/+55
    EC was ignoring lock contention notifications received while a lock
    was being acquired. When a lock is partially acquired (some bricks
    have granted the lock but others have not yet), we can receive
    notifications from the bricks that have granted it. These should be
    honored, since we may not receive more notifications after that.

    Since EC was ignoring them, once the lock was acquired, it was not
    released until the eager-lock timeout, causing unnecessary delays on
    other clients.

    This fix takes into consideration the notifications received before
    the full lock acquisition has completed. After that, the lock will be
    released as soon as possible.

    Backport of:
    > BUG: bz#1708156
    > Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
    > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>

    Fixes: bz#1717282
    Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12
    Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
    Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
* tests/utils: Fix py2/py3 util python scripts | Kotresh HR | 2019-06-28 | 8 | -30/+261
    The following files are fixed:
        tests/bugs/distribute/overlap.py
        tests/utils/changelogparser.py
        tests/utils/create-files.py
        tests/utils/gfid-access.py
        tests/utils/libcxattr.py

    Have marked glupy as bad test.

    Backport of:
    > Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140
    > BUG: 1193929
    > Signed-off-by: Kotresh HR <khiremat@redhat.com>

    Change-Id: I3db857cc19e19163d368d913eaec1269fbc37140
    Updates: bz#1629877
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
* ec: fix truncate lock to cover the write in truncate clean | Kinglong Mee | 2019-05-08 | 1 | -2/+6
    ec_truncate_clean writes under the lock granted for truncate, but the
    lock range is calculated by ec_adjust_offset_up, so the write in
    ec_truncate_clean falls outside the locked range.

    Updates: bz#1699500
    Change-Id: I15ed1b0807d75c5eb817323f1c227e97d03e0e7c
    Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
    (cherry picked from commit 0e1223491e964096384edfae5032ed0d50d028ad)
* performance/write-behind: remove request from wip list in wb_writev_cbk | Raghavendra G | 2019-05-08 | 1 | -0/+6
    There is a race in the way O_DIRECT writes are handled. Assume two
    overlapping write requests w1 and w2.

    * w1 is issued and is in the wb_inode->wip queue as the response is
      still pending from bricks. Also, wb_request_unref in wb_do_winds is
      not yet invoked.

          list_for_each_entry_safe (req, tmp, tasks, winds) {
              list_del_init (&req->winds);

              if (req->op_ret == -1) {
                  call_unwind_error_keep_stub (req->stub, req->op_ret,
                                               req->op_errno);
              } else {
                  call_resume_keep_stub (req->stub);
              }

              wb_request_unref (req);
          }

    * w2 is issued and wb_process_queue is invoked. w2 is not picked up
      for winding as w1 is still in wb_inode->wip. w2 is added to the todo
      list and wb_writev for w2 returns.

    * The response to w1 is received and invokes wb_request_unref. Assume
      wb_request_unref in wb_do_winds (see point 1) is not invoked yet.
      Since there is one more refcount, wb_request_unref in wb_writev_cbk
      of w1 doesn't remove w1 from wip.

    * wb_process_queue is invoked as part of wb_writev_cbk of w1. But it
      fails to wind w2 as w1 is still in wip.

    * wb_request_unref is invoked on w1 as part of wb_do_winds. w1 is
      removed from all queues, including wip.

    * After this point there is no invocation of wb_process_queue unless
      a new request is issued from the application, causing w2 to hang
      until the next request.

    This bug is similar to bz 1626780 and bz 1379655.

    Change-Id: Iaa47437613591699d4c8ad18bc0b32de6affcc31
    Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
    Fixes: bz#1707198
    (cherry picked from commit 6454132342c0b549365d92bcf3572ecd914f7fa8)
* cluster/afr: Remove local from owners_list on failure of lock-acquisition | Pranith Kumar K | 2019-05-08 | 5 | -18/+61
    When eager-lock lock acquisition fails because of, say, network
    failures, the local is not removed from owners_list. This leads to an
    accumulation of waiting frames, and the application hangs because the
    waiting frames assume that another transaction is in the process of
    acquiring the lock, since owners_list is not empty.

    Handled this case as well in this patch. Added asserts to make it
    easier to find these problems in future.

    fixes bz#1699736
    Change-Id: I3101393265e9827755725b1f2d94a93d8709e923
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* cluster/dht: Request linkto xattrs in dht_rmdir opendir | N Balachandran | 2019-04-10 | 1 | -1/+26
    If parallel-readdir is enabled, the rda xlator is loaded below dht in
    the graph and proactively lists and caches entries when an opendir is
    performed. dht_rmdir checks if the directory being deleted contains
    stale linkto files by performing a readdirp on its child subvols.

    However, as the entries are actually read in during the opendir
    operation, which does not request the linkto xattr, no linkto xattrs
    are present for the entries. This causes dht to incorrectly identify
    them as data files and fail the rmdir operation with ENOTEMPTY.

    DHT now always adds the linkto xattr to the list of xattrs requested
    in the opendir.

    Change-Id: I0711198e66c59146282eb8b88084170bedfb4018
    fixes: bz#1695399
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
    (cherry picked from commit 110006bbcd5bb3e814b4cfe7d74cb41891ac3b0c)
* doc: Added release 5.6 notes [v5.6] | ShyamsundarR | 2019-04-09 | 1 | -0/+34
    Fixes: bz#1693300
    Change-Id: I4deaa0ecdd1692fb11f2d90ecc30a2370a659c2f
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* glusterd: fix txn-id mem leak | Atin Mukherjee | 2019-04-09 | 2 | -6/+36
    This commit ensures the following:

    1. Don't send the commit op request to the remote nodes when
       "gluster v status all" is executed. For the status all transaction,
       the local commit gets the names of the volumes and the remote
       commit ops are technically a no-op, so there is no need for
       additional rpc requests.

    2. In the op state machine flow, if the transaction is in staged
       state and op_info.skip_locking is true, there is no need to set
       the txn id in the priv->glusterd_txn_opinfo dictionary, which
       never gets freed.

    Fixes: bz#1694612
    Change-Id: Ib6a9300ea29633f501abac2ba53fb72ff648c822
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    (cherry picked from commit 34e010d64905b7387de57840d3fb16a326853c9b)
* cluster-syncop: avoid duplicate unlock of inodelk/entrylk | Kinglong Mee | 2019-04-08 | 1 | -0/+6
    When using ec, there are many messages in the brick log such as:

    [inodelk.c:514:__inode_unlock_lock] 0-test-locks: Matching lock not found for unlock 0-9223372036854775807, lo=68e040a84b7f0000 on 0x7f208c006f78
    [MSGID: 115053] [server-rpc-fops_v2.c:280:server4_inodelk_cbk] 0-test-server: 2557439: INODELK <gfid:df4e41be-723f-4289-b7af-b4272b3e880c> (df4e41be-723f-4289-b7af-b4272b3e880c), client: CTX_ID:67d4a7f3-605a-4965-89a5-31309d62d1fa-GRAPH_ID:0-PID:1659-HOST:openfs-node2-PC_NAME:test-client-1-RECON_NO:-28, error-xlator: test-locks [Invalid argument]

    > Change-Id: Ib164d29ebb071f620a4ca9679c4345ef7c88512a
    > Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>

    (cherry-pick of https://review.gluster.org/#/c/glusterfs/+/22377/)

    Change-Id: I8345ad6c8e1bbb676917eb47e1c5ed72c162f6ce
    Updates: bz#1690952
    Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* glusterfsd: Multiple shd processes are spawned on brick_mux environment | Mohit Agrawal | 2019-04-08 | 2 | -7/+18
    Problem: Multiple shd processes are spawned while starting volumes
             in a loop on a brick_mux environment. glusterd spawns a
             process based on a pidfile, and the shd daemon takes some
             time to update its pid in the pidfile, so glusterd is not
             able to get the shd pid.

    Solution: Commit cd249f4cb783f8d79e79468c455732669e835a4f changed the
              code to update the pidfile in the parent for any gluster
              daemon after getting the status of the forked child in the
              parent. To resolve this, correct the condition so that the
              pidfile is updated in the parent only for glusterd; for
              the rest of the daemons the pidfile is updated in the
              child.

    > Change-Id: Ifd14797fa949562594a285ec82d58384ad717e81
    > fixes: bz#1684404
    > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
    > (cherry picked from commit 66986594a9023c49e61b32769b7e6b260b600626)

    Change-Id: Ie0aa2aebd2b92e114a49777a169b600f3a7163f9
    fixes: bz#1696147
    Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* logging: Fix GF_LOG_OCCASSIONALLY API | Atin Mukherjee | 2019-04-08 | 1 | -1/+1
    GF_LOG_OCCASSIONALLY doesn't log on the first instance, but rather at
    every 42nd iteration. This isn't effective, as in some cases the code
    flow might not hit the same log as many as 42 times, and we'd end up
    suppressing the log.

    Fixes: bz#1695391
    Change-Id: Iee293281d25a652b64df111d59b13de4efce06fa
    Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
    (cherry picked from commit d0d3e10d44366c68fc153e48b229e72a4aa26e61)
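The rate-limiting behaviour described in the logging commit above can be sketched in Python. This is only an illustration of "log the first occurrence, then every 42nd", not the actual C macro: the class `OccasionalLog` and the helper `logs_only_on_interval` are hypothetical names, and the pre-fix behaviour is modelled solely from the commit message.

```python
class OccasionalLog:
    """Hypothetical rate limiter: emit the first message, then every
    `interval`-th message after it."""

    def __init__(self, interval=42):
        self.interval = interval
        self.count = 0  # number of calls seen so far

    def should_log(self):
        # True on calls 1, interval + 1, 2 * interval + 1, ...
        emit = (self.count % self.interval == 0)
        self.count += 1
        return emit


def logs_only_on_interval(call_number, interval=42):
    """Behaviour the commit message describes as broken (assumed model):
    nothing is emitted until the interval-th call (call_number is 1-based)."""
    return call_number % interval == 0
```

With the first variant, a code path that is hit only a handful of times still produces one log entry; with the second, it produces none until the 42nd hit.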
* client-rpc: Fix the payload being sent on the wire | Poornima G | 2019-04-08 | 6 | -240/+304
    The fops allocate 3 kinds of payload (buffer) in the client xlator:
    - fop payload: the buffer allocated by the write and put fops
    - rsphdr payload: the buffer required by the reply cbk of some fops
      like lookup and readdir
    - rsp_payload: the buffer required by the reply cbk of fops like
      readv etc.

    Currently, in the lookup and readdir fops the rsphdr is sent as
    payload, hence the allocated rsphdr buffer is also sent on the wire,
    increasing the bandwidth consumption on the wire.

    With this patch, the issue is fixed.

    Fixes: bz#1673058
    Change-Id: Ie8158921f4db319e60ad5f52d851fa5c9d4a269b
    Signed-off-by: Poornima G <pgurusid@redhat.com>
* cluster/dht: Fix lookup selfheal and rmdir race | N Balachandran | 2019-04-08 | 1 | -9/+25
    A race between the lookup selfheal and rmdir can cause directories to
    be healed only on non-hashed subvols. This can prevent the directory
    from being listed from the mount point and in turn causes rm -rf to
    fail with ENOTEMPTY.

    Fix: Update the layout information correctly and reduce the call
    count only after processing the response.

    Change-Id: I812779aaf3d7bcf24aab1cb158cb6ed50d212451
    fixes: bz#1695403
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
    (cherry picked from commit b0f1d782fc45313fce4e1c0e74127401d5342d05)
* gfapi: Unblock epoll thread for upcall processing | Soumya Koduri | 2019-04-01 | 1 | -8/+34
    With commit#ad35193, we made changes to offload processing of upcall
    notifications to synctask so as not to block epoll threads. However,
    it seems the issue wasn't fully addressed.

    In "glfs_cbk_upcall_data" -> "synctask_new1", after creating the
    synctask, if there is no callback defined, the thread waits on
    synctask_join till the syncfn is finished. So even with those
    changes, epoll threads are blocked till the upcalls are processed.

    Hence the right fix now is to define a callback function for that
    synctask "glfs_cbk_upcall_syncop" so as to unblock epoll/notify
    threads completely; the upcall processing can then happen in parallel
    in synctask threads.

    Change-Id: I4d8645e3588fab2c3ca534e0112773aaab68a5dd
    fixes: bz#1694562
    Signed-off-by: Soumya Koduri <skoduri@redhat.com>
    (cherry picked from commit 4a03a71c6171f6e8382664d9d29857d06ef37741)
* doc: Added release notes for 5.5 [v5.5] | ShyamsundarR | 2019-03-15 | 1 | -0/+42
    Fixes: bz#1689214
    Change-Id: I57a1afa2649828d0399fab2bf163a05cf35358db
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* eventsapi: Fix error while handling GlusterCmdException | Aravinda VK | 2019-03-15 | 1 | -2/+6
    `GlusterCmdException` was wrongly accessed instead of accessing
    `GlusterCmdException.message`.

    Fixes: bz#1687249
    Change-Id: I35ec1b05726050bfd8761e05ad9b9e47917dc0c6
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    (cherry picked from commit 27f6375df009c8c4798b72aeafce79456007d21f)
* cluster/afr: Send truncate on arbiter brick from SHD | karthik-us | 2019-03-12 | 3 | -16/+52
    Problem:
    In an arbiter volume configuration, SHD will not send any writes to
    the arbiter brick even if there is a data pending marker for the
    arbiter brick. If we have an arbiter setup on the geo-rep master and
    there are data pending markers for the files on the arbiter brick,
    SHD will not mark any data changelog during healing. While syncing
    the data from master to slave, if the arbiter brick is considered
    ACTIVE, there is a chance that the slave will miss out some data. If
    the arbiter brick is being newly added or replaced, there is a chance
    of the slave missing all the data during sync.

    Fix:
    If there is a data pending marker for the arbiter brick, send
    truncate on the arbiter brick during heal, so that it will record
    truncate as the data transaction in the changelog.

    Change-Id: I3242ba6cea6da495c418ef860d9c3359c5459dec
    fixes: bz#1687687
    Signed-off-by: karthik-us <ksubrahm@redhat.com>
* core: make compute_cksum function op_version compatible | Sanju Rakonde | 2019-03-11 | 4 | -12/+23
    Problem: commit 5a152a changed the mechanism of computing the
    checksum. In a heterogeneous cluster, peers are running into rejected
    state because we have different cksum computation mechanisms in
    upgraded and non-upgraded nodes.

    Solution: add a check for op-version so that all the nodes in the
    cluster follow the same mechanism for computing the cksum.

    fixes: bz#1684569

    > Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193
    > BUG: bz#1685120
    > Signed-off-by: Sanju Rakonde <srakonde@redhat.com>

    Change-Id: I1508f000e8c9895588b6011b8b6cc0eda7102193
    Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* dict: handle STR_OLD data type in xdr conversions | Amar Tumballi | 2019-03-07 | 2 | -0/+3
    Currently a dict conversion on the wire for the 3.x protocol happens
    using `dict_unserialize()`, which sets the type of data as STR_OLD.
    But the new protocol doesn't send it over the wire, as it's not
    considered a valid format in new processes.

    But considering we deal with both old and new protocols when we do a
    rolling upgrade, handling STR_OLD will allow us to get all the
    information properly with the new protocol.

    Credits: Krutika Dhananjay

    Fixes: bz#1684385
    Change-Id: I165c0021fb195b399790b9cf14a7416ae75ec84f
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* doc: Added release notes for release 5.4 [v5.4] | ShyamsundarR | 2019-02-26 | 1 | -0/+33
    Change-Id: I6aaa93edc4ae7d018c83c5d23d536075c013d0b5
    fixes: bz#1667103
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* eventsapi: Fix Python3 compatibility issues | Aravinda VK | 2019-02-26 | 7 | -43/+52
    - Fixed relative import and non-package import related issues.
    - Fixed socketserver import issues.
    - Renamed installed directory name to `gfevents` from `events` (to
      avoid any issues with other global libs).

    Fixes: bz#1649054
    Change-Id: I3dc38bc92b23387a6dfbcc0ab8283178235bf756
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    (cherry picked from commit cd68f7b88b9a2c9a4e4ff9fca61517384e54130a)
* packaging: Obsoleting glusterfs-gnfs for upgrade | Kaleb S. KEITHLEY | 2019-02-25 | 1 | -0/+6
    Master refs
    > https://review.gluster.org/#/c/glusterfs/+/22253/
    > fixes: bz#1672711
    > Change-Id: Iad7194e788a8eeecd617614e9f8a1fe3264a384d
    > Signed-off-by: Sahina Bose <sabose@redhat.com>
    > Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>

    Fixes: bz#1679968
    Change-Id: Ib66d15953abb2645238f01c8ee9df54d2b35736a
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
    Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
* socket: socket event handlers now return void | Milind Changire | 2019-02-25 | 6 | -24/+22
    Problem:
    Returning any value from socket event handlers to the event
    sub-system doesn't make sense, since the event sub-system cannot
    handle socket sub-system errors.

    Solution:
    Change the return type of all socket event handlers to 'void'.

    mainline:
    > Reviewed-on: https://review.gluster.org/c/glusterfs/+/22221

    Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
    Fixes: bz#1651246
    Signed-off-by: Milind Changire <mchangir@redhat.com>
    (cherry picked from commit 776ba851c6ee6c265253d44cf1d6e4e3d4a21772)
* fuse: reflect the actual default for lru-limit option | Amar Tumballi | 2019-02-25 | 2 | -2/+2
    in both `--help` text and man page

    updates: bz#1667103
    Change-Id: I9aa9367c6863ac8e2403255280697c9e6be26cf0
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* performance/write-behind: fix use-after-free in readdirp | Raghavendra Gowdappa | 2019-02-22 | 1 | -18/+22
    Two issues were found:

    1. In wb_readdirp_cbk, the inode should be unrefed after wb_inode is
       unlocked. Otherwise, the inode, and hence the context wb_inode,
       can be freed by the time we try to unlock wb_inode.

    2. wb_readdirp_mark_end iterates over a list of wb_inodes of children
       of a directory. But inodes could've been freed and hence the list
       might be corrupted. To fix this, take a reference on the inode
       before adding it to the invalidate_list of the parent.

    Change-Id: I911b0e0b2060f7f41ded0b05db11af6f9b7c09c5
    Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
    Updates: bz#1671556
* performance/write-behind: handle call-stub leaks | Raghavendra Gowdappa | 2019-02-22 | 1 | -0/+8
    Change-Id: I7be9a5f48dcad1b136c479c58b1dca1e0488166d
    Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
    Fixes: bz#1671556
    (cherry picked from commit 6175cb10cd5f59f3c7ae4100bc78f359b68ca3e9)
* md-cache: Adapt integer data types to avoid integer overflow | David Spisla | 2019-02-20 | 1 | -3/+3
    The "struct iatt" in iatt.h uses int64_t types for storing the atime,
    mtime and ctime. Therefore the struct 'struct md_cache' in md-cache.c
    should also use these types to avoid an integer overflow.

    This can happen e.g. if someone uses a very high
    default-retention-period in the WORM-Xlator.

    Change-Id: I605268d300ab622b9c8ab30e459dc00d9340aad1
    fixes: bz#1678726
    Signed-off-by: David Spisla <david.spisla@iternity.com>
    (cherry picked from commit 15423e14f16dd1a15ee5e5cbbdbdd370e57ed59f)
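The overflow the md-cache commit above guards against can be shown with a small Python sketch. The commit message does not say which narrower type the cache used before the change, so the 32-bit signed field below is purely an assumption for illustration; `store_seconds` and `far_future` are hypothetical names.

```python
import ctypes

INT32_MAX = 2**31 - 1


def store_seconds(seconds, field_type):
    """Store a timestamp in a fixed-width C integer field and read it back.
    ctypes integer types do not range-check, so oversized values wrap."""
    return field_type(seconds).value


# A timestamp one hour past what a signed 32-bit field can represent,
# e.g. an atime pushed far into the future by a long retention period.
far_future = INT32_MAX + 3600

as_int64 = store_seconds(far_future, ctypes.c_int64)  # preserved
as_int32 = store_seconds(far_future, ctypes.c_int32)  # wraps to a negative value
```

Keeping the cached fields the same width as the `int64_t` fields they are copied from avoids this kind of silent truncation.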
* cluster/thin-arbiter: Consider thin-arbiter before marking new entry changelog | Ashish Pandey | 2019-02-18 | 5 | -25/+103
    If a fop to create an entry fails on one of the data bricks, we mark
    the pending changelog on the entry on the brick for which it was
    successful. This is done as part of the post op phase to make sure
    that the entry gets healed even if it gets renamed to some other path
    where its parent was not marked as bad.

    As it happens as part of post op, we should consider thin-arbiter to
    check if the brick which was successful is the good brick or not.
    This will avoid split brain and other issues.

    >Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
    >Signed-off-by: Ashish Pandey <aspandey@redhat.com>

    Change-Id: I12686675be98f02f70a5186b3ed748c541514d53
    updates: bz#1672314
    Signed-off-by: Ashish Pandey <aspandey@redhat.com>
* Bump up timeout for tests on AWS | Nigel Babu | 2019-02-07 | 3 | -3/+4
    Fixes: bz#1673268
    Change-Id: I2b9be45f199f6436b858536c6f49be85902217f0
    Signed-off-by: Nigel Babu <nigelb@redhat.com>
* libglusterfs/common-utils.c: Fix buffer size for checksum computation | Varsha Rao | 2019-02-04 | 2 | -2/+37
    Problem: When the quorum count option is updated, the change is not
    reflected in the nfs-server.vol file. This is because in
    get_checksum_for_file(), when the last part of the file read has a
    size less than the buffer size, the read buffer stores old data along
    with the correct data.

    Solution: Pass the bytes read instead of the fixed buffer size for
    calculating the checksum.

    Change-Id: I4b641607c8a262961b3f3da0028a54e08c3f8589
    fixes: bz#1672248
    Signed-off-by: Varsha Rao <varao@redhat.com>
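The stale-buffer problem in the checksum commit above can be reproduced with a short Python sketch. It is not the C code from common-utils.c: the additive `toy_checksum`, the tiny `BUF_SIZE`, and the `use_bytes_read` switch are all assumptions chosen to make the effect visible.

```python
import io

BUF_SIZE = 8  # deliberately tiny so a short final read is easy to trigger


def toy_checksum(chunk):
    """Stand-in additive checksum; the real algorithm is not shown here."""
    return sum(chunk) & 0xFFFFFFFF


def file_checksum(stream, use_bytes_read):
    """Checksum a stream through one reused fixed-size buffer.

    use_bytes_read=True  hashes only the bytes actually read (the fix).
    use_bytes_read=False hashes the whole buffer every time, so a short
    final read also hashes leftovers from the previous read (the bug).
    """
    buf = bytearray(BUF_SIZE)
    total = 0
    while True:
        n = stream.readinto(buf)  # overwrites only the first n bytes
        if not n:
            break
        length = n if use_bytes_read else BUF_SIZE
        total = (total + toy_checksum(buf[:length])) & 0xFFFFFFFF
    return total


data = b"ABCDEFGHIJ"  # 10 bytes: one full read of 8, then a short read of 2
correct = file_checksum(io.BytesIO(data), use_bytes_read=True)
stale = file_checksum(io.BytesIO(data), use_bytes_read=False)
```

Here `correct` equals the checksum of the ten input bytes, while `stale` also counts six leftover bytes from the first read, so two files with the same tail but different earlier content can produce misleading results.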
* features/shard: Ref shard inode while adding to fsync list | Krutika Dhananjay | 2019-02-04 | 2 | -8/+51
    PROBLEM:

    Many of the earlier changes in the management of shards in the lru
    and fsync lists assumed that if a given shard exists in the fsync
    list, it must be part of the lru list as well. This was found to be
    not true.

    Consider this: a file is FALLOCATE'd to a size which would make the
    number of participant shards greater than the lru list size. In this
    case, some of the resolved shards that are to participate in this fop
    will be evicted from the lru list to give way to the rest of the
    shards. And once FALLOCATE completes, these shards are added to the
    fsync list but without a ref. After the fop completes, these shard
    inodes are unref'd and destroyed while their inode ctxs are still
    part of the fsync list. Now when an FSYNC is called on the base file
    and the fsync list is traversed, the client crashes due to illegal
    memory access.

    FIX:

    Hold a ref on the shard inode when adding to the fsync list as well.
    And unref under the following conditions:
    1. when the shard is evicted from the lru list
    2. when the base file is fsync'd
    3. when the shards are deleted.

    Change-Id: Iab460667d091b8388322f59b6cb27ce69299b1b2
    fixes: bz#1669382
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    (cherry picked from commit 72922c1fd69191b220f79905a23395c3a87f86ce)
* readdir-ahead: do not zero-out iatt in fop cbk | Ravishankar N | 2019-02-04 | 2 | -20/+27
    ...when ctime is zero. ia_type and ia_gfid always need to be non-zero
    for things to work correctly.

    Problem:
    Commit c9bde3021202f1d5c5a2d19ac05a510fc1f788ac zeroed out the iatt
    buffer in the cbks of modification fops before unwinding if the ctime
    in the buffer was zero. This was causing the fops to fail: noticeable
    when AFR's 'consistent-metadata' option was enabled. (AFR zeros out
    the ctime when the option is set. See commit
    4c4624c9bad2edf27128cb122c64f15d7d63bbc8).

    Fixes:
    - Do not zero out the ia_type and ia_gfid of the iatt buff under any
      circumstance.
    - Also, fixed _rda_inode_ctx_update_iatts() to always update these
      values from the incoming buf when ctime is zero. Otherwise we end
      up with zero ia_type and ia_gfid the first time the function is
      called *and* the incoming buf has ctime set to zero.

    fixes: bz#1665145
    Reported-By: Michael Hanselmann <public@hansmi.ch>
    Change-Id: Ib72228892d42c3513c19fc6dfb543f2aa3489eca
    Signed-off-by: Ravishankar N <ravishankar@redhat.com>
    (cherry picked from commit 09db11b0c020bc79d493c6d7e7ea4f3beb000c68)
* cluster/dht: Delete invalid linkto files in rmdir | N Balachandran | 2019-02-04 | 2 | -1/+67
    rm -rf <dir> fails on dirs which contain linkto files that point to
    themselves because dht incorrectly thought that they were cached
    files after looking them up. The fix now treats them as invalid
    linkto files and deletes them.

    Change-Id: I376c72a5309714ee339c74485e02cfb4e29be643
    fixes: bz#1671611
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* socket: fix issue when socket write return with EAGAIN | Zhang Huan | 2019-02-04 | 1 | -0/+2
    When a socket write returns with EAGAIN, the remaining vector count
    is returned all the way back to the event handler, causing the
    follow-up pollin event to skip handling and the dispatch loop to
    complain about failure, even though a temporary write failure is not
    an error.

    [2018-12-29 07:31:41.772310] E [MSGID: 101191] [event-epoll.c:674:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

    Change-Id: Idf03d120b5f7619eda19720a583cbcc3e7da2504
    updates: bz#1651246
    Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
    (cherry picked from commit 0301a66bda44582e3a48519f2a5d365b0c38090d)
* socket: don't pass return value from protocol handler to event handler | Zhang Huan | 2019-02-04 | 1 | -2/+2
    The event handler handles socket level errors only, while the
    protocol handler handles protocol level errors. If the protocol
    handler decides to disconnect on error in any case, it should call
    disconnect instead of returning an error back to the event handler.

    Change-Id: I9375be98cc52cb969085333f3c7229a91207d1bd
    updates: bz#1651246
    Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
    (cherry picked from commit cd5714554627fe90ee2c77685cb410a8fb25eceb)
* core: move "dict is NULL" logs to DEBUG log level | Milind Changire | 2019-02-04 | 1 | -2/+2
    Too many logs get printed if dict_ref() and dict_unref() are passed
    a NULL pointer.

    fixes: bz#1671217
    Change-Id: I18afd849d64318f68baa7b549ee310dac0e1e786
    Signed-off-by: Milind Changire <mchangir@redhat.com>
* api: bad GFAPI_4.1.6 block | Kaleb S. KEITHLEY | 2019-01-29 | 1 | -2/+3
    missing global: line, tabs not spaces

    Change-Id: Icdbc23b4e4cd608da1d764e81757201c4b1269a6
    fixes: bz#1670307
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
* doc: Release notes for 5.3 [v5.3] | ShyamsundarR | 2019-01-16 | 1 | -0/+30
    Fixes: bz#1659085
    Change-Id: I5195e4eca6518e3122ea188e2f4891f5e68ca01a
    Signed-off-by: ShyamsundarR <srangana@redhat.com>
* fuse: add --lru-limit option | Amar Tumballi | 2019-01-16 | 11 | -87/+396
    The inode LRU mechanism is moot in the fuse xlator (i.e. there is no
    limit for the LRU list), as fuse inodes are referenced from kernel
    context, and thus they can only be dropped on request of the kernel.
    This might result in a high number of passive inodes which are
    useless for the glusterfs client, causing a significant memory
    overhead.

    This change tries to remedy this by extending the LRU semantics and
    allowing a finite limit to be set on the fuse inode LRU.

    A brief history of the problem:

    When gluster's inode table was designed, fuse didn't have any
    'invalidate' method, which means a userspace application could never
    ask the kernel to send a 'forget()' fop; instead it had to wait for
    the kernel to send it based on the kernel's parameters. The inode
    table remembers the number of times the kernel has cached the inode
    based on the 'nlookup' parameter. The 'nlookup' field is not used by
    any other entry points (like server-protocol, gfapi etc).

    Hence the inode_table of the fuse module always has to have lru-limit
    as '0', which means no limit. GlusterFS always had to keep all inodes
    in memory, as the kernel would have had a reference to them. Again,
    the reason for this is that the kernel's glusterfs inode reference
    was a pointer to the 'inode_t' structure in glusterfs. As it is a
    pointer, we could never free it (to prevent a segfault or memory
    corruption).

    Solution:

    In the inode table, handle the prune case of inodes with 'nlookup'
    differently, and call an 'invalidator' method, which in this case is
    fuse_invalidate(); it sends the request to the kernel for getting the
    forget request.

    When the kernel sends the forget, it means it has dropped all
    references to the inode, and it will send the forget with the
    'nlookup' parameter too. We just need to make sure to reduce the
    'nlookup' value we have when we get the forget. That automatically
    causes the relevant prune to happen.

    Credits: Csaba Henk, Xavier Hernandez, Raghavendra Gowdappa, Nithya B

    fixes: bz#1623107
    Change-Id: Ifee0737b23b12b1426c224ec5b8f591f487d83a2
    Signed-off-by: Amar Tumballi <amarts@redhat.com>
* features/shard: Fix launch of multiple synctasks for background deletion | Krutika Dhananjay | 2019-01-15 | 2 | -71/+128
    PROBLEM:

    When multiple sharded files are deleted in quick succession, multiple
    issues were observed:

    1. Misleading logs corresponding to a sharded file: one log message
       said the shards corresponding to the file were deleted
       successfully, and this was followed by multiple logs suggesting
       the very same operation failed. This was because of multiple
       synctasks attempting to clean up shards of the same file and only
       one of them succeeding (the one that gets ENTRYLK successfully),
       with the rest of them logging failure.

    2. Multiple synctasks to do background deletion would be launched,
       one for each deleted file, but all of them could readdir entries
       from .remove_me at the same time and could potentially contend
       for ENTRYLK on .shard for each of the entry names. This is
       undesirable and wasteful.

    FIX:

    Background deletion will now follow a state machine. In the event
    that there are multiple attempts to launch a synctask for background
    deletion, one for each file deleted, only the first task is launched.
    And if, while this task is doing the cleanup, more attempts are made
    to delete other files, the state of the synctask is adjusted so that
    it restarts the crawl even after reaching end-of-directory to pick up
    any files it may have missed in the previous iteration.

    This patch also fixes an uninitialized lk-owner during
    syncop_entrylk(), which was leading to multiple background deletion
    synctasks entering the critical section at the same time and to
    illegal memory access of the base inode in the second synctask after
    it was destroyed post shard deletion by the first synctask.

    Change-Id: Ib33773d27fb4be463c7a8a5a6a4b63689705324e
    updates: bz#1665803
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    (cherry picked from commit c0c2022e7d7097e96270a74f37813eda0c4e6339)
* features/shard: Assign fop id during background deletion to prevent excessive logging | Krutika Dhananjay | 2019-01-14 | 1 | -0/+1
    ... of the kind

    "[2018-12-26 05:22:44.195019] E [MSGID: 133010] [shard.c:2253:shard_common_lookup_shards_cbk] 0-volume1-shard: Lookup on shard 785 failed. Base file gfid = cd938e64-bf06-476f-a5d4-d580a0d37416 [No such file or directory]"

    shard_common_lookup_shards_cbk() has a specific check to ignore
    ENOENT errors without logging them during specific fops. But because
    background deletion is done in a new frame (with local->fop being
    GF_FOP_NULL), the ENOENT check is skipped and the absence of shards
    gets logged every time.

    To fix this, local->fop is initialized to GF_FOP_UNLINK during
    background deletion.

    Change-Id: I0ca8d3b3bfbcd354b4a555eee520eb0479bcda35
    updates: bz#1665803
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    (cherry picked from commit aa28fe32364e39981981d18c784e7f396d56153f)
* leases: Reset lease_ctx->timer post deletion | Soumya Koduri | 2019-01-09 | 1 | -0/+1
    To avoid use_after_free, reset lease_ctx->timer back to NULL after
    the structure has been freed.

    Change-Id: Icd213ec809b8af934afdb519c335a4680a1d6cdc
    updates: bz#1651323
    Signed-off-by: Soumya Koduri <skoduri@redhat.com>
    (cherry picked from commit a9b0003c717087ff168bc143c70559162e53e0d5)
* core: Fixed typos in nl-cache and logging-guidelines.md | N Balachandran | 2019-01-09 | 2 | -3/+3
    Replaced "recieve" with "receive".

    Change-Id: I58a3d3d4a0093df4743de9fae4d8ff152d4b216c
    fixes: bz#1662200
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
    (cherry picked from commit a11c5c66321dd8411373a68cc163c981c7d083df)
* gfapi: Access fs->oldvolfile under mutex lock | Soumya Koduri | 2019-01-09 | 1 | -0/+6
    In some cases (e.g. when there are multiple RPC_CLNT_CONNECT
    notifications), multiple threads may fetch the volfile and try to
    update it in the 'fs' object simultaneously. Hence protect access to
    those variables under the fs->mutex lock.

    This is a backport of the two mainline patches below:
    - https://review.gluster.org/#/c/glusterfs/+/21882/
    - https://review.gluster.org/#/c/glusterfs/+/21927/

    Change-Id: Idaee9548560db32d83f4c04ebb1f375fee7864a9
    fixes: bz#1663131
    Signed-off-by: Soumya Koduri <skoduri@redhat.com>
    (cherry picked from commit 8fe3c6107a2b431d7cc0b8cfaeeb7941cf9590f9)
* io-cache: xdata needs to be passed for readv operations | Soumya Koduri | 2018-12-30 | 2 | -2/+16
    The io-cache xlator has been skipping xdata references when the data
    needs to be read into the page cache. This patch fixes that.

    Note: similar changes may be needed for other fops as well which are
    handled by io-cache.

    Change-Id: I28d73d4ba471d13eb55d0fd0b5197d222df77a2a
    updates: bz#1651323
    Signed-off-by: Soumya Koduri <skoduri@redhat.com>
    (cherry picked from commit b3d88a0904131f6851f4185e43f815ecc3353ab5)
* geo-rep: Fix syncing of files with non-ascii filenames | Kotresh HR | 2018-12-26 | 5 | -69/+194
    Problem:
    Creation of files/directories with non-ascii names fails to sync to the slave. It crashes with the traceback below on the slave.

    ...
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 118, in worker
      res = getattr(self.obj, rmeth)(*in_data[2:])
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 709, in entry_ops
      [ESTALE, EINVAL, EBUSY])
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 546, in errno_wrap
      return call(*arg)
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 83, in lsetxattr
      cls.raise_oserr()
    File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/libcxattr.py", line 38, in raise_oserr
      raise OSError(errn, os.strerror(errn))
    OSError: [Errno 12] Cannot allocate memory

    Cause:
    The length arguments passed to blob creation were calculated before encoding, so the call failed in the gfid-access layer.

    Fix:
    Calculating the length properly appears to fix this issue, but it causes issues in other places in 'python2' and not in 'python3'. Encoding and decoding each required string to make geo-rep compatible with both 'python2' and 'python3' is a nightmare and is not foolproof. Hence the 'python2' code is kept as is, without encode/decode, and encode/decode is applied only to 'python3'.

    Added non-ascii filename tests to regression.

    Backport of:
    > Patch: https://review.gluster.org/21668
    > BUG: 1650893
    > Change-Id: I35cfaf848e07b1a0b5cb93c01b98b472f08271a6
    > Signed-off-by: Kotresh HR <khiremat@redhat.com>

    fixes: bz#1648642
    Change-Id: I35cfaf848e07b1a0b5cb93c01b98b472f08271a6
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
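The underlying mismatch, a length counted in characters versus the length of the encoded bytes, can be illustrated in C even though the geo-rep code itself is Python. This is only an analogous sketch: `demo_utf8_codepoints()` and `demo_blob_name_size()` are hypothetical helpers, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a UTF-8 string (assumes well-formed input). */
static size_t demo_utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Continuation bytes look like 10xxxxxx; count every other byte. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Bytes needed to store the encoded name plus its terminating NUL. */
static size_t demo_blob_name_size(const char *encoded_name)
{
    assert(encoded_name != NULL);
    return strlen(encoded_name) + 1;
}
```

For a name such as "résumé" encoded as UTF-8, the character count is 6 while the encoded form occupies 8 bytes, so a buffer or length field sized from the character count comes up short. Sizing from the encoded bytes avoids that.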
* cluster/dht: sync brick root perms on add brick | N Balachandran | 2018-12-26 | 2 | -22/+14
    If a single brick is added to the volume and the newly added brick is the first to respond to a dht_revalidate call, its stbuf will not be merged into local->stbuf as the brick does not yet have a layout. The is_permission_different check therefore fails to detect that an attr heal is required as it only considers the stbuf values from existing bricks.

    To fix this, merge all stbuf values into local->stbuf and use local->prebuf to store the correct directory attributes.

    Change-Id: Ic9e8b04a1ab9ed1248b6b056e3450bbafe32e1bc
    fixes: bz#1660736
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* performance/rda: Fixed dict_t memory leak | N Balachandran | 2018-12-26 | 1 | -8/+0
    Removed all references to the dict_t xdata_from_req, which is allocated but not used anywhere. It is also never cleaned up and hence causes a memory leak.

    fixes: bz#1659676
    Change-Id: I2edb857696191e872ad12a12efc36999626bacc7
    Signed-off-by: N Balachandran <nbalacha@redhat.com>
* shard: prevent segfault in shard_unlink_block_inode() | Sunny Kumar | 2018-12-20 | 1 | -1/+2
    gluster-blockd sometimes segfaults with the following backtrace:

    Core was generated by `/usr/sbin/gluster-blockd --glfs-lru-count 5 --log-level INFO'.
    Program terminated with signal 11, Segmentation fault.
    #0 0x00007fbb9cd639b9 in shard_unlink_block_inode (local=local@entry=0x7fbb80000a78, shard_block_num=<optimized out>) at shard.c:2929
    2929 base_ictx->fsync_count--;
    (gdb) bt
    #0 0x00007fbb9cd639b9 in shard_unlink_block_inode (local=local@entry=0x7fbb80000a78, shard_block_num=<optimized out>) at shard.c:2929
    #1 0x00007fbb9cd64311 in shard_unlink_shards_do_cbk (frame=frame@entry=0x7fbb9010a768, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, preparent=preparent@entry=0x7fbb7470dcf8, postparent=postparent@entry=0x7fbb7470dd90, xdata=xdata@entry=0x0) at shard.c:2987

    A fix for this has already been provided through a Coverity report.

    Backport of:
    > Change-Id: Ic5d302a5e32d375acf8adc412763ab94e6dabc3d
    > Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
    > (cherry picked from commit 145e180517054626d07892219fdee689b703c218)

    Change-Id: I699a039e9c5115eb3376190dd8014427d12a293b
    Updates: bz#1659563
    Signed-off-by: Niels de Vos <ndevos@redhat.com>