summaryrefslogtreecommitdiffstats
path: root/xlators
Commit message (Collapse)AuthorAgeFilesLines
* performance/md-cache: Do not skip caching of null character xattr valuesAnoop C S2019-08-282-20/+23
| | | | | | | | | | | | | | | | | | | | | | Null character string is a valid xattr value in file system. But for those xattrs processed by md-cache, it does not update its entries if value is null('\0'). This results in ENODATA when those xattrs are queried afterwards via getxattr() causing failures in basic operations like create, copy etc in a specially configured Samba setup for Mac OS clients. On the other side snapview-server is internally setting empty string("") as value for xattrs received as part of listxattr() and are not intended to be cached. Therefore we try to maintain that behaviour using an additional dictionary key to prevent updation of entries in getxattr() and fgetxattr() callbacks in md-cache. Credits: Poornima G <pgurusid@redhat.com> Change-Id: I7859cbad0a06ca6d788420c2a495e658699c6ff7 Fixes: bz#1743782 Signed-off-by: Anoop C S <anoopcs@redhat.com> (cherry picked from commit b4b683736367d93daad08a5ee6ca95778c07c5a4)
* afr: restore timestamp of parent dir during entry-healRavishankar N2019-08-261-0/+2
| | | | | | | Fixes: bz#1741044 Change-Id: I29e338bac62104233a6f80212df8d0fb016affda Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 8e9c53ebf16705b9a1db2fc486dc24a5cb244ddd)
* protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brickRaghavendra G2019-08-091-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | Two reasons: * ping responses from glusterd may not be relevant for Halo replication. Instead, it might be interested in only knowing whether the brick itself is responsive. * When a brick is killed, propagating GF_EVENT_CHILD_PING of ping response from glusterd results in GF_EVENT_DISCONNECT spuriously propagated to parent xlators. These DISCONNECT events are from the connections client establishes with glusterd as part of its reconnect logic. Without GF_EVENT_CHILD_PING, the last event propagated to parent xlators would be the first DISCONNECT event from brick and hence subsequent DISCONNECTS to glusterd are not propagated as protocol/client prevents same event being propagated to parent xlators consecutively. propagating GF_EVENT_CHILD_PING for ping responses from glusterd would change the last_sent_event to GF_EVENT_CHILD_PING and hence protocol/client cannot prevent subsequent DISCONNECT events Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Fixes: bz#1739335 Change-Id: I50276680c52f05ca9e12149a3094923622d6eaef (cherry picked from commit 5d66eafec581fb3209af74595784be8854ca40a4)
* features/utime: always update ctime at setattrKinglong Mee2019-08-072-13/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | For the nfs EXCLUSIVE mode create may sets a later time to mtime (at verifier), it should not set to ctime for storage.ctime does not allowed set ctime to a earlier time. /* Earlier, mdata was updated only if the existing time is less * than the time to be updated. This would fail the scenarios * where mtime can be set to any time using the syscall. Hence * just updating without comparison. But the ctime is not * allowed to changed to older date. */ According to kernel's setattr, always set ctime at setattr, and doesnot set ctime from mtime at storage.ctime. Backport of: > Patch: https://review.gluster.org/23154 > Change-Id: I5cfde6cb7f8939da9617506e3dc80bd840e0d749 > BUG: 1737288 > Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Change-Id: I5cfde6cb7f8939da9617506e3dc80bd840e0d749 fixes: bz#1737746 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* posix/ctime: Fix race during lookup ctime xattr healKotresh HR2019-08-071-18/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Ctime heals the ctime xattr ("trusted.glusterfs.mdata") in lookup if it's not present. In a multi client scenario, there is a race which results in updating the ctime xattr to older value. e.g. Let c1 and c2 be two clients and file1 be the file which doesn't have the ctime xattr. Let the ctime of file1 be t1. (from backend, ctime heals time attributes from backend when not present). Now following operations are done on mount c1 -> ls -l /mnt/file1 | c2 -> ls -l /mnt/file1;echo "append" >> /mnt/file1; The race is that the both c1 and c2 didn't fetch the ctime xattr in lookup, so both of them tries to heal ctime to time 't1'. If c2 wins the race and appends the file before c1 heals it, it sets the time to 't1' and updates it to 't2' (because of append). Now c1 proceeds to heal and sets it to 't1' which is incorrect. Solution: Compare the times during heal and only update the larger time. This is the general approach used in ctime feature but got missed with healing legacy files. Backport of: > Patch: https://review.gluster.org/23131 > BUG: 1734299 > Change-Id: I930bda192c64c3d49d0aed431ce23d3bc57e51b7 > Signed-off-by: Kotresh HR <khiremat@redhat.com> fixes: bz#1737745 Change-Id: I930bda192c64c3d49d0aed431ce23d3bc57e51b7 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* features/utime: Fix mem_put crashPranith Kumar K2019-08-061-1/+3
| | | | | | | | | | | | | | | | | | | | Problem: When frame->local is not null FRAME_DESTROY calls mem_put on it. Since the stub is already destroyed in call_resume(), it leads to crash Fix: Set frame->local to NULL before calling call_resume() Backport of: > Patch: https://review.gluster.org/23091 > BUG: 1593542 > Change-Id: I0f8adf406f4cefdb89d7624ba7a9d9c2eedfb1de > Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> fixes: bz#1733885 Change-Id: I0f8adf406f4cefdb89d7624ba7a9d9c2eedfb1de Signed-off-by: Kotresh HR <khiremat@redhat.com>
* ctime: Set mdata xattr on legacy filesKotresh HR2019-08-066-56/+228
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: The files which were created before ctime enabled would not have "trusted.glusterfs.mdata"(stores time attributes) xattr. Upon fops which modifies either ctime or mtime, the xattr gets created with latest ctime, mtime and atime, which is incorrect. It should update only the corresponding time attribute and rest from backend Solution: Creating xattr with values from brick is not possible as each brick of replica set would have different times. So create the xattr upon successful lookup if the xattr is not created Note To Reviewers: The time attributes used to set xattr is got from successful lookup. Instead of sending the whole iatt over the wire via setxattr, a structure called mdata_iatt is sent. The mdata_iatt contains only time attributes. Backport of: > Patch: https://review.gluster.org/22936 > Change-Id: I5e535631ddef04195361ae0364336410a2895dd4 > BUG: 1593542 > Signed-off-by: Kotresh HR <khiremat@redhat.com> Change-Id: I5e535631ddef04195361ae0364336410a2895dd4 updates: bz#1733885 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* features/snapview-server: obtain the list of snapshots inside the lockRaghavendra Bhat2019-08-011-1/+1
| | | | | | | | | The current list of snapshots from priv->dirents is obtained outside the lock. Change-Id: I8876ec0a38308da5db058397382fbc82cc7ac177 Fixes: bz#1731509 (cherry picked from commit 8e795617fd6f5193d0d52a336059ce1a28108c0e)
* cluster/dht: Fix directory perms during selfhealN Balachandran2019-07-291-3/+5
| | | | | | | | | | | | | | Fixed a bug in the revalidate code path that wiped out directory permissions if no mds subvol was found. Backport of: > Change-Id: I8b4239ffee7001493c59d4032a2d3062586ea115 > fixes: bz#1716830 > Signed-off-by: N Balachandran <nbalacha@redhat.com> Change-Id: I8b4239ffee7001493c59d4032a2d3062586ea115 fixes: bz#1716848 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* glusterd: do not mark skip_locking as true for geo-rep operationsSanju Rakonde2019-07-241-2/+7
| | | | | | | | | | | | | | | | | | | | | | We need to send the commit req to peers in case of geo-rep operations even though it is a no volname operation. In commit phase peers try to set the txn_opinfo which will fail because it is a no volname operation where we don't require a commit phase. We mark skip_locking as true for no volname operations, but we have to give an exception to geo-rep operations, so that they can set txn_opinfo in commit phase. Please refer to detailed RCA at the bug: 1730545 fixes: bz#1730545 Cherrypicked from https://review.gluster.org/#/c/glusterfs/+/23034/ > Change-Id: I9f2478b12a281f6e052035c0563c40543493a3fc > Signed-off-by: Sanju Rakonde <srakonde@redhat.com> Change-Id: I9f2478b12a281f6e052035c0563c40543493a3fc Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
* features/snapview-server: use the same volfile server for gfapi optionsRaghavendra Bhat2019-07-162-4/+42
| | | | | | | | | | | | snapview server xlator makes use of "localhost" as the volfile server while initing the new glfs instance to talk to a snapshot. While localhost is fine, better use the same volfile server that was used to start the snapshot daemon containing the snapview-server xlator. Change-Id: I4485d39b0e3d066f481adc6958ace53ea33237f7 fixes: bz#1727984 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> (cherry picked from commit 36f6e6df0ff15d0464b869803710adca2b65e8ba)
* glusterd: add GF_TRANSPORT_BOTH_TCP_RDMA in glusterd_get_gfproxy_client_volfileAtin Mukherjee2019-07-161-0/+1
| | | | | | | | | | | | | ... with out which volume creation fails with "volume create: <xyz>: failed: Failed to create volume files" >Fixes: bz#1716812 >Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb >Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Fixes: bz#1721105 Change-Id: I2f4c2c6d5290f066b54e1c1db19e25db9937bedb Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* mgmt/glusterd: Fix a memory leak when peer detach failsVijay Bellur2019-07-161-0/+13
| | | | | | | | | | Dictionary object is not being unref'd when an error happens in __glusterd_handle_cli_deprobe(). This patch addresses that problem. Change-Id: I11e1f92d06dc9edd1260845256f435ea31ef1a87 fixes: bz#1683815 Signed-off-by: Vijay Bellur <vbellur@redhat.com> (cherry picked from commit 16b4936696c8b602243513fbde0b20a1e8417432)
* glusterd: conditionally clear txn_opinfo in stage opAtin Mukherjee2019-07-091-2/+10
| | | | | | | | | | | | | | ...otherwise this leads to a crash when volume status is run on a heterogeneous mode. > Fixes: bz#1723658 > Change-Id: I0d39f412b2e5e9d3ef0a3462b90b38bb5364b09d > Signed-off-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit a72452fcf90679b28baec12d2769cbaa982bb4e4) Fixes: bz#1728126 Change-Id: I0d39f412b2e5e9d3ef0a3462b90b38bb5364b09d Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* upcall: Avoid sending notifications for invalid inodesSoumya Koduri2019-07-051-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | For nameless LOOKUPs, server creates a new inode which shall remain invalid until the fop is successfully processed post which it is linked to the inode table. But incase if there is an already linked inode for that entry, it discards that newly created inode which results in upcall notification. This may result in client being bombarded with unnecessary upcalls affecting performance if the data set is huge. This issue can be avoided by looking up and storing the upcall context in the original linked inode (if exists), thus saving up on those extra callbacks. This is backport of below mainline fix - > fixes: bz#1718338 > patch url: https://review.gluster.org/#/c/glusterfs/+/22840/ > Signed-off-by: Soumya Koduri <skoduri@redhat.com> Change-Id: I044a1737819bb40d1a049d2f53c0566e746d2a17 fixes: bz#1720633 Signed-off-by: Soumya Koduri <skoduri@redhat.com>
* [RFC] change get_real_filename implementation to use ENOATTR instead of ENOENTMichael Adam2019-07-042-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | get_real_filename is implemented as a virtual extended attribute to help Samba implement the case-insensitive but case preserving SMB protocol more efficiently. It is implemented as a getxattr call on the parent directory with the virtual key of "get_real_filename:<entryname>" by looking for a spelling with different case for the provided file/dir name (<entryname>) and returning this correct spelling as a result if the entry is found. Originally (05aaec645a6262d431486eb5ac7cd702646cfcfb), the implementation used the ENOENT errno to return the authoritative answer that <entryname> does not exist in any case folding. Now this implementation is actually a violation or misuse of the defined API for the getxattr call which returns ENOENT for the case that the dir that the call is made against does not exist and ENOATTR (or the synonym ENODATA) for the case that the xattr does not exist. This was not a problem until the gluster fuse-bridge was changed to do map ENOENT to ESTALE in 59629f1da9dca670d5dcc6425f7f89b3e96b46bf, after which we the getxattr call for get_real_filename returned an ESTALE instead of ENOENT breaking the expectation in Samba. It is an independent problem that ESTALE should not leak out to user space but is intended to trigger retries between fuse and gluster. But nevertheless, the semantics seem to be incorrect here and should be changed. This patch changes the implementation of the get_real_filename virtual xattr to correctly return ENOATTR instead of ENOENT if the file/directory being looked up is not found. The Samba glusterfs_fuse vfs module which takes advantage of the get_real_filename over a fuse mount will receive a corresponding change to map ENOATTR to ENOENT. Without this change, it will still work correctly, but the performance optimization for nonexisting files is lost. On the other hand side, this change removes the distinction between the old not-implemented case and the implemented case. So Samba changed to treat ENOATTR like ENOENT will not work correctly any more against old servers that don't implement get_real_filename. I.e. existing files will be reported as non-existing Backport of: > Change-Id: I971b427ab8410636d5d201157d9af70e0d075b67 > fixes: bz#1722977 > Signed-off-by: Michael Adam <obnox@samba.org> Change-Id: I971b427ab8410636d5d201157d9af70e0d075b67 fixes: bz#1723659 Signed-off-by: Michael Adam <obnox@samba.org> (cherry picked from commit dc1b87fcfef08c9497b0c02b2410c9d18bbc2dba)
* features/shard: Fix block-count accounting upon truncate to lower sizeKrutika Dhananjay2019-07-033-13/+58
| | | | | | | | | | | | | | | | | | | | | | | Backport of: > BUG: bz#1705884 > Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53 > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> The way delta_blocks is computed in shard is incorrect, when a file is truncated to a lower size. The accounting only considers change in size of the last of the truncated shards. FIX: Get the block-count of each shard just before an unlink at posix in xdata. Their summation plus the change in size of last shard (from an actual truncate) is used to compute delta_blocks which is used in the xattrop for size update. Change-Id: I9128a192e9bf8c3c3a959e96b7400879d03d7c53 fixes: bz#1716871 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit 400b66d568ad18fefcb59949d1f8368d487b9a80)
* features/shard: Fix integer overflow in block count accountingKrutika Dhananjay2019-07-031-1/+1
| | | | | | | | | | | | | | Backport of: > BUG: bz#1705884 > Change-Id: I2c1ddab17457f45e27428575ad16fa678fd6c0eb > Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> ... by holding delta_blocks in 64-bit int as opposed to 32-bit int. Change-Id: I2c1ddab17457f45e27428575ad16fa678fd6c0eb updates: bz#1716871 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> (cherry picked from commit e18e98659dd2b41eb59cf593fd625f1821a20abf)
* posix : add posix_set_ctime() in posix_ftruncate()Jiffin Tony Thottan2019-07-031-0/+2
| | | | | | | | | | | >Backport of https://review.gluster.org/#/c/glusterfs/+/22948/ >Change-Id: I0cb5320fea71306e0283509ae47024f23874b53b >fixes: bz#1723761 Change-Id: I0cb5320fea71306e0283509ae47024f23874b53b fixes: bz#1724558 Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com> (cherry picked from commit 7d8be567f2f904fc74d0990ebce2e8afbedab918)
* cluster/dht: Fixed a memleak in dht_rename_cbkN Balachandran2019-07-031-11/+33
| | | | | | | | | | | | | Fixed a memleak in dht_rename_cbk when creating a linkto file. >Change-Id: I705adef3cb79e33806520fc2b15558e90e2c211c >fixes: bz#1722698 Change-Id: I705adef3cb79e33806520fc2b15558e90e2c211c fixes: bz#1726294 Signed-off-by: N Balachandran <nbalacha@redhat.com> (cherry picked from commit 532b0fc8b1ace9ad48fdaf643e0b1a34020b6cd8)
* core: avoid dynamic TLS allocation when possibleXavi Hernandez2019-07-033-54/+6
| | | | | | | | | | | | | | | | | | | | | | | | Some interdependencies between logging and memory management functions make it impossible to use the logging framework before initializing memory subsystem because they both depend on Thread Local Storage allocated through pthread_key_create() during initialization. This causes a crash when we try to log something very early in the initialization phase. To prevent this, several dynamically allocated TLS structures have been replaced by static TLS reserved at compile time using '__thread' keyword. This also reduces the number of error sources, making initialization simpler. Backport of: > BUG: 1193929 > Change-Id: I8ea2e072411e30790d50084b6b7e909c7bb01d50 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Updates: bz#1724210 Change-Id: I8ea2e072411e30790d50084b6b7e909c7bb01d50 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* posix/ctime: Fix ctime upgrade issueKotresh HR2019-07-021-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: On a EC volume, during upgrade from the older version where ctime feature is not enabled(or not present) to the newer version where the ctime feature is available (enabled default), the self heal hangs and doesn't complete. Cause: The ctime feature has both client side code (utime) and server side code (posix). The feature is driven from client. Only if the client side sets the time in the frame, should the server side sets the time attributes in xattr. But posix setattr/fseattr was not doing that. When one of the server nodes is updated, since ctime is enabled by default, it starts setting xattr on setattr/fseattr on the updated node/brick. On a EC volume the first two updated nodes(bricks) are not a problem because there are 4 other bricks with consistent data. However once the third brick is updated, the new attribute(mdata xattr) will cause an inconsistency on metadata on 3 bricks, which prevents the file to be repaired. Fix: Don't create mdata xattr with utimes/utimensat system call. Only update if already present. Backport of: > Patch: https://review.gluster.org/22858 > Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c > BUG: 1720201 > Signed-off-by: Kotresh HR <khiremat@redhat.com> Change-Id: Ieacedecb8a738bb437283ef3e0f042fd49dc4c8c fixes: bz#1722805 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* uss: Fix tar issue with ctime and uss enabledKotresh HR2019-07-021-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: If ctime and uss enabled, tar still complains with 'file changed as we read it' Cause: To clear nfs cache (gluster-nfs), the ctime was incremented in snap-view client on stat cbk. Fix: The ctime should not be incremented manually. Since gluster-nfs is planning to be deprecated, this code is being removed to fix the issue. Backport of: > Patch: https://review.gluster.org/2861/ > Change-Id: Iae7f100c20fce880a50b008ba716077350281404 > BUG: 1720290 > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 01846e7b9b0b92e4a10f5cbcc921bad90441660d) Change-Id: Iae7f100c20fce880a50b008ba716077350281404 fixes: bz#1721783 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/ec: honor contention notifications for partially acquired locksXavi Hernandez2019-06-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | EC was ignoring lock contention notifications received while a lock was being acquired. When a lock is partially acquired (some bricks have granted the lock but some others not yet) we can receive notifications from acquired bricks, which should be honored, since we may not receive more notifications after that. Since EC was ignoring them, once the lock was acquired, it was not released until the eager-lock timeout, causing unnecessary delays on other clients. This fix takes into consideration the notifications received before having completed the full lock acquisition. After that, the lock will be releaed as soon as possible. Backport of: > BUG: bz#1708156 > Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Fixes: bz#1714172 Change-Id: I2a306dbdb29fb557dcab7788a258bd75d826cc12 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* cluster/ec: Reopen shouldn't happen with O_TRUNCPranith Kumar K2019-05-151-1/+1
| | | | | | | | | | | | | Problem: Doing re-open with O_TRUNC will truncate the fragment even when it is not needed needing extra heals Fix: At the time of re-open don't use O_TRUNC. fixes bz#1709660 Change-Id: Idc6408968efaad897b95a5a52481c66e843d3fb8 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* afr: thin-arbiter lock release fixesRavishankar N2019-05-153-47/+93
| | | | | | | | | | | | | | | | | | | | | - pass fop state instead of afr local to afr_ta_dom_lock_check_and_release() - avoid afr_lock_release_synctask() being called simultaneosuly from notify code path and transaction (post-op) code path due to races. - Check if the post-op on TA is valid based on event_gen checks. - Invalidate in-memory information when we get TA child down. Note: Thi patch addresses some pending review comments of commit 053b1309dc8fbc05fcde5223e734da9f694cf5cc (https://review.gluster.org/#/c/glusterfs/+/20095/) fixes: bz#1709130 Change-Id: I2ccd7e1b53362f9f3fed8680aecb23b5011eb18c Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 9ab2747da78061882f6734df4b265bce11adaef1)
* cluster/afr : TA: Return actual error code in case of failureAshish Pandey2019-05-131-6/+6
| | | | | | | | | | In afr_ta_post_op_do, we were sending EIO for every failure. However, the original error code should be sent. Change-Id: I9fdc15dac00d758baf8e6f14db244f526481a63a updates: bz#1709143 Signed-off-by: Ashish Pandey <aspandey@redhat.com> (cherry picked from commit 63159cdb5374f458d7d2bffec24d4720ffc96d6c)
* cluster/dht: Refactor dht lookup functionsN Balachandran2019-05-091-74/+30
| | | | | | | | | | Part 2: Modify dht_revalidate_cbk to call dht_selfheal_directory instead of separate calls to heal attrs and xattrs. Change-Id: Id41ac6c4220c2c35484812bbfc6157fc3c86b142 fixes: bz#1707393 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: refactor dht lookup functionsN Balachandran2019-05-082-124/+119
| | | | | | | | | | Part 1: refactor the dht_lookup_dir_cbk and dht_selfheal_directory functions. Added a simple dht selfheal directory test Change-Id: I1410c26359e3c14b396adbe751937a52bd2fcff9 updates: bz#1707393 Signed-off-by: N Balachandran <nbalacha@redhat.com>
* glusterd: define dumpops in the xlator_api of glusterdSanju Rakonde2019-05-081-0/+1
| | | | | | | | | | | | | | | Problem: statedump is not capturing information related to glusterd Solution: statdump is not capturing glusterd info because trav->dumpops is null in gf_proc_dump_single_xlator_info () where trav is glusterd xlator object. trav->dumpops is null because we missed to define dumpops in xlator_api of glusterd. defining dumpops in xlator_api of glusterd fixes the issue. fixes: bz#1703759 Change-Id: If85429ecb1ef580aced8d5b88d09fc15258bfc4c Signed-off-by: Sanju Rakonde <srakonde@redhat.com> (cherry picked from commit 5d866c13efdcdeddf184f012aa88a652e90ff22e)
* ctime: Fix log repeated logging during openKotresh HR2019-05-081-10/+5
| | | | | | | | | | | | | | | | | | | The log "posix set mdata failed, No ctime" logged repeatedly after the fix [1]. Those could be internal fops. This patch fixes the same. [1] https://review.gluster.org/22540 Backport of: > Patch: https://review.gluster.org/#/c/glusterfs/+/22591/ > BUG:1701457 > Change-Id: I42799a90b976982cedb0ca11fa224d555eb05650 > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit 2d39572821306496c96797f4d122f8200aae4585) fixes: bz#1702734 Change-Id: I42799a90b976982cedb0ca11fa224d555eb05650 Signed-off-by: Kotresh HR <khiremat@redhat.com>
* cluster/ec: fix fd reopenPranith Kumar K2019-05-0814-274/+328
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently EC tries to reopen fd's that have been opened while a brick was down. This is done as part of regular write operations, just after having acquired the locks, and it's sent as a sub-fop of the main write fop. There were two problems: 1. The reopen was attempted on all UP bricks, even if a previous lock didn't succeed. This is incorrect because most probably the open will fail. 2. If reopen is sent and fails, the error is propagated to the main operation, causing it to fail when it shouldn't. To fix this, we only attempt reopens on bricks where the current fop owns a lock, and we prevent any error to be propagated to the main fop. To implement this behaviour an argument used to indicate the minimum number of required answers has overloaded to also include some flags. To make the change consistent, it has been necessary to rename the argument, which means that a lot of files have been changed. However there are no functional changes. This change has also uncovered a problem in discard code, which didn't correctely process requests of small sizes because no real discard fop was being processed, only a write of 0's on some region. In this case some fields of the fop remained uninitialized or with incorrect values. To fix this, a new function has been created to simulate success on a fop and it's used in the discard case. Thanks to Pranith for providing a test script that has also detected an issue in this patch. This patch includes a small modification of this script to force data to be written into bricks before stopping them. Backport of: > Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec > BUG: bz#1699866 > Signed-off-by: Xavi Hernandez <xhernandez@redhat.com> Change-Id: If272343873369186c2fb8f43c1d9c52c3ea304ec Fixes: bz#1699917 Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
* glusterd: fix loading ctime in client graph logicAtin Mukherjee2019-04-171-3/+9
| | | | | | | | | Commit efbf8ab wasn't handling all the scenarios of toggling ctime option correctly and more over a ! had completely tossed up the logic. Fixes: bz#1698471 Change-Id: If12e2f69045e59878992ee2cd0518cc0eabcce0d Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
* geo-rep: IPv6 supportAravinda VK2019-04-171-2/+28
| | | | | | | | | | | | | | | | `address_family=inet6` needs to be added while mounting master and slave volumes in gverify script. New option introduced to gluster cli(`--inet6`) which will be used internally by geo-rep while calling `gluster volume info --remote-host=<ipv6>`. Backport of https://review.gluster.org/22363 Fixes: bz#1695436 Change-Id: I1e0d42cae07158df043e64a2f991882d8c897837 Signed-off-by: Aravinda VK <avishwan@redhat.com> (cherry picked from commit 240e1d6821fbb779c3dd73f6f0225d755a5b7cc6)
* cluster/afr: Remove local from owners_list on failure of lock-acquisitionPranith Kumar K2019-04-164-18/+14
| | | | | | | | | | | | | When eager-lock lock acquisition fails because of say network failures, the local is not being removed from owners_list, this leads to accumulation of waiting frames and the application will hang because the waiting frames are under the assumption that another transaction is in the process of acquiring lock because owner-list is not empty. Handled this case as well in this patch. Added asserts to make it easier to find these problems in future. fixes bz#1699731 Change-Id: I3101393265e9827755725b1f2d94a93d8709e923 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* core: Log level changes do not effect on running client processMohit Agrawal2019-04-162-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | Problem: commit c34e4161f3cb6539ec83a9020f3d27eb4759a975 set log-level per xlator during reconfigure only for a brick process not for the client process. Solution: 1) Change per xlator log-level only if brick_mux is enabled.To make sure about brick multiplex introudce a flag brick_mux at ctx->cmd_args. Note: There are two other changes done with this patch 1) Ignore client-log-level option to attach a brick with already running brick if brick_mux is enabled 2) Add a log to print pid of the running process to make easier debugging > Change-Id: I39e85de778e150d0685cd9a79425ce8b4783f9c9 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > Fixes: bz#1696046 > (Cherry picked from commit 798aadbe51a9a02dd98a0f861cc239ecf7c8ed57) > (Reviewed on upstream link https://review.gluster.org/#/c/glusterfs/+/22495/) Change-Id: If91682830f894ab8f6857f19dcb1797fc15ca64c Fixes: bz#1699715 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* posix/ctime: Fix stat(time attributes) inconsistency during readdirpKotresh HR2019-04-162-26/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Creation of tar file on gluster volume throws warning 'file changed as we read it' Cause: During readdirp, for few of the files whose inode is not present, time attributes were served from backend. This caused the ctime of few files to be different between before readdir and after readdir by tar. Solution: If ctime feature is enabled and inode is not present, don't serve the time attributes from backend file, serve it from xattr. Backport of: > Patch: https://review.gluster.org/22540 > BUG: 1698078 > Change-Id: I427ef865f97399475faf5aa6ca495f7e317603ae > Signed-off-by: Kotresh HR <khiremat@redhat.com> (cherry picked from commit c56f102da21c5b69e656a055aaf736281596284d) fixes: bz#1699703 Change-Id: I427ef865f97399475faf5aa6ca495f7e317603ae Signed-off-by: Kotresh HR <khiremat@redhat.com>
* ec: fix truncate lock to cover the write in tuncate cleanKinglong Mee2019-04-161-2/+6
| | | | | | | | | | | ec_truncate_clean does writing under the lock granted for truncate, but the lock is calculated by ec_adjust_offset_up, so that, the write in ec_truncate_clean is out of lock. Updates: bz#1699499 Change-Id: Idbe1fd48d26afe49c36b77db9f12e0907f5a4134 Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> (cherry picked from commit 0e1223491e964096384edfae5032ed0d50d028ad)
* core: Brick is not able to detach successfully in brick_mux environmentMohit Agrawal2019-04-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Problem: In brick_mux environment, while volumes are stopped in a loop bricks are not detached successfully. Brick's are not detached because xprtrefcnt has not become 0 for detached brick. At the time of initiating brick detach process server_notify saves xprtrefcnt on detach brick and once counter has become 0 then server_rpc_notify spawn a server_graph_janitor_threads for cleanup brick resources.xprtrefcnt has not become 0 because socket framework is not working due to assigning 0 as a fd for socket. In commit dc25d2c1eeace91669052e3cecc083896e7329b2 there was a change in changelog fini to close htime_fd if htime_fd is not negative, by default htime_fd is 0 so it close 0 also. Solution: Initialize htime_fd to -1 after just allocate changelog_priv by GF_CALLOC > Fixes: bz#1699025 > Change-Id: I5f7ca62a0eb1c0510c3e9b880d6ab8af8d736a25 > Signed-off-by: Mohit Agrawal <moagrawal@redhat.com> > (cherry picked from commit b777d83001d8006420b6c7d2d88fe68950aa7e00) Change-Id: I7a2b6fc2d36405d51990376333e093661be48475 Fixes: bz#1699714 Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
* build: glusterfs build is failing on RHEL-6Mohit Agrawal2019-04-161-1/+1
| | | | | | | | | | | | | | | | Problem: glusterfs build is throwing error undefined reference to `dlclose' on RHEL 6 Solution: Add LIB_DL link in Makefile.am to resolve the same > Fixes: bz#1696512 > Change-Id: I58019ca9e29d569d8e6df282b8ab178ad540843b > Signed-off-by: Mohit Agrawal <moagrawa@redhat.com> > (cherry picked from commit 064aad721c249d63fb89686b32e5d15de50e2f8c) Change-Id: I4f68553b501c283e2066ddc64e204db40552ee73 Fixes: bz#1699713 Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* cluster/afr: Thin-arbiter SHD fixeskarthik-us2019-04-162-13/+13
| | | | | | | | | This patch address post-merge review comments for commit 5784a00f997212d34bd52b2303e20c097240d91c Change-Id: I7ed954664a2ae8e1091d23ee3ceb9c66e83bfeac fixes: bz#1699319 Signed-off-by: karthik-us <ksubrahm@redhat.com>
* protocol/client: Do not fallback to anon-fd if fd is not openPranith Kumar K2019-04-161-1/+7
| | | | | | | | | | | | | | | | | | | If an open comes on a file when a brick is down and after the brick comes up, a fop comes on the fd, client xlator would still wind the fop on anon-fd leading to wrong behavior of the fops in some cases. Example: If lk fop is issued on the fd just after the brick is up in the scenario above, lk fop will be sent on anon-fd instead of failing it on that client xlator. This lock will never be freed upon close of the fd as flush on anon-fd is invalid and is not wound below server xlator. As a fix, failing the fop unless the fd has FALLBACK_TO_ANON_FD flag. Change-Id: I77692d056660b2858e323bdabdfe0a381807cccc fixes bz#1699198 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> (cherry picked from commit 92ae26ae8039847e38c738ef98835a14be9d4296)
* afr: thin-arbiter read txn fixesRavishankar N2019-04-163-22/+37
| | | | | | | | | | | | | | - Fixes afr_ta_read_txn() to handle inode refresh failures. code-path. - Fixes a double free issue of dict. Note: This patch address post-merge review comments for commit 69532c141be160b3fea03c1579ae4ac13018dcdf fixes: bz#1693992 Change-Id: Id5299b45b68569d47df6b73755918237a1592cb4 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 500bd0014128e6727e83b6cb77e8ac94304b8f4a)
* cluster/ec: Don't enqueue an entry if it is already healingAshish Pandey2019-04-165-30/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: 1 - heal-wait-qlength is by default 128. If shd is disabled and we need to heal files, client side heal is needed. If we access these files that will trigger the heal. However, it has been observed that a file will be enqueued multiple times in the heal wait queue, which in turn causes queue to be filled and prevent other files to be enqueued. 2 - While a file is going through healing and a write fop from mount comes on that file, it sends write on all the bricks including healing one. At the end it updates version and size on all the bricks. However, it does not unset dirty flag on all the bricks, even if this write fop was successful on all the bricks. After healing completion this dirty flag remain set and never gets cleaned up if SHD is disabled. Solution: 1 - If an entry is already in queue or going through heal process, don't enqueue next client side request to heal the same file. 2 - Unset dirty on all the bricks at the end if fop has succeeded on all the bricks even if some of the bricks are going through heal. Change-Id: Ia61ffe230c6502ce6cb934425d55e2f40dd1a727 updates: bz#1693223 Signed-off-by: Ashish Pandey <aspandey@redhat.com> (cherry picked from commit 313dcefe7a62bd16cd794040df068f9bec9c6927)
* glusterd: load ctime in the client graph only if it's not turned offAtin Mukherjee2019-04-161-1/+2
| | | | | | | | | | | | Considering ctime is a client side feature, we can't blindly load ctime xlator into the client graph if it's explicitly turned off, that'd result into backward compatibility issue where an old client can't mount a volume configured on a server which is having ctime feature. Fixes: bz#1698471 Change-Id: I6ae7b96d056073aa6746de9a449cf319786d45cc Signed-off-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit efbf8abcc3bc729a90d4a7b57dc515f1df8a5863)
* glusterd: fix txn-id mem leakAtin Mukherjee2019-04-162-6/+36
| | | | | | | | | | | | | | | | This commit ensures the following: 1. Don't send commit op request to the remote nodes when gluster v status all is executed as for the status all transaction the local commit gets the name of the volumes and remote commit ops are technically a no-op. So no need for additional rpc requests. 2. In op state machine flow, if the transaction is in staged state and op_info.skip_locking is true, then no need to set the txn id in the priv->glusterd_txn_opinfo dictionary which never gets freed. Fixes: bz#1694610 Change-Id: Ib6a9300ea29633f501abac2ba53fb72ff648c822 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> (cherry picked from commit 34e010d64905b7387de57840d3fb16a326853c9b)
* afr: add client-pid to all gf_event() callsRavishankar N2019-04-167-15/+38
| | | | | | | | | | client-pid for glustershd is GF_CLIENT_PID_SELF_HEALD client-pid for glfsheal is GF_CLIENT_PID_GLFS_HEALD updates: bz#1693155 Change-Id: Ib3a863af160ff48c822a5e6b0c27c575c9887470 Signed-off-by: Ravishankar N <ravishankar@redhat.com> (cherry picked from commit 8016d51a3bbd410b0b927ed66be50a09574b7982)
* cluster/ec: Fix handling of heal info cases without locksAshish Pandey2019-04-091-25/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we use heal info command, it takes lot of time as in some cases it takes lock on entries to find out if the entry actually needs heal or not. There are some cases where we can avoid these locks and can conclude if the entry needs heal or not. 1 - We do a lookup (without lock) on an entry, which we found in .glusterfs/indices/xattrop, and find that lock count is zero. Now if the file contains dirty bit set on all or any brick, we can say that this entry needs heal. 2 - If the lock count is one and dirty is greater than 1, then it also means that some fop had left the dirty bit set which made the dirty count of current fop (which has taken lock) more than one. At this point also we can definitely say that this entry needs heal. This patch is modifying code to take into consideration above two points. It is also changing code to not to call ec_heal_inspect if ec_heal_do was called from client side heal. Client side heal triggeres heal only when it is sure that it requires heal. [We have changed the code to not to call heal for lookup] updates bz#1697764 Change-Id: I7f09f0ecd12f65a353297aefd57026fd2bebdf9c Signed-off-by: Ashish Pandey <aspandey@redhat.com> (cherry picked from commit da47caf2405c08c9abafc4a55525a8b2c2dd5bb8)
* client-rpc: Fix the payload being sent on the wirePoornima G2019-03-296-244/+308
| | | | | | | | | | | | | | | | | | | The fops allocate 3 kind of payload(buffer) in the client xlator: - fop payload, this is the buffer allocated by the write and put fop - rsphdr paylod, this is the buffer required by the reply cbk of some fops like lookup, readdir. - rsp_paylod, this is the buffer required by the reply cbk of fops like readv etc. Currently, in the lookup and readdir fop the rsphdr is sent as payload, hence the allocated rsphdr buffer is also sent on the wire, increasing the bandwidth consumption on the wire. With this patch, the issue is fixed. Fixes: bz#1692101 Change-Id: Ie8158921f4db319e60ad5f52d851fa5c9d4a269b Signed-off-by: Poornima G <pgurusid@redhat.com>
* server.c: fix Coverity CID 1399758Yaniv Kaul2019-03-211-1/+2
| | | | | | | | | | | | | | | 1399758 Dereference before null check It was introduced @ commit 67f48bfcc16a38052e6c9ae7c25e69b03b8ae008 updates: bz#1691187 > updates: bz#789278 > Signed-off-by: Yaniv Kaul <ykaul@redhat.com> > Change-Id: I1424b008b240691fe2a8924e31c708d0fb4f362d > (cherry picked from commit 8aff9cc5c6277ef7dacfb89f1392b7c2eda9b825) Change-Id: Ie2160fb9ae9cdeacf845e849da7f6001b3b6b10b