path: root/xlators/storage/posix/src/posix-inode-fd-ops.c
Commit message | Author | Age | Files | Lines
* md-cache: Do not invalidate cache post set/remove xattr | Poornima G | 2018-07-11 | 1 | -42/+43

  Since the setxattr and removexattr fop callbacks do not carry poststat, the stat cache was being invalidated in the setxattr/removexattr cbk, so a subsequent lookup could not be served from the cache. To prevent this invalidation, md-cache is modified to get the poststat in set/removexattr_cbk in dict.

  Co-authored with Xavi Hernandez.

  Change-Id: I6b946be2d20b807e2578825743c25ba5927a60b4
  fixes: bz#1586018
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
  Signed-off-by: Poornima G <pgurusid@redhat.com>
* posix: Do not log ENXIO errors for seek fop | Xavi Hernandez | 2018-07-10 | 1 | -2/+3

  When lseek is used with SEEK_DATA and SEEK_HOLE, it's expected that the last operation fails with ENXIO when the offset is beyond the end of file. In this case it doesn't make sense to report this as an error log message.

  This patch reports ENXIO failure messages for seek fops at debug level instead of error level.

  Change-Id: I62a4f61f99b0e4d7ea6a2cdcd40afe15072794ac
  fixes: bz#1598926
  Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
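The log-level choice described in this commit can be sketched roughly as follows. This is a minimal illustration, not the patch itself: `enum log_level`, `seek_failure_level` and `seek_and_log` are hypothetical names, and the real xlator goes through its own logging macros rather than `fprintf`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical log levels, standing in for the xlator's own logging API. */
enum log_level { LOG_LEVEL_DEBUG, LOG_LEVEL_ERROR };

/* ENXIO from a SEEK_DATA/SEEK_HOLE probe past end of file is expected,
 * so it is demoted to debug; every other errno stays an error. */
enum log_level seek_failure_level(int err)
{
    return (err == ENXIO) ? LOG_LEVEL_DEBUG : LOG_LEVEL_ERROR;
}

/* Perform the seek and report a failure at the level chosen above.
 * The caller passes whence (for example SEEK_DATA or SEEK_HOLE). */
off_t seek_and_log(int fd, off_t offset, int whence)
{
    assert(fd >= 0);

    off_t ret = lseek(fd, offset, whence);
    if (ret == (off_t)-1) {
        int err = errno;
        const char *tag =
            (seek_failure_level(err) == LOG_LEVEL_DEBUG) ? "DEBUG" : "ERROR";
        fprintf(stderr, "[%s] seek failed: %s\n", tag, strerror(err));
        errno = err; /* fprintf may change errno; restore it for the caller */
    }
    return ret;
}
```

A caller that walks a sparse file with alternating SEEK_DATA and SEEK_HOLE calls would treat the final ENXIO as the end of the walk, which is why logging it as an error is only noise.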
* storage/posix: Add warning logs on failure | N Balachandran | 2018-07-02 | 1 | -2/+12

  posix_readdirp_fill will fail to update the iatt information if posix_handle_path fails. There is currently no log message to indicate this, which makes debugging difficult.

  Change-Id: I6bce360ea7d1696501637433f80e02794fe1368f
  updates: bz#1564071
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* ctime: Fix self heal of symlink in EC volume | Kotresh HR | 2018-06-20 | 1 | -2/+0

  [1] states: "Since IEEE Std 1003.1-2001 does not require any association of file times with symbolic links, there is no requirement that file times be updated by readlink()."

  A stat on a symlink file was generating a readlink fop on one of the subvolumes of the ec set, which in turn updates atime on that subvolume. This causes the mdata xattr to differ across the ec set, and hence self heal fails. So, based on [1], atime is no longer updated by the readlink fop.

  [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/readlink.html

  fixes: bz#1592509
  Change-Id: I08bd3ca3bdb222bd18160b1aa58fc2f7630c8083
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* storage/posix: Handle ENOSPC correctly in zero_fill | Pranith Kumar K | 2018-06-14 | 1 | -1/+22

  Change-Id: Icc521d86cc510f88b67d334b346095713899087a
  fixes: bz#1590710
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* dht: Delete MDS internal xattr from dict in dht_getxattr_cbk | Mohit Agrawal | 2018-06-03 | 1 | -31/+0

  Problem: When afr fetches an xattr in order to heal it, the fetch fails because posix_getxattr has a check that ignores the xattr if its name is MDS.

  Solution: To ignore the same xattr, add the check in dht_getxattr_cbk instead of having it in posix_getxattr.

  BUG: 1584098
  Change-Id: I86cd2b2ee08488cb6c12f407694219d57c5361dc
  fixes: bz#1584098
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* storage/posix: use proper FOP for unwinding readdir(p) | Raghavendra Bhat | 2018-05-24 | 1 | -3/+8

  As of now, even for readdirp, posix is unwinding with the readdir signature.

  Change-Id: I6440c8a253c5d78bbcc97043e4e6e208e3d47cd1
  fixes: bz#1581345
  Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
* posix: use the ctime framework to handle setattr ctime payload | Csaba Henk | 2018-05-18 | 1 | -5/+16

  With the work on #208 merged, we now have the means to associate arbitrary ctimes with files, so we can handle the setattr ctime payload with proper semantics.

  Updates: #435
  Change-Id: I7302a3ee2574ca9bba605c7a8586c16c452f82c1
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* posix/ctime: posix hook to set ctime xattr in relevant fops | Kotresh HR | 2018-05-06 | 1 | -3/+52

  This patch uses the ctime posix APIs to set consistent time across replica on disk. It also stores the time attributes in the inode context.

  Credits: Rafi KC <rkavunga@redhat.com>
  Updates: #208
  Change-Id: I1a8d74d1e251f1d6d142f066fc99258025c0bcdd
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* posix/ctime: posix hooks to get consistent time xattr | Kotresh HR | 2018-05-06 | 1 | -30/+35

  This patch uses the ctime posix APIs to get consistent time across replica. The time attributes are read from the inode context, or from disk if not found there, and merged with the iatt to be returned.

  Credits: Rafi KC <rkavunga@redhat.com>
  Updates: #208
  Change-Id: Id737038ce52468f1f5ebc8a42cbf9c6ffbd63850
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* fuse: add support for kernel writeback cache | Csaba Henk | 2018-05-04 | 1 | -0/+17

  - Added kernel-writeback-cache command line and xlator option for requesting utilisation of the writeback cache of the kernel in FUSE_INIT (see [1]).
  - Added attr-times-granularity command line and xlator option via which granularity of the {a,m,c}time in stat (attr) data that we support can be indicated to kernel. This is a means to avoid divergence of the attr times between kernel and userspace that could occur with writeback-cache, while still maintaining maximum time precision the FUSE server is capable of (see [2]).
  - Handling FATTR_CTIME flag in FUSE_SETATTR that indicates presence of ctime in setattr payload. Currently we cannot associate arbitrary ctimes to files on backend, so we just touch them to update their ctimes to current time. Having ctimes in setattr payload is also a side effect of writeback cache (see [3] and [4]).

  [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d99ff8, "fuse: Turn writeback cache on"
  [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e27c9d3, "fuse: fuse: add time_gran to INIT_OUT"
  [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e18bda, "fuse: add .write_inode"
  [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab9e13f, "fuse: allow ctime flushing to userspace"

  Updates: #435
  Change-Id: Id174c8e0c815c4456c35f8c53e41a6a507d91855
  Signed-off-by: Csaba Henk <csaba@redhat.com>
* posix: Avoid changelog retries for geo-rep | Mohit Agrawal | 2018-05-03 | 1 | -0/+33

  Problem: geo-rep is slow to migrate directories from the master volume to the slave volume because of a large number of changelog retries.

  Solution: Update the condition in posix_getxattr to ignore MDS_INTERNAL_XATTR, as it (posix) already ignores other internal xattrs.

  BUG: 1571069
  Change-Id: I4d91ec73e5b1ca1cb3ecf0825ab9f49e261da70e
  fixes: bz#1571069
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
* Revert "storage/posix: add pgfid in readdirp if needed" | Nigel Babu | 2018-04-18 | 1 | -38/+8

  This reverts commit d206fab73f6815c927a84171ee9361c9b31557b1.

  Change-Id: I5b43fdcf916bc844437c9d60f6957bc40936e3c2
  Updates: bz#1560319
  Signed-off-by: Nigel Babu <nigelb@redhat.com>
* posix: reserve option behavior is not correct while using fallocate | Mohit Agrawal | 2018-04-11 | 1 | -0/+9

  Problem: The storage.reserve option does not work correctly when disk space is allocated through fallocate.

  Solution: posix_disk_space_check_thread_proc calls posix_disk_space_check every 5 seconds to monitor disk space and set a flag in posix priv. Within that 5-second window a user can create a big file with fallocate that reaches the posix reserve limit, and no error is shown on the terminal even though the limit has been reached. To resolve this, call posix_disk_space for every fallocate fop instead of relying on the thread's 5-second interval.

  BUG: 1560411
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
  Change-Id: I39ba9390e2e6d084eedbf3bcf45cd6d708591577
* storage/posix: add pgfid in readdirp if needed | Kinglong Mee | 2018-04-10 | 1 | -8/+38

  Change-Id: I6745428fd9d4e402bf2cad52cee8ab46b7fd822f
  fixes: bz#1560319
  Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
* posix: check file state before continuing with fops | Susant Palai | 2018-04-10 | 1 | -14/+252

  In the context of Cloudsync: a data modification fop, e.g. a write, that lands in POSIX on the assumption that the file is local while the file is actually remote can be dangerous. Of course we don't want to take an inodelk for every read/write operation to check the archival status or to coordinate with an upload or a download of a file.

  To avoid the inodelk, we check the status of the file in POSIX itself before we resume the fop. This helps us avoid the races mentioned above. For example, if a write reaches POSIX for a file which is actually remote, POSIX can check the status of the file and learn that the file is remote. It can error out with this status “remote”, and the cloudsync xlator will retry the same operation once it has finished downloading the file.

  This patch includes the setxattr changes to do the post-processing of an upload, i.e. truncating the file and setting the remote xattr "trusted.glusterfs.cs.remote" to indicate that the file is REMOTE.

  Each file will have no xattr if the file is LOCAL, one remote xattr if the file is REMOTE, and a combination of the REMOTE and DOWNLOADING xattrs if the file is being downloaded. There is healing logic for these xattrs to recover from crash inconsistencies.

  Fixes: #387
  Change-Id: Ie93c2d41aa8d6a798a39bdbef9d1669f057e5fdb
  Signed-off-by: Susant Palai <spalai@redhat.com>
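The xattr-to-state mapping in this commit message can be sketched as a small classifier. This is only an illustration under stated assumptions: `struct cs_xattrs`, `cs_classify` and `cs_write_may_proceed` are hypothetical names, the DOWNLOADING-without-REMOTE case is not described by the commit and is treated here as an inconsistency left to the healing logic, and letting only LOCAL files accept writes is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of which cloudsync xattrs were found on a file. */
struct cs_xattrs {
    bool has_remote;      /* "trusted.glusterfs.cs.remote" is present */
    bool has_downloading; /* the DOWNLOADING xattr is present (name not given in the commit) */
};

enum cs_file_state {
    CS_STATE_LOCAL,        /* no xattr */
    CS_STATE_REMOTE,       /* remote xattr only */
    CS_STATE_DOWNLOADING,  /* remote + downloading xattrs */
    CS_STATE_INCONSISTENT  /* assumption: downloading without remote, needs healing */
};

/* Map the xattr combination to a file state, as listed in the commit message. */
enum cs_file_state cs_classify(const struct cs_xattrs *x)
{
    assert(x != NULL);

    if (x->has_remote && x->has_downloading)
        return CS_STATE_DOWNLOADING;
    if (x->has_remote)
        return CS_STATE_REMOTE;
    if (x->has_downloading)
        return CS_STATE_INCONSISTENT;
    return CS_STATE_LOCAL;
}

/* Assumption: a data-modifying fop proceeds only on LOCAL files; for any
 * other state the fop errors out so the caller can retry after download. */
bool cs_write_may_proceed(enum cs_file_state state)
{
    return state == CS_STATE_LOCAL;
}
```

Checking the state at the point where the fop is about to touch the data is what lets the design avoid an inodelk on every read and write.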
* storage/posix: Add active-fd-count option in gluster | Pranith Kumar K | 2018-03-21 | 1 | -0/+12

  Problem: When dd runs on a sharded replicate volume, all the writes on shards happen through anon-fd. When the writes don't come quickly enough, the old anon-fd closes and a new fd gets created to serve the new writes. open-fd-count is decremented only after the fd is closed as part of fd_destroy(). So even when one fd is on the way to being closed, a new fd will be created, and during this short period it appears as though there are multiple fds opened on the file. AFR thinks another application opened the same file and switches off eager-lock, leading to extra latency.

  Fix: Have a different option called active-fd whose life cycle starts at fd_bind() and ends just before fd_destroy().

  BUG: 1557932
  Change-Id: I2e221f6030feeedf29fbb3bd6554673b8a5b9c94
  Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
* glusterfsd: Memleak in glusterfsd process while brick mux is on | Mohit Agrawal | 2018-02-27 | 1 | -3/+12

  Problem: When a volume is stopped while brick multiplex is enabled, memory is not cleaned up from all server side xlators.

  Solution: To clean up memory for all server side xlators, call fini in glusterfs_handle_terminate after sending the GF_EVENT_CLEANUP notification to the top xlator.

  BUG: 1544090
  Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

  Note: All test cases were run in a separate build (https://review.gluster.org/19574) with the same patch after forcefully enabling brick mux; all test cases passed.

  Change-Id: Ia10dc7f2605aa50f2b90b3fe4eb380ba9299e2fc
* posix/afr: handle backward compatibility for rchecksum fop | Ravishankar N | 2018-02-19 | 1 | -3/+23

  Added a volume option 'fips-mode-rchecksum' tied to op version 4. If not set, rchecksum fop will use MD5 instead of SHA256.

  updates: #230
  Change-Id: Id8ea1303777e6450852c0bc25503cda341a6aec2
  Signed-off-by: Ravishankar N <ravishankar@redhat.com>
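The option's effect can be sketched as a simple selector. The names `rchecksum_algo`, `rchecksum_pick` and `rchecksum_digest_len` are hypothetical and not taken from the patch; the digest lengths (16 bytes for MD5, 32 bytes for SHA256) are properties of the algorithms themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the two strong-checksum algorithms. */
enum rchecksum_algo { RCHECKSUM_MD5, RCHECKSUM_SHA256 };

/* With the option unset, keep MD5 so older peers still agree on the
 * checksum; with it set, switch to the FIPS-compliant SHA256. */
enum rchecksum_algo rchecksum_pick(bool fips_mode_rchecksum)
{
    return fips_mode_rchecksum ? RCHECKSUM_SHA256 : RCHECKSUM_MD5;
}

/* Digest size in bytes for the chosen algorithm. */
size_t rchecksum_digest_len(enum rchecksum_algo algo)
{
    switch (algo) {
    case RCHECKSUM_MD5:
        return 16;
    case RCHECKSUM_SHA256:
        return 32;
    }
    assert(!"unknown rchecksum algorithm");
    return 0;
}
```

Both sides of a comparison have to use the same algorithm, which is presumably why the switch is gated behind a volume option tied to an op version rather than changed unconditionally.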
* cluster/dht: avoid overwriting client writes during migration | Susant Palai | 2018-02-02 | 1 | -0/+9

  For more details on this issue see https://github.com/gluster/glusterfs/issues/308

  Solution: This is a restrictive solution where a file will not be migrated if a client writes to it during the migration. This does not check if the writes from the rebalance and the client actually do overlap.

  If dht_writev_cbk finds that the file is being migrated (PHASE1) it will set an xattr on the destination file indicating the file was updated by a non-rebalance client. Rebalance checks if any other client has written to the dst file and aborts the file migration if it finds the xattr.

  updates gluster/glusterfs#308
  Change-Id: I73aec28bc9dbb8da57c7425ec88c6b6af0fbc9dd
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* storage/posix: Set f_bfree to 0 if brick full | N Balachandran | 2018-01-15 | 1 | -1/+12

  Return 0 free blocks if the brick is full or has less than the reserved limit.

  Change-Id: I2c5feda0303d0f4abe5af22fac903011792b2dc8
  BUG: 1533736
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* cluster/dht: Add migration checks to dht_(f)xattrop | N Balachandran | 2017-12-26 | 1 | -0/+1

  The dht_(f)xattrop implementation did not implement migration phase1/phase2 checks which could cause issues with rebalance on sharded volumes. This does not solve the issue where fops may reach the target out of order.

  Change-Id: I2416fc35115e60659e35b4b717fd51f20746586c
  BUG: 1471031
  Signed-off-by: N Balachandran <nbalacha@redhat.com>
* rchecksum/fips: Replace MD5 usage to enable fips support | Kotresh HR | 2017-12-21 | 1 | -2/+1

  rchecksum uses MD5, which is not FIPS compliant, so SHA256 is used instead.

  Updates: #230
  Change-Id: I7fad016fcc2a9900395d0da919cf5ba996ec5278
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
* posix: Reorganize posix xlator to prepare for reuse with rio | ShyamsundarR | 2017-12-02 | 1 | -0/+4975

  1. Split out entry and inode/fd based FOPs into separate files from posix.c
  2. Split out common routines (init, fini, reconf, and such) into its own file, from posix.c
  3. Retain just the method assignments in posix.c (such that posix2 for RIO can assign its own methods in the future for entry operations and such)
  4. Based on the split in (1) and (2) split out posix-handle.h into 2 files, such that macros that are needed for inode ops are in one and rest are in the other

  If the split is done as above, posix2 can compile with its own entry ops, and hence not compile the entry ops as split in (1) above.

  The split described in (4) can again help posix2 to define its own macros to make entry and inode handles, thus not impact existing POSIX xlator code.

  Noted problems
  - There are path references in certain cases where quota is used (in the xattr FOPs), and thus will fail on reuse in posix2, this needs to be handled when we get there.
  - posix_init does set root GFID on the brick root, and this is incorrect for posix2, again will need handling later when posix2 evolves based on this code (other init checks seem fine on current inspection)

  Merge of experimental branch patches with the following gerrit change-IDs
  > Change-Id: I965ce6dffe70a62c697f790f3438559520e0af20
  > Change-Id: I089a4d9cf470c2f9c121611e8ef18dea92b2be70
  > Change-Id: I2cec103f6ba8f3084443f3066bcc70b2f5ecb49a

  Fixes gluster/glusterfs#327
  Change-Id: I0ccfa78559a7c5a68f5e861e144cf856f5c9e19c
  Signed-off-by: ShyamsundarR <srangana@redhat.com>