path: root/libglusterfs/src/glusterfs.h
Commit message | Author | Age | Files | Lines
* glusterfsd: send PARENT_UP on brick attach | Atin Mukherjee | 2017-05-14 | 1 | -1/+3
  With brick multiplexing enabled, if a brick instance is attached to a process, a PARENT_UP event is needed so that it reaches all the way down to the posix layer; a CHILD_UP event is then sent back from posix to all the children.

  Change-Id: Ic341086adb3bbbde0342af518e1b273dd2f669b9
  BUG: 1447389
  Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
  Reviewed-on: https://review.gluster.org/17225
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
* event/epoll: Add back socket for polling of events immediately after reading the entire rpc message from the wire | Raghavendra G | 2017-05-12 | 1 | -0/+2
  Currently the socket is added back for future events after higher layers (rpc, xlators etc) have processed the message. If message processing involves significant delay (as in writev replies processed by Erasure Coding), performance takes a hit. Hence this patch modifies transport/socket to add back the socket for polling of events immediately after reading the entire rpc message, but before notification to higher layers.

  credits: Thanks to "Kotresh Hiremath Ravishankar" <khiremat@redhat.com> for assistance in fixing a regression in bitrot caused by this patch.

  Change-Id: I04b6b9d0b51a1cfb86ecac3c3d87a5f388cf5800
  BUG: 1448364
  Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
  Reviewed-on: https://review.gluster.org/15036
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
* core: make the per glusterfs_ctx_t timer-wheel refcounted | Niels de Vos | 2017-05-01 | 1 | -6/+12
  xlators can use a 'global' timer-wheel for scheduling events. This timer-wheel is managed per glusterfs_ctx_t, but does not need to be allocated for every graph. When an xlator wants to use the timer-wheel, it will be instantiated on demand, and provided to xlators that request it later on.

  By adding a reference counter to the glusterfs_ctx_t for the timer-wheel, the threads and structures can be cleaned up when the last xlator does not have a need for it anymore. In general, the xlators request the timer-wheel in init(), and they should return it in fini().

  Because the timer-wheel is managed per glusterfs_ctx_t, the functions can be added to ctx.c and do not need to live in their very minimal tw.[ch] files.

  Change-Id: I19d225b39aaa272d9005ba7adc3104c3764f1572
  BUG: 1442788
  Reported-by: Poornima G <pgurusid@redhat.com>
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: https://review.gluster.org/17068
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Amar Tumballi <amarts@redhat.com>
  Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
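The on-demand, reference-counted lifetime described in this commit can be sketched in a few lines. This is only an illustration of the pattern, not the code in ctx.c: the class names and the `tw_get()`/`tw_put()` methods are hypothetical stand-ins chosen for this sketch.

```python
import threading


class TimerWheel:
    """Stand-in for the real timer-wheel; only tracks whether it is alive."""

    def __init__(self):
        self.alive = True

    def destroy(self):
        self.alive = False


class Context:
    """Hypothetical analogue of glusterfs_ctx_t owning one shared wheel."""

    def __init__(self):
        self._lock = threading.Lock()
        self._wheel = None
        self._refs = 0

    def tw_get(self):
        # Create the wheel on first request, then hand out the same one.
        with self._lock:
            if self._wheel is None:
                self._wheel = TimerWheel()
            self._refs += 1
            return self._wheel

    def tw_put(self):
        # Drop one reference; tear the wheel down when nobody needs it.
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("tw_put() without matching tw_get()")
            self._refs -= 1
            if self._refs == 0:
                self._wheel.destroy()
                self._wheel = None
```

In this sketch an xlator would call `tw_get()` from its init() and `tw_put()` from its fini(), mirroring the pairing the commit message recommends.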
* feature/dht: Directory synchronization | Kotresh HR | 2017-04-26 | 1 | -0/+1
  Design doc: https://review.gluster.org/16876

  Directory creation is now synchronized with blocking inodelk of the parent on the hashed subvolume followed by the entrylk on the hashed subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup selfheal mkdir.

  To maintain internal consistency of directories across all subvols of dht, we need locks. Specifically we are interested in:

  1. Consistency of layout of a directory. Only one writer should modify the layout at a time. A writer (layout setting during directory heal as part of lookup) shouldn't modify the layout while there are readers (all other fops like create, mkdir etc., which consume layout) and readers shouldn't read the layout while a writer is in progress. Readers can read the layout simultaneously. Writer takes a WRITE inodelk on the directory (whose layout is being modified) across ALL subvols. Reader takes a READ inodelk on the directory (whose layout is being read) on ANY subvol.

  2. Consistency of directory namespace across subvols. The path and associated gfid should be same on all subvols. A gfid should not be associated with more than one path on any subvol. All fops that can change directory names (mkdir, rmdir, renamedir, directory creation phase in lookup-heal) take an entrylk on hashed subvol of the directory.

  NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a directory, the transaction itself is a consumer of layout on parent directory. So, the transaction is a reader of parent layout and does an inodelk on parent directory just like any other layout reader. So a mkdir (dir/subdir) would:
    > Acquire a READ inodelk on "dir" on any subvol.
    > Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
    > Create the directory on hashed subvol and possibly on non-hashed subvols.
    > UNLOCK (entrylk)
    > UNLOCK (inodelk)

  NOTE2: mkdir fop while setting the layout of the directory being created is considered as a reader, but NOT a writer. The reason is that, for a fop which can consume the layout of a directory to arrive, one of the following conditions has to be true:
    > mkdir syscall from application has to complete. In this case no need of synchronization.
    > A lookup issued on the directory racing with mkdir has to complete. Since layout setting by a lookup is considered as a writer, only one of either mkdir or lookup will set the layout.

  Code re-organization: All the lock related routines are moved to "dht-lock.c" file. A new wrapper function, 'dht_protect_namespace', is introduced to take blocking inodelk followed by entrylk.

  Updates #191
  Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
  Signed-off-by: Kotresh HR <khiremat@redhat.com>
  BUG: 1443373
  Reviewed-on: https://review.gluster.org/15472
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
* xlator: do not call dlclose() when debugging | Niels de Vos | 2017-04-07 | 1 | -0/+5
  Valgrind cannot show the symbols of a .so after dlclose() has been called. The unhelpful ??? in the output gets resolved properly with this change:

    ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324
    ==25170== at 0x4C29975: calloc (vg_replace_malloc.c:711)
    ==25170== by 0x52C7C0B: __gf_calloc (mem-pool.c:117)
    ==25170== by 0x12B0638A: ???
    ==25170== by 0x528FCE6: __xlator_init (xlator.c:472)
    ==25170== by 0x528FE16: xlator_init (xlator.c:498)
    ==25170== by 0x52DA8D6: glusterfs_graph_init (graph.c:321)
    ==25170== by 0x52DB587: glusterfs_graph_activate (graph.c:695)
    ==25170== by 0x5046407: glfs_process_volfp (glfs-mgmt.c:79)
    ==25170== by 0x5043B9E: glfs_volumes_init (glfs.c:281)
    ==25170== by 0x5044FEC: glfs_init_common (glfs.c:986)
    ==25170== by 0x50451A7: glfs_init@@GFAPI_3.4.0 (glfs.c:1031)

  By not calling dlclose(), the dynamically loaded .so is still available upon program exit, and Valgrind is able to resolve the symbols. This will add an additional leak, so dlclose() is called for normal builds, but skipped when configuring with "./configure --enable-valgrind" or passing the "run-with-valgrind" xlator option.

  URL: http://valgrind.org/docs/manual/faq.html#faq.unhelpful
  Change-Id: I2044e21b1b8fcce32ad1a817fdd795218f967731
  BUG: 1425623
  Signed-off-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-on: https://review.gluster.org/16809
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* core: run many bricks within one glusterfsd process | Jeff Darcy | 2017-01-30 | 1 | -1/+4
  This patch adds support for multiple brick translator stacks running in a single brick server process. This reduces our per-brick memory usage by approximately 3x, and our appetite for TCP ports even more. It also creates potential to avoid process/thread thrashing, and to improve QoS by scheduling more carefully across the bricks, but realizing that potential will require further work.

  Multiplexing is controlled by the "cluster.brick-multiplex" global option. By default it's off, and bricks are started in separate processes as before. If multiplexing is enabled, then *compatible* bricks (mostly those with the same transport options) will be started in the same process.

  Change-Id: I45059454e51d6f4cbb29a4953359c09a408695cb
  BUG: 1385758
  Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
  Reviewed-on: https://review.gluster.org/14763
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
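The placement rule in this commit (separate processes by default, compatible bricks sharing a process when multiplexing is on) can be illustrated roughly as follows. The function name and the data shapes are hypothetical, and treating "identical transport options" as the entire compatibility test is a simplifying assumption; the message only says compatibility is "mostly" about transport options.

```python
from collections import defaultdict


def assign_processes(bricks, multiplex):
    """Return a list of brick-name groups, one group per server process.

    `bricks` is a list of (name, transport_options) pairs, where
    transport_options is a dict. Equality of those options is the only
    compatibility check in this sketch (an assumption, see lead-in).
    """
    if not multiplex:
        # Default behaviour: every brick gets its own process.
        return [[name] for name, _ in bricks]

    groups = defaultdict(list)
    for name, opts in bricks:
        key = tuple(sorted(opts.items()))  # hashable view of the options
        groups[key].append(name)
    return list(groups.values())
```

With multiplexing enabled, two bricks that both use `{"transport": "tcp"}` end up in one group, while a brick with different options gets a group of its own.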
* tier : Tier as a service | hari gowtham | 2017-01-16 | 1 | -0/+2
  tierd is implemented by separating it from the rebalance process. The commands affected:
  1) Attach tier will trigger this process instead of the old one.
  2) tier start and tier start force will also trigger this process.
  3) volume status [tier] will show the tier daemon as a process instead of a task, and normal tier status and tier detach status work.
  4) tier stop implemented.
  5) detach tier implemented separately along with new detach tier status.
  6) volume tier volname status will work using the changes.
  7) volume set works.

  This patch has separated the tier translator from the legacy DHT rebalance code. It now sends the RPCs from the CLI to glusterd separately from the DHT rebalance code. The daemon is now a service, similar to the snapshot daemon, and can be viewed using the volume status command.

  The code for the validation and commit phase are the same as the earlier tier validation code in DHT rebalance. The “brickop” phase has been changed so that the status command can use this framework.

  The service management framework is now used. DHT rebalance does not use this framework. This service framework takes care of:
  *) spawning the daemon, killing it and other such processes.
  *) volume set options, which are written on the volfile.
  *) restart and reconfigure functions. Restart is to restart the daemon at two points: 1) after gluster goes down and comes up, 2) to stop detach tier.
  *) reconfigure is used to make immediate volfile changes. By doing this, we don’t restart the daemon. It has the code to rewrite the volfile for topological changes too (which comes into place during add and remove brick).

  With this patch the log, pid, and volfile are separated and put into respective directories.

  Change-Id: I3681d0d66894714b55aa02ca2a30ac000362a399
  BUG: 1313838
  Signed-off-by: hari gowtham <hgowtham@redhat.com>
  Reviewed-on: http://review.gluster.org/13365
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Tested-by: hari gowtham <hari.gowtham005@gmail.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Dan Lambright <dlambrig@redhat.com>
  Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* ec: Invalidations in disperse volume should not update the stat | Poornima G | 2017-01-05 | 1 | -4/+0
  Issue: In a disperse volume, the file is present across bricks, hence the stat from one brick doesn't carry the valid size of the file. Therefore the upcall from one brick updating the md-cache results in a wrong size being updated.

  Fix: If the notification is a cache invalidation, indicate to md-cache that the attributes are invalid.

  BUG: 1410375
  Change-Id: Id89d2283478e70b62b435a8891fffc86d2be8cb2
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/16329
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* md-cache, afr: Reduce the window of stale read | Poornima G | 2016-10-20 | 1 | -0/+3
  Problem: Consider a replica setup, where one mount writes data to a file and the other mount reads the file. In afr, read operations are not transaction based: a brick (read subvolume) is chosen as a part of lookup or other operations, and read is always wound only to the read subvolume, even if there was a write from a different client that failed on this brick. This stale read continues until there is a lookup or any write operation from the mount point. Currently, this is not a major issue, as a lookup is issued before every read and it will switch the read subvolume to a correct one. But with the plan of increasing md-cache timeout to 600s, the stale read problem will be more pronounced, i.e. stale read can continue for 600s (or more if cascaded with readdirp), as there will be no lookups.

  Solution: Afr doesn't have any built-in solution for stale read (without affecting the performance). The solution that came up was to use upcall. When a file on any brick is marked bad for the first time, upcall sends a notification to all the clients that had recently accessed the file. The solution has 2 parts:
  - Identifying when a file is marked bad, on any of the bricks, for the first time
  - Client side actions on receiving the notifications

  Identifying when a file is marked bad on any of the bricks for the first time:
  -----------------------------------------------------------------------------
  The idea is to track xattrop in upcall. xattrop currently comes with 2 afr xattrs - afr dirty bit and afr pending xattrs. Dirty xattr is set to 1 before every write, and is unset if the write succeeds. In certain scenarios, dirty xattr can be 0 and still the file could be a bad copy. Hence do not track dirty xattr. Pending xattr is set on the good copy, indicating the other bricks that have a bad copy. It is still not as simple as notifying when any of the pending xattrs change. It could lead to a flood of notifications, in case the other brick is completely down or consistently failing. Hence it is important to notify only once, the first time a good copy is marked bad.

  Client side actions on receiving pending xattr change notification:
  --------------------------------------------------------------------
  md-cache will invalidate the cache of that file, so that further lookup is passed down to afr and hence updates the read subvolume. Invalidating only in md-cache is not enough; consider the following order of operations:
  - pending xattr invalidation
  - invalidate md-cache
  - readdirp on the bad read subvolume
  - fill md-cache
  - lookup (served from md-cache)
  - read - wound to the old read subvol.
  Hence, along with invalidating md-cache, it is very important to reset the read subvolume for that file, in afr.

  Design Credit: Anuradha Talur, Ravishankar N

  1. xattrop doesn't carry info saying post op/pre op.
  2. Pre xattrop will have 0 value for all pending xattrs; the cbk of pre xattrop carries the on-disk xattr value. A non-zero value indicates healing is required.
  3. Post xattrop will have non zero value for any of the pending xattrs, if the fop failed on any of the bricks.

  Change-Id: I469cbc111714c433984fe1c922be2ef113c25804
  BUG: 1211863
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/15398
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
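The "notify only once" rule from this commit can be pictured as edge detection on the pending xattrs: a notification goes out on the transition from "no copy marked bad" to "some copy marked bad", and not on later changes while the file stays bad. The sketch below is a minimal model of that idea; the class, its method, and the per-file state it keeps are assumptions made for illustration, since the message does not say how upcall records this.

```python
class PendingTracker:
    """Decides when a pending-xattr update deserves a notification."""

    def __init__(self):
        self._bad = {}  # file id -> True once some copy is known bad

    def on_xattrop(self, gfid, pending):
        """`pending` maps brick name -> pending counter after the xattrop.

        Returns True only on the transition from "no copy marked bad"
        to "at least one copy marked bad".
        """
        now_bad = any(v != 0 for v in pending.values())
        was_bad = self._bad.get(gfid, False)
        self._bad[gfid] = now_bad
        return now_bad and not was_bad
```

A brick that keeps failing therefore produces a single notification rather than one per write, which is the flood the message warns about.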
* glusterfsd/main: fix OOM adjustment for older kernels | Oleksandr Natalenko | 2016-10-11 | 1 | -1/+5
  Milind Changire reported that GlusterFS fails to build on RHEL5 because linux/oom.h is unavailable. Milind's initial patch disables OOM adjustment completely for those environments that do not have this header.

  However, I'd take another approach that:
  1) checks for linux/oom.h at compile time and defines the necessary constants if the header is not present;
  2) checks for the available OOM API in /proc at run time and uses it accordingly.

  This allows OOM to be adjusted properly on RHEL5 (the kernel is new enough to provide a /proc API for that) as well as RHEL6 (the kernel has many things backported, including the new /proc API).

  Change-Id: I1bc610586872d208430575c149a7d0c54bd82370
  BUG: 1379769
  Signed-off-by: Oleksandr Natalenko <onatalen@redhat.com>
  Reviewed-on: http://review.gluster.org/15587
  Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
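The run-time half of this approach amounts to probing /proc for whichever OOM knob the running kernel offers. A rough sketch follows, assuming the two interfaces in question are the Linux files `oom_score_adj` (newer) and `oom_adj` (older) under `/proc/self`; the function name and the `proc_self` parameter are hypothetical and exist only so the probe can be pointed at a test directory.

```python
import os


def pick_oom_file(proc_self="/proc/self"):
    """Return (path, kind) for the OOM knob this kernel exposes, or None."""
    newer = os.path.join(proc_self, "oom_score_adj")
    older = os.path.join(proc_self, "oom_adj")
    # Prefer the newer interface when both are present.
    if os.path.exists(newer):
        return newer, "oom_score_adj"
    if os.path.exists(older):
        return older, "oom_adj"
    return None
```

A caller would then write a value in the range that matches the returned kind, since the two files use different scales.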
* dht, md-cache, upcall: Add invalidation of IATT when the layout changes | Poornima G | 2016-08-30 | 1 | -0/+4
  Issue: dht_layout is built as a part of lookup only. The layout can be modified by the rebalance process. Since every IO fop is preceded by a lookup, there are very few issues of stale layout. But with enhancements of aggressive caching of stats in md-cache, lookups will become rarer and expose the stale layout issue more often.

  Solution: Since stale layout is already an issue on dht, there is already a plan to fix this at the dht layer, but this fix is not currently planned for any release. Until this fix comes out, we can have a workaround where the upcall will send a notification to md-cache when a layout xattr is changed. As a part of layout change notification the existing cache is invalidated and the next lookup will fetch the latest layout.

  This is not a foolproof solution, as there is still a window between the layout change and the next lookup (after invalidation of stat) during which the layout will be stale. But until the final fix comes in, this reduces the stale layout window.

  Change-Id: Iacf871a38b35880c1fc0bc68fe7ce291265e71d4
  BUG: 1369638
  Signed-off-by: Poornima G <pgurusid@redhat.com>
  Reviewed-on: http://review.gluster.org/15300
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* eventsapi: Fix disable-events issue | Aravinda VK | 2016-08-24 | 1 | -4/+0
  Events related sources are not loaded in libglusterfs when configure is run with the --disable-events option. Due to this, every call of gf_event should be guarded with the USE_EVENTS macro. To prevent this, the USE_EVENTS macro was included in events.c itself (Patch #15054).

  Instead of disabling the build of the entire "events" directory, the code is selectively disabled, so that the constants and an empty gf_event function are exposed. Code will not fail even if gf_event is called when events is disabled.

  BUG: 1368042
  Change-Id: Ia6abfe9c1e46a7640c4d8ff5ccf0e9c30c87f928
  Signed-off-by: Aravinda VK <avishwan@redhat.com>
  Reviewed-on: http://review.gluster.org/15198
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* eventsapi: Gluster Eventing Feature implementation | Aravinda VK | 2016-07-18 | 1 | -0/+4
  [Depends on http://review.gluster.org/14627]

  Design is available in `glusterfs-specs`. A change from the design is support of webhook instead of Websockets as discussed in the design http://review.gluster.org/13115

  Since Websocket support depends on REST APIs, I will add Websocket support once the REST APIs patch gets merged.

  Usage:
  Run the following command to start/stop the Eventsapi server in all Peers, which will collect the notifications from any Gluster daemon and emit them to the configured client.

    gluster-eventsapi start|stop|restart|reload

  Status of running services can be checked using,

    gluster-eventsapi status

  Events listener is an HTTP(S) server which listens to events emitted by Gluster. Create an HTTP Server to listen on POST and register that URL using,

    gluster-eventsapi webhook-add <URL> [--bearer-token <TOKEN>]

  For example, if the HTTP Server is running at `http://192.168.122.188:9000` then add that URL using,

    gluster-eventsapi webhook-add http://192.168.122.188:9000

  If it expects a Token then specify it using `--bearer-token` or `-t`

  We can also test whether all peer nodes can send messages to the Webhook using,

    gluster-eventsapi webhook-test <URL> [--bearer-token <TOKEN>]

  Configurations can be viewed/updated using,

    gluster-eventsapi config-get [--name]
    gluster-eventsapi config-set <NAME> <VALUE>
    gluster-eventsapi config-reset <NAME|all>

  If any one peer node was down during config-set/reset or webhook modifications, run the sync command from a good node when the peer node comes back. Automatic update is not yet implemented.

    gluster-eventsapi sync

  A basic Events Client (HTTP Server) is included with the code. Start running the client with the required port and start listening to the events.

    /usr/share/glusterfs/scripts/eventsdash.py --port 8080

  Default port is 9000 if no port is specified. Once it has started running, configure gluster-eventsapi to send events to that client.

  The Eventsapi Client can be outside of the Cluster; it can even be run on Windows. The only requirement is that the client URL should be accessible by all peer nodes. (Or ngrok (https://ngrok.com) like tools can be used.)

  Events implemented with this patch,
  - Volume Create
  - Volume Start
  - Volume Stop
  - Volume Delete
  - Peer Attach
  - Peer Detach

  It is easy to add/support more events; since it touches Gluster cmd code, and to avoid merge conflicts, I will add support for more events once this patch merges.

  BUG: 1334044
  Change-Id: I316827ac9dd1443454df7deffe4f54835f7f6a08
  Signed-off-by: Aravinda VK <avishwan@redhat.com>
  Reviewed-on: http://review.gluster.org/14248
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
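A webhook listener of the kind this commit asks users to register can be very small. The following is not the bundled eventsdash.py, only a sketch built on Python's standard `http.server`. It assumes events arrive as JSON in the POST body (the message does not specify the payload format), and the names `EventHandler`, `make_server` and `received` are made up for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # events seen so far, newest last


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))  # assumed JSON payload
        except ValueError:
            # Fall back to raw text if the body is not valid JSON.
            received.append(body.decode("utf-8", "replace"))
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=9000, host="0.0.0.0"):
    """Build (but do not start) a listener; 9000 mirrors the default above."""
    return HTTPServer((host, port), EventHandler)
```

Calling `make_server().serve_forever()` would start listening; the listener's URL can then be registered with `gluster-eventsapi webhook-add <URL>` as described in the commit message.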
* build: fix noisy compilation warning with current build option Wstrict-prototypes | Zhou Zhengping | 2016-07-12 | 1 | -1/+1
  Change-Id: I50904033aa2beb880dee828849f470ac31048a79
  BUG: 1354221
  Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
  Reviewed-on: http://review.gluster.org/14884
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* afr, index: Clean up stale directory and file indices in granular entry sh | Krutika Dhananjay | 2016-07-11 | 1 | -0/+2
  Specifically when a directory tree is removed (rm -rf) while a brick is down, both the directory index and the name indices of the files and subdirs under it will remain. Self-heal will need to pick up these and remove them. Towards this, afr sh will now also crawl indices/entry-changes and call an rmdir on the dir if the directory index is stale.

  On the brick side, rmdir fop has been implemented for index xl, which would delete the directory index and its contents if present in a synctask.

  Change-Id: I8b527331c2547e6c141db6c57c14055ad1198a7e
  BUG: 1331323
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/14832
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* io-stats: Fix io-stat dump to dump at all levels | Shyam | 2016-06-15 | 1 | -0/+1
  The previous commit to fix the bug where io-stat-dump was overwriting the dump file when the client and a brick were on the same host failed to consider the existing behaviour where io-stats can help generate a closely correlated set of stats across clients and bricks, by triggering the dump using the same command. This was introduced in commit: 0facb11220aea20a6573b656785922219c9650cf

  Further, by limiting the first io-stat to unwind the dump request, there is no way to trigger other io-stat xlators in the stack to dump their stat information.

  This bug hence is being fixed by this commit keeping the following in mind,
  - We need to trigger io-stat-dump for all instances in the graph when this attr is set
  - We need to write the output to different files, so that they do not overwrite each other's data
  - We need to prevent this xattr from being set on the path that is used to trigger the io-stat-dump information

  Change-Id: I31ec380f0d85e10313a9d7b977da0e1ec74638a6
  BUG: 1322825
  Signed-off-by: Shyam <srangana@redhat.com>
  Reviewed-on: http://review.gluster.org/14552
  Smoke: Gluster Build System <jenkins@build.gluster.org>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Reviewed-by: Ravishankar N <ravishankar@redhat.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* posix/lock: implement meta-lock/unlock functionality | Susant Palai | 2016-06-03 | 1 | -0/+2
  Problem: The lock state needs to be protected when rebalance is reading the lock state on the source. Otherwise there will be locks left unmigrated. Hence, to synchronize incoming lock requests with lock-migration, a meta lock is needed. Any new lock request will be queued if there is an active meta-lock and, with successful lock migration, will be unwound with EREMOTE, so that the dht module can wind the request to the correct destination.

  On a successful lock migration, the "pl_inode->migrated" flag is enabled. Hence, any further request would be unwound with EREMOTE and will be redirected to the new destination.

  More details can be found here: https://github.com/gluster/glusterfs-specs/blob/master/accepted/Lock-Migration.md

  design discussion: https://www.gluster.org/pipermail/gluster-devel/2016-January/048088.html

  Change-Id: Ief033d5652b5ca4ba6f499110a521cae283d6aba
  BUG: 1331720
  Signed-off-by: Susant Palai <spalai@redhat.com>
  Reviewed-on: http://review.gluster.org/14251
  Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
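The queue-then-redirect behaviour described in this commit can be modelled with a small state machine. This is a toy, not the posix/locks implementation: the class and method names are hypothetical, only the `migrated` flag and the EREMOTE result come from the message, and handling parked requests locally after a failed migration is an assumption of the sketch.

```python
import errno


class PlInode:
    """Toy lock state for one inode; names loosely follow the message."""

    def __init__(self):
        self.meta_locked = False
        self.migrated = False
        self.queue = []    # requests parked while the meta-lock is held
        self.granted = []  # requests handled locally

    def lock_request(self, req):
        """Return 0 if handled, errno.EREMOTE if redirected, None if queued."""
        if self.migrated:
            return errno.EREMOTE
        if self.meta_locked:
            self.queue.append(req)
            return None
        self.granted.append(req)
        return 0

    def meta_lock(self):
        self.meta_locked = True

    def meta_unlock(self, migration_ok):
        """Release the meta-lock; return [(req, result)] for parked requests."""
        self.meta_locked = False
        self.migrated = migration_ok
        parked, self.queue = self.queue, []
        # Replay parked requests against the new state (EREMOTE if migrated).
        return [(req, self.lock_request(req)) for req in parked]
```

A request that arrives while the meta-lock is held gets no answer until `meta_unlock()` runs; after a successful migration both the parked request and any later one come back with EREMOTE, so the caller can retry at the new destination.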
* libglusterfs (timer): race conditions, illegal mem access, mem leak | Kaleb S KEITHLEY | 2016-06-01 | 1 | -1/+2
  While investigating gfapi memory consumption with valgrind, valgrind reported several memory access issues. Also see the timer 'registry' being recreated (shortly) after being freed during teardown due to the way it's currently written.

  Passing ctx as data to gf_timer_proc() is prone to memory access issues if ctx is freed before gf_timer_proc() terminates. (And in fact this does happen, at least in valgrind.) gf_timer_proc() doesn't need ctx for anything, it only needs ctx->timer, so just pass that.

  Nothing ever calls gf_timer_registry_init(). Nothing outside of timer.c that is. Making it and gf_timer_proc() static.

  Change-Id: Ia28454dda0cf0de2fec94d76441d98c3927a906a
  BUG: 1333925
  Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
  Reviewed-on: http://review.gluster.org/14247
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Poornima G <pgurusid@redhat.com>
  Reviewed-by: Niels de Vos <ndevos@redhat.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* glusterfsd/main: Add ability to set oom_score_adj | Oleksandr Natalenko | 2016-06-01 | 1 | -0/+1
  Give the administrator the ability to set oom_score_adj for the glusterfs process. Applies to Linux only.

  Change-Id: Iff13c2f4cb28457871c6ebeff6130bce4a8bf543
  BUG: 1336818
  Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>
  Reviewed-on: http://review.gluster.org/14399
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* features/shard: Get hard-link-count in {unlink,rename}_cbk before deleting shards | Krutika Dhananjay | 2016-05-20 | 1 | -0/+1
  Change-Id: I0606b74f11f5412c4d9af44a6505635ed9022c15
  BUG: 1335858
  Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
  Reviewed-on: http://review.gluster.org/14334
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* core: Honour mandatory lock flags during lock migration | Anoop C S | 2016-05-20 | 1 | -0/+1
  lk_flags from the posix_lock_t structure is the primary key used to differentiate locks as either advisory or mandatory. During lock migration this field is not read in the getactivelk() call path. So in order to copy the exact lock state from source to destination it is necessary to include lk_flags within the lock_migration_info_t structure to maintain accurate state.

  This change also includes minor modifications to the setactivelk() call to consider lk_flags during lock migration.

  Change-Id: I20a7b6b6a0f3bdac5734cce8a2cd2349eceff195
  BUG: 1332501
  Signed-off-by: Anoop C S <anoopcs@redhat.com>
  Reviewed-on: http://review.gluster.org/14189
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Susant Palai <spalai@redhat.com>
  Reviewed-by: Poornima G <pgurusid@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* readdir-ahead: Prefetch xattrs needed by md-cache | Prashanth Pai | 2016-05-10 | 1 | -0/+1
  Problem: The negative cache feature implementation in md-cache requires xattrs returned by posix to be intercepted for every call that can possibly return xattrs. This includes readdirp(). This is crucial to treat missing keys in cache as a case of negative entry (returns ENODATA).

  md-cache puts names of xattrs that it wants to cache in xdata and passes it down to posix which returns the specified xattrs in the callback. This is done in lookup() and readdirp(). Hence, a xattr that is cached can be invalidated during readdirp_cbk too. This is based on the assumption that readdirp() will always return all xattrs that md-cache is interested in. However, this is not the case when the readdirp() call is served from readdir-ahead's cache.

  The readdir-ahead xlator will pre-fetch dentries during opendir_cbk and readdirp. These internal readdirp() calls made by the readdir-ahead xlator do not set xdata in its requests. Hence, no xattrs are fetched and stored in its internal cache.

  This causes metadata loss in gluster-swift. md-cache returns ENODATA during getxattr() call even though the xattr for that object exists on the brick. On receiving ENODATA, gluster-swift will create new metadata and do setxattr(). This results in loss of information stored in the existing xattr.

  Fix: During opendir, md-cache will communicate to readdir-ahead asking it to store the names of xattrs it's interested in so that readdir-ahead can fetch those in all subsequent internal readdirp() calls issued by it. These stored names of xattrs are invalidated/updated on the next real readdirp() call issued by the application. This readdirp() call will have xdata set correctly by the md-cache xlator.

  BUG: 1333023
  Change-Id: I32d46f93a99d4ec34c741f3c52b0646d141614f9
  Reviewed-on: http://review.gluster.org/14214
  Tested-by: Prashanth Pai <ppai@redhat.com>
  NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
  CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
  Tested-by: Gluster Build System <jenkins@build.gluster.com>
  Smoke: Gluster Build System <jenkins@build.gluster.com>
  Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
  Tested-by: Raghavendra G <rgowdapp@redhat.com>
* features/locks: Implement mandatory locks | Anoop C S | 2016-05-02 | 1 | -1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initial change to fix/enable the mandatory locking support in GlusterFS as per the following design: https://review.gluster.org/#/c/12014/ Accordingly 'locks.mandatory-locking' option is available as part of this change which will accept one among the following values: * off * file * forced * optimal See design doc for more details Change-Id: I14c489b3f8af5ebcbfa155a03f0c175e9558ac46 BUG: 762184 Signed-off-by: Anoop C S <anoopcs@redhat.com> Reviewed-on: http://review.gluster.org/9768 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* core: add getactivelk () fop | Susant Palai | 2016-05-01 | 1 | -0/+9
| | | | | | | | | | | Change-Id: Ifd0ff278dcf43da064021f5c25e5dcd34347fcde BUG: 1326085 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/13970 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* afr/index: changes for granular entry self-heal | Ravishankar N | 2016-04-30 | 1 | -0/+2
| | | | | | | | | | | | | | | | Implements new indices type ENTRY_CHANGES where other xlators can add/delete names. Change-Id: I01c5568997085e11d22ba36a4376c70b78fb3827 BUG: 1269461 Signed-off-by: Ravishankar N <ravishankar@redhat.com> Reviewed-on: http://review.gluster.org/12482 Tested-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/afr: Entry self-heal performance enhancements | Krutika Dhananjay | 2016-04-29 | 1 | -0/+3
| | | | | | | | | | | Change-Id: I52da41dff5619492b656c2217f4716a6cdadebe0 BUG: 1269461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/12442 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* rpc: split FOPs enum from glusterfs.h | Niels de Vos | 2016-04-28 | 1 | -186/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Moving the enumeration of FOPs and some of the other parts that are defining the network protocol to the rpc/xdr/ section. These structures need some care when modifications are made, moving them out of the common glusterfs.h header helps with that. The protocol definition structures are generated in a new glusterfs-fops header. This file is present in rpc/xdr/src/ and libglusterfs/src/, it is a little ugly, but prevents the need to update all Makefile.am files with the additional -I option for finding the new header file. The generation of the .c and .h files from the .x descriptions needed small modifications to accommodate these changes. The build/xdrgen script was improved slightly for this. The .c and .h files are incorrectly in the $(top_srcdir), instead of $(top_builddir). This is an existing issue, and bug 1330604 has been filed to get that addressed. Change-Id: I98fc8cf7e4b631082c7b203b5a0a77111bec1fb9 BUG: 1328502 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14032 Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* Swap order of GF_EVENT_SOME_CHILD_DOWN enum | Ravishankar N | 2016-04-27 | 1 | -1/+1
| | | | | | | | | | | | | | | | GF_EVENT_SOME_CHILD_DOWN value seems to be mismatching between master and 3.7. Fix the master since 3.7 is a release branch and GF_EVENT_SOME_CHILD_DOWN was added newly and hence should be in the end in the enum list. Signed-off-by: Ravishankar N <ravishankar@redhat.com> Change-Id: I1f758550d6300f6750d1574302096d8e7f493549 BUG: 1330974 Reviewed-on: http://review.gluster.org/14092 Tested-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* performance/decompounder: Introducing decompounder xlator | Anuradha Talur | 2016-04-25 | 1 | -1/+3
| | | | | | | | | | | | | | This xlator decompounds the compound fops received, and executes them serially. Change-Id: Ieddcec3c2983dd9ca7919ba9d7ecaa5192a5f489 BUG: 1303829 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/13577 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* cluster/distribute: detect stale layouts in entry fops | Raghavendra G | 2016-04-22 | 1 | -0/+2
dht_mkdir ()
{
      first-hashed-subvol = hashed-subvol for "bname" in in-memory layout of "parent";
      inodelk (SETLKW, parent, "LAYOUT_HEAL_DOMAIN", "can be any subvol, but we choose first-hashed-subvol randomly");
      {
begin:
            hashed-subvol = hashed-subvol for "bname" in in-memory layout of "parent";
            hash-range = extract hash-range from layout of "parent";

            ret = mkdir (parent/bname, hashed-subvol, hash-range);
            if (ret == "hash-value doesn't fall into layout stored on the brick (this error is returned by posix-mkdir)")
            {
                  refresh_parent_layout ();
                  goto begin;
            }
      }
      inodelk (UNLCK, parent, "LAYOUT_HEAL_DOMAIN", "first-hashed-subvol");

      proceed with other parts of dht_mkdir;
}

posix_mkdir (parent/bname, client-hash-range)
{
      disk-hash-range = getxattr (parent, "dht-layout-key");
      if (disk-hash-range != client-hash-range) {
            fail-with-error ("hash-value doesn't fall into layout stored on the brick");
            return 0;
      }

      continue-with-posix-mkdir;
}

Similar changes need to be done for dentry operations like create, symlink, link, unlink, rmdir, rename. These will be addressed in subsequent patches. This patch addresses only the mkdir codepath.

This change breaks stripe tests, as on some striped subvols dht layout xattrs are not set for some reason. This results in failure of mkdir. Since striped volumes are always created with dht, some tests associated with stripe also fail.

So, I am making the following test changes (since stripe is out of maintenance):
* modify ./tests/basic/rpc-coverage.t to not use striped volumes
* mark all (2) tests in tests/bugs/stripe/ as bad tests

Change-Id: Idd1ae879f24a48303dc743c1bb4d91f89a629e25
BUG: 1323040
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/13885
Smoke: Gluster Build System <jenkins@build.gluster.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
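The retry-on-stale-layout idea in the mkdir pseudocode above can be illustrated with a small, self-contained C sketch. This is not the DHT or posix xlator code: `struct hash_range`, `ESTALE_LAYOUT`, `brick_mkdir()` and `client_mkdir()` are hypothetical names, the inodelk calls are omitted, and the `max_attempts` bound is an assumption added here (the pseudocode retries with an unbounded `goto begin`).

```c
#include <stdint.h>

/* Hypothetical stand-in for a directory's DHT hash range. */
struct hash_range {
    uint32_t start;
    uint32_t stop;
};

/* Hypothetical error code for "hash-value doesn't fall into layout
 * stored on the brick". */
#define ESTALE_LAYOUT 1

/* Brick side: refuse the mkdir when the client's view of the parent
 * layout differs from what is stored on disk. */
int brick_mkdir(const struct hash_range *on_disk,
                const struct hash_range *client)
{
    if (on_disk->start != client->start || on_disk->stop != client->stop)
        return ESTALE_LAYOUT;
    return 0; /* continue with the real mkdir */
}

/* Client side: try the mkdir with the cached layout; on a stale-layout
 * error refresh the cached layout and retry. Returns the number of
 * attempts used, or -1 if max_attempts was exhausted. */
int client_mkdir(struct hash_range *cached,
                 const struct hash_range *on_disk,
                 int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (brick_mkdir(on_disk, cached) == 0)
            return attempt;
        *cached = *on_disk; /* stand-in for refresh_parent_layout () */
    }
    return -1;
}
```

With a stale cached range, `client_mkdir()` fails once, refreshes, and succeeds on the second attempt; with an up-to-date range it succeeds immediately.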
* features/marker: Fix dict_get errors when key is NULL | Kotresh HR | 2016-04-21 | 1 | -0/+1
| | | | | | | | | | | | | Change-Id: I25e497459441334c13af77b3fec83c42a7a92ac4 BUG: 1319581 Signed-off-by: Kotresh HR <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/13793 Smoke: Gluster Build System <jenkins@build.gluster.com> Tested-by: Vijaikumar Mallikarjuna <vmallika@redhat.com> Reviewed-by: Vijaikumar Mallikarjuna <vmallika@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Venky Shankar <vshankar@redhat.com>
* core: add lease fop | Poornima G | 2016-04-21 | 1 | -0/+26
| | | | | | | | | | | | | Change-Id: Ia27d66b1061b0377857827515590eb89b18515c9 BUG: 1319992 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/11596 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Raghavendra Talur <rtalur@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* quota: setting 'read-only' option in xdata to instruct DHT to not heal | Sakshi Bansal | 2016-04-19 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | | | When quota is enabled the quota enforcer tries to get the size of the source directory by sending nameless lookup to quotad. But if the rename is successful even on one subvol or the source layout has anomalies then this nameless lookup in quotad tries to heal the directory which requires a lock on as many subvols as it can. But src is already locked as part of rename. For rename to proceed in brick it needs to complete a cluster-wide lookup. But cluster-wide lookup in quotad is blocked on locks held by rename, hence a deadlock. To avoid this quota sends an option in xdata which instructs DHT not to heal. Change-Id: I792f9322331def0b1f4e16e88deef55d0c9f17f0 BUG: 1252244 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/13988 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* glusterd / afr : Enable auto heal when replica count increases | Anuradha Talur | 2016-03-21 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | | In replicate volumes, when a brick is added to a replicate group, heal to the new brick should be triggered. Also, the new brick should not be considered as source for healing till it is up to date. Previously, extended attributes had to be set manually on the bricks for this to happen. This patch is part 1 patch to automate this process. Change-Id: I29958448618372bfde23bf1dac5dd23dba1ad98f BUG: 1276203 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12451 Reviewed-by: Atin Mukherjee <amukherj@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com>
* fuse: Add a new mount option capability | Poornima G | 2016-03-07 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Originally all security.* xattrs were forbidden if selinux is disabled, which was causing Samba's acl_xattr module to not work, as it would store the NTACL in security.NTACL. To fix this http://review.gluster.org/#/c/12826/ was sent, which forbid only security.selinux. This opened up a getxattr call on security.capability before every write fop and others. Capabilities can be used without selinux, hence if selinux is disabled, security.capability cannot be forbidden. Hence adding a new mount option called capability. Only when "--capability" or "--selinux" mount option is used, security.capability is sent to the brick, else it is forbidden. Change-Id: I77f60e0fb541deaa416159e45c78dd2ae653105e BUG: 1309462 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/13540 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* core: add seek() FOP | Niels de Vos | 2016-01-31 | 1 | -0/+6
| | | | | | | | | | | | | | Minimal infrastructure changes for the seek() FOP. This will provide SEEK_HOLE and SEEK_DATA functionalities. BUG: 1220173 Change-Id: I4b74fce8b0bad2f45291fd2c2b9e243c4f4a1aa9 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11480 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* tier:delete the linkfile if data file creation fails | Mohammed Rafi KC | 2015-12-22 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | | | | If we are creating data file in a hot subvolume then we will create a linkfile in cold subvolume. Linkfile creation happens first. If linkfile creation was successful and data file creation failed, then linkfile in cold subvolume will become stale. This patch will delete the linkfile as well, if data file creation fails. Also this code duplicates dht_create to make tier_create Change-Id: I377a90dad47f288e9576c7323b23cf694a91a7a3 BUG: 1290677 Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com> Reviewed-on: http://review.gluster.org/12948 Reviewed-by: N Balachandran <nbalacha@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Dan Lambright <dlambrig@redhat.com> Tested-by: Dan Lambright <dlambrig@redhat.com>
* tier:unlink during migration | Mohammed Rafi KC | 2015-12-16 | 1 | -0/+1
Files deleted during promotion were not being removed, as the files are moving from hashed to non-hashed. On deleting a file that is undergoing promotion, the unlink call is not sent to the dst file as the hashed subvol == cached subvol. This causes the file to reappear once the migration is complete.

This patch also fixes a problem with stale linkfile deletion.

Change-Id: I4b02a498218c9d8eeaa4556fa4219e91e7fa71e5
BUG: 1282390
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/12829
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
* glusterd: cli command implementation for bitrot scrub status | Gaurav Kumar Garg | 2015-11-19 | 1 | -0/+1
CLI command for bitrot scrub status will be:
    gluster volume bitrot <volname> scrub status

The above command shows the statistics of the bitrot scrubber. On execution it shows some common scrubber tunable values of volume <VOLNAME>, followed by scrubber statistics of individual nodes.

Sample output for a single node:

Volume name : <VOLNAME>
State of scrub: Active
Scrub frequency: biweekly
Bitrot error log location: /var/log/glusterfs/bitd.log
Scrubber error log location: /var/log/glusterfs/scrub.log
=========================================================
Node name:
Number of Scrubbed files:
Number of Unsigned files:
Last completed scrub time:
Duration of last scrub:
Error count:
=========================================================

This is just infrastructure. The list of bad files, last scrub time, and error count values will be taken care of by the http://review.gluster.org/#/c/12503/ and http://review.gluster.org/#/c/12654/ patches.

Change-Id: I3ed3c7057c9d0c894233f4079a7f185d90c202d1
BUG: 1207627
Signed-off-by: Gaurav Kumar Garg <ggarg@redhat.com>
Reviewed-on: http://review.gluster.org/10231
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd/afr : Readdirp performance improvement | Anuradha Talur | 2015-11-18 | 1 | -0/+2
| | | | | | | | | | | | | | | Add xlator options to index xlator with xattrs that it needs to keep track of. Change-Id: If818673be5e626f77e65cc3a340f8cdd624179c2 BUG: 1250803 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/12467 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* afr: replica pair going offline does not require CHILD_MODIFIED event | Sakshi Bansal | 2015-11-16 | 1 | -0/+1
| | | | | | | | | | | | | | | | | As a part of CHILD_MODIFIED event DHT forgets the current layout and performs fresh lookup. However this is not required when a replica pair goes offline as the xattrs can be read from other replica pairs. Hence setting different event to handle replica pair going down. Change-Id: I5ede2a6398e63f34f89f9d3c9bc30598974402e3 BUG: 1281230 Signed-off-by: Sakshi Bansal <sabansal@redhat.com> Reviewed-on: http://review.gluster.org/12573 Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Susant Palai <spalai@redhat.com> Tested-by: NetBSD Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* marker: do remove xattr only for last link | vmallika | 2015-11-09 | 1 | -3/+3
With unlink, rename and rmdir, contribution xattrs are removed. If the file is the last link then remove_xattr will fail with ENOENT. So it is better to perform remove_xattr only if there are more links to the file.

Change-Id: Ifc1e7fda4d310fd87f6f28a635c9ea78b8f3929d
BUG: 1257694
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/12033
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* quota: add version to quota xattrs | vmallika | 2015-11-02 | 1 | -3/+4
When quota is disabled and the clean-up process terminates without completely cleaning up the quota xattrs, enabling quota again can mess up the accounting.

A version number is suffixed to all quota xattrs. This version number is specific to the marker xlator, i.e. when quota xattrs are requested by quotad/client, marker removes the version suffix from the key before sending the response.

Change-Id: I1ca2c11460645edba0f6b68db70d476d8d26e1eb
BUG: 1272411
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/12386
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
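As a rough illustration of the versioned-key idea in the quota commit above, the following C sketch appends a version suffix to an xattr key and strips it again. The `key.<version>` layout, the helper names `key_add_version()` and `key_strip_version()`, and the example key used with them are assumptions made for illustration; the commit message does not spell out the exact key format used by the marker xlator.

```c
#include <stdio.h>
#include <string.h>

/* Build "<key>.<version>" into out. Returns 0 on success, -1 if the
 * result does not fit in outlen bytes. Hypothetical helper. */
int key_add_version(char *out, size_t outlen, const char *key, int version)
{
    int n = snprintf(out, outlen, "%s.%d", key, version);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Strip a trailing ".<digits>" suffix in place. Returns 1 if a suffix
 * was removed, 0 if the key carried no numeric suffix. Hypothetical
 * helper. */
int key_strip_version(char *key)
{
    char *dot = strrchr(key, '.');
    if (dot == NULL || dot[1] == '\0')
        return 0;
    for (const char *p = dot + 1; *p != '\0'; p++) {
        if (*p < '0' || *p > '9')
            return 0; /* last component is not a version number */
    }
    *dot = '\0';
    return 1;
}
```

A caller on the brick side would store under the versioned key and strip the suffix before replying, so clients keep seeing the unversioned name.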
* debug/io-stats: Add FOP sampling feature | Richard Wareing | 2015-11-01 | 1 | -0/+6
Summary:
- Using the sampling feature you can record details about every Nth FOP. The fields in each sample are: FOP type, hostname, uid, gid, FOP priority, port and time taken (latency) to fulfill the request.
- Implemented using a ring buffer which is not (m/c) allocated in the IO path; this should make the sampling process pretty cheap.
- DNS resolution done @ dump time not @ sample time for performance w/ cache
- Metrics can be used for diagnostics, traffic/IO profiling, as well as P95/P99 calculations
- To control this feature there are two new volume options:
    diagnostics.fop-sample-interval - The sampling interval, e.g. 1 means sample every FOP, 100 means sample every 100th FOP
    diagnostics.fop-sample-buf-size - The size (in bytes) of the ring buffer used to store the samples. In the event more samples are collected in the stats dump interval than can be held in this buffer, the oldest samples shall be discarded. Samples are stored in the log directory under /var/log/glusterfs/samples.
- Uses DNS cache written by sshreyas@fb.com (Thank-you!), the DNS cache TTL is controlled by the diagnostics.stats-dnscache-ttl-sec option and defaults to 24hrs.

Test Plan:
- Valgrind'd to ensure it's leak free
- Run prove test(s)
- Shadow testing on 100+ brick cluster

Change-Id: I9ee14c2fa18486b7efb38e59f70687249d3f96d8
BUG: 1271310
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/12210
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
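The sampling scheme described above (record every Nth FOP into a preallocated ring buffer, discarding the oldest samples when it is full) might look roughly like the C sketch below. It is not the io-stats implementation: the struct layout, the fixed `SAMPLE_BUF_SLOTS` slot count (the real option is sized in bytes), the function names, and treating an interval of 0 as "sampling disabled" are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_BUF_SLOTS 4 /* hypothetical, tiny for illustration */

struct fop_sample {
    int fop_type;
    double latency_us;
};

struct sample_buf {
    struct fop_sample slots[SAMPLE_BUF_SLOTS]; /* allocated up front */
    uint64_t seen;     /* FOPs observed so far */
    uint64_t stored;   /* samples written so far */
    uint64_t interval; /* sample every interval-th FOP; 0 = disabled */
};

void sample_buf_init(struct sample_buf *b, uint64_t interval)
{
    memset(b, 0, sizeof(*b));
    b->interval = interval;
}

/* Called once per FOP. Returns 1 if this FOP was recorded. No
 * allocation happens here; the oldest slot is overwritten when the
 * buffer is full. */
int sample_fop(struct sample_buf *b, int fop_type, double latency_us)
{
    b->seen++;
    if (b->interval == 0 || b->seen % b->interval != 0)
        return 0;
    b->slots[b->stored % SAMPLE_BUF_SLOTS] =
        (struct fop_sample){ fop_type, latency_us };
    b->stored++;
    return 1;
}

/* Number of valid samples currently held in the buffer. */
size_t sample_buf_count(const struct sample_buf *b)
{
    return b->stored < SAMPLE_BUF_SLOTS ? (size_t)b->stored
                                        : (size_t)SAMPLE_BUF_SLOTS;
}
```

Keeping the buffer preallocated means the per-FOP cost is a counter increment, a modulo and, for sampled FOPs, one struct copy.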
* tier/ctr: CTR DB named lookup heal of cold tier during attach tier | Joseph Fernandes | 2015-10-10 | 1 | -0/+2
Heal hardlinks in the db for already existing data in the cold tier during attach tier, i.e. during fix layout do a lookup on files in the cold tier. The CTR xlator on the brick/server side does a db update/insert of the hardlink on a named lookup.

Currently the named lookup is done synchronously with the fix layout that is triggered by attach tier. This is not performant, adding more time to fix layout. The performant approach is to record the hardlinks in a compressed datastore and then do the named lookup asynchronously later, giving the ctr db eventual consistency.

Change-Id: I4ffc337fffe7d447804786851a9183a51b5044a9
BUG: 1252586
Signed-off-by: Joseph Fernandes <josferna@redhat.com>
Reviewed-on: http://review.gluster.org/11828
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Dan Lambright <dlambrig@redhat.com>
Tested-by: Dan Lambright <dlambrig@redhat.com>
* posix: xattrop 'GF_XATTROP_ADD_ARRAY_WITH_DEFAULT' implementation | vmallika | 2015-09-28 | 1 | -1/+3
Implementation of xattrop types:
    GF_XATTROP_ADD_ARRAY_WITH_DEFAULT
    GF_XATTROP_ADD_ARRAY64_WITH_DEFAULT

These operations are similar to 'GF_XATTROP_ADD_ARRAY', except that they add a default value if the xattr is missing or its value is zero on disk.

One use-case of this operation is in inode-quota. When a new directory is created, its default dir_count should be set to 1. So when an xattrop is performed to set the inode xattrs, it should account for the initial dir_count of 1 if the xattrs are not present.

Here is the usage of this operation.

Value required in xdata for each key:
    struct array {
        int32_t newvalue_1;
        int32_t newvalue_2;
        ...
        int32_t newvalue_n;
        int32_t default_1;
        int32_t default_2;
        ...
        int32_t default_n;
    };
or
    struct array {
        int32_t value_1;
        int32_t value_2;
        ...
        int32_t value_n;
    } data[2];
    fill data[0] with new value to add
    fill data[1] with default value

xattrop GF_XATTROP_ADD_ARRAY_WITH_DEFAULT:
    for i from 1 to n {
        if (xattr (dest_i) is zero or not set in the disk)
            dest_i = newvalue_i + default_i
        else
            dest_i = dest_i + newvalue_i
    }

Value in xdata after xattrop is successful:
    struct array {
        int32_t dest_1;
        int32_t dest_2;
        ...
        int32_t dest_n;
    };

Change-Id: Ic6a08473e99fd98299a839d4d8416081a7534efd
BUG: 1243946
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/11702
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
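The add-with-default rule above can be expressed as a short C function operating on plain in-memory arrays. This is only a sketch of the arithmetic: `add_array_with_default()` and its `dest_present` flag are hypothetical, and it ignores how the posix xlator actually reads, encodes and writes the xattr on disk.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper illustrating the ADD_ARRAY_WITH_DEFAULT rule.
 *
 * data holds 2*n values: data[0..n-1] are the new values to add and
 * data[n..2n-1] are the defaults. dest holds the current value of the
 * xattr; dest_present is 0 when the xattr does not exist yet, in which
 * case the contents of dest are ignored.
 */
void add_array_with_default(int32_t *dest, int dest_present,
                            const int32_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t newvalue = data[i];
        int32_t defvalue = data[n + i];

        if (!dest_present || dest[i] == 0)
            dest[i] = newvalue + defvalue; /* missing or zero on disk */
        else
            dest[i] = dest[i] + newvalue;  /* plain ADD_ARRAY behaviour */
    }
}
```

For the inode-quota case in the message, a default of 1 for dir_count means a directory whose xattr is missing or zero starts from 1 plus whatever is being added, while an existing non-zero count is only incremented.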
* posix: xattrop 'GF_XATTROP_GET_AND_SET' implementation | vmallika | 2015-08-27 | 1 | -1/+2
GF_XATTROP_GET_AND_SET stores the existing xattr value in xdata and sets the new value.

xattrop was reusing the input xattr dict to set the results instead of creating a new dict. This can be a problem for server-side xlators, as the input dict will have its value changed.

Change-Id: I43369082e1d0090d211381181e9f3b9075b8e771
BUG: 1251454
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/11995
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* fuse: add "resolve-gids" mount option to overcome 32-groups limit | Niels de Vos | 2015-08-05 | 1 | -0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a --resolve-gids commandline option to the glusterfs binary. This option gets set when executing "mount -t glusterfs -o resolve-gids ...". This option is most useful in combination with the "acl" mount option. POSIX ACL permission checking is done on the FUSE-client side to improve performance (in addition to the checking on the bricks). The fuse-bridge reads /proc/$PID/status by default, and this file contains maximum 32 groups. Any local (client-side) permission checking that requires more than the first 32 groups will fail. By enabling the "resolve-gids" option, the fuse-bridge will call getgrouplist() to retrieve all the groups from the user accessing the mountpoint. This is comparable to how "nfs.server-aux-gids" works. Note that when a user belongs to more than ~93 groups, the volume option server.manage-gids needs to be enabled too. Without this option, the RPC-layer will need to reduce the number of groups to make them fit in the RPC-header. Change-Id: I7ede90d0e41bcf55755cced5747fa0fb1699edb2 BUG: 1246275 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/11732 Tested-by: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* features/bit-rot-stub: deny access to bad objects | Raghavendra Bhat | 2015-06-27 | 1 | -0/+3
| | | | | | | | | | | | | | | | | * Access to bad objects (especially operations such as open, readv, writev) should be denied to prevent applications from getting wrong data. * Do not allow anyone apart from scrubber to set bad object xattr. * Do not allow bad object xattr to be removed. Change-Id: Ia9185a067233a9f26e3d41d41d11d9a4eb0da827 BUG: 1210689 Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Signed-off-by: Venky Shankar <vshankar@redhat.com> Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-on: http://review.gluster.org/11126 Tested-by: NetBSD Build System <jenkins@build.gluster.org>
* storage/posix: Introduce flag instructing posix to perform prestat, writev and poststat atomically | Krutika Dhananjay | 2015-06-26 | 1 | -0/+1
Change-Id: I9b52ddaed4e306e9a49f39c86450c94bea843a7b
BUG: 1233617
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: http://review.gluster.org/11345
Tested-by: NetBSD Build System <jenkins@build.gluster.org>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>