summaryrefslogtreecommitdiffstats
path: root/xlators/performance
Commit message (Collapse)AuthorAgeFilesLines
* glusterd, readdir-ahead: Fix backward incompatibilityPoornima G2017-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of https://review.gluster.org/#/c/16657/ Issue: Any opion is spcified in two places: In the options[] of xlator itself and glusterd-volume-set.c. The default value of this option can be specified in both the places. If its specified only in xlator then the volfile generated will not have the option and default value, it will be assigned during graph initialization. With patch [1] the option rda-request-size was changed from INT to SIZET type, and default was changed from 131072 to 128KB, but was specified only in the readdir-ahead.c. Thus with this patch alone the volfile entry for readdir-ahead looks like: volume patchy-readdir-ahead type performance/readdir-ahead subvolumes patchy-read-ahead end-volume With patch [2], the default of option rda-request-size was specified in glusterd-volume-set.c as well(as it was necessary fr parallel readdir). With this patch the readdir entry in the volfile will look like: volume patchy-readdir-ahead type performance/readdir-ahead option rda-cache-limit 10MB option rda-request-size 128KB option parallel-readdir off subvolumes patchy-read-ahead end-volume Now consider the server has both these patches and client doesn't. Server will generate a volfile with entry: The old clients which thought the option rda-request-size is of type INT will now recieve the value 128KB which it willn't understand, and hence fail the mount. The issue is seen only with the combination of [1] and [2]. Solution: Instead of specifying 128KB as default in glusterd we specify 131072 so that the old clients will interpret as INT and new ones as 128KB Credits: Raghavendra G > Reviewed-on: https://review.gluster.org/16657 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Change-Id: I0c269a5890957fd8a38e9a05bdec088645a6688a BUG: 1423412 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16658 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* readdir-ahead: Remove unnecessary loggingPoornima G2017-02-171-6/+0
| | | | | | | | | | | | | | | | | | | | dict_get_int can return < 0 when key is not found is a valid case. Hence no need to log. > Reviewed-on: https://review.gluster.org/16654 > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Change-Id: If0795b0f178adbb94b10efc563506993f7411962 BUG: 1423429 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16659 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* performance/decompounder: Have terminal value in options[]Pranith Kumar K2017-02-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Absence of terminal values is leading to buffer-over-flow errors in address sanitizer. >BUG: 1422152 >Change-Id: I769c0e4b5bbb3ef2849b8d1097b9def522ae08d9 >Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> >Reviewed-on: https://review.gluster.org/16615 >Smoke: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Reviewed-by: Jeff Darcy <jdarcy@redhat.com> >(cherry picked from commit fca31ae5faa15a02e64143b9efb7e19c3b907c2d) Change-Id: Ifec6f5b96cfe4c26a039f88649284bcdad5321f6 BUG: 1423070 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: https://review.gluster.org/16652 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* md-cache: initialize mdc_counter.lockNiels de Vos2017-02-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | add missed LOCK_INIT to fix INCREMENT_ATOMIC on conf->mdc_counter.lock when pthread_spin_* using Cherry picked from commit 22f02d8f1dcdf176744ab1536cb23a5fcd291243: > Change-Id: I680bd6f41e3b8a1852ed969bf6794cbf4c1ccdd4 > BUG: 1417913 > Signed-off-by: Vitaly Lipatov <lav@etersoft.ru> > Reviewed-on: https://review.gluster.org/16515 > Reviewed-by: Niels de Vos <ndevos@redhat.com> > Tested-by: Niels de Vos <ndevos@redhat.com> > Smoke: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > Reviewed-by: Poornima G <pgurusid@redhat.com> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I680bd6f41e3b8a1852ed969bf6794cbf4c1ccdd4 BUG: 1417915 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: https://review.gluster.org/16640 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* performance/write-behind: access stub only if available duringRaghavendra G2017-02-021-12/+12
| | | | | | | | | | | | | | | | | | | | | | | statedump >Change-Id: Ia5dd718458a5e32138012f81f014d13fc6b28be2 >BUG: 1415115 >Signed-off-by: Raghavendra G <rgowdapp@redhat.com> >Reviewed-on: https://review.gluster.org/16440 >Reviewed-by: N Balachandran <nbalacha@redhat.com> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >Smoke: Gluster Build System <jenkins@build.gluster.org> Change-Id: Ia5dd718458a5e32138012f81f014d13fc6b28be2 BUG: 1418623 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> (cherry picked from commit 85d7f1d1ee24ac400d4aa223478727643532693a) Reviewed-on: https://review.gluster.org/16519 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* libglusterfs+transport+io-threads: fix 256KB stack abuseJeff Darcy2017-02-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Some functions were allocating 64K booleans, which are (crazily) mapped to 4-byte ints, for a total of 256KB per call. Changed to use bitfields instead, so usage is now only 8KB per call. This was the impediment to changing the io-threads stack size, so that has been adjusted too. Backport of: > Change-Id: I8781c4f2c8f2b830f4535e366995fac8dd0a8653 > BUG: 1418095 > Reviewed-on: https://review.gluster.org/15745 Change-Id: Ia5dada61703e6bea95f2511da71feb573fc9a429 BUG: 1418536 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: https://review.gluster.org/16511 Reviewed-by: N Balachandran <nbalacha@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* md-cache: Cache security.ima xattrsPoornima G2017-01-301-2/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/16296 From kernel version 3.X or greater, creating of a file results in removexattr call on security.ima xattr. But this xattr is not set on the file unless IMA feature is active. With this patch, removxattr call returns ENODATA if it is not found in the cache. > Change-Id: I8136096598a983aebc09901945eba1db1b2f93c9 > Signed-off-by: Poornima G <pgurusid@redhat.com> > Reviewed-on: http://review.gluster.org/16296 > Smoke: Gluster Build System <jenkins@build.gluster.org> > NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> > CentOS-regression: Gluster Build System <jenkins@build.gluster.org> > Reviewed-by: Raghavendra G <rgowdapp@redhat.com> > (cherry picked from commit ac629e574935a8aed6526936bc83b1c6d295ae67) Change-Id: I27abc23024c8fcf07389608df61ef6e64736d414 BUG: 1415918 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: https://review.gluster.org/16460 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* Readdir-ahead : Honour readdir-optimise option of dhtPoornima G2017-01-273-1/+20
| | | | | | | | | | | | | | | | | | | | | | | >Change-Id: I9c5e65b32e316e6a2fc7e1f5c79fce79386b78e2 >BUG: 1401812 >Signed-off-by: Poornima G <pgurusid@redhat.com> >Reviewed-on: https://review.gluster.org/16071 >Smoke: Gluster Build System <jenkins@build.gluster.org> >CentOS-regression: Gluster Build System <jenkins@build.gluster.org> >NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> >Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I9c5e65b32e316e6a2fc7e1f5c79fce79386b78e2 BUG: 1417027 Signed-off-by: Poornima G <pgurusid@redhat.com> (cherry picked from commit 7c6538f6c8f9a015663b4fc57c640a7c451c87f7) Reviewed-on: https://review.gluster.org/16461 Tested-by: Raghavendra G <rgowdapp@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
* readdir-ahead : Perform STACK_UNWIND outside of mutex locksPoornima G2017-01-091-48/+67
| | | | | | | | | | | | | | | | | | Currently STACK_UNWIND is performnd within ctx->lock. If readdir-ahead is loaded as a child of dht, then there can be scenarios where the function calling STACK_UNWIND becomes re-entrant. Its a good practice to not call STACK_WIND/UNWIND within local mutex's Change-Id: If4e869849d99ce233014a8aad7c4d5eef8dc2e98 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16068 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Cache updated as a part of invalidate should not update timePoornima G2017-01-081-14/+25
| | | | | | | | | | | | | | | | | Currently when a invalidate happens we update the cache along with the cache time. The problem with this is, upcall doesn't update the last access time of a client when an invalidation is sent, thus resulting in a timewindow where the md-cache has cached, but the upcall is unaware and hence upcall will not further invalidate the cache(unless a fop is sent from the same client, and upcall updates its database to reflect the same) Change-Id: Ibceb8d2fc360582752846bbf7fd59697d5424754 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16295 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/write-behind: Add debug messagesRaghavendra G2017-01-081-17/+168
| | | | | | | | | | Change-Id: I2ea1350fcbe4b6c06dcb8093b28316f734cd3b48 BUG: 1379655 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/16285 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* ec: Invalidations in disperse volume should not update the statPoornima G2017-01-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | Issue: In disperse volume, the file is present across bricks, hence the stat from one brick doesn't carry the valid size of the file. Therefore the upcall from one brick updating the md-cache results in wrong size being updated. Fix: If the notification is cache invalidation then, indicate md-cache that the attributes is invalid. BUG: 1410375 Change-Id: Id89d2283478e70b62b435a8891fffc86d2be8cb2 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16329 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
* readdir-ahead: Enhance EOD detection logicPoornima G2017-01-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Issue: Currently end of directory is identified on obtaining a readdirp_cbk with op_ret = 0 (i.e. 0 entries fetched in readdirp). Thus an extra readdirp is required for every directory just to identify EOD. Consider a case of listing large number of small directories. The readdirp fops required are doubled in that case. Solution: On reaching the EOD, posix sets the op_errno to ENOENT, hence along with looking for 'op_ret == 0' we also look for 'operrno == ENOENT' ehile checking for EOD condition Change-Id: I7a5b52e7b98f5dc236c387635fcc651dac0171b3 BUG: 1401812 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/16042 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/readdir-ahead: limit cache sizeRaghavendra G2016-12-222-35/+85
| | | | | | | | | | | | | | | | This patch introduces a new option called "rda-cache-limit", which is the maximum value the entire readdir-ahead cache can grow into. Since, readdir-ahead holds a reference to inode through dentries, this patch also accounts memory stored by various xlators in inode contexts. Change-Id: I84cc0ca812f35e0f9041f8cc71effae53a9e7f99 BUG: 1356960 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/16137 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* performance/write-behind: Add more fops for checking dependency withRaghavendra G2016-12-121-0/+317
| | | | | | | | | | | | | | | | | | cached writes Fops like readdirp, link, fallocate, discard, zerofill return iatt of files in their responses. This iatt can be cached by md-cache. Hence it is important that write-behind maintains relative ordering of these fops with cached writes. Failure to do so, can result in md-cache storing stale iatts and returning the same to applications. Change-Id: Icfe12ad807e42fe9e52a9f63e47ce63f511c6946 BUG: 1390050 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/15757 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* dht/md-cache: Filter invalidate if the file is made a linkto filePoornima G2016-12-021-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | Upcall as a part of setattr, sends an invalidation and the invalidation carries the resulting stat value. When a file is converted to linkto files, even then an invalidation is set and as a result the mountpoint shows the sticky bit in the stat of the file. eg: ---------T. 945 root root 0 Nov 8 10:14 hardlink.999 Fix: When dht recieves a notification of sticky bit change, it updates the flag, to indicate md-cache to send the subsequent lookup. Change-Id: Ic2fd7a5b196db0754f9b97072e644e6bf69da606 BUG: 1392713 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15789 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Susant Palai <spalai@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* io-cache: Fix a read hangPoornima G2016-11-231-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: ===== In certain cases, there was no unwind of read from read-ahead xlator, thus resulting in hang. RCA: ==== In certain cases, ioc_readv() issues STACK_WIND_TAIL() instead of STACK_WIND(). One such case is when inode_ctx for that file is not present (can happen if readdirp was called, and populates md-cache and serves all the lookups from cache). Consider the following graph: ... io-cache (parent) | readdir-ahead | read-ahead ... Below is the code snippet of ioc_readv calling STACK_WIND_TAIL: ioc_readv() { ... if (!inode_ctx) STACK_WIND_TAIL (frame, FIRST_CHILD (frame->this), FIRST_CHILD (frame->this)->fops->readv, fd, size, offset, flags, xdata); /* Ideally, this stack_wind should wind to readdir-ahead:readv() but it winds to read-ahead:readv(). See below for explaination. */ ... } STACK_WIND_TAIL (frame, obj, fn, ...) { frame->this = obj; /* for the above mentioned graph, frame->this will be readdir-ahead * frame->this = FIRST_CHILD (frame->this) i.e. readdir-ahead, which * is as expected */ ... THIS = obj; /* THIS will be read-ahead instead of readdir-ahead!, as obj expands * to "FIRST_CHILD (frame->this)" and frame->this was pointing * to readdir-ahead in the previous statement. */ ... fn (frame, obj, params); /* fn will call read-ahead:readv() instead of readdir-ahead:readv()! * as fn expands to "FIRST_CHILD (frame->this)->fops->readv" and * frame->this was pointing ro readdir-ahead in the first statement */ ... } Thus, the readdir-ahead's readv() implementation will be skipped, and ra_readv() will be called with frame->this = "readdir-ahead" and this = "read-ahead". This can lead to corruption / hang / other problems. But in this perticular case, when 'frame->this' and 'this' passed to ra_readv() doesn't match, it causes ra_readv() to call ra_readv() again!. Thus the logic of read-ahead readv() falls apart and leads to hang. Solution: ========= Ideally, STACK_WIND_TAIL() should be modified as: STACK_WIND_TAIL (frame, obj, fn, ...) { next_xl = obj /* resolve obj as the variables passed in obj macro can be overwritten in the further instrucions */ next_xl_fn = fn /* resolve fn and store in a tmp variable, before modifying any variables */ frame->this = next_xl; ... THIS = next_xl; ... next_xl_fn (frame, next_xl, params); ... } But for this solution, knowing the type of variable 'next_xl_fn' is a challenge and is not easy. Hence just modifying all the existing callers to pass "FIRST_CHILD (this)" as obj, instead of "FIRST_CHILD (frame->this)". Change-Id: I179ffe3d1f154bc5a1935fd2ee44e912eb0fbb61 BUG: 1388292 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15901 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* performance/io-threads: Exit threads in fini() as wellPranith Kumar K2016-11-212-11/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: io-threads starts the thread in 'init()' but doesn't clean them up on 'fini()'. It relies on PARENT_DOWN to exit threads but there can be cases where event before PARENT_UP the graph init code can think of issuing fini(). This code path is hit when glfs_init() is called on a volume that is in 'stopped' state. It leads to a crash in ganesha process, because the io-thread tries to access freed memory. Fix: Ideal fix would be to wait for all fops in io-thread list to be completed on PARENT_DOWN, and have fini() do cleanup of threads. Because there is no proper documentation about how PARENT_DOWN/fini are supposed to be used, we are getting different kinds of sequences in different higher level protocols. So for now cleaning up in both PARENT_DOWN and fini(). Fuse doesn't call fini() gfapi is not calling PARENT_DOWN in some cases, so for now I don't see another way out. BUG: 1396793 Change-Id: I9c9154e7d57198dbaff0f30d3ffc25f6d8088aec Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15888 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* afr,dht,ec: Replace GF_EVENT_CHILD_MODIFIED with event SOME_DESCENDENT_DOWN/UPPoornima G2016-11-211-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently these are few events related to child_up/down: GF_EVENT_CHILD_UP : Issued when any of the protocol client connects. GF_EVENT_CHILD_MODIFIED : Issued by afr/dht/ec GF_EVENT_CHILD_DOWN : Issued when any of the protocol client disconnects. These events get modified at the dht/afr/ec layers. Here is a brief on the same. DHT: - All the subvolumes reported once, and atleast one child came up, then GF_EVENT_CHILD_UP is issued - connect GF_EVENT_CHILD_UP is issued - disconnect GF_EVENT_CHILD_MODIFIED is issued - All the subvolumes disconnected, GF_EVENT_CHILD_DOWN is issued AFR: - First subvolume came up, then GF_EVENT_CHILD_UP is issued - Subsequent subvolumes coming up, results in GF_EVENT_CHILD_MODIFIED - Any of the subvolumes go down, then GF_EVENT_SOME_CHILD_DOWN is issued - Last up subvolume goes down, then GF_EVENT_CHILD_DOWN is issued Until the patch [1] introduced GF_EVENT_SOME_CHILD_UP, GF_EVENT_CHILD_MODIFIED was issued by afr/dht when any of the subvolumes go up or down. Now with md-cache changes, there is a necessity to differentiate between child up and down. Hence, introducing GF_EVENT_SOME_DESCENDENT_DOWN/UP and getting rid of GF_EVENT_CHILD_MODIFIED. [1] http://review.gluster.org/12573 Change-Id: I704140b6598f7ec705493251d2dbc4191c965a58 BUG: 1396038 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15764 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: N Balachandran <nbalacha@redhat.com> Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
* performance/open-behind: Avoid deadlock in statedumpPranith Kumar K2016-11-091-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: open-behind is taking fd->lock then inode->lock where as statedump is taking inode->lock then fd->lock, so it is leading to deadlock In open-behind, following code exists: void ob_fd_free (ob_fd_t *ob_fd) { loc_wipe (&ob_fd->loc); <<--- this takes (inode->lock) ....... } int ob_wake_cbk (call_frame_t *frame, void *cookie, xlator_t *this, int op_ret, int op_errno, fd_t *fd_ret, dict_t *xdata) { ....... LOCK (&fd->lock); <<---- fd->lock { ....... __fd_ctx_del (fd, this, NULL); ob_fd_free (ob_fd); <<<--------------- } UNLOCK (&fd->lock); ....... } ================================================================= In statedump this code exists: inode_dump (inode_t *inode, char *prefix) { ....... ret = TRY_LOCK(&inode->lock); <<---- inode->lock ....... fd_ctx_dump (fd, prefix); <<<----- ....... } fd_ctx_dump (fd_t *fd, char *prefix) { ....... LOCK (&fd->lock); <<<------------------ this takes fd-lock { ....... } Fix: Make sure open-behind doesn't call ob_fd_free() inside fd->lock BUG: 1393259 Change-Id: I4abdcfc5216270fa1e2b43f7b73445f49e6d6e6e Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15808 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/write-behind: fix flush stuck by former failed writesRyan Ding2016-11-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the issue is happened in this case: assume a file is opened with fd1 and fd2. 1. some WRITE opto fd1 got error, they were add back to 'todo' queue because of those error. 2. fd2 closed, a FLUSH op is send to write-behind. 3. FLUSH can not be unwind because it's not a legal waiter for those failed write(as func __wb_request_waiting_on() say). and those failed WRITE also can not be ended if fd1 is not closed. fd2 stuck in close syscall. to resolve this issue, we can change the way we determine 2 requests is 'conflict': flush/fsync is not conflict with those write that is not belonged to them. so __wb_pick_winds() can wind the FLUSH op. below is some information when the stuck issue happen: glusterdump logs: [xlator.performance.write-behind.wb_inode] path=/ltp-F9eG0ZSOME/rw-buffered-16436 inode=0x7fdbe8039b9c window_conf=1048576 window_current=249856 transit-size=0 dontsync=0 [.WRITE] request-ptr=0x7fdbe8020200 refcount=1 wound=no generation-number=4 req->op_ret=-1 req->op_errno=116 sync-attempts=3 sync-in-progress=no size=131072 offset=1220608 lied=-1 append=0 fulfilled=0 go=0 [.WRITE] request-ptr=0x7fdbe8068c30 refcount=1 wound=no generation-number=5 req->op_ret=-1 req->op_errno=116 sync-attempts=2 sync-in-progress=no size=118784 offset=1351680 lied=-1 append=0 fulfilled=0 go=0 [.FLUSH] request-ptr=0x7fdbe8021cd0 refcount=1 wound=no generation-number=6 req->op_ret=0 req->op_errno=0 sync-attempts=0 gdb detail about above 3 requests: (gdb) print *((wb_request_t *)0x7fdbe8021cd0) $2 = {all = {next = 0x7fdbe803a608, prev = 0x7fdbe8068c30}, todo = {next = 0x7fdbe803a618, prev = 0x7fdbe8068c40}, lie = {next = 0x7fdbe8021cf0, prev = 0x7fdbe8021cf0}, winds = {next = 0x7fdbe8021d00, prev = 0x7fdbe8021d00}, unwinds = {next = 0x7fdbe8021d10, prev = 0x7fdbe8021d10}, wip = { next = 0x7fdbe8021d20, prev = 0x7fdbe8021d20}, stub = 0x7fdbe80224dc, write_size = 0, orig_size = 0, total_size = 0, op_ret = 0, op_errno = 0, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_FLUSH, lk_owner = {len = 8, data = "W\322T\f\271\367y$", '\000' <repeats 1015 times>}, iobref = 0x0, gen = 6, fd = 0x7fdbe800f0dc, wind_count = 0, ordering = {size = 0, off = 0, append = 0, tempted = 0, lied = 0, fulfilled = 0, go = 0}} (gdb) print *((wb_request_t *)0x7fdbe8020200) $3 = {all = {next = 0x7fdbe8068c30, prev = 0x7fdbe803a608}, todo = {next = 0x7fdbe8068c40, prev = 0x7fdbe803a618}, lie = {next = 0x7fdbe8068c50, prev = 0x7fdbe803a628}, winds = {next = 0x7fdbe8020230, prev = 0x7fdbe8020230}, unwinds = {next = 0x7fdbe8020240, prev = 0x7fdbe8020240}, wip = { next = 0x7fdbe8020250, prev = 0x7fdbe8020250}, stub = 0x7fdbe8062c3c, write_size = 131072, orig_size = 4096, total_size = 0, op_ret = -1, op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>}, iobref = 0x7fdbe80311a0, gen = 4, fd = 0x7fdbe805c89c, wind_count = 3, ordering = {size = 131072, off = 1220608, append = 0, tempted = -1, lied = -1, fulfilled = 0, go = 0}} (gdb) print *((wb_request_t *)0x7fdbe8068c30) $4 = {all = {next = 0x7fdbe8021cd0, prev = 0x7fdbe8020200}, todo = {next = 0x7fdbe8021ce0, prev = 0x7fdbe8020210}, lie = {next = 0x7fdbe803a628, prev = 0x7fdbe8020220}, winds = {next = 0x7fdbe8068c60, prev = 0x7fdbe8068c60}, unwinds = {next = 0x7fdbe8068c70, prev = 0x7fdbe8068c70}, wip = { next = 0x7fdbe8068c80, prev = 0x7fdbe8068c80}, stub = 0x7fdbe806746c, write_size = 118784, orig_size = 4096, total_size = 0, op_ret = -1, op_errno = 116, refcount = 1, wb_inode = 0x7fdbe803a5f0, fop = GF_FOP_WRITE, lk_owner = {len = 8, data = '\000' <repeats 1023 times>}, iobref = 0x7fdbe8052b10, gen = 5, fd = 0x7fdbe805c89c, wind_count = 2, ordering = {size = 118784, off = 1351680, append = 0, tempted = -1, lied = -1, fulfilled = 0, go = 0}} you can see they are all on 'todo' queue, and FLUSH op fd is not the same WRITE op fd. Change-Id: Id687f9cd3b9f281e1a97c83f1ce981ede272b8ab BUG: 1372211 Signed-off-by: Ryan Ding <ryan.ding@open-fs.com> Reviewed-on: http://review.gluster.org/15380 Tested-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* md-cache: Invalidate cache entry for open() with O_TRUNCSoumya Koduri2016-10-261-0/+48
| | | | | | | | | | | | | | | | | When a file is opened with O_TRUNC flag set, its size gets set to '0'. This case needs to be handled in md-cache to avoid sending incorrect cached stat. Change-Id: I95d1f8a6634734898883ede010c3e7b0b7eb97d9 BUG: 1382266 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/15618 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: jiffin tony Thottan <jthottan@redhat.com> Tested-by: jiffin tony Thottan <jthottan@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* performance/io-threads: Exit all threads on PARENT_DOWNPranith Kumar K2016-10-232-17/+40
| | | | | | | | | | | | | | | | | | | | Problem: When glfs_fini() is called on a volume where client.io-threads is enabled, fini() will free up iothread xl's private structure but there would be some threads that are sleeping which would wakeup after the timedwait completes leading to accessing already free'd memory. Fix: As part of parent-down, exit all sleeping threads. BUG: 1381830 Change-Id: I0bb8d90241112c355fb22ee3fbfd7307f475b339 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/15620 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/write-behind: remove the request from liability queue inRaghavendra G2016-10-161-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wb_fulfill_request Before this patch, a request is removed from liability queue only when ref count of request hits 0. Though, wb_fulfill_request does an unref, it need not be the last unref and hence the request may survive in liability queue till the last unref. Let, T1: the time at which wb_fulfill_request is invoked T2: the time at which last unref is done on request Let's consider a case of T2 > T1. In the time window between T1 and T2, any other request (waiter) conflicting with request in liability queue (blocker - basically a write which has been lied) is blocked from winding. If T2 happens to be when wb_do_unwinds is invoked, no further processing of request list happens and "waiter" would get blocked forever. An example imaginary sequence of events is given below: 1. A write request w1 is picked up for unwinding in __wb_pick_unwinds (but unwind is not done _yet_ and hence reference remains). However, w1 is moved to liability queue. Let's call this invocation of wb_process_queue by wb_writev as PQ1. 2. A flush (f1) request hits write behind. Since the liability queue of inode is not empty, f1 is not picked for unwinding. Let's call the invocation of wb_process_queue by wb_flush as PQ2. 3. PQ2 continues and picks w1 for fulfilling and invokes wb_fulfill. As part of successful wb_fulfill_cbk, wb_fulfill_request (w1) is invoked. But, w1 is not freed (and hence not removed from liability queue) as w1 is not unwound _yet_ and a ref remains (PQ1 has not invoked wb_do_unwinds _yet_). 4. wb_fulfill_cbk (triggered by PQ2) invokes a wb_process_queue (let's say PQ3). f1 is not resumed in PQ3 as w1 is still in liability queue. At this time, PQ2 and PQ3 are complete. 5. PQ1 continues, unwinds w1 and does last unref on w1 and w1 is freed (and removed from liability queue). Since PQ1 didn't invoke wb_fulfill on any other write requests, there won't be any future codepaths that would invoke wb_process_queue and f1 is stuck forever. With this fix, w1 is removed from liability queue in step 3 above and PQ3 resumes f1 in step 4 (as there are no requests conflicting with f1 in liability queue during execution of PQ3). Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 1379655 Change-Id: Idacda1fcd520ac27f30224f8dfe8360dba6ac6cb Reviewed-on: http://review.gluster.org/15579 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* performance/open-behind: Pass O_DIRECT flags for anon fd reads when requiredKrutika Dhananjay2016-09-222-39/+28
| | | | | | | | | | | | | | | | Writes are already passing the correct flags at the time of open(). Also, make io-cache honor direct-io for anon-fds with O_DIRECT flag during reads. Change-Id: I215cb09ef1b607b9f95cabf0ef3065c00edd9e78 BUG: 1377556 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/15537 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* build: out-of-tree builds generates files in the wrong directoryKaleb S KEITHLEY2016-09-1811-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And minor cleanup of a few of the Makefile.am files while we're at it. Rewrite the make rules to do what xdrgen does. Now we can get rid of xdrgen. Note 1. netbsd6's sed doesn't do -i. Why are we still running smoke tests on netbsd6 and not netbsd7? We barely support netbsd7 as it is. Note 2. Why is/was libgfxdr.so (.../rpc/xdr/src/...) linked with libglusterfs? A cut-and-paste mistake? It has no references to symbols in libglusterfs. Note3. "/#ifndef\|#define\|#endif/" (note the '\'s) is a _basic_ regex that matches the same lines as the _extended_ regex "/#(ifndef|define|endif)/". To match the extended regex sed needs to be run with -r on Linux; with -E on *BSD. However NetBSD's and FreeBSD's sed helpfully also provide -r for compatibility. Using a basic regex avoids having to use a kludge in order to run sed with the correct option on OS X. Note 4. Not copying the bit of xdrgen that inserts copyright/license boilerplate. AFAIK it's silly to pretend that machine generated files like these can be copyrighted or need license boilerplate. The XDR source files have their own copyright and license; and their copyrights are bound to be more up to date than old boilerplate inserted by a script. From what I've seen of other Open Source projects -- e.g. gcc and its C parser files generated by yacc and lex -- IIRC they don't bother to add copyright/license boilerplate to their generated files. It appears that it's a long-standing feature of make (SysV, BSD, gnu) for out-of-tree builds to helpfully pretend that the source files it can find in the VPATH "exist" as if they are in the $cwd. rpcgen doesn't work well in this situation and generates files with "bad" #include directives. E.g. if you `rpcgen ../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.x`, you get an #include directive in the generated .c file like this: ... #include "../../../../$srcdir/rpc/xdr/src/glusterfs3-xdr.h" ... which (obviously) results in compile errors on out-of-tree build because the (generated) header file doesn't exist at that location. Compared to `rpcgen ./glusterfs3-xdr.x` where you get: ... #include "glusterfs3-xdr.h" ... Which is what we need. We have to resort to some Stupid Make Tricks like the addition of various .PHONY targets to work around the VPATH "help". Warning: When doing an in-tree build, -I$(top_builddir)/rpc/xdr/... looks exactly like -I$(top_srcdir)/rpc/xdr/... Don't be fooled though. And don't delete the -I$(top_builddir)/rpc/xdr/... bits Change-Id: Iba6ab96b2d0a17c5a7e9f92233993b318858b62e BUG: 1330604 Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/14085 Tested-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* xlators/md-cache: fix unused variable warnings/errorsKaleb S. KEITHLEY2016-09-131-3/+1
| | | | | | | | | | | | | | | | | | | | http://review.gluster.org/14085 fixes a "pragma leak" where the generated rpc/xdr headers have a pair of pragmas that disable these warnings. With the warnings disabled, many unused variables have crept into the code base. And 14085 won't pass its own smoke test until all these warnings are fixed. BUG: 1369124 Change-Id: I5904956b2033993abee0a29ff615e058a52c9ac0 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15481 Reviewed-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* xlators/decompounder: fix unused variable warnings/errorsKaleb S. KEITHLEY2016-09-131-2/+0
| | | | | | | | | | | | | | | | | | http://review.gluster.org/14085 fixes a "pragma leak" where the generated rpc/xdr headers have a pair of pragmas that disable these warnings. With the warnings disabled, many unused variables have crept into the code base. And 14085 won't pass its own smoke test until all these warnings are fixed. BUG: 1369124 Change-Id: Ia4d11871838b749d15dd1942d0a93594e90124e0 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/15480 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
* compound fops: Some fixes to compound fops frameworkAnuradha Talur2016-08-302-90/+90
| | | | | | | | | | | Change-Id: I808fd5f9f002a35bff94d310c5d61a781e49570b BUG: 1360169 Signed-off-by: Anuradha Talur <atalur@redhat.com> Reviewed-on: http://review.gluster.org/15010 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Process all the cache invalidation flagsPoornima G2016-08-301-5/+38
| | | | | | | | | | | | | | | | Currently, md-cache only processes IATT_UPDATE_FLAGS, UP_XATTR and UP_XATTR_RM. We also need to process UP_RENAME_FLAGS, UP_FORGET, UP_PARENT_DENTRY_FLAGS and UP_NLINK_FLAGS. Otherwise the files unlinked or renamed will not be reflected on other mounts. Change-Id: Icb8b03da51482c3fc2e2a7292d16d56e11a341d9 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15324 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* dht, md-cache, upcall: Add invalidation of IATT when the layout changesPoornima G2016-08-301-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: dht_layout is built as a part of lookup only. The layout can be modified by rebalance process. Since every IO fop is preceded by a lookup, there are very less issues of stale layout. But with enhancements of aggressive caching of stats in md-cache, the lookup will reduce and expose the stale layout issue often. Solution: Since stale layout is already an issue on dht, there is already a plan to fix this at the dht layer, but this fix is not currently planned for any release. Until this fix comes out, we can have a workaround where, the upcall will send a notification to md-cache when a layout xattr is changed. As a part of layout change notification the existing cache is invalidated and the next lookup will fetch the latest layout. This is not a foolproof solution as the window between the layout change and the next lookup(after invalidation of stat), where there will be stale layout. But until the final fix comes in, this reduces the stale layout window. Change-Id: Iacf871a38b35880c1fc0bc68fe7ce291265e71d4 BUG: 1369638 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15300 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Register the list of xattrs with cache-invalidationPoornima G2016-08-303-6/+170
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: md-cache caches a specified list of xattrs, and when cache invalidation is enabled, it makes sense to recieve invalidation only when those xattrs are modified by other clients. But the current implementation of upcall is that, it will send invalidation when any of the on-disk xattrs is modified. Solution: md-cache sends a list of xattrs that it is interested in, to upcall by issuing an ipc(). The challenge here is to make sure everytime a brick goes offline and comes back up, the ipc() needs to be issued to the bricks. Hence ipc() is sent from md-cache every time there is a CHILD_UP/CHILD_MODIFIED event. TODO: There will be patches following, in cluster xlators, to implement ipc fop. Change-Id: I6efcf3df474f5ce6eabd3d6694c00c7bd89bc25d BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15002 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Fix wrong cache time update for xattrsPoornima G2016-08-271-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | In md-cache, the cache has two times: 1. Time when the stat was last fetched for that inode 2. Time when the xattrs were last fetched for that inode. This time should not be updated when only one of the xattrs is updated. If, its updated when one of the cached xattr is changed, it can so happen that the other xattrs have past their cache timeout, but are still served from cache. Solution: Do not update the xattr cache time, when one of the xattrs being cached is changed. With this, we may end up in cache timeout though it was updated recently, but it is not a harm. The other way is to have timeout for every xattr that is being cached. Its more complicated, and may be not worth it, as we have lot of lookup fops, that are overloaded to get all the xattrs. Change-Id: Id77e547f403fc792348f1ea56b468b9260a5a34f BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15045 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Do not use features.cache-invalidation for both md-cache and upcallPoornima G2016-08-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the volume set option features.cache-invalidation enables upcall feature on server side and md-cache cache-invalidation on client side. There are multiple problems that can arise from this: 1. The scenario when user wants to, enable upcall for nfs-ganesha setup, but do not want to enable md-cache cache-invalidation, as the nfs-clients have already cached the metadata and upcall is used to to invalidate the nfs-client cache. In this case, users should have a way of disabling md-cache invalidation without disabling upcall. 2. Upcall requires a op-version of GD_OP_VERSION_3_7_0, where as md-cache invalidation requires an op version of GD_OP_VERSION_3_9_0. Consider a setup where the servers are in op-version GD_OP_VERSION_3_7_0, and th clients are in op-version GD_OP_VERSION_3_9_0. if there is one single volume set option, user can enable this feature in this setup. But it can lead to stale xattr cache as the xattr invalidation was introduced in upcall only in release 3.8. Hence, we should not be able to enable md-cache invalidation, if all the servers and clients are not on opversion >= GD_OP_VERSION_3_9_0. To solve the above mentioned issues, we have seperate volume options for enabling md-cache invalidation and upcall. But this can lead to issues when user enable md-cache invalidation and forgets to enable upcall. Probably in the next release, these can be enables by default. Change-Id: Ie70eff97fe12fcb623eec8f4f5861ac065bf483e BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15314 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: soumya k <skoduri@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Add cache hit and miss countersPoornima G2016-08-271-7/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | These counters can be accessed either by .meta interface or statedump. From meta: cat on the private file in md-cache directory. Eg: cat /mnt/glusterfs/0/.meta/graphs/active/patchy-md-cache/private [performance/md-cache.patchy-md-cache] stat_hit_count = 2 stat_miss_count = 8 xattr_hit_count = 4 xattr_miss_count = 3 nameless_lookup_count = 1 negative_lookup_count = 0 stat_invalidations_recieved = 1 xattr_invalidations_recieved = 1 Change-Id: Ib62a8822f263a9f75858b15832d0119fbe382629 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15185 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache: Add logging to increase debuggabilityPoornima G2016-08-152-9/+51
| | | | | | | | | | | | Change-Id: I147d16ec3c20d3372892fdd5f62010e52f82f8bd BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15069 Smoke: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* md-cache/upcall: In case of mode bit change invalidate xattrPoornima G2016-08-041-1/+4
| | | | | | | | | | | | | | | | | | | When the mode bits are changed, the ACL entries also do get affected. Currently in upcall, setattr invalidates only the stat info. With this patch, if mode bits are changed, the upcall will invalidate all the xattrs. Change-Id: Iccda2e1a7440ee845aa5442bf51970f74d9b0862 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/15043 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* io-threads: distribute work fairly among clientsJeff Darcy2016-07-283-13/+107
| | | | | | | | | | | | | This is the full "queue of queues" approach where each client gets its own queue (per priority) and we round-robin among them. Change-Id: I73955d1b9bb93f2ff781b48dfe2509009c519ec6 BUG: 1360402 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/14904 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org>
* io-threads: remove least-rate-limit option and codeJeff Darcy2016-07-282-96/+2
| | | | | | | | | | | | | | This will be unnecessary, and mostly in the way, as real fairness guarantees are implemented. Change-Id: Ic61ec1c9e9add58385f1a4eafcfe2cc554ceefc8 BUG: 1360402 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/14989 Smoke: Gluster Build System <jenkins@build.gluster.org> Tested-by: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: Gluster Build System <jenkins@build.gluster.org>
* md-cache: fix indention to silence CoverityNiels de Vos2016-07-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity complains about the incorrect indention: *** CID 1357867: Control flow issues (NESTING_INDENT_MISMATCH) ... 2566 if (conf->mdc_invalidation) 2567 ret = mdc_invalidate (this, data); >>> CID 1357867: Control flow issues (NESTING_INDENT_MISMATCH) >>> This 'if' statement is indented to column 25, as if it were nested within the preceding parent statement, but it is not. 2568 if (default_notify (this, event, data) != 0) 2569 ret = -1; 2570 break; ... Even when md-cache does not have cache-invalidation on, we need to pass the upcall to the next xlator. Change-Id: I6d2a174eb54e3df270920aae9673b5010c235f25 CID: 1357867 BUG: 1211863 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/14971 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> Reviewed-by: Poornima G <pgurusid@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* md-cache: Add cache invalidation support to invalidate the meta data cachePoornima G2016-07-202-23/+245
| | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: md-cache currently updates its stat in cbks of selected fops. The default cache time is 1 second, if this is increasd to reap the benefits of caching, we may end up with stale cache for long time, as there is no logic yet to notify md-cache of backend changes by another client. Solution: Use the existing upcall mechanism to invalidate the cache. For this feature to work, "features.cache-invalidation" volume option should be enabled. This patch as is doesn't improve any performance, the benifit of the patch is that it provides coherency for stat cache, hence the cache timeout can be quite longer which in turn can improve the performance. Change-Id: I2dbb0afa7b5e4a5a248f910188e0918e02f18692 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/12951 Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* core: add a basis function to reduce verbose codeZhou Zhengping2016-07-181-16/+0
| | | | | | | | | | | Change-Id: Icebe1b865edb317685e93f3ef11d98fd9b2c2e9a BUG: 1357226 Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-on: http://review.gluster.org/14936 Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
* io-threads.c: add serveral fops support in io-threadsZhou Zhengping2016-07-171-8/+45
| | | | | | | | | | | | | | | | | | 1.add lease, seek, getactivelk, setactivelk support in io-threads 2.add several missing function declations in defaults.h, the declation of function default_getactivelk_failure_cbk needed by io-threads.c Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Change-Id: Ie7a56569b4c29cdc6984d31738b484d2d6d83557 BUG: 1357210 Reviewed-on: http://review.gluster.org/14935 Tested-by: Zhou Zhengping <johnzzpcrystal@gmail.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Niels de Vos <ndevos@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* "md-cache: Enable caching of stat fetched from readdirpPoornima G2016-07-111-4/+1
| | | | | | | | | | | | | | | | | | | Patch http://review.gluster.org/11894 removed readdirp fop for md-cache, but there is no mention of exact xlator which was failing because of this. As mentioned by Rafi(author of patch 11894) tiering and svc doesn't really need this as the inode_ctx is populated in readdirp_cbk. Hence reverting this commit. This reverts commit c8c9308134ae4ce24c630a1b0ccfcf4e8f9b0fe7. Change-Id: Ib8d00b3f129596f3a54984f839199175f5c9b55b BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/14879 CentOS-regression: Gluster Build System <jenkins@build.gluster.org> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* md-cache: Cache gluster-samba metadataPoornima G2016-07-051-0/+30
| | | | | | | | | | | | Change-Id: I0a95f4897440c5bf6f54612d9c232e015c8bf983 BUG: 1211863 Signed-off-by: Poornima G <pgurusid@redhat.com> Reviewed-on: http://review.gluster.org/14824 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Reviewed-by: Prashanth Pai <ppai@redhat.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
* performance/decompounder: Add graph for decompounder xlatorAshish Pandey2016-06-141-0/+6
| | | | | | | | | | | | | | | | | | | | | This xlator will fall below protocol/server. This is mandatory xlator without any options. Observed that the callback for decompounder translator was not added which was causing volume start to fail. Added cbks for decompounder. Change-Id: I3e16a566376338d9c6d36d6fbc7bf295fda9f3a6 BUG: 1335019 Signed-off-by: Ashish Pandey <aspandey@redhat.com> Reviewed-on: http://review.gluster.org/13968 Smoke: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Ravishankar N <ravishankar@redhat.com> Reviewed-by: Anuradha Talur <atalur@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.org> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* __inode_ctx_put: fix mem leak on failurePrasanna Kumar Kalever2016-06-011-1/+6
| | | | | | | | | | | | | | | | | | | up on failure case __inode_ctx_put need to free the allocated memory Indirect leak of 104 byte(s) in 1 object(s) allocated from: #0 0x496669 in __interceptor_calloc (/usr/local/sbin/glusterfsd+0x496669) #1 0x7f8a288522f9 in __gf_calloc libglusterfs/src/mem-pool.c:117 #2 0x7f8a17235962 in __posix_acl_ctx_get xlators/system/posix-acl/src/posix-acl.c:308 Change-Id: I0ce6da3967c55931a70f77d8551ccf52e4cdfda3 BUG: 1338733 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> Reviewed-on: http://review.gluster.org/14505 Tested-by: Prasanna Kumar Kalever <pkalever@redhat.com> Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
* readdir-ahead: Prefetch xattrs needed by md-cachePrashanth Pai2016-05-103-7/+204
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem: Negative cache feature implementation in md-cache requires xattrs returned by posix to be intercepted for every call that can possibly return xattrs. This includes readdirp(). This is crucial to treat missing keys in cache as a case of negative entry (returns ENODATA) md-cache puts names of xattrs that it wants to cache in xdata and passes it down to posix which returns the specified xattrs in the callback. This is done in lookup() and readdirp(). Hence, a xattr that is cached can be invalidated during readdirp_cbk too. This is based on the assumption that readdirp() will always return all xattrs that md-cache is interested in. However, this is not the case when readdirp() call is served from readdir-ahead's cache. readdir-ahead xlator will pre-fetch dentries during opendir_cbk and readdirp. These internal readdirp() calls made by readdir-ahead xlator does not set xdata in it's requests. Hence, no xattrs are fetched and stored in it's internal cache. This causes metadata loss in gluster-swift. md-cache returns ENODATA during getxattr() call even though the xattr for that object exists on the brick. On receiving ENODATA, gluster-swift will create new metadata and do setxattr(). This results in loss of information stored in existing xattr. Fix: During opendir, md-cache will communicate to readdir-ahead asking it to store the names of xattrs it's interested in so that readdir-ahead can fetch those in all subsequent internal readdirp() calls issued by it. This stored names of xattrs is invalidated/updated on the next real readdirp() call issued by application. This readdirp() call will have xdata set correctly by md-cache xlator. BUG: 1333023 Change-Id: I32d46f93a99d4ec34c741f3c52b0646d141614f9 Reviewed-on: http://review.gluster.org/14214 Tested-by: Prashanth Pai <ppai@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Raghavendra G <rgowdapp@redhat.com>
* performance/write-behind: guaranteed retry after a short writeRaghavendra G2016-05-041-17/+70
| | | | | | | | | | | | | | | | | | * Don't mark the request with a fake EIO after a short write. * retry the remaining buffer at least once before unwinding reply to application. This way we capture correct error from backend (ENOSPC, EDQUOT etc). Thanks to "Vijaikumar Mallikarjuna"<vmallika@redhat.com> for the test script. Change-Id: I73a18b39b661a7424db1a7855a980469a51da8f9 BUG: 1292020 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/13438 Smoke: Gluster Build System <jenkins@build.gluster.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
* io-threads: add setactivelk () fopSusant Palai2016-05-011-0/+1
| | | | | | | | | | | Change-Id: I74225a39348f6bb2fbdd1513676a70019227e45f BUG: 1326085 Signed-off-by: Susant Palai <spalai@redhat.com> Reviewed-on: http://review.gluster.org/14012 Smoke: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> CentOS-regression: Gluster Build System <jenkins@build.gluster.com>