path: root/tools/glusterfind/src/changelogdata.py
Commit message | Author | Age | Files | Lines
* glusterfind: Speed up gfid lookup 100x by using an SQL index | Niklas Hambüchen | 2017-12-30 | 1 | -0/+5

    Fixes #1529883.

    This fixes some bits of `glusterfind`'s horrible performance, making it 100x faster.

    Until now, glusterfind was, for each line in each CHANGELOG.* file, linearly reading the entire contents of the sqlite database in 4096-byte pread64() syscalls when executing the query

        SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?

    through the code path:

        get_changes()
        parse_changelog_to_db()
        when_data_meta()
        gfidpath_exists()
        _exists()

    In a quick benchmark on my laptop, doing one such `SELECT` query took ~75ms on a 10MB-sized sqlite DB, while doing the same query with an index took < 1ms.

    Change-Id: I8e7fe60f1f45a06c102f56b54d2ead9e0377794e
    BUG: 1529883
    Signed-off-by: Niklas Hambüchen <mail@nh2.me>
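
The effect of the index described above can be observed with SQLite's `EXPLAIN QUERY PLAN`. The following is a minimal sketch, not the code from the commit: the table name `gfidpath`, the column layout, and the index name `gfid_index` are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical table name, for illustration only.
TABLE = "gfidpath"


def query_plan(conn, gfid):
    """Return SQLite's plan description for the gfid existence query."""
    query = ("EXPLAIN QUERY PLAN "
             "SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?" % TABLE)
    rows = conn.execute(query, (gfid,)).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE %s (gfid VARCHAR, path VARCHAR)" % TABLE)
conn.executemany(
    "INSERT INTO %s (gfid, path) VALUES (?, ?)" % TABLE,
    [("gfid-%d" % i, "/file-%d" % i) for i in range(1000)],
)

# Without an index, SQLite has to scan every row of the table.
plan_before = query_plan(conn, "gfid-500")

# Hypothetical index name; with it, the lookup becomes an index search.
conn.execute("CREATE INDEX gfid_index ON %s (gfid)" % TABLE)
plan_after = query_plan(conn, "gfid-500")

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only prints it; the plan after `CREATE INDEX` should name the index.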
* tools/glusterfind: Fix encoding to encode only space, newline and percent chars | Aravinda VK | 2017-07-21 | 1 | -39/+13

    libgfchangelog was encoding paths according to RFC 3986, but encoding is only required for SPACE, NEWLINE and PERCENT chars, since the NEWLINE char is used as the record separator and SPACE as the field separator in the parsed changelog output.

    Changed the encoding function to encode only SPACE, NEWLINE and PERCENT chars.

    BUG: 1451724
    Change-Id: Ic1dea824d23493dedcf3db45f353f90572f4e046
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    Reviewed-on: https://review.gluster.org/17788
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
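
A restricted encoding of this kind might look like the sketch below. The helper names `encode_path` and `decode_path` are hypothetical and are not taken from the glusterfind source; the sketch only illustrates that escaping PERCENT alongside SPACE and NEWLINE keeps the encoding reversible.

```python
import re

# Only these three characters are escaped (hypothetical mapping tables).
_ENCODE = {"%": "%25", " ": "%20", "\n": "%0A"}
_DECODE = {escaped: char for char, escaped in _ENCODE.items()}


def encode_path(path):
    """Escape SPACE, NEWLINE and PERCENT; leave every other character alone."""
    return re.sub(r"[% \n]", lambda m: _ENCODE[m.group(0)], path)


def decode_path(encoded):
    """Reverse encode_path() in a single left-to-right pass."""
    return re.sub(r"%(?:25|20|0A)", lambda m: _DECODE[m.group(0)], encoded)


original = "dir/my file%20\nname"
encoded = encode_path(original)
print(encoded)
print(decode_path(encoded) == original)
```

Decoding in a single regular-expression pass avoids accidentally unescaping a literal `%20` that was present in the original path.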
* tools/glusterfind: unquote DELETE path before further processing | Milind Changire | 2017-06-30 | 1 | -2/+3

    Problem:
    The DELETE path is quoted before it reaches glusterfind. This wasn't handled in the glusterfind code, leading to double quoting of the path separator '%2F' to '%252F', i.e. the '%' character in '%2F' was itself quoted to '%25'.

    Solution:
    Unquote the deleted path before further processing.

    Change-Id: I2dfbbd7792dc0f9da5c8e02093b0f1c031ff344a
    BUG: 1465024
    Signed-off-by: Milind Changire <mchangir@redhat.com>
    Reviewed-on: https://review.gluster.org/17629
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Aravinda VK <avishwan@redhat.com>
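
The double-quoting problem can be reproduced with the standard library alone. This is an illustrative sketch using Python 3's `urllib.parse` (the commit itself targeted the Python 2 `urllib` functions), and the sample path is made up.

```python
from urllib.parse import quote_plus, unquote_plus

# The path as it arrives: already quoted once, so '/' is '%2F'.
already_quoted = quote_plus("dir/file")

# Quoting again escapes the '%' itself, turning '%2F' into '%252F'.
double_quoted = quote_plus(already_quoted)

# Unquoting first and then quoting yields the singly quoted form again.
fixed = quote_plus(unquote_plus(already_quoted))

print(already_quoted)
print(double_quoted)
print(fixed)
```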
* tools/glusterfind: Handling Unicode file names | Aravinda VK | 2016-03-30 | 1 | -39/+26

    Unicode filenames are handled cleanly with this patch. Changelog files and output files are opened with utf-8 encoding using codecs.open.

    urllib.quote_plus and unquote_plus do not handle Unicode, so encode Unicode to its 8-bit string version before calling unquote. urllib.quote_plus itself requires an 8-bit string, so do not decode to Unicode if we need to use quote_plus (when --no-encode=false). Decode to Unicode if --no-encode is set.

    BUG: 1319717
    Change-Id: If5561c749ab5529445650d322c831eb4da22b65a
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    Reviewed-on: http://review.gluster.org/13798
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    Reviewed-by: Kotresh HR <khiremat@redhat.com>
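
As a rough illustration of the two output modes, the sketch below writes a Unicode file name both percent-encoded and raw to a UTF-8 file and reads it back. It is written for Python 3, where `urllib.parse.quote_plus` accepts `str` directly and the built-in `open(..., encoding="utf-8")` stands in for `codecs.open`; the byte-string handling described in the commit is specific to Python 2. The file name and the `NEW` record type are made-up sample data.

```python
import os
import tempfile
from urllib.parse import quote_plus, unquote_plus

name = "données/файл.txt"          # sample Unicode path (made up)
encoded = quote_plus(name)          # percent-encodes the UTF-8 bytes

out_path = os.path.join(tempfile.mkdtemp(), "output.txt")

# Write one encoded and one raw record, both through a UTF-8 file handle.
with open(out_path, "w", encoding="utf-8") as f:
    f.write("NEW %s\n" % encoded)
    f.write("NEW %s\n" % name)

with open(out_path, "r", encoding="utf-8") as f:
    records = [line.rstrip("\n").split(" ", 1)[1] for line in f]

# The encoded record needs unquoting; the raw one is usable as-is.
print(unquote_plus(records[0]) == name)
print(records[1] == name)
```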
* tools/glusterfind: New option --no-encode | Aravinda VK | 2016-02-26 | 1 | -27/+67

    New option added to skip encoding the path in the output file. Also handled Unicode strings.

    File paths can contain newline characters; to differentiate between paths, each path is encoded according to RFC 3986 (https://www.ietf.org/rfc/rfc3986.txt). As a result, consumer applications have to decode the path before consuming it. With this option, paths are not encoded and can be consumed directly by applications.

    Unicode encoding is handled automatically.

    BUG: 1310080
    Change-Id: I83d59831997dbd1264b48e9b1aa732c7dfc700b5
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    Reviewed-on: http://review.gluster.org/13477
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    Reviewed-by: Kotresh HR <khiremat@redhat.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Venky Shankar <vshankar@redhat.com>
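
The trade-off behind the option can be sketched as follows. This is not the glusterfind argument parser: the program name, the `store_true` flag style, and the `output_line` helper are assumptions, and `urllib.parse.quote` is used here only as a stand-in for RFC 3986 percent-encoding.

```python
import argparse
from urllib.parse import quote


def build_parser():
    # Hypothetical parser; only the option name comes from the commit.
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--no-encode", action="store_true",
                        help="write paths to the output file without encoding")
    return parser


def output_line(fop, path, no_encode):
    """Format one output record, percent-encoding the path unless disabled."""
    if not no_encode:
        path = quote(path)
    return "%s %s" % (fop, path)


args = build_parser().parse_args(["--no-encode"])

tricky = "dir/with\nnewline"
# Encoded: the record stays on a single line, but consumers must decode it.
print(output_line("NEW", tricky, no_encode=False))
# Not encoded: directly usable, but the embedded newline splits the record.
print(output_line("NEW", tricky, no_encode=args.no_encode))
```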
* tools/glusterfind: Prepend prefix in case of delete | Saravanakumar Arumugam | 2015-08-26 | 1 | -2/+6

    For the delete operation alone, prepending the output prefix was not handled earlier. The output prefix is added now.

    Change-Id: Ia91444dddbff501b26a864f6185ca4c0aaf4c802
    BUG: 1244144
    Signed-off-by: Saravanakumar Arumugam <sarumuga@redhat.com>
    Reviewed-on: http://review.gluster.org/11712
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Aravinda VK <avishwan@redhat.com>
    Reviewed-by: Milind Changire <mchangir@redhat.com>
* tools/glusterfind: RENAME and MODIFY issues | Aravinda VK | 2015-07-05 | 1 | -11/+28

    If a modification happens before a RENAME, GFID to Path conversion converts it into the new path. Delete the MODIFY entry and insert it again so that MODIFY <NEW NAME> comes after RENAME.

    Default value of pgfids and basenames changed to "" instead of NULL.

    Also fixed the RENAME issue of displaying "RENAME <NEW NAME> <NEW NAME>".

    Also fixed RENAME followed by missing MODIFY.

    Change-Id: I8202f6e6ec33f7bd921e71da38677f2ee2dab87a
    BUG: 1236270
    Signed-off-by: Kotresh HR <khiremat@redhat.com>
    Signed-off-by: Milind Changire <mchangir@redhat.com>
    Reviewed-on: http://review.gluster.org/11443
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Aravinda VK <avishwan@redhat.com>
    Tested-by: NetBSD Build System <jenkins@build.gluster.org>
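
The "delete the MODIFY entry and insert it again" step can be illustrated with a small working table. The schema, the table name `changes`, and the helper functions below are hypothetical; the sketch only shows how re-inserting a row moves it behind the RENAME when rows are read back in insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; path2 defaults to "" rather than NULL.
conn.execute("""CREATE TABLE changes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT, gfid TEXT, path1 TEXT, path2 TEXT DEFAULT '')""")


def record_modify(gfid, path):
    conn.execute("INSERT INTO changes (type, gfid, path1) VALUES ('MODIFY', ?, ?)",
                 (gfid, path))


def record_rename(gfid, old_path, new_path):
    conn.execute("INSERT INTO changes (type, gfid, path1, path2) "
                 "VALUES ('RENAME', ?, ?, ?)", (gfid, old_path, new_path))
    # If the file was modified earlier, drop that entry and re-insert it
    # under the new name so that it sorts after the RENAME.
    cur = conn.execute("DELETE FROM changes WHERE type = 'MODIFY' AND gfid = ?",
                       (gfid,))
    if cur.rowcount > 0:
        record_modify(gfid, new_path)


record_modify("gfid-1", "old.txt")
record_rename("gfid-1", "old.txt", "new.txt")

rows = conn.execute(
    "SELECT type, path1, path2 FROM changes ORDER BY id").fetchall()
for row in rows:
    print(row)
```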
* tools/glusterfind: GFID to Path conversion using Changelog | Aravinda VK | 2015-05-08 | 1 | -0/+412

    Records fop information collected from Changelogs in a sqlite database. This is only a working database and is not required after processing.

    After post-processing, the output file is generated by reading these database files. This is applicable only to incremental runs.

    When a changelog is parsed, all the details are saved in the DB. GFID to Path conversion is done for those files for which information is available in Changelogs. For all the failed cases, it tries to convert to Path using the Pgfid; if that is not found, GFID to Path conversion is done using find.

    BUG: 1201284
    Change-Id: I53f168860dae15a0149004835e67f97aebd822be
    Signed-off-by: Aravinda VK <avishwan@redhat.com>
    Reviewed-on: http://review.gluster.org/10463
    Reviewed-by: Kotresh HR <khiremat@redhat.com>
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
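
The three-stage fallback described in this message (changelog data first, then the parent GFID, then `find`) could be organized roughly as below. This is a sketch under stated assumptions: the function name, the in-memory dictionaries standing in for the sqlite working database, and the `find_fallback` callable are all hypothetical and do not reflect the actual code in changelogdata.py.

```python
import posixpath


def gfid_to_path(gfid, changelog_paths, pgfid_info, find_fallback):
    """Resolve a GFID to a path, trying the cheapest source first.

    changelog_paths: dict mapping gfid -> path known from parsed changelogs
    pgfid_info:      dict mapping gfid -> (parent gfid, basename)
    find_fallback:   callable(gfid) -> path, standing in for a `find` crawl
    """
    # 1. Path recorded directly from the changelogs.
    if gfid in changelog_paths:
        return changelog_paths[gfid]

    # 2. Rebuild the path from the parent directory's GFID and the basename.
    if gfid in pgfid_info:
        pgfid, basename = pgfid_info[gfid]
        parent = changelog_paths.get(pgfid)
        if parent is not None:
            return posixpath.join(parent, basename)

    # 3. Last resort: the expensive lookup.
    return find_fallback(gfid)


changelog_paths = {"g-dir": "projects", "g-a": "projects/a.txt"}
pgfid_info = {"g-b": ("g-dir", "b.txt")}
crawled = []


def fake_find(gfid):
    crawled.append(gfid)
    return "found/by/find"


print(gfid_to_path("g-a", changelog_paths, pgfid_info, fake_find))
print(gfid_to_path("g-b", changelog_paths, pgfid_info, fake_find))
print(gfid_to_path("g-c", changelog_paths, pgfid_info, fake_find))
```

Ordering the stages this way means the expensive crawl runs only for GFIDs that neither the changelog data nor the parent lookup can resolve.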