Bug 1535852

Summary: glusterfind is extremely slow if there are lots of changes
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Milind Changire <mchangir>
Component: glusterfind
Assignee: Milind Changire <mchangir>
Status: CLOSED ERRATA
QA Contact: Vinayak Papnoi <vpapnoi>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: amukherj, avishwan, bugs, khiremat, nh2-redhatbugzilla, rhs-bugs, sheggodu, storage-qa-internal, vdas
Target Milestone: ---
Target Release: RHGS 3.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.12.2-5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1529883
Environment:
Last Closed: 2018-09-04 06:40:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1529883    
Bug Blocks: 1503138    

Description Milind Changire 2018-01-18 07:41:19 UTC
+++ This bug was initially created as a clone of Bug #1529883 +++

Description of problem:

I noticed that my glusterfind run on 3.12.3 had been going for hundreds of hours straight without terminating.

A quick strace showed that there were tons of pread64() syscalls in between each open() of a CHANGELOG.* file.

Looking in /proc/$(pidof glusterfind)/fd, I found that the file it was pread64()ing from was the `tmp_output_1` sqlite file. It was clearly reading in the entire database via those syscalls for each *line* of each CHANGELOG.* file.

To make it very clear, it was doing:

for each CHANGELOG file:
  for each line in that file:
     read in the entire SQL database contents (9 MB in my case)

Looking into the code, it became clear that glusterfind implements a simple check for whether a line of a CHANGELOG.* file is already in the DB: it checks whether that line's gfid is already present in the `gfid` column.

Unfortunately that column did not have an SQL index defined, so every such existence check resulted in a full scan of the database.

If you use sqlite you really must make sure to use indexes, because otherwise any O(1) or O(log n) operation turns into an O(n) operation, which in this case gave glusterfind O(n²) complexity overall.
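
To illustrate the difference, here is a minimal sqlite3 sketch in Python. The table layout and index name are made up for illustration (only the gfid column and the COUNT-style query come from the report); the query plans it prints show the full table scan turning into an index search once the index exists:

  import sqlite3

  conn = sqlite3.connect(":memory:")
  cur = conn.cursor()
  # Stand-in table; only the gfid column matters for the lookup.
  cur.execute("CREATE TABLE gfidpath (gfid TEXT, pgfid TEXT, bname TEXT)")

  plan = ("EXPLAIN QUERY PLAN "
          "SELECT COUNT(1) FROM gfidpath WHERE 1=1 AND gfid = ?")

  # Without an index: the plan reports a SCAN of the table, O(n) per lookup.
  print(cur.execute(plan, ("some-gfid",)).fetchall())

  # With an index: the same lookup becomes a SEARCH using the index.
  cur.execute("CREATE INDEX gfid_index ON gfidpath (gfid)")
  print(cur.execute(plan, ("some-gfid",)).fetchall())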

I will submit a patch.

It makes glusterfind 150x faster for me.

--- Additional comment from Worker Ant on 2017-12-31 02:08:54 IST ---

REVIEW: https://review.gluster.org/19114 (glusterfind: Speed up gfid lookup 100x by using an SQL index) posted (#1) for review on master by Niklas Hambüchen

--- Additional comment from Worker Ant on 2017-12-31 15:03:38 IST ---

COMMIT: https://review.gluster.org/19114 committed in master with the commit message: glusterfind: Speed up gfid lookup 100x by using an SQL index

Fixes #1529883.

This fixes some bits of `glusterfind`'s horrible performance,
making it 100x faster.

Until now, glusterfind was, for each line in each CHANGELOG.* file,
linearly reading the entire contents of the sqlite database in
4096-bytes-sized pread64() syscalls when executing the

  SELECT COUNT(1) FROM %s WHERE 1=1 AND gfid = ?

query through the code path:

  get_changes()
    parse_changelog_to_db()
      when_data_meta()
        gfidpath_exists()
          _exists()

In a quick benchmark on my laptop, doing one such `SELECT` query
took ~75ms on a 10MB-sized sqlite DB, while doing the same query
with an index took < 1ms.

Change-Id: I8e7fe60f1f45a06c102f56b54d2ead9e0377794e
BUG: 1529883
Signed-off-by: Niklas Hambüchen <mail>
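
The benchmark above is easy to reproduce outside glusterfind. A minimal sketch, using a synthetic table of a few hundred thousand rows (the schema and row count are illustrative, not glusterfind's actual data), that times the same style of COUNT query before and after creating the index:

  import sqlite3, time, uuid

  conn = sqlite3.connect(":memory:")
  cur = conn.cursor()
  cur.execute("CREATE TABLE gfidpath (gfid TEXT, pgfid TEXT, bname TEXT)")
  cur.executemany("INSERT INTO gfidpath VALUES (?, ?, ?)",
                  ((str(uuid.uuid4()), str(uuid.uuid4()), "f%d" % i)
                   for i in range(200000)))

  def lookup_seconds():
      t0 = time.time()
      cur.execute("SELECT COUNT(1) FROM gfidpath WHERE 1=1 AND gfid = ?",
                  ("no-such-gfid",))
      cur.fetchone()
      return time.time() - t0

  print("without index: %.4fs" % lookup_seconds())   # full table scan
  cur.execute("CREATE INDEX gfid_index ON gfidpath (gfid)")
  print("with index:    %.4fs" % lookup_seconds())   # index lookup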

Comment 1 Sunil Kumar Acharya 2018-01-18 08:05:13 UTC
Upstream Patch: https://review.gluster.org/19114

Comment 6 Vinayak Papnoi 2018-05-02 06:32:03 UTC
Tested the glusterfind run on 3.4.0 vs. 3.3.1 with the same volume topology and the same I/O operation (in this case a Linux kernel untar, approximately 1.9 GB after extraction).

Following are my observations:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Build: glusterfs-3.8.4-54.6.el7rhgs.x86_64 (RHGS 3.3.1)

[root@dhcp42-39 ~]# gluster v info
 
Volume Name: alpha
Type: Distributed-Replicate
Volume ID: cfea6bcc-13ec-4305-b580-6f8e4a391cf8
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.42.39:/bricks/brick0/alpha-b1
Brick2: 10.70.42.22:/bricks/brick0/alpha-b2
Brick3: 10.70.42.21:/bricks/brick0/alpha-b3
Brick4: 10.70.42.14:/bricks/brick0/alpha-b4
Brick5: 10.70.42.15:/bricks/brick0/alpha-b5
Brick6: 10.70.42.16:/bricks/brick0/alpha-b6
Options Reconfigured:
changelog.capture-del-path: on
changelog.changelog: on
storage.build-pgfid: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable
cluster.brick-multiplex: enable
cluster.max-bricks-per-process: 20
nfs-ganesha: enable

[root@dhcp42-39 scripts]# time glusterfind pre session1 alpha /tmp/outfilenew.txt --regenerate-outfile
Generated output file /tmp/outfilenew.txt

real	7m14.345s
user	4m54.421s
sys	2m2.892s

-------------------------------


Build: glusterfs-3.12.2-8.el7rhgs.x86_64 (RHGS 3.4.0)

[root@dhcp43-18 ~]# gluster v info
 
Volume Name: alpha
Type: Distributed-Replicate
Volume ID: 925b3b99-4742-4ecd-b462-acb6aec1d1a9
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.43.18:/bricks/brick0/alpha-b1
Brick2: 10.70.43.9:/bricks/brick0/alpha-b2
Brick3: 10.70.43.2:/bricks/brick0/alpha-b3
Brick4: 10.70.42.240:/bricks/brick0/alpha-b4
Brick5: 10.70.43.4:/bricks/brick0/alpha-b5
Brick6: 10.70.42.243:/bricks/brick0/alpha-b6
Options Reconfigured:
changelog.capture-del-path: on
changelog.changelog: on
storage.build-pgfid: on
nfs.disable: on
snap-activate-on-create: disable
auto-delete: disable
cluster.brick-multiplex: enable
cluster.enable-shared-storage: disable

[root@dhcp43-18 ~]# time glusterfind pre session1 alpha /tmp/outfilenew.txt
Generated output file /tmp/outfilenew.txt

real    0m7.588s
user    0m1.751s
sys     0m0.998s

-------------------------------

As seen above, the 3.4.0 build (glusterfs-3.12.2-8) ran glusterfind about 60 times faster than the 3.3.1 build (glusterfs-3.8.4-54.6): roughly 7.6 seconds versus 7 minutes 14 seconds for the same workload.

Hence, moving the bug to VERIFIED.

Comment 8 errata-xmlrpc 2018-09-04 06:40:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607