Bug 1225501
Summary: | query performance does not scale | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Daniel Mach <dmach>
Component: | libdnf | Assignee: | rpm-software-management
Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rawhide | CC: | dmach, jmracek, jzeleny, lkocman, mluscon, mvanross, packaging-team-maint, pbrobinson, rpm-software-management, vmukhame
Target Milestone: | --- | Keywords: | Performance, Reopened, Tracking, Triaged
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | libdnf-0.14 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-05-29 14:20:35 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1080837, 1156501 | |
Attachments: |
|
We would appreciate more data from profiler, please, to get it fixed.

*** Bug 1272109 has been marked as a duplicate of this bug. ***

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.

btw, I have to note that it's impossible to get proper performance with hawkey/libdnf/dnf for repoclosure, because they don't expose libsolv objects. If you need speed, use libsolv directly.
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    14                                           @profile
    15                                           def main():
    16         1         3801   3801.0      0.0      d = dnf.Base()
    17         1           14     14.0      0.0      d.conf.cachedir = "./dnf-cache"
    18         1          789    789.0      0.0      repo = dnf.repo.Repo("repo-0", d.conf)
    19         1           83     83.0      0.0      repo.baseurl = "http://dl.fedoraproject.org/pub/fedora/linux/releases/22/Server/x86_64/os/"
    20         1           12     12.0      0.0      d.repos.add(repo)
    21
    22         1         6644   6644.0      0.0      d.fill_sack(load_system_repo=False, load_available_repos=True)
    23
    24
    25         1           25     25.0      0.0      print("DICT CACHE")
    26
    27         1            5      5.0      0.0      t10 = datetime.now()
    28
    29         1          899    899.0      0.0      RELDEP_RE = re.compile("^(?P<name>.*)( (?P<flag>[<>=]+) (?P<version>.*))?$")
    30
    31         1            1      1.0      0.0      pkgs_by_dep = {} # provides_name -> [pkgs]
    32         1            1      1.0      0.0      pkgs_by_file = {} # /file/path -> [pkgs]
    33
    34      2482         7217      2.9      0.0      for pkg in d.sack.query():
    35     54090        79387      1.5      0.4          for prov in pkg.provides:
    36     51609        82624      1.6      0.4              match = RELDEP_RE.match(str(prov))
    37     51609        60466      1.2      0.3              name = match.groupdict()["name"]
    38     51609        70579      1.4      0.3              pkgs_by_dep.setdefault(name, set()).add(pkg)
    39    172182       230086      1.3      1.0          for prov in pkg.files:
    40    169701       341161      2.0      1.5              pkgs_by_file.setdefault(str(prov), set()).add(pkg)
    41
    42     20158        22652      1.1      0.1      for key in pkgs_by_dep:
    43     20157      1316334     65.3      6.0          pkgs_by_dep[key] = d.sack.query().filter(pkg=pkgs_by_dep[key]).apply()
    44
    45
    46        11           18      1.6      0.0      for i in range(ITERATIONS):
    47     24820       129733      5.2      0.6          for pkg in d.sack.query():
    48    226700       404662      1.8      1.8              for req in pkg.requires:
    49    201890       446666      2.2      2.0                  match = RELDEP_RE.match(str(req))
    50    201890       280349      1.4      1.3                  name = match.groupdict()["name"]
    51    201890       218405      1.1      1.0                  if name.startswith("/"):
    52      4490         7620      1.7      0.0                      pkgs_by_file.get(name, [])
    53                                                           else:
    54    197400       260652      1.3      1.2                      q = pkgs_by_dep.get(name, None)
    55    197400       327988      1.7      1.5                      if q:
    56    130580      3404829     26.1     15.4                          q.filter(provides=req)
    57                                                               else:
    58     66820        58576      0.9      0.3                          []
    59
    60         1            7      7.0      0.0      t11 = datetime.now()
    61         1            2      2.0      0.0      delta = t11 - t10
    62         1           38     38.0      0.0      print("total: %ss" % delta.total_seconds())
    63
    64         1           20     20.0      0.0      print()
    65         1            9      9.0      0.0      print("-----")
    66         1            7      7.0      0.0      print()
    67
    68         1            8      8.0      0.0      print("QUERIES")
    69
    70         1            4      4.0      0.0      t20 = datetime.now()
    71
    72        11           18      1.6      0.0      for i in range(ITERATIONS):
    73     24820        79194      3.2      0.4          for pkg in d.sack.query():
    74    226700       386157      1.7      1.7              for req in pkg.requires:
    75    201890     13841045     68.6     62.7                  list(d.sack.query().filter(provides=req))
    76
    77         1            7      7.0      0.0      t21 = datetime.now()
    78         1            2      2.0      0.0      delta = t21 - t20
    79         1           48     48.0      0.0      print("total: %ss" % delta.total_seconds())

This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle. Changing version to '26'.

Created attachment 1307485 [details]
new reproducer working with dnf 2.x and f26
Cache performance has degraded significantly (a regression in libdnf/hawkey? unicode literals?),
but the overall query performance stays where it was.

Query performance was fixed upstream; the fix is to be released as part of libdnf-0.14. |
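For readers without a dnf sack at hand, the "DICT CACHE" strategy from the profile above can be approximated in plain Python. This is only a sketch of the general idea (build a capability-to-packages index once, then answer each lookup from the index), not the reproducer itself. The package data and the helper names `providers_by_scan`, `build_index` and `providers_by_index` are hypothetical, and matching is by capability name only, whereas the real reproducer still calls `filter(provides=req)` on the narrowed set to honour version constraints.

```python
from collections import defaultdict

# Hypothetical package metadata: package name -> provided capability names.
PACKAGES = {
    "bash": ["bash", "/bin/sh", "config(bash)"],
    "dash": ["dash", "/bin/sh"],
    "python3": ["python3", "python(abi)"],
}
# Hypothetical requirements to resolve.
REQUIRES = ["/bin/sh", "python(abi)", "libmissing.so.1"]


def providers_by_scan(packages, capability):
    """Sequential scan: every lookup inspects every package."""
    return {name for name, provides in packages.items() if capability in provides}


def build_index(packages):
    """Single pass over all packages; maps capability -> set of package names."""
    index = defaultdict(set)
    for name, provides in packages.items():
        for capability in provides:
            index[capability].add(name)
    return index


def providers_by_index(index, capability):
    """Dictionary lookup; .get() avoids creating empty entries for misses."""
    return index.get(capability, set())


index = build_index(PACKAGES)
for req in REQUIRES:
    # Both strategies must agree; only their cost per lookup differs.
    assert providers_by_scan(PACKAGES, req) == providers_by_index(index, req)
    print(req, "->", sorted(providers_by_index(index, req)))
```

The index costs one full pass to build, which is why it only pays off when the same data is queried many times.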
Created attachment 1030579 [details]
reproducer

hawkey (or libsolv?) performs a sequential scan for every single query argument. This makes queries slower than in yum, which probably benefits from a database (sqlite3) backend with indexed data.

Results from my test, comparing queries on data cached in memory (with the package sets narrowed down for individual queries) against queries without caching:

1 iteration:
    dict cache: 2.5s  <-- cache building overhead
    queries:    1.8s
5 iterations:
    dict cache: 3.9s
    queries:    9.5s
10 iterations:
    dict cache: 5.4s
    queries:    18.2s
20 iterations:
    dict cache: 9.0s
    queries:    36.2s
100 iterations:
    dict cache: 35.3s
    queries:    191.3s
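To make the trend in these timings easier to read, the sketch below computes the cost of each additional iteration from the first and last measurements. It uses only the numbers reported above; the helper name `marginal_cost` is made up for this illustration, and a two-point slope is a rough estimate rather than a proper fit.

```python
# Timings reported in this comment: iterations -> (dict cache s, plain queries s).
RESULTS = {
    1: (2.5, 1.8),
    5: (3.9, 9.5),
    10: (5.4, 18.2),
    20: (9.0, 36.2),
    100: (35.3, 191.3),
}


def marginal_cost(results, column):
    """Seconds per additional iteration between the smallest and largest run."""
    lo, hi = min(results), max(results)
    return (results[hi][column] - results[lo][column]) / (hi - lo)


cache_per_iter = marginal_cost(RESULTS, 0)
query_per_iter = marginal_cost(RESULTS, 1)

print("dict cache: %.2f s per extra iteration" % cache_per_iter)
print("queries:    %.2f s per extra iteration" % query_per_iter)
print("ratio:      %.1fx" % (query_per_iter / cache_per_iter))
```

By this estimate each extra iteration costs roughly 0.33 s with the dict cache and roughly 1.91 s with plain queries, so the cache build overhead seen at 1 iteration is already recovered by 5 iterations.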