Bug 1099917

Summary: Optimize loading libosinfo
Product: [Community] Virtualization Tools
Component: libosinfo
Version: unspecified
Reporter: Cole Robinson <crobinso>
Assignee: Matthias Clasen <mclasen>
CC: berrange, cfergeau, fidencio, gscrivan
Status: CLOSED DEFERRED
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2018-09-04 18:35:53 UTC

Description Cole Robinson 2014-05-21 13:50:24 UTC
# time python -c 'from gi.repository import Libosinfo; l = Libosinfo.Loader(); l.process_default_path()'

real	0m0.510s
user	0m0.466s
sys	0m0.038s
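The measurement above times the whole `python -c` process, so it also counts interpreter start-up and the GObject Introspection import. A minimal sketch for timing only the loading step in-process, assuming Python 3: one warm-up call is discarded and the median of several further runs is reported. The helper name `warm_median_seconds` is hypothetical.

```python
import statistics
import time


def warm_median_seconds(fn, repeats=5):
    """Call fn once to warm caches, then return the median wall time of `repeats` further calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    fn()  # warm-up call: populates the OS page cache; its timing is discarded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

With the Libosinfo bindings installed, this could be called as `warm_median_seconds(lambda: Libosinfo.Loader().process_default_path())` after the same `from gi.repository import Libosinfo` import used in the command above. Each call builds a fresh loader, so every sample parses the full database.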

Timings are with a warm cache. This stemmed from a discussion here:

http://www.redhat.com/archives/virt-tools-list/2014-May/msg00048.html

And from Dan's comment at bug 500320#c14:

(In reply to Daniel Berrange from comment #14)
> Can you file a bug against libosinfo to optimize this. I'd like to think we
> can also reduce that time penalty, even if that means we have to cache the
> data in a more efficient format than XML, and only reload XML files when
> they change.

Comment 1 Zeeshan Ali 2014-05-21 16:19:57 UTC
Thanks for filing this. I had been thinking about this on and off but never arrived at anything concrete. One idea I had was:

1. Also allow data to be provided in JSON format.
2. Whenever libosinfo parses data from a .xml file, it writes that data to a JSON file in ~/cache/libosinfo (or some other location) named SHA256_OF_XML_FILE_PATH.json.
3. Before loading a .xml file, check whether a corresponding JSON file exists and is not older than the .xml file. If so, load from the JSON file instead. Otherwise, parse the XML and write the JSON file as in #2.
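A minimal Python sketch of the caching steps, assuming a generic XML-to-JSON mapping (`element_to_dict`) rather than a schema-aware one. The function names and the `cache_dir` argument are hypothetical, not libosinfo API. The cache is reused only when it is at least as new as the XML file, so editing the XML invalidates it.

```python
import hashlib
import json
import os
import xml.etree.ElementTree as ET


def cache_path_for(xml_path, cache_dir):
    """Name the cache file after the SHA-256 of the XML file's absolute path."""
    digest = hashlib.sha256(os.path.abspath(xml_path).encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".json")


def element_to_dict(elem):
    """Hypothetical generic mapping of an XML element to JSON-friendly data (ignores tail text)."""
    return {
        "tag": elem.tag,
        "attrib": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def load_with_cache(xml_path, cache_dir):
    """Return (data, source) where source is "json" on a cache hit and "xml" otherwise."""
    cached = cache_path_for(xml_path, cache_dir)
    # Reuse the JSON copy only if it exists and is at least as new as the XML file.
    if os.path.exists(cached) and os.path.getmtime(cached) >= os.path.getmtime(xml_path):
        with open(cached, encoding="utf-8") as f:
            return json.load(f), "json"
    data = element_to_dict(ET.parse(xml_path).getroot())
    os.makedirs(cache_dir, exist_ok=True)
    tmp = cached + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(data, f)
    os.replace(tmp, cached)  # atomic rename so readers never see a half-written cache file
    return data, "xml"
```

Comparing modification times is cheap but only as precise as the filesystem's timestamp resolution; hashing the XML file's contents would be more robust, at the price of reading the file on every load.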

What do you guys think?

Comment 2 Daniel Berrangé 2014-05-21 16:38:33 UTC
It depends whether parsing JSON is actually faster than parsing XML or not :-)

Actually, what would be better is to profile libosinfo to see exactly where the slowness is. Perhaps it isn't even the XML parsing that's the problem!
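One way to follow this suggestion from Python is the standard library's `cProfile`. The sketch below wraps any callable (for instance the `Libosinfo.Loader().process_default_path()` call from the description) and returns the top entries by cumulative time; `profile_call` is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=10):
    """Run fn under cProfile; return (fn's result, text of the `top` entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, stream.getvalue()
```

cProfile only sees Python-level frames, so time spent inside libosinfo's C code shows up as a single opaque call into the bindings; attributing it to individual C functions needs a native profiler such as `perf`.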

Comment 3 Zeeshan Ali 2014-05-21 22:52:11 UTC
(In reply to Daniel Berrange from comment #2)
> It depends whether parsing JSON is actually faster than parsing XML or not
> :-)

Surely we need to test and measure, but based on my experience with both, I'm betting that parsing JSON is a lot faster than parsing XML. I might be wrong about the difference being significant enough, though.
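A rough way to test the bet is to parse the same records once as XML and once as JSON and time both. This sketch uses Python's `xml.etree.ElementTree` and `json` on made-up records loosely shaped like device entries (not the real libosinfo schema), so it only indicates the relative cost of the two formats in Python, not the cost of libxml2 versus a C JSON parser inside libosinfo.

```python
import json
import timeit
import xml.etree.ElementTree as ET


def make_payloads(n):
    """Build the same hypothetical records as an XML string and as a JSON string."""
    records = [{"id": f"{i:04x}", "name": f"Device {i}"} for i in range(n)]
    root = ET.Element("devices")
    for record in records:
        ET.SubElement(root, "device", id=record["id"]).text = record["name"]
    return ET.tostring(root, encoding="unicode"), json.dumps(records)


def parse_xml(xml_text):
    root = ET.fromstring(xml_text)
    return [{"id": d.get("id"), "name": d.text} for d in root.iter("device")]


def parse_json(json_text):
    return json.loads(json_text)


def compare(n=5000, number=20):
    """Return (xml_seconds, json_seconds) for `number` parses of n records each."""
    xml_text, json_text = make_payloads(n)
    if parse_xml(xml_text) != parse_json(json_text):
        raise RuntimeError("payloads do not decode to the same data")
    xml_seconds = timeit.timeit(lambda: parse_xml(xml_text), number=number)
    json_seconds = timeit.timeit(lambda: parse_json(json_text), number=number)
    return xml_seconds, json_seconds
```

Calling `compare()` returns the two totals in seconds; no particular ratio is claimed here, since it depends on the parser implementations and the shape of the data.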

> Actually, what would be better is to actually profile libosinfo to see
> exactly where the slowness is. Perhaps it isn't even the XML parsing that's
> the problem!

Yeah, even though it seems unlikely that the culprit is something else, we really should start with profiling.

Comment 4 Giuseppe Scrivano 2014-07-31 07:53:24 UTC
Upstream master should now be around 30% faster than 0.2.10.

The functions osinfo_loader_process_file_reg_usb and osinfo_loader_process_file_reg_pci take a lot of time, and the cause appears to be the cost of creating GObjects: g_object_new is quite expensive and is called many times.
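The thread does not say how the object-creation cost was addressed. One possible mitigation, shown here only as a hypothetical Python analogy, is to index each entry as a plain tuple while reading the file and to construct a full object only when a caller looks it up. The sketch assumes a simplified usb.ids-style layout (vendor lines `vvvv  name`, tab-indented product lines, double-tab interface lines ignored, non-hex section headers such as class lists skipped); `Device` and `LazyIdsTable` are invented names standing in for the per-entry GObjects.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for one per-entry object (a GObject in libosinfo)."""
    vendor_id: str
    vendor_name: str
    product_id: str
    product_name: str


def _is_hex4(token):
    return len(token) == 4 and all(c in string.hexdigits for c in token)


class LazyIdsTable:
    """Index usb.ids-style text as plain tuples; build Device objects only on lookup."""

    def __init__(self, text):
        self._rows = {}     # (vendor_id, product_id) -> (vendor_name, product_name)
        self._objects = {}  # Device objects that have actually been requested
        vendor_id = vendor_name = None
        for line in text.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # blank lines and comments
            if line.startswith("\t\t"):
                continue  # interface lines are ignored in this sketch
            if line.startswith("\t"):
                if vendor_id is None:
                    continue  # product-like line outside a vendor block (e.g. a class list)
                product_id, _, product_name = line[1:].partition("  ")
                self._rows[(vendor_id, product_id)] = (vendor_name, product_name)
            else:
                token, _, name = line.partition("  ")
                if _is_hex4(token):
                    vendor_id, vendor_name = token, name
                else:
                    vendor_id = vendor_name = None  # e.g. a "C 00  ..." class header

    def __len__(self):
        return len(self._rows)

    @property
    def objects_created(self):
        return len(self._objects)

    def lookup(self, vendor_id, product_id):
        """Return the Device for these IDs, creating it on first use (KeyError if unknown)."""
        key = (vendor_id, product_id)
        if key not in self._objects:
            vendor_name, product_name = self._rows[key]
            self._objects[key] = Device(vendor_id, vendor_name, product_id, product_name)
        return self._objects[key]
```

Whether deferral would help in libosinfo itself depends on how many devices callers actually enumerate; if every entry is eventually requested, the same number of objects still gets created, only later.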

Comment 6 Cole Robinson 2018-09-04 18:35:53 UTC
Things can always get faster, but I don't think keeping this bug open is going to motivate any more change, so I'm closing it.