Bug 1804308

Summary: [RFE] As a user, I can parse repodata without having it on disk
Product: Red Hat Enterprise Linux 8
Reporter: Tanya Tereshchenko <ttereshc>
Component: createrepo_c
Assignee: amatej
Status: CLOSED ERRATA
QA Contact: Jan Blazek <jblazek>
Severity: unspecified
Docs Contact: Mariya Pershina <mpershin>
Priority: medium
Version: 8.3
CC: amatej, dalley, mblaha, mpershin, pkratoch
Target Milestone: rc
Keywords: FutureFeature, Triaged
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: createrepo_c-0.15.10-1.el8
Doc Type: No Doc Update
Last Closed: 2020-11-04 03:09:16 UTC
Type: Bug

Description Tanya Tereshchenko 2020-02-18 16:27:31 UTC
Background
==========

To provide a migration path for Satellite/Pulp users, Pulp needs to migrate content from the old version to the new one.
To achieve that, the XML snippet from each repodata file for each package needs to be parsed and converted into a createrepo_c object.

Createrepo_c offers parsers for primary/other/filelists which require a path to a file. Current options for Satellite/Pulp are:

 - [in memory] write our own XML parser to parse each snippet and create a createrepo_c object from that data
 - [write to a file] for each package, wrap each XML snippet to make it valid XML, write the results to 3 files, and then use the createrepo_c parsers:
        cr.xml_parse_primary(primary_xml_path, pkgcb=pkgcb, do_files=False)
        cr.xml_parse_filelists(filelists_xml_path, newpkgcb=newpkgcb)
        cr.xml_parse_other(other_xml_path, newpkgcb=newpkgcb)
 - [write to a file] optimized version of the previous option: process packages in batches rather than one by one, i.e. wrap and write the XML snippets of multiple packages into 3 files and then use the createrepo_c parsers as in the previous option.

NOTE: When I say createrepo_c, I mean createrepo_c Python bindings.
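
For illustration, the [write to a file] options above could be sketched roughly as follows. This is only a sketch under assumptions: the helper names (wrap_snippets, write_wrapped, parse_batch) are hypothetical, the root elements and namespaces are the ones normally found in primary/filelists/other repodata files, and the callback bodies are one plausible way to collect packages, not code taken from Pulp. createrepo_c is imported lazily so the wrapping helpers can be tried without it.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Opening/closing root elements for the three repodata file types.
# "{n}" is replaced by the number of wrapped <package> elements.
ROOTS = {
    "primary": (
        '<metadata xmlns="http://linux.duke.edu/metadata/common" '
        'xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="{n}">',
        "</metadata>",
    ),
    "filelists": (
        '<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="{n}">',
        "</filelists>",
    ),
    "other": (
        '<otherdata xmlns="http://linux.duke.edu/metadata/other" packages="{n}">',
        "</otherdata>",
    ),
}


def wrap_snippets(kind, snippets):
    """Wrap a list of <package> snippets of one kind into a complete XML document."""
    open_tag, close_tag = ROOTS[kind]
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        + open_tag.format(n=len(snippets)) + "\n"
        + "\n".join(snippets) + "\n"
        + close_tag + "\n"
    )


def write_wrapped(kind, snippets, directory):
    """Write the wrapped document to <directory>/<kind>.xml and return its path."""
    path = os.path.join(directory, kind + ".xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(wrap_snippets(kind, snippets))
    return path


def parse_batch(primary, filelists, other):
    """Parse a batch of snippets through temporary files (hypothetical helper)."""
    import createrepo_c as cr  # lazy import: only needed for the actual parsing

    packages = {}

    def pkgcb(pkg):
        # Collect each package once its primary data has been parsed.
        packages[pkg.pkgId] = pkg

    def newpkgcb(pkgId, name, arch):
        # Hand back the existing package so filelists/other data is added to it.
        return packages.get(pkgId)

    with tempfile.TemporaryDirectory() as d:
        cr.xml_parse_primary(write_wrapped("primary", primary, d),
                             pkgcb=pkgcb, do_files=False)
        cr.xml_parse_filelists(write_wrapped("filelists", filelists, d),
                               newpkgcb=newpkgcb)
        cr.xml_parse_other(write_wrapped("other", other, d), newpkgcb=newpkgcb)
    return packages
```

Passing one snippet per list corresponds to the per-package option; passing many corresponds to the batched one. Either way three files are written and read back for every call, which is the overhead this request is trying to avoid.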


Request
=======

Provide a way to get a createrepo_c Package object(s) from xml snippets of repodata without needing to write files to disk.

Timeline
========

This problem (in Satellite/Pulp) needs to be solved one way or another upstream by the end of March (upstream changes to createrepo_c will work for us).

Please let us know soon whether the functionality will be added to createrepo_c or whether Pulp needs to solve it on its own.

Comment 1 amatej 2020-02-19 09:39:40 UTC
We have discussed this with dmach a bit, but can you further outline what the migration process will look like? For example, what happens to the package object after it's created from the XML snippet, how will it be stored, and how will it be used after the migration is finished?

Comment 2 Tanya Tereshchenko 2020-02-19 16:04:14 UTC
It is saved into the database through an object very similar to the createrepo_c Package object: https://github.com/pulp/pulp_rpm/blob/08dfe9e2f24e1ab960971e6892ffd2975719ae71/pulp_rpm/app/models/package.py#L19. That's it.

Comment 4 amatej 2020-02-24 13:12:49 UTC
I have created a PR: https://github.com/rpm-software-management/createrepo_c/pull/210
It also contains some tests.

Can you check whether the new functions work for you? Basically they should take a snippet string of repodata xml instead of a path to a file. By snippet here I mean that it contains just <package> elements, not the root element such as <filelists>.
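
A rough sketch of how such snippet-based functions might be called from Python. The function names xml_parse_primary_snippet, xml_parse_filelists_snippet and xml_parse_other_snippet are assumed here (this comment does not name them), and package_from_snippets is a hypothetical helper. The createrepo_c module is passed in as the `cr` argument so the sketch has no import-time dependency; the keyword arguments mirror the file-based calls shown in the description.

```python
def package_from_snippets(cr, primary_xml, filelists_xml, other_xml):
    """Build one package object from three in-memory <package> snippets.

    `cr` is expected to be the createrepo_c module. The *_snippet function
    names used on it are assumptions, not confirmed by this report.
    """
    packages = {}

    def pkgcb(pkg):
        # Called when a package from the primary snippet has been parsed.
        packages[pkg.pkgId] = pkg

    def newpkgcb(pkgId, name, arch):
        # Return the package parsed from primary so that filelists/other
        # data is attached to the same object (None means "skip").
        return packages.get(pkgId)

    cr.xml_parse_primary_snippet(primary_xml, pkgcb=pkgcb, do_files=False)
    cr.xml_parse_filelists_snippet(filelists_xml, newpkgcb=newpkgcb)
    cr.xml_parse_other_snippet(other_xml, newpkgcb=newpkgcb)

    if len(packages) != 1:
        raise ValueError("expected exactly one package, got %d" % len(packages))
    return next(iter(packages.values()))
```

With the real bindings this would be called as package_from_snippets(createrepo_c, primary, filelists, other), where each string holds just the <package> element for the same package; nothing is written to disk.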

Comment 5 Daniel Alley 2020-03-11 17:11:14 UTC
@amatej, sorry for the late feedback, it has been a very busy time for everyone and we haven't had time to experiment with the code much.

It appears that the PR has already been merged and released with 15.8, but there are some issues.

https://github.com/rpm-software-management/createrepo_c/pull/210#issuecomment-597751016
https://github.com/rpm-software-management/createrepo_c/pull/212

Comment 6 Daniel Alley 2020-03-11 17:40:54 UTC
Disregard the first link; I eventually realized that I had made a mistake myself. The second issue is really just a nice-to-have.

I'll continue testing and see if I find any (actual) issues.

Comment 7 amatej 2020-03-16 06:52:23 UTC
Ok great.

Also I see your second issue (PR) is already merged.

Comment 18 errata-xmlrpc 2020-11-04 03:09:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (createrepo_c bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4700