1622791 – [RFE] rewrite beaker-repo-update to not use yum/urlgrabber, improve it

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Beaker
Classification:	Retired
Component:	general
Sub Component:
Version:	25
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	27.0
Assignee:	Renan Rodrigo Barbosa
QA Contact:	tools-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-08-28 03:46 UTC by Dan Callaghan
Modified:	2019-11-27 15:32 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-11-27 15:32:56 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Beaker Project Gerrit	6678	0	None	NEW	Change repo-update yum-based commands to dnf	2019-11-25 03:22:00 UTC

Description Dan Callaghan 2018-08-28 03:46:22 UTC

Currently beaker-repo-update is implemented as a copy-paste adaptation of the original /usr/bin/reposync from Yum, with a bunch of extra behaviour. But Yum is EOL, and the code also has a lot of problematic behaviour (see bug 1619969).

We should rewrite to use either:

* librepo + libdnf/libsolv, assuming its API gives us what we need, and assuming by the time we tackle this RFE we are no longer targeting RHEL6; or

* requests + our own minimal repodata parser using lxml.

Existing behaviour that must be preserved:

* only try to sync OS majors that have existing trees in Beaker

* grab repodata from under the given URL and fetch the *latest* version of each package for each OS major

* write the package to disk in /var/www/beaker/harness

* if any package changed, regenerate repodata using createrepo_c

* if there is no upstream harness repo for a given OS major, print a non-scary warning (for example, Atomic and RHVH4)

Some sorely needed improvements that the current version can't do:

* re-use the package on disk only if it matches the expected checksum

* verify checksum (against the repodata checksum) after downloading a package

* ignore OS majors that are entirely absent but *don't* ignore other errors like write failures or checksum failures -- this should be an immediate hard error with a good message, so that it's not lost amongst the spew as subsequent repos get downloaded

* write out packages with the proper atomic rename dance, so that an interrupted download does not leave an incomplete package
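
The checksum and atomic-rename points above can be sketched roughly as follows. This is only an illustration in Python 3 using the standard library, not the code that was merged for this bug. The function names file_matches_checksum and write_package_atomically are hypothetical, and the expected digest is assumed to have been read from the repodata already.

```python
import hashlib
import os
import tempfile


def file_matches_checksum(path, expected_hex, algo="sha256"):
    """Return True only if path exists and its digest equals expected_hex."""
    digest = hashlib.new(algo)
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
    except FileNotFoundError:
        return False
    return digest.hexdigest() == expected_hex


def write_package_atomically(dest_path, chunks, expected_hex, algo="sha256"):
    """Write chunks to a temp file, verify the digest, then rename into place.

    chunks is any iterable of bytes (for example a streamed HTTP body).
    On any failure the temp file is removed and dest_path is left untouched.
    """
    dest_dir = os.path.dirname(dest_path) or "."
    digest = hashlib.new(algo)
    # The temp file must live in the destination directory so that the final
    # rename stays on one filesystem and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                digest.update(chunk)
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        if digest.hexdigest() != expected_hex:
            # Hard error: the caller should stop rather than carry on.
            raise ValueError("checksum mismatch for %s: expected %s, got %s"
                             % (dest_path, expected_hex, digest.hexdigest()))
        os.chmod(tmp_path, 0o644)  # mkstemp creates the file with mode 0600
        os.replace(tmp_path, dest_path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A caller would first check file_matches_checksum() on the existing file and skip the download when it returns True. Otherwise it would stream the package through write_package_atomically() and let the ValueError propagate as the hard error described above.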

Comment 1 Dan Callaghan 2018-08-28 03:49:28 UTC

Another improvement to the list:

* don't use any cached repodata, or only use cached repodata after first checking repomd.xml and ensuring that the repodata we are reusing matches the checksum of the current repodata.

In bug 1619969 I noticed that in certain error conditions, Yum falls back to using a local cache of the repodata left behind in /var/tmp/yum-* but that is never what we want.
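
A rough sketch of the "check repomd.xml before reusing cached repodata" idea, in Python 3. It uses the standard library's xml.etree.ElementTree rather than lxml to stay self-contained. The function names are hypothetical, and the treatment of the legacy "sha" checksum type as SHA-1 is an assumption based on how older repodata was generated.

```python
import hashlib
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def expected_checksums(repomd_xml):
    """Map each repodata type (e.g. 'primary') to (algo, hex digest, href)."""
    root = ET.fromstring(repomd_xml)
    result = {}
    for data in root.findall("repo:data", REPO_NS):
        checksum = data.find("repo:checksum", REPO_NS)
        location = data.find("repo:location", REPO_NS)
        if checksum is None or location is None:
            continue
        result[data.get("type")] = (checksum.get("type"),
                                    checksum.text.strip(),
                                    location.get("href"))
    return result


def cached_copy_is_current(cached_bytes, algo, expected_hex):
    """Return True if the cached repodata file matches the repomd.xml digest."""
    if algo == "sha":
        algo = "sha1"  # assumption: legacy repodata labels SHA-1 as "sha"
    return hashlib.new(algo, cached_bytes).hexdigest() == expected_hex
```

The intent is that repomd.xml is always fetched fresh, and a cached primary.xml (or any other repodata file) is reused only when cached_copy_is_current() returns True for it.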

Comment 2 Dan Callaghan 2018-08-28 03:50:28 UTC

I've implemented some of the above improvements within the existing Yum-based command for bug 1619969, but it would still be nicer if we can clean it up and simplify it by avoiding the Yum APIs which are very difficult to use correctly.

Comment 3 Dan Callaghan 2018-08-28 07:16:17 UTC

There is another issue I noticed while working on this. I don't think we have ever hit it so it is purely theoretical, but it would be good to fix it up in this new implementation.

If you interrupt beaker-repo-update while it's running createrepo, or if the createrepo fails for some reason, and then you re-run beaker-repo-update to do it again -- it won't actually do it. That's because, as an optimisation, it avoids running createrepo if it hasn't downloaded any new packages in a repo.

I am not sure what the best way is to keep that optimisation while avoiding this problem if createrepo fails. But some ideas come to mind, like checking the modtime on the repodata directory, or creating a .dirty marker file whenever a new package is downloaded and only removing the marker file after createrepo has been successfully executed.
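
The marker-file idea might look something like the sketch below (Python 3). The DIRTY_MARKER name and both function names are hypothetical. The run_createrepo parameter is included only so the createrepo step can be swapped out, and by default it shells out to createrepo_c on the repo directory.

```python
import os
import subprocess

DIRTY_MARKER = ".dirty"  # hypothetical marker file name


def mark_dirty(repo_dir):
    """Record that repo_dir has new packages not yet reflected in repodata.

    Meant to be called before a new package is renamed into place, so an
    interruption between the two steps still leaves the repo marked dirty.
    """
    with open(os.path.join(repo_dir, DIRTY_MARKER), "w"):
        pass


def regenerate_if_dirty(repo_dir, run_createrepo=None):
    """Run createrepo only if the marker exists; clear it only on success.

    Returns True if repodata was regenerated and False if nothing was needed.
    """
    marker = os.path.join(repo_dir, DIRTY_MARKER)
    if not os.path.exists(marker):
        return False
    if run_createrepo is None:
        def run_createrepo(path):
            subprocess.check_call(["createrepo_c", path])
    # If this raises or the process is interrupted, the marker stays behind,
    # so the next run retries createrepo even with no new downloads.
    run_createrepo(repo_dir)
    os.unlink(marker)
    return True
```

Because the marker is removed only after createrepo returns successfully, a failed or interrupted run is retried on the next invocation, while the "nothing changed, skip createrepo" optimisation is kept for clean repos.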
