Bug 1374511 - [RFE] Efficient log downloading
Summary: [RFE] Efficient log downloading
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Beaker
Classification: Retired
Component: general
Version: develop
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: beaker-dev-list
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-08 21:35 UTC by Alois Mahdal
Modified: 2020-11-19 21:13 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-19 21:11:17 UTC



Description Alois Mahdal 2016-09-08 21:35:34 UTC
Description of problem
======================

Currently, the only way to download the full set of job logs is:

    bkr job-logs J:1234 | xargs wget

which is extremely inefficient: jobs can easily generate thousands of small logs, so downloading what is basically a few tens of megabytes can take tens of minutes, especially if the server is half the globe away.
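
A partial workaround is to parallelise the downloads, which at least overlaps the per-request latency (a sketch only, assuming GNU xargs and wget are available):

    # fetch the URL list, then run 8 wget processes in parallel,
    # 25 URLs per invocation; -x recreates the server's directory
    # layout locally
    bkr job-logs J:1234 | xargs -P 8 -n 25 wget -x -nv

This doesn't remove the per-file overhead, though; it only hides some of it.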


Proposal
========

Add the possibility to download the whole tree of logs as a single file, or as several files (e.g. one per job).


Use case
========

Well, anybody who wants to use `bkr job-logs` would probably benefit from this, but here's my own motivation:

I'm developing a tool that downloads the job logs as a tree, to enable further inspection with tools like grep. I'm using `wget -x` to download everything that `bkr job-logs` lists, and then "fish out" the job folder from there:

    https://gitlab.cee.redhat.com/amahdal/rqkit/blob/last/src/shellfu/get_tj.sh#L49

The whole process could be simpler and faster if I could just fetch a tarball and unpack it.
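
For illustration only -- no such endpoint exists in Beaker today, and the URL shape here is made up -- the whole workflow would then collapse to something like:

    # hypothetical tarball endpoint, not a current Beaker API
    curl -s https://beaker.example.com/jobs/1234/logs.tar.gz | tar xz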

Comment 1 Dan Callaghan 2016-09-12 05:52:45 UTC
Unfortunately this is quite tricky to implement, because Beaker would have to generate the tarball, but it does not itself have direct access to the logs -- they are stored on the archive server's filesystem and served from there.

The archive server just runs plain Apache, with no Beaker code.

I wonder if there is an Apache module which can dynamically produce tarballs of a directory?
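
I'm not aware of a stock module for that, but in principle even a small CGI script could stream a tarball, still with no Beaker code on the archive server. A rough sketch (the script name, log root, and input sanitising are all assumptions, not production-ready):

    #!/bin/sh
    # hypothetical /var/www/cgi-bin/logtar: streams a gzipped tarball of
    # the log directory named by QUERY_STRING, relative to the log root
    dir=$(printf '%s' "$QUERY_STRING" | tr -cd 'A-Za-z0-9_./-')
    case "$dir" in
        ''|/*|*..*) printf 'Status: 400\r\n\r\n'; exit 0 ;;
    esac
    printf 'Content-Type: application/gzip\r\n'
    printf 'Content-Disposition: attachment; filename="logs.tar.gz"\r\n\r\n'
    exec tar -C /srv/beaker/logs -czf - "$dir"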

Comment 2 Dan Callaghan 2016-09-12 05:53:47 UTC
Is there something else we could do to make the log files fetchable more efficiently? Like enabling HTTP/2.0, or even just HTTP/1.1 pipelining?
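
For reference, enabling HTTP/2 on Apache 2.4.17+ would be a one-line change, assuming mod_http2 is available (a sketch; where it goes depends on the distribution's config layout):

    # server or TLS virtual-host configuration; mod_http2 must be loaded
    Protocols h2 http/1.1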

Or maybe anonymous read-only rsync?
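
A minimal sketch of what that could look like in /etc/rsyncd.conf (the module name and path are made up):

    # anonymous, read-only rsync module for the log tree
    [beaker-logs]
        path = /srv/beaker/logs
        comment = Beaker job logs
        read only = yes
        uid = nobody
        gid = nobody

Clients could then pull a whole job directory over a single connection, e.g. `rsync -r rsync://archive.example.com/beaker-logs/J1234/ ./J1234/` (hostname illustrative).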

Comment 3 Alois Mahdal 2018-06-15 21:38:45 UTC
Is there any possibility this could get implemented in the near future?

I'm already using my tool in my own workflow, but the poor efficiency makes it unreliable. I tried it today on a larger job set -- 27 jobs from Errata regression testing -- and it's on job 25/27 after ~6 hours. That's around 300 MiB, but in 7500 files.

(I don't know the answers to the above questions, though.)

Comment 4 Roman Joost 2018-06-17 22:57:25 UTC
Dear Alois,

in order to implement this we would need to find a way to address your problem. That's why Dan threw around some ideas, but as outlined, the logs are served by plain "old" Apache -- nothing else.

Is your efficiency problem with the current approach that some logs are too big, or that the sequential download process takes too long?

Comment 5 Alois Mahdal 2018-06-18 10:18:42 UTC
Thanks, Roman.

I'm pretty sure the problem lies in the inefficient downloading. Currently, all that bkr-client will do for me is list the URLs of all the files, and I have to download each URL separately.

As a case in point, my last download was a set of 27 jobs, containing 350M of data in 8262 files. It took over 6.5 hours (17:08 - 23:45). (The command is `wget -x -nv -i uris`, where `uris` is the list given by `bkr job-logs`.)
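
For what it's worth, that works out to roughly 2.8 seconds per file (about 23,400 s for 8262 files) and an effective throughput of around 15 KiB/s, so per-request latency rather than bandwidth seems to be the bottleneck.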

