Bug 1374511

Summary: [RFE] Efficient log downloading
Product: [Retired] Beaker
Component: general
Version: develop
Status: CLOSED WONTFIX
Type: Bug
Keywords: FutureFeature
Doc Type: Enhancement
Reporter: Alois Mahdal <amahdal>
Assignee: beaker-dev-list
QA Contact: tools-bugs <tools-bugs>
CC: cbouchar, mjia
Last Closed: 2020-11-19 21:11:17 UTC

Description Alois Mahdal 2016-09-08 21:35:34 UTC
Description of problem
======================

Currently, the only way to download the full set of job logs is:

    bkr job-logs J:1234 | xargs wget

which is extremely inefficient; jobs can easily generate thousands of small logs, so downloading what is basically a few tens of megabytes takes tens of minutes, especially when the server is half the globe away.
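
For what it's worth, a partial workaround is to parallelize the fetches instead of downloading one URL at a time. A minimal sketch, assuming GNU xargs (for `-P`):

    bkr job-logs J:1234 | xargs -P 8 -n 16 wget -x -nv

This amortizes the round-trip latency over 8 concurrent connections, though it does nothing about the per-file HTTP overhead itself.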


Proposal
========

Add the possibility to download the whole tree of logs as a single archive, or as several archives (e.g. one per job).


Use case
========

Well, anybody who wants to use `bkr job-logs` would probably benefit from this, but here is my own motivation:

I'm developing a tool that downloads the job logs as a tree to enable further inspection using tools like grep.  I'm using `wget -x` to download everything that `bkr job-logs` lists and then "fish out" the job folder from there:

    https://gitlab.cee.redhat.com/amahdal/rqkit/blob/last/src/shellfu/get_tj.sh#L49

The whole process could be simpler and faster if I could just fetch a tarball and unpack it.
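
For illustration only -- the endpoint and its `format=tar` parameter are hypothetical, not an existing Beaker API -- the desired workflow would look something like:

    # Hypothetical single-request download; no such endpoint exists today.
    wget -O J1234-logs.tar 'https://beaker.example.com/jobs/1234/logs?format=tar'
    tar -xf J1234-logs.tar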

Comment 1 Dan Callaghan 2016-09-12 05:52:45 UTC
Unfortunately this is quite tricky to implement, because Beaker would have to generate the tarball, but it does not itself have direct access to the logs either -- they are stored on the archive server's filesystem and served from there.

The archive server just runs plain Apache, no Beaker code.

I wonder if there is an Apache module which can dynamically produce tarballs of a directory?
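
One possible approach, sketched here with hypothetical paths and no input validation (so not production-ready), would be a tiny CGI script that streams the tarball:

    #!/bin/sh
    # Hypothetical CGI: stream one job's log directory as a tar archive.
    # WARNING: QUERY_STRING is used unvalidated; a real version must reject
    # "../" and absolute paths before passing it to tar.
    printf 'Content-Type: application/x-tar\r\n\r\n'
    exec tar -C /srv/beaker/logs -cf - "./${QUERY_STRING}"

The archive is generated on the fly, so nothing extra has to be stored on the archive server.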

Comment 2 Dan Callaghan 2016-09-12 05:53:47 UTC
Is there something else we could do to make the log files fetchable more efficiently? Like enabling HTTP/2.0, or even just HTTP/1.1 pipelining or so?

Or maybe anonymous read-only rsync?
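
For the rsync idea, an anonymous read-only module would be a small config change on the archive server. A sketch, with a hypothetical log path:

    # /etc/rsyncd.conf
    [beaker-logs]
        path = /srv/beaker/logs
        read only = yes
        list = yes
        uid = nobody

A client could then pull a whole job's tree over a single connection (hypothetical host and layout):

    rsync -a rsync://archive.example.com/beaker-logs/J:1234/ ./J:1234/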

Comment 3 Alois Mahdal 2018-06-15 21:38:45 UTC
Is there any possibility this could get implemented in the near future?

I'm already using my tool in my own workflow, but the poor efficiency makes it unreliable.  I tried it today on a larger job set: 27 jobs from Errata regression testing; it's on job 25/27 after ~6 hours.  That's around 300 MiB -- but in 7500 files.

(I don't know the answer to the above questions though.)

Comment 4 Roman Joost 2018-06-17 22:57:25 UTC
Dear Alois,

in order to implement this, we would need to find a way to address your problem.  That's why Dan threw around some ideas above; but as outlined, the logs are served by plain "old" Apache.  Nothing else.

Is your problem with the efficiency of the current approach that some of the logs are too big, or that the sequential download process takes too long?

Comment 5 Alois Mahdal 2018-06-18 10:18:42 UTC
Thanks, Roman.

I'm pretty sure the problem lies in the inefficient downloading.  Currently all that bkr-client will do for me is list the URLs of all the files, and I have to download each URL separately.

As a case in point, my last download was a set of 27 jobs containing 350 MiB of data in 8262 files.  It took over 6.5 hours (17:08 - 23:45).  (The command is `wget -x -nv -i uris`, where `uris` is the list given by `bkr job-logs`.)