Bug 1374511 - [RFE] Efficient log downloading
Summary: [RFE] Efficient log downloading
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Beaker
Classification: Retired
Component: general
Version: develop
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: beaker-dev-list
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-08 21:35 UTC by Alois Mahdal
Modified: 2020-11-19 21:13 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-19 21:11:17 UTC



Description Alois Mahdal 2016-09-08 21:35:34 UTC
Description of problem
======================

Currently, the only way to download the full set of job logs is:

    bkr job-logs J:1234 | xargs wget

which is extremely inefficient: jobs can easily generate thousands of small logs, so downloading what is basically a few tens of megabytes can take tens of minutes, especially if the server is half the globe away.
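
A partial workaround is to parallelise the downloads, which at least overlaps the per-request latency (a sketch only, assuming GNU xargs and wget are available):

    # fetch the URL list, then run 8 wget processes in parallel,
    # 25 URLs per invocation; -x recreates the server's directory
    # layout locally
    bkr job-logs J:1234 | xargs -P 8 -n 25 wget -x -nv

This doesn't remove the per-file overhead, though; it only hides some of it.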


Proposal
========

Add the possibility to download the whole tree of logs as a single file, or as several files (e.g. one per job).


Use case
========

Well, anybody who wants to use `bkr job-logs` would probably benefit from this, but here's my own motivation:

I'm developing a tool that downloads the job logs as a tree, to enable further inspection with tools like grep. I'm using `wget -x` to download everything that `bkr job-logs` lists, and then "fish out" the job folder from there:

    https://gitlab.cee.redhat.com/amahdal/rqkit/blob/last/src/shellfu/get_tj.sh#L49

The whole process could be simpler and faster if I could just fetch a tarball and unpack it.
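
For illustration only -- no such endpoint exists in Beaker today, and the URL shape here is made up -- the whole workflow would then collapse to something like:

    # hypothetical tarball endpoint, not a current Beaker API
    curl -s https://beaker.example.com/jobs/1234/logs.tar.gz | tar xz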

Comment 1 Dan Callaghan 2016-09-12 05:52:45 UTC
Unfortunately this is quite tricky to implement, because Beaker would have to generate the tarball, but it does not itself have direct access to the logs -- they are stored on the archive server's filesystem and served from there.

The archive server just runs plain Apache, with no Beaker code.

I wonder if there is an Apache module which can dynamically produce tarballs of a directory?
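
I'm not aware of a stock module for that, but in principle even a small CGI script could stream a tarball, still with no Beaker code on the archive server. A rough sketch (the script name, log root, and input sanitising are all assumptions, not production-ready):

    #!/bin/sh
    # hypothetical /var/www/cgi-bin/logtar: streams a gzipped tarball of
    # the log directory named by QUERY_STRING, relative to the log root
    dir=$(printf '%s' "$QUERY_STRING" | tr -cd 'A-Za-z0-9_./-')
    case "$dir" in
        ''|/*|*..*) printf 'Status: 400\r\n\r\n'; exit 0 ;;
    esac
    printf 'Content-Type: application/gzip\r\n'
    printf 'Content-Disposition: attachment; filename="logs.tar.gz"\r\n\r\n'
    exec tar -C /srv/beaker/logs -czf - "$dir"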

Comment 2 Dan Callaghan 2016-09-12 05:53:47 UTC
Is there something else we could do to make the log files fetchable more efficiently? Like enabling HTTP/2.0, or even just HTTP/1.1 pipelining?
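
For reference, enabling HTTP/2 on Apache 2.4.17+ would be a one-line change, assuming mod_http2 is available (a sketch; where it goes depends on the distribution's config layout):

    # server or TLS virtual-host configuration; mod_http2 must be loaded
    Protocols h2 http/1.1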

Or maybe anonymous read-only rsync?
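
A minimal sketch of what that could look like in /etc/rsyncd.conf (the module name and path are made up):

    # anonymous, read-only rsync module for the log tree
    [beaker-logs]
        path = /srv/beaker/logs
        comment = Beaker job logs
        read only = yes
        uid = nobody
        gid = nobody

Clients could then pull a whole job directory over a single connection, e.g. `rsync -r rsync://archive.example.com/beaker-logs/J1234/ ./J1234/` (hostname illustrative).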

Comment 3 Alois Mahdal 2018-06-15 21:38:45 UTC
Is there any possibility this could get implemented in the near future?

I'm already using my tool in my own workflow, but the poor efficiency makes it unreliable. I tried it today on a larger job set -- 27 jobs from Errata regression testing -- and it's on job 25/27 after ~6 hours. That's around 300 MiB, but in 7500 files.

(I don't know the answers to the above questions, though.)

Comment 4 Roman Joost 2018-06-17 22:57:25 UTC
Dear Alois,

in order to implement this we would need to find a way to address your problem. That's why Dan threw around some ideas, but as outlined, the logs are served by plain "old" Apache -- nothing else.

Is your efficiency problem with the current approach that some logs are too big, or that the sequential download process takes too long?

Comment 5 Alois Mahdal 2018-06-18 10:18:42 UTC
Thanks, Roman.

I'm pretty sure the problem lies in the inefficient downloading. Currently, all that bkr-client will do for me is list the URLs of all the files, and I have to download each URL separately.

As a case in point, my last download was a set of 27 jobs, containing 350M of data in 8262 files. It took over 6.5 hours (17:08 - 23:45). (The command is `wget -x -nv -i uris`, where `uris` is the list given by `bkr job-logs`.)
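
For what it's worth, that works out to roughly 2.8 seconds per file (about 23,400 s for 8262 files) and an effective throughput of around 15 KiB/s, so per-request latency rather than bandwidth seems to be the bottleneck.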

