| Summary: | [RFE] Efficient log downloading | ||
|---|---|---|---|
| Product: | [Retired] Beaker | Reporter: | Alois Mahdal <amahdal> |
| Component: | general | Assignee: | beaker-dev-list |
| Status: | CLOSED WONTFIX | QA Contact: | tools-bugs <tools-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | develop | CC: | cbouchar, mjia |
| Target Milestone: | --- | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Enhancement |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-19 21:11:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description of problem
======================

Currently, the only way to download a full set of job logs is:

    bkr job-log J:1234 | xargs wget

which is extremely inefficient; jobs can easily generate thousands of small logs, resulting in tens of minutes to download what is basically a few tens of megabytes, especially if the server is half the globe away.

Proposal
========

Add the possibility to download the whole tree of logs as a single file, or as several files (e.g. one per job).

Use case
========

Well, anybody who wants to use `bkr job-log` would probably benefit from this, but here is my own motivation: I'm developing a tool that downloads the job logs as a tree to enable further inspection using tools like grep. I'm using `wget -x` to download everything that job-log shows and then "fish out" the job folder from there:

https://gitlab.cee.redhat.com/amahdal/rqkit/blob/last/src/shellfu/get_tj.sh#L49

The whole process could be simpler and faster if I could just fetch a tarball and unpack it.

---

Unfortunately this is quite tricky to implement, because Beaker would have to generate the tarball, but Beaker itself does not have direct access to the logs either: they are stored on the archive server's filesystem and served up by it. That server just runs plain Apache, no Beaker code.

I wonder if there is an Apache module which can dynamically produce tarballs of a directory? Is there something else we could do to make the log files fetchable more efficiently, like enabling HTTP/2.0, or even just HTTP/1.1 pipelining? Or maybe anonymous read-only rsync?

---

Is there any possibility this could get implemented in the near future? I'm already using my tool in my own workflow, but the poor efficiency makes it unreliable. I tried it today on a larger job set: 27 jobs from Errata regression testing, and it's on job 25/27 after ~6 hours. It's around 300 MiB, but in 7500 files.

(I don't know the answer to the above questions, though.)

---

Dear Alois, in order to implement this we would need to find a way to address your problem. That's why Dan threw around some ideas, but as outlined, the logs are served by plain "old" Apache, nothing else.

Is your problem with the efficiency of the current approach that some of the logs are too big, or that the sequential download process takes too long?

Thanks, Roman.

---

I'm pretty sure the problem lies in the inefficient downloading. Currently, all that bkr-client will do for me is list URLs to all the files, and I have to download each URL separately.

Case in point: my last download was a set of 27 jobs, containing 350M of data in 8262 files. It took over 6.5 hours (17:08 - 23:45). (The command is `wget -x -nv -i uris`, where `uris` is the list given by `bkr job-logs`.)
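For what it's worth, a possible client-side mitigation until something like a tarball download exists is to run the individual fetches in parallel rather than sequentially, since with thousands of small files most of the wall-clock time is per-request latency rather than bandwidth. A minimal sketch, assuming GNU xargs and the same `uris` file produced by `bkr job-logs` as described above (this is only a workaround, not a Beaker feature):

    # `uris` is the URL list written out by `bkr job-logs`, one URL per line.
    # Run up to 16 wget processes concurrently instead of one at a time;
    # -x mirrors the remote directory structure locally, -nv keeps output quiet.
    xargs -P 16 -n 1 wget -x -nv < uris

This does not address the RFE itself (a single archive would still be far friendlier to fetch and unpack), but it may cut down the multi-hour wall-clock times reported above.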