Bug 962901 - console.log is incomplete/damaged after it is moved to archive
Summary: console.log is incomplete/damaged after it is moved to archive
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Retired
Component: lab controller
Version: 0.12
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: 0.13
Assignee: Dan Callaghan
QA Contact: tools-bugs
URL:
Whiteboard: LogStorage
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-05-14 18:11 UTC by Jan Stancek
Modified: 2018-02-06 00:41 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-25 06:27:34 UTC
Embargoed:


Attachments (Terms of Use)

Description Jan Stancek 2013-05-14 18:11:06 UTC
Description of problem:
console.log is incomplete/damaged after it is archived and contains a lot of null byte characters (0x00). The size of log appears to match the file in conserver, but majority of output is missing.

The file stored in conserver (/var/consoles/) is OK. Output in my terminal, while attached to conserver while recipe is running is OK. Problem happens after recipe completes and console.log is moved to beaker-archive.

Bill mentioned, following commit could be related:
http://git.beaker-project.org/cgit/beaker/commit/?id=913b112b79eb9013b8efad34e141b5e3e073dee1

Version-Release number of selected component (if applicable):
0.12

How reproducible:
sporadically

Steps to Reproduce:
1. unknown, happens randomly

Actual results:
console.log is incomplete

Expected results:
archived console.log matches console log in conserver

Additional info:

Comment 2 Dan Callaghan 2013-05-14 22:57:34 UTC
(In reply to comment #1)

The version of console.log in Beaker has 378090 NUL bytes, and then:

[-- Console down -- Tue May 14 11:48:55 2013] 
[-- Console up -- Tue May 14 11:48:55 2013] 

followed by some normal-looking output.

The version from conserver has exactly 378090 bytes of normal-looking output, and then the same down/up tags (when the box apparently rebooted):

[-- Console down -- Tue May 14 11:48:55 2013]
[-- Console up -- Tue May 14 11:48:55 2013]

followed by the same normal-looking output.

Comment 4 Nick Coghlan 2013-05-15 03:38:16 UTC
This turned out to be due to a configuration issue with the affected Beaker instance rather than a problem in Beaker itself.

Comment 5 Dan Callaghan 2013-05-15 03:49:19 UTC
On second thought, this problem pre-dates the iptables configuration issue by about four hours, and it doesn't match the failure mode observed (connections severed without TCP RST) so I think this bears more investigation.

Comment 7 Dan Callaghan 2013-05-15 23:49:23 UTC
It seems that beaker-transfer is transferring the logs for the same recipe multiple times. (Thank you Jan for suggesting this possibility!) The recipe finished at 2013-05-14 11:48:16 -04:00, and then:

2013-05-14 11:48:37,906 bkr.server.xmlrpccontroller DEBUG Time: 0:00:00.146802 recipes.change_files (876178, 'http://beaker-archive.app.eng.
[...]
2013-05-14 11:48:52,374 bkr.server.xmlrpccontroller DEBUG Time: 0:00:00.104982 recipes.change_files (876178, 'http://beaker-archive.app.eng.
[...]
2013-05-14 11:49:13,352 bkr.server.xmlrpccontroller DEBUG Time: 0:00:00.102350 recipes.change_files (876178, 'http://beaker-archive.app.eng.

The first time everything is fine, and the logs are deleted from the LC's filesystem. But the watchdog might not have noticed that the recipe is finished yet, so it would still be writing to the console log. It will seek to what it thinks is the end of the file and append more chunks to it, causing the start to be filled with NUL bytes. Then the logs are transferred a second time, overwriting the good console log.

The reason why the logs would be transferred twice is because of a change in 0.12 (so this bug is a regression in 0.12): now the log storage code registers a file with the scheduler whenever it writes it and the file didn't exist. So beaker-transfer removes the local console.log, then beaker-watchdog (which hasn't noticed the recipe finished) writes it again and registers the file again because it is newly created. Now the scheduler thinks that the recipe has un-transferred files on the lab controller so it becomes eligible to be transferred by beaker-transfer again.

Some ideas for fixing this:

* The scheduler should refuse to register new files against a recipe which is finished (see also bug 962253).

* We could revert to the pre-0.12 behaviour for LC log storage, so that it registers the log file with the scheduler only when byte 0 is written. The reason for changing this in 0.12 was to simplify the API for alternative harnesses: if writing byte 0 has special behaviour then the alternative harnesses need to be aware they must always upload files starting from byte 0. However this is not too serious a limitation, it would just need to be documented properly.

* We could delay transferring logs for a little while after the recipe finishes, to give beaker-watchdog a chance to notice and stop copying the console log. They delay would need to be at least 60 seconds, which is the polling interval of beaker-watchdog.

* It would be nice if beaker-watchdog would stop watching the console immediately the recipe finishes, but I'm not sure of any practical way to do that.

Comment 8 Nick Coghlan 2013-05-16 00:25:00 UTC
I'm inclined to go with the model where we simply reject the upload for a finished task, since we have to block those anyway for bug 962253.

As far as I can see, that's not a regression relative to 0.12, as in that case the late update would have happened after the transfer to the archive server, and thus the file with the NULs would stay on the LC instead of being transferred.

Comment 10 Dan Callaghan 2013-05-21 02:13:31 UTC
(In reply to Nick Coghlan from comment #8)
> I'm inclined to go with the model where we simply reject the upload for a
> finished task, since we have to block those anyway for bug 962253.

I think we can leave that for bug 962253.

To fix this, I would like to do two things: add a delay for beaker-transfer, so that it waits a minute or two after the recipe finishes before it transfers the logs; and Jan's patch (or a version of it), so that WatchFile only creates the console log once.

Comment 16 Amit Saha 2013-06-25 06:27:34 UTC
Beaker 0.13.1 has been released.


Note You need to log in before you can comment on or make changes to this bug.