Description of problem:
Bad data in Beaker test results. I'm not sure where the responsibility lies here: it's my understanding that Conserver provides the data and the Lab Controller copies the data. It seems the console logs were not cleared prior to my test run: the console logs for my test run contained data from a previous test run, and a PANIC was reported due to the stale data in the console logs.

This issue was seen in two different jobs:
https://beaker.engineering.redhat.com/jobs/67673 intel-urbanna-01.lab.bos.redhat.com
https://beaker.engineering.redhat.com/jobs/67675 dell-pet410-01.lab.bos.redhat.com

Looking at https://beaker.engineering.redhat.com/jobs/67673: the test results report a PANIC on an "oops". Digging into the console logs for the test, one sees the following:

logger: /usr/bin/rhts-test-runner.sh rhts-test-checkin 127.0.0.1:7093 intel-urbanna-01.lab.eng.bos.redhat.com 67367 /kernel/kdump/crash-sysrq-c 5400 1496111
03/28/11 19:53:50 JobID:67367 Test:/kernel/kdump/crash-sysrq-c Response:1
logger: /usr/bin/rhts-test-runner.sh rhts-test-update 127.0.0.1:7093 1496111 start rh-tests-kernel-kdump-crash-sysrq-c
03/28/11 19:53:50 testID:1496111 start: /mnt/tests/kernel/kdump/crash-sysrq-c /mnt/tests/kernel/kdump/crash-sysrq-c
SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81319816>] sysrq_handle_crash+0x16/0x20
PGD 3548ab067 PUD 35df66067 PMD 0
Oops: 0002 [#1] SMP

* This test does not issue a crash-sysrq-c.
* It seems the logs were not cleared prior to the testing.

=========================================================
Investigating further...

These systems are on different serial consoles and use a standard serial port connection, not a management interface (i.e. iDRAC):

[pbunyan]$ console -i dell-pet410-01.lab.bos.redhat.com
dell-pet410-01.lab.bos.redhat.com:conserver-01.eng.bos.redhat.com,4391,40826:!:serial-l2e13.mgmt.lab.eng.bos.redhat.com,7030,telnet,27::up:rw:/var/consoles/pub/dell-pet410-01.lab.bos.redhat.com,log,act,brk,300,26:1:noautoup::ixon,ixoff,autoreinit,login::0:\n
[pbunyan]$ console -i intel-urbanna-01.lab.bos.redhat.com
intel-urbanna-01.lab.bos.redhat.com:conserver-01.eng.bos.redhat.com,4281,40817:!:serial-l1a.mgmt.lab.eng.bos.redhat.com,7025,telnet,15::up:rw:/var/consoles/pub/intel-urbanna-01.lab.bos.redhat.com,log,act,brk,300,14:1:noautoup::ixon,ixoff,autoreinit,login::0:\n
[pbunyan]$

Connecting to netdump-01.eng.bos.redhat.com later in the day and viewing the current console data for these systems, I see the time stamps in the data are current.

I spoke with BillP and he investigated the issue. He saw nothing definitive. BillP confirmed there was an issue, but things seemed to be working now.

Best,
-pbunyan

How reproducible:
I cannot reproduce, though I have two instances of the issue.

Actual results:
Bad data in Beaker test results.

Expected results:
Data from the current job only in the logs.

Additional info:
I have opened the following RT ticket:
https://engineering.redhat.com/rt3/Ticket/Display.html?id=106184
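For context on why the stale content matters: the test result is produced by scanning the console log for panic signatures, so leftover "Oops" or "SysRq : Trigger a crash" text from an earlier recipe is indistinguishable from a real crash in the current one. A minimal sketch of that kind of pattern scan (hypothetical patterns and function name, not Beaker's actual code):

import re

# Hypothetical panic signatures similar to what a console-log watchdog
# might look for; the real Beaker patterns may differ.
PANIC_PATTERNS = [
    re.compile(r"Kernel panic - not syncing"),
    re.compile(r"Oops: \d+ \[#\d+\]"),
    re.compile(r"BUG: unable to handle kernel NULL pointer dereference"),
]

def scan_console_log(path):
    """Return the first panic-like line found in the console log, if any."""
    with open(path, errors="replace") as f:
        for line in f:
            for pattern in PANIC_PATTERNS:
                if pattern.search(line):
                    return line.strip()
    return None

# If the log still contains output from a previous recipe, this scan
# reports a PANIC even though the current job never crashed.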
It looks like the console was "suspended" (or buffered) for a while and you got errors from the previous incarnation: https://beaker.engineering.redhat.com/recipes/138193#task1496111 I have no idea how this happened... Bill, does the LC do any buffering, or is this more likely a conserver issue? Should the LC try to connect and flush any captured console output before starting a new job?
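One way the suggested flush could work is for the lab controller to truncate the conserver log file just before the recipe starts, so only output produced after that point is attributed to the new job. A minimal sketch, assuming the log lives under /var/consoles/pub/<fqdn> as in the console -i output above (hypothetical helper, not existing LC code):

import os

CONSOLE_LOG_DIR = "/var/consoles/pub"  # path taken from the console -i output above

def flush_console_log(fqdn):
    """Truncate the conserver log for a system before a new recipe starts.

    Returns True if a log existed and was truncated, False otherwise.
    """
    log_path = os.path.join(CONSOLE_LOG_DIR, fqdn)
    if not os.path.exists(log_path):
        return False
    # Truncate in place rather than unlinking, so conserver keeps
    # writing to the same inode.
    with open(log_path, "r+") as f:
        f.truncate(0)
    return True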
When the scheduler schedules a machine to run, it makes a call to the lab controller's cobbler instance to clear the console logs *if* they exist. It looks like the lab controller didn't see any logs at the moment we called the command to clear them. The lab controller's watchdog process could clear the log when it starts monitoring it, but that would cause problems if the watchdog process were restarted while a recipe was running: you would lose all data captured before the restart. But maybe that is better?
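If the clearing were moved into the watchdog, the restart problem could be avoided by remembering which recipe the log was last cleared for, so a watchdog restart mid-recipe does not wipe data already captured. A rough sketch of that idea (hypothetical state file, paths, and names, not the actual watchdog code):

import os

STATE_DIR = "/var/lib/beaker/console-state"  # hypothetical location for per-system markers

def maybe_clear_console_log(fqdn, recipe_id, log_path):
    """Clear the console log only when monitoring a recipe we have not seen before.

    A marker file records the last recipe id cleared for this system, so
    restarting the watchdog while the same recipe is still running leaves
    the already-captured output intact.
    """
    os.makedirs(STATE_DIR, exist_ok=True)
    marker = os.path.join(STATE_DIR, fqdn)
    last = None
    if os.path.exists(marker):
        with open(marker) as f:
            last = f.read().strip()
    if last == str(recipe_id):
        return False  # same recipe: watchdog restart, keep existing data
    if os.path.exists(log_path):
        with open(log_path, "r+") as f:
            f.truncate(0)
    with open(marker, "w") as f:
        f.write(str(recipe_id))
    return True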
> How reproducible:
> I cannot reproduce, though I have two instances of the issue.

I am afraid this may be more common but would go unnoticed unless there was a panic stuck in the queue, as there was in your case. I would look for the issue in the console.log of a few jobs which started at about the same time.
This issue is resolved. The remaining issue is being tracked in a new BZ:
Bug 837300 - [Beaker] Console Logs are not being cleared properly between automated jobs
https://bugzilla.redhat.com/show_bug.cgi?id=837300