Bug 1062867 - virt/install fails to submit logs and proceed with next task
Summary: virt/install fails to submit logs and proceed with next task
Keywords:
Status: CLOSED DUPLICATE of bug 1065257
Alias: None
Product: Beaker
Classification: Retired
Component: beah
Version: 0.15
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: Dan Callaghan
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-02-08 10:04 UTC by Jan Stancek
Modified: 2018-02-06 00:41 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-17 03:02:26 UTC
Embargoed:




Links
System ID: Red Hat Bugzilla 1063090
Private: 0
Priority: unspecified
Status: CLOSED
Summary: [RFE] Add "beah_rpm" ks_meta variable to force particular beah version
Last Updated: 2021-02-22 00:41:40 UTC

Internal Links: 1063090

Description Jan Stancek 2014-02-08 10:04:56 UTC
Description of problem:
On some RHEL 7 systems, /virt/install installs all guests but then fails to proceed with the next task. When I log in to the host, the guest console logs show that both guests installed fine.

This is what I see in the host console log:
======================================================================
2014-02-08 04:11:27,880 rhts_task checkin_finish: INFO resetting nohup 
02/08/14 04:11:27  testID:19052514 finish: 
2014-02-08 04:11:27,894 rhts_task task_exited: INFO task_exited([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. 
]) 
2014-02-08 04:11:27,894 rhts_task on_exit: INFO quitting... 
2014-02-08 04:11:27,895 rhts_task task_ended: INFO task_ended([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. 
]) 
2014-02-08 04:11:28,918 beah processExited: INFO TaskStdoutProtocol:processExited([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. 
]) 
2014-02-08 04:11:28,918 beah processEnded: INFO TaskStdoutProtocol:processEnded([Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessDone'>: A process has ended without apparent errors: process finished with exit code 0. 
]) 
2014-02-08 04:11:28,938 beah task_finished: INFO Task 6062cbda-0a54-4dbd-aa34-922ed5fa7a17 has finished. 
2014-02-08 04:11:28,939 backend async_proc: INFO Task 19052514 done. Submitting logs... 
[-- MARK -- Sat Feb  8 09:15:00 2014] 
[-- MARK -- Sat Feb  8 09:20:00 2014] 
[-- MARK -- Sat Feb  8 09:25:01 2014] 
[-- MARK -- Sat Feb  8 09:30:00 2014] 
[-- MARK -- Sat Feb  8 09:35:00 2014] 
[-- MARK -- Sat Feb  8 09:40:00 2014] 
[-- MARK -- Sat Feb  8 09:45:00 2014] 
[-- MARK -- Sat Feb  8 09:50:00 2014] 
======================================================================

Looking at the process list, the harness doesn't have any child processes:

 3110 ?        Ss     0:00 /usr/sbin/crond -n
 3111 ?        Ss     0:00 /usr/bin/python /usr/bin/beah-srv
 3112 ?        Ss     0:00 /usr/bin/python /usr/bin/beah-beaker-backend
 3113 ?        Ss     0:00 /usr/bin/python /usr/bin/beah-fwd-backend
 3379 ?        Ssl    0:00 /usr/sbin/libvirtd
 3405 ?        Ssl    0:00 /usr/sbin/automount --pid-file /run/autofs.pid
 3583 ?        Sl     0:00 Xvfb :1 -screen 0 1600x1200x24 -fbdir /tmp
 3709 ?        Ss     0:00 /usr/local/bin/logguestconsoles --config /usr/local/etc/logguestconsoles.conf
 3865 ?        Ss     0:00 /usr/lib/systemd/systemd-machined
25077 ?        Ss     0:00 /usr/sbin/anacron -s


If I start the guests manually with "virsh start", they sometimes hit the same issue. The console log reports "Submitting logs", but no logs get submitted to Beaker and the tasks eventually hit the external watchdog.

Version-Release number of selected component (if applicable):
0.15.3

How reproducible:
high

Steps to Reproduce:
1. Install host using RHEL-7.0-20140206.0
2. Install 2 guests using RHEL-7.0-20140206.0

Actual results:
/virt/install fails to submit logs and proceed with next task

Expected results:
All logs get submitted and the harness proceeds with the next task

Additional info:

Comment 3 Jan Stancek 2014-02-08 17:15:12 UTC
(In reply to Jan Stancek from comment #0)
> If I start guests manually with "virsh start", then sometimes they hit same
> issue. Console log reports "Submitting logs", but no logs get submitted to
> beaker and tasks eventually hit external watchdog.

If I log in to a guest via ssh when this happens, all harness processes are running (with no child processes). If I run "systemctl restart beah-beaker-backend", the logs get submitted to Beaker immediately and the guest continues with the next task.

Comment 5 Amit Saha 2014-02-09 12:09:54 UTC
A wild guess: the net.ipv6.conf.all.forwarding option is turned off on the host.

Comment 6 Nick Coghlan 2014-02-10 00:57:00 UTC
In addition to attempting to fix this directly, we will also add the ability to opt in to using older versions of the harness (see bug 1063090).

Comment 7 Dan Callaghan 2014-02-12 02:02:25 UTC
(In reply to Jan Stancek from comment #4)
> It looks like guests are not getting any response from LC. For example, set
> filter to tcp.port==8000 and look at frame 130254. Both guests sent SYN to
> LC, but there appears to be no response.

I think this is actually from the logguestconsoles service created by /distribution/virt/install. I don't think it's related.

From what I can see, when the task ends beah just sits there waiting for... nothing. There are no open connections to the LC waiting for anything.

Comment 8 Dan Callaghan 2014-02-12 02:17:33 UTC
(In reply to Dan Callaghan from comment #7)

Scratch that, the problem is definitely that IPv6 connectivity goes bad on the host. I can see beah talking to the LC over IPv6 at the start of the recipe, but once /distribution/virt/install runs IPv6 packets suddenly get dropped on the floor.

It looks like the default IPv6 routes are missing from the routing table.

Restarting beah-beaker-backend works, because it notices that IPv6 connections are timing out so it falls back to IPv4.
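
The fallback described above can be illustrated with a minimal sketch. This is not beah's actual code, which is not shown in this report; the function names, the injectable `resolve` and `connect` parameters, and the 5-second timeout are all assumptions made for illustration. It tries IPv6 addresses first and moves on to IPv4 when a connection attempt times out or fails.

```python
import socket


def _tcp_connect(family, sockaddr, timeout):
    """Open a TCP connection, closing the socket if the attempt fails."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_prefer_ipv6(host, port, timeout=5.0,
                        resolve=socket.getaddrinfo, connect=_tcp_connect):
    """Hypothetical helper: try IPv6 addresses first, then fall back to IPv4."""
    infos = resolve(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    # Stable sort: IPv6 entries (key False) come before everything else.
    infos = sorted(infos, key=lambda info: info[0] != socket.AF_INET6)
    last_error = None
    for family, _socktype, _proto, _canonname, sockaddr in infos:
        try:
            return connect(family, sockaddr, timeout)
        except OSError as exc:  # socket.timeout is a subclass of OSError
            last_error = exc
    if last_error is None:
        raise OSError("no addresses found for %s" % host)
    raise last_error
```

A fallback like this only happens when a new connection is being set up, which is consistent with the observation that a restart recovers while an already-running process stays stuck.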

Restarting the network service also works because it fixes up the IPv6 routing table to have default routes again.
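
A quick way to check for the missing default routes is to scan the output of `ip -6 route show`. The sketch below is only an illustration: the function name is made up, and the sample route tables use documentation addresses rather than anything captured from the affected host.

```python
def has_default_ipv6_route(route_output):
    """Return True if `ip -6 route show` output contains a default route."""
    for line in route_output.splitlines():
        fields = line.split()
        # iproute2 prints the default route with the keyword "default".
        if fields and fields[0] in ("default", "::/0"):
            return True
    return False


# Made-up sample output, not taken from the affected system.
HEALTHY = """\
2001:db8::/64 dev eth0 proto kernel metric 256
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::1 dev eth0 proto ra metric 1024 expires 1790sec
"""

BROKEN = """\
2001:db8::/64 dev eth0 proto kernel metric 256
fe80::/64 dev eth0 proto kernel metric 256
"""

print(has_default_ipv6_route(HEALTHY))
print(has_default_ipv6_route(BROKEN))
```

On a live system the input could come from `subprocess.check_output(["ip", "-6", "route", "show"])`, decoded to text.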

Comment 9 Dan Callaghan 2014-02-12 06:54:51 UTC
Okay, I don't think it's just the routing entries. I noticed that "service network restart" fixes IPv6 connectivity to the LC, and if I keep packets flowing to the LC (using ping6 left running for example) it will keep working for several hours. But if I leave it alone for about 60 seconds or more, IPv6 packets to the LC suddenly start dropping on the floor again.

So I think there must be something going wrong with neighbour discovery/autoconfiguration, but I don't fully understand how that stuff works so I can't figure out what's going wrong.

We only seem to hit this problem with /distribution/virt/install, and one of the things it does is disable NetworkManager and enable the network initscript instead. So the problem might be due to some difference between those two.
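
To narrow down a suspected neighbour discovery problem, one option is to watch the kernel's neighbour cache with `ip -6 neigh show` while the connection goes idle. The following is a rough sketch under that assumption: the function names are hypothetical, the sample output is invented, and it is not a confirmed diagnosis of this bug.

```python
def neighbour_states(neigh_output):
    """Map each address in `ip -6 neigh show` output to its cache state.

    The state (REACHABLE, STALE, FAILED, ...) is the last field on each line.
    """
    states = {}
    for line in neigh_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[-1]
    return states


def unreachable_neighbours(neigh_output):
    """List addresses whose neighbour entry is FAILED or INCOMPLETE."""
    return [addr for addr, state in neighbour_states(neigh_output).items()
            if state in ("FAILED", "INCOMPLETE")]


# Invented sample output using documentation addresses.
SAMPLE = """\
fe80::1 dev eth0 lladdr 52:54:00:12:34:56 router REACHABLE
2001:db8::1 dev eth0  FAILED
"""

print(neighbour_states(SAMPLE))
print(unreachable_neighbours(SAMPLE))
```

Comparing the states while ping6 is running against the states after about 60 seconds of idle time would show whether the entry for the router degrades when traffic stops.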

Comment 14 Nick Coghlan 2014-02-14 04:43:09 UTC
Dropping this from Beaker's targets, as it appears to be failing due to a genuine issue in the kernel.

We'll mark it as CLOSED/DUPLICATE once there's an appropriate BZ to reference.

Comment 16 Nick Coghlan 2014-02-17 03:02:26 UTC

*** This bug has been marked as a duplicate of bug 1065257 ***

