Bug 1387798

Summary: virt-install attempt fails to reach anaconda after system has been running for a while
Product: [Fedora] Fedora
Component: libvirt
Version: 24
Hardware: x86_64
OS: Linux
Status: CLOSED DEFERRED
Severity: medium
Priority: unspecified
Target Milestone: ---
Target Release: ---
Reporter: Adam Williamson <awilliam>
Assignee: Libvirt Maintainers <libvirt-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Docs Contact:
CC: agedosier, awilliam, berrange, clalancette, crobinso, itamar, laine, libvirt-maint, rjones, veillard, virt-maint
Last Closed: 2017-05-03 19:38:42 UTC
Type: Bug
Attachments:
  virt-install command and output (with -d); flags: none
  journal output after starting the virt-install command; flags: none

Description Adam Williamson 2016-10-21 23:11:47 UTC
This is a rather strange bug that's also a rather big problem for our openQA deployment.

For our openQA tests we have some base hard disk images that are produced with virt-install. They're supposed to be re-generated every two weeks to prevent them being too old. However, when the script that produces them tries to rebuild them, it usually fails. If I reboot the openQA server box and run the script right away, though, it works.

I've finally got a bit more time to look at this today. The box has been up for a while and the script is failing, so I tried simply running a virt-install interactively...and it seems like what goes wrong is it never even manages to reach anaconda. It dies in dracut, failing to find /dev/root.

I'll attach the virt-manager debug output, and the associated system logs.

The weird thing is that this doesn't seem to be happening on the virtually-identical openQA *staging* server - it only seems to happen on the *production* server. But I can't figure out what the difference could possibly be. Both boxes are running fairly up to date Fedora 24, with libvirt 1.3.3.2-1.fc24 and virt-install-1.4.0-3.fc24.

Comment 1 Adam Williamson 2016-10-21 23:14:48 UTC
Created attachment 1212979 [details]
virt-install command and output (with -d)

Comment 2 Adam Williamson 2016-10-21 23:21:03 UTC
Created attachment 1212983 [details]
journal output after starting the virt-install command

Comment 3 Adam Williamson 2016-10-21 23:28:59 UTC
hum, so I think it may be a network issue: I think the VM doesn't have network access, so it can't go out and download the installer from the repo, so it doesn't get there (we're doing a direct kernel boot with an `inst.repo` parameter to tell it where to get the installer from, here).

if I run `ip addr` from the dracut prompt in the VM, I see:

dracut:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:34:43:9a brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe34:439a/64 scope link 
       valid_lft forever preferred_lft forever

and if I try running dhclient, I get:

dracut:/# dhclient ens2
dhcp: PREINIT ens2 up
dhcp: FAIL
dracut:/# 

so that seems to be the problem, but I don't see anything immediately obvious wrong with the libvirt networking config. `virsh net-list` shows:

[root@openqa01 libvirt][PROD]# virsh net-list
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes

[root@openqa01 libvirt][PROD]#
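
The check done by eye on the `ip addr` output above (does the interface have an `inet` line at all?) could be scripted roughly as follows. This is only a sketch: the function name `iface_has_ipv4` is made up for illustration, and it assumes the usual `ip addr` layout where each interface starts with an `N: name:` header line.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): reads `ip addr` output on stdin
# and exits 0 if the named interface carries an IPv4 ("inet") address,
# 1 otherwise. Link-local IPv6 ("inet6") lines are deliberately ignored.
iface_has_ipv4() {
    awk -v ifname="$1" '
        # Interface header lines look like "2: ens2: <FLAGS> mtu ..."
        /^[0-9]+: / {
            name = $2
            sub(/:$/, "", name)    # drop the trailing colon
            sub(/@.*$/, "", name)  # drop any "@peer" suffix (e.g. veth pairs)
            current = name
            next
        }
        # Address lines are indented; the first field is "inet" for IPv4
        current == ifname && $1 == "inet" { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

It could be used as `ip addr | iface_has_ipv4 ens2 || echo "ens2 has no IPv4 address"`. Fed the output shown above, it would flag ens2, since that interface only has an `inet6` link-local address. On the host side, `virsh net-dumpxml default` and `ip addr show virbr0` (assuming the default network uses the usual `virbr0` bridge name) are probably the next things to look at.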

Comment 4 Adam Williamson 2016-10-21 23:47:45 UTC
Indeed, adding `--network user` to the virt-install command seems to make it fly. But I don't know why it's failing to work with the 'default' libvirt network.
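
For reference, a minimal sketch of what the workaround looks like on the command line. The real command is in the attachment and is not reproduced here: the VM name, memory size, disk size and install tree URL below are all placeholders, and the use of `--location` plus `--extra-args` is an assumption about how the direct kernel boot is set up. The script only prints the command, it does not run it.

```shell
#!/bin/sh
# All values below are hypothetical placeholders, not the real openQA settings.
VM_NAME="${VM_NAME:-disk-image-build}"
REPO_URL="${REPO_URL:-https://example.com/fedora/os/}"  # placeholder install tree
NETWORK="${NETWORK:-user}"  # "user" = usermode networking, bypassing the 'default' libvirt network

# Print (do not execute) a virt-install invocation using the chosen network.
print_virt_install_cmd() {
    printf '%s ' virt-install \
        --name "$VM_NAME" \
        --memory 2048 \
        --disk size=10 \
        --location "$REPO_URL" \
        --extra-args "inst.repo=$REPO_URL" \
        --network "$NETWORK"
    printf '\n'
}

print_virt_install_cmd
```

Setting `NETWORK=default` before calling the function prints the variant that attaches the guest to the 'default' libvirt network instead, which is the configuration that fails here.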

Comment 5 Adam Williamson 2016-10-22 06:50:07 UTC
Note to self: rwmjones gave me https://bugzilla.redhat.com/show_bug.cgi?id=1271183#c14 as a reference for debugging this further, whenever I can get to it.

Comment 6 Cole Robinson 2017-05-03 19:08:46 UTC
Adam, are you still seeing this? Is it f24 specific?

Comment 7 Adam Williamson 2017-05-03 19:37:49 UTC
I implemented the workaround I described in #c4 on the openQA boxes, so I can't tell if this is still a problem. I've got quite a lot of other stuff on my plate right now so I'm not sure I want to switch them back and wait for this to happen again...

Comment 8 Cole Robinson 2017-05-03 19:38:42 UTC
Okay, let's close this then. Please reopen if you ever take a stab at reproducing it.