Bug 2060045

Summary: Issues with networking in the latest Rawhide Vagrant (libvirt) boxes
Product: [Fedora] Fedora
Reporter: Frantisek Sumsal <fsumsal>
Component: cloud-init
Assignee: Dusty Mabe <dustymabe>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 36
CC: adimania, admiller, apevec, awilliam, bruno, davdunc, dustymabe, eterrell, fzatlouk, gholms, kevin, lars, lrintel, rominf, shardy, s, vanmeeuwen+fedora, vpavlin
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: AcceptedFreezeException
Fixed In Version: cloud-init-22.1-3.fc36
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-14 21:00:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1953784

Description Frantisek Sumsal 2022-03-02 15:16:50 UTC
Description of problem:
In our systemd CI we use the Rawhide libvirt-based Vagrant boxes[0] to test systemd/SELinux interoperability. Recently, I noticed that the boxes no longer finish booting, for no apparent reason. After digging a bit deeper, it looks like NetworkManager fails to bring up the network, even though running `dhclient` manually on the affected machine correctly obtains an address.

I can reproduce this reliably on CentOS Stream 8 with Vagrant 2.2.19[1] and vagrant-libvirt 0.7.0.

[0] https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/
[1] https://releases.hashicorp.com/vagrant/2.2.19/

Version-Release number of selected component (if applicable):
# rpm -q libvirt vagrant centos-stream-release
libvirt-8.0.0-2.module_el8.6.0+1087+b42c8331.x86_64
vagrant-2.2.19-1.x86_64
centos-stream-release-8.6-1.el8.noarch

Steps to Reproduce:
# cat >Vagrantfile <<EOF
Vagrant.configure("2") do |config|
  config.vm.define :rawhide_selinux
  config.vm.box = "fedora-rawhide-cloud"
  config.vm.box_url = "https://dl.fedoraproject.org/pub/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Vagrant-Rawhide-20220302.n.0.x86_64.vagrant-libvirt.box"

  config.vm.provider :libvirt do |libvirt|
    libvirt.driver = "kvm"
    libvirt.memory = "4096"
    libvirt.cpus = "4"
    libvirt.random :model => 'random'
  end
end
EOF
# vagrant up

Actual results:
Vagrant hangs and eventually times out when trying to obtain an IP address:
```
==> rawhide_selinux:  -- RNG device model:  random
==> rawhide_selinux: Creating shared folders metadata...
==> rawhide_selinux: Starting domain.
==> rawhide_selinux: Waiting for domain to get an IP address...
```

Closer inspection:
```
(host) # virsh console vagrant-cache-8WamD_rawhide_selinux
## Login root/vagrant
[root@fedora ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:45:73:24 brd ff:ff:ff:ff:ff:ff
    altname enp0s6
    altname ens6
# journalctl -b -u NetworkManager --no-pager -o short-monotonic
[    4.941743] fedora systemd[1]: Starting NetworkManager.service - Network Manager...
[    5.009948] fedora NetworkManager[665]: <info>  [1646233546.7626] NetworkManager (version 1.36.0-1.fc37) is starting... (for the first time)
[    5.010729] fedora NetworkManager[665]: <info>  [1646233546.7643] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 00-server.conf)
[    5.049839] fedora NetworkManager[665]: <info>  [1646233546.8040] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
[    5.055860] fedora systemd[1]: Started NetworkManager.service - Network Manager.
[    5.058622] fedora NetworkManager[665]: <info>  [1646233546.8129] manager[0x559385e78000]: monitoring kernel firmware directory '/lib/firmware'.
[    5.149227] fedora NetworkManager[665]: <info>  [1646233546.9034] hostname: hostname: using hostnamed
[    5.149660] fedora NetworkManager[665]: <info>  [1646233546.9039] dns-mgr[0x559385e57250]: init: dns=systemd-resolved rc-manager=unmanaged (auto), plugin=systemd-resolved
[    5.151631] fedora NetworkManager[665]: <info>  [1646233546.9059] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
[    5.151734] fedora NetworkManager[665]: <info>  [1646233546.9060] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
[    5.151820] fedora NetworkManager[665]: <info>  [1646233546.9060] manager: Networking is enabled by state file
[    5.152166] fedora NetworkManager[665]: <info>  [1646233546.9064] settings: Loaded settings plugin: keyfile (internal)
[    5.153431] fedora NetworkManager[665]: <info>  [1646233546.9077] settings: Loaded settings plugin: ifcfg-rh ("/usr/lib64/NetworkManager/1.36.0-1.fc37/libnm-settings-plugin-ifcfg-rh.so")
[    5.154574] fedora NetworkManager[665]: <info>  [1646233546.9089] dhcp-init: Using DHCP client 'internal'
[    5.154725] fedora NetworkManager[665]: <info>  [1646233546.9089] device (lo): carrier: link connected
[    5.154921] fedora NetworkManager[665]: <info>  [1646233546.9092] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
[    5.156020] fedora NetworkManager[665]: <info>  [1646233546.9103] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
[    5.156281] fedora NetworkManager[665]: <info>  [1646233546.9106] device (eth0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
[    5.157716] fedora NetworkManager[665]: <info>  [1646233546.9120] device (eth0): carrier: link connected
[    5.159530] fedora NetworkManager[665]: <info>  [1646233546.9138] device (eth0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
[    5.160154] fedora NetworkManager[665]: <info>  [1646233546.9144] manager: startup complete
# nmcli g
STATE         CONNECTIVITY  WIFI-HW  WIFI     WWAN-HW  WWAN    
disconnected  none          enabled  enabled  enabled  enabled
# nmcli d
DEVICE  TYPE      STATE         CONNECTION 
eth0    ethernet  disconnected  --         
lo      loopback  unmanaged     --     
```

(Restarting NetworkManager doesn't help.)
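
The `nmcli d` output above shows eth0 as "disconnected" with no connection attached. A minimal sketch for spotting such devices in a script follows; the helper name `unprofiled_devices` is made up for illustration, and it assumes the terse (`-t`) nmcli output format with `:` as the field separator:
```shell
# Read `nmcli -t -f DEVICE,STATE,CONNECTION device` output on stdin and print
# the devices that are "disconnected" and have no connection profile attached.
# An empty third field or "--" is treated as "no connection".
unprofiled_devices() {
    awk -F: '$2 == "disconnected" && ($3 == "" || $3 == "--") { print $1 }'
}

# Usage on the affected guest:
#   nmcli -t -f DEVICE,STATE,CONNECTION device | unprofiled_devices
```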

Trying to get the IP via `dhclient`:
```
# setenforce 0
# dhclient -v
Internet Systems Consortium DHCP Client 4.4.2-P1
Copyright 2004-2021 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
Listening on LPF/eth0/52:54:00:45:73:24
Sending on   LPF/eth0/52:54:00:45:73:24
Sending on   Socket/fallback
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3 (xid=0x55bc061c)
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8 (xid=0x55bc061c)
DHCPOFFER of 192.168.121.249 from 192.168.121.1
DHCPREQUEST for 192.168.121.249 on eth0 to 255.255.255.255 port 67 (xid=0x55bc061c)
DHCPACK of 192.168.121.249 from 192.168.121.1 (xid=0x55bc061c)
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
grep: /etc/sysconfig/network-scripts/ifcfg-*: No such file or directory
bound to 192.168.121.249 -- renewal in 1746 seconds.
[root@fedora ~]# nmcli g
STATE                   CONNECTIVITY  WIFI-HW  WIFI     WWAN-HW  WWAN    
connected (local only)  limited       enabled  enabled  enabled  enabled 
[root@fedora ~]# nmcli d
DEVICE  TYPE      STATE                   CONNECTION 
eth0    ethernet  connected (externally)  eth0       
lo      loopback  unmanaged               --      
```
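
Since `dhclient` gets a lease, DHCP on the libvirt network itself appears to work and only a NetworkManager profile for eth0 seems to be missing. A possible manual workaround, as a sketch only (the helper name `add_dhcp_profile` is hypothetical, and I have not verified this on the affected box), would be to create a DHCP profile by hand:
```shell
# Create an auto-connecting, DHCP-based Ethernet profile for one interface so
# that NetworkManager has a connection it can activate.
# Hypothetical helper; the profile is simply named after the interface.
add_dhcp_profile() {
    iface=${1:?usage: add_dhcp_profile IFACE}
    nmcli connection add type ethernet ifname "$iface" con-name "$iface" \
        ipv4.method auto connection.autoconnect yes
}

# Usage on the affected guest:
#   add_dhcp_profile eth0
```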

Expected results:
The machine should boot up with a working network. This used to work flawlessly until a couple of days ago (the last successful run was from Feb 24).

Comment 1 Dusty Mabe 2022-03-10 15:05:50 UTC
This is because the NetworkManager-config-server package is installed. It lays down a configuration file that tells NetworkManager not to activate any devices unless there is a specific configuration for them.

Since the Vagrant box doesn't run cloud-init (which I assume would create an NM config file for a connection), there are no connections by default.
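
To check whether a given box is affected, one can look for the `no-auto-default` setting in NetworkManager's configuration directories (the journal in the description shows `00-server.conf` being read from the lib directory). A rough sketch; the function name `find_no_auto_default` is made up here, and the default paths are the usual locations and may differ on other setups:
```shell
# Print every no-auto-default setting found in the given NetworkManager
# configuration directories. With no arguments, search the usual system
# locations (assumed defaults).
find_no_auto_default() {
    [ "$#" -gt 0 ] || set -- /usr/lib/NetworkManager/conf.d /etc/NetworkManager/conf.d
    # -r: recurse, -H: always print the file name, -s: ignore missing paths
    grep -rHs '^[[:space:]]*no-auto-default[[:space:]]*=' "$@"
}

# Usage on the affected guest:
#   find_no_auto_default
```
Any match means NetworkManager will not create default connections for the matching devices, so something else (cloud-init or a manually created profile) has to provide one.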

Comment 2 Dusty Mabe 2022-03-10 15:12:25 UTC
potential fix: https://src.fedoraproject.org/rpms/cloud-init/pull-request/27

Comment 3 Fedora Blocker Bugs Application 2022-03-10 21:28:55 UTC
Proposed as a Freeze Exception for 36-beta by Fedora user dustymabe using the blocker tracking app because:

 Would be nice to have vagrant boxes working so that people can play with the 36 beta and find bugs before final.

Comment 4 Fedora Update System 2022-03-10 21:31:07 UTC
FEDORA-2022-3039cd1634 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-3039cd1634

Comment 5 František Zatloukal 2022-03-11 21:03:05 UTC
*** Bug 2062211 has been marked as a duplicate of this bug. ***

Comment 6 Dusty Mabe 2022-03-11 21:10:02 UTC
Note you can try out the Rawhide vagrant box from today's run: https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20220311.n.0/compose/Cloud/x86_64/images/

Comment 7 Adam Williamson 2022-03-12 00:09:31 UTC
+3 in https://pagure.io/fedora-qa/blocker-review/issue/655 , marking accepted.

Comment 8 Fedora Update System 2022-03-14 21:00:49 UTC
FEDORA-2022-3039cd1634 has been pushed to the Fedora 36 stable repository.
If problem still persists, please make note of it in this bug report.