Description of problem:
When performing the following test [1] with the [2] environment, I entered the required command to list all failed services and although there should be none, I got the following service listed:

[root@testcloud ~]# systemctl --all --failed
  UNIT            LOAD   ACTIVE SUB    DESCRIPTION
● network.service loaded failed failed LSB: Bring up/down networking

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.

Version-Release number of selected component (if applicable):
initscripts-9.79-3.fc28.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Start the testcloud using the image.
   testcloud instance create test -u location
2. Ssh into the instance
3. Run systemctl --all --failed

Actual results:
  UNIT            LOAD   ACTIVE SUB    DESCRIPTION
● network.service loaded failed failed LSB: Bring up/down networking

Expected results:
No services should be listed with "--failed" option.

Additional info:
[1] https://fedoraproject.org/wiki/QA:Testcase_base_services_start?rd=QA:Testcase_Services_start
[2] https://kojipkgs.fedoraproject.org/compose/branched/Fedora-28-20180314.n.2/compose/Cloud/x86_64/images/Fedora-Cloud-Base-28-20180314.n.2.x86_64.qcow2
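For anyone who wants to automate the check from the test case, here is a minimal Python sketch that extracts failed unit names from `systemctl --all --failed` style output. The function name `failed_units` and the `SAMPLE` constant are illustrative and not part of any Fedora tooling; the parser only assumes the column order shown in the listing above (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION).

```python
# Sample text copied from the report above (spacing is not significant).
SAMPLE = """\
  UNIT            LOAD   ACTIVE SUB    DESCRIPTION
● network.service loaded failed failed LSB: Bring up/down networking

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
"""


def failed_units(listing):
    """Return the names of units whose ACTIVE column reads 'failed'."""
    units = []
    for line in listing.splitlines():
        fields = line.split()
        # Non-plain output prefixes failed units with a status bullet.
        if fields and fields[0] == "●":
            fields = fields[1:]
        # A unit row has at least UNIT, LOAD, ACTIVE and SUB columns.
        if len(fields) >= 4 and fields[2] == "failed":
            units.append(fields[0])
    return units


if __name__ == "__main__":
    print(failed_units(SAMPLE))
```

An empty result corresponds to the expected outcome of the test case; any returned name is a service that needs investigating.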
Created attachment 1409863
The output of journalctl -aeb
Proposed as a Blocker for 28-beta by Fedora user lruzicka using the blocker tracking app because: All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present.
Addendum: networking does work in the above-mentioned virtual machine.
Proposed as a Blocker for 28-final by Fedora user lruzicka using the blocker tracking app because: All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present.
Mar 19 12:54:32 testcloud network[461]: Bringing up interface enp0s3: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device enp0s3 does not seem to be present, delaying initialization.

This is the issue. The question is why this is happening: do you have such a network device on that machine?

Also, this looks like a bug:
Mar 19 12:54:32 testcloud network[461]: Determining IP information for eth0.../usr/sbin/dhclient-script: line 220: run-parts: command not found
I have this device on my host computer (laptop). I tested the setting in a virtual machine.
I have tried the image and it is messed up. It contains an ifcfg file that should not be there. I am not sure what the correct component is here, so I am reassigning to distribution.
I kinda suspect this is all tied up with this bit of the kickstart: https://pagure.io/fedora-kickstarts/blob/master/f/fedora-cloud-base.ks#_173-185

Basically, in Cloud images, we try to disable all 'predictable interface naming' so that the network interface will be called eth0, and write a config file for it.

One obvious way in which this appears to be broken at present is: 80-net-setup-link.rules is in /usr/lib/udev/rules.d/ , not /etc/udev/rules.d/ . That seems to have been the case in F27 too, though: https://koji.fedoraproject.org/koji/rpminfo?rpmID=11453132 so I'm not sure that entirely explains things. Still, it would be worth checking whether the interface shows up as 'eth0' in an F27 image, and whether fixing the above bit of the kickstart helps...
Note, the kickstart stuff isn't new. It's been there since 2013/2014.
Discussed at 2018-03-19 Fedora 28 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-03-19/f28-blocker-review.2018-03-19-16.02.html . We decided to delay decision on blocker status as we'd like more information on what's going on here, particularly if the service also shows up as failed in our actual supported cloud environments (EC2 and Openstack, I believe).
The same happens on OpenStack. Removing /etc/sysconfig/network-scripts/ifcfg-enp0s3 (the interface is named "eth0") fixes the service failure.
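To spot this kind of leftover on an image, a rough Python sketch could compare the DEVICE= entry of each ifcfg file against the interfaces that actually exist. The helper name `stale_ifcfg_files` is made up for illustration, and the sketch assumes simple `DEVICE=name` or `DEVICE="name"` lines like the ones in the anaconda-generated file; it does not handle every syntax the initscripts accept.

```python
import os
import re

# Matches DEVICE=eth0 or DEVICE="eth0" on a line of its own.
DEVICE_RE = re.compile(r'^\s*DEVICE\s*=\s*"?([^"\s]+)"?\s*$', re.MULTILINE)


def stale_ifcfg_files(scripts_dir, present_interfaces):
    """Return ifcfg-* paths whose DEVICE is not among present_interfaces."""
    stale = []
    for name in sorted(os.listdir(scripts_dir)):
        if not name.startswith("ifcfg-"):
            continue
        path = os.path.join(scripts_dir, name)
        with open(path, encoding="utf-8") as handle:
            match = DEVICE_RE.search(handle.read())
        # Files without a DEVICE line are skipped rather than guessed at.
        if match and match.group(1) not in present_interfaces:
            stale.append(path)
    return stale


if __name__ == "__main__":
    # On a running Linux system, /sys/class/net lists the present interfaces.
    present = set(os.listdir("/sys/class/net"))
    print(stale_ifcfg_files("/etc/sysconfig/network-scripts", present))
```

On the affected image this would be expected to report ifcfg-enp0s3, since only eth0 (and lo) exist there.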
Ah, so disabling persistent naming works, the problem (one of the problems?) is we're getting this config file for the 'persistent' name included in the image too. I note this, from the oz log of an F28 cloud image compose - https://koji.fedoraproject.org/koji/taskinfo?taskID=25814011 :

12:24:02,761 DEBUG anaconda:anaconda: localization: setting locale to: en_US.UTF-8
12:24:02,774 DEBUG anaconda:anaconda: network: devices found ['enp0s3']
12:24:02,774 DEBUG anaconda:ifcfg: content of files (network initialization):
12:24:02,775 DEBUG anaconda:ifcfg: /etc/sysconfig/network-scripts/ifcfg-enp0s3:
12:24:02,775 DEBUG anaconda:ifcfg: # Generated by dracut initrd
12:24:02,775 DEBUG anaconda:ifcfg: NAME="enp0s3"
12:24:02,776 DEBUG anaconda:ifcfg: DEVICE="enp0s3"
12:24:02,776 DEBUG anaconda:ifcfg: ONBOOT=yes
12:24:02,776 DEBUG anaconda:ifcfg: NETBOOT=yes
12:24:02,776 DEBUG anaconda:ifcfg: UUID="2c699325-5697-4ed3-83ad-f218b7900b70"
12:24:02,776 DEBUG anaconda:ifcfg: IPV6INIT=yes
12:24:02,776 DEBUG anaconda:ifcfg: BOOTPROTO=dhcp
12:24:02,777 DEBUG anaconda:ifcfg: TYPE=Ethernet
12:24:02,785 DEBUG anaconda:ifcfg: all settings: [{'802-3-ethernet': {'auto-negotiate': False, 'mac-address-blacklist': [], 's390-options': {}}, 'connection': {'id': 'enp0s3', 'uuid': '2c699325-5697-4ed3-83ad-f218b7900b70', 'interface-name': 'enp0s3', 'type': '802-3-ethernet', 'permissions': [], 'timestamp': 1521462234}, 'ipv6': {'method': 'auto', 'dns': [], 'dns-search': [], 'addresses': [], 'routes': [], 'addr-gen-mode': 0, 'address-data': [], 'route-data': []}, 'ipv4': {'method': 'auto', 'dns': [], 'dns-search': [], 'addresses': [], 'routes': [], 'address-data': [], 'route-data': []}, 'proxy': {}}]
12:24:02,785 DEBUG anaconda:anaconda: network: ensure single initramfs connections
12:24:02,837 DEBUG anaconda:anaconda: network: apply kickstart
12:24:02,919 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,920 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,920 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,921 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,921 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,922 DEBUG anaconda:anaconda: network: pre kickstart - updating settings of device enp0s3
12:24:02,982 DEBUG NetworkManager:<debug> [1521462242.9826] create NMAuditManager singleton (0x7fd84c00f560)
12:24:02,983 DEBUG NetworkManager:<debug> [1521462242.9839] ifcfg-rh: write: connection enp0s3 (2c699325-5697-4ed3-83ad-f218b7900b70) was modified by persisting it to "/etc/sysconfig/network-scripts/ifcfg-enp0s3"
12:24:02,984 DEBUG NetworkManager:<debug> [1521462242.9843] ++ connection 'update-settings' (0x561ec71a1d40/NMSimpleConnection/"802-3-ethernet" < 0x7fd8480085f0/NMIfcfgConnection/"802-3-ethernet"):
12:24:02,984 DEBUG NetworkManager:<debug> [1521462242.9843] ++ connection [ 0x561ec7206d30 < 0x561ec717f240 ]
12:24:02,984 DEBUG NetworkManager:<debug> [1521462242.9843] ++ connection.autoconnect = FALSE
12:24:02,985 INFO NetworkManager:<info> [1521462242.9851] settings-connection[0x7fd8480085f0,2c699325-5697-4ed3-83ad-f218b7900b70]: write: successfully updated (ifcfg-rh: update /etc/sysconfig/network-scripts/ifcfg-enp0s3), connection was modified in the process

looks like anaconda writes the file during install. Ah! I've got it.
Check out this, further down the kickstart file: https://pagure.io/fedora-kickstarts/blob/master/f/fedora-cloud-base.ks#_248-249

# For trac ticket https://fedorahosted.org/cloud/ticket/128
rm -f /etc/sysconfig/network-scripts/ifcfg-ens3

and indeed, if you check an f27 cloud image compose, like https://koji.fedoraproject.org/koji/taskinfo?taskID=25806260 , its oz.log shows ifcfg-ens3 being the file created:

07:16:49,702 DEBUG anaconda:ifcfg: content of files (network initialization):
07:16:49,702 DEBUG anaconda:ifcfg: /etc/sysconfig/network-scripts/ifcfg-ens3:
07:16:49,702 DEBUG anaconda:ifcfg: # Generated by dracut initrd
07:16:49,703 DEBUG anaconda:ifcfg: NAME="ens3"
07:16:49,703 DEBUG anaconda:ifcfg: DEVICE="ens3"
07:16:49,703 DEBUG anaconda:ifcfg: ONBOOT=yes
07:16:49,703 DEBUG anaconda:ifcfg: NETBOOT=yes
07:16:49,703 DEBUG anaconda:ifcfg: UUID="dbbd7e53-2f41-471a-92dc-a4f6bb027ec8"
07:16:49,703 DEBUG anaconda:ifcfg: IPV6INIT=yes
07:16:49,704 DEBUG anaconda:ifcfg: BOOTPROTO=dhcp
07:16:49,704 DEBUG anaconda:ifcfg: TYPE=Ethernet
07:16:49,711 DEBUG anaconda:ifcfg: all settings: [{'802-3-ethernet': {'auto-negotiate': False, 'mac-address-blacklist': [], 's390-options': {}}, 'connection': {'id': 'ens3', 'uuid': 'dbbd7e53-2f41-471a-92dc-a4f6bb027ec8', 'interface-name': 'ens3', 'type': '802-3-ethernet', 'permissions': [], 'timestamp': 1521443804}, 'ipv6': {'method': 'auto', 'dns': [], 'dns-search': [], 'addresses': [], 'routes': [], 'addr-gen-mode': 0, 'address-data': [], 'route-data': []}, 'ipv4': {'method': 'auto', 'dns': [], 'dns-search': [], 'addresses': [], 'routes': [], 'address-data': [], 'route-data': []}, 'proxy': {}}]
07:16:49,711 DEBUG anaconda:anaconda: network: ensure single initramfs connections
07:16:49,756 DEBUG anaconda:anaconda: network: apply kickstart
07:16:49,843 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,844 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,845 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,846 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,847 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,848 DEBUG anaconda:anaconda: network: pre kickstart - updating settings of device ens3

and indeed https://fedorahosted.org/cloud/ticket/128 talks about precisely this same issue.

So what happened here, basically, is we've known about this and worked around it in the kickstart for two years, but it showed up again because the 'persistent' device name changed from ens3 to enp0s3 . I think this is something systemd 238 caused, ISTR reading a bug about it somewhere.

Really, cloud image composes should pass 'biosdevname=0 net.ifnames=0' to anaconda so the interface will be eth0 so far as *anaconda* is concerned too. That would obviate the need for workarounds in the kickstart.
The inadvertent interface name change was fixed in https://github.com/systemd/systemd/commit/8eebb6a9e5. Actually, it's something I'd like an FE exception for independently of this bug. I'll fire off a build of systemd with the patch. Longer term, biosdevname=0 net.ifnames=0 would also be a nice thing to do.
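As a quick way to verify whether a given boot actually received those two arguments, here is a small Python sketch that inspects a kernel command line string. The function name `predictable_naming_disabled` is hypothetical, and checking for both arguments together simply mirrors the pair suggested in this bug; it is not a statement about how systemd or biosdevname evaluate them individually.

```python
def predictable_naming_disabled(cmdline):
    """Return True if both naming-related arguments are set to 0."""
    args = cmdline.split()
    return "net.ifnames=0" in args and "biosdevname=0" in args


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel on Linux.
    with open("/proc/cmdline", encoding="utf-8") as handle:
        print(predictable_naming_disabled(handle.read()))
```

Running the same check inside the installer environment would show whether anaconda itself sees eth0-style naming, which is the part the kickstart workaround cannot influence.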
Zbigniew: is there a bug specifically for the inadvertent name change issue? If so we can propose it as an FE, if not, we need to file one. I poked a bit into seeing if we can clean this up by making it possible to run the initial image creation with `net.ifnames=0`; after drilling down a ways, I'm at https://github.com/clalancette/oz/pull/254 . If that gets applied, we'd then need to tweak Koji in some way to produce an appropriate TDL for Fedora cloud image builds.
Update, I filed https://pagure.io/releng/issue/7400 with some general thoughts / questions / suggestions on how we deal with network interface naming for disk image builds.
Kevin's PR to adjust how fedora-cloud-base.ks does this has been merged to f28 branch of fedora-kickstarts, so setting this to ON_QA. Note, it should work with both the 'correct' and 'incorrect' systemd, I think, as it just does `rm -f /etc/sysconfig/network-scripts/ifcfg-en*`. So I think we're good there.
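To illustrate why the broader glob covers both the old and the new persistent name while leaving the intended eth0 config alone, here is a small Python sketch using `fnmatch`. The helper name `removed_by_cleanup` and the list of file names are illustrative only; for a simple pattern like this one, Python's `fnmatchcase` and the shell glob used by `rm -f` match the same names.

```python
from fnmatch import fnmatchcase

# Pattern taken from the kickstart cleanup line quoted above.
CLEANUP_PATTERN = "ifcfg-en*"


def removed_by_cleanup(filenames, pattern=CLEANUP_PATTERN):
    """Return the file names that the cleanup glob would match."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    candidates = ["ifcfg-ens3", "ifcfg-enp0s3", "ifcfg-eth0", "ifcfg-lo"]
    print(removed_by_cleanup(candidates))
```

Both ifcfg-ens3 and ifcfg-enp0s3 match, while ifcfg-eth0 and ifcfg-lo do not, so the cleanup does not depend on which systemd build named the device during the compose.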
(In reply to Lukáš Nykrýn from comment #5)
> Also, this looks like a bug:
> Mar 19 12:54:32 testcloud network[461]: Determining IP information for
> eth0.../usr/sbin/dhclient-script: line 220: run-parts: command not found

Reported, and should be fixed in rawhide: https://bugzilla.redhat.com/show_bug.cgi?id=1558612
systemd-238-5.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-037a145d4d
Discussed during the 2018-03-26 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made as it violates the following blocker criterion for the Cloud base image (which is release-blocking): "All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present".

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2018-03-26/f28-blocker-review.2018-03-26-16.01.txt
This is fixed (both in fedora-kickstarts and systemd) now. We have a releng ticket to track long-term improvements to how this is done: https://pagure.io/releng/issue/7400