1558027 – The network.service failed LSB in Fedora Cloud.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	distribution
Sub Component:
Version:	28
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Václav Pavlín
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	AcceptedBlocker
Depends On:
Blocks:	F28FinalBlocker

Reported:	2018-03-19 13:12 UTC by Lukas Ruzicka
Modified:	2018-04-02 16:43 UTC
CC List:	14 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-04-02 16:43:26 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
The output of journalctl -aeb (76.89 KB, text/plain) 2018-03-19 13:13 UTC, Lukas Ruzicka

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1559629	0	unspecified	CLOSED	persistent interface names are wrong for some interfaces	2021-02-22 00:41:40 UTC

Internal Links: 1559629

Description Lukas Ruzicka 2018-03-19 13:12:50 UTC

Description of problem:

While performing test [1] in environment [2], I ran the required command to list all failed services. There should be none, but the following service was listed:

[root@testcloud ~]# systemctl --all --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● network.service loaded failed failed LSB: Bring up/down networking

LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.

1 loaded units listed.
To show all installed unit files use 'systemctl list-unit-files'.

Version-Release number of selected component (if applicable):

initscripts-9.79-3.fc28.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Start the testcloud using the image.
testcloud instance create test -u location
2. SSH into the instance.
3. Run systemctl --all --failed

Actual results:

UNIT LOAD ACTIVE SUB DESCRIPTION
● network.service loaded failed failed LSB: Bring up/down networking

Expected results:

No services should be listed with the "--failed" option.

Additional info:

[1] https://fedoraproject.org/wiki/QA:Testcase_base_services_start?rd=QA:Testcase_Services_start
[2] https://kojipkgs.fedoraproject.org/compose/branched/Fedora-28-20180314.n.2/compose/Cloud/x86_64/images/Fedora-Cloud-Base-28-20180314.n.2.x86_64.qcow2
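
For reference, the check from the steps above could be automated roughly as follows. This is a minimal sketch, not part of the test case: count_failed_units is a hypothetical helper name, and it assumes systemctl is called with its --no-legend and --plain options so that every non-empty output line describes exactly one unit.

```sh
# Count failed units from "systemctl --failed"-style output read on stdin.
# With --no-legend the header and the trailing summary are omitted, and
# with --plain the leading bullet is dropped, so each non-empty line is
# one unit.
count_failed_units() {
    # grep -c exits non-zero when the count is 0; the count is still printed.
    grep -c '[^[:space:]]' || true
}

# On a live system (assumption: systemd is running and systemctl is on PATH):
#   failed=$(systemctl --all --failed --no-legend --plain | count_failed_units)
#   [ "$failed" -eq 0 ] || echo "FAIL: $failed failed unit(s)"
```

Reading from stdin keeps the counting logic separate from the systemctl call, so it can be tried on saved output as well.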

Comment 1 Lukas Ruzicka 2018-03-19 13:13:31 UTC

Created attachment 1409863
The output of journalctl -aeb

Comment 2 Fedora Blocker Bugs Application 2018-03-19 13:15:52 UTC

Proposed as a Blocker for 28-beta by Fedora user lruzicka using the blocker tracking app because:

 All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present.

Comment 3 Lukas Ruzicka 2018-03-19 13:17:11 UTC

Addendum:

Networking works in the above-mentioned virtual machine.

Comment 4 Fedora Blocker Bugs Application 2018-03-19 13:34:41 UTC

Proposed as a Blocker for 28-final by Fedora user lruzicka using the blocker tracking app because:

 All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present.

Comment 5 Lukáš Nykrýn 2018-03-19 14:37:11 UTC

Mar 19 12:54:32 testcloud network[461]: Bringing up interface enp0s3:  ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Device enp0s3 does not seem to be present, delaying initialization.

This is the issue. The question is why this is happening: do you have such a network device on that machine?


Also, this looks like a bug:
Mar 19 12:54:32 testcloud network[461]: Determining IP information for eth0.../usr/sbin/dhclient-script: line 220: run-parts: command not found
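
A quick way to look for the "does not seem to be present" kind of mismatch is to compare the DEVICE value of every ifcfg file with the interfaces the kernel actually exposes. This is a sketch under assumptions: stale_ifcfg_files is a hypothetical helper name, it only understands unquoted or double-quoted DEVICE= lines, and it assumes present interfaces appear as entries under /sys/class/net.

```sh
# List ifcfg files whose DEVICE is not present on the running system.
# $1: directory holding ifcfg-* files (default: /etc/sysconfig/network-scripts)
# $2: directory with one entry per interface (default: /sys/class/net)
stale_ifcfg_files() {
    cfg_dir=${1:-/etc/sysconfig/network-scripts}
    net_dir=${2:-/sys/class/net}
    for f in "$cfg_dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        # Take the first DEVICE= value and strip optional double quotes.
        dev=$(sed -n 's/^[[:space:]]*DEVICE=//p' "$f" | tr -d '"' | head -n 1)
        [ -n "$dev" ] || continue
        # Report the file if no interface with that name exists.
        [ -e "$net_dir/$dev" ] || printf '%s\n' "$f"
    done
}
```

Both directories are parameters only so the function can be tried against a scratch tree; called without arguments it inspects the live system.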

Comment 6 Lukas Ruzicka 2018-03-19 18:18:45 UTC

I have this device on my host computer (laptop). I tested the setting in a virtual machine.

Comment 7 Lukáš Nykrýn 2018-03-19 18:43:51 UTC

I have tried the image and it is messed up. It contains ifcfg files that should not be there. I am not sure what the correct component is here, so I am assigning it to distribution.

Comment 8 Adam Williamson 2018-03-19 21:35:48 UTC

I kinda suspect this is all tied up with this bit of the kickstart:

https://pagure.io/fedora-kickstarts/blob/master/f/fedora-cloud-base.ks#_173-185

Basically, in Cloud images, we try to disable all 'predictable interface naming' so that the network interface will be called eth0, and write a config file (ifcfg-eth0) for it.

One obvious way in which this appears to be broken at present is: 80-net-setup-link.rules is in /usr/lib/udev/rules.d/ , not /etc/udev/rules.d/ . That seems to have been the case in F27 too, though:

https://koji.fedoraproject.org/koji/rpminfo?rpmID=11453132

so not sure if that entirely explains things. Still, would be worth checking if the interface shows up as 'eth0' in an F27 image, and if fixing the above bit of the kickstart helps...
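
To illustrate the mechanism being discussed (a sketch only, not the actual content of fedora-cloud-base.ks): udev rules files in /etc/udev/rules.d take precedence over same-named files in /usr/lib/udev/rules.d, and a symlink to /dev/null masks a rule entirely. The function name setup_eth0_naming and its root-prefix argument are hypothetical, and the ifcfg keys shown are a minimal assumed set.

```sh
# Sketch of a %post-style fragment. $1 is a hypothetical root prefix so the
# fragment can be tried against a scratch directory; pass nothing for "/".
setup_eth0_naming() {
    root=${1-}
    mkdir -p "$root/etc/udev/rules.d" "$root/etc/sysconfig/network-scripts"

    # Mask the rule shipped in /usr/lib/udev/rules.d by placing a same-named
    # symlink to /dev/null in /etc/udev/rules.d.
    ln -sf /dev/null "$root/etc/udev/rules.d/80-net-setup-link.rules"

    # Minimal DHCP configuration for the fixed kernel name eth0.
    cat > "$root/etc/sysconfig/network-scripts/ifcfg-eth0" <<EOF
DEVICE="eth0"
BOOTPROTO="dhcp"
ONBOOT="yes"
TYPE="Ethernet"
EOF
}
```

The point of the sketch is the location: the mask has to live under /etc/udev/rules.d to override a rule that is installed under /usr/lib/udev/rules.d.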

Comment 9 Adam Williamson 2018-03-19 21:36:22 UTC

Note, the kickstart stuff isn't new. It's been there since 2013/2014.

Comment 10 Adam Williamson 2018-03-19 21:37:59 UTC

Discussed at 2018-03-19 Fedora 28 blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-03-19/f28-blocker-review.2018-03-19-16.02.html . We decided to delay decision on blocker status as we'd like more information on what's going on here, particularly if the service also shows up as failed in our actual supported cloud environments (EC2 and Openstack, I believe).

Comment 11 Patrick Uiterwijk 2018-03-19 23:04:43 UTC

The same happens on OpenStack.
Removing /etc/sysconfig/network-scripts/ifcfg-enp0s3 (the interface is named "eth0") fixes the service failure.

Comment 12 Adam Williamson 2018-03-19 23:14:16 UTC

Ah, so disabling persistent naming works, the problem (one of the problems?) is we're getting this config file for the 'persistent' name included in the image too.

I note this, from the oz log of an F28 cloud image compose - https://koji.fedoraproject.org/koji/taskinfo?taskID=25814011 :

12:24:02,761 DEBUG anaconda:anaconda: localization: setting locale to: en_US.UTF-8
12:24:02,774 DEBUG anaconda:anaconda: network: devices found ['enp0s3']
12:24:02,774 DEBUG anaconda:ifcfg: content of files (network initialization):
12:24:02,775 DEBUG anaconda:ifcfg: /etc/sysconfig/network-scripts/ifcfg-enp0s3:
12:24:02,775 DEBUG anaconda:ifcfg:   # Generated by dracut initrd
12:24:02,775 DEBUG anaconda:ifcfg:   NAME="enp0s3"
12:24:02,776 DEBUG anaconda:ifcfg:   DEVICE="enp0s3"
12:24:02,776 DEBUG anaconda:ifcfg:   ONBOOT=yes
12:24:02,776 DEBUG anaconda:ifcfg:   NETBOOT=yes
12:24:02,776 DEBUG anaconda:ifcfg:   UUID="2c699325-5697-4ed3-83ad-f218b7900b70"
12:24:02,776 DEBUG anaconda:ifcfg:   IPV6INIT=yes
12:24:02,776 DEBUG anaconda:ifcfg:   BOOTPROTO=dhcp
12:24:02,777 DEBUG anaconda:ifcfg:   TYPE=Ethernet
12:24:02,785 DEBUG anaconda:ifcfg: all settings: [{'802-3-ethernet': {'auto-negotiate': False, 'mac-address-blacklist': [], 's390-options': {}}, 'connection': {'id': 'enp0s3', 'uuid': '2c699325-5697-4ed3-83ad-f218b7900b70', 'interface-name': 'enp0s3', 'type': '802-3-ethernet', 'permissions': [], 'timestamp': 1521462234}, 'ipv6': {'method': 'auto', 'dns': [], 'dns-search': [], 'addresses': [], 'routes': [], 'addr-gen-mode': 0, 'address-data': [], 'route-data': []}, 'ipv4': {'method': 'auto', 'dns': [], 'dns-search': [], 'addresses': [], 'routes': [], 'address-data': [], 'route-data': []}, 'proxy': {}}]
12:24:02,785 DEBUG anaconda:anaconda: network: ensure single initramfs connections
12:24:02,837 DEBUG anaconda:anaconda: network: apply kickstart
12:24:02,919 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,920 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,920 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,921 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,921 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-enp0s3
12:24:02,922 DEBUG anaconda:anaconda: network: pre kickstart - updating settings of device enp0s3
12:24:02,982 DEBUG NetworkManager:<debug> [1521462242.9826] create NMAuditManager singleton (0x7fd84c00f560)
12:24:02,983 DEBUG NetworkManager:<debug> [1521462242.9839] ifcfg-rh: write: connection enp0s3 (2c699325-5697-4ed3-83ad-f218b7900b70) was modified by persisting it to "/etc/sysconfig/network-scripts/ifcfg-enp0s3" 
12:24:02,984 DEBUG NetworkManager:<debug> [1521462242.9843] ++ connection 'update-settings' (0x561ec71a1d40/NMSimpleConnection/"802-3-ethernet" < 0x7fd8480085f0/NMIfcfgConnection/"802-3-ethernet"):
12:24:02,984 DEBUG NetworkManager:<debug> [1521462242.9843] ++ connection                [ 0x561ec7206d30 < 0x561ec717f240 ]
12:24:02,984 DEBUG NetworkManager:<debug> [1521462242.9843] ++ connection.autoconnect    = FALSE
12:24:02,985 INFO NetworkManager:<info>  [1521462242.9851] settings-connection[0x7fd8480085f0,2c699325-5697-4ed3-83ad-f218b7900b70]: write: successfully updated (ifcfg-rh: update /etc/sysconfig/network-scripts/ifcfg-enp0s3), connection was modified in the process

looks like anaconda writes the file during install.
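
The relevant file names can be pulled out of an oz log without reading the whole thing. A minimal sketch, assuming the log format shown above; ifcfg_files_in_log is a hypothetical helper name.

```sh
# Print the unique ifcfg file paths that anaconda dumps in an oz log read
# from stdin. Only the "content of files" header lines match, e.g.:
#   ... DEBUG anaconda:ifcfg: /etc/sysconfig/network-scripts/ifcfg-enp0s3:
# The "IfcfFile.read" lines and the key/value lines are ignored.
ifcfg_files_in_log() {
    sed -n 's|.*anaconda:ifcfg: \(/etc/sysconfig/network-scripts/ifcfg-[^ :]*\):$|\1|p' | sort -u
}

# Usage, with oz.log being a locally saved copy of the compose log:
#   ifcfg_files_in_log < oz.log
```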

Ah! I've got it. Check out this, further down the kickstart file:

https://pagure.io/fedora-kickstarts/blob/master/f/fedora-cloud-base.ks#_248-249

# For trac ticket https://fedorahosted.org/cloud/ticket/128
rm -f /etc/sysconfig/network-scripts/ifcfg-ens3

and indeed, if you check an f27 cloud image compose, like https://koji.fedoraproject.org/koji/taskinfo?taskID=25806260 , its oz.log shows ifcfg-ens3 being the file created:

07:16:49,702 DEBUG anaconda:ifcfg: content of files (network initialization):
07:16:49,702 DEBUG anaconda:ifcfg: /etc/sysconfig/network-scripts/ifcfg-ens3:
07:16:49,702 DEBUG anaconda:ifcfg:   # Generated by dracut initrd
07:16:49,703 DEBUG anaconda:ifcfg:   NAME="ens3"
07:16:49,703 DEBUG anaconda:ifcfg:   DEVICE="ens3"
07:16:49,703 DEBUG anaconda:ifcfg:   ONBOOT=yes
07:16:49,703 DEBUG anaconda:ifcfg:   NETBOOT=yes
07:16:49,703 DEBUG anaconda:ifcfg:   UUID="dbbd7e53-2f41-471a-92dc-a4f6bb027ec8"
07:16:49,703 DEBUG anaconda:ifcfg:   IPV6INIT=yes
07:16:49,704 DEBUG anaconda:ifcfg:   BOOTPROTO=dhcp
07:16:49,704 DEBUG anaconda:ifcfg:   TYPE=Ethernet
07:16:49,711 DEBUG anaconda:ifcfg: all settings: [{'802-3-ethernet': {'auto-negotiate': False, 'mac-address-blacklist': [], 's390-options': {}}, 'connection': {'id': 'ens3', 'uuid': 'dbbd7e53-2f41-471a-92dc-a4f6bb027ec8', 'interface-name': 'ens3', 'type': '802-3-ethernet', 'permissions': [], 'timestamp': 1521443804}, 'ipv6': {'method': 'auto', 'dns': [], 'dns-search': [], 'addresses': [], 'routes': [], 'addr-gen-mode': 0, 'address-data': [], 'route-data': []}, 'ipv4': {'method': 'auto', 'dns': [], 'dns-search': [], 'addresses': [], 'routes': [], 'address-data': [], 'route-data': []}, 'proxy': {}}]
07:16:49,711 DEBUG anaconda:anaconda: network: ensure single initramfs connections
07:16:49,756 DEBUG anaconda:anaconda: network: apply kickstart
07:16:49,843 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,844 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,845 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,846 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,847 DEBUG anaconda:ifcfg: IfcfFile.read /etc/sysconfig/network-scripts/ifcfg-ens3
07:16:49,848 DEBUG anaconda:anaconda: network: pre kickstart - updating settings of device ens3

and indeed https://fedorahosted.org/cloud/ticket/128 talks about precisely this same issue. So what happened here, basically, is we've known about this and worked around it in the kickstart for two years, but it showed up again because the 'persistent' device name changed from ens3 to enp0s3 . I think this is something systemd 238 caused, ISTR reading a bug about it somewhere.

Really, cloud image composes should pass 'biosdevname=0 net.ifnames=0' to anaconda so the interface will be eth0 so far as *anaconda* is concerned too. That would obviate the need for workarounds in the kickstart.
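
Whether a given boot actually had both arguments can be checked from the kernel command line. A small sketch; the helper names are hypothetical, and it assumes arguments are separated by spaces, as they are in /proc/cmdline.

```sh
# Succeed if the command line string in $1 contains the exact argument $2.
cmdline_has_arg() {
    # Pad both sides with spaces so only whole arguments match.
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if both naming schemes are disabled on the given command line.
uses_kernel_names() {
    cmdline_has_arg "$1" net.ifnames=0 && cmdline_has_arg "$1" biosdevname=0
}

# On a running system:
#   uses_kernel_names "$(cat /proc/cmdline)" && echo "kernel names (eth0) in use"
```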

Comment 13 Zbigniew Jędrzejewski-Szmek 2018-03-20 07:57:09 UTC

The inadvertent interface name change was fixed in https://github.com/systemd/systemd/commit/8eebb6a9e5. Actually, it's something I'd like a freeze exception (FE) for independently of this bug. I'll fire off a build of systemd with the patch.

Longer term, biosdevname=0 net.ifnames=0 would also be a nice thing to do.

Comment 14 Adam Williamson 2018-03-20 17:42:17 UTC

Zbigniew: is there a bug specifically for the inadvertent name change issue? If so we can propose it as an FE, if not, we need to file one.

I poked a bit into seeing if we can clean this up by making it possible to run the initial image creation with `net.ifnames=0`; after drilling down a ways, I'm at https://github.com/clalancette/oz/pull/254 . If that gets applied, we'd then need to tweak Koji in some way to produce an appropriate TDL for Fedora cloud image builds.

Comment 15 Adam Williamson 2018-03-20 18:42:26 UTC

Update, I filed https://pagure.io/releng/issue/7400 with some general thoughts / questions / suggestions on how we deal with network interface naming for disk image builds.

Comment 16 Adam Williamson 2018-03-20 18:43:50 UTC

Kevin's PR to adjust how fedora-cloud-base.ks does this has been merged to f28 branch of fedora-kickstarts, so setting this to ON_QA. Note, it should work with both the 'correct' and 'incorrect' systemd, I think, as it just does `rm -f /etc/sysconfig/network-scripts/ifcfg-en*`. So I think we're good there.
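
To see why the wider glob covers both names, here is a small sketch that can be tried in a scratch directory. remove_en_ifcfg is a hypothetical wrapper around the rm command quoted above; it takes the directory as an argument so that it does not touch the real /etc.

```sh
# Remove ifcfg files for predictable ("en*") interface names from the
# directory given as $1. ifcfg-eth0 and ifcfg-lo do not match the glob
# and are left untouched.
remove_en_ifcfg() {
    # With -f, rm stays silent and succeeds even if nothing matches.
    rm -f "$1"/ifcfg-en*
}

# Example against a scratch directory:
#   d=$(mktemp -d)
#   touch "$d/ifcfg-ens3" "$d/ifcfg-enp0s3" "$d/ifcfg-eth0"
#   remove_en_ifcfg "$d"
#   ls "$d"
```

Because the pattern matches on the "en" prefix rather than on one specific name, it does not depend on which name systemd happens to assign to the build VM's interface.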

Comment 17 Pavel Zhukov 2018-03-21 16:22:37 UTC

(In reply to Lukáš Nykrýn from comment #5)
> Also, this looks like a bug:
> Mar 19 12:54:32 testcloud network[461]: Determining IP information for eth0.../usr/sbin/dhclient-script: line 220: run-parts: command not found

This has been reported and should be fixed in Rawhide: https://bugzilla.redhat.com/show_bug.cgi?id=1558612

Comment 18 Fedora Update System 2018-03-26 17:13:14 UTC

systemd-238-5.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-037a145d4d

Comment 19 Geoffrey Marr 2018-03-26 19:02:56 UTC

Discussed during the 2018-03-26 blocker review meeting: [1]

The decision to classify this bug as an AcceptedBlocker was made as it violates the following blocker criteria:

"All system services present after installation with one of the release-blocking package sets must start properly, unless they require hardware which is not present", for the Cloud base image (which is release-blocking)

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2018-03-26/f28-blocker-review.2018-03-26-16.01.txt

Comment 20 Adam Williamson 2018-04-02 16:43:26 UTC

This is fixed (both in fedora-kickstarts and systemd) now. We have a releng ticket to track long-term improvements to how this is done:

https://pagure.io/releng/issue/7400
