Bug 1287726 - NIC cannot start after reboot. Manually typing systemctl start network after login can initialize network
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: initscripts
Version: 7.2
Hardware: x86_64 Linux
Priority: unspecified Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: initscripts Maintenance Team
QA Contact: Leos Pol
Depends On:
Blocks: 1289485 1313485 1377133
 
Reported: 2015-12-02 09:31 EST by edw.ekei
Modified: 2016-11-25 08:08 EST

See Also:
Fixed In Version: initscripts-9.49.31-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1377133
Environment:
Last Closed: 2016-11-04 02:43:04 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
VM's dmesg after applying a change to /etc/init.d/functions, as requested (1007.44 KB, text/plain)
2015-12-09 07:58 EST, edw.ekei
network-functions (14.94 KB, text/plain)
2015-12-09 10:43 EST, Lukáš Nykrýn
dmesg after copying patched /etc/sysconfig/network-scripts/network-functions (1008.06 KB, text/plain)
2015-12-10 04:59 EST, edw.ekei

Description edw.ekei 2015-12-02 09:31:03 EST
Actually, component should be "network" but I couldn't find it and chose "system-config-network".

Description of problem:

I have a recently updated Red Hat 7.2 VM on MS Hyper-V 2012 R2, a Generation 2 VM (using UEFI). The system had worked fine since version 7. After the latest yum upgrade, the NIC does not initialize after reboot. After I log in and type "sudo systemctl start network", the network service and the NIC are initialized as normal!

The error I get is:
Bringing up interface eth0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device has different MAC address than expected

Which is wrong, the MAC address is correct! I am quite sure about it, I have triple checked the HWADDR in ifcfg-eth0 and it is correct, same as the MAC address presented by the hypervisor to the virtual NIC. I also get the same results using "ip link" and "ls -la /sys/class/net/", in order to check that the NIC name, eth0, corresponds with the MAC address.
Actually, I suppose that if settings or NIC-MAC were wrong, "sudo systemctl start network" wouldn't work either.
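
[Editor's note] The manual check described above (HWADDR in the ifcfg file versus the address the kernel reports) can be scripted. This is only a sketch: the function name hwaddr_matches is hypothetical, and it assumes the kernel exposes the MAC in an "address" file under the device's /sys/class/net directory.

```shell
# Hypothetical helper, not part of initscripts.
# Usage: hwaddr_matches <ifcfg-file> <sysfs-device-dir>
# Returns 0 on match, 1 on mismatch, 2 if no HWADDR is set, 3 if the device is absent.
hwaddr_matches() (
    ifcfg=$1
    devdir=$2
    # Take the last uncommented HWADDR= line, strip quotes, lowercase it.
    expected=$(sed -n 's/^[[:space:]]*HWADDR=//p' "$ifcfg" | tail -n 1 \
        | tr -d "\"'" | tr '[:upper:]' '[:lower:]')
    [ -n "$expected" ] || exit 2
    [ -r "$devdir/address" ] || exit 3
    actual=$(tr '[:upper:]' '[:lower:]' < "$devdir/address")
    # MAC addresses are compared case-insensitively.
    [ "$expected" = "$actual" ]
)
```

Example call: hwaddr_matches /etc/sysconfig/network-scripts/ifcfg-eth0 /sys/class/net/eth0; echo $?
Return code 3 at boot time but 0 after login would point to the device appearing late rather than to a wrong MAC.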

I also tried to remove the HWADDR field in ifcfg-eth0 and add DEVICE. Same effect, the difference is that after reboot the error is:
/etc/sysconfig/network-scripts/ifup-eth[817]: Device does not seem to be present, delaying initialization.
Although the device eth0 can be seen using "ifconfig -a" or ip link, and will start after typing "sudo systemctl start network"

Also, I think there is no udev rule for NICs any longer in RHEL 7. In RHEL 6 I used to delete or change /etc/udev/rules.d/70-persistent-net.rules file in such cases. I tried to manually make a rules file, no effect.

Looking at journalctl, it looks like network.service and BIND try to start before any NIC is detected on the RHEL 7.2 VM. On a VM still running RHEL 7.1 (same settings, Hyper-V 2012 R2 Gen 2, UEFI), the NICs are detected first and then the services are started (which should be the proper order).


It seems to be similar to this bug from Fedora 20:
https://bugzilla.redhat.com/show_bug.cgi?id=1013247


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.2 (Maipo)
kernel 3.10.0-327.el7


Steps to Reproduce:
1. Reboot VM -> network.service fails
2. Login, manually typing "sudo systemctl start network" -> Works (same settings, nothing changed)


Additional info:
I have removed NetworkManager from the beginning of the setup of the VM, it's a minimal installation just for DNS server, nothing irrelevant left. I'm using the ifcfg configuration file to configure the nic.
That's since I prepared the VM, it's not a change (NetworkManager removal) that I did recently.

If I boot using the older kernel 3.10.0-229.20.1.el7.x86_64, the NIC comes up, but the BIND service still listens on 127.0.0.1 only and I need to restart it manually after login to make it listen on all interfaces.
Comment 1 Harald Hoyer 2015-12-08 04:49:30 EST
reassigning to initscripts, which owns ifup-eth
Comment 2 Lukáš Nykrýn 2015-12-08 05:00:36 EST
Quick fix here is to delete the HWADDR and add DEVTIMEOUT=5 to your ifcfg file.
It is a known issue of hyper-v that the device will appear after the network script is run during the boot.
Can you please try that?
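
[Editor's note] As background on the suggestion above: DEVTIMEOUT is meant to make ifup wait a number of seconds for a late-appearing device instead of failing immediately. The following is only a sketch of that kind of wait loop, not the actual initscripts code; the function name wait_for_device and the idea of passing the sysfs directory as a parameter are assumptions made for illustration.

```shell
# Hypothetical sketch of a "wait for the device to appear" loop.
# Usage: wait_for_device <sysfs-net-dir> <device> <timeout-seconds>
# Returns 0 as soon as <sysfs-net-dir>/<device> exists, 1 after the timeout.
wait_for_device() (
    netdir=$1
    dev=$2
    timeout=$3
    waited=0
    while [ ! -e "$netdir/$dev" ]; do
        # Give up once the configured number of seconds has passed.
        [ "$waited" -ge "$timeout" ] && exit 1
        sleep 1
        waited=$((waited + 1))
    done
    exit 0
)
```

Example call: wait_for_device /sys/class/net eth0 5
If the device shows up later than the timeout, a loop like this still fails, which is why the value may need to be raised on slow hypervisors.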

"Device has different MAC address than expected" seems to be misleading here, I will look at it and try to fix it.
Comment 3 edw.ekei 2015-12-09 04:40:45 EST
I have commented out HWADDR and added DEVTIMEOUT=5. Still, network interfaces are not initialized during boot. I have tried increasing DEVTIMEOUT, up to 100.

This is the error after rebooting (no HWADDR):

sudo journalctl -xe -u network

network[453]: Bringing up interface eth0: ERROR: [/etc/sysconfig/network-scripts/ifup-eth] Device  does not seem to be present, delaying initialization.
network[453]: [FAILED]

network[453]: Bringing up interface eth1: ERROR: [/etc/sysconfig/network-scripts/ifup-eth] Device  does not seem to be present, delaying initialization.
network[453]: [FAILED]


This is the first NIC configuration; the second is exactly the same, using a public IP. I have tried combinations of DEVICE, NAME, UUID, HWADDR. Most of these combinations initialize interfaces after logging in, none during boot.

/etc/sysconfig/network-scripts/ifcfg-eth0
TYPE=Ethernet
BOOTPROTO=none
IPADDR=172.16.1.57
PREFIX=24
#HWADDR=00:15:5d:31:c0:04
ONBOOT=yes
IPV6INIT=no

DEVTIMEOUT=100

DEVICE=eth0
#NAME=eth0
#UUID=84eb62ed-1727-4c2e-bc66-be65a1a56a71


Since you mention a Hyper-V issue, I'd like to note that there was no such problem in RHEL 7.1; booting an older kernel after updating to RHEL 7.2 produced mixed results, as I mentioned above.

Also, in case it helps: the VM is a generation 2 VM with UEFI on Hyper-V 2012 R2. In RHEL 7-7.1 the GRUB loader was counting seconds extremely fast. It is a known bug, and MS suggests setting the GRUB delay in thousands (10000 seconds instead of 10) in order to be able to select a kernel or other boot option. In RHEL 7.2 this has been corrected and GRUB counts seconds properly. Is there any chance that the two are connected? Could the correction to GRUB timing have delayed NIC initialization?
Comment 4 edw.ekei 2015-12-09 04:48:23 EST
(In reply to Lukáš Nykrýn from comment #2)
> Quick fix here is to delete the HWADDR and add DEVTIMEOUT=5 to your ifcfg
> file.
> It is a known issue of hyper-v that the device will appear after the network
> script is run during the boot.
> Can you please try that?
> 
> "Device has different MAC address than expected" seems to be misleading
> here, I will look at it and try to fix it.

I have commented out HWADDR and added DEVTIMEOUT=5. Still, network interfaces are not initialized during boot. I have tried increasing DEVTIMEOUT, up to 100.

Sorry for not pressing "Reply", I am not very used to forums. You can see my whole message as a new comment, thank you.
Comment 5 Lukáš Nykrýn 2015-12-09 06:47:44 EST
Can you add 

exec 30>/dev/kmsg
BASH_XTRACEFD=30
set -x

to the end of /etc/init.d/functions, reboot the machine and send me output of dmesg?
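
[Editor's note] What the three lines do: "exec 30>/dev/kmsg" opens file descriptor 30 for writing to the kernel log buffer, "BASH_XTRACEFD=30" tells bash to send its "set -x" trace to that descriptor, and "set -x" turns tracing on, so every command the network scripts run ends up in dmesg. The mechanism can be tried safely with a scratch file in place of /dev/kmsg. This is a sketch only; the variable name trace_log and the traced commands are made up for the demonstration.

```shell
# Demonstration of BASH_XTRACEFD with a temporary file instead of /dev/kmsg.
trace_log=$(mktemp)

# Run the traced commands in a separate bash so the current shell is unaffected.
bash -s "$trace_log" <<'EOF'
exec 30>"$1"          # open fd 30 on the file passed as first argument
BASH_XTRACEFD=30      # send xtrace output to fd 30 instead of stderr
set -x
dev=eth0
echo "bringing up $dev" >/dev/null
EOF

# Each traced command appears on its own line, prefixed with "+ ".
cat "$trace_log"
```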
Comment 6 edw.ekei 2015-12-09 07:58 EST
Created attachment 1103875 [details]
VM's dmesg after applying a change to /etc/init.d/functions, as requested
Comment 7 edw.ekei 2015-12-09 07:59:14 EST
(In reply to Lukáš Nykrýn from comment #5)
> Can you add 
> 
> exec 30>/dev/kmsg
> BASH_XTRACEFD=30
> set -x
> 
> to the end of /etc/init.d/functions, reboot the machine and send me output
> of dmesg?

Ok, I have uploaded the dmesg as a text file attachment.
Comment 8 edw.ekei 2015-12-09 08:01:09 EST
> Ok, I have uploaded the dmesg as a text file attachment.

I have removed the public IPs, wherever you see PublicIP_replaced_x.x.x in the file is a public IP or gateway etc setting.
Comment 9 Lukáš Nykrýn 2015-12-09 08:51:31 EST
So the fix should be easy, we just need to backport 
https://git.fedorahosted.org/cgit/initscripts.git/commit/?id=1f230a3d2e2733e30577c91645005801ab2c0f40
to rhel.
Comment 10 edw.ekei 2015-12-09 09:57:44 EST
(In reply to Lukáš Nykrýn from comment #9)
> So the fix should be easy, we just need to backport 
> https://git.fedorahosted.org/cgit/initscripts.git/commit/
> ?id=1f230a3d2e2733e30577c91645005801ab2c0f40
> to rhel.

Ok, should I try to copy the /etc/sysconfig/network-scripts/network-functions file from initscripts-1f230a3d2e2733e30577c91645005801ab2c0f40.zip to my server to test it?
Comment 11 Lukáš Nykrýn 2015-12-09 10:43 EST
Created attachment 1103964 [details]
network-functions

I have attached the patched /etc/sysconfig/network-scripts/network-functions, so you can try to replace just that one file.
Comment 12 Lukáš Nykrýn 2015-12-09 15:42:43 EST
And also please keep the DEVTIMEOUT in your ifcfg file.
Comment 13 edw.ekei 2015-12-10 03:27:14 EST
I replaced the /etc/sysconfig/network-scripts/network-functions file and set DEVTIMEOUT=15 (I tried other values: with less than 15 the NICs don't start, with more than 15 there is no difference from 15). Odd behaviour:
After login, the first execution of the 'systemctl status network' command shows only loopback. Some seconds later, re-issuing the command, I can see only eth0, and about a minute later I can see eth1 as well! Services cannot initialize properly because of this delay. Is it related to DEVTIMEOUT that the NICs don't start concurrently?

I'll try to update the Hyper-V host as well. Unfortunately it will need a reboot, and since it hosts other VMs too, that is not easy to arrange.
Comment 14 Lukáš Nykrýn 2015-12-10 03:31:11 EST
Let's do one more round of

exec 30>/dev/kmsg
BASH_XTRACEFD=30
set -x

Can you send me again the output of dmesg after you do all of those things?
Comment 15 edw.ekei 2015-12-10 04:59 EST
Created attachment 1104289 [details]
dmesg after copying patched /etc/sysconfig/network-scripts/network-functions
Comment 16 edw.ekei 2015-12-10 05:25:10 EST
I uploaded the dmesg you asked for. It's from a test server.
I decided to try it on my production server too; The NICs initialize. sshd, which binds to eth0 only, starts properly. named which should bind to both eth0 and eth1, binds only to localhost. I can send you dmesg from the production server also if you like.
Thank you.
Comment 17 Lukáš Nykrýn 2015-12-14 08:31:51 EST
Hm from that log, it looks like it should work. 

[   22.912864] + ip link set dev eth0 up

Although it is odd that it takes 13 seconds for the device to appear.

Maybe you can try increasing udev.children-max=
http://www.freedesktop.org/software/systemd/man/systemd-udevd.service.html
Comment 18 Lukáš Nykrýn 2015-12-14 08:43:02 EST
Or could you try an unsupported solution? Since 7.2, systemd-networkd should be available in the optional repository. Could you try it instead of network-scripts?

Basic configuration is really easy, you just need something like:

/etc/systemd/network/80-dhcp.network:

[Match]
Name=eth*

[Network]
DHCP=yes
Comment 19 edw.ekei 2015-12-30 08:00:09 EST
I sorted out some things and resolved the problem using a workaround.

First of all, in my less-than-minimal setup I have dhcp-common removed for static-IP-only VMs. As a dependency, dracut-network is also removed. I re-installed them, dracut -f, and the network interfaces could start without DEVTIMEOUT=15 (needed before).

After this change, my test machine's sshd and named could also start properly. In contrast, my production machine's sshd and named did not bind properly (only to localhost) although both NICs were up and running, probably due to other changes I have made there.
So as a workaround on my production machine, I created this directory and file:

For named:
/lib/systemd/system/named.service.d/Just_A_name.conf

For sshd:
/lib/systemd/system/sshd.service.d/Just_A_name.conf

Same content:
[Unit]
After=network-online.target
Requires=network-online.target

And problem solved, sshd and named start after network. It's a very peculiar scenario (UEFI, dracut-network removed), you can close the bug if you like.

Thank you.
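
[Editor's note] The drop-in described in this comment can be created with a small helper like the one sketched here. The function name write_network_online_dropin and the file name wait-for-network.conf are hypothetical. The usage example targets /etc/systemd/system, which is an assumption on the editor's part: it is the location intended for local administrator configuration, whereas the comment above used /lib/systemd/system. The file content is exactly what the reporter posted.

```shell
# Hypothetical helper: create a drop-in that orders a unit after network-online.target.
# Usage: write_network_online_dropin <systemd-unit-dir> <unit-name>
write_network_online_dropin() {
    dropin_dir="$1/$2.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/wait-for-network.conf" <<'EOF'
[Unit]
After=network-online.target
Requires=network-online.target
EOF
}
```

Example calls (as root):
write_network_online_dropin /etc/systemd/system named.service
write_network_online_dropin /etc/systemd/system sshd.service
followed by "systemctl daemon-reload" so that systemd picks up the new files.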
Comment 23 Robert Scheck 2016-08-31 12:46:53 EDT
We experienced the same issue today, found bug #1180837 (which is actually
exactly the same, just for a Fedora) and the patch mentioned in comment #9
solves the issue for us perfectly.

I now cross-filed case #01694963 on the Red Hat customer portal to speed up 
things, as we need the patch ASAP in conjunction with all RHEL 7.x VMs under 
Microsoft Hyper-V (when not running more fancy things like a NetworkManager
on a server *sigh*).
Comment 24 Lukáš Nykrýn 2016-09-01 04:09:53 EDT
I am not sure what else I can tell you about that. This is scheduled for 7.3 now.
https://git.fedorahosted.org/cgit/initscripts.git/commit/?h=rhel7-branch&id=da83c4e174991b2dedf6ce7f8c490f2d1fbc1d57

If you need this to be fixed in z-stream you need to ask through customer portal.
Comment 25 Leos Pol 2016-09-26 06:15:48 EDT
Verified by bz1339648.
Comment 27 errata-xmlrpc 2016-11-04 02:43:04 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2456.html
