Bug 1391944

Summary: Networking goes down for VMs after upgrade to 7.3
Product: Red Hat Enterprise Linux 7
Reporter: Rashid Khan <rkhan>
Component: systemd
Assignee: Lukáš Nykrýn <lnykryn>
Status: CLOSED ERRATA
QA Contact: Branislav Blaškovič <bblaskov>
Severity: urgent
Docs Contact: Mirek Jahoda <mjahoda>
Priority: urgent
Version: 7.3
CC: aconole, akpn, atragler, bblaskov, brubisch, cww, dcbw, ddumas, dmoessne, egolov, fsumsal, gkeegan, hartsjc, hhoyer, jen, jlyle, jmaxwell, jsitnick, klaas, ldu, leiwang, liko, lnykryn, luvilla, lwang, mjenner, mleitner, mlinden, mruzicka, msekleta, mtenheuv, myamazak, myllynen, ovasik, pabeni, pdwyer, ptalbert, rkhan, sauchter, snagar, sreber, sukulkar, systemd-maint-list, tbowling, thaller, tramer, tsorense, vanhoof, yacao
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1392506
Environment:
Last Closed: 2017-08-01 09:12:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1230210
Bug Blocks: 1298243, 1392506, 1400961
Attachments:
Description                                       Flags
pre upgrade script with fixed lower case issue    none
reboot failed                                     none

Description Rashid Khan 2016-11-04 12:52:54 UTC
Description of problem:
After upgrading to 7.3, networking goes down in VMs. See
https://bugzilla.redhat.com/show_bug.cgi?id=1230210


Version-Release number of selected component (if applicable):
7.3 GA

How reproducible:
100%

Steps to Reproduce:
1. Have long interface names inside the VM
2. Upgrade to 7.3 
3. Networking is down after the upgrade


Additional info:
See BZ https://bugzilla.redhat.com/show_bug.cgi?id=1230210

Comment 6 Terry Bowling 2016-11-04 14:18:53 UTC
Additional reference links from support cases on the root cause of firmware device naming:

https://communities.vmware.com/thread/478439?start=0&tstart=0
https://github.com/systemd/systemd/commit/6c1e69f9456d022f14dd00737126cfa4d9cca10c

Comment 8 Lukáš Nykrýn 2016-11-04 14:32:06 UTC
What if we suggest that customers run the following script before the update?
It goes through the ifcfg files and adds a HWADDR line to the broken ones.
That keeps the same name after the update, because we have a udev rule that renames the network card to the DEVICE name for the device with the specified HWADDR. This naming takes priority over the systemd one.

#!/bin/bash
. /etc/init.d/functions
. /etc/sysconfig/network-scripts/network-functions

pushd /etc/sysconfig/network-scripts > /dev/null

interfaces=$(ls ifcfg-* | \
        LC_ALL=C sed -e "$__sed_discard_ignored_files" \
               -e '/\(ifcfg-lo$\|:\|ifcfg-.*-range\)/d' \
               -e '{ s/^ifcfg-//g;s/[0-9]/ &/}' | \
        LC_ALL=C sort -k 1,1 -k 2n | \
        LC_ALL=C sed 's/ //')
        
    for i in $interfaces; do
        unset DEVICE HWADDR
        eval $(LANG=C grep -F "DEVICE=" ifcfg-$i)
        eval $(LANG=C grep -F "HWADDR=" ifcfg-$i)
        
        [ -n "$HWADDR" -o -z "$DEVICE" ] && continue
        [[ "$DEVICE" != eno* ]] && continue
        [ "${DEVICE#eno}" -lt "16383" ] && continue
        echo "HWADDR=$(get_hwaddr $DEVICE)"
    done

popd > /dev/null

Comment 10 Lukáš Nykrýn 2016-11-04 14:34:13 UTC
Damn, there is a missing redirection

#!/bin/bash
. /etc/init.d/functions
. /etc/sysconfig/network-scripts/network-functions

pushd /etc/sysconfig/network-scripts > /dev/null

interfaces=$(ls ifcfg-* | \
        LC_ALL=C sed -e "$__sed_discard_ignored_files" \
               -e '/\(ifcfg-lo$\|:\|ifcfg-.*-range\)/d' \
               -e '{ s/^ifcfg-//g;s/[0-9]/ &/}' | \
        LC_ALL=C sort -k 1,1 -k 2n | \
        LC_ALL=C sed 's/ //')
        
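for i in $interfaces; do
    # Read DEVICE and HWADDR from the ifcfg file, if present.
    unset DEVICE HWADDR
    eval $(LANG=C grep -F "DEVICE=" ifcfg-$i)
    eval $(LANG=C grep -F "HWADDR=" ifcfg-$i)

    # Skip files that already have HWADDR or that lack DEVICE.
    [ -n "$HWADDR" -o -z "$DEVICE" ] && continue
    # Only eno* names with an index of 16383 or more are handled.
    [[ "$DEVICE" != eno* ]] && continue
    [ "${DEVICE#eno}" -lt "16383" ] && continue
    # Append the current MAC address to the ifcfg file.
    echo "HWADDR=$(get_hwaddr $DEVICE)" >> ifcfg-$i
done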

popd > /dev/null

Comment 11 Thomas Haller 2016-11-04 14:56:53 UTC
The script from comment 10 is nice; I think the approach is the best in general.

But with almost the same effort, the script could write a udev rule instead.

Advantages:

  - It works for users who don't use initscripts (or don't happen to have an
    ifcfg file with DEVICE set) but still somehow depend on the old name.

  - The update process would not mangle the users' ifcfg files; instead it just
    creates one udev rule. The user can easily find out what happened, simply by
    the presence of that udev rule. It is also easier to get rid of the
    upgrade hack by deleting the file.
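
For illustration, such a rule could be generated roughly as follows. This is a sketch under assumptions, not the script that was later attached to this bug: make_net_rule is a hypothetical helper, the match keys follow the style of the RHEL 6 persistent-net rules, and the file name in the usage comment is only an example.

```bash
#!/bin/bash
# Sketch only: print a udev rule that pins NAME for the NIC with the given MAC.
# make_net_rule is a hypothetical helper, not part of initscripts or systemd.
make_net_rule() {
    # sysfs reports MAC addresses in lowercase and udev string matching is
    # case sensitive, so normalise the address before writing the rule.
    mnr_mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    mnr_name=$2
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{type}=="1", NAME="%s"\n' \
        "$mnr_mac" "$mnr_name"
}

# Example usage (not executed here; device and file name are assumptions):
#   dev=eno16780032
#   make_net_rule "$(cat /sys/class/net/$dev/address)" "$dev" \
#       >> /etc/udev/rules.d/70-persistent-net.rules
```

Keeping the rule generation in a function that only prints makes it easy to review the output before anything is written under /etc/udev/rules.d.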


By the way, pabeni pointed out that
        [[ "$DEVICE" != eno* ]] && continue
        [ "${DEVICE#eno}" -lt "16383" ] && continue
doesn't handle a trailing "p<something>".
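
One way the suffix case could be handled is sketched below. It is an illustration only, not the fix that was eventually attached: eno_index and needs_pinning are hypothetical helper names, and the 16383 threshold is copied from the comment 10 script.

```bash
#!/bin/bash
# Hypothetical helper: print the numeric index that follows "eno",
# ignoring any non-numeric suffix. Returns 1 for names that do not match.
eno_index() {
    case $1 in
        eno[0-9]*) ;;
        *) return 1 ;;
    esac
    ei_rest=${1#eno}                       # drop the "eno" prefix
    printf '%s\n' "${ei_rest%%[!0-9]*}"    # keep only the leading digits
}

# Hypothetical helper: succeed if the name is an eno* device whose index
# is at or above the threshold used by the comment 10 script.
needs_pinning() {
    np_idx=$(eno_index "$1") || return 1
    [ "$np_idx" -ge 16383 ]
}
```

With these helpers, the two checks in the loop could collapse into a single `needs_pinning "$DEVICE" || continue`, and a name with extra characters after the index would no longer make the numeric test fail.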

Comment 13 Lukáš Nykrýn 2016-11-04 15:10:51 UTC
(In reply to Thomas Haller from comment #11)
> the script from comment 10 is nice, I think the approach is the best in
> general.
> 
> But with almost the same effort, the script could write a udev-rule instead.

We could reuse some code from the RHEL 6 udev for that. That was how persistent device names were done in the past.

> btw. pabeni pointed out, that 
>         [[ "$DEVICE" != eno* ]] && continue
>         [ "${DEVICE#eno}" -lt "16383" ] && continue
> doesn't handle trailing "p<something>"

This was meant as a proof-of-concept, it definitely needs some polishing.

Comment 19 Lukáš Nykrýn 2016-11-07 09:14:17 UTC
So we put together this script. If you run it before the update, it will write udev rules for the problematic devices and keep their names after the update.
https://paste.fedoraproject.org/474943/78509867/

Although I really don't like the idea, we could put this into a trigger in the systemd package, so that the script is run for everyone who updates from an old version.

Comment 27 Lukáš Nykrýn 2016-11-07 12:19:16 UTC
Created attachment 1217980
pre upgrade script with fixed lower case issue

Comment 42 ldu 2016-11-09 09:12:43 UTC
Created attachment 1218856
reboot failed

Comment 60 Terry Bowling 2016-11-11 14:41:22 UTC
Public blog post explaining this issue, how we got here, and the resolution:

https://www.redhat.com/en/about/blog/red-hat-enterprise-linux-73-achieving-persistent-and-consistent-network-interface-naming-vmware-environments

Red Hat Knowledge Solution article explaining the issue, root cause, resolution, and workarounds:

https://access.redhat.com/solutions/2592561

Comment 66 errata-xmlrpc 2017-08-01 09:12:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2297