Bug 597206 - NetworkManager doesn't write correct config file for network interface during installation
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: NetworkManager
Version: 6.0
Hardware: s390x
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Dan Williams
QA Contact: desktop-bugs@redhat.com
Depends On: 595382 595388 597205 597625
Blocks: 582286
Reported: 2010-05-28 07:33 EDT by Jan Stodola
Modified: 2010-09-08 04:31 EDT
CC List: 6 users

Doc Type: Bug Fix
Last Closed: 2010-06-16 16:19:48 EDT


Attachments:
good config file (306 bytes, text/plain), 2010-05-28 07:36 EDT, Jan Stodola
bad config file (341 bytes, text/plain), 2010-05-28 07:38 EDT, Jan Stodola
/var/log/messages (21.64 KB, text/plain), 2010-05-31 05:18 EDT, Jan Stodola
DEVPATH=/sys/bus/ccw/devices/0.0.0900 SUBSYSTEM=ccw bash -x /lib/udev/ccw_init (1.93 KB, text/plain), 2010-05-31 05:24 EDT, Jan Stodola
ifcfg-eth0, iSCSI disk (340 bytes, text/plain), 2010-06-01 06:03 EDT, Jan Stodola


External Trackers:
IBM Linux Technology Center 64481

Description Jan Stodola 2010-05-28 07:33:52 EDT
Description of problem:
When running a graphical installation on s390x and clicking the "Configure Network" button, NetworkManager writes a new configuration file for the network interface (ifcfg-eth0). With this config file, the system doesn't bring up the network interface after reboot:

...
ip6tables: Applying firewall rules: [  OK  ]
iptables: Applying firewall rules: [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth0:  Device eth0 does not seem to be present, delaying initialization.
[FAILED]
Starting auditd: [  OK  ]
Starting system logger: [  OK  ]
...

When the user doesn't click the "Configure Network" button during the installation, the network interface comes up without errors. I will attach both the good and the bad ifcfg-eth0 for comparison.


Version-Release number of selected component (if applicable):
RHEL6.0-20100527.2
anaconda-13.21.48-1.el6
NetworkManager-0.8.1-0.3.el6

How reproducible:
always

Steps to Reproduce:
1. start graphical installation on s390x
2. go through the installation and click on "Configure Network" button
3. do not make any changes and click on "Close" button
4. finish the installation and reboot
  
Actual results:
network interface doesn't come up during boot

Expected results:
network interface is up and running, machine is accessible via ssh
Comment 1 Jan Stodola 2010-05-28 07:36:45 EDT
Created attachment 417579
good config file

From an installation in which the user didn't click the "Configure Network" button: eth0 is running after reboot.
Comment 2 Jan Stodola 2010-05-28 07:38:02 EDT
Created attachment 417580
bad config file

From an installation in which the user clicked the "Configure Network" button: eth0 doesn't come up after reboot.
Comment 4 Steffen Maier 2010-05-29 08:15:39 EDT
This seems to consist of multiple issues.

1) Corrupting the OPTIONS line from OPTIONS='layer2=0 portno=0' into OPTIONS="layer2" is anaconda bug 597205. A patch has been posted at https://www.redhat.com/archives/anaconda-devel-list/2010-May/msg00717.html but is not yet in anaconda.git.

2) Adding the line NM_CONTROLLED="yes". I guess this is also something anaconda writes out. Radek recently changed ifcfg file handling in anaconda, so I'm putting him on needinfo to find out whether this is the way it has to be.
Maybe this option makes NM own the device, but that fails because s390 does not use HWADDR and NM does not yet have a means to identify s390 network devices by SUBCHANNELS (see also bug 591533).

These are the only two "non-whitespace" issues I could find between the two ifcfg files. I don't understand yet why exactly the device does not come up on boot. We need more debug info to figure out the exact cause.
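Comparing the two attachments by eye mixes real differences with quoting and whitespace noise. A minimal Python sketch of a semantic comparison, assuming ifcfg files are plain shell-style KEY=value assignments (no variable expansion, no multi-line values): each value is unquoted with shlex, and only keys whose values actually differ are reported. The GOOD and BAD excerpts are illustrative, built from the OPTIONS and NM_CONTROLLED lines quoted above, not the attached files themselves.

```python
import shlex


def parse_ifcfg(text):
    """Parse ifcfg-style KEY=value lines into a dict.

    Values are unquoted the way a POSIX shell would do it, so
    OPTIONS='layer2=0 portno=0' and OPTIONS="layer2=0 portno=0" compare equal.
    Assumes no variable expansion and no multi-line values.
    """
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        cfg[key.strip()] = " ".join(shlex.split(raw))
    return cfg


def diff_ifcfg(text_a, text_b):
    """Return {key: (value_a, value_b)} for every key whose value differs;
    None marks a key that is missing from one of the files."""
    a, b = parse_ifcfg(text_a), parse_ifcfg(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Illustrative excerpts only (not the attachments): DEVICE differs only in
# quoting, which diff_ifcfg ignores; OPTIONS and NM_CONTROLLED really differ.
GOOD = "DEVICE=eth0\nONBOOT=yes\nOPTIONS='layer2=0 portno=0'\n"
BAD = 'DEVICE="eth0"\nONBOOT=yes\nOPTIONS="layer2"\nNM_CONTROLLED="yes"\n'

for key, (good, bad) in diff_ifcfg(GOOD, BAD).items():
    print(f"{key}: {good!r} -> {bad!r}")
```

Unquoting before comparing is the design choice here: it makes quote-style changes invisible, so only the two differences named above remain.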

Jan, was NetworkManager installed?
Was NM activated (chkconfig)?
Was the "network" service (of initscripts) activated (chkconfig)?
What's the content of /etc/zipl.conf?
Can you login as root on the console after boot and show the output of the following commands?: /sbin/ip a, lsqeth, cat /proc/cio_ignore, lscss -t 1732, lsmod | grep qeth.
Can you attach /var/log/messages or whatever syslog file contains NM messages?
What's the output of:
DEVPATH=/sys/bus/ccw/devices/0.0.0900 SUBSYSTEM=ccw bash -x /lib/udev/ccw_init
Comment 5 Radek Vykydal 2010-05-29 11:43:48 EDT
(In reply to comment #4)
> This seems to consist of multiple issues.
> 
> 1) Corrupting the OPTIONS line from OPTIONS='layer2=0 portno=0' into
> OPTIONS="layer2" is anaconda bug 597205. A patch has been posted
> https://www.redhat.com/archives/anaconda-devel-list/2010-May/msg00717.html but
> is not yet in anaconda.git.
> 
> 2) Adding the line NM_CONTROLLED="yes". I guess this also something anaconda
> writes out. Radek recently changed ifcfg file handling in anaconda, so putting
> him on needinfo to find out if this is the way it has to be.
> Maybe this option makes NM own the device but this fails because s390 does not
> use HWADDR and NM does not yet (see also bug 591533) have means to identify
> s390 network devices by SUBCHANNELS.
> 
> These are the only two "non-whitespace" issues I could find between the two
> ifcfg files. I don't understand yet why exactly the device does not come up on
> boot. We need more debug info to figure out the exact cause.
> 

NM_CONTROLLED="yes" is equivalent to a missing NM_CONTROLLED, or at least it used to be (I'd better check with the NM people), so only the OPTIONS difference seems to remain. Here: https://bugzilla.redhat.com/show_bug.cgi?id=597205#c4 is an updates image fixing the OPTIONS parameter. Jan, can you check whether it fixes your case?

And yes, NetworkManager logs from installed systems would be very valuable to see.
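The equivalence claimed above can be stated as a tiny predicate. A Python sketch, assuming (as the comment says, and not verified against NetworkManager's source) that a missing NM_CONTROLLED key behaves like NM_CONTROLLED=yes; nm_controlled is a hypothetical helper name, and only an explicit "no" is treated as opting out.

```python
import shlex


def nm_controlled(ifcfg_text):
    """Return True if an ifcfg file leaves its device to NetworkManager.

    Assumption taken from the comment above, not checked against NM source:
    a missing NM_CONTROLLED key behaves like NM_CONTROLLED=yes. Only an
    explicit "no" opts out here; other spellings NM may accept are ignored.
    """
    value = "yes"  # assumed default when the key is absent
    for line in ifcfg_text.splitlines():
        key, sep, raw = line.strip().partition("=")
        if sep and key == "NM_CONTROLLED":
            value = " ".join(shlex.split(raw))  # strip shell quoting
    return value.lower() != "no"


print(nm_controlled('DEVICE=eth0\nONBOOT=yes\n'))           # True
print(nm_controlled('DEVICE=eth0\nNM_CONTROLLED="yes"\n'))  # True
print(nm_controlled('DEVICE=eth0\nNM_CONTROLLED="no"\n'))   # False
```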
Comment 6 Radek Vykydal 2010-05-29 11:46:19 EDT
(In reply to comment #5)

> And yes, NetworkManager logs from installed systems would be very valuable to
> see.    

... it should be in /var/log/messages.
Comment 7 Steffen Maier 2010-05-29 16:49:13 EDT
I'm not sure, since I didn't click "Configure Network", but maybe you are seeing the same thing I just saw in bug 597625.
Do you have eth1 instead of eth0 after the first reboot following installation?
Comment 8 Jan Stodola 2010-05-31 05:17:21 EDT
NM is not installed at all

# chkconfig --list network  
network             0:off   1:off   2:on    3:on    4:on    5:on    6:off

sh-4.1# cat /mnt/sysimage/etc/zipl.conf
[defaultboot]
default=linux
target=/boot/
[linux]
        image=/boot/vmlinuz-2.6.32-30.el6.s390x
        ramdisk=/boot/initramfs-2.6.32-30.el6.s390x.img
        parameters="root=/dev/mapper/vg_rtt6-lv_root rd_DASD=0.0.3026,use_diag=0,readonly=0,erplog=0,failfast=0 rd_DASD=0.0.3226,use_diag=0,readonly=0,erplog=0,failfast=0 rd_DASD=0.0.3326,use_diag=0,readonly=0,erplog=0,failfast=0 rd_LVM_LV=vg_rtt6/lv_root rd_LVM_LV=vg_rtt6/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us cio_ignore=all,!0.0.0009 crashkernel=auto"

/sbin/ip a  
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00  
    inet 127.0.0.1/8 scope host lo  
    inet6 ::1/128 scope host   
       valid_lft forever preferred_lft forever  

lsqeth doesn't print anything

cat /proc/cio_ignore
0.0.0000-0.0.0008  
0.0.000a-0.0.08ff  
0.0.0903-0.0.3025  
0.0.3027-0.0.3225  
0.0.3227-0.0.3325  
0.0.3327-0.0.ffff  
0.1.0000-0.1.ffff  
0.2.0000-0.2.ffff  
0.3.0000-0.3.ffff  

lscss -t 1732  
Device   Subchan.  DevType CU Type Use  PIM PAM POM  CHPIDs  
----------------------------------------------------------------------
0.0.0900 0.0.0007  1732/01 1731/01      80  80  ff   34000000 00000000
0.0.0901 0.0.0008  1732/01 1731/01      80  80  ff   34000000 00000000
0.0.0902 0.0.0009  1732/01 1731/01      80  80  ff   34000000 00000000

lsmod | grep qeth  
qeth                  112033  0   
qdio                   61977  1 qeth
ccwgroup                9936  1 qeth
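The /proc/cio_ignore listing above can be checked mechanically. A small Python sketch, assuming each entry is either a single bus ID or an inclusive start-end range in css.ssid.devno hex notation, as in the output shown; parse_cio_ignore and is_ignored are illustrative names. Applied to this listing, it shows that 0.0.0900-0.0.0902 (the OSA devices lscss reports) are not on the ignore list, which fits lscss being able to list them.

```python
def parse_busid(busid):
    """Split a bus ID such as '0.0.0900' into (cssid, ssid, devno) integers."""
    cssid, ssid, devno = busid.split(".")
    return int(cssid, 16), int(ssid, 16), int(devno, 16)


def parse_cio_ignore(text):
    """Turn /proc/cio_ignore output into inclusive (start, end) ranges.

    Assumes each whitespace-separated entry is a single bus ID or a
    'start-end' range.
    """
    ranges = []
    for entry in text.split():
        start, _, end = entry.partition("-")
        ranges.append((parse_busid(start), parse_busid(end or start)))
    return ranges


def is_ignored(busid, ranges):
    """True if busid falls inside any ignored range (tuples compare in order)."""
    dev = parse_busid(busid)
    return any(low <= dev <= high for low, high in ranges)


# The listing from this comment.
CIO_IGNORE = """\
0.0.0000-0.0.0008
0.0.000a-0.0.08ff
0.0.0903-0.0.3025
0.0.3027-0.0.3225
0.0.3227-0.0.3325
0.0.3327-0.0.ffff
0.1.0000-0.1.ffff
0.2.0000-0.2.ffff
0.3.0000-0.3.ffff
"""

ranges = parse_cio_ignore(CIO_IGNORE)
for busid in ("0.0.0900", "0.0.0901", "0.0.0902", "0.0.0903"):
    print(busid, "ignored" if is_ignored(busid, ranges) else "available")
```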
Comment 9 Jan Stodola 2010-05-31 05:18:06 EDT
Created attachment 418222
/var/log/messages
Comment 10 Jan Stodola 2010-05-31 05:24:44 EDT
Created attachment 418224
DEVPATH=/sys/bus/ccw/devices/0.0.0900 SUBSYSTEM=ccw bash -x /lib/udev/ccw_init
Comment 11 Jan Stodola 2010-05-31 05:28:04 EDT
(In reply to comment #5)
> Here: https://bugzilla.redhat.com/show_bug.cgi?id=597205#c4 is
> updates image fixing OPTIONS parameter. Jan, can you try if it fixes your case?

See bug 597205 comment 5.
Comment 12 Radek Vykydal 2010-05-31 08:03:13 EDT
(In reply to comment #7)
> I'm not sure, since I didn't click "Configure Network", but maybe you see the
> same thing as I just did in bug 597625.
> Do you have eth1 instead of eth0 after reboot after installation?    

Here is an updates.img with the OPTIONS= patch, updated with the fix for bug 597625:
http://rvykydal.fedorapeople.org/updates.597205.597625.img
Comment 13 Steffen Maier 2010-05-31 09:02:13 EDT
All ifcfg files, including the original one written by the loader's
writeEnabledNetInfo, are missing SUBCHANNELS and NETTYPE, among other
s390-specific settings. This explains why the network does not come up on boot:
/lib/udev/ccw_init finds no information with which to prepare the s390 network
device, i.e. to group it and set it online, which is what allocates the network
device structures.

Jan, is BOOTPROTO=dhcp intentional?
Does the installation work if you don't run nm-c-e?

I think it won't work either, since this most probably depends on the fixes for
bug 595388 and bug 595382, which are closely related and concern the loader's
readNetInfo and writeEnabledNetInfo, and therefore the initial ifcfg file
anaconda starts with. That's where SUBCHANNELS and NETTYPE currently get lost.
If those are fixed, this bug might turn out to be not a bug or a duplicate (hopefully).
Comment 14 Jan Stodola 2010-05-31 10:10:08 EDT
BOOTPROTO=dhcp is not intentional.

With the updates image from comment 12 and without running nm-c-e, ifcfg-eth0 contains:

DEVICE=eth0
HWADDR=00:11:25:BE:2C:57
ONBOOT=yes
BOOTPROTO=dhcp
OPTIONS="layer2=1"
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03

(and eth0 doesn't come up during boot)
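Comment 13's explanation can be made concrete against the file above. A hypothetical Python sketch, not the real /lib/udev/ccw_init: plan_ccwgroup_setup returns, without touching the system, the sysfs writes that would group an s390 network device and set it online. The sysfs paths (drivers/<NETTYPE>/group, devices/<first bus ID>/online) and the treatment of OPTIONS entries as per-device attributes are assumptions about qeth handling, not taken from this report. Fed the file above, it stops at once because SUBCHANNELS and NETTYPE are absent.

```python
import shlex


def parse_ifcfg(text):
    """KEY=value lines -> dict, with shell-style quotes removed (no expansion)."""
    cfg = {}
    for line in text.splitlines():
        key, sep, raw = line.strip().partition("=")
        if sep and key and not key.startswith("#"):
            cfg[key] = " ".join(shlex.split(raw))
    return cfg


def plan_ccwgroup_setup(cfg):
    """Return, in order, the (sysfs path, value) writes that would group an
    s390 network device and set it online. Hypothetical approximation of the
    steps described in comment 13, not the real /lib/udev/ccw_init."""
    missing = [key for key in ("SUBCHANNELS", "NETTYPE") if not cfg.get(key)]
    if missing:
        raise ValueError("ifcfg lacks " + ", ".join(missing))
    subchannels = cfg["SUBCHANNELS"].split(",")
    device = "/sys/bus/ccwgroup/devices/" + subchannels[0]
    # Grouping the CCW devices listed in SUBCHANNELS creates the group device.
    writes = [("/sys/bus/ccwgroup/drivers/%s/group" % cfg["NETTYPE"],
               ",".join(subchannels))]
    # OPTIONS such as 'layer2=0 portno=0' are treated as per-device
    # attributes, written while the device is still offline.
    for option in cfg.get("OPTIONS", "").split():
        attribute, _, value = option.partition("=")
        writes.append((device + "/" + attribute, value))
    # Setting the group device online is what allocates the network interface.
    writes.append((device + "/online", "1"))
    return writes


# The ifcfg-eth0 from this comment.
IFCFG_ETH0 = """\
DEVICE=eth0
HWADDR=00:11:25:BE:2C:57
ONBOOT=yes
BOOTPROTO=dhcp
OPTIONS="layer2=1"
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
"""

try:
    plan_ccwgroup_setup(parse_ifcfg(IFCFG_ETH0))
except ValueError as err:
    print(err)  # ifcfg lacks SUBCHANNELS, NETTYPE
```

As a hypothetical fix, adding SUBCHANNELS="0.0.0900,0.0.0901,0.0.0902" (the OSA devices lscss lists in comment 8) and NETTYPE=qeth yields three writes: the group write, layer2=1, and online=1. A corrupted OPTIONS="layer2" would instead produce a layer2 write with an empty value.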
Comment 15 Jan Stodola 2010-06-01 06:03:44 EDT
Created attachment 418571
ifcfg-eth0, iSCSI disk

When running the installation and adding an iSCSI disk, ifcfg-eth0 is also incorrect even if I don't run nm-c-e. The difference between the iSCSI ifcfg-eth0 and the ifcfg-eth0 from comment 2 is:
NM_CONTROLLED="no"
Comment 16 Radek Vykydal 2010-06-01 06:37:01 EDT
(In reply to comment #15)
> Created an attachment (id=418571) [details]
> ifcfg-eth0, iSCSI disk
> 
> When running installation and adding an iSCSI disk, ifcfg-eth0 is also
> incorrect even if I don't run nm-c-e. Difference between the iSCSI ifcfg-eth0
> and ifcfg-eth0 from comment 2 is:
> NM_CONTROLLED="no"    

Writing out NM_CONTROLLED="no" is intended for installations with root on an iSCSI disk. Explanation:

<hansg> The purpose here is to make sure that NM does not touch the interface  while booting the system
<hansg> (so after install)
<hansg> What happens is NM starts, brings down the interface, then tries to bring it up again, needs some code which is not yet paged in from disk (or some external util) to bring up the nic again, goes to disk to get code, disk is iscsi disk, so this needs network access, deadlock

and also (see 3rd paragraph below):

commit 91e3b49205eca4af3ac80312beebdc78d3e70116
Author: Hans de Goede <hdegoede@redhat.com>
Date:   Thu Jul 9 15:47:12 2009 +0200

    Write out NM_CONTROLLED=no for NICs used for FCoE

    Write out NM_CONTROLLED=no for NICs used for FCoE, note that unlike with
    iSCSI we do not blindly write out NM_CONTROLLED=no for all NICs, but just
    for the NIC which is used for FCoE.

    The iSCSI behaviour is undesirable, but the whole writing of NM_CONTROLLED=no
    for iSCSI will go away as soon as NetworkManager is fixed to not down
    devices when it takes over control, which should be fixed soon.
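The two policies in the commit message differ only in scope: root on iSCSI marks every NIC, while FCoE marks just the NIC it uses. A hypothetical Python helper that makes this explicit; nm_controlled_value is not an anaconda function, and returning "yes" for the remaining NICs merely matches the NM_CONTROLLED="yes" line discussed in comment 4.

```python
def nm_controlled_value(nic, root_on_iscsi, fcoe_nics):
    """Hypothetical helper: the NM_CONTROLLED value to write for one NIC.

    Mirrors the policy in the commit message above: with root on iSCSI every
    NIC gets "no"; otherwise only NICs used for FCoE get "no".
    """
    if root_on_iscsi or nic in fcoe_nics:
        return "no"
    return "yes"


for nic in ("eth0", "eth1"):
    print(nic, nm_controlled_value(nic, root_on_iscsi=False, fcoe_nics={"eth1"}))
```

With root_on_iscsi=True the loop would print "no" for both NICs, which is the blanket behaviour the commit message calls undesirable.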
Comment 17 Dan Williams 2010-06-08 01:59:45 EDT
Note that I did do a bunch of work in August 2009 (for F-12 no less!!!) to ensure that NM did *NOT* step on existing ethernet connections that were active when NM was started, including iSCSI.  This requires two things:

1) a valid ifcfg file describing the iSCSI connection in the usual place, with BOOTPROTO=ibft and an HWADDR line for that interface (so that we can match it with the 'iscsiadm' output and get the correct IP configuration for the device)

2) if the method is DHCP, that a valid dhclient leasefile exist in the normal dhclient leasefile location

These two things, which are reasonable to expect from a correctly configured network setup at boot, should be handled by the initrd and copied to the correct place at switchroot time.  If these two conditions are met, it is a bug if NM does not simply take that connection over without disrupting the interface.  This should work for any static or DHCP IPv4 ethernet connection.  IPv6 is not yet supported for this functionality.

So it should be simple to test this functionality out even without iSCSI. From a normal machine with an already-configured ethernet connection do:

service NetworkManager stop
ifconfig eth0  (should still have an IP address)
service NetworkManager start
(NM should not take the device down at all and should just take the existing connection over)
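The two preconditions listed above can be written as a simplified decision function. A Python sketch under stated assumptions: can_assume_connection is a hypothetical name, matching is reduced to comparing the ifcfg HWADDR with the active interface's MAC (the real matching against iscsiadm output is not modelled), the dhclient lease file location is deliberately left out, and only IPv4 is considered, as stated above. The MAC address is borrowed from comment 14 purely as sample data.

```python
def can_assume_connection(cfg, device_hwaddr, dhcp_lease_exists):
    """Simplified model of the two preconditions listed above.

    cfg: ifcfg values as a dict; device_hwaddr: MAC of the already-active
    interface; dhcp_lease_exists: whether a valid dhclient lease file was
    found (its location is not modelled). IPv4 only; not NetworkManager code.
    """
    hwaddr = cfg.get("HWADDR", "")
    if not hwaddr or hwaddr.lower() != device_hwaddr.lower():
        return False  # no ifcfg file that can be matched to this interface
    if cfg.get("BOOTPROTO", "none").lower() == "dhcp":
        return dhcp_lease_exists  # DHCP additionally needs the existing lease
    return True  # static or ibft configuration


ibft = {"HWADDR": "00:11:25:BE:2C:57", "BOOTPROTO": "ibft"}
dhcp = {"HWADDR": "00:11:25:BE:2C:57", "BOOTPROTO": "dhcp"}
print(can_assume_connection(ibft, "00:11:25:be:2c:57", dhcp_lease_exists=False))  # True
print(can_assume_connection(dhcp, "00:11:25:be:2c:57", dhcp_lease_exists=False))  # False
```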


commit 38f732a721955ffc66bb5c679c8302c499d20a62
Merge: 0993ea3 1d5a68d
Author: Dan Williams <dcbw@redhat.com>
Date:   Mon Aug 10 15:52:28 2009 -0500

    Merge branch 'iscsi'


So given that, what's the actual bug here?  Is it even an NM bug at this point?
Comment 18 Steffen Maier 2010-06-08 06:57:20 EDT
Due to bug 595388, bug 595382, bug 597205, and bug 597625, it is hardly possible to tell whether there is really something wrong with nm-c-e and, if so, what it would be. I'd suggest retesting with a compose containing at least the fixes for all four of those bugs, which are a prerequisite for installing on s390 anyway.
Comment 19 Radek Vykydal 2010-06-15 10:54:50 EDT
(In reply to comment #18)
> Due to bug 595388, bug 595382, bug 597205, and bug 597625 it is hardly possible
> to tell if there is really something wrong with nm-c-e and if so what would be
> wrong. I'd like to suggest to retest with a compose containing at least fixes
> for all those four bugs, which are a prereq for install on s390 anyway.    

Yes, I agree; all four anaconda bugs are now in MODIFIED and should go into anaconda-13.21.51-1. I'd retest with a fixed compose and, based on the results, reconsider the NM_CONTROLLED=no anaconda setting. It really doesn't look like an NM bug now. Dan, thanks for the explanation.
Comment 21 IBM Bug Proxy 2010-06-23 11:31:57 EDT
------- Comment From mgrf@de.ibm.com 2010-06-23 11:28 EDT-------
(In reply to comment #11)
> ......
> Yes, I agree, all the four anaconda bugs are in modified now and should go to
> anaconda-13.21.51-1. I'd retest with fixed compose, and based on results I'd
> reconsider the NM_CONTROLLED=no anaconda setting. It really doesn't look
> like NM bug now. Dan, thanks for explanation.

Hello Red Hat
any news on testing the fixes with anaconda-13.21.51-1?
Comment 22 Jan Stodola 2010-06-24 04:37:48 EDT
Hello, I tried with anaconda-13.21.52-1, but I hit one more issue when booting the installed system:

Bug 607481 - get_config_by_subchannel should handle SUBCHANNELS enclosed in quotes

This is the only issue I found when testing; ifcfg-eth0 itself looks sane after clicking the "Configure Network" button during the installation.
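Bug 607481 concerns finding the ifcfg file for an s390 device when the SUBCHANNELS value is enclosed in quotes. A Python sketch of quote-tolerant matching; subchannels_of and config_by_subchannel are hypothetical names, this is not the real get_config_by_subchannel (whose implementation is not shown in this report), and the sample config is illustrative.

```python
import shlex


def subchannels_of(ifcfg_text):
    """Return the SUBCHANNELS bus IDs from ifcfg text.

    Quoted and unquoted forms are treated alike, e.g.
    SUBCHANNELS="0.0.0900,0.0.0901,0.0.0902" and the same value without quotes.
    """
    for line in ifcfg_text.splitlines():
        key, sep, raw = line.strip().partition("=")
        if sep and key == "SUBCHANNELS":
            value = " ".join(shlex.split(raw))  # remove shell-style quotes
            return [part.strip() for part in value.split(",") if part.strip()]
    return []


def config_by_subchannel(configs, busid):
    """Return the name of the first config whose SUBCHANNELS contains busid.

    configs maps a file name to its text (a hypothetical input shape).
    """
    for name, text in configs.items():
        if busid in subchannels_of(text):
            return name
    return None


configs = {
    "ifcfg-eth0": 'DEVICE=eth0\nNETTYPE=qeth\nSUBCHANNELS="0.0.0900,0.0.0901,0.0.0902"\n',
}
print(config_by_subchannel(configs, "0.0.0900"))  # ifcfg-eth0
```

A naive comparison on the raw text would see "0.0.0900 with a leading quote character as the first element and fail to match; unquoting first avoids that.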
Comment 23 IBM Bug Proxy 2010-09-08 04:31:04 EDT
------- Comment From venatesa@in.ibm.com 2010-09-08 04:21 EDT-------
The bug is no longer seen in RHEL6.0 snap13, so closing it.
