Bug 597206
| Summary: | NetworkManager doesn't write correct config file for network interface during installation | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jan Stodola <jstodola> |
| Component: | NetworkManager | Assignee: | Dan Williams <dcbw> |
| Status: | CLOSED NOTABUG | QA Contact: | desktop-bugs <desktop-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.0 | CC: | bugproxy, caillon, maier, mgrf, rvykydal, rwilliam |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | s390x | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2010-06-16 20:19:48 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 595382, 595388, 597205, 597625 | | |
| Bug Blocks: | 582286 | | |
| Attachments: | | | |
Description
Jan Stodola
2010-05-28 11:33:52 UTC
Created attachment 417579 [details]
good config file
From an installation where the user didn't click on the "Configure Network" button; eth0 is running after reboot.
Created attachment 417580 [details]
bad config file
From an installation where the user clicked on the "Configure Network" button; eth0 doesn't come up after reboot.
This seems to consist of multiple issues.

1) Corrupting the OPTIONS line from OPTIONS='layer2=0 portno=0' into OPTIONS="layer2" is anaconda bug 597205. A patch has been posted at https://www.redhat.com/archives/anaconda-devel-list/2010-May/msg00717.html but is not yet in anaconda.git.

2) Adding the line NM_CONTROLLED="yes". I guess this is also something anaconda writes out. Radek recently changed ifcfg file handling in anaconda, so I'm putting him on needinfo to find out if this is the way it has to be. Maybe this option makes NM own the device, but this fails because s390 does not use HWADDR and NM does not yet (see also bug 591533) have means to identify s390 network devices by SUBCHANNELS.

These are the only two "non-whitespace" differences I could find between the two ifcfg files. I don't understand yet why exactly the device does not come up on boot. We need more debug info to figure out the exact cause.

Jan, was NetworkManager installed? Was NM activated (chkconfig)? Was the "network" service (of initscripts) activated (chkconfig)? What's the content of /etc/zipl.conf?

Can you log in as root on the console after boot and show the output of the following commands?
/sbin/ip a
lsqeth
cat /proc/cio_ignore
lscss -t 1732
lsmod | grep qeth

Can you attach /var/log/messages or whatever syslog file contains NM messages?

What's the output of the following?
DEVPATH=/sys/bus/ccw/devices/0.0.0900 SUBSYSTEM=ccw bash -x /lib/udev/ccw_init

(In reply to comment #4)
> This seems to consist of multiple issues.
>
> 1) Corrupting the OPTIONS line from OPTIONS='layer2=0 portno=0' into
> OPTIONS="layer2" is anaconda bug 597205. A patch has been posted
> https://www.redhat.com/archives/anaconda-devel-list/2010-May/msg00717.html but
> is not yet in anaconda.git.
>
> 2) Adding the line NM_CONTROLLED="yes". I guess this is also something anaconda
> writes out. Radek recently changed ifcfg file handling in anaconda, so putting
> him on needinfo to find out if this is the way it has to be.
> Maybe this option makes NM own the device but this fails because s390 does not
> use HWADDR and NM does not yet (see also bug 591533) have means to identify
> s390 network devices by SUBCHANNELS.
>
> These are the only two "non-whitespace" issues I could find between the two
> ifcfg files. I don't understand yet why exactly the device does not come up on
> boot. We need more debug info to figure out the exact cause.

NM_CONTROLLED="yes" is equivalent to a missing NM_CONTROLLED, or at least it used to be (I'd better check with the NM people), so only the OPTIONS difference seems to remain.

Here: https://bugzilla.redhat.com/show_bug.cgi?id=597205#c4 is an updates image fixing the OPTIONS parameter. Jan, can you check whether it fixes your case?

And yes, NetworkManager logs from installed systems would be very valuable to see.

(In reply to comment #5)
> And yes, NetworkManager logs from installed systems would be very valuable to
> see.

... it should be in /var/log/messages.

I'm not sure, since I didn't click "Configure Network", but maybe you see the same thing as I just did in bug 597625. Do you have eth1 instead of eth0 after reboot after installation?

NM is not installed at all
# chkconfig --list network
network 0:off 1:off 2:on 3:on 4:on 5:on 6:off
sh-4.1# cat /mnt/sysimage/etc/zipl.conf
[defaultboot]
default=linux
target=/boot/
[linux]
image=/boot/vmlinuz-2.6.32-30.el6.s390x
ramdisk=/boot/initramfs-2.6.32-30.el6.s390x.img
parameters="root=/dev/mapper/vg_rtt6-lv_root rd_DASD=0.0.3026,use_diag=0,readonly=0,erplog=0,failfast=0 rd_DASD=0.0.3226,use_diag=0,readonly=0,erplog=0,failfast=0 rd_DASD=0.0.3326,use_diag=0,readonly=0,erplog=0,failfast=0 rd_LVM_LV=vg_rtt6/lv_root rd_LVM_LV=vg_rtt6/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us cio_ignore=all,!0.0.0009 crashkernel=auto"
/sbin/ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
lsqeth doesn't print anything
cat /proc/cio_ignore
0.0.0000-0.0.0008
0.0.000a-0.0.08ff
0.0.0903-0.0.3025
0.0.3027-0.0.3225
0.0.3227-0.0.3325
0.0.3327-0.0.ffff
0.1.0000-0.1.ffff
0.2.0000-0.2.ffff
0.3.0000-0.3.ffff
lscss -t 1732
Device Subchan. DevType CU Type Use PIM PAM POM CHPIDs
----------------------------------------------------------------------
0.0.0900 0.0.0007 1732/01 1731/01 80 80 ff 34000000 00000000
0.0.0901 0.0.0008 1732/01 1731/01 80 80 ff 34000000 00000000
0.0.0902 0.0.0009 1732/01 1731/01 80 80 ff 34000000 00000000
lsmod | grep qeth
qeth 112033 0
qdio 61977 1 qeth
ccwgroup 9936 1 qeth
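For readers checking the diagnostics above, here is a minimal Python sketch (an illustration, not something posted in the bug) of how the /proc/cio_ignore listing can be read. It assumes each line is either a single bus ID or a low-high range, and the helper names `parse_bus_id`, `parse_ignore_list` and `is_ignored` are hypothetical. With the listing from this system, the three qeth subchannels shown by lscss fall in the gap between 0.0.08ff and 0.0.0903, so they are not on the ignore list even though lsqeth prints nothing.

```python
def parse_bus_id(bus_id):
    """Turn a CCW bus ID such as '0.0.0900' into a comparable tuple of ints."""
    cssid, ssid, devno = bus_id.strip().split(".")
    return (int(cssid, 16), int(ssid, 16), int(devno, 16))


def parse_ignore_list(text):
    """Parse cio_ignore-style lines into (low, high) tuples.

    Assumption: a line is either 'low-high' or a single bus ID.
    """
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        low, _, high = line.partition("-")
        ranges.append((parse_bus_id(low), parse_bus_id(high or low)))
    return ranges


def is_ignored(bus_id, ranges):
    """Return True if the bus ID falls inside any ignored range."""
    dev = parse_bus_id(bus_id)
    return any(low <= dev <= high for low, high in ranges)


# Listing copied from the cat /proc/cio_ignore output in this comment.
CIO_IGNORE = """\
0.0.0000-0.0.0008
0.0.000a-0.0.08ff
0.0.0903-0.0.3025
0.0.3027-0.0.3225
0.0.3227-0.0.3325
0.0.3327-0.0.ffff
0.1.0000-0.1.ffff
0.2.0000-0.2.ffff
0.3.0000-0.3.ffff
"""

ranges = parse_ignore_list(CIO_IGNORE)
qeth_devices = ("0.0.0900", "0.0.0901", "0.0.0902")
visible = [dev for dev in qeth_devices if not is_ignored(dev, ranges)]
print(visible)
```

All three devices end up in `visible`, which is consistent with the later finding that the devices are visible to the kernel but were never grouped and set online.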
Created attachment 418222 [details]
/var/log/messages
Created attachment 418224 [details]
DEVPATH=/sys/bus/ccw/devices/0.0.0900 SUBSYSTEM=ccw bash -x /lib/udev/ccw_init
(In reply to comment #5)
> Here: https://bugzilla.redhat.com/show_bug.cgi?id=597205#c4 is
> updates image fixing OPTIONS parameter. Jan, can you try if it fixes your case?

See bug 597205 comment 5.

(In reply to comment #7)
> I'm not sure, since I didn't click "Configure Network", but maybe you see the
> same thing as I just did in bug 597625.
> Do you have eth1 instead of eth0 after reboot after installation?

Here is an updates.img with the OPTIONS= patch, updated with the fix for bug 597625: http://rvykydal.fedorapeople.org/updates.597205.597625.img

All ifcfg files, including the original one written by writeEnabledNetInfo of loader, are missing SUBCHANNELS and NETTYPE among other (s390 specific) things. This explains why the network does not come up on boot: /lib/udev/ccw_init does not find any information to prepare the s390 network device by grouping it and setting it online in order to allocate network device structures.

Jan, is BOOTPROTO=dhcp intentional? Does the installation work if you don't run nm-c-e? I think it won't work either, since this most probably depends on fixes for bug 595388 and bug 595382, which are closely related and concern loader's readNetInfo and writeEnabledNetInfo and therefore the initial ifcfg file anaconda starts with. That's where SUBCHANNELS and NETTYPE currently get lost. If those are fixed, this bug here might turn out to be not a bug or a duplicate (hopefully).

BOOTPROTO=dhcp is not intentional. With the updates img from comment 12 and not running nm-c-e, ifcfg-eth0 contains:
DEVICE=eth0
HWADDR=00:11:25:BE:2C:57
ONBOOT=yes
BOOTPROTO=dhcp
OPTIONS="layer2=1"
TYPE=Ethernet
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
(and eth0 doesn't come up during boot)

Created attachment 418571 [details]
ifcfg-eth0, iSCSI disk
When running the installation and adding an iSCSI disk, ifcfg-eth0 is also incorrect even if I don't run nm-c-e. The difference between the iSCSI ifcfg-eth0 and the ifcfg-eth0 from comment 2 is:
NM_CONTROLLED="no"

(In reply to comment #15)
> Created an attachment (id=418571) [details]
> ifcfg-eth0, iSCSI disk
>
> When running installation and adding an iSCSI disk, ifcfg-eth0 is also
> incorrect even if I don't run nm-c-e. Difference between the iSCSI ifcfg-eth0
> and ifcfg-eth0 from comment 2 is:
> NM_CONTROLLED="no"

Writing out NM_CONTROLLED="no" is intended for installations having root on an iSCSI disk. Explanation:
<hansg> The purpose here is to make sure that NM does not touch the interface while booting the system
<hansg> (so after install)
<hansg> What happens is NM starts, brings down the interface, then tries to bring it up again, needs some code which is not yet paged in from disk (or some external util) to bring up the nic again, goes to disk to get code, disk is iscsi disk, so this needs network access, deadlock

and also (see 3rd paragraph below):
commit 91e3b49205eca4af3ac80312beebdc78d3e70116
Author: Hans de Goede <hdegoede>
Date: Thu Jul 9 15:47:12 2009 +0200
Write out NM_CONTROLLED=no for NICs used for FCoE

Write out NM_CONTROLLED=no for NICs used for FCoE, note that unlike with iSCSI we do not blindly write out NM_CONTROLLED=no for all NICs, but just for the NIC which is used for FCoE.

The iSCSI behaviour is undesirable, but the whole writing of NM_CONTROLLED=no for iSCSI will go away as soon as NetworkManager is fixed to not down devices when it takes over control, which should be fixed soon.

Note that I did do a bunch of work in August 2009 (for F-12 no less!!!) to ensure that NM did *NOT* step on existing ethernet connections that were active when NM was started, including iSCSI. This requires two things:
1) a valid ifcfg file describing the iSCSI connection in the usual place, with BOOTPROTO=ibft and an HWADDR line for that interface (so that we can match it with the 'iscsiadm' output and get the correct IP configuration for the device)
2) if the method is DHCP, that a valid dhclient leasefile exist in the normal dhclient leasefile location
These two things, which are reasonable to expect from a correctly configured network setup at boot, should be handled by the initrd and copied to the correct place at switchroot time. If these two conditions are met, it is a bug if NM does not simply take that connection over without disrupting the interface. This should work for any static or DHCP IPv4 ethernet connection. IPv6 is not yet supported for this functionality.
So it should be simple to test this functionality out even without iSCSI. From a normal machine with an already-configured ethernet connection do:
service NetworkManager stop
ifconfig eth0 (should still have an IP address)
service NetworkManager start
(NM should not take the device down at all and should just take the existing connection over)
commit 38f732a721955ffc66bb5c679c8302c499d20a62
Merge: 0993ea3 1d5a68d
Author: Dan Williams <dcbw>
Date: Mon Aug 10 15:52:28 2009 -0500
Merge branch 'iscsi'
So given that, what's the actual bug here? Is it even an NM bug at this point?
Due to bug 595388, bug 595382, bug 597205, and bug 597625 it is hardly possible to tell if there is really something wrong with nm-c-e and, if so, what would be wrong. I'd like to suggest retesting with a compose containing at least fixes for all those four bugs, which are a prerequisite for installing on s390 anyway.

(In reply to comment #18)
> Due to bug 595388, bug 595382, bug 597205, and bug 597625 it is hardly possible
> to tell if there is really something wrong with nm-c-e and if so what would be
> wrong. I'd like to suggest to retest with a compose containing at least fixes
> for all those four bugs, which are a prereq for install on s390 anyway.

Yes, I agree, all four anaconda bugs are in MODIFIED now and should go into anaconda-13.21.51-1. I'd retest with a fixed compose, and based on the results I'd reconsider the NM_CONTROLLED=no anaconda setting. It really doesn't look like an NM bug now. Dan, thanks for the explanation.

------- Comment From mgrf.com 2010-06-23 11:28 EDT-------
(In reply to comment #11)
> ......
> Yes, I agree, all the four anaconda bugs are in modified now and should go to
> anaconda-13.21.51-1. I'd retest with fixed compose, and based on results I'd
> reconsider the NM_CONTROLLED=no anaconda setting. It really doesn't look
> like NM bug now. Dan, thanks for explanation.

Hello Red Hat, any news on testing the fixes with anaconda-13.21.51-1?

Hello, I tried with anaconda-13.21.52-1, but I hit one more issue when booting the installed system:
Bug 607481 - get_config_by_subchannel should handle SUBCHANNELS enclosed in quotes
This is the only issue I found when testing; ifcfg-eth0 itself looks sane after clicking on the "Configure Network" button during the installation.

------- Comment From venatesa.com 2010-09-08 04:21 EDT-------
The bug has not been seen in RHEL6.0 snap13, so closing the bug.
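As a rough illustration of the two ifcfg problems discussed in this bug (the missing s390 keys, and quoted values as in bug 607481), the following Python sketch parses an ifcfg file while stripping shell-style quotes and reports which of SUBCHANNELS and NETTYPE are absent. It is not the code used by anaconda, NetworkManager or /lib/udev/ccw_init; `parse_ifcfg` and `missing_s390_keys` are hypothetical helper names, and the SUBCHANNELS and NETTYPE values are assumptions derived from the lscss and lsmod output earlier in the thread, not taken from any attachment.

```python
import shlex

# Keys the thread names as missing on s390. The comment says "among other"
# things, so this tuple is deliberately incomplete.
S390_KEYS = ("SUBCHANNELS", "NETTYPE")


def parse_ifcfg(text):
    """Parse KEY=value lines, stripping shell-style quotes from the values."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split removes single or double quotes the way a shell would;
        # it raises ValueError on unbalanced quotes.
        parts = shlex.split(raw)
        config[key.strip()] = parts[0] if parts else ""
    return config


def missing_s390_keys(config):
    """Return the s390-specific keys that are absent or empty."""
    return [key for key in S390_KEYS if not config.get(key)]


# Start of ifcfg-eth0 as reported in the thread (updates image, nm-c-e not run).
REPORTED = """\
DEVICE=eth0
HWADDR=00:11:25:BE:2C:57
ONBOOT=yes
BOOTPROTO=dhcp
OPTIONS="layer2=1"
TYPE=Ethernet
"""

# Hypothetical additions for illustration only (assumed values, quoted on
# purpose to mirror the situation described in bug 607481).
ILLUSTRATIVE = REPORTED + (
    'SUBCHANNELS="0.0.0900,0.0.0901,0.0.0902"\n'
    'NETTYPE="qeth"\n'
)

print(missing_s390_keys(parse_ifcfg(REPORTED)))
print(parse_ifcfg(ILLUSTRATIVE)["SUBCHANNELS"].split(","))
```

Stripping quotes before comparing values is the point of the sketch: a consumer that compares the raw text after `=` against a bus ID would fail on the quoted form even though the file is otherwise sane.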