Bug 808751

Summary: WLAN does not work if ethernet connection cannot be established at boot time
Product: Red Hat Enterprise Linux 6
Reporter: o.h.weiergraeber
Component: NetworkManager
Assignee: Dan Williams <dcbw>
Status: CLOSED WORKSFORME
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Priority: unspecified
Version: 6.2
CC: danw, jklimes, rkhan, tpelka
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2013-02-19 17:29:37 UTC
Attachments:
- Requested info after booting without ethernet connection (flags: none)
- Requested info after activation of wlan connection (flags: none)
- syslog (from bootup to establishment of wlan connection) (flags: none)

Description o.h.weiergraeber 2012-03-31 14:20:52 UTC
Description of problem:
My two EL6 laptops are routinely connected to my router via ethernet with fixed IP addresses and connect to the network automatically at boot time. Every now and then, one may want to use them wirelessly, so I also configured WLAN access, using DHCP for this purpose (different range of IP addresses on the same subnet). When switching from ethernet to WLAN connection, one of three scenarios can occur:

1. The machine is booted with ethernet connection, then I disconnect the cable and activate WLAN via NetworkManager > works fine

2. The machine is booted with ethernet disabled (i.e., "connect automatically" is unchecked), and I activate WLAN via NetworkManager > works fine

However,

3. The machine is booted without the ethernet cable (without disabling the connection in advance), and I activate WLAN via NetworkManager > does NOT work!
In this case NetworkManager reports that the connection has been established, and the connection information it outputs is perfectly correct (including DNS server, netmask, gateway etc.). No errors or warnings. But it is impossible to open any web page - firefox is stuck at the "Looking up..." stage. Indeed, I am unable to ping the router's internal IP address, which explains the failure in name resolution since the router acts as DNS proxy. External machines respond to ping, so the connection is, in principle, functional.

So it seems that the failure to establish the ethernet connection at boot time interferes with wireless networking by preventing access to local IP addresses!!!
Since scenario #3 above is the usual one when switching connections, this behavior is very annoying.


Version-Release number of selected component (if applicable):
NetworkManager-0.8.1-15.el6.x86_64


How reproducible:
Always


Steps to Reproduce:
1. Configure ethernet and wlan connections as described above
2. Boot without ethernet cable and activate wlan connection via NetworkManager
3. Try pinging local ethernet addresses, check name resolution


Actual results:
Ping to local ethernet addresses (including the router) fails, but remote addresses are reachable.
As a result name resolution fails if the router is DNS proxy.


Expected results:
Local ethernet addresses are reachable when wireless mode is enabled.


Additional info:

Comment 2 Dan Winship 2012-04-02 14:33:19 UTC
Sounds like the default route still points to the non-functional ethernet connection even after getting a wifi connection.

As a workaround, try "ifdown eth0". (Either before or after bringing up the wifi connection. Should work either way.)

Comment 3 o.h.weiergraeber 2012-04-04 09:57:29 UTC
Unfortunately, "ifdown eth0" does not change the situation.

As to the default routes: these are formally identical for both types of connection (and correspond to my router's local IP address 192.168.2.1).
The DHCP server used for the wifi connection provides IP addresses in a range not overlapping with my fixed ethernet addresses. I really fail to see any problems here.

A long time ago, I encountered a similar phenomenon on an IRIX machine (remote addresses were reachable but local ones were not). That turned out to be due to a wrong default netmask! In my case, again, both eth0 and wlan0 should be using the same netmask (255.255.255.0), and according to the "connection information" panel of NetworkManager it is set up in the correct way.
But still something seems to be messed up internally... :-(

Comment 4 Jirka Klimes 2012-04-04 11:06:33 UTC
To debug further we need some data. Please provide the following info for case 3)

$ ip a
$ route -n
$ cat /etc/resolv.conf
- /var/log/messages (best as an attachment)
- show output of ping tries (both the local and external)

Comment 5 o.h.weiergraeber 2012-04-04 13:30:36 UTC
Created attachment 575125 [details]
Requested info after booting without ethernet connection

Comment 6 o.h.weiergraeber 2012-04-04 13:31:36 UTC
Created attachment 575126 [details]
Requested info after activation of wlan connection

Comment 7 o.h.weiergraeber 2012-04-04 13:33:21 UTC
Created attachment 575127 [details]
syslog (from bootup to establishment of wlan connection)

Comment 8 o.h.weiergraeber 2012-04-04 13:41:46 UTC
Ok, please find the requested information above. The name of my (private, non-official) domain has been changed to "mydomain" in those files.
I noted that the DHCP server implemented in my router appears to send its name (Speedport_W_504V_Typ_A) as "domain" information, which thus appears in the resolv.conf file generated by NetworkManager. To rule out that this causes a problem, I have defined a wlan connection manually (not included in the attachments); this results in a clean resolv.conf, but has no effect on the problem reported in this thread.

Hope this helps to clarify things.

Comment 9 Jirka Klimes 2012-04-04 16:11:37 UTC
The problem is evident.
You have two routes (from comment #6):
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.2.0     0.0.0.0         255.255.255.0   U     2      0        0 wlan0

and the eth0 one has the lower metric, so packets for the local subnet are directed to it.

You said that eth0 is not activated (and there's no NM activation of eth0 in messages).
However, comment #5 reveals that eth0 has an IP address and has the interfering route
installed (even though the interface is DOWN):
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0

So the IP address and route must have been set up somehow.
Do you have the 'network' service enabled, or did you run ifup eth0 in a script or something?
chkconfig --list network

What's in /etc/sysconfig/network-scripts/ifcfg-eth0?

As a workaround you can use different subnets, so that there is no clash between
wired and wireless.
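
To illustrate the diagnosis in this comment, here is a minimal Python sketch that parses the two routes quoted above and picks the one a packet to the router (192.168.2.1) would take. It is not kernel or NetworkManager code; it assumes the simplified rule that the most specific matching route wins and that ties are broken by the lowest metric. The function names parse_routes and pick_route are hypothetical.

```python
import ipaddress

# The two conflicting routes from comment #6, in `route -n` column order:
# Destination, Gateway, Genmask, Flags, Metric, Ref, Use, Iface
ROUTES = """\
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.2.0     0.0.0.0         255.255.255.0   U     2      0        0 wlan0
"""

def parse_routes(text):
    """Parse `route -n` body lines into (network, metric, iface) tuples."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # not a route line
        dest, _gateway, genmask, _flags, metric, _ref, _use, iface = fields
        try:
            network = ipaddress.ip_network(f"{dest}/{genmask}")
            routes.append((network, int(metric), iface))
        except ValueError:
            continue  # column header or otherwise unparsable line
    return routes

def pick_route(routes, address):
    """Return the matching route with the longest prefix, then lowest metric."""
    addr = ipaddress.ip_address(address)
    candidates = [r for r in routes if addr in r[0]]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r[0].prefixlen, r[1]))

chosen = pick_route(parse_routes(ROUTES), "192.168.2.1")
print(chosen[2])  # interface selected for traffic to the router
```

With both /24 routes present, the sketch selects eth0 (metric 0) over wlan0 (metric 2), which matches the observed symptom: local traffic goes to the interface without a link.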

Comment 10 o.h.weiergraeber 2012-04-04 16:47:49 UTC
Thanks a lot for your analysis.

Hmm, I left all services at their default settings - strange that this has led to such a problem...

# chkconfig --list network
network        	0:off	1:off	2:on	3:on	4:on	5:on	6:off

So yes, the network service is running, and it must have been running all the time because it was on by default.
However it was never evident to me what this coexistence of "network" and "NetworkManager" is good for...

What I did during the initial installation was define my ethernet connection with static IP, and activate "connect automatically" because otherwise configuration of RHN during firstboot would not be possible.
Then, as an ordinary user, I defined a WLAN connection which can be activated on demand. This uses DHCP, but that does not make a difference since a manual configuration leads to the same problem.

cat /etc/sysconfig/network-scripts/ifcfg-eth0 gives me

DEVICE="eth0"
NM_CONTROLLED="yes"
ONBOOT=yes
TYPE=Ethernet
BOOTPROTO=none
IPADDR=192.168.2.12
PREFIX=24
GATEWAY=192.168.2.1
DNS1=192.168.2.1
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03
HWADDR=00:1F:E2:1C:60:B1
DOMAIN=mydomain.de

so eth0 is set up to be controlled by NetworkManager.

As indicated in my initial post, just disconnecting the ethernet cable *after bootup* and then activating wlan works without any problem.

So the core of the problem is that a missing ethernet connection at boot time is treated differently from a disconnect in the running system, right?

Proof:
In the presence of ethernet and absence of wlan I have

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.50.0    0.0.0.0         255.255.255.0   U     0      0        0 vmnet8
192.168.2.0     0.0.0.0         255.255.255.0   U     1      0        0 eth0
192.168.185.0   0.0.0.0         255.255.255.0   U     0      0        0 vmnet1
0.0.0.0         192.168.2.1     0.0.0.0         UG    0      0        0 eth0

After just disconnecting the cable this changes to

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.50.0    0.0.0.0         255.255.255.0   U     0      0        0 vmnet8
192.168.185.0   0.0.0.0         255.255.255.0   U     0      0        0 vmnet1

And after activating wlan:

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.50.0    0.0.0.0         255.255.255.0   U     0      0        0 vmnet8
192.168.2.0     0.0.0.0         255.255.255.0   U     2      0        0 wlan0
192.168.185.0   0.0.0.0         255.255.255.0   U     0      0        0 vmnet1
0.0.0.0         192.168.2.1     0.0.0.0         UG    0      0        0 wlan0

Now there is no eth0 route left, and the wlan connection works fine.


Therefore, wouldn't you agree that the persistence of the eth0 route after booting without ethernet connection is a bug?
Shouldn't this situation be handled the same way as a disconnect?

Unfortunately, using different subnets is not an option since it is not allowed by the router.
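
A minimal sketch of why the two connections collide, assuming only the IPADDR and PREFIX values from the ifcfg-eth0 file quoted in this comment and the wlan0 route from the routing table above. It derives the directly connected subnet that a static eth0 configuration implies and checks whether it overlaps the subnet routed via wlan0. The helper parse_ifcfg is hypothetical and handles only simple KEY=value lines, not full shell syntax.

```python
import ipaddress

# Relevant subset of /etc/sysconfig/network-scripts/ifcfg-eth0 from this comment
IFCFG_ETH0 = """\
DEVICE="eth0"
BOOTPROTO=none
IPADDR=192.168.2.12
PREFIX=24
GATEWAY=192.168.2.1
"""

def parse_ifcfg(text):
    """Parse simple KEY=value lines, stripping optional double quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings

cfg = parse_ifcfg(IFCFG_ETH0)

# Subnet implied by the static eth0 address and prefix length
eth0_net = ipaddress.ip_interface(f"{cfg['IPADDR']}/{cfg['PREFIX']}").network

# Subnet of the wlan0 route shown by `route -n` (192.168.2.0 / 255.255.255.0)
wlan0_net = ipaddress.ip_network("192.168.2.0/255.255.255.0")

print(eth0_net, wlan0_net, eth0_net.overlaps(wlan0_net))
```

Because both interfaces end up with a route to the same subnet, whichever route has the lower metric takes all local traffic; if that route belongs to an interface without a link, local addresses become unreachable.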

Comment 11 o.h.weiergraeber 2012-04-05 19:19:58 UTC
I can confirm that after turning the "network" service off, the wireless connection works as expected. So it was indeed this service that was causing the eth0 route to persist even if the link was down during bootup.

This raises two questions:

1. Is this the definitive solution (chkconfig network off)?

2. If so, is there ANY point at all in running "network" and "NetworkManager" at the same time (which apparently is *default* in RHEL6)?

Comment 12 o.h.weiergraeber 2012-04-27 06:52:46 UTC
There has been quite some time now without (visible) activity on this subject...

@Jirka Klimes
Can you confirm that the above assumptions are correct? If so, I think at the very least a Knowledge base entry or an addendum to the manuals should be created addressing this potential pitfall.
Still, in my opinion the straightforward solution would be to make users decide explicitly whether to use "network" or "NetworkManager" to handle their connections. Very simple actually...

Comment 13 RHEL Program Management 2012-05-03 05:39:13 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 14 Dan Williams 2013-02-19 17:29:37 UTC
(In reply to comment #11)
> I can confirm that after turning the "network" service off, the wireless
> connection works as expected. So it was indeed this service that was causing
> the eth0 route to persist even if the link was down during bootup.
> 
> This raises two questions:
> 
> 1. Is this the definitive solution (chkconfig network off)?
> 
> 2. If so, is there ANY point at all in running "network" and
> "NetworkManager" at the same time (which apparently is *default* in RHEL6)?

If you're able to use NetworkManager to fully control the networking on the machine, then the network service is not very useful.

However, many installs in enterprise-type deployments either don't use NetworkManager (and thus need the network service) or use NM for controlling only specific interfaces.  So it still defaults to on.

But in your case, disabling the network service is the correct fix.