Bug 247523

Summary: Xen kernel breaks Wake-on-LAN from shutdown state
Product: Red Hat Enterprise Linux 5
Component: xen
Version: 5.0
Hardware: All
OS: Linux
Status: CLOSED CANTFIX
Severity: low
Priority: low
Keywords: Reopened
Reporter: Chris Snook <csnook>
Assignee: Michal Novotny <minovotn>
QA Contact: Virtualization Bugs <virt-bugs>
CC: areis, berrange, clalance, jdenemar, jreznik, martin.wilck, mrezanin, riel, simon.matter, syeghiay, tao, xen-maint
Doc Type: Bug Fix
Last Closed: 2010-04-08 16:51:08 UTC
Bug Depends On: 490053
Bug Blocks: 499522, 514499
Attachments:
  Proposed patch (flags: none)
  New version of this patch (flags: none)

Description Chris Snook 2007-07-09 18:40:55 UTC
Description of problem:
The system cannot be woken via Wake-on-LAN after a shutdown (-h) from the Xen kernel.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-8.1.6.el5
xen-3.0.3-25.0.3.el5

How reproducible:
100%, on a Dell Precision 470n, e1000, x86_64

Steps to Reproduce:
1. enable WOL in the BIOS
2. echo 'ETHTOOL_OPTS="wol g"' >> /etc/sysconfig/network-scripts/ifcfg-eth0
3. boot into baremetal kernel
4. run 'ethtool eth0' to confirm WOL enabled for magic packets
5. shut down the system
6. wake the system using ether-wake from another system on the same subnet
7. boot into the xen kernel
8. run 'ethtool peth0' to confirm WOL enabled for magic packets
9. shut down the system
10. attempt to wake the system using ether-wake from another system on the same
subnet
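
The checks in steps 4 and 8 can be scripted. The following is only a minimal sketch: it assumes ethtool prints its usual "Wake-on:" line, and the helper name wol_mode is hypothetical, not part of the reported setup.

```shell
#!/bin/sh
# Hypothetical helper: print the active Wake-on-LAN mode found in
# `ethtool <iface>` output supplied on stdin.
# Assumes a line of the form "Wake-on: g". The "Supports Wake-on:" line is
# skipped because its first field is "Supports".
wol_mode() {
    awk '$1 == "Wake-on:" { print $2; exit }'
}

# Usage (not run here):
#   ethtool eth0  | wol_mode    # bare-metal kernel
#   ethtool peth0 | wol_mode    # Xen kernel
```

A result of "g" means the NIC is armed for magic packets, which is what the report observes under both kernels.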

Actual results:
1. Both times, the ethtool probe shows that WOL is enabled for magic packets.
2. The system correctly wakes up after shutting down from the baremetal kernel.
3. The system fails to wake up after shutting down from the xen kernel.

Expected results:
The system should wake up regardless of which kernel was last booted.

Additional info:
Switch shows the NIC is on in 10 Mbit mode, so the NIC is at least on.  If you
reboot instead of shutdown, and then soft power-cycle (from the power button)
during POST, WOL works just fine.  Under GA, this box hangs at acpi_power_off,
and from that state a soft power-cycle does *not* make WOL work.  It seems that
just the Xen acpi_power_off method (or something called around the same time) is
messing up WOL.

Also reproduced under FC6 and F7.

Comment 1 Red Hat Bugzilla 2007-07-25 00:53:58 UTC
change QA contact

Comment 2 Chris Snook 2008-11-13 00:52:36 UTC
Still broken in RHEL 5.3 beta, on the same hardware.  The acpi_power_off bug has been fixed, so it's possible the real bug is the bridging configuration, not Xen itself.

Comment 5 Rik van Riel 2009-01-12 17:39:24 UTC
By any chance, do the xen network scripts make the NIC think it has a different MAC address?  Maybe this is related to bug #235502 or bug #458806?

Comment 13 Bryan Mason 2009-03-13 00:35:42 UTC
Created attachment 335038 [details]
Proposed patch

With this patch and the patch from Bug 490053 applied to my system, I can wake the system with a magic packet after running "poweroff" from the command line.

Comment 15 Michal Novotny 2009-03-13 15:01:25 UTC
Well, I have tested it and the patch works. The problem is that the network was not stopped, which caused this Wake-on-LAN (WOL) issue. Stopping the network makes WOL work even after a shutdown from the Xen kernel.

Comment 16 Michal Novotny 2009-04-14 11:43:16 UTC
Created attachment 339466 [details]
New version of this patch

I have created a better version of this patch, based on this information, that checks the runlevel first and stops the network only when the system is going down (runlevel 0 or 6). It has been tested and, unlike the proposed patch (attachment #335041 [details]), it no longer causes network disruption, because the network is stopped only when Dom0 is shutting down. Wake-on-LAN using etherwake has been tested with this patch applied and works fine: it starts the bare-metal box.
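
The runlevel test described in this comment could look roughly like the shell sketch below. This is not the attached patch; the function name is hypothetical, and the sketch assumes the usual two-field output of the `runlevel` command (previous and current runlevel).

```shell
#!/bin/sh
# Hypothetical helper, not the attached patch: succeed only if the current
# runlevel is 0 (halt) or 6 (reboot).
# $1 is the output of `runlevel`, e.g. "3 0".
is_shutdown_runlevel() {
    current=${1##* }        # keep the text after the last space
    case "$current" in
        0|6) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage (not run here):
#   if is_shutdown_runlevel "$(runlevel)"; then
#       /etc/xen/scripts/network-bridge stop
#   fi
```

Passing the `runlevel` output as an argument keeps the check easy to try with sample values.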

Comment 17 Daniel Berrangé 2009-04-14 11:58:20 UTC
This is still not safe as it will break any machine using NFS root, or network block devices like NBD/GNBD/iSCSI.

Comment 18 Michal Novotny 2009-04-14 12:33:32 UTC
Well, I am not using NFS root or any of the devices listed there... Any idea what we could do about that?

Comment 19 Michal Novotny 2009-04-22 09:06:34 UTC
I've been investigating the network scripts, and there should be no problem: when network-bridge is started with Xen on a network root, it returns an error. It just logs that bridging on a network root is not supported and returns, so the bridge is never created. Stopping it then does nothing, because the script exits when no bridge is found. Is this still an issue then?

Comment 23 Jiri Denemark 2009-08-06 13:00:16 UTC
As Dan said on the list a long time ago:
    I don't see how this is safe if you have filesystems on NFS, or iSCSI.
    Later initscripts in the shutdown sequence may still need to access the
    filesystems and this just tore network out from under them.

That is, neither xen nor its init script is allowed to stop the network. Setting back to assigned.

Comment 24 Bryan Mason 2009-08-06 20:29:50 UTC
Wait a minute . . . It's been a while since I looked at this, but IIRC, the call to Vifctl.network('stop') is just supposed to break down the bridge and set everything back to the state before xend started, right?  If that's the case, then how would this affect NFS or iSCSI filesystems in use in Dom0?  Taking down the bridge shouldn't change any connectivity to external mounts, should it?

I could see a problem if there were DomUs that depended on NFS or iSCSI filesystems in Dom0, but all the DomUs should have been shut down when xendomains is stopped, which occurs before xend is stopped.

What am I missing?

Comment 25 Chris Lalancette 2009-08-07 08:13:10 UTC
(In reply to comment #24)
> Wait a minute . . . It's been a while since I looked at this, but IIRC, the
> call to Vifctl.network('stop') is just supposed to break down the bridge and
> set everything back to the state before xend started, right?  If that's the
> case, then how would this affect NFS or iSCSI filesystems in use in Dom0? 
> Taking down the bridge shouldn't change any connectivity to external mounts,
> should it?

Yes, it does.  The process of breaking down the bridge causes dom0 to lose connectivity while the bridge is being broken down.  If the commands you need to run happen to be on remote storage, then once you execute the first of the breakdown commands, you can no longer get to the rest of the commands you need to bring networking back up.

Chris Lalancette

Comment 26 Bryan Mason 2009-08-07 17:29:16 UTC
(In reply to comment #25)
> The process of breaking down the bridge causes dom0 to lose
> connectivity while the bridge is being broken down. ...

Aha.  Didn't know that.  Thanks for the explanation.

~ Bryan

Comment 27 Michal Novotny 2009-08-17 10:54:51 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > Wait a minute . . . It's been a while since I looked at this, but IIRC, the
> > call to Vifctl.network('stop') is just supposed to break down the bridge and
> > set everything back to the state before xend started, right?  If that's the
> > case, then how would this affect NFS or iSCSI filesystems in use in Dom0? 
> > Taking down the bridge shouldn't change any connectivity to external mounts,
> > should it?
> 
> Yes, it does.  The process of breaking down the bridge causes dom0 to lose
> connectivity while the bridge is being broken down.  If the commands you need
> to run happen to be on remote storage, then once you execute the first of the
> breakdown commands, you can no longer get to the rest of the commands you need
> to bring networking back up.
> 
> Chris Lalancette  

Yeah, that makes sense. The problem is how to solve this: when we don't run the script that stops the bridge, the network interface is shut down in a strange state, which makes Wake-on-LAN impossible. What I am thinking of is running the script right before the shutdown, ideally after unmounting the file systems or so... Anyway, the scripts have code that prevents them from starting when network root devices are in use, so there should be no problem with that, should there?

The /etc/xen/scripts/network-bridge script has:
...
    if is_network_root ; then
        [ -x /usr/bin/logger ] && /usr/bin/logger "network-bridge: bridging not supported on network root; not starting"
        return
    fi
...

for the op_start() operation (which can be run as /etc/xen/scripts/network-bridge start). The 'is_network_root' function is defined as:

is_network_root () {
    local rootfs=$(awk '{ if ($1 !~ /^[ \t]*#/ && $2 == "/") { print $3; }}' /etc/mtab)
    local rootopts=$(awk '{ if ($1 !~ /^[ \t]*#/ && $2 == "/") { print $4; }}' /etc/mtab)

    [[ "$rootfs" =~ "^nfs" ]] || [[ "$rootopts" =~ "_netdev" ]] && return 0 || return 1
}

This means that it should *not* be able to start when a network root is in use, so stopping network-bridge will do nothing because no bridge can be found.
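
To try this check on sample data, the same test can be sketched with the mtab path passed in as a parameter. The function name and the file parameter below are hypothetical. The sketch uses `case` patterns rather than `[[ =~ ]]`, because bash 3.2 and later match a quoted right-hand side of `=~` as a literal string, so the quoted "^nfs" pattern may not behave the same on every bash version.

```shell
#!/bin/sh
# Sketch of the same test as is_network_root, reading a caller-supplied
# mtab-style file ($1) instead of /etc/mtab. Hypothetical name and parameter.
root_is_on_network() {
    # Field 3 is the file system type and field 4 the mount options of the
    # entry whose mount point (field 2) is "/".
    rootfs=$(awk '$1 !~ /^[ \t]*#/ && $2 == "/" { print $3 }' "$1")
    rootopts=$(awk '$1 !~ /^[ \t]*#/ && $2 == "/" { print $4 }' "$1")
    case "$rootfs" in nfs*) return 0 ;; esac
    case "$rootopts" in *_netdev*) return 0 ;; esac
    return 1
}

# Usage (not run here):
#   root_is_on_network /etc/mtab && echo "network root: bridge not started"
```

Note that, like the original, this looks only at the root file system; other file systems mounted from NFS or iSCSI are not detected.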

Comment 32 Martin Wilck 2009-09-25 09:43:25 UTC
... and that means it would be safe to call "network-bridge stop" at shutdown time.

Comment 48 Michal Novotny 2009-11-27 08:42:22 UTC
Well, you can also have a look at http://kbase.redhat.com/faq/docs/DOC-21309 . It's about setting up the network bridge manually in order to use SNMPd along with Xen, but the reason for setting up the bridge manually is not important. What matters is that it describes how to set up the bridge manually, which is the best and supported solution.

Michal

Comment 72 Michal Novotny 2010-04-06 13:47:17 UTC
This is a limitation of bridging. We can't do anything about it in the virtualization stack.

Michal

Comment 73 Simon Matter 2010-04-07 17:48:44 UTC
My workaround is to put the script below into /sbin/halt.local
(The biggest problem is that brctl is not in /sbin but in /usr/sbin which seems like a bug to me.)

Simon

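#!/bin/sh
# workaround hack for WOL issues with XEN kernels
# see also https://bugzilla.redhat.com/show_bug.cgi?id=247523
# brctl lives in /usr/sbin, so /usr has to be available (read-only is enough)
mount -o ro /usr
# tear down the Xen bridge
/etc/xen/scripts/network-bridge stop
umount /usr
# reload the NIC driver (relies on an "eth0" module alias being configured)
modprobe -r eth0
modprobe eth0
# re-arm Wake-on-LAN for magic packets
ethtool -s eth0 wol g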

Comment 76 Miroslav Rezanina 2010-04-08 09:35:01 UTC
> If we can't run Dom0 from a network file system, then
> shutting down the bridge when xend shuts down won't cause any harm.
> 

We can run Dom0 from a network filesystem (iSCSI storage tested by me).