Description of problem:
The system cannot Wake-on-LAN after a shutdown (-h) from xen kernel.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-8.1.6.el5
xen-3.0.3-25.0.3.el5

How reproducible:
100%, on a Dell Precision 470n, e1000, x86_64

Steps to Reproduce:
1. enable WOL in the BIOS
2. echo 'ETHTOOL_OPTS="wol g"' >> /etc/sysconfig/network-scripts/ifcfg-eth0
3. boot into baremetal kernel
4. run 'ethtool eth0' to confirm WOL enabled for magic packets
5. shut down the system
6. wake the system using ether-wake from another system on the same subnet
7. boot into the xen kernel
8. run 'ethtool peth0' to confirm WOL enabled for magic packets
9. shut down the system
10. attempt to wake the system using ether-wake from another system on the same subnet

Actual results:
1. Both times, the ethtool probe shows that WOL is enabled for magic packets.
2. The system correctly wakes up after shutting down from the baremetal kernel.
3. The system fails to wake up after shutting down from the xen kernel.

Expected results:
The system should wake up regardless of which kernel was last booted.

Additional info:
Switch shows the NIC is on in 10 Mbit mode, so the NIC is at least on.

If you reboot instead of shutdown, and then soft power-cycle (from the power button) during POST, WOL works just fine.

Under GA, this box hangs at acpi_power_off, and from that state a soft power-cycle does *not* make WOL work.

It seems that just the Xen acpi_power_off method (or something called around the same time) is messing up WOL.

Also reproduced under FC6 and F7.
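The ethtool checks in steps 4 and 8 can be scripted. The following is only a sketch: it assumes ethtool prints a "Wake-on:" line with the active flags (where "g" means wake on magic packet), and the helper names wol_flags and wol_magic_enabled are made up for illustration.

```shell
#!/bin/sh
# Sketch only: wol_flags and wol_magic_enabled are hypothetical helpers,
# not part of ethtool or of the Xen scripts.

# Read `ethtool <iface>` output on stdin and print the active Wake-on flags.
wol_flags() {
    # The active setting is on the line whose first field is "Wake-on:";
    # the "Supports Wake-on:" line starts with "Supports" and is skipped.
    awk '$1 == "Wake-on:" { print $2 }'
}

# Succeed if the flags read from stdin include "g" (wake on magic packet).
wol_magic_enabled() {
    case "$(wol_flags)" in
        *g*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example, "ethtool peth0 | wol_magic_enabled && echo armed" would print "armed" only when magic-packet wake is currently set on peth0.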
change QA contact
Still broken in RHEL 5.3 beta, on the same hardware. The acpi_power_off bug has been fixed, so it's possible the real bug is the bridging configuration, not Xen itself.
By any chance, do the xen network scripts make the NIC think it has a different MAC address? Maybe this is related to bug #235502 or bug #458806?
Created attachment 335038
Proposed patch

With this patch and the patch from Bug 490053 applied to my system, I can wake the system with a magic packet after running "poweroff" from the command line.
Well, I have tested it and the patch works. The problem was that the network was not stopped, which caused this Wake-on-LAN (WOL) issue. Stopping the network makes WOL work even after a shutdown from the xen kernel.
Created attachment 339466
New version of this patch

I have created a better version of this patch. Thanks to this information, it now checks the runlevel first and stops the network only when shutting down or powering off (runlevel 0 or 6). It has been tested and, unlike the proposed patch (attachment #335041), it causes no network disruption, because the network is stopped only when Dom0 is shutting down. Wake-on-LAN using etherwake has been tested with this patch applied and it works fine: it starts the bare-metal box.
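The runlevel check described in this comment could look roughly like the sketch below. It is not the attached patch. It assumes the output format of the SysV "runlevel" command (previous runlevel, then current runlevel, e.g. "5 0" or "N 3"), and the function names are hypothetical. On SysV init, runlevel 0 is halt and runlevel 6 is reboot.

```shell
#!/bin/sh
# Sketch only, not the attached patch: current_runlevel and
# is_shutdown_runlevel are hypothetical names used for illustration.

# Print the current runlevel given `runlevel` output such as "5 0" or "N 3".
current_runlevel() {
    # Field 1 is the previous runlevel ("N" if none), field 2 the current one.
    echo "$1" | awk '{ print $2 }'
}

# Succeed only when the system is going down (runlevel 0) or rebooting (6).
is_shutdown_runlevel() {
    case "$(current_runlevel "$1")" in
        0|6) return 0 ;;
        *)   return 1 ;;
    esac
}
```

A stop hook could then guard the teardown with something like: is_shutdown_runlevel "$(runlevel)" && /etc/xen/scripts/network-bridge stop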
This is still not safe as it will break any machine using NFS root, or network block devices like NBD/GNBD/iSCSI.
Well, I am not using NFS root or any of the devices listed there... Any idea what we could do about that?
I've been investigating the network scripts, but there should be no problem: when network-bridge is started with Xen on a network root, it returns an error. It just logs that bridging on network root is not supported and returns from the script, so the bridge is never created. Stopping it then does nothing either, because the stop code exits the script when no bridge is found. Is this still an issue then?
As Dan said on the list a long time ago:

> I don't see how this is safe if you have filesystems on NFS, or iSCSI. Later
> initscripts in the shutdown sequence may still need to access the filesystems
> and this just tore network out from under them.

That is, neither xen nor its init script is allowed to stop the network. Setting back to assigned.
Wait a minute . . . It's been a while since I looked at this, but IIRC, the call to Vifctl.network('stop') is just supposed to break down the bridge and set everything back to the state before xend started, right? If that's the case, then how would this affect NFS or iSCSI filesystems in use in Dom0? Taking down the bridge shouldn't change any connectivity to external mounts, should it? I could see a problem if there were DomUs that depended on NFS or iSCSI filesystems in Dom0, but all the DomUs should have been shut down when xendomains is stopped, which occurs before xend is stopped. What am I missing?
(In reply to comment #24)
> Wait a minute . . . It's been a while since I looked at this, but IIRC, the
> call to Vifctl.network('stop') is just supposed to break down the bridge and
> set everything back to the state before xend started, right? If that's the
> case, then how would this affect NFS or iSCSI filesystems in use in Dom0?
> Taking down the bridge shouldn't change any connectivity to external mounts,
> should it?

Yes, it does. The process of breaking down the bridge causes dom0 to lose connectivity while the bridge is being broken down. If the commands you need to run happen to be on remote storage, then once you execute the first of the breakdown commands, you can no longer get to the rest of the commands you need to bring networking back up.

Chris Lalancette
(In reply to comment #25)
> The process of breaking down the bridge causes dom0 to lose
> connectivity while the bridge is being broken down. ...

Aha. Didn't know that. Thanks for the explanation.

~ Bryan
(In reply to comment #25)
> (In reply to comment #24)
> > Wait a minute . . . It's been a while since I looked at this, but IIRC, the
> > call to Vifctl.network('stop') is just supposed to break down the bridge and
> > set everything back to the state before xend started, right? If that's the
> > case, then how would this affect NFS or iSCSI filesystems in use in Dom0?
> > Taking down the bridge shouldn't change any connectivity to external mounts,
> > should it?
>
> Yes, it does. The process of breaking down the bridge causes dom0 to lose
> connectivity while the bridge is being broken down. If the commands you need
> to run happen to be on remote storage, then once you execute the first of the
> breakdown commands, you can no longer get to the rest of the commands you need
> to bring networking back up.
>
> Chris Lalancette

Yeah, that makes sense. The problem is how to solve this, because when we don't run the script that stops the bridge, the network interface is shut down in a strange state which makes Wake-On-LAN impossible. What I am thinking about is to run the script right before the shutdown, ideally after unmounting the file systems or so...

Anyway, the scripts contain code that prevents them from starting when a network root device is in use, so there should be no problem with that, should there?

The /etc/xen/scripts/network-bridge script has:

    ...
    if is_network_root ; then
        [ -x /usr/bin/logger ] && /usr/bin/logger "network-bridge: bridging not supported on network root; not starting"
        return
    fi
    ...

for the op_start() operation (which can be run as /etc/xen/scripts/network-bridge start).

The 'is_network_root' function is defined as:

    is_network_root () {
        local rootfs=$(awk '{ if ($1 !~ /^[ \t]*#/ && $2 == "/") { print $3; }}' /etc/mtab)
        local rootopts=$(awk '{ if ($1 !~ /^[ \t]*#/ && $2 == "/") { print $4; }}' /etc/mtab)

        [[ "$rootfs" =~ "^nfs" ]] || [[ "$rootopts" =~ "_netdev" ]] && return 0 || return 1
    }

This means that the bridge should *not* be able to start when a network root is in use, and so stopping network-bridge will do nothing because no bridge can be found.
... and that means it would be safe to call "network-bridge stop" at shutdown time.
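The is_network_root check quoted above can be tried against a sample mount table. The sketch below is a hypothetical variant, not the shipped function: it takes the mtab path as an argument so it can be pointed at a test file, and it uses case patterns instead of the [[ =~ ]] test so that it behaves the same in any POSIX shell. Note that, like the original, it only recognises a network root if the filesystem type starts with "nfs" or the mount options contain "_netdev".

```shell
#!/bin/sh
# Sketch only: root_is_network is a hypothetical variant of is_network_root
# that reads the mount table from the file given as $1 (default /etc/mtab).
root_is_network() {
    mtab=${1:-/etc/mtab}
    # Filesystem type (field 3) and options (field 4) of the "/" entry,
    # skipping comment lines.
    rootfs=$(awk '$1 !~ /^[ \t]*#/ && $2 == "/" { print $3 }' "$mtab")
    rootopts=$(awk '$1 !~ /^[ \t]*#/ && $2 == "/" { print $4 }' "$mtab")

    # Network root if the type starts with "nfs" ...
    case "$rootfs" in
        nfs*) return 0 ;;
    esac
    # ... or if the mount options include "_netdev".
    case "$rootopts" in
        *_netdev*) return 0 ;;
    esac
    return 1
}
```

Calling root_is_network with no argument checks the live /etc/mtab; passing a file name checks that file instead.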
Well, you can also have a look at http://kbase.redhat.com/faq/docs/DOC-21309 . It's about setting up the network bridge manually in order to use SNMPd along with Xen, but the reason for the manual setup is not important. What matters is that it describes how to set up the bridge manually, which is the best and the supported solution.

Michal
This is a limitation of bridging. We can't do anything about it in the virtualization stack.

Michal
My workaround is to put the script below into /sbin/halt.local

(The biggest problem is that brctl is not in /sbin but in /usr/sbin which seems like a bug to me.)

Simon

    #!/bin/sh
    # workaround hack for WOL issues with XEN kernels
    # see also https://bugzilla.redhat.com/show_bug.cgi?id=247523
    mount -o ro /usr
    /etc/xen/scripts/network-bridge stop
    umount /usr
    modprobe -r eth0
    modprobe eth0
    ethtool -s eth0 wol g
> If we can't run Dom0 from a network file system, then
> shutting down the bridge when xend shuts down won't cause any harm.

We can run Dom0 from a network filesystem (iSCSI storage tested by me).