Bug 609208

Summary: Machine fails to sleep after wake-on-lan
Product: Red Hat Enterprise Linux 5
Reporter: Steve Cleveland <steve.cleveland>
Component: kernel
Assignee: Lenny Szubowicz <lszubowi>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.5
CC: agospoda
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2013-08-15 13:16:38 UTC
Attachments:
Description                                  Flags
Output of lsmod, lspci, /proc/acpi/wakeup    none
dmesg                                        none

Description Steve Cleveland 2010-06-29 16:37:15 UTC
Created attachment 427735 [details]
Output of lsmod, lspci, /proc/acpi/wakeup

Description of problem:

Hardware: HP Compaq 8000 Elite Small Form Factor Business PC, latest BIOS revision (1.04), Intel 82567LM-3 NIC

Wake-on-LAN from S3 sleep works once after a cold start, after which the computer is unable to sleep (it wakes immediately) or shut down (it reboots instead of powering off). Similar behavior can be observed when waking the machine from hibernation (S4).

This is not a problem when we load Windows 7 on this same machine, nor is it a problem with RHEL on our other hardware (Dell Optiplex 755 w/ Broadcom BCM5754). The problem does exist with other Linux distros on this hardware: we tested Fedora 13 and Ubuntu 10.04 with similar results.

Version-Release number of selected component (if applicable):

Operating system: RHEL 5.5, Linux 2.6.18-194.3.1.el5 #1 SMP Sun May 2 04:17:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Always

Steps to Reproduce:

1. Cold Start
2. Add IGBE to /proc/acpi/wakeup (`echo IGBE > /proc/acpi/wakeup`)
3. Put machine to sleep (`echo mem > /sys/power/state`)
4. Wake-On-LAN (successful)
5. Try to put machine to sleep
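
Steps 2 and 3 can be sketched as a small script. One detail worth keeping in mind is that writing a device name to /proc/acpi/wakeup toggles that device's state rather than setting it, so the sketch checks the current state first. The function names and the optional table-path argument are illustrative assumptions, not part of any existing tool, and the status column format differs between kernel versions (newer kernels prefix it with `*`).

```shell
# Print "enabled" or "disabled" for one device in the ACPI wakeup table.
# $1 = device name (e.g. IGBE), $2 = table file (defaults to /proc/acpi/wakeup;
# the override exists only so the parser can be tried on a saved copy).
wakeup_status() {
    awk -v dev="$1" '
        $1 == dev {
            for (i = 2; i <= NF; i++) {
                s = $i
                sub(/^\*/, "", s)   # newer kernels print "*enabled"
                if (s == "enabled" || s == "disabled") { print s; exit }
            }
        }
    ' "${2:-/proc/acpi/wakeup}"
}

# Enable wakeup for a device only if it is currently disabled,
# because a write to the table toggles the state.
ensure_wakeup_enabled() {
    dev=$1
    table=${2:-/proc/acpi/wakeup}
    case "$(wakeup_status "$dev" "$table")" in
        enabled)  return 0 ;;
        disabled) echo "$dev" > "$table" ;;
        *)        echo "device $dev not found in $table" >&2; return 1 ;;
    esac
}

# Usage (as root):
#   ensure_wakeup_enabled IGBE && echo mem > /sys/power/state
```

Checking before writing avoids accidentally disabling IGBE when the steps are repeated within one boot.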
  
Actual results:

Machine tries to sleep, but immediately wakes up

Expected results:

The machine should go to sleep
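
One way to tell an immediate wake from a real sleep without watching the machine is to time the suspend attempt. The sketch below relies on the write to /sys/power/state returning only after resume, and on the system clock being corrected on resume. The function names and the 10-second default threshold are assumptions chosen for illustration; the threshold needs to be longer than the suspend/resume overhead on the machine under test and shorter than the wait before the wake event is sent.

```shell
# Classify one suspend attempt from its wall-clock duration.
# $1 = elapsed seconds, $2 = threshold in seconds (hypothetical default: 10)
classify_suspend() {
    if [ "$1" -lt "${2:-10}" ]; then
        echo "immediate-wake"
    else
        echo "slept"
    fi
}

# Suspend to RAM and report how the attempt went. Must run as root.
# $1 = optional threshold in seconds, passed through to classify_suspend.
timed_suspend() {
    start=$(date +%s)
    echo mem > /sys/power/state    # returns once the machine has resumed
    end=$(date +%s)
    classify_suspend $((end - start)) "$1"
}

# Usage (as root):
#   timed_suspend 10
```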

Additional info:

Sleep and wakeup work fine both before and after enabling IGBE in the ACPI wakeup table. The system only fails to go to sleep or shut down after a WOL packet has been received while in a sleep state.

Example:

Cold Start
{ Sleep, Wake by PWR Button (successful) } x2
Add IGBE to /proc/acpi/wakeup
{ Sleep, Wake by PWR Button (successful) } x2
Sleep
Wake-On-LAN (successful)
Sleep (fails)

Attempted remedies:
Updated the BIOS from 1.02 -> 1.04
Updated the e1000e module to the latest available RPM (kmod-e1000e-1.1.19_NAPI-1.el5.elrepo)
Built a newer version of the e1000e module from source (e1000e 1.2.8)
Changed the Interrupts in the BIOS to isolate the NIC from other devices
Removed the e1000e module before sleeping the 2nd time
Removed the tpm modules at startup
Booted single-user mode, removed nvidia/tpm/misc. modules, problem still persists.

Comment 2 Steve Cleveland 2010-07-06 15:43:59 UTC
Some other information we've gathered:

 * The same issue occurs on an HP dc7900 (same NIC, same Intel Q45 chipset).
 * The issue does not occur on HP xw4600 (Broadcom BCM5755 NIC, Intel X38 chipset)
 * The issue does not occur on a Dell Optiplex 755 (Intel 82566DM-2, Intel Q35)

We had assumed it was an issue with the e1000e driver, but now we think it may be an issue with the BIOS.

Are there any debug tools that might give us an idea of what's causing the issue?

I also see the case is listed as NEEDINFO, but I don't see any comments about what information is needed.

Comment 3 Matthew Garrett 2010-07-06 17:56:07 UTC
I'm sorry, the comment accidentally got flagged as private. Could you attach the output of dmesg after a failed attempt to suspend?

Comment 4 Steve Cleveland 2010-07-06 19:13:14 UTC
Created attachment 429863 [details]
dmesg

This is dmesg after the following actions:

Start
Sleep -> Wake (Pwr button)
Sleep -> Wake (WOL)
Sleep (fails to sleep)
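
When reviewing a capture like this one, it can help to cut the log down to the power-management messages first. The filter below is only a sketch: the function name is made up for illustration, and the keyword list is a guess at useful terms, not an authoritative set of kernel message patterns.

```shell
# Keep only lines that look power-management related from a dmesg
# capture supplied on stdin. The keyword list is an assumption.
pm_lines() {
    grep -i -E 'suspend|resume|wak(e|ing)|PM:'
}

# Usage:
#   dmesg | pm_lines > dmesg-pm.txt
```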

Comment 5 Matthew Garrett 2010-07-06 19:39:50 UTC
Ok, so as far as the kernel is concerned the system has gone to sleep fully. Interesting. I'll see if I can duplicate this behaviour.

Comment 6 Matthew Garrett 2010-07-07 15:52:51 UTC
Hm. I can't reproduce this with a current kernel and an 82567LM-2, so either it's very specific to the 82567LM-3 or it's a platform issue. To check this, could you test Fedora 13 with a slightly modified test sequence? Rather than writing to /proc/acpi/wakeup, write "enabled" to /sys/class/net/eth0/device/power/wakeup, then do the suspend/WOL cycle, and include the output of the ethtool eth0 command. We don't currently seem to have one of these machines internally, so I'll see if I can get my hands on one.
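
The modified sequence requested here could be sketched as follows. The function names and the optional sysfs-root argument are assumptions for illustration (the root override only exists so the helper can be tried against a scratch directory), and eth0 is taken from the comment above.

```shell
# Build the path of the per-device wakeup attribute for a network interface.
# $1 = interface name, $2 = sysfs root (defaults to /sys)
net_wakeup_file() {
    echo "${2:-/sys}/class/net/$1/device/power/wakeup"
}

# Write "enabled" to the attribute and print the value read back.
enable_net_wakeup() {
    f=$(net_wakeup_file "$1" "$2")
    [ -w "$f" ] || { echo "cannot write $f" >&2; return 1; }
    echo enabled > "$f"
    cat "$f"
}

# Usage (as root):
#   enable_net_wakeup eth0
#   ethtool eth0 > ethtool-before.txt
#   echo mem > /sys/power/state
```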

Comment 7 Steve Cleveland 2010-07-07 22:53:00 UTC
We just tested Fedora 13. It appears that /sys/class/net/eth0/device/power/wakeup is enabled by default, so we didn't need to enable it, and wake-on-LAN works out of the box. Which is cool. But the same problem persists.

I'm not clear on how /sys/class/.../power relates to or interacts with /proc/acpi/wakeup. The latter shows everything but the power button disabled.

Output of ethtool:

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 2
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: on
        Supports Wake-on: pumbag
        Wake-on: g
        Current message level: 0x00000001 (1)
        Link detected: yes
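
The two Wake-on lines are the relevant part of this output. In ethtool's notation, `g` means wake on MagicPacket and `d` means disabled; the other letters in `pumbag` are PHY activity, unicast, multicast, broadcast, and ARP. A minimal sketch for checking them in a script follows; the function names are made up for illustration.

```shell
# Print the active Wake-on flags from `ethtool <iface>` output on stdin.
wol_active() {
    awk '$1 == "Wake-on:" { print $2 }'
}

# Print the supported Wake-on flags from the same output.
wol_supported() {
    awk '$1 == "Supports" && $2 == "Wake-on:" { print $3 }'
}

# Succeed if magic-packet wake (flag g) is both supported and active.
wol_magic_ready() {
    out=$(cat)
    case "$(printf '%s\n' "$out" | wol_supported)" in
        *g*) ;;
        *)   return 1 ;;
    esac
    case "$(printf '%s\n' "$out" | wol_active)" in
        *g*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage:
#   ethtool eth0 | wol_magic_ready && echo "magic-packet wake is armed"
```

If the check fails on a NIC that supports `g`, `ethtool -s eth0 wol g` is the usual way to arm it.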

Comment 8 Steve Cleveland 2011-03-21 21:31:08 UTC
For what it's worth, updating the Intel e1000e driver appears to fix the problem. I've tried two different versions of the kmod-e1000e package from elrepo.org.

# rpm -q kmod-e1000e
kmod-e1000e-1.2.20_NAPI-1.el5.elrepo

# modinfo e1000e
filename:       /lib/modules/2.6.18-238.1.1.el5/weak-updates/e1000e/e1000e.ko
version:        1.2.20-NAPI

# rpm -q kmod-e1000e
kmod-e1000e-1.3.10a-1.el5.elrepo

# modinfo e1000e | head -n 3
filename:       /lib/modules/2.6.18-238.1.1.el5/weak-updates/e1000e/e1000e.ko
version:        1.3.10a-NAPI

Comment 9 Lenny Szubowicz 2013-08-15 13:16:38 UTC
Given that comment 8 gives some indication that the problem may have been fixed by a driver update, that the severity is relatively low, and given the lifecycle stage that RHEL 5 is currently in, I'm closing this problem report.

                                   -Lenny.