Bug 697157 - ath9k causes lockups and prevents suspend
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: i686 Linux
Priority: unspecified
Severity: medium
Assigned To: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
Duplicates: 707276
Reported: 2011-04-16 05:53 EDT by cam
Modified: 2011-10-26 12:10 EDT (History)
CC List: 12 users

Fixed In Version: kernel-2.6.40.4-5.fc15
Doc Type: Bug Fix
Last Closed: 2011-09-19 03:25:46 EDT


Attachments
dmesg output (123.67 KB, text/plain), 2011-04-24 02:47 EDT, cam
Xorg log (30.38 KB, text/x-log), 2011-04-24 02:50 EDT, cam
dmesg (70.78 KB, text/plain), 2011-05-18 03:26 EDT, cam
ar9285 patch for compat-wireless-2010-08-21 building on kernel 2.6.38.7-30.fc15.x86_64 (2.55 KB, patch), 2011-06-15 09:48 EDT, Flos Lonicerae
build compat-wireless-next-2011_06_15-0.el6.1.src on kernel-devel-2.6.32-71.el6.x86_64 (6.87 KB, text/plain), 2011-06-20 23:35 EDT, Flos Lonicerae
kernel spec with ath9k patch added (89.84 KB, text/x-rpm-spec), 2011-07-30 00:03 EDT, Flos Lonicerae
ath9k Stanislaw Gruszka made (3.35 KB, patch), 2011-07-30 00:04 EDT, Flos Lonicerae
grub.conf (686 bytes, application/octet-stream), 2011-07-30 00:11 EDT, Flos Lonicerae
ath9k_skip_pci_powersave.patch (4.52 KB, text/plain), 2011-07-30 09:10 EDT, Stanislaw Gruszka
SRPM of compat-wireless 2011-08-27 (3.86 MB, application/x-rpm), 2011-09-17 12:59 EDT, Flos Lonicerae
compat-wireless-2011_08_27-3.fc14 (3.86 MB, application/x-rpm), 2011-09-18 01:20 EDT, Flos Lonicerae


External Trackers
Tracker ID Priority Status Summary Last Updated
Linux Kernel 37082 None None None Never
Linux Kernel 37462 None None None Never

Description cam 2011-04-16 05:53:33 EDT
Description of problem:
I am seeing random lockups which I am guessing are nouveau related although I have little proof. Nothing appears in the logs.

Version-Release number of selected component (if applicable):
xorg-x11-drv-nouveau-0.0.16-24.20110324git8378443.fc15.i686

How reproducible:
With time, reasonably reproducible. I would say there's a chance of a lockup after 30 minutes or so, and if there were no lockup after 2-3 hours of solid use I would consider the problem fixed.

Steps to Reproduce:
1. Run HP Mini 311c-1101sa with Nvidia Ion and nouveau driver
2. Use for 30 minutes to 1 hour
3. Observe lockup: mouse cursor vanishes and caps lock key does not respond
  
Actual results:
hard hang and no recovery is possible

Expected results:
no hang

Additional info:
I left the console up once (ctrl-alt-F2 and log in as root). I did tail -f /var/log/messages and caught a bit of a message which I photographed. It did not appear in the logs in the filesystem on reboot:

The message was:

uvcvideo: Failed to query (GET_DEF) UVC control 2 on unit 3: -110 (exp. 2).

I have seen similar messages when starting up and when logging out and in again (associated with the startup of Xorg, maybe), which don't necessarily cause a hang.

UVC is apparently my webcam:

Apr 14 07:54:20 newt kernel: [   12.105531] uvcvideo: Found UVC 1.00 device HP Webcam-50 (090c:637b)

I'd be keen to try any additional diagnostics or report any extra info that might help diagnose the problem.
Comment 1 Matěj Cepl 2011-04-21 11:09:01 EDT
Thanks for the bug report.  We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.

Please add drm.debug=0x04 to the kernel command line, restart computer, and attach

* your X server config file (/etc/X11/xorg.conf, if available),
* X server log file (/var/log/Xorg.*.log)
* output of the dmesg command, and
* system log (/var/log/messages)

to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.
Comment 2 cam 2011-04-24 02:45:13 EDT
There is no specific xorg.conf file;
the system /var/log/messages is empty, which I find odd. The last entries are:


Apr 14 08:20:19 newt nm-dispatcher.action: Disconnected from the system bus, exiting.
Apr 14 08:20:19 newt rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
Apr 14 08:20:19 newt kernel: [ 1579.968633] type=1305 audit(1302765619.679:28838): audit_pid=0 old=1736 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
Apr 14 08:20:19 newt auditd[1736]: The audit daemon is exiting.
Apr 14 08:20:20 newt cpuspeed: Disabling ondemand cpu frequency scaling governor
Apr 14 08:20:20 newt kernel: [ 1581.152818] type=1701 audit(1302765620.863:28839): auid=500 uid=500 gid=500 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=2895 comm="gvfs-gdu-volume" sig=6
Apr 14 08:20:21 newt kernel: Kernel logging (proc) stopped.
Apr 14 08:20:21 newt rsyslogd: [origin software="rsyslogd" swVersion="4.6.3" x-pid="1093" x-info="http://www.rsyslog.com"] exiting on signal 15.

The fact that I'm no longer getting logs I will raise as a separate BZ.
Comment 3 cam 2011-04-24 02:47:47 EDT
Created attachment 494494 [details]
dmesg output
Comment 4 cam 2011-04-24 02:50:03 EDT
Created attachment 494495 [details]
Xorg log
Comment 5 cam 2011-04-24 02:52:17 EDT
I am able to reproduce hangs on this machine fairly easily - by attempting suspend or even logging out from the desktop. Anything that ends the X11 session seems to do the trick. This may be unrelated to the random lockup.
Comment 6 Ben Skeggs 2011-04-25 08:52:57 EDT
Do you see this happen if you boot with "nomodeset"?  You'll get basic video only, however, it'll help narrow it down a bit.
Comment 7 cam 2011-04-25 09:37:00 EDT
(In reply to comment #6)
> Do you see this happen if you boot with "nomodeset"?  You'll get basic video
> only, however, it'll help narrow it down a bit.

I'll run with nomodeset from now on and see how I get on. First observation is that suspend happens when requested, but resume fails with a long beep. It didn't hang like before though. Will report back after a few hours uptime.
Comment 8 cam 2011-04-27 05:33:19 EDT
(In reply to comment #6)
> Do you see this happen if you boot with "nomodeset"?  You'll get basic video
> only, however, it'll help narrow it down a bit.

After several hours further testing, several reboots and suspend attempts, I think the problem is not changed when running with nomodeset. The appearance is very different as the screen size is misdetected, and the shell runs in fallback mode. In spite of this the basic behaviour is the same, random hangs, lockup on suspend (every time) or session end (majority of times, if not shortly after).

I am concerned about a separate bug (#699198) which means I have no logging if I don't restart it manually. I will do that from now on in the hope of exposing some relevant log content.

Maybe this would be more appropriate assigned to the kernel if nouveau is not implicated any more.
Comment 9 Ben Skeggs 2011-04-27 10:24:23 EDT
Yeah, can't be nouveau if it happens with nomodeset too.  Reassigning.
Comment 10 cam 2011-05-01 07:47:57 EDT
Any tips on how to diagnose this? I have been watching the random lockups - as opposed to the ones on logout or suspend attempts, and conclude that it seems to happen when the machine goes idle rather than when it is doing something. Could it be scheduler related, or something that happens when the machine goes idle?

I looked online for advice to debug suspend issues and a lot of the older stuff has been taken down (pm tweaks?). So I have no leads on that either.
Comment 11 cam 2011-05-03 17:53:31 EDT
Now that my logging is working again I thought I would see if there were any insights in the last line logged before the system starts:

I noticed they were all messages from systemd. Is there any way this could be causing the hangs?

grep -B 1 'kmsg started' /var/log/messages


May  1 18:26:59 newt dbus: [system] Successfully activated service 'org.freedesktop.PackageKit'
May  1 18:54:31 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  1 19:26:20 newt dbus: [system] Service 'org.freedesktop.PolicyKit1' is already active
May  1 19:57:48 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  1 20:02:27 newt dbus: [system] Service 'org.freedesktop.PolicyKit1' is already active
May  1 20:40:49 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  1 21:59:50 newt dbus: [system] Successfully activated service 'org.freedesktop.PackageKit'
May  2 00:32:13 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  2 00:38:27 newt dbus: [system] Successfully activated service 'net.reactivated.Fprint'
May  2 00:41:15 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  2 00:41:59 newt dbus: [system] Activation via systemd failed for unit 'dbus-org.bluez.service': Unit dbus-org.bluez.service failed to load: No such file or directory. See system logs and 'systemctl status' for details.
May  2 09:51:52 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  2 13:45:31 newt dbus: [system] Service 'org.freedesktop.PolicyKit1' is already active
May  2 21:44:18 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  3 00:49:53 newt dbus: [system] Successfully activated service 'org.freedesktop.PackageKit'
May  3 05:54:40 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  3 05:55:24 newt dbus: [system] Activation via systemd failed for unit 'dbus-org.bluez.service': Unit dbus-org.bluez.service failed to load: No such file or directory. See system logs and 'systemctl status' for details.
May  3 08:44:28 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  3 08:45:06 newt dbus: [system] Activation via systemd failed for unit 'dbus-org.bluez.service': Unit dbus-org.bluez.service failed to load: No such file or directory. See system logs and 'systemctl status' for details.
May  3 21:17:23 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  3 22:22:56 newt dbus: [system] Successfully activated service 'org.freedesktop.PackageKit'
May  3 22:25:25 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
--
May  3 22:26:26 newt dbus: [system] Successfully activated service 'net.reactivated.Fprint'
May  3 22:29:51 newt kernel: imklog 5.7.9, log source = /proc/kmsg started.
Comment 12 cam 2011-05-15 06:57:36 EDT
I reinstalled the system from a F15 RC3 live CD to rule out the effects of any previous installs or upgrades.

The result is that the system still hangs randomly.

The system also hangs on suspend, resume, logout, or when the rf-kill switch is used. I suspect systemd, as it is involved in all of the affected operations.

At the moment this problem makes the machine unusable, so I will have to install a different OS: F14 or Ubuntu. The system was perfectly usable under F14.
Comment 13 Michal Schmidt 2011-05-16 07:38:03 EDT
(In reply to comment #0)
> uvcvideo: Failed to query (GET_DEF) UVC control 2 on unit 3: -110 (exp. 2).
> 
> I have seen similar messages when starting up and logging out and in again
> (associate with startup of Xorg maybe) which doesn't necessarily cause a hang.

Just to rule out this possibility, have you tried blacklisting the uvcvideo module?
Comment 14 cam 2011-05-16 07:50:19 EDT
(In reply to comment #13)
> Just to rule out this possibility, have you tried blacklisting the uvcvideo
> module?

Thank you I will try that and report back. 

There is a log message along the lines of 'address space collision: host bridge window conflicts with video ROM' that when Googled, suggests adding pci=nocrs; I tried that and the system had the same problem.

I note that the installer seems to have no trouble running for some time and reboots successfully... that doesn't use systemd, does it?

The system is currently running the latest Ubuntu, not sure if the kernel build is interesting or relevant. It seems stable but won't resume from suspend.

I have two disks and will probably set one up with F14 (fully working) and keep the other as a F15 testbed.
Comment 15 Michal Schmidt 2011-05-16 07:59:49 EDT
Please attach the complete dmesg. The one attached here is missing the beginning. Boot with "log_buf_len=1M" (or more) if necessary to prevent the loss of the early boot messages.
Comment 16 cam 2011-05-18 03:26:03 EDT
Created attachment 499533 [details]
dmesg

running 2.6.38.6-26.rc1.fc15.i686
Comment 17 cam 2011-05-18 16:04:10 EDT
I have an interesting update. I tried blacklisting uvcvideo and that wasn't helpful. But it gave me the idea to do a binary chop on the remaining modules and I found that when ath9k was disabled, the system behaved as expected (reboot and suspend / resume worked).

So I now suspect ath9k of causing the suspend / reboot problems and possibly the random lockups too. Any idea what to do apart from reassigning kernel->ath9k?

Thanks Michal for the idea.
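
For anyone repeating the binary chop described above, one round of it might look like the sketch below. `first_half` is a hypothetical helper, not part of any package; it only prints the first half of whatever list it is given. Which modules are safe to disable is an assumption you have to make yourself: blacklisting storage or filesystem modules can leave the system unbootable.

```shell
# Hypothetical helper: print the first half (rounded up) of the lines on stdin.
# Feed it a list of candidate modules, disable that half, and retest;
# if the hang persists, the culprit is in the other half.
first_half() {
    awk '{ line[NR] = $0 }
         END { n = int((NR + 1) / 2); for (i = 1; i <= n; i++) print line[i] }'
}
```

A possible use is `lsmod | awk 'NR > 1 { print $1 }' | first_half`, then reviewing the printed names by hand before blacklisting any of them.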
Comment 18 cam 2011-05-21 04:07:02 EDT
Another way to hang the system is to rmmod ath9k. Again no log messages are produced. May be related to BZ706574
Comment 19 shayne 2011-06-03 06:10:55 EDT
I'm getting random lockups as well. I have started a bug report which I think is a duplicate of this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=708572

I'm getting this in my message log as well:

May  3 05:55:24 newt dbus: [system] Activation via systemd failed for unit
'dbus-org.bluez.service': Unit dbus-org.bluez.service failed to load: No such
file or directory. See system logs and 'systemctl status' for details.
Comment 20 Michal Schmidt 2011-06-03 09:14:33 EDT
(In reply to comment #19)
> May  3 05:55:24 newt dbus: [system] Activation via systemd failed for unit
> 'dbus-org.bluez.service': Unit dbus-org.bluez.service failed to load: No such
> file or directory. See system logs and 'systemctl status' for details.

This is unlikely to be related to the lockups. It just means the bluetooth service is disabled. You can enable it with: systemctl enable bluetooth.service
Comment 21 cam 2011-06-03 17:35:58 EDT
(In reply to comment #19)
> Im getting random lockups as well.I have started a bug report which i think is
> a duplicate of this bug 
> https://bugzilla.redhat.com/show_bug.cgi?id=708572

Shayne, which wireless device are you using? I wonder if it is the same module. What hardware is the machine? If it is the same hardware as mine and using the ath9k module, you could try adding this line:

blacklist ath9k

to /etc/modprobe.d/blacklist.conf
With this I lose wireless capability but gain stability and 100% suspend/resume performance.
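
The blacklist step above can be made repeatable with a small sketch like this one. `blacklist_module` is a hypothetical helper name; the default path is the file mentioned above, and writing to it needs root.

```shell
# Hypothetical helper: append "blacklist <module>" to a modprobe config file
# unless an identical line is already present.
blacklist_module() {
    module="$1"
    conf="${2:-/etc/modprobe.d/blacklist.conf}"
    if grep -qxF "blacklist $module" "$conf" 2>/dev/null; then
        echo "already blacklisted: $module"
    else
        echo "blacklist $module" >> "$conf"
        echo "blacklisted: $module"
    fi
}
```

Running `blacklist_module ath9k` as root and rebooting should have the same effect as editing the file by hand; running it twice does not duplicate the line.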
Comment 22 shayne 2011-06-03 20:13:30 EDT
(In reply to comment #20)
> (In reply to comment #19)
> > May  3 05:55:24 newt dbus: [system] Activation via systemd failed for unit
> > 'dbus-org.bluez.service': Unit dbus-org.bluez.service failed to load: No such
> > file or directory. See system logs and 'systemctl status' for details.
> 
> This is unlikely to be related to the lockups. It just means the bluetooth
> service is disabled. You can enable it with: systemctl enable bluetooth.service

Every time I get a hard lockup, that is the last thing to appear in the messages log. Even when I change the alias to point to the right file and start the service, it still locks up.

cam,
I have a D-Link DWL-G510 installed in the computer, but I'm running internet through the ethernet. I'll unplug the card and see if that helps.
Comment 23 cam 2011-06-04 03:46:56 EDT
(In reply to comment #22)

> i have a dlink dwl-g510 installed in the computer but im running internet
> through the ethernet.Ill unplug the card and see if that helps


Shayne, can you check the times in the logs? I suspected some message in a similar way, but after a while realised that the message was just one that happened from time to time, and the hang happened later with no log message. You could run the system and look at the end of the log from time to time; if you see the message you mention when it isn't hung, then it's probably not related.
Comment 24 Stanislaw Gruszka 2011-06-07 08:35:05 EDT
Please install kernel-debug; it should print some logs when the kernel hangs. They might show only on a virtual console (Ctrl+Alt+F2), however. You will need to take a photo, or use some other method like netconsole or kdump, to capture the hang logs.
Comment 25 cam 2011-06-07 08:40:13 EDT
(In reply to comment #24)
> Please install kernel-debug, it should print some logs when kernel hangs. They
> could show only on virtual console (Alt+Ctr+F2) however. You will need take a
> foto, or use some other method like netconsole or kdump to capture hangs logs.

Thank you I will try that. 

On the ath9k-devel list there has been some relevant traffic; there is a patch and some further debugging that I need to try. I hope to find time soon to rebuild the kernel, patch and test.

https://lists.ath9k.org/pipermail/ath9k-devel/2011-June/thread.html#6250
Comment 26 cam 2011-06-11 03:32:16 EDT
Following discussions on the ath9k list, a patch has been posted to linux-wireless here:

http://marc.info/?l=linux-wireless&m=130768848626799&w=2

Thanks to Adrian Chadd, this workaround removes the hang, although it looks like the driver as it stands fails to get the full power saving behaviour of the chip, which is a shame.

I rebuilt the 2.6.38.7-30.fc15.i686 kernel and manually patched it with the changes above. If there is a better way to get the workaround into a Fedora system I'd be interested in suggestions (using http://people.redhat.com/sgruszka/compact_wireless.html maybe?)
Comment 27 Stanislaw Gruszka 2011-06-13 07:24:56 EDT
(In reply to comment #26)
> If there is a better way to get the workaround into a Fedora
> system I'd be interested in suggestions (using
> http://people.redhat.com/sgruszka/compact_wireless.html maybe?)
You will still need to patch, but building compat-wireless takes less time than building the whole kernel. I put SRPMs here:
http://people.redhat.com/sgruszka/compat-wireless-src/
You can also use compat-wireless tarballs from the main site, of course:
http://linuxwireless.org/en/users/Download#Directly_downloading_the_tarball
Comment 28 Stanislaw Gruszka 2011-06-14 04:54:31 EDT
*** Bug 707276 has been marked as a duplicate of this bug. ***
Comment 29 Flos Lonicerae 2011-06-15 09:45:05 EDT
Hi Stanislaw,

Mine is FC15, so I cannot use your srpm or rpm directory. I downloaded the patch you told us about and compat-wireless-2010-08-21.tar.gz, which is stable enough according to people on the Arch Linux bugzilla.
I patched the compat-wireless source, made some small changes, and then built the modules for my 2.6.38.7-30.fc15.x86_64 kernel.
I installed the modules and enabled my ar9285 card again; so far, it works smoothly!

thanks!

Flos
Comment 30 Flos Lonicerae 2011-06-15 09:46:44 EDT
I forgot to tell you: my system does not hang any longer! :D
Comment 31 Flos Lonicerae 2011-06-15 09:48:23 EDT
Created attachment 504879 [details]
ar9285 patch for compat-wireless-2010-08-21 building on kernel 2.6.38.7-30.fc15.x86_64
Comment 32 Stanislaw Gruszka 2011-06-16 10:01:40 EDT
FYI, I put compat-wireless SRPMs in compat-wireless{,-next}/SRPMS/F-{14,15}/ , compat-wireless.repo is also updated so yumdownloader --source compat-wireless{,-next} can be used.
Comment 33 Flos Lonicerae 2011-06-17 12:56:04 EDT
Hi Stanislaw,

I cannot build compat-wireless RPMs from your SRPMs on either FC15 or RHEL6. I've installed all the build-requires packages, but errors occur while building. I'll post the error message tomorrow.

Flos
Comment 34 Stanislaw Gruszka 2011-06-20 03:37:30 EDT
Hi Flos, did you install the build dependencies, i.e.:
sudo yum-builddep compat-wireless-next-2011_06_15-0.el6.1.src.rpm

RHEL6 packages are currently broken (I should probably remove them from the site). Also, installing compat-wireless from the tarball is totally fine.
Comment 35 Flos Lonicerae 2011-06-20 23:34:09 EDT
Hi Stanislaw,

When I run yum-builddep on FC15, I get the following messages:

[root@localhost ~]# yum-builddep compat-wireless-next-2011_06_14-0.fc15.2.src.rpm 
Loaded plugins: langpacks, refresh-packagekit
Getting requirements for compat-wireless-next-2011_06_14-0.fc15.2.src
 --> Already installed : kernel-devel-2.6.38.7-30.fc15.x86_64
Error: No Package found for kernel-devel-uname-r = 2.6.38.7-30.fc15.x86_64.debug

So I can't even run 'rpmbuild --rebuild <your package>'.

--------------------------------------------------
When I try to compile your srpm on RHEL6, I get compile error messages. Please see my attachment: build-steps.txt.

BTW, if I start my notebook for the first time after a complete poweroff, the system still hangs, even though I'm using my patched compat-wireless 'compat-wireless-2010-08-21'. If I reboot into my WinXP and then reboot into FC15, the system does not freeze. Very strange! So the problem is not completely resolved.

Flos
Comment 36 Flos Lonicerae 2011-06-20 23:35:30 EDT
Created attachment 505735 [details]
build compat-wireless-next-2011_06_15-0.el6.1.src on kernel-devel-2.6.32-71.el6.x86_64
Comment 37 Stanislaw Gruszka 2011-06-21 08:05:29 EDT
Hi Flos, 

Ehh, these packages need more work. Regarding RHEL, I was building on the rhel6.1 kernel, which already includes pci_is_pcie. I need to add proper conditionals to allow building and loading the modules on different RHEL6 kernels. Regarding F-15, I have no idea; I will check it out locally.
Comment 38 Stanislaw Gruszka 2011-07-13 10:14:42 EDT
RHEL6 packages should be fixed now.
Comment 39 cam 2011-07-14 18:38:13 EDT
Please note, I have found a good workaround for me. I'm not sure why it works, or if it's a sign that the code needs changing in a certain way.

I added pcie_aspm=force to the kernel command line and the problem went away. This is much easier than recompiling the kernel without certain patches (which I never managed to get working, but others said they had some success). The only other fix I'm aware of was a change to the ath9k driver but that seems unfair since it is changing code that works for other people.

It would be helpful if others with this problem could try pcie_aspm=force, which works for me on my HP Mini 311c 1101SA with 2.6.38.8-35.fc15.i686 and report back.
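
When reporting back, it helps to confirm that the option really reached the kernel. A minimal sketch, assuming the running kernel exposes its command line in /proc/cmdline (`has_boot_param` is a hypothetical helper name):

```shell
# Hypothetical helper: succeed if the given parameter appears as a whole
# word on the kernel command line (default source: /proc/cmdline).
has_boot_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -qxF "$param"
}
```

For example, `has_boot_param pcie_aspm=force && echo "ASPM forced"` prints the message only when the option is active for the current boot.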
Comment 40 Stanislaw Gruszka 2011-07-15 05:04:57 EDT
That confirms Adrian Chadd's theory that the problem occurs on machines with ASPM disabled. I have an idea how to possibly fix it; I think I will provide a test patch next week.
Comment 41 Stanislaw Gruszka 2011-07-18 05:24:27 EDT
cam, could you please test this patch (without the pcie_aspm=force option of course; you can use compat-wireless for the test):
http://marc.info/?l=linux-wireless&m=131074673413948&w=2
Comment 42 cam 2011-07-18 19:57:34 EDT
(In reply to comment #41)
> cam, could you please test this patch (without pcie_aspm=force option of
> course, can use compat-wirless for test):
> http://marc.info/?l=linux-wireless&m=131074673413948&w=2

Hello, I have rebuilt 2.6.38.8-35 with that patch, it has survived a suspend/resume without problems (and no extra command line needed). Good!
Comment 43 Flos Lonicerae 2011-07-29 23:59:56 EDT
Hi,

I've removed my Fedora 15 and installed Fedora 14 instead, but 2.6.35.13-92.fc14.x86_64 has the same problem.

I successfully rebuilt kernel-2.6.38.8-35.fc15.src.rpm with the patch (http://marc.info/?l=linux-wireless&m=131074673413948&w=2) on my Fedora 14, installed the newly built kernel, and rebooted.

While my AR9285 Wireless Network Adapter is communicating with my wireless router, the system still hangs with NO response.

I searched the replies, and cam's reply reminded me that I could pass the pcie_aspm=force option to the kernel. I tried it and all problems went away!! Both my newly compiled kernel-2.6.38.8-35 and the original 2.6.35.13-92.fc14 kernel work OK!

Please see my modified kernel.spec and grub.conf files in the attachments.

Thank you.

Flos
Comment 44 Flos Lonicerae 2011-07-30 00:03:14 EDT
Created attachment 515959 [details]
kernel spec with ath9k patch added
Comment 45 Flos Lonicerae 2011-07-30 00:04:28 EDT
Created attachment 515960 [details]
ath9k Stanislaw Gruszka made
Comment 46 Flos Lonicerae 2011-07-30 00:11:13 EDT
Created attachment 515961 [details]
grub.conf
Comment 47 Stanislaw Gruszka 2011-07-30 09:10:11 EDT
Created attachment 515974 [details]
ath9k_skip_pci_powersave.patch

Hi Flos, 

My previous patch has a bug: it incorrectly detects whether ASPM is enabled on the PCIe port. This one has that bug fixed. It is a 2.6.35 backport of the patch I posted yesterday: http://marc.info/?l=linux-wireless&m=131194788313415&w=2

I launched a build with the patch here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3240072

Please download and test that kernel when it finishes compiling. Remember to remove pcie_aspm=force.

If it does not fix the problem, provide the following info (run commands as root, otherwise output will be limited):

lspci -tv
lspci -vvvnn # kernel booted without pcie_aspm=force
lspci -vvvnn # kernel booted with pcie_aspm=force
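
The full lspci dumps are what is requested above, but for a quick comparison of the two boots the ASPM-related lines can be pulled out with a filter like this sketch. `aspm_summary` is a hypothetical helper; it assumes the usual lspci layout, where each device starts with a bus:device.function header and the link state is reported on LnkCap and LnkCtl lines.

```shell
# Hypothetical filter: keep device header lines and the PCIe link
# capability/control lines, which is where lspci reports ASPM state.
aspm_summary() {
    grep -E '^[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] |LnkCap:|LnkCtl:'
}
```

Something like `lspci -vvvnn | aspm_summary > aspm-without-force.txt`, repeated after booting with pcie_aspm=force, gives two short files that can be compared with diff. Please still attach the unfiltered output to the bug.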
Comment 48 cam 2011-07-31 06:01:07 EDT
(In reply to comment #47)
> Created attachment 515974 [details]
[...]
> I launched a build with the patch here:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3240072

I tested that kernel (the i686) on my F15 system and it didn't work well for all sorts of reasons (mainly nouveau wasn't happy), but it didn't have wireless related issues, in particular the rf-kill switch didn't cause lockups and the suspend/resume action didn't cause a lockup. I hope that helps.
Comment 49 Flos Lonicerae 2011-07-31 08:55:42 EDT
(In reply to comment #48)
> (In reply to comment #47)
> > Created attachment 515974 [details]
> [...]
> > I launched a build with the patch here:
> > http://koji.fedoraproject.org/koji/taskinfo?taskID=3240072
> 
> I tested that kernel (the i686) on my F15 system and it didn't work well for
> all sorts of reasons (mainly nouveau wasn't happy)...

Hi cam,

this is the kernel for F14,
--
"Information for task build (dist-f14, kernel-2.6.35.13-93.bz697157.fc14.src.rpm)"
--
so F15 may have various problems. Let me give it a try.

Flos
Comment 50 Flos Lonicerae 2011-07-31 21:19:57 EDT
Hi,

I've tried kernel-2.6.35.13-93.bz697157 on my Fedora 14 and it seems OK. During the test I shut down my notebook, waited 1 hour until it had cooled, then booted it again; the wireless adapter was recognized, communicated with the wireless router, and got an address from it. I could also browse web pages and watch online films. I did this 'shutdown/wait/boot' test 3 times without problems.

I rebooted and logged in again, and the wireless adapter still runs well, so I think this patch DOES solve the problem.

Flos
Comment 51 Flos Lonicerae 2011-09-17 12:56:32 EDT
Hi,

I've rebuilt a compat-wireless package for Fedora 14 from the latest compat-wireless-2.6 source. I modified Stanislaw Gruszka's spec file because I really can NOT rebuild his SRPM on my Fedora 14 (it always complains that kernel-devel-uname-r = xxx.i686.debug is not met).

PS: this is a source RPM package; you need to rebuild it to get the RPM for your kernel.

Flos
Comment 52 Flos Lonicerae 2011-09-17 12:59:37 EDT
Created attachment 523719 [details]
SRPM of compat-wireless 2011-08-27
Comment 53 Flos Lonicerae 2011-09-18 01:17:11 EDT
Comment on attachment 523719 [details]
SRPM of compat-wireless 2011-08-27

wrong date
Comment 54 Flos Lonicerae 2011-09-18 01:20:12 EDT
Created attachment 523737 [details]
compat-wireless-2011_08_27-3.fc14

rebuild
Comment 55 Stanislaw Gruszka 2011-09-19 03:19:27 EDT
Hi Flos,

You can use the compat-wireless-next pre-built binary packages from my home page. compat-wireless-next contains the fix for this bug.
Comment 56 Stanislaw Gruszka 2011-09-19 03:25:46 EDT
Current F-15 kernel contains fix for that bug:

commit c82ac94469ab54ca57b05fd85ce709530d44002f
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Fri Jul 29 15:59:08 2011 +0200

    ath9k: skip ->config_pci_powersave() if PCIe port has ASPM disabled
Comment 57 Flos Lonicerae 2011-10-16 13:27:16 EDT
Hi,

I'm very sorry to tell you that this bug *STILL* affects kernel-3.1.0-rc9! I tried to install Fedora 16 Beta on my notebook, and when the system detected my network, it froze! After I disabled my wireless card, it could be installed.

When I successfully logged in to my newly installed Fedora 16, I re-enabled the wireless card, logged in to GNOME, and clicked the Network Manager applet to select my SSID. After I entered the password, the whole system froze! The same as on Fedora 14... :(

Did anyone on the QA team ever test this wireless card driver before the Alpha/Beta release?

Flos
Comment 58 Flos Lonicerae 2011-10-16 13:28:43 EDT
Again, the /var/log/messages file did NOT show any information about what happened while the wireless card was connecting to the AP.
Comment 59 cam 2011-10-16 17:59:24 EDT
I am running kernel-3.1.0-0.rc8.git0.1.fc16.i686
I still use pcie_aspm=force to boot, and without it there are hangs.
Comment 60 Flos Lonicerae 2011-10-17 00:23:27 EDT
Hi cam,

I'll try your method on my Fedora 16 when I get home tonight. BTW, is there a tutorial on HOW TO show debug messages from kernel modules? Or how can I debug a kernel module in this case?
Comment 61 Flos Lonicerae 2011-10-17 00:25:41 EDT
@cam

How did you install Fedora 16 on your box? Can it be installed without adding 'pcie_aspm=force' in grub?
Comment 62 Stanislaw Gruszka 2011-10-17 07:37:51 EDT
(In reply to comment #57)
> when i successfully login to my newly installed Fedora 16, i re-enable the
> wireless card, and login to GNOME, click Network Manager applet to select my
> SSID, after inputing password, whole system freezed! the same as the Fedora
> 14... :(
I think this is a different issue. Could you capture a call trace, perhaps using kdump or netconsole, and open a separate bug for it? Using kernel-debug could also be a good idea for capturing the trace.
 
> did anyone of QA team ever test this wireless card driver before Alpha/Beta
> release?
A wireless card may work well on one laptop and completely suck on another; similarly, it may work well with some APs and not with others. We are unable to test every possible combination, or even a reasonable number of them. We generally rely on upstream developers (usually hired by hardware vendors) to do a good job, but that is unfortunately not always true :-(
Comment 63 Stanislaw Gruszka 2011-10-17 07:40:14 EDT
(In reply to comment #59)
> I am running kernel-3.1.0-0.rc8.git0.1.fc16.i686
> I still use pcie_aspm=force to boot, and without it there are hangs.

Having some kernel log would be great. Did I ask you to use kernel-debug and check whether the NMI watchdog works? Also, if you blacklist the ath9k module, does the system stop hanging without the pcie_aspm=force option?
Comment 64 Flos Lonicerae 2011-10-17 12:33:00 EDT
I want to apologize for being so rude... I spent a whole night installing Fedora 16 on my notebook.

I added the 'pcie_aspm=force' option in grub, but this time I had no luck.

@Stanislaw, I don't know how to use kdump, but I can install the kernel-debug package. Could you tell me what to do next? Or, if you have time, feel free to log in to my notebook remotely and do any experiment on it.
Comment 65 Stanislaw Gruszka 2011-10-18 05:36:37 EDT
(In reply to comment #64)
> i want to apologize for being so rude..
No worries, we pretty much understand that you can be frustrated by fedora :-)

> @Stanislaw, i didn't know how to use kdump, 

There is a graphical tool which assists with installing kdump. Please run as root:

yum install system-config-kdump
system-config-kdump

To test:

echo c > /proc/sysrq-trigger

This should trigger a crash, and the kernel should dump a vmcore file, for example in /var/crash.

To install the tools for analysing the dump:

yum install crash
debuginfo-install kernel # or kernel-debug, depending on which one is in use

To analyse the memory dump:

crash /var/crash/*/vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux
crash> set scroll off
crash> dmesg 
crash> dmesg > ~/dmesg.txt

If the system freeze is the effect of a kernel oops, kdump should dump the memory. Otherwise nmi_watchdog should trigger a crash, but first you must make sure it works; if needed see:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/nmi_watchdog.txt;h=bf9f80a982829ae06062e9cc118762335936ad4a;hb=HEAD

> but i can install kernel-debug
Good. It's better to use it even if you have kdump working, since there will probably be more useful information in dmesg.

> if you have time, you can feel
> free to remote login to my notebook to do any experiment on it.
If the system crashes, debugging it remotely over ssh will not work.
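
Before running the sysrq test above, it is worth confirming that a crash kernel is actually loaded; otherwise the test only panics the machine and no vmcore is written. A minimal sketch, assuming a kernel built with kexec support, which exposes the flag in /sys/kernel/kexec_crash_loaded (`kdump_ready` is a hypothetical helper name):

```shell
# Hypothetical helper: succeed only if the kernel reports that a crash
# (kdump) kernel is loaded. The flag file contains 1 when it is.
kdump_ready() {
    flag_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}
```

For example: `kdump_ready && echo c > /proc/sysrq-trigger` triggers the test crash only when a dump can be expected.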
Comment 66 Flos Lonicerae 2011-10-18 12:26:59 EDT
Thanks!!! I will try tomorrow; please wait for my result.
Comment 67 Stanislaw Gruszka 2011-10-19 03:47:06 EDT
Flos, please open a separate bug for this new problem, and assign it to me. Thanks.
Comment 68 Flos Lonicerae 2011-10-26 12:10:52 EDT
Hi Stanislaw,

I opened a new bug here:
https://bugzilla.redhat.com/show_bug.cgi?id=749276

Thanks for your help!

Flos
