Bug 158989 - snd-nm256 module hangs Dell Latitude CSx
Summary: snd-nm256 module hangs Dell Latitude CSx
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: John W. Linville
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 181409
Reported: 2005-05-27 14:54 UTC by Andrew Meredith
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 21:07:37 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch for Latitude CSx against RHEL4 U1 2.6.9-9.EL kernel (1.24 KB, patch)
2005-05-27 15:29 UTC, Dan Williams
jwltest-nm256-quirk.patch (936 bytes, patch)
2005-10-13 14:27 UTC, John W. Linville
jwltest-nm256-quirk.patch (3.24 KB, patch)
2005-10-14 17:39 UTC, John W. Linville
Oops logs from two lockups (4.67 KB, text/plain)
2005-10-18 18:09 UTC, Andrew Meredith


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 121601 0 medium CLOSED (SND NM256) 2.6.5-1.327 snd-nm256 module hangs Dell Latitidue LS (PP01S) 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2006:0575 0 normal SHIPPED_LIVE Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4 2006-08-10 04:00:00 UTC

Internal Links: 121601

Description Andrew Meredith 2005-05-27 14:54:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.7.8) Gecko/20050512 Red Hat/1.0.4-1.4.1 Firefox/1.0.4

Description of problem:
As soon as the snd-nm256 module is modprobed, the laptop locks up hard.

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.5.EL

How reproducible:
Always

Steps to Reproduce:
1. modprobe snd-nm256
2. there is no 2)
3. see 2) ;)
  

Additional info:

Comment 1 Andrew Meredith 2005-05-27 14:57:02 UTC
See also bug number 121601


Comment 2 Dan Williams 2005-05-27 15:01:54 UTC
Andrew:

Can you also post the output of "lspci -n" ?  I'm also assuming that your lspci
output from https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121601#c8 still
stands as correct, right?

Comment 3 Andrew Meredith 2005-05-27 15:09:41 UTC
Sorry, yes indeed, same box .. and here's the "lspci -n" output again for the
record.

00:00.0 Class 0600: 8086:7190 (rev 03)
00:01.0 Class 0604: 8086:7191 (rev 03)
00:03.0 Class 0607: 104c:ac1c (rev 01)
00:03.1 Class 0607: 104c:ac1c (rev 01)
00:07.0 Class 0680: 8086:7110 (rev 02)
00:07.1 Class 0101: 8086:7111 (rev 01)
00:07.2 Class 0c03: 8086:7112 (rev 01)
00:07.3 Class 0680: 8086:7113 (rev 03)
01:00.0 Class 0300: 10c8:0006
01:00.1 Class 0401: 10c8:8006
06:00.0 Class 0200: a727:0013 (rev 01)

For those searching the database, the lspci output (no -n) for the NM256 lines are:

01:00.0 VGA compatible controller:
 Neomagic Corporation NM2360 [MagicMedia 256ZX]
01:00.1 Multimedia audio controller:
 Neomagic Corporation NM2360 [MagicMedia 256ZX Audio]


Comment 4 Dan Williams 2005-05-27 15:13:34 UTC
Ok, good.  If the patch in Bug 121601 fixes the issue for patrick, it will
probably fix your issue here as well and then we can investigate getting the fix
into the RHEL4 kernel.

Comment 5 Dan Williams 2005-05-27 15:29:55 UTC
Created attachment 114916
Proposed patch for Latitude CSx against RHEL4 U1 2.6.9-9.EL kernel

Note:  This code cleans up the workaround detect section a bit too. There were
duplicate if() statements that both worked around the bug for the Latitude LS
(0x1028 0x0080) which could be condensed.

The patch needs to be applied _after_ linux-2.6.0-compile.patch since that
patch actually adds this workaround for the Latitude LS chip.
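
A minimal C sketch of the kind of consolidation described above, not the attached patch itself: the duplicated if() checks on the PCI subsystem IDs are folded into a single table lookup. Only the Latitude LS IDs (0x1028 0x0080) come from this comment; the enum, struct and function names are hypothetical, and the second table entry uses a placeholder subdevice ID because the real Latitude CSx ID is not given here.

```c
#include <stddef.h>
#include <stdint.h>

/* Workaround kinds a machine may need (illustrative names only). */
enum nm256_quirk {
    NM256_QUIRK_NONE = 0,
    NM256_QUIRK_RESET_WORKAROUND
};

/* One table row: PCI subsystem vendor/device and the quirk to apply. */
struct nm256_quirk_entry {
    uint16_t subvendor;
    uint16_t subdevice;
    enum nm256_quirk quirk;
};

static const struct nm256_quirk_entry nm256_quirks[] = {
    /* Dell Latitude LS, IDs as quoted in this comment. */
    { 0x1028, 0x0080, NM256_QUIRK_RESET_WORKAROUND },
    /* Placeholder row: 0xffff is NOT a real ID, it only marks where a
     * Latitude CSx entry would go once its subsystem ID is known. */
    { 0x1028, 0xffff, NM256_QUIRK_RESET_WORKAROUND },
};

/* Return the quirk for a subsystem ID pair, or NM256_QUIRK_NONE. */
enum nm256_quirk nm256_lookup_quirk(uint16_t subvendor, uint16_t subdevice)
{
    size_t i;

    for (i = 0; i < sizeof(nm256_quirks) / sizeof(nm256_quirks[0]); i++) {
        if (nm256_quirks[i].subvendor == subvendor &&
            nm256_quirks[i].subdevice == subdevice)
            return nm256_quirks[i].quirk;
    }
    return NM256_QUIRK_NONE;
}
```

With a table like this, supporting another machine means adding one row rather than another if() branch.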

Comment 6 Andrew Meredith 2005-05-28 23:55:11 UTC
Nice one.

Before I reach for my compiler, is it likely to work with kernel-2.6.9-6.37.EL,
(which is the latest available on RHN) until U1 is released?

Thanks


Comment 7 Dan Williams 2005-05-29 15:06:53 UTC
andrew: yes, it most likely will.

Comment 8 Dan Williams 2005-05-29 15:10:29 UTC
andrew: if you could try the patch out and see if it works, that would be great.
 A report from the other bug (bug 121601) says the patch didn't fix the issue,
but let's make sure.

Comment 9 Andrew Meredith 2005-05-30 15:48:32 UTC
Sadly no. I'm afraid it didn't.

I have mapped the sound modules out of the boot sequence so the machine can be
used. When I modprobed the patched snd-nm256 module, it locked solid.


Comment 10 Andrew Meredith 2005-06-05 23:32:10 UTC
I wonder has anyone ever considered a sort of "die-loudly=1" switch, that causes
the module to log a load of debug info about internal state and then exit
gracefully before actually doing anything?

If nothing else, it would be useful to see the states of the various chip
workarounds, in case there's a logic issue in there.

I haven't touched this sort of code in ages, but I would be happy to build and
test if someone can wing me a patch.



Comment 11 Andrew Meredith 2005-06-29 22:13:54 UTC
Is there anything brewing on this issue, or should we abandon using this series
of Dell laptops with RHEL4?

Comment 14 John W. Linville 2005-08-24 14:29:37 UTC
I never saw a report (either here or in bug 121601) about the results of 
trying "vaio_hack=1" as a module parameter.  What effect (if any) did that 
have? 

Comment 15 Andrew Meredith 2005-08-31 11:17:38 UTC
I can confirm that vaio_hack=1 makes no difference.

Sorry



Comment 16 John W. Linville 2005-09-08 14:55:38 UTC
I have test kernels w/ an updated snd-nm256 driver available here: 
 
   http://people.redhat.com/linville/kernels/rhel4/ 
 
Please give those a try and post the results...thanks!  BTW, if you still 
experience problems, you may try adding "reset_workaround=1" as a module 
parameter. 

Comment 19 Andrew Meredith 2005-09-11 12:43:06 UTC
Sound did work briefly during one session, but for the most part, modprobe
snd-nm256 still locks the machine solid .. with and without "reset_workaround=1".

Sorry
 

Comment 20 John W. Linville 2005-09-14 17:25:52 UTC
Hmmm...I don't suppose you (or anyone else) would like to send me an example  
of this hardware? :-)  

Comment 21 Andrew Meredith 2005-09-15 08:01:47 UTC
Surely with Red Hat's 'Special Relationship' with Dell, you can get one from
them ;-)

Comment 22 Dan Williams 2005-09-15 14:11:40 UTC
Alan: you mentioned yesterday on IRC that you had a thinkpad 600 with a NeoMagic
chip.  Does it possibly use that for sound as well, and have you run into hangs
with the nm256 module ever on it?  If not, sorry for the bugspam...

Comment 23 Alan Cox 2005-09-15 14:48:02 UTC
I don't have neomagic audio, just the older video.

The Dell and some vaio devices do have slightly odd neomagic setups - the low
bits of 0x6cc (believed to be GPIO) are used for something else and kill the box
if set.

Can you add a printk to the snd_nm256_write* and snd_nm256_read* inlines that
prints the port and value, recompile the kernel, and boot into run level 3 so you
get a trace of each access? Then let me know where it dies.
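
A rough user-space C sketch of the tracing idea, assuming nothing about the real driver internals: each accessor logs the offset and value before touching the register, so the last line in the trace names the access that hung the machine. The struct fake_nm256 type, the traced_* function names and the in-memory register array are stand-ins invented for illustration; in the kernel the logging call would be printk() inside the driver's own inline accessors.

```c
#include <stdint.h>
#include <stdio.h>

#define FAKE_PORT_SIZE 0x1000u

/* Stand-in for the chip's memory-mapped register window. */
struct fake_nm256 {
    uint8_t regs[FAKE_PORT_SIZE];
};

/* Log first, flush, then perform the access: if the access itself
 * freezes the box, the message has already been emitted. */
void traced_writeb(struct fake_nm256 *chip, uint32_t offset, uint8_t val)
{
    fprintf(stderr, "nm256: writeb 0x%02x -> 0x%03x\n",
            (unsigned)val, (unsigned)offset);
    fflush(stderr);
    chip->regs[offset % FAKE_PORT_SIZE] = val;
}

/* Reads log the offset before the access and the value after it. */
uint8_t traced_readb(struct fake_nm256 *chip, uint32_t offset)
{
    uint8_t val;

    fprintf(stderr, "nm256: readb  0x%03x ...\n", (unsigned)offset);
    fflush(stderr);
    val = chip->regs[offset % FAKE_PORT_SIZE];
    fprintf(stderr, "nm256: readb  0x%03x = 0x%02x\n",
            (unsigned)offset, (unsigned)val);
    return val;
}
```

Logging before the access matters here: a hard lockup leaves no chance to print anything afterwards, which is also why a text console (run level 3) or serial console is the practical way to capture the final line.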


Comment 29 John W. Linville 2005-10-06 00:36:26 UTC
Hmmm...well, after tracking-down such a laptop, and figuring-out how to 
install RHEL4 on it (I hope you have a cdrom drive for yours!)...the sound 
works fine. 
 
Just to be sure, I verified that my lspci -n output matched yours from comment 
3 (except for the NIC).  Which begs the question, have you tried a different 
NIC? 
 
Have you tried RHEL4 U2? 

Comment 30 Andrew Meredith 2005-10-06 07:57:04 UTC
That'd be about right wouldn't it :)

I have used all sorts of different NICs. The freeze came with the actual
insertion of the sound module. Maybe if you tugged and reinserted it might
freeze. Works every time on my machine.

The laptop was built with U0, but was fully updated using up2date.

Comment 31 John W. Linville 2005-10-06 19:06:23 UTC
Well, the "good" news is that I am seeing some of this behaviour -- just not 
all the time... 
 
I'll have to get back to you... :-) 

Comment 32 John W. Linville 2005-10-07 18:07:06 UTC
Grrr...the box shut itself down for no apparent reason, and now it won't turn  
back on... 
 
I don't know if I'll be able to revive this box or not.  If not, I may not be 
able to fix this. 
 
FWIW, it did appear to be successful using the kernels currently at the 
location from comment 16.  However, it didn't stay-up long enough to give any 
real confidence.  Any chance you'd like to try the latest kernels from there? 

Comment 33 Andrew Meredith 2005-10-08 15:22:59 UTC
As I was running out of excuses for not showing my customers the (heavily sound
involved) product on my laptop, I have had to hock the cat and buy myself a new
one. I would still be interested in getting the old one to function properly as
a standby.

I am happy to go round the loop if you have done anything that might affect the
functionality of the snd-nm256 module. If nothing has been done that might fix
things then I'm afraid I can't really justify that time.

So .. have you :)

Comment 34 John W. Linville 2005-10-13 14:25:12 UTC
I believe I have...  (I somehow revived my box, btw...) 
 
Some modifications to the existing reset workaround have resulted in my box 
being able to load/use/unload the module many thousands of times without a 
lock-up. 
 
Test kernels are available at the same location as in comment 16.  Please give 
them a try and post the results...thanks! 

Comment 35 John W. Linville 2005-10-13 14:27:06 UTC
Created attachment 119913
jwltest-nm256-quirk.patch

Comment 36 Dan Williams 2005-10-13 16:32:45 UTC
jwl: you want me to test this on a Dell Latitude LS to make sure it works
without those snd_nm256_writeb() calls?  It normally requires the reset
workaround on this machine, and since those two calls get removed for the reset
workaround case, they affect the codepath for this box...

Comment 37 John W. Linville 2005-10-13 16:38:17 UTC
Yes, that would be quite welcome...thanks! 

Comment 38 Dan Williams 2005-10-13 18:46:27 UTC
Procedure with kernel from Comment 16:
1) Boot to desktop
2) Play a clip in RealPlayer
3) Adjust volume
4) WORKS
5) quit RealPlayer
6) unload module (requires logout because something is using it)
7) log back in
8) Play a clip in RealPlayer
9) NO SOUND

It appears that I can't get sound back on this box after unloading the module.

To isolate the issue and make sure it's not RealPlayer, could I simply test with
"cat /dev/random > /dev/xxxx"?  What should xxxx be?

I just tried the test again twice, and both times it hardlocked when logging
back in.  Console 1 didn't print anything before the lock, and there's nothing
in /var/log/messages to indicate a panic.

So, I tried with stock RHEL4U2 kernel (2.6.9-22.EL I think).  Unfortunately, it
panics on unload.  Note that your kernel from Comment 16 does _not_ panic on unload.

Comment 39 Dan Williams 2005-10-13 18:54:26 UTC
Panic from stock RHEL4 U2 2.6.9-22.EL on module unload:

Oct 13 14:42:29 dhcp83-31 kernel: Unable to handle kernel paging request at
virtual address d0818a04
Oct 13 14:42:29 dhcp83-31 kernel:  printing eip:
Oct 13 14:42:29 dhcp83-31 kernel: d097504c
Oct 13 14:42:29 dhcp83-31 kernel: *pde = 0fd1b067
Oct 13 14:42:29 dhcp83-31 kernel: Oops: 0000 [#1]
Oct 13 14:42:29 dhcp83-31 kernel: Modules linked in: parport_pc lp parport
autofs4 sunrpc ds ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables md5
ipv6 yenta_socket pcmcia_core uhci_hcd snd_nm256 snd_ac97_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd soundcore 3c59x mii floppy
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
Oct 13 14:42:29 dhcp83-31 kernel: CPU:    0
Oct 13 14:42:29 dhcp83-31 kernel: EIP:    0060:[<d097504c>]    Not tainted VLI
Oct 13 14:42:29 dhcp83-31 kernel: EFLAGS: 00010282   (2.6.9-22.EL) 
Oct 13 14:42:29 dhcp83-31 kernel: EIP is at snd_nm256_ac97_ready+0x17/0x3d
[snd_nm256]
Oct 13 14:42:29 dhcp83-31 kernel: eax: d0818a04   ebx: cfee53a0   ecx: 00009f9f
  edx: 00000002
Oct 13 14:42:29 dhcp83-31 kernel: esi: 00000009   edi: 00000800   ebp: 00000a04
  esp: c2c5fe64
Oct 13 14:42:29 dhcp83-31 kernel: ds: 007b   es: 007b   ss: 0068
Oct 13 14:42:29 dhcp83-31 kernel: Process rmmod (pid: 3432, threadinfo=c2c5f000
task=ce30c0d0)
Oct 13 14:42:29 dhcp83-31 kernel: Stack: cfee53a0 00000001 00000600 00009f9f
d09750cd 00000002 d0984a74 c126c200 
Oct 13 14:42:29 dhcp83-31 kernel:        00000002 00009f9f d096314d c126c200
c126c200 00002fff c2c5f000 d09663c7 
Oct 13 14:42:29 dhcp83-31 kernel:        c126c200 cdc76000 d0964517 cd8219c0
d08dd330 cdc76000 00002000 d08dd4b8 
Oct 13 14:42:29 dhcp83-31 kernel: Call Trace:
Oct 13 14:42:29 dhcp83-31 kernel:  [<d09750cd>] snd_nm256_ac97_write+0x20/0x4e
[snd_nm256]
Oct 13 14:42:29 dhcp83-31 kernel:  [<d096314d>] snd_ac97_write+0x53/0x58
[snd_ac97_codec]
Oct 13 14:42:29 dhcp83-31 kernel:  [<d09663c7>] snd_ac97_powerdown+0x19/0x69
[snd_ac97_codec]
Oct 13 14:42:29 dhcp83-31 kernel:  [<d0964517>] snd_ac97_dev_free+0xb/0x13
[snd_ac97_codec]
Oct 13 14:42:29 dhcp83-31 kernel:  [<d08dd330>] snd_device_free+0x65/0x8b [snd]
Oct 13 14:42:29 dhcp83-31 kernel:  [<d08dd4b8>] snd_device_free_all+0x3b/0x4b [snd]
Oct 13 14:42:29 dhcp83-31 kernel:  [<d08d8cfc>] snd_card_free+0x135/0x1b5 [snd]
Oct 13 14:42:29 dhcp83-31 kernel:  [<c01ab777>] sysfs_hash_and_remove+0xda/0x106
Oct 13 14:42:29 dhcp83-31 kernel:  [<d0975aee>] snd_nm256_remove+0xc/0x16
[snd_nm256]
Oct 13 14:42:29 dhcp83-31 kernel:  [<c01ec1db>] pci_device_remove+0x16/0x28
Oct 13 14:42:29 dhcp83-31 kernel:  [<c024a39b>] device_release_driver+0x3c/0x46
Oct 13 14:42:29 dhcp83-31 kernel:  [<c024a3bd>] driver_detach+0x18/0x1f
Oct 13 14:42:29 dhcp83-31 kernel:  [<c024a74d>] bus_remove_driver+0x48/0x75
Oct 13 14:42:29 dhcp83-31 kernel:  [<c024ab13>] driver_unregister+0xc/0x31
Oct 13 14:42:29 dhcp83-31 kernel:  [<c01ec398>] pci_unregister_driver+0xb/0x13
Oct 13 14:42:29 dhcp83-31 kernel:  [<c013b705>] sys_delete_module+0x132/0x179
Oct 13 14:42:29 dhcp83-31 kernel:  [<c015a6ad>] unmap_vma_list+0xe/0x17
Oct 13 14:42:29 dhcp83-31 kernel:  [<c015aa5c>] do_munmap+0x1c8/0x1d2
Oct 13 14:42:29 dhcp83-31 kernel:  [<c030f91f>] syscall_call+0x7/0xb
Oct 13 14:42:29 dhcp83-31 kernel: Code: 7a ef c7 83 c0 00 00 00 00 00 00 00 b8
01 00 00 00 5b 5e c3 55 57 56 be 09 00 00 00 53 8b 68 3c 89 c3 0f b7 78 40 8b 43
04 01 e8 <0f> b7 00 85 f8 75 07 b8 01 00 00 00 eb 13 b8 bc 8d 06 00 e8 ce 
Oct 13 14:42:29 dhcp83-31 kernel:  <0>Fatal exception: panic in 5 seconds


Comment 40 John W. Linville 2005-10-14 17:39:39 UTC
Created attachment 119994
jwltest-nm256-quirk.patch

Comment 41 John W. Linville 2005-10-14 17:42:04 UTC
OK, let's try the new kernels at the location from comment 16.  They contain
the above patch, which separates the new reset workaround from the old
one, in hopes of making both Dell Latitude laptops happy...

Comment 42 Dan Williams 2005-10-17 14:29:49 UTC
re comment 41:

2.6.9-22.3.EL.jwltest.75 works correctly for all cases that I tried on my
Latitude LS with an NM256av.  It does not panic on module unload, and it
correctly produces sound after module reload as well.

So the newest patch in .75 doesn't make the older workaround unhappy like .74. 
If .75 fixes the issues with the NM256zx chipset too, then I think we're all
good here.


Comment 43 John W. Linville 2005-10-17 14:45:49 UTC
Unfortunately, now this laptop's hard drive is making clicking sounds (and it 
won't boot)... 
 
Andrew, could you give the latest kernels at the location from comment 16 a
try to verify that it is working?  Per the previous question, I have lost the
ability to verify my own patch in this regard... :-(

Comment 44 Andrew Meredith 2005-10-18 18:09:12 UTC
Created attachment 120131
Oops logs from two lockups

Comment 45 John W. Linville 2005-10-18 18:31:23 UTC
Hmmm...that sucks, but it also looks like a different problem... 
 
How often does that happen?  Is there some sequence to reproduce it? 

Comment 46 Andrew Meredith 2005-10-18 18:33:30 UTC
I have now rebuilt my Latitude CSx and installed your kernel named
kernel-2.6.9-22.3.EL.jwltest.76

The snd-nm256 module now loads and unloads, repeatedly, without incident :)

However, it plays with a great deal of interference (a pulsing at about 5 Hz)
under xmms and mpg321, although mplayer seems to work fine.

More worryingly, under xmms and mpg321, the machine locks up when the playback
ends and (I assume) the sound device is closed. NB both xmms and mplayer are set
for alsa mode. This lockup is pretty reliable (9/10) and happens in both
runlevels 3 and 5. The machine becomes entirely unresponsive (no ping or console
response); also the second and third status LEDs (Caps lock and the one to its
right) start flashing together twice per second.

Please note attached oops log extracts, contemporary with the lockups.

One thing I have noticed though, is that there is no kern.info output that shows
your second workaround is loading. The clause:

+	if (reset_workaround_2[dev]) {
+		snd_printdd(KERN_INFO "nm256: reset_workaround_2 activated\n");
+		chip->reset_workaround_2 = 1;
+	}

.. in the patch suggests that there should be.

For paranoia, I have created a syslog stream purely for kern.=info and nothing
from the snd-nm256 module is logged.
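
One possible explanation, offered as an assumption rather than a diagnosis: ALSA's snd_printdd() is a debug-only macro that, in kernels of this era, expands to nothing unless the kernel was built with the ALSA detection-debug option (CONFIG_SND_DEBUG_DETECT, as far as I recall). If the test kernel was built without it, the workaround could be active and still log nothing. The C sketch below mimics that pattern in user space; DEMO_DEBUG_DETECT, demo_printk, demo_printdd and activate_workaround are hypothetical names, not kernel identifiers.

```c
#include <stdio.h>

/* Counts emitted messages so the effect of the macro is observable. */
static int messages_logged;

/* Stand-in for printk(): always prints. */
#define demo_printk(...) (messages_logged++, fprintf(stderr, __VA_ARGS__))

/* Stand-in for snd_printdd(): prints only in debug builds and is
 * compiled out entirely otherwise. */
#ifdef DEMO_DEBUG_DETECT
#define demo_printdd(...) demo_printk(__VA_ARGS__)
#else
#define demo_printdd(...) ((void)0)
#endif

/* Mirrors the shape of the quoted patch clause: the flag is set
 * whether or not the message is compiled in. */
int activate_workaround(int requested)
{
    if (requested) {
        demo_printdd("nm256: reset_workaround_2 activated\n");
        return 1;
    }
    return 0;
}

/* Number of messages actually emitted so far. */
int demo_messages_logged(void)
{
    return messages_logged;
}
```

Built without -DDEMO_DEBUG_DETECT, activate_workaround(1) still returns 1 but emits no message, which would match a silent syslog alongside a working workaround.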


Comment 47 John W. Linville 2005-12-15 20:49:19 UTC
It is hard to debug this now, since I had to return the (now dead) hardware   
that I had...   
   
Perhaps you can get a trace using SysRq.  First run:

   sysctl -w kernel.sysrq=1

Then, create the hang (probably best from a virtual console instead of from
X).  Once it is hung, hold down ALT+SysRq+P (note the EIP if nothing else)
and maybe ALT+SysRq+W.  It is probably easiest to capture the results over a
serial console.
 
I'm sorry!  I wish I had a simpler/better idea... :-) 

Comment 48 Andrew Meredith 2005-12-16 11:20:29 UTC
I have a simpler idea still.

Let's run with that patch and close the report as fixed :-)

I have been beating seven shades of shellac out of this laptop since I installed
the above kernel. It was only last night that I finally managed to crash it
again and that was after being used as a media console for several weeks.
Frankly, if it can stand that sort of punishment for that long and only crash
after several weeks, it is doing better than some other platforms we could mention.

Thanks and well done.

Comment 49 John W. Linville 2005-12-16 19:29:21 UTC
Well...if you are happy, then I am happy... :-) 

Comment 50 John W. Linville 2006-03-10 13:45:48 UTC
Andrew, I have taken another nm256 update.  This is mostly just for due 
diligence, but would you mind giving it a sanity check?  I no longer have 
nm256 hardware... 
 
The kernels are here: 
 
   http://people.redhat.com/linville/kernels/rhel4/ 
 
Please give them a try and post the results here...thanks! 

Comment 51 Dan Williams 2006-03-10 14:48:33 UTC
John, is there a pointer to the patch that you've added somewhere?  I'm now
having issues with hangs in nm256 on the Latitude LS on _rawhide_, so if you
took an upstream ALSA update or something it may be a problem.  I'll try your
RHEL4 test kernel when I'm back in the office Monday though.

Comment 52 Dan Williams 2006-03-10 14:49:58 UTC
Note that the patch you list on your people.redhat.com page doesn't seem to exist:

http://people.redhat.com/linville/kernels/rhel4/patches/jwltest-nm256-2_6_16-rc5.patch

returns 404

Comment 53 John W. Linville 2006-03-10 15:20:53 UTC
Sorry about that...my automated process isn't automated enough... :-(  
  
The patch is there now...thanks!  

Comment 54 Andrew Meredith 2006-03-12 18:36:56 UTC
I have been running 2.6.9-34.1.EL.jwltest.121 for a couple of days now, using
mplayer and xine to play mp3s and vids. All seems well. It boots clean and stays
stable.

Good job :-)

When does this get into FC3/4 and RHEL ?

Comment 55 John W. Linville 2006-03-13 18:25:18 UTC
Dan, have you had a chance to try my rhel4 test kernels? 

Comment 56 Dan Williams 2006-03-13 18:41:23 UTC
Still working on that; tried RHEL4 install this morning but it borked due to
anaconda LVM bugs when installing over an old installation.  Will try to get
that done today.

Comment 58 Dan Williams 2006-03-14 17:46:48 UTC
On the Latitude LS, the latest kernel you've posted appears to work fine.

2.6.9-34.2.EL.jwltest.123

Go for it.

Comment 59 Jason Baron 2006-04-20 14:44:38 UTC
committed in stream U4 build 34.20. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 62 Andrew Meredith 2006-05-27 23:22:27 UTC
Apologies for not getting back sooner. This kernel works fine against the sound
chip induced lockups. Good work.

In which versions of the RHEL4 and FC kernels does this patch make its debut?

Comment 63 Dan Williams 2006-05-28 02:24:26 UTC
For RHEL, you'll see this bug closed with a message stating that the fix has
been incorporated into a RHEL quarterly update.

For Fedora, I believe that it's fixed in the current kernels for FC5, at least?

Comment 65 Red Hat Bugzilla 2006-08-10 21:07:42 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html


