From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.7.8) Gecko/20050512 Red Hat/1.0.4-1.4.1 Firefox/1.0.4

Description of problem:
As soon as the snd-nm256 module is modprobed, the laptop locks up hard.

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.5.EL

How reproducible:
Always

Steps to Reproduce:
1. modprobe snd-nm256
2. there is no 2)
3. see 2) ;)

Additional info:
See also bug number 121601
Andrew: Can you also post the output of "lspci -n" ? I'm also assuming that your lspci output from https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121601#c8 still stands as correct, right?
Sorry, yes indeed, same box .. and here's the "lspci -n" output again for the record.

00:00.0 Class 0600: 8086:7190 (rev 03)
00:01.0 Class 0604: 8086:7191 (rev 03)
00:03.0 Class 0607: 104c:ac1c (rev 01)
00:03.1 Class 0607: 104c:ac1c (rev 01)
00:07.0 Class 0680: 8086:7110 (rev 02)
00:07.1 Class 0101: 8086:7111 (rev 01)
00:07.2 Class 0c03: 8086:7112 (rev 01)
00:07.3 Class 0680: 8086:7113 (rev 03)
01:00.0 Class 0300: 10c8:0006
01:00.1 Class 0401: 10c8:8006
06:00.0 Class 0200: a727:0013 (rev 01)

For those searching the database, the lspci output (no -n) for the NM256 lines is:

01:00.0 VGA compatible controller: Neomagic Corporation NM2360 [MagicMedia 256ZX]
01:00.1 Multimedia audio controller: Neomagic Corporation NM2360 [MagicMedia 256ZX Audio]
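For anyone comparing their own hardware against the listing above, the audio function can be picked out of "lspci -n" output by its PCI class code (0401, multimedia audio controller). The following is only a minimal Python sketch, assuming the old-style "Class XXXX: vvvv:dddd" line layout quoted in this report; the name `find_devices_by_class` is hypothetical and not part of any existing tool.

```python
import re

# Matches the old-style "lspci -n" layout quoted in this report, e.g.
# "01:00.1 Class 0401: 10c8:8006", with an optional "(rev NN)" suffix.
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+Class\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev\s+[0-9a-f]{2}\))?$"
)


def find_devices_by_class(lspci_text, pci_class):
    """Return (slot, vendor, device) tuples for lines with the given class code."""
    found = []
    for line in lspci_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("cls") == pci_class:
            found.append((m.group("slot"), m.group("vendor"), m.group("device")))
    return found


# A few lines copied from the listing in this report.
SAMPLE = """\
00:00.0 Class 0600: 8086:7190 (rev 03)
01:00.0 Class 0300: 10c8:0006
01:00.1 Class 0401: 10c8:8006
06:00.0 Class 0200: a727:0013 (rev 01)
"""

# Class 0401 is "multimedia audio controller"; 10c8 is the vendor ID
# shown for the Neomagic devices above.
print(find_devices_by_class(SAMPLE, "0401"))
```

On newer systems the "lspci -n" line layout differs (no "Class" keyword), so the pattern would need adjusting there.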
Ok, good. If the patch in Bug 121601 fixes the issue for patrick, it will probably fix your issue here as well and then we can investigate getting the fix into the RHEL4 kernel.
Created attachment 114916 [details] Proposed patch for Latitude CSx against RHEL4 U1 2.6.9-9.EL kernel

Note: This code cleans up the workaround detect section a bit too. There were duplicate if() statements that both worked around the bug for the Latitude LS (0x1028 0x0080) which could be condensed. The patch needs to be applied _after_ linux-2.6.0-compile.patch since that patch actually adds this workaround for the Latitude LS chip.
Nice one. Before I reach for my compiler, is it likely to work with kernel-2.6.9-6.37.EL (which is the latest available on RHN) until U1 is released? Thanks
andrew: yes, it most likely will.
andrew: if you could try the patch out and see if it works, that would be great. A report from the other bug (bug 121601) says the patch didn't fix the issue, but let's make sure.
Sadly no. I'm afraid it didn't. I have mapped the sound modules out of the boot sequence so the machine can be used. When I modprobed the patched snd-nm256 module, it locked solid.
I wonder whether anyone has ever considered a sort of "die-loudly=1" switch that causes the module to log a load of debug info about its internal state and then exit gracefully before actually doing anything? If nothing else, it might be useful to see the states of the various chip workarounds, just in case there's a logic issue in there. I haven't touched this sort of code in ages, but I would be happy to build and test if someone can wing me a patch.
Is there anything brewing on this issue, or should we abandon using this series of Dell laptops with RHEL4?
I never saw a report (either here or in bug 121601) about the results of trying "vaio_hack=1" as a module parameter. What effect (if any) did that have?
I can confirm that vaio_hack=1 makes no difference. Sorry
I have test kernels w/ an updated snd-nm256 driver available here: http://people.redhat.com/linville/kernels/rhel4/ Please give those a try and post the results...thanks! BTW, if you still experience problems, you may try adding "reset_workaround=1" as a module parameter.
Sound did work briefly during one session, but for the most part, modprobe snd-nm256 still locks the machine solid .. with and without "reset_workaround=1". Sorry
Hmmm...I don't suppose you (or anyone else) would like to send me an example of this hardware? :-)
Surely with Red Hat's 'Special Relationship' with Dell, you can get one from them ;-)
Alan: you mentioned yesterday on IRC that you had a thinkpad 600 with a NeoMagic chip. Does it possibly use that for sound as well, and have you run into hangs with the nm256 module ever on it? If not, sorry for the bugspam...
I don't have NeoMagic audio, just the older video. The Dell and some Vaio devices do have slightly odd NeoMagic setups: the low bits of 0x6cc (believed to be GPIO) are used for something else and kill the box if set. Can you add a printk into the snd_nm256_write* and snd_nm256_read* inlines that prints port/values, recompile the kernel, and boot into runlevel 3 so you get a trace of each access, then let me know where it dies?
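The idea behind that request is to log every register access before it is performed, so that the last line in the trace identifies the access that killed the box. The sketch below is only a user-space Python analogy of that technique, not kernel code and not the driver's real accessors: the `traced` decorator, the dict standing in for the registers, and the value 0x80 are all made up for illustration.

```python
import functools


def traced(kind, log):
    """Wrap a register accessor so each access is logged *before* it runs."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(regs, port, *args):
            values = " ".join(f"val=0x{a:02x}" for a in args)
            # Log first: if the access itself hangs the machine, the
            # offending port/value is already in the trace.
            log(f"nm256 {kind} port=0x{port:03x} {values}".rstrip())
            return func(regs, port, *args)
        return wrapper
    return decorate


trace = []  # stands in for the kernel log


@traced("writeb", trace.append)
def writeb(regs, port, value):
    regs[port] = value & 0xFF


@traced("readb", trace.append)
def readb(regs, port):
    return regs.get(port, 0)


regs = {}                  # a dict standing in for the memory-mapped registers
writeb(regs, 0x6CC, 0x80)  # arbitrary illustrative value
readb(regs, 0x6CC)
print("\n".join(trace))
```

In the real driver the equivalent would be a printk at the top of each inline accessor, with the output captured on a console that survives the hang.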
Hmmm...well, after tracking-down such a laptop, and figuring-out how to install RHEL4 on it (I hope you have a cdrom drive for yours!)...the sound works fine. Just to be sure, I verified that my lspci -n output matched yours from comment 3 (except for the NIC). Which begs the question, have you tried a different NIC? Have you tried RHEL4 U2?
That'd be about right wouldn't it :) I have used all sorts of different NICs. The freeze came with the actual insertion of the sound module. Maybe if you tugged and reinserted it might freeze. Works every time on my machine. The laptop was built with U0, but was fully updated using up2date.
Well, the "good" news is that I am seeing some of this behaviour -- just not all the time... I'll have to get back to you... :-)
Grrr...the box shut itself down for no apparent reason, and now it won't turn back on... I don't know if I'll be able to revive this box or not. If not, I may not be able to fix this. FWIW, it did appear to be successful using the kernels currently at the location from comment 16. However, it didn't stay-up long enough to give any real confidence. Any chance you'd like to try the latest kernels from there?
As I was running out of excuses for not showing my customers the (heavily sound involved) product on my laptop, I have had to hock the cat and buy myself a new one. I would still be interested in getting the old one to function properly as a standby. I am happy to go round the loop if you have done anything that might affect the functionality of the snd-nm256 module. If nothing has been done that might fix things then I'm afraid I can't really justify that time. So .. have you :)
I believe I have... (I somehow revived my box, btw...) Some modifications to the existing reset workaround have resulted in my box being able to load/use/unload the module many thousands of times without a lock-up. Test kernels are available at the same location as in comment 16. Please give them a try and post the results...thanks!
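A load/unload soak test of the kind mentioned above can be scripted. This is a minimal Python sketch under stated assumptions, not the procedure actually used for these test kernels: `stress_module` and its `run` parameter are hypothetical names, and the real `modprobe` calls need root and the actual hardware.

```python
import subprocess


def stress_module(module, cycles, run=subprocess.run):
    """Load and unload `module` `cycles` times; return the completed cycle count."""
    done = 0
    for _ in range(cycles):
        # check=True makes subprocess.run raise CalledProcessError if
        # modprobe exits non-zero, which stops the loop at the first failure.
        run(["modprobe", module], check=True)
        run(["modprobe", "-r", module], check=True)
        done += 1
    return done


# Example (as root, on the affected laptop):
#     stress_module("snd-nm256", 1000)
# A hard lock-up will simply freeze the loop, so watch it from a console.
```

The `run` parameter exists only so the loop can be exercised with a stand-in function on a machine without the hardware.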
Created attachment 119913 [details] jwltest-nm256-quirk.patch
jwl: you want me to test this on a Dell Latitude LS to make sure it works without those snd_nm256_writeb() calls? It normally requires the reset workaround on this machine and since those two calls get removed for reset workaround case, they affect the codepath for this box...
Yes, that would be quite welcome...thanks!
Procedure with kernel from Comment 16:

1) Boot to desktop
2) Play a clip in RealPlayer
3) Adjust volume
4) WORKS
5) quit RealPlayer
6) unload module (requires logout because something is using it)
7) log back in
8) Play a clip in RealPlayer
9) NO SOUND

It appears that I can't get sound back on this box after unloading the module. To isolate the issue and make sure it's not RealPlayer, could I simply test with "cat /dev/random > /dev/xxxx"? What should xxxx be?

I just tried the test again twice, and both times it hardlocked when logging back in. Console 1 didn't print anything before the lock, and there's nothing in /var/log/messages to indicate a panic.

So, I tried with stock RHEL4U2 kernel (2.6.9-22.EL I think). Unfortunately, it panics on unload. Note that your kernel from Comment 16 does _not_ panic on unload.
Panic from stock RHEL4 U2 2.6.9-22.EL on module unload:

Oct 13 14:42:29 dhcp83-31 kernel: Unable to handle kernel paging request at virtual address d0818a04
Oct 13 14:42:29 dhcp83-31 kernel: printing eip:
Oct 13 14:42:29 dhcp83-31 kernel: d097504c
Oct 13 14:42:29 dhcp83-31 kernel: *pde = 0fd1b067
Oct 13 14:42:29 dhcp83-31 kernel: Oops: 0000 [#1]
Oct 13 14:42:29 dhcp83-31 kernel: Modules linked in: parport_pc lp parport autofs4 sunrpc ds ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables md5 ipv6 yenta_socket pcmcia_core uhci_hcd snd_nm256 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd soundcore 3c59x mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
Oct 13 14:42:29 dhcp83-31 kernel: CPU: 0
Oct 13 14:42:29 dhcp83-31 kernel: EIP: 0060:[<d097504c>] Not tainted VLI
Oct 13 14:42:29 dhcp83-31 kernel: EFLAGS: 00010282 (2.6.9-22.EL)
Oct 13 14:42:29 dhcp83-31 kernel: EIP is at snd_nm256_ac97_ready+0x17/0x3d [snd_nm256]
Oct 13 14:42:29 dhcp83-31 kernel: eax: d0818a04 ebx: cfee53a0 ecx: 00009f9f edx: 00000002
Oct 13 14:42:29 dhcp83-31 kernel: esi: 00000009 edi: 00000800 ebp: 00000a04 esp: c2c5fe64
Oct 13 14:42:29 dhcp83-31 kernel: ds: 007b es: 007b ss: 0068
Oct 13 14:42:29 dhcp83-31 kernel: Process rmmod (pid: 3432, threadinfo=c2c5f000 task=ce30c0d0)
Oct 13 14:42:29 dhcp83-31 kernel: Stack: cfee53a0 00000001 00000600 00009f9f d09750cd 00000002 d0984a74 c126c200
Oct 13 14:42:29 dhcp83-31 kernel: 00000002 00009f9f d096314d c126c200 c126c200 00002fff c2c5f000 d09663c7
Oct 13 14:42:29 dhcp83-31 kernel: c126c200 cdc76000 d0964517 cd8219c0 d08dd330 cdc76000 00002000 d08dd4b8
Oct 13 14:42:29 dhcp83-31 kernel: Call Trace:
Oct 13 14:42:29 dhcp83-31 kernel: [<d09750cd>] snd_nm256_ac97_write+0x20/0x4e [snd_nm256]
Oct 13 14:42:29 dhcp83-31 kernel: [<d096314d>] snd_ac97_write+0x53/0x58 [snd_ac97_codec]
Oct 13 14:42:29 dhcp83-31 kernel: [<d09663c7>] snd_ac97_powerdown+0x19/0x69 [snd_ac97_codec]
Oct 13 14:42:29 dhcp83-31 kernel: [<d0964517>] snd_ac97_dev_free+0xb/0x13 [snd_ac97_codec]
Oct 13 14:42:29 dhcp83-31 kernel: [<d08dd330>] snd_device_free+0x65/0x8b [snd]
Oct 13 14:42:29 dhcp83-31 kernel: [<d08dd4b8>] snd_device_free_all+0x3b/0x4b [snd]
Oct 13 14:42:29 dhcp83-31 kernel: [<d08d8cfc>] snd_card_free+0x135/0x1b5 [snd]
Oct 13 14:42:29 dhcp83-31 kernel: [<c01ab777>] sysfs_hash_and_remove+0xda/0x106
Oct 13 14:42:29 dhcp83-31 kernel: [<d0975aee>] snd_nm256_remove+0xc/0x16 [snd_nm256]
Oct 13 14:42:29 dhcp83-31 kernel: [<c01ec1db>] pci_device_remove+0x16/0x28
Oct 13 14:42:29 dhcp83-31 kernel: [<c024a39b>] device_release_driver+0x3c/0x46
Oct 13 14:42:29 dhcp83-31 kernel: [<c024a3bd>] driver_detach+0x18/0x1f
Oct 13 14:42:29 dhcp83-31 kernel: [<c024a74d>] bus_remove_driver+0x48/0x75
Oct 13 14:42:29 dhcp83-31 kernel: [<c024ab13>] driver_unregister+0xc/0x31
Oct 13 14:42:29 dhcp83-31 kernel: [<c01ec398>] pci_unregister_driver+0xb/0x13
Oct 13 14:42:29 dhcp83-31 kernel: [<c013b705>] sys_delete_module+0x132/0x179
Oct 13 14:42:29 dhcp83-31 kernel: [<c015a6ad>] unmap_vma_list+0xe/0x17
Oct 13 14:42:29 dhcp83-31 kernel: [<c015aa5c>] do_munmap+0x1c8/0x1d2
Oct 13 14:42:29 dhcp83-31 kernel: [<c030f91f>] syscall_call+0x7/0xb
Oct 13 14:42:29 dhcp83-31 kernel: Code: 7a ef c7 83 c0 00 00 00 00 00 00 00 b8 01 00 00 00 5b 5e c3 55 57 56 be 09 00 00 00 53 8b 68 3c 89 c3 0f b7 78 40 8b 43 04 01 e8 <0f> b7 00 85 f8 75 07 b8 01 00 00 00 eb 13 b8 bc 8d 06 00 e8 ce
Oct 13 14:42:29 dhcp83-31 kernel: <0>Fatal exception: panic in 5 seconds
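When comparing oops reports like the one above, it can help to reduce the call trace to a list of (function, module) pairs. The following is a rough Python sketch that assumes the 2.6-era "[<address>] symbol+0xOFF/0xLEN [module]" frame layout seen in this log; `parse_frames` is a hypothetical helper name, and frames with no module tag are simply labelled "kernel".

```python
import re

# One call-trace frame looks like:
#   [<d09750cd>] snd_nm256_ac97_write+0x20/0x4e [snd_nm256]
# The trailing "[module]" part is absent for functions built into the kernel.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\]\s+(?P<sym>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def parse_frames(oops_text):
    """Return (symbol, module) pairs in call-trace order."""
    return [
        (m.group("sym"), m.group("mod") or "kernel")
        for m in FRAME_RE.finditer(oops_text)
    ]


# Lines excerpted from the oops in this report (syslog prefix removed).
SAMPLE = """\
EIP: 0060:[<d097504c>] Not tainted VLI
[<d09750cd>] snd_nm256_ac97_write+0x20/0x4e [snd_nm256]
[<d096314d>] snd_ac97_write+0x53/0x58 [snd_ac97_codec]
[<c01ab777>] sysfs_hash_and_remove+0xda/0x106
"""

for sym, mod in parse_frames(SAMPLE):
    print(f"{mod:16s} {sym}")
```

The "EIP:" line is deliberately not treated as a frame, because it has no symbol+offset after the address.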
Created attachment 119994 [details] jwltest-nm256-quirk.patch
OK, let's try the new kernels at the location from comment 16. They contain the above patch, which separates the new reset workaround from the old one, in hopes of making both Dell Latitude laptops happy...
re comment 41: 2.6.9-22.3.EL.jwltest.75 works correctly for all cases that I tried on my Latitude LS with an NM256av. It does not panic on module unload, and it correctly produces sound after module reload as well. So the newest patch in .75 doesn't make the older workaround unhappy like .74. If .75 fixes the issues with the NM256zx chipset too, then I think we're all good here.
Unfortunately, now this laptop's hard drive is making clicking sounds (and it won't boot)... Andrew, could you give the latest kernels at the location from comment 16 a try to verify that it is working? Per the previous question, I have lost the ability to verify my own patch in this regard... :-(
Created attachment 120131 [details] Oops logs from two lockups
Hmmm...that sucks, but it also looks like a different problem... How often does that happen? Is there some sequence to reproduce it?
I have now rebuilt my Latitude CSx and installed your kernel named kernel-2.6.9-22.3.EL.jwltest.76

The snd-nm256 module now loads and unloads, repeatedly, without incident :)

However, it plays with a great deal of interference (a pulsing at about 5 Hz) under xmms and mpg321, although mplayer seems to work fine. More worryingly, under xmms and mpg321, the machine locks up when the playback ends and (I assume) the sound device is closed. NB both xmms and mplayer are set for alsa mode. This lockup is pretty reliable (9/10) and happens in both runlevels 3 and 5. The machine becomes entirely unresponsive (no ping or console response); also the second and third status LEDs (Caps lock and the one to its right) start flashing together twice per second. Please note attached oops log extracts, contemporary with the lockups.

One thing I have noticed though, is that there is no kern.info output that shows your second workaround is loading. The clause:

+ if (reset_workaround_2[dev]) {
+     snd_printdd(KERN_INFO "nm256: reset_workaround_2 activated\n");
+     chip->reset_workaround_2 = 1;
+ }

.. in the patch suggests that there should be. For paranoia, I have created a syslog stream purely for kern.=info and nothing from the snd-nm256 module is logged.
It is hard to debug this now, since I had to return the (now dead) hardware that I had... Perhaps you can run:

sysctl -w kernel.sysrq=1

Then, create the hang (probably best from a virtual console instead of from X). Once it is hung, hold down ALT+SysRq+P (note the EIP if nothing else) and maybe ALT+SysRq+W. It is probably easiest to capture the results over a serial console. I'm sorry! I wish I had a simpler/better idea... :-)
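On Linux, "sysctl -w kernel.sysrq=1" corresponds to writing 1 to /proc/sys/kernel/sysrq, so the setting can also be applied and checked from a script before provoking the hang. A minimal Python sketch, assuming root privileges; `enable_sysrq` is a hypothetical helper name, and the `path` argument exists only so the function can be tried against an ordinary file.

```python
from pathlib import Path


def enable_sysrq(path="/proc/sys/kernel/sysrq"):
    """Write 1 to the sysrq control file and return the value read back."""
    control = Path(path)
    control.write_text("1\n")
    # Reading back shows what the kernel actually accepted.
    return control.read_text().strip()


# Example (as root):
#     enable_sysrq()
```

The setting does not persist across reboots unless it is also added to /etc/sysctl.conf.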
I have a simpler idea still. Let's run with that patch and close the report as fixed :-) I have been beating seven shades of shellac out of this laptop since I installed the above kernel. It was only last night that I finally managed to crash it again and that was after being used as a media console for several weeks. Frankly, if it can stand that sort of punishment for that long and only crash after several weeks, it is doing better than some other platforms we could mention. Thanks and well done.
Well...if you are happy, then I am happy... :-)
Andrew, I have taken another nm256 update. This is mostly just for due diligence, but would you mind giving it a sanity check? I no longer have nm256 hardware... The kernels are here: http://people.redhat.com/linville/kernels/rhel4/ Please give them a try and post the results here...thanks!
John, is there a pointer to the patch that you've added somewhere? I'm now having issues with hangs in nm256 on the Latitude LS on _rawhide_, so if you took an upstream ALSA update or something it may be a problem. I'll try your RHEL4 test kernel when I'm back in the office Monday though.
Note that the patch you list on your people.redhat.com page doesn't seem to exist: http://people.redhat.com/linville/kernels/rhel4/patches/jwltest-nm256-2_6_16-rc5.patch returns 404
Sorry about that...my automated process isn't automated enough... :-( The patch is there now...thanks!
I have been running 2.6.9-34.1.EL.jwltest.121 for a couple of days now, using mplayer and xine to play mp3s and vids. All seems well. It boots clean and stays stable. Good job :-) When does this get into FC3/4 and RHEL?
Dan, have you had a chance to try my rhel4 test kernels?
Still working on that; tried RHEL4 install this morning but it borked due to anaconda LVM bugs when installing over an old installation. Will try to get that done today.
On the Latitude LS, the latest kernel you've posted appears to work fine. 2.6.9-34.2.EL.jwltest.123 Go for it.
committed in stream U4 build 34.20. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
Apologies for not getting back sooner. This kernel works fine against the sound chip induced lockups. Good work. In which versions of the RHEL4 and FC kernels does this patch make its debut?
For RHEL, you'll see this bug closed with a message stating that the fix has been incorporated into a RHEL quarterly update. For Fedora, I believe that it's fixed in the current kernels for FC5, at least?
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html