Bug 326411
Summary: | Freeze On Boot w/ Audigy PCMCIA | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Edoardo Patelli <etheban> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Priority: | low |
Version: | 8 | CC: | danieleg4, etheban, jarin.franek, kelvinhbo4, redhat-bugzilla, roos, triplehaata, ttn |
Hardware: | All | OS: | Linux |
Fixed In Version: | 2.6.25.6-27.fc8 | Doc Type: | Bug Fix |
Last Closed: | 2008-06-20 19:05:02 UTC | | |
Attachments: |
|
Description
Edoardo Patelli
2007-10-10 14:44:44 UTC
audigy 2 zs pcmcia works fine on (fedora7 + yum update) but freezes on (fedora8 x86_64 + yum update) on my laptop.

I think it is about the kernel (regardless of Fedora version):
2.6.20 and later 2.6.22.x worked well
2.6.21 and 2.6.23 have the boot issue with the SB PCMCIA
I have got i686 architecture (HP nx7000 notebook), so it is not x86_64 specific.
Hot-plug/unplug never worked (kernel Oops, panic).

Yes, I verified on my laptop HP dv8051 turion 64:
2.6.23 has the boot issue with audigy pcmcia regardless of fedora version (fedora7/8) and architecture (i386/x86_64)
2.6.22 works well

(In reply to comment #2)
> I think it is about the kernel (regardless of Fedora version):

Same freeze in the boot sequence (at udev) with the Audigy 2 ZS Notebook inserted, and a freeze on insert when on the desktop (i.e. hot-plug). Boot continues if the card is removed. On the desktop it unfreezes when unplugged - after that a lot of lines like:
---
Jan 4 20:54:04 localhost kernel: snd-emu10k1: Suspected sound card removal
Jan 4 20:55:05 localhost kernel:last message repeated 4004 times
Jan 4 20:56:06 localhost kernel:last message repeated 4023 times
Jan 4 20:57:07 localhost kernel:last message repeated 4018 times
Jan 4 20:58:08 localhost kernel:last message repeated 4216 times
Jan 4 20:59:09 localhost kernel:last message repeated 5319 times
---
in /var/log/messages until next reboot.
Tried additional kernel parameters: cbmemsize=256M cbiosize=4096. No noticeable change.

System information: Dell Latitude D800/kernel-2.6.23.9-85.fc8
$ cat /proc/asound/version
Advanced Linux Sound Architecture Driver Version 1.0.15 (Tue Nov 20 19:16:42 2007 UTC).

Rgds,
Tero

PS. Used to work ok for me too under Fc6. Failed with fc7 and now fails with fc8.

Created attachment 294384 [details]
log file of soundcard detection
After the last update of F8 I am able to start the laptop with the PCMCIA cards
plugged in. Hot-plug also seems to work. Moreover, it seems to work with all
2.6.23.* and 2.6.24.* kernel versions.
However, the sound card is still not working. I added the log file generated by
the soundcard detection program.
To Edoardo:
If I were you I would first take care of the broken partition (see dmesg):
EXT3-fs warning: mounting unchecked fs, running e2fsck is recommended
EXT3 FS on sda5, internal journal
...
EXT3-fs error (device sda5): htree_dirblock_to_tree: bad entry in directory #10246640: rec_len % 4 != 0 - offset=0, inode=1852793951, rec_len=101, name_len=115

After securing your data, please attend to the CPUs: I noticed that you have two CPU cores, which may be a different situation from having only a single CPU (my case). If one of your CPUs is stuck, the other still executes kernel threads. Is it possible that you boot up with one CPU stuck and the other taking all the burden of running the system? Just a crazy idea... Check top or gkrellm.

Now a bit more seriously: I saw a problem with initializing and probing the card:
cannot find the slot for index 0 (range 0-7), error: -16
EMU10K1_Audigy: probe of 0000:0b:00.0 failed with error -12
The first one is error 16 (EBUSY) from snd_card_new() and it means that you want to create a snd_card kernel object at slot 0, however that slot is locked at the moment (snd_cards_lock bit is set). Looks like a nice synchronization issue :-) Maybe I will have some time to look at it... Anyway, the result is that the kernel object is freed and a NULL pointer is returned. Since this was called from the probe, which gets a NULL pointer instead of a pointer to a proper snd_card object, the probe thinks that there was not enough memory to allocate the object and returns 12, i.e. ENOMEM. Wow, no snd_card object, no sound. That may be the reason why your SB appears to be dead.

Thanks Jaroslav for your comments. I have also tested Fedora 9 (rawhide) but nothing changed. The system still freezes during boot at udev! (regardless of Fedora version) Will we see a solution to this bug one day?

Add another vote for me: Hangs at udev on startup; card insertion hangs system too.
Kernel 2.6.24.3-34.fc8 on an x86_64 laptop (Ferrari 4000)

I took a few kernel rpms from koji.fedoraproject.org to find the approximate origin of the bug. Here is the result:
latest working kernel: kernel-2.6.23-0.15.rc0.git1.fc8
first broken kernel: kernel-2.6.23-0.35.rc0.git6.fc8
If anything went wrong, it must have been between them. All later kernels, including Fedora 9's 2.6.25, froze at boot with the Audigy plugged in.
Building vanilla kernels using Fedora .config(s) I got a rougher interval:
kernel OK: 2.6.22
kernel broken: 2.6.23-rc1 and later
It is difficult to bisect vanilla kernels, since Fedora changed the .config too between 2.6.22 and 2.6.23. Too many possibilities to test. Takes ages to build a kernel on my aging laptop...

kernel-2.6.23-0.15.rc0.git1.fc8 only /seems/ to work because it is not in fact loading any modules. If you follow the boot process, you'll see that it attempts and fails to find /lib/modules/2.6.22-0.15.rc0.git1.fc8/modules.dep (note: 2.6.22 is the /wrong/ version). Why it does this, I have no idea, but looking at the spec there seems to be a lot of hacky stuff going on WRT modifying the version number based on whether or not it is a release package. If you do:
ln -s /lib/modules/2.6.23-0.15.rc0.git1.fc8 /lib/modules/2.6.22-0.15.rc0.git1.fc8
then reboot with the Audigy card still inserted, you'll find that it hangs just as before.
Like others here, I skipped Fedora 7, and upgraded (clean) from 6 to 8, so I have no idea exactly when this broke. I'll just have to keep regressing back through older kernels until I find one that works, I suppose.

You are right. I was overexcited to get it past the udev phase so I forgot to check dmesg. Sorry for the misinformation. None of the 2.6.23 or later Fedora kernels work for me...
In the meantime, the bisection of the vanilla kernel led me to the set of CFS patches at the very beginning of the 2.6.23 development stage. This is in accordance with the fact that kernel-2.6.23-0.15.rc0.git1.fc8 is not booting.
Few more steps to go, but I am afraid that the CFS scheduler only triggered a landmine somewhere else in the code.

I am still on 2.6.22.9-91.fc7, the last Fedora release kernel that boots with the SB Audigy ZS Notebook plugged in. I even tried the Fedora 9 Preview live DVD, kernel 2.6.25 I guess, but boot froze in udev, too.

I'm going to do a prep-build of the 2.6.22.9-91.fc7 and 2.6.23.1-10.fc7 SRPMS, then diff the relevant sections (emu10k1), to see if anything obvious changed, but if the bug is a knock-on effect from something else (e.g. scheduler) then I'm out of my depth. The comments from kernel bugzilla 9304 seem to indicate that this is a timing issue.

The latest news: It is not the CFS scheduler that triggered the landmine. It is most probably a different kernel .config. The whole bisection led me back to 2.6.22. I tried to compile vanilla 2.6.22 with the Fedora config: config-2.6.22.9-91.fc7. The resulting kernel boots successfully and I am using it at the moment. When I tried a .config taken from Fedora 2.6.23-rc0.something, modified slightly during bisection, the very same 2.6.22 code compiled into a kernel that hangs in udev during boot the same way the Fedora release kernels 2.6.23+ do. This gives the search another dimension: now it is necessary to carefully examine the differences between the .configs and find the offending one (or more). More kernel builds, my laptop would love me... It is hot already :-)

I reckon you do not need to examine the source differences between 2.6.22.9 and 2.6.23.1. It won't give you the answer. There were a couple of thousand patches between them... The problem from kernel bugzilla 9304 should be fixed in 2.6.24 (-rc4?). I tried with Fedora 8 kernel 2.6.24.3 on another notebook, but the kernel froze during boot as usual (udev stage).

Got it. Well, almost... The offending configuration item is:
CONFIG_DEBUG_SHIRQ
The flag was activated for 2.6.23 and later Fedora kernels. It causes the landmine to explode.
WORKAROUND (NOT A SOLUTION):
For you guys who want to make use of the Audigy with your laptop, there is a simple workaround until the proper solution is available: when the Fedora kernel is rebuilt with the CONFIG_DEBUG_SHIRQ flag not set, the Soundblaster Audigy ZS notebook no longer freezes the boot, or the system on hot-plug in/out. I have tested this with both Fedora 7 (2.6.23.17-82, i686) and Fedora 8 (2.6.24.7-92, x86_64) kernels. Note that this is a 'probabilistic' approach: it may still explode, but the probability is rather insignificant.

WHAT IS GOING ON:
There is an inherent synchronization gap when creating a device object that uses a shared interrupt (irq) handler. The handler is registered from the "device_create" function, possibly somewhere in the middle, when the device object is not fully constructed. However, once the handler is registered, it can be triggered at any time (other devices may generate interrupts). Linux calls all handlers, passes them the appropriate device object, and lets them decide whether to handle the irq or not (usually by polling some device register). The irq handler must be able to cope with the situation where the device object is not fully constructed (upon device plug-in, boot) or partially destroyed (upon unplugging). CONFIG_DEBUG_SHIRQ inserts code that triggers the irq handler immediately after registration within the request_irq() call. It seems that this 'early' interrupt gives the system a fatal blow when the Audigy 2 ZS Notebook PCMCIA card is involved. I am still doing some research to pinpoint the location of the landmine more precisely.

OTHER INFO:
The Linux kernel bug #9304 seems not to be relevant. It was fixed in 2.6.24, but did not help to resolve this Audigy boot issue. I could not test it on newer kernels, e.g. Fedora 9's 2.6.25. If anyone could, please post your results.

This config also causes problems elsewhere, e.g.
bug #362621 which was also filed upstream by Chuck:
http://kerneltrap.org/mailarchive/linux-kernel/2008/2/29/1032864
I take it upstream will handle this eventually, or should we look at patching emu10k1? Would just disabling CONFIG_DEBUG_SHIRQ cause too many problems elsewhere?

Wicked as it sounds, CONFIG_DEBUG_SHIRQ is there to cause such 'problems' on purpose, to force as many device drivers as possible to be fixed. Disabling CONFIG_DEBUG_SHIRQ would be counterproductive, as it would allow certain drivers' sync bugs to live happily ever after and strike unexpectedly. The way to go is to patch emu10k1 (provided the problem is there, but the probability is quite high).
Last night I pinpointed the freeze point to be in sound/pci/emu10k1/irq.c, snd_emu10k1_interrupt():
while (((status = inl(emu->port + IPR)) != 0) && (timeout < 1000)) {
Early reading from the I/O port seems to be the problem. I have scheduled a few more experiments to be sure to blame the emu10k1 setup routine (there is also some pci stuff involved...).

Fedora 9 x86_64 2.6.25-3-18 with kernel option CONFIG_DEBUG_SHIRQ off works. Thank you so much Jaroslav!! I can confirm that hot-plugging works as well.

Glad to hear that :-)
After a few more nasty experiments I am sure to blame the emu10k1 setup routine. Now I have a pretty good idea what is going on. I have created and tested a patch. It works on Fedora 7 (2.6.23-17, i686, single-core), but without hot-plug - it Oopses like 2.6.22 kernels did. But on 2.6.25 hot-plug is no longer an experimental feature so I hope it would work well. Tomorrow I am going to test it on Fedora 8 (2.6.24.7, x86_64, dual-core), and if it passes the tests I will submit the patch upstream for proper flaming. Let's hope this longstanding issue gets resolved soon.

I should probably file a new bug on this, but IMHO CONFIG_DEBUG_SHIRQ should not be set on production/release kernels, since it is deliberately designed to break non-compliant drivers.
Now certainly this is desirable from a development POV, and those buggy drivers will be identified and fixed more quickly with this set, since the entire userbase will be exposed to it, but I question the prudence of deliberately breaking end users' (non-testers') systems. I'm inclined to suggest that this config should be reserved for rawhide and release candidates only. Opinions?

Good news: My patch was accepted by the ALSA maintainers and merged into the upstream Linux kernel in 2.6.26-rc5-git2. The patch moves the irq handler registration after the I/O ports activation, thus eliminating the hangup in the irq handler. So now the SB Audigy2 PCMCIA works well even with CONFIG_DEBUG_SHIRQ set to 'y'.
To Fedora guys: If you decide to silence the CONFIG_DEBUG_SHIRQ flag, there is no urgent need for the patch in Fedora 8, 9, as the probability of system hangup is decently low. Otherwise you can grab the patch here:
http://git.alsa-project.org/?p=alsa-kernel.git;a=commitdiff;h=7d87500bbe68c2176197d039f3655301ad678db6;hp=bbdb913e91f0c2b301f124200f85f96fffb5c7ed
Anyway, you now have enough information to deal with this bug quickly ;-)

Nice work :) +1 submit for testing.

I'm pretty new to Linux, so does comment 20 mean that come kernel 2.6.26 the problem should be taken care of?

Fixed in 2.6.25.6-21

kernel-2.6.25.6-24.fc8 has been submitted as an update for Fedora 8

kernel-2.6.25.6-24.fc8 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-5267

kernel-2.6.25.6-27.fc8 has been submitted as an update for Fedora 8

kernel-2.6.25.6-27.fc8 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.
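
For readers who want to see the ordering problem from this report in isolation, here is a minimal user-space sketch in C. It is not the emu10k1 driver and not the merged patch. Every name in it (fake_card, read_status, fake_interrupt, fake_request_irq, setup_old_order, setup_new_order) is hypothetical, and the "inactive port reads as all ones" behaviour is an assumption of the model only. On the real hardware the early port read appears to have hung the machine, which this model does not reproduce. It only shows why a handler that fires at registration time must not run before the I/O ports are activated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sound card. */
struct fake_card {
    bool ports_enabled;   /* set once the I/O ports have been activated */
    unsigned pending;     /* pending interrupt bits, valid only when enabled */
};

/* Models a status-register read. ASSUMPTION: an inactive port reads as
 * all ones; an active port returns the pending bits and clears them. */
static unsigned read_status(struct fake_card *c)
{
    if (!c->ports_enabled)
        return 0xffffffffu;
    unsigned s = c->pending;
    c->pending = 0;
    return s;
}

/* Handler shaped like the loop quoted in the thread: poll the status
 * until it reads zero or the timeout expires. Returns the poll count. */
static int fake_interrupt(struct fake_card *c)
{
    unsigned status;
    int timeout = 0;

    assert(c != NULL);
    while (((status = read_status(c)) != 0) && (timeout < 1000))
        timeout++;
    return timeout;
}

/* Models registration with the debug option on: the handler is invoked
 * once, immediately, as part of registering it. */
static int fake_request_irq(struct fake_card *c, bool debug_shirq)
{
    return debug_shirq ? fake_interrupt(c) : 0;
}

/* Order described as broken: register the handler, then enable ports. */
static int setup_old_order(struct fake_card *c)
{
    int polls = fake_request_irq(c, true);
    c->ports_enabled = true;
    return polls;
}

/* Order described for the fix: enable ports, then register the handler. */
static int setup_new_order(struct fake_card *c)
{
    c->ports_enabled = true;
    return fake_request_irq(c, true);
}
```

Calling setup_old_order() on a fresh fake_card makes the handler spin until its timeout because it polls a device that is not ready, while setup_new_order() lets the same handler return at once. Passing false to fake_request_irq() corresponds to the workaround of building without the debug option: the early call simply never happens, which hides the ordering bug rather than fixing it.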