Description of problem:
This is for a Thinkpad Z60m upgraded from Fedora 7, though I suspect the hardware doesn't matter for this bug. When this laptop has been hibernated and then restored, sound stops working. On closer inspection I see that in fact the ACLs that permit me to use the sound device have been removed. I don't know which component is responsible, so udev is my first guess. Please pass this on to another component if you know better.

Version-Release number of selected component (if applicable):
udev-116-3.fc8

How reproducible:
Absolutely reliable so far

Steps to Reproduce:
1. Hibernate the laptop
2. Restore the laptop
3. Check access to sound device with e.g. getfacl /dev/snd/pcmC0D0c

Actual results:
No ACL for my user

# file: dev/snd/pcmC0D0c
# owner: root
# group: root
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---

Expected results:
Permissive ACL for my (logged in) user

# file: dev/snd/pcmC0D0c
# owner: root
# group: root
user::rw-
user:gdm:rw-
user:njl:rw-
group::rw-
mask::rw-
other::---

Additional info:
Sound worked after restoring in Fedora 7, but I don't know whether ACLs were used in that release.
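For anyone who wants to script the check in step 3, here is a minimal sketch in Python. It only parses text in the getfacl output format shown in this report; the function name `users_with_acl` and the two sample constants are hypothetical, and the samples are copied from the actual and expected results above.

```python
def users_with_acl(getfacl_output: str) -> dict[str, str]:
    """Map each named user in getfacl-style text to its permission string."""
    entries = {}
    for raw in getfacl_output.splitlines():
        # Drop "# file:" style header lines and trailing "#effective:" notes.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(":")
        # Named user entries look like "user:<name>:<perms>". The owner entry
        # "user::rw-" has an empty name and is skipped on purpose.
        if len(parts) == 3 and parts[0] == "user" and parts[1]:
            entries[parts[1]] = parts[2]
    return entries


# Sample text taken from the "Actual results" and "Expected results" above.
ACTUAL = """\
# file: dev/snd/pcmC0D0c
# owner: root
# group: root
user::rw-
user:gdm:rw-
group::rw-
mask::rw-
other::---
"""

EXPECTED = """\
# file: dev/snd/pcmC0D0c
# owner: root
# group: root
user::rw-
user:gdm:rw-
user:njl:rw-
group::rw-
mask::rw-
other::---
"""

if __name__ == "__main__":
    print("njl" in users_with_acl(ACTUAL))    # ACL missing after restore
    print("njl" in users_with_acl(EXPECTED))  # ACL present as it should be
```

On a live system the text would come from running getfacl on the device node. That step is left out here so the sketch stays self-contained.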
udev does not set the ACLs
I have the same problem, but sound works OK for root. This is almost certainly a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=376011

When the problem occurs, gnome-volume-control reports "No volume control GStreamer plugins and/or devices found." as in the above bug. But sound works for root. E.g.

luser@system$ aplay /usr/share/sounds/startup3.wav
ALSA lib confmisc.c:768:(parse_card) cannot find card '0'
ALSA lib conf.c:3510:(_snd_config_evaluate) function snd_func_card_driver returned error: No such device
ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
ALSA lib conf.c:3510:(_snd_config_evaluate) function snd_func_concat returned error: No such device
ALSA lib confmisc.c:1251:(snd_func_refer) error evaluating name
ALSA lib conf.c:3510:(_snd_config_evaluate) function snd_func_refer returned error: No such device
ALSA lib conf.c:3982:(snd_config_expand) Evaluate error: No such device
ALSA lib pcm.c:2145:(snd_pcm_open_noupdate) Unknown PCM default
aplay: main:546: audio open error: No such device

luser@system$ sudo aplay /usr/share/sounds/startup3.wav
Playing WAVE '/usr/share/sounds/startup3.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
*** Bug 376011 has been marked as a duplicate of this bug. ***
Does this problem go away if you switch to VT1 and then back? (from a root shell in the session you can do 'chvt 1; sleep 2; chvt 7')
(In reply to comment #4)
> Does this problem go away if you switch to VT1 and then back?

Yes!
yup, that fixes it here too.
Yes, the ACLs are put back when I switch away and back again, thanks David. Also I discovered that this isn't 100% reproducible after all. Sometimes it doesn't happen. I've yet to pin down what makes the difference.
Glad to hear the VT switching "fixes" it. I've been working on and off on a fix, but it's pretty difficult to reproduce this one...
Same problem and fix here too. Maybe this bug qualifies as a "known issue" (https://fedoraproject.org/wiki/Bugs/F8Common)? I'm sure many other laptop users are facing the same problem.
*** Bug 395581 has been marked as a duplicate of this bug. ***
*** Bug 314411 has been marked as a duplicate of this bug. ***
David: I can reproduce it in roughly 30% of cases. It doesn't only happen after resumes, but also on console switches (though less frequently); with fast user switching your chances of reproducing it are higher. Also, I observed that it happens more frequently on some machines and less frequently on others. As I seem to be able to reproduce this often, is there a way I can be helpful? Would output of hald running verbosely be helpful?
I had the idea to capture the D-Bus system bus messages to see what's actually going on when this happens. For a while after I did this, I did not see the problem. However, I notice that today the ACLs were not restored. Disappointingly, there is not much further clue beyond the absence of the expected ACLAdded messages following the ACLRemoved messages in the log. So I suppose we at least know that the software doesn't think it has restored the ACLs, and the question remains why it either isn't trying or doesn't succeed, most likely the former.

However, I do notice that in every case during a hibernation, nothing happens until after the subsequent restore. That is, the ACLs aren't even removed, let alone restored, until after the machine is defrosted from its hibernation.

What's the next step? Add some diagnostics to hal-acl-tool to see whether it's being called at all when this goes wrong? Which program actually runs hal-acl-tool, the hal daemon?
tialaramex: Actually what happens is that after two VT switches in a short time two instances of hal-acl-tool get spawned, and the later-created one can get scheduled sooner, so they run in reverse order. Currently hal passes the information on sessions to hal-acl-tool in environment variables (an optimization), which is incorrect, since the information can be out of date by the time hal-acl-tool applies it. My idea is to get the session information from ConsoleKit via D-Bus calls after locking the ACL list. Unless davidz does it, it can take some time, as I'm not very knowledgeable in this area.
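To illustrate the idea of reading the session state only after taking the lock, here is a minimal Python sketch. It is not the hal code: `ConsoleState` is a hypothetical stand-in for whatever ConsoleKit would report over D-Bus, and `run_tool` stands in for one hal-acl-tool invocation. The point is that an invocation carries no snapshot of its own, so the order in which the scheduler runs the two invocations no longer matters.

```python
import threading


class ConsoleState:
    """Hypothetical stand-in for the session state ConsoleKit would report."""

    def __init__(self, active: bool):
        self.active = active


def run_tool(acls: set, user: str, lock: threading.Lock, state: ConsoleState) -> None:
    """One tool invocation: take the lock first, then read the current state."""
    with lock:
        if state.active:
            acls.add(user)
        else:
            acls.discard(user)


def demo() -> set:
    acls = {"njl"}
    lock = threading.Lock()
    state = ConsoleState(active=True)
    workers = []
    # Two VT switches in quick succession: away (False), then back (True).
    # Each switch spawns one invocation, but neither has run yet.
    for new_active in (False, True):
        state.active = new_active
        workers.append(
            threading.Thread(target=run_tool, args=(acls, "njl", lock, state))
        )
    # Start the later-created worker first, mimicking the reversed scheduling.
    for worker in reversed(workers):
        worker.start()
    for worker in workers:
        worker.join()
    return acls


if __name__ == "__main__":
    print(demo())
```

Both invocations see the final state (session active), so the ACL for the user is still present at the end. The cost, as in the proposal above, is one extra query per invocation.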
My plan is to work on hal this week. The info in comment 14 is useful; I'll review the locking code and torture test the whole thing.
David: I committed the fix for F-8 to CVS [1]. Please have a short look at it; it fixed the issue for me. I tried several suspend/resumes, VT switches, and fast user switching. Mostly stolen from William Jon McCann.

[1] https://www.redhat.com/archives/fedora-extras-commits/2008-February/msg10485.html

If you don't object, we may push this for F-8, as it's pretty simple and the issue it fixes is serious enough, no matter what the more elegant fix for a future upstream version will be :)
Note that that patch also needs changes to the SELinux policy, as currently hal-acl-tool is not allowed to talk to D-Bus.
(In reply to comment #16)
> David: I committed the fix for F-8 to CVS [1]. Please have a short look at it, it
> fixed the issue for me, I tried several suspend/resumes, VT switches, fast user
> switching. Mostly stolen from William Jon McCann.

No, please avoid committing this for F-8. It fixes only the symptom, not the real bug. But thanks for the patch, testing and data points; they might be useful to get to the bottom of this bug.
David, so far as I can tell Lubomir has identified the bug, and his fix is unavoidable. The current hal-acl-tool design can't work because it assumes that sub-process execution is synchronous, which isn't true on any remotely modern computer. So it needs to be replaced, as is done in Lubomir's patch. What is the "real bug" that you think is the problem here, and how will your fix avoid the case where the hal-acl-tool acts at the wrong moment ?
(In reply to comment #19)
> David, so far as I can tell Lubomir has identified the bug, and his fix is
> unavoidable. The current hal-acl-tool design can't work because it assumes that
> sub-process execution is synchronous, which isn't true on any remotely modern
> computer. So it needs to be replaced, as is done in Lubomir's patch.

That's an interesting claim to make. FWIW, Lubomir's patch creates a ton of extra work because it gets information from CK via D-Bus instead of using the information passed from hald, which is already watching CK asynchronously. This information was specifically added to avoid doing all this work.

I think the bug is just that one or more hal-acl-tool processes get in the way of each other, e.g. that the locking is somehow broken.

> What is the "real bug" that you think is the problem here, and how will your fix
> avoid the case where the hal-acl-tool acts at the wrong moment ?

I won't have time to debug this until tomorrow.
(In reply to comment #20)
> That's an interesting claim to make. FWIW, Lubomir's patch creates a ton of
> extra work because it gets information from CK via D-Bus instead of using the
> information passed from hald who is already watching CK asynchronously. This
> information was specifically added to avoid doing all this work.

Yes, but I think the work is unavoidable with the current design.

> I think the bug is just that one or more hal-acl-tool processes gets in the
> way of each other e.g. that the locking is somehow broken.

Let me spell out a race condition, which I believe is commonplace.

1. Laptop lid closed, hibernate begins, console kit changes to 'false'
2. hal-acl-tool pid #846 is created to remove ACLs, but it doesn't run yet because the kernel is trying to hibernate.
3. Hibernation complete, power off
4. Restore initiates thawing, console kit back to 'true'
5. hal-acl-tool pid #851 is created to restore ACLs
6. hal-acl-tool pid #851 runs, ACLs are already present, nothing to do, exits
7. hal-acl-tool pid #846 is thawed, runs, removes ACLs

Now, this is contrary to what you might /expect/ to happen, since you started #846 first, but there we are: this is a pre-emptive multitasking operating system and things don't necessarily happen in the order you expected.

Any reliable fix for my bug report will need to address this race condition. Lubomir's fix addresses this race condition. Poking around in hal-acl-tool's own locking won't address the race condition. Maybe you'll find another bug, maybe you won't, but I think Lubomir's found the real cause of my trouble.

If you're sure that D-Bus messages are too expensive, your other option is to arrange for HAL to only run one hal-acl-tool at a time (always waiting for the previous one to complete before starting another) and queue up ACL changes until a new sub-process can be started. If you don't already have a utility sub-routine to do this sort of thing correctly, it will undoubtedly take some serious debugging to make this robust.
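The seven steps above can be replayed as a small Python sketch. This is a toy model, not hal code: `Invocation` and `replay` are hypothetical names, and each invocation carries the session state as it was when the process was spawned, which is how the thread describes the environment-variable design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    pid: int
    session_active: bool  # snapshot taken when the process was spawned


def apply_snapshot(acls: set, user: str, inv: Invocation) -> None:
    """Apply whatever the invocation was told at spawn time, stale or not."""
    if inv.session_active:
        acls.add(user)
    else:
        acls.discard(user)


def replay(run_order) -> set:
    """Run the invocations in the given order, starting with the ACL present."""
    acls = {"njl"}
    for inv in run_order:
        apply_snapshot(acls, "njl", inv)
    return acls


REMOVE = Invocation(pid=846, session_active=False)   # created in step 2
RESTORE = Invocation(pid=851, session_active=True)   # created in step 5

if __name__ == "__main__":
    print(replay([REMOVE, RESTORE]))  # the order the design assumes
    print(replay([RESTORE, REMOVE]))  # the order in steps 6 and 7
```

In spawn order the ACL survives. In the thawed order of steps 6 and 7 the stale "remove" runs last and the user ends up with no ACL, which matches the symptom in the original report.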
*** Bug 422751 has been marked as a duplicate of this bug. ***
*** Bug 431349 has been marked as a duplicate of this bug. ***
Did you find anything in your investigation David ? Lubomir, perhaps you can create a (Fedora 8) RPM with your change that interested parties can test while we wait a little while to see if David finds a more elegant solution? Also, you mentioned SELinux policy. Is the current situation that your patch does not function on systems where SELinux policy is enforced ? Or did I misunderstand.
tialaramex: the package has been built in koji for some time already. [1]

[1] http://koji.fedoraproject.org/koji/buildinfo?buildID=39956

I plan on pushing it to testing tomorrow unless davidz comes up with a better solution by then -- it's been three months since this was reported and it has caused a lot of trouble to laptop users. I'm always in favour of a fix from David!

When it comes to SELinux, you're right. In enforcing mode it won't allow hal-acl-tool to communicate via D-Bus' socket, and therefore it can't get information on seats and sessions from ConsoleKit, so effectively it won't work at all. Modifying the SELinux policy should be trivial though, so if we agree on the fix (depending on whether a solution comes from davidz), I'd do that.
> I plan pushing it to testing tomorrow No, please don't do this. Thanks.
*** Bug 397601 has been marked as a duplicate of this bug. ***
(In reply to comment #21)
> Let me spell out a race condition, which I believe is commonplace.
>
> 1. Laptop lid closed, hibernate begins, console kit changes to 'false'
> 2. hal-acl-tool pid #846 is created to remove ACLs, but it doesn't run yet
> because the kernel is trying to hibernate.
> 3. Hibernation complete, power off
> 4. Restore initiates thawing, console kit back to 'true'
> 5. hal-acl-tool pid #851 is created to restore ACLs
> 6. hal-acl-tool pid #851 runs, ACLs are already present, nothing to do, exits
> 7. hal-acl-tool pid #846 is thawed, runs, removes ACLs
>
> Now, this is contrary to what you might /expect/ to happen, since you started
> #846 first, but there we are, this is a pre-emptive multitasking operating
> system and things don't necessarily happen in the order you expected.

Right. The bug is really that we don't serialize the hal-acl-tool calls. I've done this now

http://gitweb.freedesktop.org/?p=hal.git;a=commitdiff;h=f047f03869b2f5d20de1eafdae02d4ebc6eddc06

and this fixes it for me (will land in Rawhide tomorrow with a ton of other fixes). Any chance anyone can check if this patch applies to the F8 srpm and if it fixes the problem? Thanks.
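As a rough illustration of what "serialize the calls" means, here is a minimal Python sketch. It is not the linked commit and makes no claim about how hal implements it: `SerializedRunner`, `request` and `on_exit` are hypothetical names. The runner launches at most one job at a time and starts the next queued job only from the completion callback of the previous one.

```python
from collections import deque


class SerializedRunner:
    """Launch at most one job at a time; queue the rest in request order."""

    def __init__(self, start):
        self._start = start      # callable that launches one job
        self._pending = deque()  # jobs waiting for the running one to exit
        self._running = None

    @property
    def idle(self) -> bool:
        return self._running is None

    def request(self, job) -> None:
        """Launch the job now if nothing is running, otherwise queue it."""
        if self._running is None:
            self._launch(job)
        else:
            self._pending.append(job)

    def on_exit(self) -> None:
        """Completion callback: the running job finished, start the next one."""
        self._running = None
        if self._pending:
            self._launch(self._pending.popleft())

    def _launch(self, job) -> None:
        self._running = job
        self._start(job)


if __name__ == "__main__":
    launched = []
    runner = SerializedRunner(launched.append)
    runner.request("remove ACLs")   # launched immediately
    runner.request("restore ACLs")  # queued behind the first job
    runner.on_exit()                # first job done, second one launched
    print(launched)
```

Because the second job cannot start until the first has exited, the "remove" in the race above always finishes before the "restore" begins, whatever the scheduler does with the processes themselves.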
From reading the patch this fix looks correct assuming that the callback is the only way the affected code can be entered asynchronously -- which I can't verify without reading the rest of the HAL code. Also I assumed that the callback isn't actually running in signal handler context (for SIGCHLD) but just afterwards, since it does far too much and too dangerous work for a signal handler. I have been running Lubomir's RPMs for a few days now without problems, but this fix is more in the spirit of the original design. If no-one else does it then I will try to find time this month to look at getting my RPM build environment working again and test your patch on my F8 laptop.
hal-0.5.10-1.fc8.2 has been submitted as an update for Fedora 8
hal-0.5.10-1.fc8.2 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update hal'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-2246
I'm interested in this. I think I'm seeing another symptom of the same problem.

When I start X from run level 3 using startx, sound doesn't work: I get the gnome-volume-control "No volume control GStreamer plugins and/or devices found." message.

However, if I start X from run level 5, sound works fine.

I suspect I may be seeing other symptoms of the same issue from VMWare.
(In reply to comment #32)
> I'm interested in this. I think I'm seeing another symptom of the same problem.
>
> When I start X from run level 3 using startx sound doesn't work, I get the
> gnome-volume-control reports "No volume control GStreamer plugins and/or devices
> found." message.
>
> However if I start X from run level 5 sound works fine.
>
> I suspect I may be seeing other symptoms of the same issue from VMWare.

What makes you believe it is a symptom of this problem? Launch "ck-list-sessions" to see if ConsoleKit knows about your session. I suspect it won't. Do not use startx; use gdm. Or maybe reusing the same VT would help?
hal-0.5.10-1.fc8.2 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.
Hello, I am seeing the issue of "no sound after resume" in Fedora 11 on a T60P. Killing and restarting PulseAudio solves the issue. The desktop is KDE 4.3.2. Please let me know which output files I should attach here. Thanks.