Red Hat Bugzilla – Full Text Bug Listing
|Summary:||Hal occasionally does not restore ACLs|
|Product:||[Fedora] Fedora||Reporter:||Nick Lamb <redhat>|
|Component:||hal||Assignee:||David Zeuthen <davidz>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Version:||8||CC:||bche, belegdol, bojan, chris, clasohm, cra, dbaron, jhutar, lakshminaras2002, lkundrak, luis, mads, mcepl, mcepl, mclasen, pierre-bugzilla, sankarshan.mukhopadhyay, steve|
|Fixed In Version:||0.5.10-1.fc8.2||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2008-03-13 03:42:52 EDT||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:|
|Bug Blocks:||397601, 434909|
Description Nick Lamb 2007-11-15 05:04:22 EST
Description of problem:
This is for a ThinkPad Z60m upgraded from Fedora 7, though I suspect the hardware doesn't matter for this bug. When this laptop has been hibernated and then restored, sound stops working. On closer inspection I see that the ACLs that permit me to use the sound device have in fact been removed. I don't know which component is responsible, so udev is my first guess. Please pass this on to another component if you know better.

Version-Release number of selected component (if applicable):
udev-116-3.fc8

How reproducible:
Absolutely reliable so far

Steps to Reproduce:
1. Hibernate the laptop
2. Restore the laptop
3. Check access to the sound device with e.g. getfacl /dev/snd/pcmC0D0c

Actual results:
No ACL for my user

    # file: dev/snd/pcmC0D0c
    # owner: root
    # group: root
    user::rw-
    user:gdm:rw-
    group::rw-
    mask::rw-
    other::---

Expected results:
Permissive ACL for my (logged-in) user

    # file: dev/snd/pcmC0D0c
    # owner: root
    # group: root
    user::rw-
    user:gdm:rw-
    user:njl:rw-
    group::rw-
    mask::rw-
    other::---

Additional info:
Sound worked after restoring in Fedora 7, but I don't know whether ACLs were used in that release.
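Step 3 can be scripted. The following is a minimal sketch, assuming the plain-text getfacl output shown above; `named_user_perms` is a hypothetical helper, not part of any Fedora or acl-package tool. It also applies the POSIX ACL mask rule, since a named-user entry only grants what the `mask::` entry allows.

```python
from typing import Optional


def named_user_perms(getfacl_output: str, user: str) -> Optional[str]:
    """Effective permissions (e.g. 'rw-') that a getfacl listing grants a
    named user, or None if there is no 'user:<name>:' entry at all.

    Hypothetical helper for illustration only.
    """
    entry: Optional[str] = None
    mask: Optional[str] = None
    for raw in getfacl_output.splitlines():
        # Drop '# file:' header lines and trailing '#effective:' comments.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(":")
        if len(parts) != 3:
            continue  # e.g. 'default:...' entries on directories
        tag, qualifier, perms = parts
        if tag == "user" and qualifier == user:
            entry = perms
        elif tag == "mask":
            mask = perms
    if entry is None:
        return None
    if mask is None:
        return entry
    # Named-user entries are limited by the mask: keep a bit only if both allow it.
    return "".join(e if m != "-" else "-" for e, m in zip(entry, mask))
```

To check a live device, the text could come from `subprocess.run(["getfacl", "/dev/snd/pcmC0D0c"], capture_output=True, text=True).stdout`. On the listings above, the "Actual results" output yields `None` for `njl` and the "Expected results" output yields `"rw-"`.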
Comment 1 Harald Hoyer 2007-11-15 05:19:38 EST
udev does not set the ACLs
Comment 2 Jack Spaar 2007-11-17 19:47:10 EST
I have the same problem, but sound works OK for root. This is almost certainly a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=376011

When the problem occurs, gnome-volume-control reports "No volume control GStreamer plugins and/or devices found." as in the above bug. But sound works for root, e.g.:

    luser@system$ aplay /usr/share/sounds/startup3.wav
    ALSA lib confmisc.c:768:(parse_card) cannot find card '0'
    ALSA lib conf.c:3510:(_snd_config_evaluate) function snd_func_card_driver returned error: No such device
    ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
    ALSA lib conf.c:3510:(_snd_config_evaluate) function snd_func_concat returned error: No such device
    ALSA lib confmisc.c:1251:(snd_func_refer) error evaluating name
    ALSA lib conf.c:3510:(_snd_config_evaluate) function snd_func_refer returned error: No such device
    ALSA lib conf.c:3982:(snd_config_expand) Evaluate error: No such device
    ALSA lib pcm.c:2145:(snd_pcm_open_noupdate) Unknown PCM default
    aplay: main:546: audio open error: No such device
    luser@system$ sudo aplay /usr/share/sounds/startup3.wav
    Playing WAVE '/usr/share/sounds/startup3.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
Comment 3 Luis Villa 2007-11-17 20:04:56 EST
*** Bug 376011 has been marked as a duplicate of this bug. ***
Comment 4 David Zeuthen 2007-11-17 20:09:19 EST
Does this problem go away if you switch to VT1 and then back? (from a root shell in the session you can do 'chvt 1; sleep 2; chvt 7')
Comment 5 Jack Spaar 2007-11-17 20:20:48 EST
(In reply to comment #4)
> Does this problem go away if you switch to VT1 and then back?

Yes!
Comment 6 Luis Villa 2007-11-17 22:10:18 EST
yup, that fixes it here too.
Comment 7 Nick Lamb 2007-11-19 04:45:33 EST
Yes, the ACLs are put back when I switch away and back again, thanks David. Also I discovered that this isn't 100% reproducible after all. Sometimes it doesn't happen. I've yet to pin down what makes the difference.
Comment 8 David Zeuthen 2007-11-19 11:02:06 EST
Glad to hear the VT switching "fixes" it. I've been working on and off for a fix but it's pretty difficult to reproduce this one...
Comment 9 thu992 2007-11-20 06:09:32 EST
Same problem and fix here too. Maybe this bug qualifies as a "known issue" (https://fedoraproject.org/wiki/Bugs/F8Common)? I'm sure many other laptop users are facing the same problem.
Comment 10 Will Woods 2007-12-03 15:39:25 EST
*** Bug 395581 has been marked as a duplicate of this bug. ***
Comment 11 Lubomir Kundrak 2008-01-25 06:30:28 EST
*** Bug 314411 has been marked as a duplicate of this bug. ***
Comment 12 Lubomir Kundrak 2008-01-25 06:34:16 EST
David: I can reproduce it in roughly 30% of cases. It doesn't only happen after resumes but also on console switches (though less frequently), and with fast user switching your chances of reproducing it are higher. I have also observed that it happens more frequently on some machines and less frequently on others. As I seem to be able to reproduce this often, is there a way I can be helpful? Would output from hald running in verbose mode be helpful?
Comment 13 Nick Lamb 2008-02-25 19:40:11 EST
I had the idea to capture the D-Bus system bus messages to see what's actually going on when this happens. For a while after I did this, I did not see the problem. However, I notice that today the ACLs were not restored. Disappointingly, there is not much further clue beyond the absence of the expected ACLAdded messages following the ACLRemoved messages in the log. So I suppose we at least know that the software doesn't think it has restored the ACLs, and the question remains whether it isn't trying or doesn't succeed, most likely the former. However, I do notice that in every case during hibernation, nothing happens until after the subsequent restore. That is, the ACLs aren't even removed, let alone restored, until after the machine is defrosted from its hibernation. What's the next step? Add some diagnostics to hal-acl-tool to see whether it's being called at all when this goes wrong? Which program actually runs hal-acl-tool, the HAL daemon?
Comment 14 Lubomir Kundrak 2008-02-26 01:01:15 EST
tialaramex: What actually happens is that after two VT switches in quick succession, two instances of hal-acl-tool get spawned, and the later-created one can get scheduled sooner, so they run in reverse order. Currently hal passes the session information to hal-acl-tool in environment variables (an optimization), which is incorrect, since that information can be stale by the time hal-acl-tool applies it. My idea is to get the session information from ConsoleKit via D-Bus calls after locking the ACL list. Unless davidz does it, it may take some time, as I'm rather inexperienced in this area.
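The ordering problem described above can be illustrated with a toy simulation. This is a hypothetical model, not hal's code: a boolean stands in for the session data hal copies into hal-acl-tool's environment, and `active_at_run_time` stands in for asking ConsoleKit over D-Bus after taking the ACL lock, as proposed.

```python
from typing import List


def final_acl_present(run_order: List[bool], active_at_run_time: bool,
                      use_snapshot: bool) -> bool:
    """Toy model of the stale-environment race (hypothetical, not hal code).

    run_order lists, in the order the helpers actually ran, the 'session
    active' value each helper was handed at spawn time (the environment
    snapshot). A helper grants the ACL if it thinks the session is active
    and revokes it otherwise.
    """
    acl_present = True  # the user held the ACL before the VT switches
    for snapshot in run_order:
        if use_snapshot:
            active = snapshot  # trust what was true when the helper was spawned
        else:
            active = active_at_run_time  # re-read the state once the lock is held
        acl_present = active
    return acl_present


# Two quick VT switches: helper A is spawned with active=False, then helper B
# with active=True, but B gets scheduled first, so they run as [B, A].
reverse_order = [True, False]
print(final_acl_present(reverse_order, True, use_snapshot=True))   # False
print(final_acl_present(reverse_order, True, use_snapshot=False))  # True
```

With the snapshot, whichever helper runs last applies stale state and the ACL is lost; in this model, re-reading the state at run time makes the run order irrelevant, at the cost of the extra D-Bus round trips that David objects to in comment 20.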
Comment 15 David Zeuthen 2008-02-26 01:14:31 EST
My plan is to work on hal this week. The info in comment 14 is useful; I'll review the locking code and torture test the whole thing.
Comment 16 Lubomir Kundrak 2008-02-26 03:36:29 EST
David: I committed the fix for F-8 to CVS. Please have a quick look at it; it fixed the issue for me, and I tried several suspend/resumes, VT switches and fast user switching. It is mostly stolen from William Jon McCann.  https://www.redhat.com/archives/fedora-extras-commits/2008-February/msg10485.html If you don't object, we may push this for F-8, as it's pretty simple and the issue it fixes is serious enough, regardless of what the more elegant fix for a future upstream version turns out to be :)
Comment 17 Lubomir Kundrak 2008-02-26 03:39:51 EST
Note that the patch also needs changes to the SELinux policy, as hal-acl-tool is currently not allowed to talk to D-Bus.
Comment 18 David Zeuthen 2008-02-26 10:47:47 EST
(In reply to comment #16)
> David: I commited the fix for F-8 to CVS . Please have a short look at it, it
> fixed the issue for me, I tried several suspend/resumes, VT switches, fast user
> switching. Mostly stolen from William Jon McCann.

No, please avoid committing this for F-8. It fixes only the symptom, not the real bug. But thanks for the patch, testing and data points; they might be useful for getting to the bottom of this bug.
Comment 19 Nick Lamb 2008-02-26 11:26:28 EST
David, so far as I can tell Lubomir has identified the bug, and his fix is unavoidable. The current hal-acl-tool design can't work because it assumes that sub-process execution is synchronous, which isn't true on any remotely modern computer. So it needs to be replaced, as is done in Lubomir's patch. What is the "real bug" that you think is the problem here, and how will your fix avoid the case where the hal-acl-tool acts at the wrong moment ?
Comment 20 David Zeuthen 2008-02-26 11:43:29 EST
(In reply to comment #19)
> David, so far as I can tell Lubomir has identified the bug, and his fix is
> unavoidable. The current hal-acl-tool design can't work because it assumes that
> sub-process execution is synchronous, which isn't true on any remotely modern
> computer. So it needs to be replaced, as is done in Lubomir's patch.

That's an interesting claim to make. FWIW, Lubomir's patch creates a ton of extra work because it gets information from CK (ConsoleKit) via D-Bus instead of using the information passed from hald, which is already watching CK asynchronously. This information was specifically added to avoid doing all this work. I think the bug is just that one or more hal-acl-tool processes get in the way of each other, i.e. that the locking is somehow broken.

> What is the "real bug" that you think is the problem here, and how will your fix
> avoid the case where the hal-acl-tool acts at the wrong moment ?

I won't have time to debug this until tomorrow.
Comment 21 Nick Lamb 2008-02-26 15:08:15 EST
(In reply to comment #20)
> That's an interesting claim to make. FWIW, Lubomir's patch creates a ton of
> extra work because it gets information from CK via D-Bus instead of using the
> information passed from hald who is already watching CK asynchronously. This
> information was specifically added to avoid doing all this work.

Yes, but I think the work is unavoidable with the current design.

> I think the bug is just that one or more hal-acl-tool processes gets in the
> way of each other e.g. that the locking is somehow broken.

Let me spell out a race condition, which I believe is commonplace.

1. Laptop lid closed, hibernate begins, console kit changes to 'false'
2. hal-acl-tool pid #846 is created to remove ACLs, but it doesn't run yet because the kernel is trying to hibernate.
3. Hibernation complete, power off
4. Restore initiates thawing, console kit back to 'true'
5. hal-acl-tool pid #851 is created to restore ACLs
6. hal-acl-tool pid #851 runs, ACLs are already present, nothing to do, exits
7. hal-acl-tool pid #846 is thawed, runs, removes ACLs

Now, this is contrary to what you might /expect/ to happen, since you started #846 first, but there we are: this is a pre-emptive multitasking operating system, and things don't necessarily happen in the order you expected.

Any reliable fix for my bug report will need to address this race condition. Lubomir's fix addresses it; poking around in hal-acl-tool's own locking won't. Maybe you'll find another bug, maybe you won't, but I think Lubomir has found the real cause of my trouble.

If you're sure that D-Bus messages are too expensive, your other option is to arrange for HAL to run only one hal-acl-tool at a time (always waiting for the previous one to complete before starting another) and to queue up ACL changes until a new sub-process can be started.
If you don't already have a utility sub-routine to do this sort of thing correctly it will undoubtedly take some serious debugging to make this robust.
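The alternative above, running one hal-acl-tool at a time and queueing requests, can be sketched as a small dispatcher. This is a hypothetical Python model, not HAL's C implementation: `spawn` stands in for launching the helper, and `on_child_exited` for the main-loop callback that fires when the helper exits. `replay_hibernate_race` replays steps 2 and 5 of the race under this scheme.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple


class SerializedSpawner:
    """Start at most one helper at a time, strictly in submission order.

    Hypothetical model: `spawn` stands in for launching hal-acl-tool, and
    `on_child_exited` for the event-loop callback that fires when it exits.
    """

    def __init__(self, spawn: Callable[[str], None]) -> None:
        self._spawn = spawn
        self._pending: Deque[str] = deque()
        self._running: Optional[str] = None

    def submit(self, request: str) -> None:
        # Queue the request; start it right away only if nothing is running.
        self._pending.append(request)
        self._start_next_if_idle()

    def on_child_exited(self) -> None:
        # The previous helper finished, so the next queued one may start.
        self._running = None
        self._start_next_if_idle()

    def _start_next_if_idle(self) -> None:
        if self._running is None and self._pending:
            self._running = self._pending.popleft()
            self._spawn(self._running)


def replay_hibernate_race() -> Tuple[List[str], List[str], bool]:
    """Replay the hibernate/resume sequence under serialization."""
    started: List[str] = []
    acl = {"njl"}  # ACL present before hibernation

    spawner = SerializedSpawner(started.append)
    spawner.submit("remove")  # step 2: session inactive, helper not yet run
    spawner.submit("add")     # step 5: session active again
    started_before_exit = list(started)  # the "add" helper is still queued

    # Let each started helper "run" and exit, in the order it was started.
    applied = 0
    while applied < len(started):
        if started[applied] == "add":
            acl.add("njl")
        else:
            acl.discard("njl")
        applied += 1
        spawner.on_child_exited()
    return started_before_exit, started, "njl" in acl
```

Because the "add" helper cannot start until the "remove" helper has exited, a late-running "remove" can no longer undo a newer "add", which is the inversion in steps 6 and 7. The FIFO queue means the most recent request is always the last one applied.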
Comment 22 Lubomir Kundrak 2008-02-28 12:54:08 EST
*** Bug 422751 has been marked as a duplicate of this bug. ***
Comment 23 Lubomir Kundrak 2008-02-28 13:16:34 EST
*** Bug 431349 has been marked as a duplicate of this bug. ***
Comment 24 Nick Lamb 2008-02-28 13:54:24 EST
Did you find anything in your investigation, David? Lubomir, perhaps you could create a (Fedora 8) RPM with your change that interested parties can test while we wait a little to see if David finds a more elegant solution? Also, you mentioned SELinux policy. Is the current situation that your patch does not function on systems where SELinux policy is enforced? Or did I misunderstand?
Comment 25 Lubomir Kundrak 2008-02-28 17:31:02 EST
tialaramex: the package has already been built in koji for some time:
http://koji.fedoraproject.org/koji/buildinfo?buildID=39956
I plan to push it to testing tomorrow unless davidz comes up with a better solution by then; it's been three months since this was reported, and it has caused a lot of trouble for laptop users. I'm always in favour of a fix from David! When it comes to SELinux, you're right. In enforcing mode the policy won't allow hal-acl-tool to communicate via the D-Bus socket, so it can't get information on seats and sessions from ConsoleKit and effectively won't work at all. Modifying the SELinux policy should be trivial, though, so if we agree on the fix (depending on whether a solution comes from davidz), I'd do that.
Comment 26 David Zeuthen 2008-02-28 18:20:56 EST
> I plan pushing it to testing tomorrow

No, please don't do this. Thanks.
Comment 27 Lubomir Kundrak 2008-02-29 04:35:46 EST
*** Bug 397601 has been marked as a duplicate of this bug. ***
Comment 28 David Zeuthen 2008-03-04 00:20:35 EST
(In reply to comment #21)
> Let me spell out a race condition, which I believe is commonplace.
>
> 1. Laptop lid closed, hibernate begins, console kit changes to 'false'
> 2. hal-acl-tool pid #846 is created to remove ACLs, but it doesn't run yet
> because the kernel is trying to hibernate.
> 3. Hibernation complete, power off
> 4. Restore initiates thawing, console kit back to 'true'
> 5. hal-acl-tool pid #851 is created to restore ACLs
> 6. hal-acl-tool pid #851 runs, ACLs are already present, nothing to do, exits
> 7. hal-acl-tool pid #846 is thawed, runs, removes ACLs
>
> Now, this is contrary to what you might /expect/ to happen, since you started
> #846 first, but there we are, this is a pre-emptive multitasking operating
> system and things don't necessarily happen in the order you expected.

Right. The bug is really that we don't serialize the hal-acl-tool calls. I've done this now:

http://gitweb.freedesktop.org/?p=hal.git;a=commitdiff;h=f047f03869b2f5d20de1eafdae02d4ebc6eddc06

and this fixes it for me (it will land in Rawhide tomorrow with a ton of other fixes). Any chance anyone can check whether this patch applies to the F8 SRPM and whether it fixes the problem? Thanks.
Comment 29 Nick Lamb 2008-03-04 05:53:18 EST
From reading the patch, this fix looks correct, assuming that the callback is the only way the affected code can be entered asynchronously, which I can't verify without reading the rest of the HAL code. I also assumed that the callback isn't actually running in signal handler context (for SIGCHLD) but just afterwards, since it does far more work, and riskier work, than is safe in a signal handler. I have been running Lubomir's RPMs for a few days now without problems, but this fix is more in the spirit of the original design. If no one else does it, I will try to find time this month to get my RPM build environment working again and test your patch on my F8 laptop.
Comment 30 Fedora Update System 2008-03-04 07:35:35 EST
hal-0.5.10-1.fc8.2 has been submitted as an update for Fedora 8
Comment 31 Fedora Update System 2008-03-06 11:33:50 EST
hal-0.5.10-1.fc8.2 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update hal'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-2246
Comment 32 Chris Green 2008-03-07 07:02:57 EST
I'm interested in this; I think I'm seeing another symptom of the same problem. When I start X from runlevel 3 using startx, sound doesn't work, and gnome-volume-control reports the "No volume control GStreamer plugins and/or devices found." message. However, if I start X from runlevel 5, sound works fine. I suspect I may be seeing other symptoms of the same issue in VMware.
Comment 33 Lubomir Kundrak 2008-03-07 07:08:03 EST
(In reply to comment #32)
> I'm interested in this. I think I'm seeing another symptom of the same problem.
>
> When I start X from run level 3 using startx sound doesn't work, I get the
> gnome-volume-control reports "No volume control GStreamer plugins and/or devices
> found." message.
>
> However if I start X from run level 5 sound works fine.
>
> I suspect I may be seeing other symptoms of the same issue from VMWare.

What makes you believe it is a symptom of this problem? Run "ck-list-sessions" to see whether ConsoleKit knows about your session (I suspect it won't). Do not use startx; use gdm. Or maybe reusing the same VT would help?
Comment 34 Fedora Update System 2008-03-13 03:42:49 EDT
hal-0.5.10-1.fc8.2 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.
Comment 35 Narasimhan 2009-11-06 00:19:57 EST
Hello, I am seeing the "no sound after resume" issue in Fedora 11 on a T60P. Killing and restarting PulseAudio solves the issue. The desktop is KDE 4.3.2. Please let me know which output files I could attach here. Thanks.