Description of problem: On first boot of freshly installed system, hanging with plymouth boot crawl. plymouthd is consuming 100% cpu. strace shows: write(1, ""..., 0) = 0 write(1, ""..., 0) = 0 ..... Over and over. root 1505 1 0 16:41 ? 00:00:00 /bin/bash /etc/rc.d/rc 5 root 2427 1505 0 16:42 ? 00:00:00 /usr/bin/plymouth --update=cups # strace -p 2427 Process 2427 attached - interrupt to quit clock_gettime(CLOCK_MONOTONIC, {538, 967530055}) = 0 epoll_wait(3, # tail -2 /var/log/messages Nov 13 16:50:58 adelie kernel: type=1400 audit(1226620258.857:71): avc: denied{ recv } for pid=507 comm="plymouthd" saddr=192.168.0.8 src=631 daddr=192.168.0.255 dest=631 netif=eth0 scontext=system_u:system_r:portreserve_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=packet Nov 13 16:51:03 adelie kernel: type=1400 audit(1226620263.857:72): avc: denied{ recv } for pid=507 comm="plymouthd" saddr=192.168.0.8 src=631 daddr=192.168.0.255 dest=631 netif=eth0 scontext=system_u:system_r:portreserve_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=packet If I kill the plymouth cups process, plymouth --update=anacron appears hung in the same way Version-Release number of selected component (if applicable): plymouth-0.6.0-0.2008.11.10.5.fc10.x86_64 How reproducible: 2 for 2 so far...
Does booting with enforcing=0 on the kernel command line fix this issue? I don't understand how there can be AVCs when plymouthd is loaded in the initrd before selinux policy. Dan, do you know what's going on?
oh I guess it's the plymouth client that's getting locked down.
I think the avc messages are a red herring. These have been fixed in the latest policy either 19 or 20, but I don't see how portreserve could have hung plymouth
Looking at the avc again, this indicates plymouthd is running as portreserve_t? That is very strange. rpm -ql selinux-policy
enforcing=0 has no effect.
SELinux policy https://bugzilla.redhat.com/show_bug.cgi?id=470550 has been reported on portreserv bug and was fixed in selinux-policy-3.5.13-19.fc10 We have no idea why plymouthd would be running in portreserve_t though.
Someone else reported "boot hangs in Anacron" on #fedora-qa. Booting to runlevel 3 worked and ruled out anything but X-related stuff, and I think /var/log/Xorg.0.log gave the conclusion was that it was bug 467821. Perhaps the same here?
I don't think this is a duplicate of bug 467821. He mentioned plymouthd is stuck in a tight loop doing write calls.
Actually, it is. This machine has a Radeon XPRESS 200M 5955 (PCIE) as well. With yesterdays install I got a "text" plymouth boot screen. With today's install I get a graphical screen and no tight write call loop. *** This bug has been marked as a duplicate of bug 467821 ***
But it is still plymouthd that is consuming 100% cpu, and I don't see a /var/log/Xorg.0.log. Probably related.
so you're seeing plymouthd consuming 100% cpu but it's not in a tight write() loop? What is it doing now?
strace shows nothing. # gdb -e /sbin/plymouthd -p 507 (gdb) bt #0 0x00007f1f774a68e0 in free@plt () from /usr/lib64/libplybootsplash.so.2 #1 0x00007f1f774aa5a0 in check_buffer_for_key_events () at ply-window.c:361 #2 on_key_event (window=0x68b2a0) at ply-window.c:375 #3 0x00007f1f776b23a8 in ply_event_loop_handle_met_status_for_source () at ply-event-loop.c:992 #4 ply_event_loop_process_pending_events (loop=0x68a030) at ply-event-loop.c:1230 #5 0x00007f1f776b26b0 in ply_event_loop_run (loop=0x68a030) at ply-event-loop.c:1251 #6 0x0000000000404c13 in ?? () #7 0x0000000000406160 in ?? () #8 0x0000000000406290 in ?? () #9 0x00007fff7f8c33b0 in ?? () #10 0x000000000011a1ce in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2 Backtrace stopped: previous frame inner to this frame (corrupt stack?)
root 2515 1559 0 09:58 ? 00:00:00 /usr/bin/plymouth --update=anacron (gdb) bt #0 0x00007ff5f6be0a83 in __epoll_wait_nocancel () from /lib64/libc.so.6 #1 0x000000000060a1cf in ply_event_loop_process_pending_events (loop=0x1007010) at ply-event-loop.c:1187 #2 0x000000000060a6b0 in ply_event_loop_run (loop=0x1007010) at ply-event-loop.c:1251 #3 0x0000000000402d7d in main (argc=<value optimized out>, argv=0x7ffffee89b98) at ./plymouth.c:545 # strace -p 2515 Process 2515 attached - interrupt to quit clock_gettime(CLOCK_MONOTONIC, {542, 314390394}) = 0 epoll_wait(3, From lsof: plymouth 2515 root 3u 0000 0,7 0 34 anon_inode
plymouthd loop in gdb: check_buffer_for_key_events () at ply-window.c:347 347 while (i < size) 352 character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL); check_buffer_for_key_events () at ply-window.c:352 352 character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL); check_buffer_for_key_events () at ply-window.c:354 354 if (character_size < 0) check_buffer_for_key_events () at ply-window.c:352 352 character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL); check_buffer_for_key_events () at ply-window.c:354 354 if (character_size < 0) 357 keyboard_input = strndup (bytes + i, character_size); process_keyboard_input () at ply-window.c:261 261 if (mbrtowc (&key, keyboard_input, 1, NULL) > 0) check_buffer_for_key_events () at ply-window.c:357 357 keyboard_input = strndup (bytes + i, character_size); check_buffer_for_key_events () at ply-window.c:359 359 process_keyboard_input (window, keyboard_input, character_size); process_keyboard_input () at ply-window.c:332 332 if (window->keyboard_input_handler != NULL) 333 window->keyboard_input_handler (window->keyboard_input_handler_user_data, check_buffer_for_key_events () at ply-window.c:361 361 free (keyboard_input); 363 i += character_size; 361 free (keyboard_input); 347 while (i < size)
Killing plymouthd gets me to X and end of rc scripts.
is character_size 0 during this loop?
I get no such symbol when I try to query, but i stays at 0, so I would assume so.
mind giving plymouth-0.6.0-0.2008.11.14.2.fc10, http://koji.fedoraproject.org/koji/taskinfo?taskID=933835 a go?
plymouthd: ply-image.c:118: ply_image_free: Assertion 'image != ((void *)0)' failed.
ah, that's probably related the background change that went in to fix bug 471227
I certainly see it on other hardware as well.
hmm, still, it's not 100% reproducible because I can't make it happen. Are you booting into runlevel 3 or runlevel 5? You're using the solar plugin? I've built a superficial patch around the issue (the assertion was dubious to begin with, free() allows NULL, why wouldn't ply_image_free ?). Would be grateful if you gave it a try. Looking at the diff for the change that went in for bug 471227 it's not immediately obvious how ply_image_free could get called with a NULL image anyway. I wonder if there's some nasty memory corruption that's causing 0's to get written in random places causing some of these weird side effects. valgrind on this machine suggests no, though.
I've seen the assertion failure on 2 of 2 machines. The cpu usage/plymouth client hang only on the one (didn't try the earlier plymouth version on the other). Happy to try any/all patches. Won't have access to the failing machine until Monday.
i tried reproducing on a machine at work here, and no luck either. If the plymouth build mentioned in comment 22 doesn't work for you, can you, run (as root) /sbin/init 3 (log into tty2 as root) run /sbin/plymouthd gdb attach $(/sbin/pidof plymouthd) (log into tty3 as root) plymouth --show-splash; sleep 5; plymouth --quit and then if the assertion failed on tty2, run "backtrace" at the gdb prompt ?
I don't see a link to a build in comment #22
I didn't actually link to the build (bad form, sorry), i just said "I've built a superficial patch around the issue". The build is here: http://koji.fedoraproject.org/koji/buildinfo?buildID=69803 or there's a newer one (built to address bug 471089) that also has the workaround: http://koji.fedoraproject.org/koji/buildinfo?buildID=69998 Either/or should tell us if the workaround is good enough.
So a plymouth with this fix got tagged yesterday. We're at the very end of the release cycle in a time crunch, so I'm going to close this out before getting final confirmation that the issue is fixed. It's no longer really a blocker anymore, anyway, since it's been demoted from from "spins blocking boot up" to "displays and ugly message in boot up" and isn't something others have been able to reproduce. If you have other problems let me know, and we can try to address them in a post-GA update.
Confirmed fixed