471498 – plymouth hangs in startup

Bug 471498 - plymouth hangs in startup

Summary: plymouth hangs in startup

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	plymouth
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Ray Strode [halfline]
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	F10DesktopTarget
TreeView+	depends on / blocked

Reported:	2008-11-13 23:55 UTC by Orion Poplawski
Modified:	2008-11-20 05:10 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-11-18 22:38:33 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Orion Poplawski 2008-11-13 23:55:31 UTC

Description of problem:

On first boot of freshly installed system, hanging with plymouth boot crawl.

plymouthd is consuming 100% cpu.  strace shows:

write(1, ""..., 0)                      = 0
write(1, ""..., 0)                      = 0
.....


Over and over.

root      1505     1  0 16:41 ?        00:00:00 /bin/bash /etc/rc.d/rc 5
root      2427  1505  0 16:42 ?        00:00:00 /usr/bin/plymouth --update=cups

# strace -p 2427
Process 2427 attached - interrupt to quit
clock_gettime(CLOCK_MONOTONIC, {538, 967530055}) = 0
epoll_wait(3,

# tail -2 /var/log/messages
Nov 13 16:50:58 adelie kernel: type=1400 audit(1226620258.857:71): avc:  denied{ recv } for  pid=507 comm="plymouthd" saddr=192.168.0.8 src=631 daddr=192.168.0.255 dest=631 netif=eth0 scontext=system_u:system_r:portreserve_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=packet
Nov 13 16:51:03 adelie kernel: type=1400 audit(1226620263.857:72): avc:  denied{ recv } for  pid=507 comm="plymouthd" saddr=192.168.0.8 src=631 daddr=192.168.0.255 dest=631 netif=eth0 scontext=system_u:system_r:portreserve_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=packet

If I kill the plymouth cups process, plymouth --update=anacron appears hung in the same way

Version-Release number of selected component (if applicable):
plymouth-0.6.0-0.2008.11.10.5.fc10.x86_64

How reproducible:
2 for 2 so far...

Comment 1 Ray Strode [halfline] 2008-11-14 14:25:48 UTC

Does booting with enforcing=0 on the kernel command line fix this issue?

I don't understand how there can be AVCs when plymouthd is loaded in the initrd before selinux policy.  Dan, do you know what's going on?

Comment 2 Ray Strode [halfline] 2008-11-14 14:26:49 UTC

oh I guess it's the plymouth client that's getting locked down.

Comment 3 Daniel Walsh 2008-11-14 14:33:38 UTC

I think the avc messages are a red herring.  These have been fixed in the latest policy either 19 or 20, but I don't see how portreserve could have hung plymouth

Comment 4 Daniel Walsh 2008-11-14 14:40:30 UTC

Looking at the avc again, this indicates plymouthd is running as portreserve_t?  That is very strange.

rpm -ql selinux-policy

Comment 5 Orion Poplawski 2008-11-14 14:45:36 UTC

enforcing=0 has no effect.

Comment 6 Daniel Walsh 2008-11-14 14:56:29 UTC

SELinux policy https://bugzilla.redhat.com/show_bug.cgi?id=470550 has been
reported on portreserv bug and was fixed in selinux-policy-3.5.13-19.fc10

We have no idea why plymouthd would be running in portreserve_t though.

Comment 7 Mads Kiilerich 2008-11-14 15:12:25 UTC

Someone else reported "boot hangs in Anacron" on #fedora-qa. 

Booting to runlevel 3 worked and ruled out anything but X-related stuff, and I
think /var/log/Xorg.0.log gave the conclusion was that it was bug 467821. 

Perhaps the same here?

Comment 8 Ray Strode [halfline] 2008-11-14 15:50:17 UTC

I don't think this is a duplicate of bug 467821.

He mentioned plymouthd is stuck in a tight loop doing write calls.

Comment 9 Orion Poplawski 2008-11-14 16:42:53 UTC

Actually, it is.  This machine has a Radeon XPRESS 200M 5955 (PCIE) as well.  With yesterdays install I got a "text" plymouth boot screen.  With today's install I get a graphical screen and no tight write call loop.

*** This bug has been marked as a duplicate of bug 467821 ***

Comment 10 Orion Poplawski 2008-11-14 16:46:54 UTC

But it is still plymouthd that is consuming 100% cpu, and I don't see a /var/log/Xorg.0.log.  Probably related.

Comment 11 Ray Strode [halfline] 2008-11-14 16:56:50 UTC

so you're seeing plymouthd consuming 100% cpu but it's not in a tight write() loop?  What is it doing now?

Comment 12 Orion Poplawski 2008-11-14 17:04:45 UTC

strace shows nothing.


# gdb -e /sbin/plymouthd -p 507

(gdb) bt
#0  0x00007f1f774a68e0 in free@plt () from /usr/lib64/libplybootsplash.so.2
#1  0x00007f1f774aa5a0 in check_buffer_for_key_events () at ply-window.c:361
#2  on_key_event (window=0x68b2a0) at ply-window.c:375
#3  0x00007f1f776b23a8 in ply_event_loop_handle_met_status_for_source ()
    at ply-event-loop.c:992
#4  ply_event_loop_process_pending_events (loop=0x68a030)
    at ply-event-loop.c:1230
#5  0x00007f1f776b26b0 in ply_event_loop_run (loop=0x68a030)
    at ply-event-loop.c:1251
#6  0x0000000000404c13 in ?? ()
#7  0x0000000000406160 in ?? ()
#8  0x0000000000406290 in ?? ()
#9  0x00007fff7f8c33b0 in ?? ()
#10 0x000000000011a1ce in _dl_lookup_symbol_x ()
   from /lib64/ld-linux-x86-64.so.2
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Comment 13 Orion Poplawski 2008-11-14 17:07:05 UTC

root      2515  1559  0 09:58 ?        00:00:00 /usr/bin/plymouth --update=anacron                                                                                
(gdb) bt
#0  0x00007ff5f6be0a83 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x000000000060a1cf in ply_event_loop_process_pending_events (loop=0x1007010)
    at ply-event-loop.c:1187
#2  0x000000000060a6b0 in ply_event_loop_run (loop=0x1007010)
    at ply-event-loop.c:1251
#3  0x0000000000402d7d in main (argc=<value optimized out>, argv=0x7ffffee89b98)
    at ./plymouth.c:545


# strace -p 2515
Process 2515 attached - interrupt to quit
clock_gettime(CLOCK_MONOTONIC, {542, 314390394}) = 0
epoll_wait(3,

From lsof:
plymouth 2515 root    3u  0000                0,7        0     34 anon_inode

Comment 14 Orion Poplawski 2008-11-14 17:11:14 UTC

plymouthd loop in gdb:

check_buffer_for_key_events () at ply-window.c:347
347       while (i < size)
352           character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL);
check_buffer_for_key_events () at ply-window.c:352
352           character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL);
check_buffer_for_key_events () at ply-window.c:354
354           if (character_size < 0)
check_buffer_for_key_events () at ply-window.c:352
352           character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL);
check_buffer_for_key_events () at ply-window.c:354
354           if (character_size < 0)
357           keyboard_input = strndup (bytes + i, character_size);
process_keyboard_input () at ply-window.c:261
261       if (mbrtowc (&key, keyboard_input, 1, NULL) > 0)
check_buffer_for_key_events () at ply-window.c:357
357           keyboard_input = strndup (bytes + i, character_size);
check_buffer_for_key_events () at ply-window.c:359
359           process_keyboard_input (window, keyboard_input, character_size);
process_keyboard_input () at ply-window.c:332
332       if (window->keyboard_input_handler != NULL)
333         window->keyboard_input_handler (window->keyboard_input_handler_user_data,
check_buffer_for_key_events () at ply-window.c:361
361           free (keyboard_input);
363           i += character_size;
361           free (keyboard_input);
347       while (i < size)

Comment 15 Orion Poplawski 2008-11-14 17:18:42 UTC

Killing plymouthd gets me to X and end of rc scripts.

Comment 16 Ray Strode [halfline] 2008-11-14 17:57:36 UTC

is character_size 0 during this loop?

Comment 17 Orion Poplawski 2008-11-14 18:03:34 UTC

I get no such symbol when I try to query, but i stays at 0, so I would assume so.

Comment 18 Ray Strode [halfline] 2008-11-14 18:38:21 UTC

mind giving plymouth-0.6.0-0.2008.11.14.2.fc10,

http://koji.fedoraproject.org/koji/taskinfo?taskID=933835

a go?

Comment 19 Orion Poplawski 2008-11-14 23:04:14 UTC

plymouthd: ply-image.c:118: ply_image_free: Assertion 'image != ((void *)0)' failed.

Comment 20 Ray Strode [halfline] 2008-11-15 02:58:52 UTC

ah, that's probably related the background change that went in to fix bug 471227

Comment 21 Orion Poplawski 2008-11-15 03:05:23 UTC

I certainly see it on other hardware as well.

Comment 22 Ray Strode [halfline] 2008-11-15 03:54:47 UTC

hmm, still, it's not 100% reproducible because I can't make it happen.

Are you booting into runlevel 3 or runlevel 5?
You're using the solar plugin?

I've built a superficial patch around the issue (the assertion was dubious to begin with, free() allows NULL, why wouldn't ply_image_free ?).  Would be grateful if you gave it a try.

Looking at the diff for the change that went in for bug 471227 it's not immediately obvious how ply_image_free could get called with a NULL image anyway.  I wonder if there's some nasty memory corruption that's causing 0's to get written in random places causing some of these weird side effects.  valgrind on this machine suggests no, though.

Comment 23 Orion Poplawski 2008-11-15 04:49:49 UTC

I've seen the assertion failure on 2 of 2 machines.  The cpu usage/plymouth client hang only on the one (didn't try the earlier plymouth version on the other).

Happy to try any/all patches.  Won't have access to the failing machine until Monday.

Comment 24 Ray Strode [halfline] 2008-11-17 15:27:17 UTC

i tried reproducing on a machine at work here, and no luck either.  If the plymouth build mentioned in comment 22 doesn't work for you, can you,

run (as root)

/sbin/init 3
(log into tty2 as root)

run
/sbin/plymouthd
gdb attach $(/sbin/pidof plymouthd)

(log into tty3 as root)
plymouth --show-splash; sleep 5; plymouth --quit

and then if the assertion failed on tty2, run "backtrace" at the gdb prompt ?

Comment 25 Orion Poplawski 2008-11-17 15:35:21 UTC

I don't see a link to a build in comment #22

Comment 26 Ray Strode [halfline] 2008-11-17 15:42:00 UTC

I didn't actually link to the build (bad form, sorry), i just said "I've built a superficial patch around the issue".

The build is here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=69803

or there's a newer one (built to address bug 471089) that also has the workaround:

http://koji.fedoraproject.org/koji/buildinfo?buildID=69998

Either/or should tell us if the workaround is good enough.

Comment 27 Ray Strode [halfline] 2008-11-18 22:38:33 UTC

So a plymouth with this fix got tagged yesterday.  We're at the very end of the release cycle in a time crunch, so I'm going to close this out before getting final confirmation that the issue is fixed.  

It's no longer really a blocker anymore, anyway, since it's been demoted from from "spins blocking boot up" to "displays and ugly message in boot up" and isn't something others have been able to reproduce.

If you have other problems let me know, and we can try to address them in a post-GA update.

Comment 28 Orion Poplawski 2008-11-20 05:10:47 UTC

Confirmed fixed

Note You need to log in before you can comment on or make changes to this bug.