Bug 471498 - plymouth hangs in startup
plymouth hangs in startup
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: plymouth (Show other bugs)
rawhide
All Linux
high Severity high
: ---
: ---
Assigned To: Ray Strode [halfline]
Fedora Extras Quality Assurance
: Reopened
Depends On:
Blocks: F10DesktopTarget
  Show dependency treegraph
 
Reported: 2008-11-13 18:55 EST by Orion Poplawski
Modified: 2008-11-20 00:10 EST (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-11-18 17:38:33 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Orion Poplawski 2008-11-13 18:55:31 EST
Description of problem:

On first boot of freshly installed system, hanging with plymouth boot crawl.

plymouthd is consuming 100% cpu.  strace shows:

write(1, ""..., 0)                      = 0
write(1, ""..., 0)                      = 0
.....


Over and over.

root      1505     1  0 16:41 ?        00:00:00 /bin/bash /etc/rc.d/rc 5
root      2427  1505  0 16:42 ?        00:00:00 /usr/bin/plymouth --update=cups

# strace -p 2427
Process 2427 attached - interrupt to quit
clock_gettime(CLOCK_MONOTONIC, {538, 967530055}) = 0
epoll_wait(3,

# tail -2 /var/log/messages
Nov 13 16:50:58 adelie kernel: type=1400 audit(1226620258.857:71): avc:  denied{ recv } for  pid=507 comm="plymouthd" saddr=192.168.0.8 src=631 daddr=192.168.0.255 dest=631 netif=eth0 scontext=system_u:system_r:portreserve_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=packet
Nov 13 16:51:03 adelie kernel: type=1400 audit(1226620263.857:72): avc:  denied{ recv } for  pid=507 comm="plymouthd" saddr=192.168.0.8 src=631 daddr=192.168.0.255 dest=631 netif=eth0 scontext=system_u:system_r:portreserve_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=packet

If I kill the plymouth cups process, plymouth --update=anacron appears hung in the same way

Version-Release number of selected component (if applicable):
plymouth-0.6.0-0.2008.11.10.5.fc10.x86_64

How reproducible:
2 for 2 so far...
Comment 1 Ray Strode [halfline] 2008-11-14 09:25:48 EST
Does booting with enforcing=0 on the kernel command line fix this issue?

I don't understand how there can be AVCs when plymouthd is loaded in the initrd before selinux policy.  Dan, do you know what's going on?
Comment 2 Ray Strode [halfline] 2008-11-14 09:26:49 EST
oh I guess it's the plymouth client that's getting locked down.
Comment 3 Daniel Walsh 2008-11-14 09:33:38 EST
I think the avc messages are a red herring.  These have been fixed in the latest policy either 19 or 20, but I don't see how portreserve could have hung plymouth
Comment 4 Daniel Walsh 2008-11-14 09:40:30 EST
Looking at the avc again, this indicates plymouthd is running as portreserve_t?  That is very strange.

rpm -ql selinux-policy
Comment 5 Orion Poplawski 2008-11-14 09:45:36 EST
enforcing=0 has no effect.
Comment 6 Daniel Walsh 2008-11-14 09:56:29 EST
SELinux policy https://bugzilla.redhat.com/show_bug.cgi?id=470550 has been
reported on portreserv bug and was fixed in selinux-policy-3.5.13-19.fc10

We have no idea why plymouthd would be running in portreserve_t though.
Comment 7 Mads Kiilerich 2008-11-14 10:12:25 EST
Someone else reported "boot hangs in Anacron" on #fedora-qa. 

Booting to runlevel 3 worked and ruled out anything but X-related stuff, and I
think /var/log/Xorg.0.log gave the conclusion was that it was bug 467821. 

Perhaps the same here?
Comment 8 Ray Strode [halfline] 2008-11-14 10:50:17 EST
I don't think this is a duplicate of bug 467821.

He mentioned plymouthd is stuck in a tight loop doing write calls.
Comment 9 Orion Poplawski 2008-11-14 11:42:53 EST
Actually, it is.  This machine has a Radeon XPRESS 200M 5955 (PCIE) as well.  With yesterdays install I got a "text" plymouth boot screen.  With today's install I get a graphical screen and no tight write call loop.

*** This bug has been marked as a duplicate of bug 467821 ***
Comment 10 Orion Poplawski 2008-11-14 11:46:54 EST
But it is still plymouthd that is consuming 100% cpu, and I don't see a /var/log/Xorg.0.log.  Probably related.
Comment 11 Ray Strode [halfline] 2008-11-14 11:56:50 EST
so you're seeing plymouthd consuming 100% cpu but it's not in a tight write() loop?  What is it doing now?
Comment 12 Orion Poplawski 2008-11-14 12:04:45 EST
strace shows nothing.


# gdb -e /sbin/plymouthd -p 507

(gdb) bt
#0  0x00007f1f774a68e0 in free@plt () from /usr/lib64/libplybootsplash.so.2
#1  0x00007f1f774aa5a0 in check_buffer_for_key_events () at ply-window.c:361
#2  on_key_event (window=0x68b2a0) at ply-window.c:375
#3  0x00007f1f776b23a8 in ply_event_loop_handle_met_status_for_source ()
    at ply-event-loop.c:992
#4  ply_event_loop_process_pending_events (loop=0x68a030)
    at ply-event-loop.c:1230
#5  0x00007f1f776b26b0 in ply_event_loop_run (loop=0x68a030)
    at ply-event-loop.c:1251
#6  0x0000000000404c13 in ?? ()
#7  0x0000000000406160 in ?? ()
#8  0x0000000000406290 in ?? ()
#9  0x00007fff7f8c33b0 in ?? ()
#10 0x000000000011a1ce in _dl_lookup_symbol_x ()
   from /lib64/ld-linux-x86-64.so.2
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Comment 13 Orion Poplawski 2008-11-14 12:07:05 EST
root      2515  1559  0 09:58 ?        00:00:00 /usr/bin/plymouth --update=anacron                                                                                
(gdb) bt
#0  0x00007ff5f6be0a83 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x000000000060a1cf in ply_event_loop_process_pending_events (loop=0x1007010)
    at ply-event-loop.c:1187
#2  0x000000000060a6b0 in ply_event_loop_run (loop=0x1007010)
    at ply-event-loop.c:1251
#3  0x0000000000402d7d in main (argc=<value optimized out>, argv=0x7ffffee89b98)
    at ./plymouth.c:545


# strace -p 2515
Process 2515 attached - interrupt to quit
clock_gettime(CLOCK_MONOTONIC, {542, 314390394}) = 0
epoll_wait(3,

From lsof:
plymouth 2515 root    3u  0000                0,7        0     34 anon_inode
Comment 14 Orion Poplawski 2008-11-14 12:11:14 EST
plymouthd loop in gdb:

check_buffer_for_key_events () at ply-window.c:347
347       while (i < size)
352           character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL);
check_buffer_for_key_events () at ply-window.c:352
352           character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL);
check_buffer_for_key_events () at ply-window.c:354
354           if (character_size < 0)
check_buffer_for_key_events () at ply-window.c:352
352           character_size = (ssize_t) mbrlen (bytes + i, size - i, NULL);
check_buffer_for_key_events () at ply-window.c:354
354           if (character_size < 0)
357           keyboard_input = strndup (bytes + i, character_size);
process_keyboard_input () at ply-window.c:261
261       if (mbrtowc (&key, keyboard_input, 1, NULL) > 0)
check_buffer_for_key_events () at ply-window.c:357
357           keyboard_input = strndup (bytes + i, character_size);
check_buffer_for_key_events () at ply-window.c:359
359           process_keyboard_input (window, keyboard_input, character_size);
process_keyboard_input () at ply-window.c:332
332       if (window->keyboard_input_handler != NULL)
333         window->keyboard_input_handler (window->keyboard_input_handler_user_data,
check_buffer_for_key_events () at ply-window.c:361
361           free (keyboard_input);
363           i += character_size;
361           free (keyboard_input);
347       while (i < size)
Comment 15 Orion Poplawski 2008-11-14 12:18:42 EST
Killing plymouthd gets me to X and end of rc scripts.
Comment 16 Ray Strode [halfline] 2008-11-14 12:57:36 EST
is character_size 0 during this loop?
Comment 17 Orion Poplawski 2008-11-14 13:03:34 EST
I get no such symbol when I try to query, but i stays at 0, so I would assume so.
Comment 18 Ray Strode [halfline] 2008-11-14 13:38:21 EST
mind giving plymouth-0.6.0-0.2008.11.14.2.fc10,

http://koji.fedoraproject.org/koji/taskinfo?taskID=933835

a go?
Comment 19 Orion Poplawski 2008-11-14 18:04:14 EST
plymouthd: ply-image.c:118: ply_image_free: Assertion 'image != ((void *)0)' failed.
Comment 20 Ray Strode [halfline] 2008-11-14 21:58:52 EST
ah, that's probably related the background change that went in to fix bug 471227
Comment 21 Orion Poplawski 2008-11-14 22:05:23 EST
I certainly see it on other hardware as well.
Comment 22 Ray Strode [halfline] 2008-11-14 22:54:47 EST
hmm, still, it's not 100% reproducible because I can't make it happen.

Are you booting into runlevel 3 or runlevel 5?
You're using the solar plugin?

I've built a superficial patch around the issue (the assertion was dubious to begin with, free() allows NULL, why wouldn't ply_image_free ?).  Would be grateful if you gave it a try.

Looking at the diff for the change that went in for bug 471227 it's not immediately obvious how ply_image_free could get called with a NULL image anyway.  I wonder if there's some nasty memory corruption that's causing 0's to get written in random places causing some of these weird side effects.  valgrind on this machine suggests no, though.
Comment 23 Orion Poplawski 2008-11-14 23:49:49 EST
I've seen the assertion failure on 2 of 2 machines.  The cpu usage/plymouth client hang only on the one (didn't try the earlier plymouth version on the other).

Happy to try any/all patches.  Won't have access to the failing machine until Monday.
Comment 24 Ray Strode [halfline] 2008-11-17 10:27:17 EST
i tried reproducing on a machine at work here, and no luck either.  If the plymouth build mentioned in comment 22 doesn't work for you, can you,

run (as root)

/sbin/init 3
(log into tty2 as root)

run
/sbin/plymouthd
gdb attach $(/sbin/pidof plymouthd)

(log into tty3 as root)
plymouth --show-splash; sleep 5; plymouth --quit

and then if the assertion failed on tty2, run "backtrace" at the gdb prompt ?
Comment 25 Orion Poplawski 2008-11-17 10:35:21 EST
I don't see a link to a build in comment #22
Comment 26 Ray Strode [halfline] 2008-11-17 10:42:00 EST
I didn't actually link to the build (bad form, sorry), i just said "I've built a superficial patch around the issue".

The build is here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=69803

or there's a newer one (built to address bug 471089) that also has the workaround:

http://koji.fedoraproject.org/koji/buildinfo?buildID=69998

Either/or should tell us if the workaround is good enough.
Comment 27 Ray Strode [halfline] 2008-11-18 17:38:33 EST
So a plymouth with this fix got tagged yesterday.  We're at the very end of the release cycle in a time crunch, so I'm going to close this out before getting final confirmation that the issue is fixed.  

It's no longer really a blocker anymore, anyway, since it's been demoted from from "spins blocking boot up" to "displays and ugly message in boot up" and isn't something others have been able to reproduce.

If you have other problems let me know, and we can try to address them in a post-GA update.
Comment 28 Orion Poplawski 2008-11-20 00:10:47 EST
Confirmed fixed

Note You need to log in before you can comment on or make changes to this bug.