Bug 1724551

Summary: gnome-shell process eat 100% CPU after resume
Product: Red Hat Enterprise Linux 8 Reporter: Tomas Pelka <tpelka>
Component: mutterAssignee: Jonas Ådahl <jadahl>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent Docs Contact:
Priority: high    
Version: 8.1CC: fmuellner, hdegoede, jeff.burrell, jkoten, jwboyer, mkrajnak, rstrode, vbenes
Target Milestone: rcKeywords: Regression
Target Release: 8.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: mutter-3.32.2-7.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-05 22:14:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1660905, 1664096, 1678350    
Attachments:
Description Flags
journal.log
none
gdm session journal
none
backtrace
none
perf.log
none
gdm bus none

Description Tomas Pelka 2019-06-27 09:30:25 UTC
Created attachment 1585112 [details]
journal.log

Description of problem:
After resume gdm process work out CPU to 100%

Version-Release number of selected component (if applicable):
gdm-3.28.3-20.el8
gnome-shell-3.32.2-4.el8
gnome-session-3.28.1-6.el8

How reproducible:
50%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Adding journal log, 8:29 starts the suspend

Comment 1 Tomas Pelka 2019-06-27 10:22:54 UTC
Well actually gdm is not a process so it is rather gnome-shell

Comment 2 Jiri Koten 2019-06-28 08:47:58 UTC
Created attachment 1585495 [details]
gdm session journal

I'm also able to reproduce, gnome-shell running under gdm user is eating 100% cpu.


journalctl -b -u session-c1.scope

Comment 3 Jiri Koten 2019-06-28 08:48:29 UTC
Created attachment 1585496 [details]
backtrace

Comment 4 Ray Strode [halfline] 2019-06-28 14:03:59 UTC
probably mutter...

jkoten, can you do 

(gdb) print (char *) g_source_get_name (source)

in gdb?

Comment 7 Jiri Koten 2019-07-03 15:21:55 UTC
(In reply to Ray Strode [halfline] from comment #4)
> probably mutter...
> 
> jkoten, can you do 
> 
> (gdb) print (char *) g_source_get_name (source)
> 
> in gdb?

$1 = 0x7fdc38440a00 "[gio] emit_signal_instance_in_idle_cb"

Also I most often got "No symbol "source" in current context." when I attach to the running proccess.

The reproducer is changing screen configuration between suspend and resume e.g., use two external monitors; suspend; undock; resume

Comment 8 Ray Strode [halfline] 2019-07-10 18:54:41 UTC
okay so it seems like a stream of signals are coming in over the message bus.

if you run

busctl --user monitor 

does it have a constant flurry of activity?

Comment 10 Tomas Pelka 2019-07-17 13:26:37 UTC
Created attachment 1591413 [details]
perf.log

tpopela kindly helped us to get a bit more info:

Random bt's from gdb:

(gdb) bt
#0  0x00007fe900ed11de in timerfd_settime () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fe8f6a31c57 in g_datetime_source_init_timerfd (unix_seconds=1563368880, expected_now_seconds=1563368879, datetime_source=0x562b5ead1b30) at ../libgnome-desktop/gnome-datetime-source.c:193
#2  _gnome_datetime_source_new (now=now@entry=0x562b5cb60600, expiry=expiry@entry=0x562b5cb61660, cancel_on_set=cancel_on_set@entry=1) at ../libgnome-desktop/gnome-datetime-source.c:275
#3  0x00007fe8f6a2c505 in update_clock (data=0x562b5dafa750, data@entry=<error reading variable: value has been optimized out>) at ../libgnome-desktop/gnome-wall-clock.c:411
#4  0x00007fe8f6a31a86 in g_datetime_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../libgnome-desktop/gnome-datetime-source.c:145
#5  0x00007fe902db667d in g_main_dispatch (context=0x562b5bfc9660) at gmain.c:3176
#6  g_main_context_dispatch (context=context@entry=0x562b5bfc9660) at gmain.c:3829
#7  0x00007fe902db6a48 in g_main_context_iterate (context=0x562b5bfc9660, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3902
#8  0x00007fe902db6d72 in g_main_loop_run (loop=0x562b5c6d64f0) at gmain.c:4098
#9  0x00007fe90124bc50 in meta_run () at ../src/core/main.c:685
#10 0x0000562b5a245536 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:499
(gdb) c
Continuing.
^C
Thread 1 "gnome-shell" received signal SIGINT, Interrupt.
g_mutex_unlock (mutex=0x7fe903080688 <g.aliases_lock>) at gthread-posix.c:1355
1355	  if G_UNLIKELY (prev != 1)
(gdb) bt
#0  g_mutex_unlock (mutex=0x7fe903080688 <g.aliases_lock>) at gthread-posix.c:1355
#1  0x00007fe902d90639 in g_get_charset (charset=charset@entry=0x0) at gcharset.c:196
#2  0x00007fe8f6a2c1c4 in date_time_format (format=0x7fe8f6a366fe "%b %-e_%R", datetime=0x562b5cb60600) at ../libgnome-desktop/gnome-wall-clock.c:380
#3  gnome_wall_clock_string_for_datetime (self=self@entry=0x562b5dafa750, now=now@entry=0x562b5cb60600, clock_format=clock_format@entry=G_DESKTOP_CLOCK_FORMAT_24H, show_weekday=show_weekday@entry=0, 
    show_full_date=show_full_date@entry=1, show_seconds=show_seconds@entry=0) at ../libgnome-desktop/gnome-wall-clock.c:380
#4  0x00007fe8f6a2c574 in update_clock (data=0x562b5dafa750, data@entry=<error reading variable: value has been optimized out>) at ../libgnome-desktop/gnome-wall-clock.c:418
#5  0x00007fe8f6a31a86 in g_datetime_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../libgnome-desktop/gnome-datetime-source.c:145
#6  0x00007fe902db667d in g_main_dispatch (context=0x562b5bfc9660) at gmain.c:3176
#7  g_main_context_dispatch (context=context@entry=0x562b5bfc9660) at gmain.c:3829
#8  0x00007fe902db6a48 in g_main_context_iterate (context=0x562b5bfc9660, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3902
#9  0x00007fe902db6d72 in g_main_loop_run (loop=0x562b5c6d64f0) at gmain.c:4098
#10 0x00007fe90124bc50 in meta_run () at ../src/core/main.c:685
#11 0x0000562b5a245536 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:499

This is actually quite often seen

output from perf attached

Comment 11 Tomas Pelka 2019-07-17 13:39:16 UTC
(In reply to Ray Strode [halfline] from comment #8)
> okay so it seems like a stream of signals are coming in over the message bus.
> 
> if you run
> 
> busctl --user monitor 
> 
> does it have a constant flurry of activity?

Assuming this is meant to be executed from gdm? If so 

sudo -u gdm busctl --user monitor
Failed to set address: No such file or directory

If this was meant to be executed as root than no activity is displayed.

busctl --user monitor
Monitoring bus message stream.

Even if I try to monitor the particular service it displays nothing:
busctl monitor session-c1.scope
Monitoring bus message stream.

Comment 12 Jiri Koten 2019-07-17 14:59:04 UTC
Created attachment 1591437 [details]
gdm bus

When I monitor the gdm bus while in user session, there is no activity - this is while the gnome-shell gdm process is eating 100% CPU.

When I switched the user I got blank screen instead of the login screen and there was an activity on the bus - please see attachment.

Comment 13 Vladimir Benes 2019-07-18 08:37:31 UTC
I can see similar backtrace from frozen GS user session after resume.. bt attached here:  https://bugzilla.redhat.com/show_bug.cgi?id=1718807#c17

I've forced GDM not to run on Wayland and my user session is on X too.

Comment 14 Ray Strode [halfline] 2019-07-23 19:40:42 UTC
so I took at look at this today and found the issue.  It's this code:

 priv->inhibit_sleep_fd = g_variant_get_handle (fd_variant);
... 
 close (priv->inhibit_sleep_fd);   

The problem is the first line there isn't returning and fd, it's returning an index which is supposed to be used to fetch the fd.

So mutter was doing close (0).  Suspend once, and it closes stdin, suspend twice, and it closes the timerfd (which now takes fd 0) behind the timerfd's back leading an infinite loop.

This bug existed in the 8.0 branch for a while, but I fixed it before release here:

    Fix bug in suspend/resume corruption patch leading to inhibit fd not getting fetched

but when the patch got forward ported for the 3.32 rebase, it was the older version that got inadvertently rebased.

fix should be trivial, just forward port the latest 8.0.0 patch, so devack

Comment 16 Florian Müllner 2019-07-24 13:22:37 UTC
*** Bug 1718807 has been marked as a duplicate of this bug. ***

Comment 17 Martin Krajnak 2019-07-25 08:41:33 UTC
Thanks a lot to everyone who helped working on this.

Fixed in mutter-3.32.2-7.el8

Comment 18 Lyude 2019-08-26 16:50:27 UTC
*** Bug 1732610 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2019-11-05 22:14:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3553