RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1724551 - gnome-shell process eat 100% CPU after resume
Summary: gnome-shell process eat 100% CPU after resume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: mutter
Version: 8.1
Hardware: Unspecified
OS: Unspecified
high
urgent
Target Milestone: rc
: 8.1
Assignee: Jonas Ådahl
QA Contact: Desktop QE
URL:
Whiteboard:
: 1718807 1732610 (view as bug list)
Depends On:
Blocks: 1660905 1664096 1678350
TreeView+ depends on / blocked
 
Reported: 2019-06-27 09:30 UTC by Tomas Pelka
Modified: 2020-11-14 07:50 UTC (History)
8 users (show)

Fixed In Version: mutter-3.32.2-7.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-05 22:14:25 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
journal.log (645.76 KB, text/plain)
2019-06-27 09:30 UTC, Tomas Pelka
no flags Details
gdm session journal (4.84 KB, text/plain)
2019-06-28 08:47 UTC, Jiri Koten
no flags Details
backtrace (30.11 KB, text/plain)
2019-06-28 08:48 UTC, Jiri Koten
no flags Details
perf.log (76.26 KB, text/plain)
2019-07-17 13:26 UTC, Tomas Pelka
no flags Details
gdm bus (1.86 MB, text/plain)
2019-07-17 14:59 UTC, Jiri Koten
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2019:3553 0 None None None 2019-11-05 22:14:37 UTC

Description Tomas Pelka 2019-06-27 09:30:25 UTC
Created attachment 1585112 [details]
journal.log

Description of problem:
After resume gdm process work out CPU to 100%

Version-Release number of selected component (if applicable):
gdm-3.28.3-20.el8
gnome-shell-3.32.2-4.el8
gnome-session-3.28.1-6.el8

How reproducible:
50%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Adding journal log, 8:29 starts the suspend

Comment 1 Tomas Pelka 2019-06-27 10:22:54 UTC
Well actually gdm is not a process so it is rather gnome-shell

Comment 2 Jiri Koten 2019-06-28 08:47:58 UTC
Created attachment 1585495 [details]
gdm session journal

I'm also able to reproduce, gnome-shell running under gdm user is eating 100% cpu.


journalctl -b -u session-c1.scope

Comment 3 Jiri Koten 2019-06-28 08:48:29 UTC
Created attachment 1585496 [details]
backtrace

Comment 4 Ray Strode [halfline] 2019-06-28 14:03:59 UTC
probably mutter...

jkoten, can you do 

(gdb) print (char *) g_source_get_name (source)

in gdb?

Comment 7 Jiri Koten 2019-07-03 15:21:55 UTC
(In reply to Ray Strode [halfline] from comment #4)
> probably mutter...
> 
> jkoten, can you do 
> 
> (gdb) print (char *) g_source_get_name (source)
> 
> in gdb?

$1 = 0x7fdc38440a00 "[gio] emit_signal_instance_in_idle_cb"

Also I most often got "No symbol "source" in current context." when I attach to the running proccess.

The reproducer is changing screen configuration between suspend and resume e.g., use two external monitors; suspend; undock; resume

Comment 8 Ray Strode [halfline] 2019-07-10 18:54:41 UTC
okay so it seems like a stream of signals are coming in over the message bus.

if you run

busctl --user monitor 

does it have a constant flurry of activity?

Comment 10 Tomas Pelka 2019-07-17 13:26:37 UTC
Created attachment 1591413 [details]
perf.log

tpopela kindly helped us to get a bit more info:

Random bt's from gdb:

(gdb) bt
#0  0x00007fe900ed11de in timerfd_settime () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fe8f6a31c57 in g_datetime_source_init_timerfd (unix_seconds=1563368880, expected_now_seconds=1563368879, datetime_source=0x562b5ead1b30) at ../libgnome-desktop/gnome-datetime-source.c:193
#2  _gnome_datetime_source_new (now=now@entry=0x562b5cb60600, expiry=expiry@entry=0x562b5cb61660, cancel_on_set=cancel_on_set@entry=1) at ../libgnome-desktop/gnome-datetime-source.c:275
#3  0x00007fe8f6a2c505 in update_clock (data=0x562b5dafa750, data@entry=<error reading variable: value has been optimized out>) at ../libgnome-desktop/gnome-wall-clock.c:411
#4  0x00007fe8f6a31a86 in g_datetime_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../libgnome-desktop/gnome-datetime-source.c:145
#5  0x00007fe902db667d in g_main_dispatch (context=0x562b5bfc9660) at gmain.c:3176
#6  g_main_context_dispatch (context=context@entry=0x562b5bfc9660) at gmain.c:3829
#7  0x00007fe902db6a48 in g_main_context_iterate (context=0x562b5bfc9660, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3902
#8  0x00007fe902db6d72 in g_main_loop_run (loop=0x562b5c6d64f0) at gmain.c:4098
#9  0x00007fe90124bc50 in meta_run () at ../src/core/main.c:685
#10 0x0000562b5a245536 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:499
(gdb) c
Continuing.
^C
Thread 1 "gnome-shell" received signal SIGINT, Interrupt.
g_mutex_unlock (mutex=0x7fe903080688 <g.aliases_lock>) at gthread-posix.c:1355
1355	  if G_UNLIKELY (prev != 1)
(gdb) bt
#0  g_mutex_unlock (mutex=0x7fe903080688 <g.aliases_lock>) at gthread-posix.c:1355
#1  0x00007fe902d90639 in g_get_charset (charset=charset@entry=0x0) at gcharset.c:196
#2  0x00007fe8f6a2c1c4 in date_time_format (format=0x7fe8f6a366fe "%b %-e_%R", datetime=0x562b5cb60600) at ../libgnome-desktop/gnome-wall-clock.c:380
#3  gnome_wall_clock_string_for_datetime (self=self@entry=0x562b5dafa750, now=now@entry=0x562b5cb60600, clock_format=clock_format@entry=G_DESKTOP_CLOCK_FORMAT_24H, show_weekday=show_weekday@entry=0, 
    show_full_date=show_full_date@entry=1, show_seconds=show_seconds@entry=0) at ../libgnome-desktop/gnome-wall-clock.c:380
#4  0x00007fe8f6a2c574 in update_clock (data=0x562b5dafa750, data@entry=<error reading variable: value has been optimized out>) at ../libgnome-desktop/gnome-wall-clock.c:418
#5  0x00007fe8f6a31a86 in g_datetime_source_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../libgnome-desktop/gnome-datetime-source.c:145
#6  0x00007fe902db667d in g_main_dispatch (context=0x562b5bfc9660) at gmain.c:3176
#7  g_main_context_dispatch (context=context@entry=0x562b5bfc9660) at gmain.c:3829
#8  0x00007fe902db6a48 in g_main_context_iterate (context=0x562b5bfc9660, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3902
#9  0x00007fe902db6d72 in g_main_loop_run (loop=0x562b5c6d64f0) at gmain.c:4098
#10 0x00007fe90124bc50 in meta_run () at ../src/core/main.c:685
#11 0x0000562b5a245536 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:499

This is actually quite often seen

output from perf attached

Comment 11 Tomas Pelka 2019-07-17 13:39:16 UTC
(In reply to Ray Strode [halfline] from comment #8)
> okay so it seems like a stream of signals are coming in over the message bus.
> 
> if you run
> 
> busctl --user monitor 
> 
> does it have a constant flurry of activity?

Assuming this is meant to be executed from gdm? If so 

sudo -u gdm busctl --user monitor
Failed to set address: No such file or directory

If this was meant to be executed as root than no activity is displayed.

busctl --user monitor
Monitoring bus message stream.

Even if I try to monitor the particular service it displays nothing:
busctl monitor session-c1.scope
Monitoring bus message stream.

Comment 12 Jiri Koten 2019-07-17 14:59:04 UTC
Created attachment 1591437 [details]
gdm bus

When I monitor the gdm bus while in user session, there is no activity - this is while the gnome-shell gdm process is eating 100% CPU.

When I switched the user I got blank screen instead of the login screen and there was an activity on the bus - please see attachment.

Comment 13 Vladimir Benes 2019-07-18 08:37:31 UTC
I can see similar backtrace from frozen GS user session after resume.. bt attached here:  https://bugzilla.redhat.com/show_bug.cgi?id=1718807#c17

I've forced GDM not to run on Wayland and my user session is on X too.

Comment 14 Ray Strode [halfline] 2019-07-23 19:40:42 UTC
so I took at look at this today and found the issue.  It's this code:

 priv->inhibit_sleep_fd = g_variant_get_handle (fd_variant);
... 
 close (priv->inhibit_sleep_fd);   

The problem is the first line there isn't returning and fd, it's returning an index which is supposed to be used to fetch the fd.

So mutter was doing close (0).  Suspend once, and it closes stdin, suspend twice, and it closes the timerfd (which now takes fd 0) behind the timerfd's back leading an infinite loop.

This bug existed in the 8.0 branch for a while, but I fixed it before release here:

    Fix bug in suspend/resume corruption patch leading to inhibit fd not getting fetched

but when the patch got forward ported for the 3.32 rebase, it was the older version that got inadvertently rebased.

fix should be trivial, just forward port the latest 8.0.0 patch, so devack

Comment 16 Florian Müllner 2019-07-24 13:22:37 UTC
*** Bug 1718807 has been marked as a duplicate of this bug. ***

Comment 17 Martin Krajnak 2019-07-25 08:41:33 UTC
Thanks a lot to everyone who helped working on this.

Fixed in mutter-3.32.2-7.el8

Comment 18 Lyude 2019-08-26 16:50:27 UTC
*** Bug 1732610 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2019-11-05 22:14:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3553


Note You need to log in before you can comment on or make changes to this bug.