Bug 1899260 - P15: Gnome shell crashes when switching to external mode only when connected via USB-c to thunderbolt port
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: mutter
Version: 33
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jonas Ådahl
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1816645 1816768 1902079
 
Reported: 2020-11-18 18:47 UTC by Mark Pearson
Modified: 2021-03-10 08:31 UTC

Fixed In Version:
Clone Of:
: 1902079 (view as bug list)
Environment:
Last Closed: 2021-03-10 08:31:14 UTC
Type: Bug
Embargoed:


Attachments
sos report (15.81 MB, application/x-xz), 2020-11-18 18:47 UTC, Mark Pearson
journalctl log (3.58 MB, text/plain), 2020-11-19 14:27 UTC, Mark Pearson
journalctl log2 (15.67 MB, text/plain), 2020-11-19 20:11 UTC, Mark Pearson
backtrace log (10.35 KB, text/plain), 2020-11-19 20:11 UTC, Mark Pearson
modetest output (21.13 KB, text/plain), 2020-11-26 17:49 UTC, Mark Pearson
monitor resources (4.46 KB, text/plain), 2020-11-26 17:50 UTC, Mark Pearson
monitor state (2.36 KB, text/plain), 2020-11-26 17:50 UTC, Mark Pearson
modetest output (21.13 KB, text/plain), 2020-11-26 18:01 UTC, Mark Pearson
monitor resources (10.57 KB, text/plain), 2020-11-26 18:02 UTC, Mark Pearson
monitor state (6.81 KB, text/plain), 2020-11-26 18:02 UTC, Mark Pearson
dmesg log - on first insert as it worked (100.41 KB, text/plain), 2020-11-27 04:05 UTC, Mark Pearson
log on failed insert. (105.96 KB, text/plain), 2020-11-27 04:05 UTC, Mark Pearson
display settings looking good (151.94 KB, image/png), 2020-11-27 04:06 UTC, Mark Pearson


Links
System: GNOME Gitlab
ID: GNOME mutter merge_requests 1607
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2020-11-27 08:14:19 UTC

Description Mark Pearson 2020-11-18 18:47:11 UTC
Created attachment 1730677
sos report

Description of problem: 
Note: this probably isn't a gnome-shell issue. I think it's related to X, but gnome-shell is the piece that crashes, and I'm hoping this can be redirected appropriately.

Running the latest Fedora 33 on a P15 connected to a P27 monitor's USB-C port via the P15's Thunderbolt port, using Hybrid mode and the nouveau driver.

When changing the display to be external only, gnome-shell crashes and you get logged out and land back at the login screen.

I've attached the sosreport output. The var log also showed:
Nov 18 13:34:07 localhost.localdomain gnome-shell[1838]: The program 'gnome-shell' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadValue (integer parameter out of range for operation)'.
(Details: serial 621 error_code 2 request_code 12 (core protocol) minor_code 0)
(Note to programmers: normally, X errors are reported asynchronously;
that is, you will receive the error a while after causing it.
To debug your program, run it with the GDK_SYNCHRONIZE environment
variable to change this behavior. You can then get a meaningful
backtrace from your debugger if you break on the gdk_x_error() function.)
Nov 18 13:34:07 localhost.localdomain gnome-shell[1838]: == Stack trace for context 0x562b5a2fd0f0 ==
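
For readers triaging similar logs, the numeric fields in the "Details:" line can be mapped to names. The following is a minimal Python sketch, not part of the original report: the function name `decode_x_error_details` and the lookup tables are hypothetical, and the tables cover only a small subset of the X11 core protocol codes (error code 2 is BadValue and core request opcode 12 is ConfigureWindow).

```python
import re

# Small subset of X11 core protocol codes, enough for this log line.
# (Hypothetical helper tables; extend from the X11 protocol spec as needed.)
X_ERROR_NAMES = {1: "BadRequest", 2: "BadValue", 3: "BadWindow"}
X_CORE_REQUEST_NAMES = {1: "CreateWindow", 8: "MapWindow", 12: "ConfigureWindow"}

DETAILS_RE = re.compile(
    r"serial (\d+) error_code (\d+) request_code (\d+) \(([^)]*)\) minor_code (\d+)"
)


def decode_x_error_details(line):
    """Parse a GDK 'Details:' line into a dict, or return None if it does not match."""
    match = DETAILS_RE.search(line)
    if match is None:
        return None
    serial, error_code, request_code, origin, minor_code = match.groups()
    error_code = int(error_code)
    request_code = int(request_code)
    # Request names are only looked up for core protocol requests.
    request_name = None
    if origin == "core protocol":
        request_name = X_CORE_REQUEST_NAMES.get(request_code)
    return {
        "serial": int(serial),
        "error": X_ERROR_NAMES.get(error_code, f"unknown ({error_code})"),
        "request": request_name or f"unknown ({request_code})",
        "origin": origin,
        "minor_code": int(minor_code),
    }


details = decode_x_error_details(
    "(Details: serial 621 error_code 2 request_code 12 (core protocol) minor_code 0)"
)
print(details["error"], details["request"])
```

For the line logged above, this reports a BadValue error on a ConfigureWindow request, which only says that some window was configured with an out-of-range parameter; it does not by itself identify the caller.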

Version-Release number of selected component (if applicable): F33 latest


How reproducible: 100%


Steps to Reproduce:
1. Connect the system via USB-C to the monitor (using the TBT port on the laptop may or may not be critical)
2. Change to external display only

Actual results: gnome-shell crashes


Expected results: Switch to external display only


Additional info: Let me know if there is any debug information I can collect.
I'm afraid this issue is likely to gate releasing our P15+Fedora offering.

Comment 1 Jonas Ådahl 2020-11-19 07:44:45 UTC
Was this using the Wayland or X11 session?

Could you attach:

 1. The journal log entries from when the crash occurred
 2. The backtrace of gnome-shell, Xorg and Xwayland, whatever exists. Make sure to install debuginfo packages first.

Comment 2 Mark Pearson 2020-11-19 14:27:45 UTC
Created attachment 1730951
journalctl log

Comment 3 Mark Pearson 2020-11-19 14:29:56 UTC
journalctl log attached.

Running under Wayland

Can you give me more steps for getting the backtraces please? I've got a day full of meetings and I'm afraid I'm a bit limited on time to figure it out myself, so any shortcuts would be really appreciated :)

Thanks
Mark

Comment 4 Jonas Ådahl 2020-11-19 14:45:09 UTC
First install debuginfo symbols:

    sudo dnf debuginfo-install xorg-x11-server-Xwayland mutter gnome-shell glib2

Then look up the crash that happens:

    coredumpctl list

Find the <pid> of the crashed process. Make sure it says the dump is "present"; if it is not, reproduce again. Then run

    coredumpctl gdb <pid>

That should give you a gdb prompt. At that prompt, type

    backtrace full

Copy the result of that into a file and attach here.

Comment 5 Mark Pearson 2020-11-19 18:32:58 UTC
This doesn't look as useful as I hoped it would be
Mark

(gdb) backtrace full
#0  0x00007f9c58d139d5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000557704f99ee2 in dump_gjs_stack_on_signal_handler (signo=5) at ../src/main.c:392
        sa = {__sigaction_handler = {sa_handler = 0x557704f99d50 <dump_gjs_stack_alarm_sigaction>, sa_sigaction = 0x557704f99d50 <dump_gjs_stack_alarm_sigaction>}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        i = 65
#2  <signal handler called>
No symbol table info available.
#3  0x00007f9c59af9d9b in g_log_writer_default () from /lib64/libglib-2.0.so.0
No symbol table info available.
#4  0x00007f9c59af57c7 in g_log_structured_array () from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00007f9c59af59bf in g_log_structured_standard () from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x00007f9c58bb4e84 in gdk_x_error.lto_priv () from /lib64/libgdk-3.so.0
No symbol table info available.
#7  0x00007f9c5886f14b in _XError () from /lib64/libX11.so.6
No symbol table info available.
#8  0x00007f9c5886f227 in handle_error () from /lib64/libX11.so.6
No symbol table info available.
#9  0x00007f9c5886f2c5 in handle_response () from /lib64/libX11.so.6
No symbol table info available.
#10 0x00007f9c5886f372 in _XEventsQueued () from /lib64/libX11.so.6
No symbol table info available.
#11 0x00007f9c5885b711 in XPending () from /lib64/libX11.so.6
No symbol table info available.
#12 0x00007f9c58ba856f in gdk_event_source_prepare () from /lib64/libgdk-3.so.0
No symbol table info available.
#13 0x00007f9c59aeec5a in g_main_context_prepare () from /lib64/libglib-2.0.so.0
No symbol table info available.
#14 0x00007f9c59b3fc4b in g_main_context_iterate.constprop () from /lib64/libglib-2.0.so.0
No symbol table info available.
#15 0x00007f9c59aee6ab in g_main_loop_run () from /lib64/libglib-2.0.so.0
No symbol table info available.
#16 0x00007f9c58f61426 in meta_run () at ../src/core/main.c:673
No locals.
#17 0x0000557704f99848 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:550
        ctx = <optimized out>
        error = 0x0
        ecode = <optimized out>

Comment 6 Jonas Ådahl 2020-11-19 19:58:13 UTC
Yeah, that didn't help much. Another thing that might shed some light on what is causing the issue would be to add

    export GDK_SYNCHRONIZE=1
    export MUTTER_VERBOSE=1

to ~/.bashrc, log out, log back in, and reproduce the issue again. That should result in a lot more things logged to the journal, and hopefully also a bit more relevant backtrace.

Comment 7 Mark Pearson 2020-11-19 20:11:11 UTC
Created attachment 1731041
journalctl log2

Comment 8 Mark Pearson 2020-11-19 20:11:41 UTC
Created attachment 1731042
backtrace log

Comment 9 Mark Pearson 2020-11-19 20:16:57 UTC
logs attached.
Holler if you need anything else :)

Mark

Comment 10 Jonas Ådahl 2020-11-19 20:27:14 UTC
(In reply to Mark Pearson from comment #9)
> logs attached.
> Holler if you need anything else :)
> 
> Mark

Thanks, that sheds some useful light. It does look like a mutter issue: a fail-safe meant to gracefully handle being temporarily headless is, for some reason, failing to have the intended effect.

Feel free to remove the two added .bashrc lines; they may negatively affect performance.

Comment 11 Jonas Ådahl 2020-11-19 20:53:13 UTC
Could you run the following commands and report back the result?

    coredumpctl gdb 1812

then

    frame 24
    print *config
    print *config->key
    print *(MetaLogicalMonitorConfig*)config->logical_monitor_configs->data

Comment 12 Mark Pearson 2020-11-24 20:23:55 UTC
Sorry for the slow reply - I missed the notification in the swamp of my inbox:

(gdb) print *config
$1 = {parent = {g_type_instance = {g_class = 0x56085ec94590}, ref_count = 1, qdata = 0x0}, key = 0x560861d60010, logical_monitor_configs = 0x0, disabled_monitor_specs = 0x560861c7bec0, 
  flags = META_MONITORS_CONFIG_FLAG_NONE, layout_mode = META_LOGICAL_MONITOR_LAYOUT_MODE_PHYSICAL, switch_config = META_MONITOR_SWITCH_CONFIG_EXTERNAL}
(gdb) print *config->key
$2 = {monitor_specs = 0x560861c7bee0}

Can't print logical_monitor_configs as it's null...

Thanks!
Mark

Comment 13 Jonas Ådahl 2020-11-26 17:01:01 UTC
So looking at the debug information so far, it looks like mutter tries to create an "external only" configuration, which it will only ever attempt if more than one monitor is available. However, it thinks all the monitors it finds are laptop panels, and thus it fails. It's easy to avoid the crash in this situation by making mutter handle the case where there are multiple seemingly "built in" displays, which should definitely be done, but this would likely cause the "external only" binding to misbehave.

So in order to find out why mutter gets confused about which monitor is external and which is the built-in panel, can you provide p15-modetest.txt by running the following:

    sudo dnf install drm-utils
    sudo modetest > p15-modetest.txt

Then also the following two generated files (p15-monitor-resources.txt and p15-monitor-state.txt):

    gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetResources > p15-monitor-resources.txt
    gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetCurrentState > p15-monitor-state.txt

Comment 14 Mark Pearson 2020-11-26 17:49:51 UTC
Created attachment 1733848
modetest output

Comment 15 Mark Pearson 2020-11-26 17:50:18 UTC
Created attachment 1733849
monitor resources

Comment 16 Mark Pearson 2020-11-26 17:50:39 UTC
Created attachment 1733850
monitor state

Comment 17 Mark Pearson 2020-11-26 17:52:01 UTC
Crud - ignore those attachments....it would help if I actually attached the monitor. Durrrr

Comment 18 Mark Pearson 2020-11-26 18:01:48 UTC
Created attachment 1733854
modetest output

Comment 19 Mark Pearson 2020-11-26 18:02:11 UTC
Created attachment 1733855
monitor resources

Comment 20 Mark Pearson 2020-11-26 18:02:32 UTC
Created attachment 1733856
monitor state

Comment 21 Mark Pearson 2020-11-26 18:03:46 UTC
OK - attachments updated with versions run whilst the monitor is connected. Sorry about that.
Let me know if anything looks wonky or you need anything else
Mark

Comment 22 Jonas Ådahl 2020-11-26 18:20:23 UTC
Thanks; that confirms my suspicion. There are two monitors connected via Embedded DisplayPort (eDP) connectors, one to eDP-1 (SDC 0x4141) and one to eDP-4 (LEN T27p-10). Being connected via such a connector is how mutter decides whether a panel is built in or not, so the way to fix this is for mutter to be a bit more picky about how it decides this.
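
The failure mode described in comments 13 and 22 can be illustrated with a minimal Python sketch. This is not mutter's actual code (mutter is written in C), and the names `Monitor`, `is_builtin_by_connector`, and `external_only_connectors` are hypothetical. It assumes, as a simplification, that "built in" is decided purely by the connector name starting with "eDP".

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    connector: str  # connector name as reported by the kernel, e.g. "eDP-1"
    model: str      # vendor/product string, for readability only


def is_builtin_by_connector(monitor):
    # Simplified heuristic from comment 22: an eDP connector means a laptop panel.
    return monitor.connector.startswith("eDP")


def external_only_connectors(monitors):
    """Return the connectors an 'external only' switch would enable.

    Returns None when the switch would not be attempted at all, because
    it only makes sense with more than one monitor.
    """
    if len(monitors) < 2:
        return None
    return [m.connector for m in monitors if not is_builtin_by_connector(m)]


# The two monitors reported in this bug: both appear on eDP connectors.
p15 = [Monitor("eDP-1", "SDC 0x4141"), Monitor("eDP-4", "LEN T27p-10")]
print(external_only_connectors(p15))
```

With both monitors on eDP connectors, the list of monitors to enable comes out empty, which is consistent with the NULL `logical_monitor_configs` printed in comment 12. With a correctly reported external connector (for example a hypothetical "DP-2"), the same function would return that connector.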

Comment 23 Jonas Ådahl 2020-11-26 19:49:32 UTC
So FWIW, I created a kernel bug clone of this, as after having discussed with David Airlie, we concluded that it might also be a kernel bug. I'll leave this one to be about mutter handling things more gracefully, and the clone about the potential kernel issue.

Comment 24 Mark Pearson 2020-11-26 19:51:43 UTC
Sounds good - thanks for digging into this, really appreciated.
Mark

Comment 25 Mark Pearson 2020-11-27 04:05:27 UTC
Created attachment 1733985
dmesg log - on first insert as it worked

Comment 26 Mark Pearson 2020-11-27 04:05:57 UTC
Created attachment 1733986
log on failed insert.

Comment 27 Mark Pearson 2020-11-27 04:06:27 UTC
Created attachment 1733987
display settings looking good

Comment 28 Mark Pearson 2020-11-27 04:18:30 UTC
(Sorry - the attachments really belong in https://bugzilla.redhat.com/show_bug.cgi?id=1896904)
But I can confirm that Karol's patch for 1896904 fixes this issue - external display now works correctly

Just need to figure out the final nouveau timeout referenced in 1896904 and we'll be in good shape

Thanks
mark

