Bug 1902079 - P15: Gnome shell crashes when switching to external mode only when connected via USB-c to thunderbolt port
Keywords:
Status: CLOSED DUPLICATE of bug 1896904
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1899260
Blocks: 1816645 1816768
 
Reported: 2020-11-26 19:26 UTC by Jonas Ådahl
Modified: 2020-11-26 21:39 UTC (History)

Fixed In Version:
Clone Of: 1899260
Environment:
Last Closed: 2020-11-26 21:39:42 UTC
Type: Bug
Embargoed:



Description Jonas Ådahl 2020-11-26 19:26:03 UTC
This is a clone of Bug #1899260, which is about a crash in mutter. However, after discussing this with David Airlie, we concluded that it might really be caused by the nouveau driver misidentifying the connector as an Embedded DisplayPort (eDP) connector, which confuses mutter.

So while leaving Bug #1899260 to be about mutter handling the appearance of multiple eDP connectors more gracefully, this bug is about the potential kernel driver bug causing the confusion to begin with.


+++ This bug was initially created as a clone of Bug #1899260 +++

Description of problem: 
Note - this probably isn't a gnome-shell issue; I think it's related to X, but gnome-shell is the piece that crashes, and I'm hoping this can be redirected appropriately.

Running the latest Fedora 33 on a P15 connected to the USB-C port of a P27 monitor via the P15's Thunderbolt port. Using Hybrid mode and the nouveau driver.

When changing the display to external only, gnome-shell crashes and you get logged out and land back at the login screen.

I've attached the sosreport output. The var log also showed:
Nov 18 13:34:07 localhost.localdomain gnome-shell[1838]: The program 'gnome-shell' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadValue (integer parameter out of range for operation)'.
(Details: serial 621 error_code 2 request_code 12 (core protocol) minor_code 0)
(Note to programmers: normally, X errors are reported asynchronously;
that is, you will receive the error a while after causing it.
To debug your program, run it with the GDK_SYNCHRONIZE environment
variable to change this behavior. You can then get a meaningful
backtrace from your debugger if you break on the gdk_x_error() function.)
Nov 18 13:34:07 localhost.localdomain gnome-shell[1838]: == Stack trace for context 0x562b5a2fd0f0 ==

Version-Release number of selected component (if applicable): F33 latest


How reproducible: 100%


Steps to Reproduce:
1. Connect system via USB-C to monitor (using TBT in laptop may or may not be critical)
2. Change to external display only

Actual results: gnome-shell crashes


Expected results: Switch to external display only


Additional info: Let me know if there is any debug information I can collect.
Afraid this issue is likely to gate releasing our P15+Fedora offering

--- Additional comment from Jonas Ådahl on 2020-11-19 07:44:45 UTC ---

Was this using the Wayland or X11 session?

Could you attach:

 1. The journal log entries from when the crash occurred
 2. The backtrace of gnome-shell, Xorg and Xwayland, whatever exists. Make sure to install debuginfo packages first.

--- Additional comment from Mark Pearson on 2020-11-19 14:27:45 UTC ---



--- Additional comment from Mark Pearson on 2020-11-19 14:29:56 UTC ---

journalctl log attached.

Running under Wayland

Can you give me more steps for getting the backtraces please? I've got a day full of meetings and afraid I'm a bit limited on time to figure it out myself so any shortcuts would be really appreciated :)

Thanks
Mark

--- Additional comment from Jonas Ådahl on 2020-11-19 14:45:09 UTC ---

First install debuginfo symbols:

    sudo dnf debuginfo-install xorg-x11-server-Xwayland mutter gnome-shell glib2

Then look up the crash that happens:

    coredumpctl list

Find the <pid> of the crashed process. Make sure it says the dump is "present"; if it is not, reproduce again. Then run

    coredumpctl gdb <pid>

That should give you a gdb prompt. At that prompt, type

    backtrace full

Copy the result of that into a file and attach here.
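
The steps above can also be run non-interactively. The following is a minimal sketch, not something posted in the original thread: `save_backtrace` is a hypothetical helper name, the `/usr/bin/gnome-shell` path is an assumption, and it relies on `coredumpctl` and `gdb` being installed with the dump still "present".

```shell
# save_backtrace PID [OUTFILE] [EXECUTABLE]
# Hypothetical helper (not from the original thread): writes a full backtrace
# for the crashed process PID to OUTFILE without an interactive gdb session.
save_backtrace() {
    local pid="$1"
    local out="${2:-backtrace-$pid.txt}"
    local exe="${3:-/usr/bin/gnome-shell}"   # assumed install path
    local core status
    core="$(mktemp)" || return 1

    # Extract the stored core dump into a temporary file.
    if ! coredumpctl --output="$core" dump "$pid"; then
        rm -f "$core"
        return 1
    fi

    # Run gdb in batch mode and save everything it prints.
    gdb -batch -ex 'backtrace full' "$exe" "$core" > "$out" 2>&1
    status=$?
    rm -f "$core"
    return "$status"
}
```

For example, `save_backtrace <pid> gnome-shell-backtrace.txt` should leave a file that can be attached directly.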

--- Additional comment from Mark Pearson on 2020-11-19 18:32:58 UTC ---

This doesn't look as useful as I hoped it would be
Mark

(gdb) backtrace full
#0  0x00007f9c58d139d5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000557704f99ee2 in dump_gjs_stack_on_signal_handler (signo=5) at ../src/main.c:392
        sa = {__sigaction_handler = {sa_handler = 0x557704f99d50 <dump_gjs_stack_alarm_sigaction>, sa_sigaction = 0x557704f99d50 <dump_gjs_stack_alarm_sigaction>}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        i = 65
#2  <signal handler called>
No symbol table info available.
#3  0x00007f9c59af9d9b in g_log_writer_default () from /lib64/libglib-2.0.so.0
No symbol table info available.
#4  0x00007f9c59af57c7 in g_log_structured_array () from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00007f9c59af59bf in g_log_structured_standard () from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x00007f9c58bb4e84 in gdk_x_error.lto_priv () from /lib64/libgdk-3.so.0
No symbol table info available.
#7  0x00007f9c5886f14b in _XError () from /lib64/libX11.so.6
No symbol table info available.
#8  0x00007f9c5886f227 in handle_error () from /lib64/libX11.so.6
No symbol table info available.
#9  0x00007f9c5886f2c5 in handle_response () from /lib64/libX11.so.6
No symbol table info available.
#10 0x00007f9c5886f372 in _XEventsQueued () from /lib64/libX11.so.6
No symbol table info available.
#11 0x00007f9c5885b711 in XPending () from /lib64/libX11.so.6
No symbol table info available.
#12 0x00007f9c58ba856f in gdk_event_source_prepare () from /lib64/libgdk-3.so.0
No symbol table info available.
#13 0x00007f9c59aeec5a in g_main_context_prepare () from /lib64/libglib-2.0.so.0
No symbol table info available.
#14 0x00007f9c59b3fc4b in g_main_context_iterate.constprop () from /lib64/libglib-2.0.so.0
No symbol table info available.
#15 0x00007f9c59aee6ab in g_main_loop_run () from /lib64/libglib-2.0.so.0
No symbol table info available.
#16 0x00007f9c58f61426 in meta_run () at ../src/core/main.c:673
No locals.
#17 0x0000557704f99848 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:550
        ctx = <optimized out>
        error = 0x0
        ecode = <optimized out>

--- Additional comment from Jonas Ådahl on 2020-11-19 19:58:13 UTC ---

Yea, that didn't help too much. Another thing that might help shine some light on what is causing issues would be to add

    export GDK_SYNCHRONIZE=1
    export MUTTER_VERBOSE=1

to ~/.bashrc, log out, log back in, and reproduce the issue again. That should result in a lot more things logged to the journal, and hopefully also a bit more relevant backtrace.

--- Additional comment from Mark Pearson on 2020-11-19 20:11:11 UTC ---



--- Additional comment from Mark Pearson on 2020-11-19 20:11:41 UTC ---



--- Additional comment from Mark Pearson on 2020-11-19 20:16:57 UTC ---

logs attached.
Holler if you need anything else :)

Mark

--- Additional comment from Jonas Ådahl on 2020-11-19 20:27:14 UTC ---

(In reply to Mark Pearson from comment #9)
> logs attached.
> Holler if you need anything else :)
> 
> Mark

Thanks, that shines some useful light. It does look like a mutter issue: a fail-safe meant to gracefully handle being temporarily headless is, for some reason, failing to have the intended effect.

Feel free to remove the two added .bashrc lines, they may negatively affect performance.

--- Additional comment from Jonas Ådahl on 2020-11-19 20:53:13 UTC ---

Could you run the following commands and report back the result?

    coredumpctl gdb 1812

then

    frame 24
    print *config
    print *config->key
    print *(MetaLogicalMonitorConfig*)config->logical_monitor_configs->data

--- Additional comment from Mark Pearson on 2020-11-24 20:23:55 UTC ---

sorry for the slow reply - I missed the notification in the swamp of my inbox:

(gdb) print *config
$1 = {parent = {g_type_instance = {g_class = 0x56085ec94590}, ref_count = 1, qdata = 0x0}, key = 0x560861d60010, logical_monitor_configs = 0x0, disabled_monitor_specs = 0x560861c7bec0, 
  flags = META_MONITORS_CONFIG_FLAG_NONE, layout_mode = META_LOGICAL_MONITOR_LAYOUT_MODE_PHYSICAL, switch_config = META_MONITOR_SWITCH_CONFIG_EXTERNAL}
(gdb) print *config->key
$2 = {monitor_specs = 0x560861c7bee0}

Can't print logical_monitor_configs as it's null...

Thanks!
Mark

--- Additional comment from Jonas Ådahl on 2020-11-26 17:01:01 UTC ---

So looking at the debug information so far, it looks like mutter tries to create an "external only" display, which it will only ever attempt if there is more than one monitor available, but it thinks all the monitors it finds are laptop panels, and thus fails. It's easy to avoid the crash in this situation by making mutter handle the case where there are multiple seemingly "built in" displays, which should definitely be done, but this would likely cause the "external only" binding to misbehave.

So in order to find out why mutter gets confused about which monitor is external and which is the built-in panel, can you provide p15-modetest.txt by running the following:

    sudo dnf install drm-utils
    sudo modetest > p15-modetest.txt

Then also the following two generated files (p15-monitor-resources.txt and p15-monitor-state.txt):

    gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetResources > p15-monitor-resources.txt
    gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetCurrentState > p15-monitor-state.txt
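
As a quick cross-check of what the kernel reports, the connector names and their states can also be read from sysfs. This is only a sketch, not part of the requested output: `list_connectors` is a hypothetical helper, and it assumes the usual `/sys/class/drm/cardN-<connector>/status` layout.

```shell
# list_connectors [SYSFS_DRM_DIR]
# Hypothetical helper: prints "<connector> <status>" for each DRM connector,
# e.g. "eDP-1 connected". The directory defaults to /sys/class/drm.
list_connectors() {
    local root="${1:-/sys/class/drm}" dir name
    for dir in "$root"/card*-*; do
        # Skip entries without a readable status file (or an unmatched glob).
        [ -r "$dir/status" ] || continue
        name="${dir##*/}"       # e.g. card0-eDP-1
        name="${name#card*-}"   # strip the "cardN-" prefix, leaving eDP-1
        printf '%s %s\n' "$name" "$(cat "$dir/status")"
    done
}
```

Running `list_connectors` with the monitor attached should make it easy to see whether the external monitor shows up under an eDP connector name.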

--- Additional comment from Mark Pearson on 2020-11-26 17:49:51 UTC ---



--- Additional comment from Mark Pearson on 2020-11-26 17:50:18 UTC ---



--- Additional comment from Mark Pearson on 2020-11-26 17:50:39 UTC ---



--- Additional comment from Mark Pearson on 2020-11-26 17:52:01 UTC ---

Crud - ignore those attachments....it would help if I actually attached the monitor. Durrrr

--- Additional comment from Mark Pearson on 2020-11-26 18:01:48 UTC ---



--- Additional comment from Mark Pearson on 2020-11-26 18:02:11 UTC ---



--- Additional comment from Mark Pearson on 2020-11-26 18:02:32 UTC ---



--- Additional comment from Mark Pearson on 2020-11-26 18:03:46 UTC ---

OK - attachments updated with versions run whilst the monitor is connected. Sorry about that.
Let me know if anything looks wonky or you need anything else
Mark

--- Additional comment from Jonas Ådahl on 2020-11-26 18:20:23 UTC ---

Thanks; that confirms my suspicion: there are two monitors connected via Embedded DisplayPort (eDP) connectors, one to eDP-1 (SDC 0x4141) and one to eDP-4 (LEN T27p-10). Being connected via such a connector is how mutter decides whether it thinks a panel is built in or not, so the way to fix this is for mutter to be a bit more picky in how it decides this.
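
To illustrate why this breaks "external only", here is a simplified sketch of a connector-name heuristic. It is not mutter's actual code: both function names are hypothetical, and treating LVDS and DSI as built in alongside eDP is an assumption made for the example.

```shell
# Hypothetical illustration of a "built in panel" check based on connector name.
# Assumption: eDP, LVDS and DSI connectors are all treated as built in.
is_builtin_connector() {
    case "$1" in
        eDP-*|LVDS-*|DSI-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints every connector name from the arguments that is NOT considered built in,
# i.e. the candidates for an "external only" configuration.
external_only_candidates() {
    local name
    for name in "$@"; do
        is_builtin_connector "$name" || printf '%s\n' "$name"
    done
}
```

With the two connectors seen here, `external_only_candidates eDP-1 eDP-4` prints nothing, so a heuristic like this leaves no monitor to enable for "external only".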

Comment 1 Jonas Ådahl 2020-11-26 21:39:42 UTC
Likely a duplicate of bug 1896904, so closing as such.

*** This bug has been marked as a duplicate of bug 1896904 ***

