Created attachment 1730677 [details]
sos report

Description of problem:
Note - this probably isn't a gnome-shell issue; I think it's related to X, but gnome-shell is the piece that crashes and I'm hoping this can be redirected appropriately.

Running Fedora 33 latest on a P15 connected to a P27 monitor's USB-C port via the P15 Thunderbolt port. Using Hybrid mode and the nouveau driver.
When changing the display to external only, gnome-shell crashes and you get logged out and end up back at the login screen.

I've attached the sosreport output. The var log also showed:

Nov 18 13:34:07 localhost.localdomain gnome-shell[1838]: The program 'gnome-shell' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadValue (integer parameter out of range for operation)'.
  (Details: serial 621 error_code 2 request_code 12 (core protocol) minor_code 0)
  (Note to programmers: normally, X errors are reported asynchronously; that is, you will receive the error a while after causing it. To debug your program, run it with the GDK_SYNCHRONIZE environment variable to change this behavior. You can then get a meaningful backtrace from your debugger if you break on the gdk_x_error() function.)
Nov 18 13:34:07 localhost.localdomain gnome-shell[1838]: == Stack trace for context 0x562b5a2fd0f0 ==

Version-Release number of selected component (if applicable):
F33 latest

How reproducible:
100%

Steps to Reproduce:
1. Connect system via USB-C to monitor (using TBT in laptop may or may not be critical)
2. Change to external display only

Actual results:
gnome-shell crashes

Expected results:
Switch to external display only

Additional info:
Let me know if there is any debug information I can collect. Afraid this issue is likely to gate releasing our P15+Fedora offering.
Was this using the Wayland or X11 session?

Could you attach:
1. The journal log entries from when the crash occurred
2. The backtrace of gnome-shell, Xorg and Xwayland, whatever exists.

Make sure to install debuginfo packages first.
Created attachment 1730951 [details] journalctl log
journalctl log attached. Running under Wayland.

Can you give me more steps for getting the backtraces please? I've got a day full of meetings and afraid I'm a bit limited on time to figure it out myself, so any shortcuts would be really appreciated :)

Thanks
Mark
First install debuginfo symbols:

sudo dnf debuginfo-install xorg-x11-server-Xwayland mutter gnome-shell glib2

Then look up the crash that happens:

coredumpctl list

Find the <pid> of the crashed process. Make sure it says the dump is "present"; if it is not, reproduce again. Then run

coredumpctl gdb <pid>

That should give you a gdb prompt. In that, type

backtrace full

Copy the result of that into a file and attach it here.
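If copying from the interactive gdb prompt is awkward, the same steps can be sketched as a small non-interactive helper. This is only a sketch under a few assumptions: save_backtrace is a made-up function name, the executable path has to be supplied by hand (for this bug presumably /usr/bin/gnome-shell), and it extracts the core with coredumpctl's --output option before handing it to gdb in batch mode instead of using "coredumpctl gdb" directly.

```shell
# Hypothetical helper: write "backtrace full" for a crashed PID into a file.
# Usage: save_backtrace <pid> <path-to-executable> [output-file]
save_backtrace() {
    bt_pid="$1"
    bt_exe="$2"
    bt_out="${3:-backtrace.txt}"

    # Temporary file that will hold the extracted core dump.
    bt_core="$(mktemp)" || return 1

    # Extract the stored core for that PID (the dump must be "present").
    if ! coredumpctl --output="$bt_core" dump "$bt_pid"; then
        rm -f "$bt_core"
        return 1
    fi

    # Run gdb non-interactively and capture everything it prints.
    gdb -batch -ex 'backtrace full' "$bt_exe" "$bt_core" > "$bt_out" 2>&1
    bt_rc=$?

    rm -f "$bt_core"
    return "$bt_rc"
}
```

For example, "save_backtrace <pid> /usr/bin/gnome-shell backtrace.txt" with the <pid> taken from "coredumpctl list" should leave a file that can be attached as-is.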
This doesn't look as useful as I hoped it would be.
Mark

(gdb) backtrace full
#0  0x00007f9c58d139d5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000557704f99ee2 in dump_gjs_stack_on_signal_handler (signo=5) at ../src/main.c:392
        sa = {__sigaction_handler = {sa_handler = 0x557704f99d50 <dump_gjs_stack_alarm_sigaction>, sa_sigaction = 0x557704f99d50 <dump_gjs_stack_alarm_sigaction>}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        i = 65
#2  <signal handler called>
No symbol table info available.
#3  0x00007f9c59af9d9b in g_log_writer_default () from /lib64/libglib-2.0.so.0
No symbol table info available.
#4  0x00007f9c59af57c7 in g_log_structured_array () from /lib64/libglib-2.0.so.0
No symbol table info available.
#5  0x00007f9c59af59bf in g_log_structured_standard () from /lib64/libglib-2.0.so.0
No symbol table info available.
#6  0x00007f9c58bb4e84 in gdk_x_error.lto_priv () from /lib64/libgdk-3.so.0
No symbol table info available.
#7  0x00007f9c5886f14b in _XError () from /lib64/libX11.so.6
No symbol table info available.
#8  0x00007f9c5886f227 in handle_error () from /lib64/libX11.so.6
No symbol table info available.
#9  0x00007f9c5886f2c5 in handle_response () from /lib64/libX11.so.6
No symbol table info available.
#10 0x00007f9c5886f372 in _XEventsQueued () from /lib64/libX11.so.6
No symbol table info available.
#11 0x00007f9c5885b711 in XPending () from /lib64/libX11.so.6
No symbol table info available.
#12 0x00007f9c58ba856f in gdk_event_source_prepare () from /lib64/libgdk-3.so.0
No symbol table info available.
#13 0x00007f9c59aeec5a in g_main_context_prepare () from /lib64/libglib-2.0.so.0
No symbol table info available.
#14 0x00007f9c59b3fc4b in g_main_context_iterate.constprop () from /lib64/libglib-2.0.so.0
No symbol table info available.
#15 0x00007f9c59aee6ab in g_main_loop_run () from /lib64/libglib-2.0.so.0
No symbol table info available.
#16 0x00007f9c58f61426 in meta_run () at ../src/core/main.c:673
No locals.
#17 0x0000557704f99848 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:550
        ctx = <optimized out>
        error = 0x0
        ecode = <optimized out>
Yea, that didn't help too much. Another thing that might help shine some light on what is causing issues would be to add

export GDK_SYNCHRONIZE=1
export MUTTER_VERBOSE=1

to ~/.bashrc, log out, log back in, and reproduce the issue again. That should result in a lot more things logged to the journal, and hopefully also a bit more relevant backtrace.
Created attachment 1731041 [details] journalctl log2
Created attachment 1731042 [details] backtrace log
logs attached. Holler if you need anything else :) Mark
(In reply to Mark Pearson from comment #9)
> logs attached.
> Holler if you need anything else :)
>
> Mark

Thanks, that sheds some useful light. It does look like a mutter issue: a fail-safe meant to gracefully handle being temporarily headless is, for some reason, failing to have the intended effect.

Feel free to remove the two added .bashrc lines; they may negatively affect performance.
Could you run the following commands and report back the result?

coredumpctl gdb 1812

then

frame 24
print *config
print *config->key
print *(MetaLogicalMonitorConfig*)config->logical_monitor_configs->data
Sorry for the slow reply - I missed the notification in the swamp of my inbox:

(gdb) print *config
$1 = {parent = {g_type_instance = {g_class = 0x56085ec94590}, ref_count = 1, qdata = 0x0}, key = 0x560861d60010, logical_monitor_configs = 0x0, disabled_monitor_specs = 0x560861c7bec0, flags = META_MONITORS_CONFIG_FLAG_NONE, layout_mode = META_LOGICAL_MONITOR_LAYOUT_MODE_PHYSICAL, switch_config = META_MONITOR_SWITCH_CONFIG_EXTERNAL}
(gdb) print *config->key
$2 = {monitor_specs = 0x560861c7bee0}

Can't print logical_monitor_configs as it's null...

Thanks!
Mark
So looking at the debug information so far, it looks like mutter tries to create an "external only" display, which it will only ever attempt if more than one monitor is available, but it thinks every monitor it finds is a laptop panel, and so it fails. It's easy to avoid the crash in this situation by making mutter handle the case where there are multiple seemingly "built-in" displays, which should definitely be done, but that would likely cause the "external only" binding to misbehave.

So in order to find out why mutter gets confused about which monitor is external and which is the built-in panel, can you provide p15-modetest.txt by running the following:

sudo dnf install drm-utils
sudo modetest > p15-modetest.txt

Then also the following two generated files (p15-monitor-resources.txt and p15-monitor-state.txt):

gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetResources > p15-monitor-resources.txt
gdbus call -e -d org.gnome.Mutter.DisplayConfig -o /org/gnome/Mutter/DisplayConfig -m org.gnome.Mutter.DisplayConfig.GetCurrentState > p15-monitor-state.txt
Created attachment 1733848 [details] modetest output
Created attachment 1733849 [details] monitor resources
Created attachment 1733850 [details] monitor state
Crud - ignore those attachments....it would help if I actually attached the monitor. Durrrr
Created attachment 1733854 [details] modetest output
Created attachment 1733855 [details] monitor resources
Created attachment 1733856 [details] monitor state
OK - attachments updated with versions run whilst the monitor is connected. Sorry about that. Let me know if anything looks wonky or you need anything else Mark
Thanks; that confirms my suspicion: there are two monitors connected via Embedded DisplayPort (eDP) connectors, one on eDP-1 (SDC 0x4141) and one on eDP-4 (LEN T27p-10). Being connected via such a connector is how mutter decides whether it thinks a panel is built in or not, so the way to fix this is for mutter to be a bit more picky in how it decides this.
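For anyone wanting to check their own machine for the same situation, here is a rough sketch that lists the connected connectors the kernel exposes and marks the eDP ones. It assumes the usual sysfs layout of /sys/class/drm/cardN-<connector>/status; list_connected_connectors is a made-up name, and the "built-in" label only mirrors the eDP rule described above. Mutter gets the connector type through its own backend rather than from sysfs, so treat this as an approximation.

```shell
# Hypothetical helper: print "<connector> built-in|external" for each
# connected connector found under a sysfs-style directory.
# Usage: list_connected_connectors [base-dir]   (default: /sys/class/drm)
list_connected_connectors() {
    lc_base="${1:-/sys/class/drm}"

    for lc_status_file in "$lc_base"/card*-*/status; do
        # Skip unmatched globs and unreadable entries.
        [ -r "$lc_status_file" ] || continue

        read -r lc_state < "$lc_status_file"
        [ "$lc_state" = "connected" ] || continue

        # Directory name looks like "card0-eDP-1"; drop the "cardN-" prefix.
        lc_name="$(basename "$(dirname "$lc_status_file")")"
        lc_name="${lc_name#card*-}"

        # Simplified rule from this thread: eDP connectors count as built-in.
        case "$lc_name" in
            eDP-*) lc_kind="built-in" ;;
            *)     lc_kind="external" ;;
        esac

        printf '%s %s\n' "$lc_name" "$lc_kind"
    done
}
```

If every connected output is reported as built-in while an external monitor is plugged in, that matches the case seen here, where the "external only" switch has nothing it considers external.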
So FWIW, I created a kernel bug clone of this because, after discussing it with David Airlie, we concluded that it might also be a kernel bug. I'll leave this one to be about mutter handling things more gracefully, and the clone about the potential kernel issue.
Sounds good - thanks for digging into this, really appreciated. Mark
Created attachment 1733985 [details] dmesg log - on first insert as it worked
Created attachment 1733986 [details] log on failed insert.
Created attachment 1733987 [details] display settings looking good
(Sorry - the attachments really belong in https://bugzilla.redhat.com/show_bug.cgi?id=1896904)

But I can confirm that Karol's patch for 1896904 fixes this issue - external display now works correctly.

Just need to figure out the final nouveau timeout referenced in 1896904 and we'll be in good shape.

Thanks
Mark