Bug 1427340 - macbookpro 8,2: unusable graphics, regression in dualgpu vga_switcheroo: client refused switch
Summary: macbookpro 8,2: unusable graphics, regression in dualgpu vga_switcheroo: clie...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: switcheroo-control
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kalev Lember
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-02-27 23:58 UTC by Chris Murphy
Modified: 2019-05-28 23:33 UTC
CC List: 15 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-05-28 23:33:28 UTC
Type: Bug
Embargoed:


Attachments
photo of display with artifact (2.42 MB, image/jpeg)
2017-02-28 00:01 UTC, Chris Murphy
f24 journal full (338.94 KB, text/x-vhdl)
2017-02-28 00:04 UTC, Chris Murphy
f24 journal kernel messages only (269.99 KB, text/x-vhdl)
2017-02-28 00:04 UTC, Chris Murphy
f25 journal full (863.50 KB, text/x-vhdl)
2017-02-28 00:06 UTC, Chris Murphy
f25 journal kernel messages only (708.61 KB, text/x-vhdl)
2017-02-28 00:06 UTC, Chris Murphy
f26 journal full, switcheroo enabled (678.92 KB, text/x-vhdl)
2017-02-28 06:12 UTC, Chris Murphy
f26 journal kernel messages only, switcheroo enabled (668.05 KB, text/x-vhdl)
2017-02-28 06:14 UTC, Chris Murphy
f26 journal full, switcheroo disabled (727.11 KB, text/x-vhdl)
2017-02-28 06:14 UTC, Chris Murphy
f26 journal kernel messages only, switcheroo disabled (557.11 KB, text/x-vhdl)
2017-02-28 06:14 UTC, Chris Murphy
f26 journal full, rhgb disabled (609.93 KB, text/x-vhdl)
2017-02-28 15:17 UTC, Chris Murphy
f26 journal kernel messages only, rhgb disabled (442.40 KB, text/x-vhdl)
2017-02-28 15:17 UTC, Chris Murphy
f26 journal full, alsa masked, switcheroo enabled, 4.11.0-rc1 debug (735.25 KB, text/x-vhdl)
2017-03-09 21:08 UTC, Chris Murphy
f26 journal full, alsa masked, switcheroo enabled, 4.11.0-rc1 (887.34 KB, text/x-vhdl)
2017-03-13 20:09 UTC, Chris Murphy
dmesg.log 4.15.0-0.rc9.git4.1.fc28 (119.59 KB, text/plain)
2018-01-27 03:09 UTC, Chris Murphy
dmesg kernel 4.16.7-fc28 (119.38 KB, text/plain)
2018-05-11 02:29 UTC, Chris Murphy
full journal w/ kernel 4.16.7-fc28 (324.42 KB, text/x-vhdl)
2018-05-11 02:29 UTC, Chris Murphy

Description Chris Murphy 2017-02-27 23:58:45 UTC
Description of problem:

Starting with Fedora 25, there is no plymouth splash screen and no gdm login screen. Instead, there are artifacts (see attached photo). The problem doesn't happen with Fedora 24 as installed or as updated, even when using the Fedora 25 kernel. The problem happens with both X and Wayland (tested by uncommenting the WaylandEnable=false line in /etc/gdm/custom.conf).

The problem also doesn't happen if I add i915.modeset=0 as a boot parameter.


Version-Release number of selected component (if applicable):
mutter-3.22.1-8.fc25.x86_64

How reproducible:
Always.


Steps to Reproduce:
1. Boot.
2.
3.

Actual results:

Display shows artifacts, contents unrecognizable.

Looks like radeon and i915 graphics remain enabled.


Expected results:

No artifacts.

Additional info:

Problem does not happen with this combination (F24 fully updated, with F26 kernel):
4.10.0-1.fc26.x86_64
mutter-3.20.3-2.fc24.x86_64

Problem does happen with this combination (F25 as installed from release media)
4.8.6-300.fc25.x86_64 (still happens with 4.10.0-1.fc26.x86_64)
mutter-3.22.1-8.fc25.x86_64

Comment 1 Chris Murphy 2017-02-28 00:01:21 UTC
Created attachment 1258228
photo of display with artifact

There is always some variation of this pattern. If I boot with i915.modeset=0, get a working desktop, and then reboot without that boot parameter, the artifacts include recognizable portions of the previous boot's desktop.

Comment 2 Chris Murphy 2017-02-28 00:04:04 UTC
Created attachment 1258229
f24 journal full

journalctl -b -o short-monotonic
This is Fedora 24 fully updated, 4.9.10-100.fc24.x86_64, with drm.debug=0x1e set.

Comment 3 Chris Murphy 2017-02-28 00:04:46 UTC
Created attachment 1258230
f24 journal kernel messages only

journalctl -b -o short-monotonic -k
This is Fedora 24 fully updated, 4.9.10-100.fc24.x86_64, with drm.debug=0x1e set.

Comment 4 Chris Murphy 2017-02-28 00:06:23 UTC
Created attachment 1258231
f25 journal full

journalctl -b -o short-monotonic
This is Fedora 25 as installed, 4.8.6-300.fc25.x86_64, drm.debug=0x1e set.

Comment 5 Chris Murphy 2017-02-28 00:06:59 UTC
Created attachment 1258232
f25 journal kernel messages only

journalctl -b -o short-monotonic -k
This is Fedora 25 as installed, 4.8.6-300.fc25.x86_64, drm.debug=0x1e set.

Comment 6 Chris Murphy 2017-02-28 00:21:43 UTC
Present only in working case:
[    4.101982] localhost.localdomain kernel: [drm:radeon_crtc_handle_flip [radeon]] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[    4.947471] localhost.localdomain kernel: Linux video capture interface: v2.00


Only in failure case:
[    6.049112] localhost.localdomain kernel: vga_switcheroo: client 1 refused switch
[    6.049173] localhost.localdomain kernel: vga_switcheroo: setting delayed switch to client 0
[    7.283300] localhost.localdomain kernel: vga_switcheroo: processing delayed switch to 0
[    7.283302] localhost.localdomain kernel: vga_switcheroo: client 101 refused switch
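The comparison above was done by eye. A minimal Python sketch of the same "present only in one boot" check, assuming both journals were saved with `journalctl -b -o short-monotonic -k` as in the attachments. It strips the leading monotonic timestamp so the same message logged at different times still matches; the function names are illustrative, and differences in PIDs or addresses inside messages are not normalized.

```python
import re

# Matches the "[    6.049112] " prefix that short-monotonic output adds.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(line: str) -> str:
    """Drop the monotonic timestamp so the same message compares equal across boots."""
    return TIMESTAMP.sub("", line).strip()


def only_in(a_lines, b_lines):
    """Return messages found in a_lines but not in b_lines, in first-seen order."""
    b_messages = {normalize(line) for line in b_lines}
    seen = set()
    result = []
    for line in a_lines:
        message = normalize(line)
        if message and message not in b_messages and message not in seen:
            seen.add(message)
            result.append(message)
    return result
```

Calling `only_in(failing, working)` and `only_in(working, failing)` on the two sets of lines gives the two lists in this comment.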

Comment 7 Chris Murphy 2017-02-28 05:59:27 UTC
On Fedora 26, this also eliminates the problem:
$ sudo systemctl disable switcheroo-control

switcheroo-control-1.1-2.fc26.x86_64

Comment 8 Chris Murphy 2017-02-28 06:10:44 UTC
The problem consistently happens when switcheroo-control is enabled and doesn't happen when it is disabled, so that seems to be the culprit. Since I've got a consistent reproducer (fail and no fail) with the same software versions on Fedora 26, I'll attach those logs.

[    7.534198] localhost.localdomain systemd[1]: Starting Switcheroo Control Proxy service...
[    7.536731] localhost.localdomain systemd[1]: Started Manage Sound Card State (restore and store).
[    7.539689] localhost.localdomain systemd[1]: Starting Accounts Service...
[    7.543176] localhost.localdomain kernel: vga_switcheroo: client 1 refused switch
[    7.543229] localhost.localdomain kernel: vga_switcheroo: setting delayed switch to client 0

Comment 9 Chris Murphy 2017-02-28 06:12:24 UTC
Created attachment 1258269
f26 journal full, switcheroo enabled

Comment 10 Chris Murphy 2017-02-28 06:14:20 UTC
Created attachment 1258270
f26 journal kernel messages only, switcheroo enabled

Comment 11 Chris Murphy 2017-02-28 06:14:31 UTC
Created attachment 1258271
f26 journal full, switcheroo disabled

Comment 12 Chris Murphy 2017-02-28 06:14:41 UTC
Created attachment 1258272
f26 journal kernel messages only, switcheroo disabled

Comment 13 Hans de Goede 2017-02-28 08:21:29 UTC
Right, so when Bastien first introduced switcheroo-control I was already afraid we would hit laptops where this would not work.

The purpose of switcheroo-control is to flip control of the LCD to the Intel GPU, since some machines boot with the discrete GPU as the default and using the Intel GPU is a lot more power efficient. Unfortunately it seems that this does not work on your machine.

Looking at the log messages the problem seems to be that the switch never happens because the GPUs are busy. Can you try booting with "rhgb" removed from the kernel cmdline?

Comment 14 Chris Murphy 2017-02-28 15:17:38 UTC
Created attachment 1258400
f26 journal full, rhgb disabled

Comment 15 Chris Murphy 2017-02-28 15:17:50 UTC
Created attachment 1258401
f26 journal kernel messages only, rhgb disabled

Comment 16 Chris Murphy 2017-02-28 15:18:18 UTC
Problem still happens with rhgb omitted.

Comment 17 Hans de Goede 2017-02-28 23:35:57 UTC
Can you try booting into text mode (add "3" to the kernel command line), log in, and then run startx to see if things work that way?

Comment 18 Bastien Nocera 2017-03-01 11:37:53 UTC
From the README:

Disabling automatic switch to integrated GPU 
--------------------------------------------

By default, on startup and whatever the BIOS settings (which might or
might not be available, depending on the system), we will force the 
integrated GPU to be used so that power savings are made by default,
and the discrete GPU is only used for select applications.

If this causes problems, this behaviour can be disabled by passing
`xdg.force_integrated=0` as a kernel command-line options in the 
bootloader.

Don't forget to file a bug against your distribution to get the kernel
or graphics drivers fixed, depending on the exact problem at hand.
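The README describes an opt-out through the kernel command line. A minimal Python sketch of how a userspace program could read that option from /proc/cmdline; it is not switcheroo-control's actual code (which is written in C), it assumes only the literal value `0` turns the behaviour off, and it does not handle quoted arguments that contain spaces.

```python
def parse_cmdline(text: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Assumption: whitespace-separated tokens, no quoted values containing spaces.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")  # split on the first "=" only
        params[name] = value if sep else None
    return params


def force_integrated(params: dict) -> bool:
    """Forcing the integrated GPU is the default; xdg.force_integrated=0 opts out."""
    return params.get("xdg.force_integrated") != "0"


def read_boot_params(path: str = "/proc/cmdline") -> dict:
    """Read and parse the running kernel's command line (Linux only)."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

On an affected machine, `"rhgb" in read_boot_params()` also shows whether the splash option discussed in this thread is still set.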

Comment 19 Hans de Goede 2017-03-01 12:13:13 UTC
Bastien, you're being way too quick to blame the kernel here. So far the switch is not actually completing because userspace is holding one of the 2 drm device nodes open the whole time. So far this is clearly a userspace issue.
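The argument here is that the switch cannot complete while some process keeps a DRM device node open. A minimal Python sketch for listing such processes by walking /proc/<pid>/fd, roughly what `lsof /dev/dri/card*` reports; it assumes a Linux procfs, needs root to see other users' processes, and by default ignores render nodes such as /dev/dri/renderD128. The function name and defaults are illustrative.

```python
import os
from typing import Dict, List


def drm_holders(proc_root: str = "/proc",
                prefix: str = "/dev/dri/card") -> Dict[int, List[str]]:
    """Map PID -> open file targets starting with `prefix`, via /proc/<pid>/fd links."""
    holders: Dict[int, List[str]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/sys and other non-process entries
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders
```

Passing `prefix="/dev/snd/controlC"` checks the sound control nodes that turn up later in this thread.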

Comment 20 Bastien Nocera 2017-03-01 12:19:38 UTC
(In reply to Hans de Goede from comment #19)
> Bastien, You're being way too quick with blaming the kernel here. So far the
> switch is not actually completing because userspace is holding one of the 2
> drm device-nodes open the whole time. So this so far clearly is a userspace
> issue.

It's supposed to queue the switches, so that the switch is done as soon as the GPU is released. As I've asked before, please tell me where that code is wrong:
https://github.com/hadess/switcheroo-control/blob/master/src/switcheroo-control.c#L223

I don't see how it could *not* be a kernel problem.

Comment 21 Chris Murphy 2017-03-01 18:54:06 UTC
(In reply to Hans de Goede from comment #17)
> Can you try booting into text mode ? (add "3" to the kernel commandline) ,
> log in and then do startx and see if things work that way ?

That does work. Let me know if you want to see the drm.debug output for this. Note that i915 hasn't been disabled, either by boot parameter or by switcheroo-control, since switcheroo-control is wanted only by graphical.target, which isn't reached in this test; the debug messages I see contain both drm:radeon and drm:intel.

As for kernel stuff, this article might be related: https://lwn.net/Articles/707616/
The test computer in my case is pre-Retina, as is the author's. But I don't know how to narrow down whether it's a kernel bug.

Comment 22 Bastien Nocera 2017-03-02 13:44:41 UTC
(In reply to Chris Murphy from comment #21)
> (In reply to Hans de Goede from comment #17)
> > Can you try booting into text mode ? (add "3" to the kernel commandline) ,
> > log in and then do startx and see if things work that way ?
> 
> That does work. Let me know if you want to see the drm.debug output for
> this, as i915 hasn't been disabled by boot param or switcheroo control since
> switcheroo control is wanted only by graphical.target which isn't isolated
> with this test; the debug messages I see contain both drm:radeon and
> drm:intel.
> 
> As for kernel stuff, this article might relate
> https://lwn.net/Articles/707616/
> The test computer in my case is pre-Retina, as is the author's. But I don't
> know how to narrow down whether it's a kernel bug.

I also saw this problem on my test system, but never had the time to report it back upstream. I'm confident that the problem is in the "gmux" switcheroo that Macs use for this functionality, but never had the time to gather debug information or investigate this problem.

I'd advise getting in touch with Lukas for help debugging this.

Comment 23 Hans de Goede 2017-03-02 15:39:34 UTC
Hi,

(In reply to Chris Murphy from comment #21)
> (In reply to Hans de Goede from comment #17)
> > Can you try booting into text mode ? (add "3" to the kernel commandline) ,
> > log in and then do startx and see if things work that way ?
> 
> That does work. Let me know if you want to see the drm.debug output for
> this, as i915 hasn't been disabled by boot param or switcheroo control since
> switcheroo control is wanted only by graphical.target which isn't isolated
> with this test; the debug messages I see contain both drm:radeon and
> drm:intel.

Ah, so can you try manually starting switcheroo-control before running startx:

sudo systemctl start switcheroo-control.service

What I'm trying to achieve here is to actually make the switch happen. Currently it is queued/started but never completes because of this error:

localhost.localdomain kernel: vga_switcheroo: client 101 refused switch

If we can get rid of that error and get things to actually run on the i915 GPU rather than the radeon GPU, then we are moving in the direction of the whole goal of switcheroo-control.

Regards,

Hans

Comment 24 Chris Murphy 2017-03-09 18:55:46 UTC
OK, so I made sure to run 'systemctl disable switcheroo-control.service', rebooted with boot parameter 3, logged in, and did:

# systemctl start switcheroo-control.service
[   35.927449] vga_switcheroo: client 101 refused switch

# startx

I get messed up graphics.

Comment 25 Hans de Goede 2017-03-09 19:13:16 UTC
Hi,

On 09-03-17 19:51, Chris Murphy wrote:
> Manual switching fails with the same error 'kernel: vga_switcheroo:
> client 101 refused switch'
>
> [chris@localhost ~]$ sudo lsof /dev/snd/controlC1
> COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
> alsactl    737 root    5r   CHR  116,2      0t0 17404 /dev/snd/controlC1
> pulseaudi 1235  gdm   15u   CHR  116,2      0t0 17404 /dev/snd/controlC1

Ok, now we are getting somewhere :)

Can you do as root

systemctl disable alsa-state.service

And then reboot and for the new boot remove "rhgb" from the commandline
and add "3" to the kernel cmdline. That should boot you to a text console,
then log on, sudo lsof /dev/snd/controlC1 should now show an empty list.
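To check the "empty list" condition from a script rather than by eye, here is a small Python sketch that parses `lsof` output like the one quoted above. It assumes lsof's default column layout (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME); lsof truncates command names to nine characters by default, which is why pulseaudio shows up as `pulseaudi`.

```python
def lsof_holders(output: str) -> list:
    """Return unique (command, pid, user) tuples from `lsof <path>` output.

    An empty result corresponds to lsof printing nothing, i.e. no process
    holds the device node open.
    """
    holders = []
    for line in output.strip().splitlines()[1:]:  # skip the COMMAND ... header
        fields = line.split()
        if len(fields) < 9:
            continue  # not a regular data row
        entry = (fields[0], int(fields[1]), fields[2])
        if entry not in holders:  # one process may hold several descriptors
            holders.append(entry)
    return holders
```

For example, `lsof_holders(subprocess.run(["lsof", "/dev/snd/controlC1"], capture_output=True, text=True).stdout)` would return `[]` once alsactl and pulseaudio have let go of the node.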

If it does, perhaps switcheroo-control has already switched successfully.
Run cat /sys/kernel/debug/vgaswitcheroo/switch and see if the IGD line
in there has the + symbol. If not, do:

sudo sh -c "echo IGD > /sys/kernel/debug/vgaswitcheroo/switch" and then
run cat /sys/kernel/debug/vgaswitcheroo/switch again.

If you get things to switch properly do startx to start a gnome-session
and then in a terminal do: "glxgears -info | grep REND" and let me know
what it says. It should say something like this:

GL_RENDERER   = Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2)

Which means that you are now successfully running on the intel GPU
(and will have much better battery life).

Regards,

Hans

Comment 26 Chris Murphy 2017-03-09 21:08:56 UTC
Created attachment 1261701
f26 journal full, alsa masked, switcheroo enabled, 4.11.0-rc1 debug

I need to use 'sudo systemctl mask alsa-state.service' to clear the 'lsof /dev/snd/controlC1' list; disabling alone still leaves an entry. After that I run into a few additional problems, but they don't make graphics unusable.

1. Both gdm and GNOME Shell refused to use Wayland with i915 graphics; both fell back to X. When using radeon, Wayland is used for both. Is this a clue?

[   15.790851] localhost.localdomain gnome-shell[1010]: Can't initialize KMS backend: could not find drm kms device

2. With the debug kernel I'm getting a circular locking warning right when switcheroo-control starts up:

[    7.736480] localhost.localdomain kernel: ======================================================
[    7.736481] localhost.localdomain kernel: [ INFO: possible circular locking dependency detected ]
[    7.736482] localhost.localdomain kernel: 4.11.0-0.rc1.git0.1.fc26.x86_64+debug #1 Not tainted
[    7.736483] localhost.localdomain kernel: -------------------------------------------------------
[    7.736484] localhost.localdomain kernel: switcheroo-cont/775 is trying to acquire lock:
[    7.736485] localhost.localdomain kernel:  (console_lock){+.+.+.}, at: [<ffffffff8a631389>] vga_switchto_stage2+0x99/0x130
[    7.736492] localhost.localdomain kernel: 
                                             but task is already holding lock:
[    7.736492] localhost.localdomain kernel:  (vgasr_mutex){+.+.+.}, at: [<ffffffff8a631a4a>] vga_switcheroo_debugfs_write+0x8a/0x3f0
[    7.736496] localhost.localdomain kernel:

Comment 27 Chris Murphy 2017-03-09 21:14:40 UTC
> If you get things to switch properly do startx to start a gnome-session
> and then in a terminal do: "glxgears -info | grep REND" and let me know
> what it says. It should say something like this:
> 
> GL_RENDERER   = Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2)

I get this now; I'll file the kernel bug on freedesktop.org.

GL_RENDERER = Mesa DRI Intel(R) Sandybridge Mobile, and a window of animated RGB gears.

Comment 28 Hans de Goede 2017-03-09 21:26:09 UTC
Hi,

(In reply to Chris Murphy from comment #26)
> Created attachment 1261701
> f26 journal full, alsa masked, switcheroo enabled, 4.11.0-rc1 debug
> 
> I need to use 'sudo systemctl mask alsa-state.service' to clear 'lsof
> /dev/snd/controlC1' list. Disable alone still leaves an entry.

That is because /lib/systemd/system/alsa-state.service is missing an [Install] section with a WantedBy line. Instead, the alsa-utils package installs symlinks directly under /usr/lib/systemd/system/basic.target.wants, which makes it impossible to disable. I guess this is done this way on purpose.
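As far as I understand systemctl(1), `systemctl disable` only removes symlinks from the configuration directory under /etc, so a unit pulled in by a symlink that the package ships under /usr/lib/systemd/system keeps starting, while `systemctl mask` links the unit to /dev/null under /etc and overrides it. That matches the disable-versus-mask behaviour seen here. A rough Python check of whether a unit file declares any WantedBy=/RequiredBy= targets at all; the unit text in the test is a made-up example, not the real alsa-state.service, and other [Install] keys such as Also= and Alias= are ignored.

```python
def install_targets(unit_text: str) -> list:
    """Return targets named by WantedBy=/RequiredBy= in the unit's [Install] section."""
    section = None
    targets = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # entering a new section
            continue
        if section != "Install" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("WantedBy", "RequiredBy"):
            targets.extend(value.split())  # values are space-separated lists
    return targets
```

An empty result means enabling or disabling the unit has no [Install] information to act on, so any `.wants` symlinks that start it were created some other way.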

> After that I
> run into a few additional problems, but they don't make graphics unusable.
> 
> 1. Both gdm and gnome shell refused to use wayland with i915 graphic, both
> fall back to X. When using radeon, wayland is used for both. Is this a clue?
>
> [   15.790851] localhost.localdomain gnome-shell[1010]: Can't initialize KMS
> backend: could not find drm kms device

That is expected. Chances are some of your external display connectors
are only available on the radeon, so to be able to use those when i915
is driving the LCD panel, mutter automatically forces the use of
Xorg, since setups with multiple GPUs with active video outputs are
not yet supported by Wayland (this is being worked on).

> 2. With debug kernel I'm getting a circular locking warning right at the
> time switcheroo-control starts up, 
> 
> [    7.736480] localhost.localdomain kernel: ======================================================
> [    7.736481] localhost.localdomain kernel: [ INFO: possible circular locking dependency detected ]
> [    7.736482] localhost.localdomain kernel: 4.11.0-0.rc1.git0.1.fc26.x86_64+debug #1 Not tainted
> [    7.736483] localhost.localdomain kernel: -------------------------------------------------------
> [    7.736484] localhost.localdomain kernel: switcheroo-cont/775 is trying to acquire lock:
> [    7.736485] localhost.localdomain kernel:  (console_lock){+.+.+.}, at: [<ffffffff8a631389>] vga_switchto_stage2+0x99/0x130
> [    7.736492] localhost.localdomain kernel: 
>                                              but task is already holding lock:
> [    7.736492] localhost.localdomain kernel:  (vgasr_mutex){+.+.+.}, at: [<ffffffff8a631a4a>] vga_switcheroo_debugfs_write+0x8a/0x3f0
> [    7.736496] localhost.localdomain kernel:

Yeah, that is a kernel bug somewhere. It would be good if you can collect
the full output and mail it to the dri-devel mailing list
(dri-devel@lists.freedesktop.org); please put me in the Cc when you do so.

(In reply to Chris Murphy from comment #27)
> > If you get things to switch properly do startx to start a gnome-session
> > and then in a terminal do: "glxgears -info | grep REND" and let me know
> > what it says. It should say something like this:
> > 
> > GL_RENDERER   = Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2)
> 
> I get this now, I'll file the kernel bug on freedesktop.org
> 
> GL_RENDERER = Mesa DRI Intel(R) Sandybridge Mobile, and a window of animated
> RGB gears.

Great, it would be interesting to see what this does for your battery life compared to using the radeon GPU (no need for exact numbers, but if you have an idea how many hours the system menu used to estimate when full, versus what it is showing now, that would be interesting to know).

So it seems we've 2 problems here:

1) alsactl is getting in the way of the switch
2) rhgb is getting in the way of the switch

The next thing to try is to mask alsa-state.service (again, if you've tried the changes I suggested via email) and then boot with plymouth (rhgb) active. Then the switch should get delayed until plymouth quits, which hopefully is before the gdm session opens /dev/dri/card? and starts pulseaudio. If that works, we could consider moving alsa-state.service to graphical.target and making it start after gdm.

Regards,

Hans

Comment 29 Chris Murphy 2017-03-09 22:05:12 UTC
I'll see about using powertop to get a load-independent comparison of the two GPUs. I know from past experience with the GRUB hack to disable radeon that the laptop runs vastly cooler: with i915 it's barely warmer than room temperature, while with radeon it's very warm to hot, and that's just sitting idle. Doing anything with radeon gets the fans running.

Anyway, the battery is the original, so now 6 years old, and running it out maybe isn't the most reliable test.

Comment 30 Chris Murphy 2017-03-09 23:07:43 UTC
i915:
12.2 W, 5h estimated battery life

radeon:
18.4 W, 3h10m estimated battery life
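Some rough arithmetic on these two readings, assuming the battery estimate is approximately remaining energy divided by the current draw (an assumption about how the estimate is produced, not something measured here):

```python
def minutes(hours: int, mins: int = 0) -> int:
    """Convert an h/m runtime estimate to minutes."""
    return hours * 60 + mins


I915_WATTS, RADEON_WATTS = 12.2, 18.4
I915_MIN, RADEON_MIN = minutes(5), minutes(3, 10)

# Fraction of the radeon power draw saved by running on i915.
power_saving = 1 - I915_WATTS / RADEON_WATTS
# Extra estimated runtime relative to the radeon estimate.
runtime_gain = I915_MIN / RADEON_MIN - 1
# Energy each estimate implies (W x h); similar values mean the two
# estimates are consistent with roughly the same charge level.
implied_wh_i915 = I915_WATTS * I915_MIN / 60
implied_wh_radeon = RADEON_WATTS * RADEON_MIN / 60

print(f"power saving: {power_saving:.1%}")
print(f"runtime gain: {runtime_gain:.1%}")
print(f"implied energy: {implied_wh_i915:.1f} Wh vs {implied_wh_radeon:.1f} Wh")
```

That works out to about a third less power (33.7%) and roughly 58% more estimated runtime; the implied energy of about 61 Wh versus 58 Wh suggests the two estimates are consistent with each other.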

Comment 31 Hans de Goede 2017-03-10 07:51:23 UTC
(In reply to Chris Murphy from comment #30)
> i915:
> 12.2w, 5h estimate battery
> 
> radeon:
> 18.4w, 3h10m estimate battery

Nice improvement :)

As mentioned in an earlier comment:

One more thing to try is to mask alsa-state.service (again) and then boot with plymouth (rhgb) active. Then the switch should get delayed until plymouth quits, which hopefully is before the gdm session opens /dev/dri/card? and starts pulseaudio. If that works we could consider moving alsa-state.service to graphical.target and make it start after gdm (or some such).

Comment 32 Chris Murphy 2017-03-10 17:05:24 UTC
Re: comment 31, it looks like it fails twice but then ultimately does switch.


[chris@localhost ~]$ sudo journalctl -b -k -o short-monotonic | grep switch
[sudo] password for chris: 
[    1.062777] localhost.localdomain kernel: Console: switching to colour frame buffer device 210x65
[    2.103770] localhost.localdomain kernel: fb: switching to radeondrmfb from EFI VGA
[    2.105555] localhost.localdomain kernel: Console: switching to colour dummy device 80x25
[    3.422431] localhost.localdomain kernel: [drm:drm_crtc_helper_set_config [drm_kms_helper]] encoder changed, full mode switch
[    3.422436] localhost.localdomain kernel: [drm:drm_crtc_helper_set_config [drm_kms_helper]] crtc changed, full mode switch
[    3.612572] localhost.localdomain kernel: Console: switching to colour frame buffer device 210x65
[    4.641058] localhost.localdomain kernel: thunderbolt 0000:07:00.0: old switch config:
[    4.716851] localhost.localdomain kernel: vga_switcheroo: enabled
[    4.903315] localhost.localdomain kernel: snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[    6.367392] localhost.localdomain kernel: vga_switcheroo: client 1 refused switch
[    6.371358] localhost.localdomain kernel: vga_switcheroo: setting delayed switch to client 0
[    8.892086] localhost.localdomain kernel: vga_switcheroo: processing delayed switch to 0
[    8.892095] localhost.localdomain kernel: vga_switcheroo: client 1 refused switch
[    8.893823] localhost.localdomain kernel: vga_switcheroo: processing delayed switch to 0
[    8.893837] localhost.localdomain kernel: snd_hda_intel 0000:01:00.1: Disabling via vga_switcheroo
[    8.925522] localhost.localdomain kernel: radeon: switched off
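The sequence above (a refusal, a delayed switch, another refusal, then success) is easy to miss in a long journal. A Python sketch that extracts just those events and reports whether the discrete GPU was finally powered off. The regular expressions are written against the exact message texts quoted in this bug and may not match other kernel versions; the last pattern assumes the driver name precedes the message, as in `radeon: switched off`.

```python
import re

# Message texts as quoted in this bug; group 1 is the client id, or for
# the last pattern the driver that powered its GPU down.
EVENT_PATTERNS = [
    ("refused", re.compile(r"vga_switcheroo: client (\w+) refused switch")),
    ("delayed", re.compile(r"vga_switcheroo: setting delayed switch to client (\w+)")),
    ("processing", re.compile(r"vga_switcheroo: processing delayed switch to (\w+)")),
    ("off", re.compile(r"(\w+): switched off")),
]


def switch_events(lines):
    """Return (event, detail) pairs for switch-related lines, in log order."""
    events = []
    for line in lines:
        for name, pattern in EVENT_PATTERNS:
            match = pattern.search(line)
            if match:
                events.append((name, match.group(1)))
                break  # at most one event per line
    return events


def discrete_switched_off(events) -> bool:
    """True once some driver reported that its GPU was switched off."""
    return any(name == "off" for name, _ in events)
```

Feeding it the output of `journalctl -b -k -o short-monotonic` from the working and failing boots should separate them: both contain refusals, but only the working one ends with an `off` event.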


An unrelated thing: I'm seeing a "bloom" during boot. The screen shows white text on a black background, but there is a moment where the black background gradually becomes brighter for about 3 seconds before going black again, and then startup continues. This isn't a new problem with rhgb, nor is it new with 4.11. I'm not sure in which kernel it started; it's been going on for some time.

Comment 33 Hans de Goede 2017-03-13 10:10:33 UTC
Hi,

Does the bloom also happen when you remove "rhgb" from the kernel cmdline, so that the switch is instant ?

Regards,

Hans

Comment 34 Chris Murphy 2017-03-13 19:52:32 UTC
The bloom is there whether or not rhgb is present. With quiet rhgb it happens as well, and in that case the plymouth splash only shows up for maybe 1-2 seconds, right at the Fedora logo; there is no animation as on the HP laptop, which only has i915. I'm fairly certain that when I use the GRUB hack to disable radeon at the bootloader, so it's only ever i915 during boot, the bloom doesn't happen. If it's useful to know, I'll test it and report back. I can also make a video of the bloom upon request.

Comment 35 Chris Murphy 2017-03-13 20:09:30 UTC
Created attachment 1262576
f26 journal full, alsa masked, switcheroo enabled, 4.11.0-rc1

Contrary to comment 32, I have a case where the switch does not work. I wonder if there's a race, so that it sometimes works as in comment 32 and sometimes fails as in this case. This is not a debug kernel, though, so maybe the slower debug kernel makes it work? Nope: I just tried it with the debug kernel and it fails again.

Attaching the non-debug kernel journal log. alsa-state.service is enabled and running; as is the modified switcheroo-control.service.

Here, "not working" means the same graphics artifacts as in attachment 1258228 (photo of display with artifact).

Comment 36 Hans de Goede 2017-03-13 21:22:23 UTC
(In reply to Chris Murphy from comment #35)
> Contrary to comment 32, I have a case where the switch does not work. I
> wonder if there's a race or some way it can sometimes work as it did in
> comment 32 but not work as in this case.

Yeah, this is probably racy. We really need to go back to the drawing board here. For now, you should remove rhgb from your kernel cmdline as a workaround.

I will start a discussion with some of the plymouth, gdm and alsa people to see how we can make this all fit together smoothly.

Comment 37 Chris Murphy 2017-04-16 14:52:09 UTC
This is working out of the box with Fedora-Workstation-Live-x86_64-26-20170410.n.0.iso:

[liveuser@localhost ~]$ dmesg | grep -i switch
[    0.576954] clocksource: Switched to clocksource hpet
[    4.314376] Console: switching to colour frame buffer device 210x65
[    4.319453] input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input0
[    4.319461] ACPI: Lid Switch [LID0]
[    6.303944] clocksource: Switched to clocksource tsc
[   61.135190] thunderbolt 0000:07:00.0: initializing Switch at 0x0 (depth: 0, up port: 6)
[   61.135192] thunderbolt 0000:07:00.0: old switch config:
[   61.135193] thunderbolt 0000:07:00.0:  Switch: 8086:1513 (Revision: 2, TB Version: 1)
[   61.816215] fb: switching to radeondrmfb from EFI VGA
[   61.816274] Console: switching to colour dummy device 80x25
[   61.830243] vga_switcheroo: enabled
[   61.872391] Console: switching to colour frame buffer device 210x65
[   63.733436] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[   64.411260] snd_hda_intel 0000:01:00.1: Disabling via vga_switcheroo
[   65.527859] radeon: switched off


[liveuser@localhost ~]$ glxgears -info | grep RENDER
GL_RENDERER   = Mesa DRI Intel(R) Sandybridge Mobile

Comment 38 Chris Murphy 2017-04-16 15:06:25 UTC
I spoke too soon! It works out of the box with USB stick media, but once installed it's still necessary to use the 'systemd.mask=alsa-state' boot parameter.

Comment 39 Fedora End Of Life 2017-11-16 19:45:06 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 40 Fedora End Of Life 2017-12-12 10:05:43 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 41 Chris Murphy 2018-01-24 06:49:58 UTC
Still a problem, and somewhere between kernel 4.14.3 and 4.14.14 masking alsa-state stopped working.

Comment 42 Chris Murphy 2018-01-27 03:01:57 UTC
Comment 41 is wrong: somehow the rhgb boot option reappeared, and that was causing the problem. HOWEVER, vga_switcheroo seems to be gone in kernel 4.15:

kernel 4.14.14

[    5.107163] vga_switcheroo: enabled

# cat /sys/kernel/debug/vgaswitcheroo/switch
0:DIS: :Off:0000:01:00.0
1:IGD:+:Pwr:0000:00:02.0
2:DIS-Audio: :Off:0000:01:00.1

GL_RENDERER   = Mesa DRI Intel(R) Sandybridge Mobile 


kernel 4.15.0-0.rc1.git1.1.fc28.x86_64 through rc9:

vga_switcheroo doesn't appear in dmesg at all

/sys/kernel/debug/vgaswitcheroo/switch does not exist

GL_RENDERER   = AMD TURKS (DRM 2.50.0 / 4.15.0-0.rc1.git1.1.fc28.x86_64, LLVM 5.0.0)

And the laptop gets quite hot under 4.15 as a result of running on the discrete AMD GPU...
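The debugfs switch file shown above has one client per line in the form index:type:active:power:PCI address, with `+` marking the client that currently drives the display. A Python sketch that parses it; because the PCI address itself contains colons, each line is split at most four times. The power field is kept as a raw string since it may read differently on other drivers or kernels. Reading the real file needs root and a mounted debugfs, and on the 4.15 kernels above it does not exist at all.

```python
from typing import List, NamedTuple, Optional


class SwitchClient(NamedTuple):
    index: int
    kind: str     # "IGD", "DIS" or "DIS-Audio"
    active: bool  # "+" marks the client currently driving the display
    power: str    # e.g. "Pwr" or "Off", kept as the raw string
    pci: str      # PCI address such as 0000:00:02.0


def parse_switch(text: str) -> List[SwitchClient]:
    """Parse /sys/kernel/debug/vgaswitcheroo/switch contents."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address contains colons, so split at most four times.
        index, kind, flag, power, pci = line.split(":", 4)
        clients.append(SwitchClient(int(index), kind, flag == "+", power, pci))
    return clients


def active_kind(clients: List[SwitchClient]) -> Optional[str]:
    """Return the type of the active client, or None if none is marked."""
    for client in clients:
        if client.active:
            return client.kind
    return None
```

With the 4.14.14 output above, `active_kind(parse_switch(text))` gives `"IGD"`, matching the Sandybridge GL_RENDERER string.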

Comment 43 Chris Murphy 2018-01-27 03:09:06 UTC
Created attachment 1386768
dmesg.log 4.15.0-0.rc9.git4.1.fc28

Comment 44 Chris Murphy 2018-01-27 08:37:06 UTC
Narrowed down this latest problem (comments 42 and 43) with git bisect and filed an upstream bug:
https://bugs.freedesktop.org/show_bug.cgi?id=104805

Comment 45 Fedora End Of Life 2018-02-20 15:29:37 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 28 development cycle.
Changing version to '28'.

Comment 46 Chris Murphy 2018-05-11 02:28:26 UTC
So this is a problem again in Fedora 28 with kernel 4.16.7 and 4.17.0-rc4.

I can login remotely even though the display is a mess (at gdm presumably):

[chris@f28m ~]$ sudo lsof /dev/snd/controlC1
[chris@f28m ~]$ sudo lsof /dev/snd/controlC0
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
pulseaudi 1461  gdm   15u   CHR  116,5      0t0 19321 /dev/snd/controlC0
pulseaudi 1461  gdm   22u   CHR  116,5      0t0 19321 /dev/snd/controlC0
pulseaudi 1461  gdm   27u   CHR  116,5      0t0 19321 /dev/snd/controlC0

I can't tell if it's pulseaudio keeping radeon busy or a combination of gnome-shell and cogl, as I'm also getting crazy output from the radeon driver:

[   10.140291] radeon 0000:01:00.0: evergreen_surface_check_linear_aligned:216 texture pitch 1680 invalid must be aligned with 64
[   10.140293] radeon 0000:01:00.0: evergreen_cs_track_validate_texture:831 texture invalid 0x1a3c3441 0x10000419 0x0a0a0000 0x00000000 0x00000000 0x8003001a
[   10.140310] [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !

So then I do this:
[chris@f28m ~]$ sudo systemctl isolate multi-user.target

And immediately I get:
[   12.217865] radeon: switched off

And also:

[chris@f28m ~]$ sudo lsof /dev/snd/controlC0
[chris@f28m ~]$ sudo lsof /dev/snd/controlC1

And now I can isolate graphical.target again and everything is fine. So basically the workaround right now is to boot to multi-user.target, let switcheroo switch off radeon, and then manually switch to graphical.target.
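The manual workaround above could be scripted. A hedged Python sketch, not tested on the affected hardware: it must run as root (for example over SSH, as done here), it assumes debugfs is mounted at /sys/kernel/debug, and it polls the switch file for the DIS line turning Off rather than watching for the "radeon: switched off" message. The helper names and the 30-second timeout are arbitrary choices.

```python
import subprocess
import time

SWITCH_FILE = "/sys/kernel/debug/vgaswitcheroo/switch"


def discrete_is_off(switch_text: str) -> bool:
    """True if the DIS line of the switch file reports an Off power state."""
    for line in switch_text.splitlines():
        fields = line.split(":", 4)
        if len(fields) == 5 and fields[1] == "DIS":
            return fields[3].endswith("Off")
    return False


def wait_for(predicate, timeout=30.0, interval=0.5,
             clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def read_switch() -> str:
    with open(SWITCH_FILE) as f:
        return f.read()


def apply_workaround() -> None:
    """Leave the graphical session, wait for the discrete GPU to power off,
    then start the graphical target again."""
    subprocess.run(["systemctl", "isolate", "multi-user.target"], check=True)
    if not wait_for(lambda: discrete_is_off(read_switch())):
        raise RuntimeError("discrete GPU still powered; not restarting graphical.target")
    subprocess.run(["systemctl", "isolate", "graphical.target"], check=True)
```

The `clock` and `sleep` parameters exist only so the polling logic can be exercised without real hardware or real waiting.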

Comment 47 Chris Murphy 2018-05-11 02:29:16 UTC
Created attachment 1434675
dmesg kernel 4.16.7-fc28

Comment 48 Chris Murphy 2018-05-11 02:29:49 UTC
Created attachment 1434676
full journal w/ kernel 4.16.7-fc28

Comment 49 Tom Anderson 2018-05-20 16:42:54 UTC
I also get this. I'm on a Macbook 8,2. Had been running Fedora 27, which had a similar problem that could be worked around by masking alsa-state. Upgraded to Fedora 28. Got the pop-art display corruption and full-speed fans on boot. Added the kernel command line argument to mask alsa-state, but that didn't help much. Logged in on a text console, observed:

1. /sys/kernel/debug/vgaswitcheroo/switch exists and indicates that I'm on discrete graphics

2. sudo lsof /dev/snd/controlC1 shows one hit on pulseaudio, sudo lsof /dev/snd/controlC0 shows three hits

3. dmesg contains the "evergreen_surface_check_linear_aligned:216 texture pitch 1680 invalid must be aligned with 64" messages

I tried 'sudo systemctl isolate multi-user.target', then saw 'radeon: switched off' in dmesg, then ran 'sudo systemctl isolate graphical.target', and everything started working normally.

I'm happy to do what I can to help solve this bug properly; please let me know if there is anything that might be useful.

Comment 50 Ben Cotton 2019-05-02 20:03:28 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 51 Ben Cotton 2019-05-28 23:33:28 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

