Bug 1757891
Summary: vga switcheroo won't turn off discrete graphics on 5.3.x kernel
Product: [Fedora] Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: unspecified
Priority: unspecified
Reporter: Michał <e.misiek>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: airlied, bskeggs, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mbrancaleoni, mchehab, mihai, mjg59, pasik, redhat, steved
Target Milestone: ---
Target Release: ---
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2020-03-25 22:26:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description
Michał
2019-10-02 16:12:48 UTC
Created attachment 1621925 [details]
dmesg 5.3.1
Created attachment 1621926 [details]
dmesg 5.2.18
Created attachment 1623531 [details]
dmesg 5.3.5
Tested 5.3.6: vgaswitcheroo keeps power on the nvidia card.

Vanilla kernels 5.4rcX have the same issue.

Related: https://bbs.archlinux.org/viewtopic.php?id=249330
And probably THE CAUSE: https://bugs.freedesktop.org/show_bug.cgi?id=75985

If someone is looking for a temporary solution for this problem, then this is a "fix". Without dis-audio, vga switching works fine.

# cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynPwr:0000:01:00.0
2:DIS-Audio: :DynPwr:0000:01:00.1
# echo 1 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/remove
# cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynPwr:0000:01:00.0
# echo "1:Off" > /sys/kernel/debug/vgaswitcheroo/switch
# cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynOff:0000:01:00.0
# echo "1:DynOff" > /sys/kernel/debug/vgaswitcheroo/switch

---
Save this and run on every boot as root...

saveTheWorld.sh:
#!/bin/bash
echo 1 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/remove
echo "1:Off" > /sys/kernel/debug/vgaswitcheroo/switch
echo "1:DynOff" > /sys/kernel/debug/vgaswitcheroo/switch

Most of this is fixed in kernel 5.3.8. Powering down still doesn't work with a docking station.

commit 3f5fa0ba267074fe39d4cd56f34d873064350911
Author: Lukas Wunner <lukas>
Date: Thu Oct 17 17:04:11 2019 +0200

    ALSA: hda - Force runtime PM on Nvidia HDMI codecs

    commit 94989e318b2f11e217e86bee058088064fa9a2e9 upstream.

*** Bug 1766198 has been marked as a duplicate of this bug. ***

Still happening on 5.3.8-300.fc31.x86_64 from updates-testing. Removing the device and powering it off (as reported) fixes it (the battery discharge rate goes from ~30 W to ~15 W).

It worked for a few seconds. Long enough to think it's ok and that this is specific to the docking station, but apparently it isn't. It powers down for a few seconds and after that it is always on.

Sorry for the noise with closing. My bad.

The 5.4 kernel has some fixes for this and 1 extra fix is pending for 5.5.
I've done a scratch-build of a 5.4 kernel with the extra fix here:
https://koji.fedoraproject.org/koji/taskinfo?taskID=39132834

Here are some generic testing instructions for installing a kernel-build directly from koji:
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

Please give this kernel a try and let us know if this fixes things.

Note koji keeps scratch-builds only for a couple of days (about a week) before removing them to free up disk-space. So if you do not have time to test right now, at least download the rpms so that you can test later.

Tried right now, unfortunately nothing changes for me.

(In reply to Matteo Brancaleoni from comment #12)
> Tried right now, unfortunately nothing changes for me.

Hmm, that is unfortunate; it works on my test-machine. Is there anything special about your setup? Do you perhaps have the nvidia binary driver installed?

No, I don't have any binaries installed.

The only thing I have that differs from "standard" is nouveau.modeset=0 on my kernel parameters (this is a prime laptop, where I don't need nvidia at all). Leaving it out causes a lot of issues with resume from sleep (it has always been like this; it is not a new laptop).

5.2.18-200 is still working ok, and power usage is the same whether nouveau.modeset is set or not (except for resume from sleep, but this is an old story, as said).

(In reply to Matteo Brancaleoni from comment #14)
> No, I don't have any binaries installed.
>
> The only thing is that I have different from "standard" is nouveau.modeset=0
> on my kernel parameters (this is a prime laptop, where I don't need nvidia
> at all). Not putting it causes a lot of issues with resume from sleep (since
> ever, is not a new laptop).
>
> 5.2.18-200 is still working ok, and power usage is the same if I use
> nouveau.modeset is set or not (except from resume from sleep, but this is an
> old story, as said).

Ah, that might explain it, although the 2 extra patches should fix the case where there is no driver bound.
Anyways, can you try:

1) Removing nouveau.modeset=0, and see if that fixes the power-consumption issue? I guess you may have your suspend/resume issues back then, but it is still a good data point to have. We have been working on some fixes wrt suspend/resume issues and nouveau, so things might even just work this way. But I believe not all of these fixes have landed yet (there is some nasty underlying hw issue with one model of Intel PCI bridge there somewhere).

2) If 1. gives you your suspend/resume issues back, can you try adding "modprobe.blacklist=snd_hda_intel" to your kernel commandline? That will likely fix this issue, at the cost of also disabling audio, so again this is mainly a good data point to have.

Ok, will redo the tests then. I cannot do it right now (or today), but I will tomorrow evening (UTC+1) and will report back. I assume that the kernel scratch builds are still valid (already downloaded them).

Yes, you can re-use the already downloaded scratch-build for those 2 new tests, thanks.

Created attachment 1639921 [details]
dmesg
This is on the docking station. Not working, but there is some extra info inside dmesg.
Created attachment 1639922 [details]
dmesg 5.4 w/o docking station
Without the docking station: same result, and probably the same extra info in dmesg.
Sorry for the delay. Thanks for keeping an eye on this one!
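The two test requests above hinge on which parameters are on the kernel command line, so it can help to confirm what the running kernel was actually booted with before comparing power figures. The following is a minimal sketch, not something posted in this report: the helper name has_param is made up for illustration, and it assumes each parameter appears as its own whitespace-separated word (a comma-separated modprobe.blacklist list would not be matched).

```shell
#!/bin/bash
# Report whether the two kernel parameters discussed in this report are
# present on the running kernel's command line.

# has_param CMDLINE PARAM (hypothetical helper): succeeds only if PARAM
# appears in CMDLINE as a complete whitespace-separated word.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the command line of the running kernel; fall back to
# an empty string if it cannot be read.
cmdline="$(cat /proc/cmdline 2>/dev/null || true)"

for param in nouveau.modeset=0 modprobe.blacklist=snd_hda_intel; do
    if has_param "$cmdline" "$param"; then
        echo "$param: present"
    else
        echo "$param: absent"
    fi
done
```

Recording this output next to each power measurement makes it harder to mix up the test combinations.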
(In reply to Michał from comment #18)
> Created attachment 1639921 [details]
> dmesg
>
> This is on docking station. Not working, but there is some extra info inside
> dmesg.

Ok, so I see an oops related to the new HDA audio handling for DP MST:

kernel: WARNING: CPU: 1 PID: 330 at sound/hda/hdac_component.c:290 snd_hdac_acomp_init+0xde/0x130 [snd_hda_core]

Which points to these lines in the kernel:

        if (WARN_ON(hdac_get_acomp(dev)))
                return -EBUSY;

It is probably best if you report this directly to the upstream developers of this part of the kernel by sending an email to "Takashi Iwai <tiwai>" with "alsa-devel" and me in the Cc.

(In reply to Hans de Goede from comment #15)
> 1) Removing nouveau.modeset=0, and see if that fixes the power-consumption
> issue?
> I guess you may have your suspend/resume issues back then, but it is still a
> good
> data point to have.

Did it, same high power consumption, ~27W.

> We have been working on some fixes wrt suspend/resume issues and nouveau, so
> things might even just work this way. But I believe not all of these fixes
> have
> landed yet (there is some nasty hw underlying issue with one model Intel PCI
> bridge there somewhere).

No fixes for that, some locking errors on dmesg during normal boot into xorg, very slow display. Not tried suspend resume.

> 2) If 1. gives you your suspend/resume issues back, can you try adding:
> "modprobe.blacklist=snd_hda_intel" to your kernel commandline, that will
> likely
> fix this issue, at the cost of also disabling audio, so again this is mainly
> a
> good data point to have.

Did that also, confirmed no sound as expected, but no changes in high power usage.

Created attachment 1639946 [details]
dmesg for 5.4.0 test kernel
dmesg from 5.4.0 testing kernel, a lot of nouveau timeout errors which probably are not related to this specific issue.
Created attachment 1639949 [details]
Full 5.4.0 dmesg
Sorry, had to recreate the dmesg since the ring buffer was not big enough.
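Since several dmesg captures are attached to this report, a quick way to tell whether a given capture contains the snd_hdac_acomp_init warning quoted earlier is to search the saved file for it. This is only a sketch: the function name count_acomp_warnings is invented here, and the second sample line is made-up filler used purely to show a non-matching line.

```shell
#!/bin/bash
# Count occurrences of the snd_hdac_acomp_init WARNING in a saved dmesg file.

# count_acomp_warnings FILE (hypothetical helper): print the number of lines
# in FILE that contain the warning. grep -c exits non-zero when the count is
# 0, so the status is neutralised to keep the function usable under "set -e".
count_acomp_warnings() {
    grep -c 'WARNING: .*snd_hdac_acomp_init' "$1" || true
}

# Demonstration on a small sample: the first line is the warning quoted in
# this report, the second is illustrative filler.
sample="$(mktemp)"
printf '%s\n' \
    'kernel: WARNING: CPU: 1 PID: 330 at sound/hda/hdac_component.c:290 snd_hdac_acomp_init+0xde/0x130 [snd_hda_core]' \
    'kernel: illustrative unrelated line' > "$sample"
count_acomp_warnings "$sample"   # prints 1
rm -f "$sample"
```

Against a real system, something like "dmesg > dmesg.txt" as root followed by "count_acomp_warnings dmesg.txt" would do the same check; as noted above, the ring buffer may need to be large enough to still hold the early boot messages.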
Hmm, if blacklisting the hda codec does not help, then you might be seeing a different issue than other people. What is the output of running the following command as root?

cat /sys/kernel/debug/vgaswitcheroo/switch

If that includes a 3rd line listing an audio-device, please try again with modprobe.blacklist=snd_hda_intel; the 3rd line should then be gone.

Probably I've misunderstood your 2nd point above and mixed up modprobe.blacklist with modeset=0, so I did the tests again. Let me recap:

- standard cmd line (no additional params): high power usage (~27W), lots of nouveau timeout errors as in the posted dmesg, suspend/resume broken. vgaswitcheroo/switch contains a 3rd entry DIS-Audio: :DynOff:0000:01:00.1

- nouveau.modeset=0 *and* modprobe.blacklist=snd_hda_intel: high power usage, no nouveau errors (of course) and no vgaswitcheroo (expected). suspend/resume ok

- only modprobe.blacklist=snd_hda_intel: low power usage (hooray), no nouveau timeout errors, no 3rd entry on vgaswitcheroo/switch. Suspend/resume seems ok.

(In reply to Matteo Brancaleoni from comment #25)
> Probably I've misunderstood your 2nd point above and mixed
> modprobe.blacklist with modeset=0, so did the tests again and let me recap:
>
> - standard cmd line (no additional params): high power usage (~27W), lots of
> timeout errors as posted dmesg on nouveau, suspend/resume broken.
> vgaswitcheroo/switch contains a 3rd entry DIS-Audio: :DynOff:0000:01:00.1
>
> - nouveau.modeset=0 *and* modprobe.blacklist=snd_hda_intel: high power
> usage, no nouveau errors (of course) and no vgaswitcheroo (expected).
> suspend/resume ok
>
> - only modprobe.blacklist=snd_hda_intel: low power usage (hooray), no
> nouveau timeout errors, no 3rd entry on vgaswitcheroo/switch. Suspend/resume
> seems ok.

Ok, so your dGPU suspend issues are also caused by the recent changes for support for audio over HDMI/DP. Then the patches in the test kernel should fix this, but clearly they do not.
Are you maybe also seeing an oops with an error like this with the new kernel?

kernel: WARNING: CPU: 1 PID: 330 at sound/hda/hdac_component.c:290 snd_hdac_acomp_init+0xde/0x130 [snd_hda_core]

Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Looks like the F29 EOL closing script got a bit too enthusiastic, re-opening.

I've started a new scratch kernel-build which contains fixes from upstream for the oops. I'm not 100% sure if this will also fix the dGPU not suspending but please give it a try:
https://koji.fedoraproject.org/koji/taskinfo?taskID=39378056

Note this is still building atm, it may take a couple of hours to finish.

Note I forgot to set the Fedora version to 31 this time, so it looks like a F32 kernel, but that does not matter.

Created attachment 1640200 [details]
dmesg linux-next-20191127
On linux-next 20191127 the warning magically disappeared. The dGPU is still always on. Next I'll try the kernel from koji when it's ready.

For the record: I don't know if this is helpful at all, but since I've been on linux-next for a while now, I can add this. This is my output from alsa-info.sh:
http://alsa-project.org/db/?f=91bb789a01f9eed92d0534fe8951619312b355da

Created attachment 1640221 [details]
dmesg 5.4.0-2.rhbz1757891.fc32.x86_6
Warning is gone. But nothing else happened.
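Much of the diagnosis in this report comes down to reading /sys/kernel/debug/vgaswitcheroo/switch: which clients are listed, what power state each is in, and whether a DIS-Audio entry is present. The sketch below summarises that file. It assumes the colon-separated layout seen in the outputs pasted in this report (id, name, active flag, power state, PCI address); the function names are invented for illustration.

```shell
#!/bin/bash
# Summarise the vgaswitcheroo switch file, assuming the line layout shown in
# this report, e.g. "1:DIS: :DynPwr:0000:01:00.0".

# summarize_switch FILE (hypothetical helper): print "NAME STATE PCI_ADDRESS"
# for every client line. The PCI address itself contains colons, so it is
# rebuilt from fields 5 to 7.
summarize_switch() {
    awk -F: 'NF >= 7 { print $2, $4, $5 ":" $6 ":" $7 }' "$1"
}

# has_dis_audio FILE (hypothetical helper): succeed if a DIS-Audio client is
# listed.
has_dis_audio() {
    grep -q ':DIS-Audio:' "$1"
}

switch=/sys/kernel/debug/vgaswitcheroo/switch
if [ -r "$switch" ]; then
    summarize_switch "$switch"
    if has_dis_audio "$switch"; then
        echo "DIS-Audio client is registered"
    else
        echo "no DIS-Audio client listed"
    fi
else
    # Reading the file needs root and a mounted debugfs.
    echo "cannot read $switch" >&2
fi
```

Run as root, this prints one line per client, which makes it easier to compare states across kernels and command-line combinations.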
Turns out that disabling tlp helped. Using Hans de Goede's kernel from koji with TLP disabled solves this issue.

Ok, so Michal's case has been solved on the alsa-devel mailinglist. Michal was using TLP, which was turning off the HDA power-save options, and since new kernels support audio over HDMI/DP for Nvidia cards, HDA power-saving must now be on to allow the dGPU to suspend.

For other people still having issues, please run these 2 commands:

[hans@shalem ~]$ cat /sys/module/snd_hda_intel/parameters/power_save
1
[hans@shalem ~]$ cat /sys/module/snd_hda_intel/parameters/power_save_controller
Y

If the output is different than 1 / Y, that is probably why your dGPU is not suspending even with the fixed kernels. In this case you are probably using TLP or have a file in /etc/modprobe.d messing with the snd_hda_intel settings.

Note that running TLP is no longer necessary with recent Fedora versions; all worthwhile power savings are enabled by default, including the HDA power saving settings.

Matteo, can you check your snd_hda_intel parameters please? (see comment 35)

Don't want to send more noise... But there is an important thing I forgot to add in my last comment. THANK YOU!

(In reply to Hans de Goede from comment #36)
> Matteo, can you check your snd_hda_intel parameters please? (see comment 35)

Sure,

with both kernel 5.4.0-2.rhbz1757891.fc32.x86_64 and 5.2.18-200 I have:

[root@yoda ~]# cat /sys/module/snd_hda_intel/parameters/power_save
1
[root@yoda ~]# cat /sys/module/snd_hda_intel/parameters/power_save_controller
Y

Both tested without any kernel cmdline and with nouveau.modeset=0.

I also have tlp enabled, and disabled it for these tests; nothing changes.

Unfortunately the same nouveau timeout errors as in the reported dmesg occur when not setting modeset=0 with the latest test kernel, no matter if tlp is enabled or not.
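The snd_hda_intel check described above can be wrapped in a small script so that both values are compared in one go. This is a sketch under the thread's stated expectation of 1 / Y; the function name check_hda_power_save is made up, and the search of /etc/modprobe.d is only a rough hint at where an overriding option might come from, since TLP applies its settings by other means.

```shell
#!/bin/bash
# Compare the snd_hda_intel power-saving parameters with the values this
# report describes as expected (power_save = 1, power_save_controller = Y).

# check_hda_power_save DIR (hypothetical helper): DIR is the module's
# parameters directory. Prints "ok" or a line naming the values that differ;
# unreadable values are shown as "?".
check_hda_power_save() {
    dir="$1"
    ps="$(cat "$dir/power_save" 2>/dev/null || true)"
    psc="$(cat "$dir/power_save_controller" 2>/dev/null || true)"
    if [ "$ps" = "1" ] && [ "$psc" = "Y" ]; then
        echo "ok"
    else
        echo "differs: power_save=${ps:-?} power_save_controller=${psc:-?}"
    fi
}

check_hda_power_save /sys/module/snd_hda_intel/parameters

# List any modprobe configuration lines that mention the module; no output
# means nothing was found there.
grep -rs snd_hda_intel /etc/modprobe.d || true
```

If the first line is not "ok" while TLP is running, disabling TLP and rebooting before re-checking matches what resolved the reporter's case.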
(In reply to Matteo Brancaleoni from comment #38)
> (In reply to Hans de Goede from comment #36)
> > Matteo, can you check your snd_hda_intel parameters please? (see comment 35)
>
> Sure,
>
> with both kernel 5.4.0-2.rhbz1757891.fc32.x86_64 and 5.2.18-200 I have:
>
> [root@yoda ~]# cat /sys/module/snd_hda_intel/parameters/power_save
> 1
> [root@yoda ~]# cat /sys/module/snd_hda_intel/parameters/power_save_controller
> Y
>
> Both tested without any kernel cmdline and with nouveau.modeset=0.
>
> I have also tlp enabled and disabled it for these tests, nothing changes.
>
> Unfortunately same nouveau timeout errors as reported dmesg occurs when not
> setting modeset=0 with latest test kernel, no matter if tlp is enabled or
> not.

So if I understand correctly then blacklisting snd_hda_intel, without modeset=0, does fix the high power-consumption, right? And this combo also fixes the nouveau time-out errors, correct?

If I understand that correctly, then I believe it is best if you do the same thing Michal did and contact upstream about this, see comment 20.

(In reply to Hans de Goede from comment #39)
> So if I understand correctly then blacklisting snd_hda_intel, without
> modeset=0, does fix the high power-consumption, right? And this combo also
> fixes the nouveau time-out errors, correct?

yes, that's correct.

> If I understand that correctly, then I believe it is best if you do the same
> thing Michal did and contact upstream about this, see comment 20.

Ok, I'll do it as soon as possible, thanks!

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 31 kernel bugs.

Fedora 31 has now been rebased to 5.5.7-200.fc31. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32.

If you experience different issues, please open a new bug report for those.

*********** MASS BUG UPDATE **************

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days