Bug 1941883 - Idle Radeon RX 550 has a very fast fan and high temperatures under 5.11.7-200.fc33.x86_64
Summary: Idle Radeon RX 550 has a very fast fan and high temperatures under 5.11.7-200...
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 34
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2021-03-23 02:21 UTC by Chris Siebenmann
Modified: 2022-06-07 22:28 UTC (History)

Last Closed: 2022-06-07 22:28:31 UTC
Type: Bug
Embargoed:


Attachments
dmesg.txt from 5.11.7 without it87 (98.73 KB, text/plain)
2021-03-23 02:21 UTC, Chris Siebenmann

Description Chris Siebenmann 2021-03-23 02:21:48 UTC
Created attachment 1765428 [details]
dmesg.txt from 5.11.7 without it87

1. Please describe the problem:

After updating to 5.11.7-200.fc33.x86_64 on my idle office workstation, which is sitting in framebuffer text mode with its screen blanked, hwmon reports that my Radeon RX 550 has gone from a typical fan speed of 780-800 RPM and a typical temperature of 28 C (under previous 5.10 and earlier kernels) to a fan speed of 2100 RPM and a reported GPU temperature of 35 C (after about half an hour of rising temperatures). The only other reported hwmon difference is that the card's hwmon/hwmon2/pwm1 changed from 81 to 0. In particular, reported power and voltage remain unchanged.

2. What is the Version-Release number of the kernel:

5.11.7-200.fc33.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

It worked as recently as 5.10.23-200.fc33.x86_64

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

I can reproduce this on demand by booting into 5.11.7 (and then stop it by booting back into 5.10.23).

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Have not tested. Sorry, I'm not running Rawhide kernels on a machine I need to work.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

Yes, Guenter Roeck's it87 module that supports my ASUS Prime X370-Pro motherboard and the OpenZFS ZFS modules (latest development versions). In fact the issue reproduces without the it87 kernel module loaded, although I normally do have it active.

(I sometimes use VMware Workstation's out-of-kernel modules, but I have reproduced this without them loaded; the dmesg attached is from such a reproduction.)


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Chris Siebenmann 2021-03-23 17:16:42 UTC
I had an opportunity to inspect the physical machine today and it turns out that the fans are not running at all, despite what appears in hwmon/hwmon2/fan1_input (and is reported by 'sensors' from lm_sensors). hwmon/hwmon2/fan1_enable is 0, but setting it to '1' does nothing.
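
For anyone who wants to compare the same readings on their own card, here is a minimal sketch of a snapshot of the hwmon attributes mentioned so far. It assumes the standard hwmon conventions (temp1_input in millidegrees Celsius, fan1_input in RPM, pwm1 from 0 to 255). The function names are hypothetical, and the hwmon directory differs per machine, so it has to be passed in explicitly.

```python
from pathlib import Path

# Attribute names taken from the hwmon files mentioned in this report.
ATTRS = ("fan1_input", "fan1_enable", "pwm1", "pwm1_enable", "temp1_input")


def read_hwmon(hwmon_dir):
    """Return {attribute: int or None} for each name in ATTRS."""
    values = {}
    for name in ATTRS:
        try:
            values[name] = int((Path(hwmon_dir) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None  # attribute missing, unreadable, or not a number
    return values


def describe(values):
    """Format a snapshot on one line; temp1_input is in millidegrees Celsius."""
    temp = values.get("temp1_input")
    temp_c = None if temp is None else temp / 1000.0
    return "fan={} RPM pwm={} pwm_enable={} temp={} C".format(
        values.get("fan1_input"), values.get("pwm1"),
        values.get("pwm1_enable"), temp_c)
```

Calling describe(read_hwmon(path)) with the card's hwmon directory prints one line per snapshot. As this comment shows, the fan1_input value is only what the driver reports and may not match what the fan is physically doing.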

Comment 2 Chris Siebenmann 2021-03-23 19:41:29 UTC
It appears that the fan doesn't turn on at all, even under high load and high temperatures. I ran a GPU benchmark that raised GPU temperatures to over 80C and the fans were still not active. On 5.10.23, fan RPMs rise to a reported 1500 RPM by the time the GPU hits 64 C (and the listed GPU power consumption is only slightly under what lm_sensors lists as the cap).

Comment 3 Chris Siebenmann 2021-03-24 19:54:51 UTC
This issue is still present in the just-released 5.11.8-200.fc33.x86_64 kernel.

Comment 4 Chris Siebenmann 2021-03-26 16:49:48 UTC
This issue is still present in the just-released 5.11.9-200.fc33.x86_64 kernel.

Comment 5 Chris Siebenmann 2021-03-29 14:35:41 UTC
This issue is still present in the just-released 5.11.10-200.fc33.x86_64 kernel.

Comment 6 Chris Siebenmann 2021-04-05 14:56:32 UTC
This issue is still present in the just-released 5.11.11-200.fc33.x86_64 kernel.

Comment 7 Chris Siebenmann 2021-04-13 15:20:34 UTC
This issue is still present in the just-released 5.11.12-200.fc33.x86_64 kernel.

Comment 8 Chris Siebenmann 2021-05-17 15:51:38 UTC
This issue is still present in the just-released 5.11.20-200.fc33.x86_64 kernel (and has been present in the few intermediate kernels I also checked).

Comment 9 Chris Siebenmann 2021-05-25 16:20:05 UTC
This issue is still present in the just-released 5.12.6-200.fc33.x86_64 kernel.

Comment 10 Chris Siebenmann 2021-05-25 17:01:39 UTC
Comparing boot-time messages between 5.10 (working) and 5.11 and 5.12 (not working), the 5.11 and 5.12 kernels additionally report:

amdgpu 0000:0a:00.0: amdgpu: Using BACO for runtime pm

The 5.10 kernel(s) also report values for clocks from DM PPLIB, while 5.12 and 5.11 don't:

hawkwind.cs kernel: [drm] DM_PPLIB: values for Engine clock
hawkwind.cs kernel: [drm] DM_PPLIB:         214000
hawkwind.cs kernel: [drm] DM_PPLIB:         551000
hawkwind.cs kernel: [drm] DM_PPLIB:         734000
hawkwind.cs kernel: [drm] DM_PPLIB:         980000
hawkwind.cs kernel: [drm] DM_PPLIB:         1046000
hawkwind.cs kernel: [drm] DM_PPLIB:         1098000
hawkwind.cs kernel: [drm] DM_PPLIB:         1124000
hawkwind.cs kernel: [drm] DM_PPLIB:         1206000
hawkwind.cs kernel: [drm] DM_PPLIB: Validation clocks:
hawkwind.cs kernel: [drm] DM_PPLIB:    engine_max_clock: 120600
hawkwind.cs kernel: [drm] DM_PPLIB:    memory_max_clock: 175000
hawkwind.cs kernel: [drm] DM_PPLIB:    level           : 8
hawkwind.cs kernel: [drm] DM_PPLIB: values for Memory clock
hawkwind.cs kernel: [drm] DM_PPLIB:         300000
hawkwind.cs kernel: [drm] DM_PPLIB:         625000
hawkwind.cs kernel: [drm] DM_PPLIB:         1750000
hawkwind.cs kernel: [drm] DM_PPLIB: Validation clocks:
hawkwind.cs kernel: [drm] DM_PPLIB:    engine_max_clock: 120600
hawkwind.cs kernel: [drm] DM_PPLIB:    memory_max_clock: 175000
hawkwind.cs kernel: [drm] DM_PPLIB:    level           : 8

5.12 and 5.10 also report different DRM Display Core versions at initialization. 5.12 reports:

  [drm] Display Core initialized with v3.2.122!

5.10 reports v3.2.104.
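
The comparison above was done by eye. A rough sketch of automating it follows, assuming two kernel logs saved as text (for instance with the journalctl command from the bug template). The helper names and the keyword filter are hypothetical, and the prefix stripping only handles the line shapes quoted in this comment plus a dmesg-style timestamp.

```python
import re

# Strip an optional "... kernel: " prefix and an optional "[  12.345678] " timestamp.
_PREFIX = re.compile(r"^(?:.*?\bkernel: )?(?:\[\s*\d+\.\d+\]\s*)?")


def normalize(line):
    """Reduce a log line to the bare kernel message."""
    return _PREFIX.sub("", line.strip())


def only_in(first, second, keywords=("amdgpu", "[drm]")):
    """Return messages matching `keywords` that occur in `first` but not in `second`."""
    def relevant(lines):
        out = []
        for line in lines:
            msg = normalize(line)
            if msg and any(k in msg for k in keywords):
                out.append(msg)
        return out

    seen = set(relevant(second))
    return [m for m in relevant(first) if m not in seen]
```

Running only_in(new_lines, old_lines) and then only_in(old_lines, new_lines) gives both directions of the difference. Messages that embed values that change between boots will show up as differences even when nothing meaningful changed, so the output still needs reading.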

Comment 11 Chris Siebenmann 2021-05-25 17:45:48 UTC
More poking in /sys and some (remote) experiments have revealed that setting pwm1_enable to 1 and writing a suitable non-zero value to pwm1 in /sys/devices/pci0000:00/0000:00:03.1/0000:0a:00.0/hwmon/hwmon2 apparently causes the fan to spin up and the card to cool down. On 5.12, pwm1_enable's normal value is 2, but pwm1 itself sticks at zero, instead of the '81' that it normally is on 5.10.

Changing pwm1_enable back to 2 on 5.12.6, after it had been set to 1 and pwm1 set to a non-zero value, causes pwm1 to shift rapidly within a range of 94 to 127 (so far) and the reported GPU temperature to hold steady around 30 C. This is somewhat cooler than 5.10 was holding the card, which at the moment was about 32 C (up from 28 C, presumably because summer heat has arrived here and the ambient office temperature has gone up).
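
The workaround described in this comment can be sketched as two small helpers. This is an illustration only: the function names are hypothetical, the values 1 and 2 for pwm1_enable are the ones used above, the 1-255 range assumes the usual hwmon pwm scale, and writing to the real sysfs files needs root. The hwmon directory is passed in rather than hard-coded because it differs per machine.

```python
from pathlib import Path

PWM_MANUAL = 1  # pwm1_enable value used above for manual control
PWM_AUTO = 2    # pwm1_enable value reported as the normal one on 5.12


def set_manual(hwmon_dir, pwm_value):
    """Switch to manual fan control and write a non-zero PWM value."""
    if not 1 <= pwm_value <= 255:
        raise ValueError("pwm_value must be between 1 and 255")
    hwmon = Path(hwmon_dir)
    (hwmon / "pwm1_enable").write_text(str(PWM_MANUAL))
    (hwmon / "pwm1").write_text(str(pwm_value))


def set_auto(hwmon_dir):
    """Hand fan control back to the driver."""
    (Path(hwmon_dir) / "pwm1_enable").write_text(str(PWM_AUTO))
```

Calling set_manual(path, 100) followed later by set_auto(path) mirrors the sequence above. Whether the driver keeps adjusting pwm1 afterwards is exactly the behaviour this bug is about, so the result should be checked against the reported temperature rather than assumed.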

Comment 12 Neal Gompa 2021-05-30 00:42:33 UTC
> 5. Does this problem occur with the latest Rawhide kernel? To install the
>   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
>   ``sudo dnf update --enablerepo=rawhide kernel``:
>
> Have not tested. Sorry, I'm not running Rawhide kernels on a machine I need to work.

This makes things a bit more difficult. Right now the 5.13 rc kernels are in Rawhide, and knowing whether this is still broken in an RC kernel is valuable: if it is, it can be looked at and fixed during this kernel cycle and backported to stable kernels. And if it's fixed in 5.13, then at least there's that as an option too.

Comment 13 Chris Siebenmann 2021-05-31 15:22:57 UTC
I tried to quickly test a Rawhide kernel, but discovered that OpenZFS isn't compatible with 5.13-rc at this point (its work for even 5.12 is still somewhat in progress in git tip). Since much of my data storage is in ZFS pools, I cannot even start to reboot my office machine remotely without ZFS available (at the moment and for the likely future we are not in the office).

Comment 14 Ben Cotton 2021-11-04 13:57:23 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 17 Chris Siebenmann 2021-11-05 14:01:02 UTC
This continues to be the case on Fedora 34 with kernels up to 5.14.15-200.fc34.x86_64. I've updated this to be a Fedora 34 bug.

Comment 18 Ben Cotton 2022-05-12 15:38:41 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 19 Ben Cotton 2022-06-07 22:28:31 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

