Bug 1669564 - nct6775: Kernel 4.20.3-200.fc29.x86_64 doesn't detect all fan sensors on an Asus PRIME Z370-A motherboard when prior kernels did
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: Unspecified
OS: Unspecified
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2019-01-25 16:43 UTC by Chris Siebenmann
Modified: 2019-02-19 02:35 UTC (History)

Type: Bug


Attachments
4.20.3 boot dmesg (67.44 KB, text/plain)
2019-01-25 16:43 UTC, Chris Siebenmann
4.19.15 lsmod (3.83 KB, text/plain)
2019-01-25 22:04 UTC, Chris Siebenmann
4.19.15 /sys/devices hwmon and fan (2.75 KB, text/plain)
2019-01-25 22:04 UTC, Chris Siebenmann
4.20.3 lsmod (3.83 KB, text/plain)
2019-01-25 22:05 UTC, Chris Siebenmann
4.20.3 /sys/devices hwmon and fan (2.52 KB, text/plain)
2019-01-25 22:06 UTC, Chris Siebenmann
4.19.15 grep-sys-devices-nct6775-fan-4.19.15.txt (2.43 KB, text/plain)
2019-01-26 05:02 UTC, Chris Siebenmann
4.20.3 grep-sys-devices-nct6775-fan-4.20.3.txt (2.20 KB, text/plain)
2019-01-26 05:03 UTC, Chris Siebenmann

Description Chris Siebenmann 2019-01-25 16:43:12 UTC
Created attachment 1523544 [details]
4.20.3 boot dmesg

1. Please describe the problem:

I have an Asus PRIME Z370-A based desktop. In kernels before 4.20.3-200,
the nct6775 hardware sensors module correctly detected and reported all
six motherboard fan sensors (including in 4.19.15-300). In 4.20.3, only
the first five fan sensors are detected. In both kernels, the nct6775
reports detecting the same chip:

  nct6775: Enabling hardware monitor logical device mappings.
  nct6775: Found NCT6793D or compatible chip at 0x2e:0x290

(This is the correct chip for this motherboard.)

In 4.19.15, /sys/devices/platform/nct6775.656/hwmon/hwmon3 contains a
variety of files for fan6:

  fan6_alarm fan6_input fan6_min fan6_pulses fan6_target fan6_tolerance

In 4.20.3, this directory only has fan6_target and fan6_tolerance. This
is the only difference between the files in the directory in the two
versions.
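
The fan6 difference described above can be checked mechanically by grouping the attribute names by fan number and comparing the sets. This is only a minimal sketch: the helper name `group_fan_attrs` is hypothetical, and the input lists are the fan6 file names quoted in this report rather than a live directory listing.

```python
import re
from collections import defaultdict

def group_fan_attrs(names):
    """Group hwmon attribute names such as 'fan6_input' by fan number."""
    groups = defaultdict(set)
    for name in names:
        m = re.fullmatch(r"fan(\d+)_(\w+)", name)
        if m:
            groups[int(m.group(1))].add(m.group(2))
    return dict(groups)

# fan6 attribute names as listed in this report for each kernel.
fan6_4_19 = ["fan6_alarm", "fan6_input", "fan6_min", "fan6_pulses",
             "fan6_target", "fan6_tolerance"]
fan6_4_20 = ["fan6_target", "fan6_tolerance"]

# Attributes that exist under 4.19.15 but not under 4.20.3.
missing = group_fan_attrs(fan6_4_19)[6] - group_fan_attrs(fan6_4_20)[6]
print(sorted(missing))
```

On a running system the name lists could instead come from `os.listdir()` on the hwmon directory named above.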

Getting results from the nct6775 driver on this motherboard requires
booting the kernel with 'acpi_enforce_resources=lax'. In both 4.19.15
and 4.20.3, booting the kernel and bringing up the driver (with or
without the option) produces some ACPI messages:

  ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\_GPE.HWM) (20181003/utaddress-213)
  ACPI: This conflict may cause random problems and system instability
  ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
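
The warning above says that the I/O ports the driver uses lie inside a range that ACPI also declares as an OpRegion. As a small illustration of that overlap, using only the addresses printed in the warning (the function name `ranges_overlap` is hypothetical):

```python
def ranges_overlap(a_start, a_end, b_start, b_end):
    """Return True if the inclusive ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end

# Values taken from the ACPI warning in this report.
driver_io = (0x295, 0x296)      # SystemIO range from the warning
acpi_opregion = (0x290, 0x299)  # \_GPE.HWM OpRegion from the warning

print(ranges_overlap(*driver_io, *acpi_opregion))
```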

This issue is particularly visible to me because fan6 on this
motherboard is the second chassis fan.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

Yes in both 4.20.3 and 4.19.15; WireGuard from COPR jdoss/wireguard,
and ZFS on Linux (0.8.0-rc3). The two kernels have the same versions
of the modules installed (through DKMS) and I have used various versions
of both for a long time on previous kernel versions without problems.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I've attached a dmesg dump from 4.20.3 from boot through me logging in,
verifying that sensors did not see the sixth fan, and capturing
dmesg output. I can provide a 4.19.15 dmesg dump from the same situation.
As far as I can tell they have no substantial differences.

Comment 1 Steve 2019-01-25 18:21:33 UTC
Thanks for your report. The attached log shows:

$ grep fans dmesg-4.20.3-200.fc29.x86_64 
[    3.755678] asus_wmi: Number of fans: 1

Could you attach the output from:

$ lsmod > lsmod-1.txt
$ find /sys/devices/ -name '*hwmon*' -o -name '*fan*' > sys-devices-hwmon-fan-1.txt

Also, have you had any problems with fan noise? Compare:

Bug 1665750 - ASUS ROG laptop: asus_wmi: Number of fans: 0 (Bug 1665750, Comment 10)
Bug 1663927 - I installed Fedora 29 to Asus X542U. Now fan is working always and I can not control it.

Comment 2 Chris Siebenmann 2019-01-25 18:41:05 UTC
I haven't had any problems with fan noise in either 4.19.15 or 4.20.3
(when I was running it), and 'sensors' reports that fans it can see have
plausible RPMs. All of my fans that are connected to the motherboard
sensor points at all are visible in 4.19.15 through the nct6775 driver
(and I have no other fans in this desktop, apart from one in the power
supply, which as far as I know cannot be read at all). Under both 4.19.15
and 4.20.3, the asus_wmi data reports a fan RPM of '0' (I have logged
sensors output from both). On at least 4.19.15, the actual /sys/devices
device directory for asus_wmi is 'platform/eeepc-wmi'.

I will get the lsmod and find output for 4.20.3 later, when I can
reboot the machine into that kernel (it's currently running 4.19.15).
I can get 4.19.15 /sys/devices output now if that would be helpful.

Comment 3 Steve 2019-01-25 19:17:54 UTC
(In reply to Chris Siebenmann from comment #2)
> I haven't had any problems with fan noise ...

OK.

> All of my fans that are connected to the motherboard
> sensor points at all are visible in 4.19.15 through the nct6775 driver
> (and I have no other fans in this desktop, apart from one in the power
> supply, which as far as I know cannot be read at all). Under both 4.19.15
> and 4.20.3, the asus_wmi data reports a fan RPM of '0' (I have logged
> sensors output from both). On at least 4.19.15, the actual /sys/devices
> device directory for asus_wmi is 'platform/eeepc-wmi'.

Thanks for pointing that out:

$ modinfo eeepc_wmi | egrep 'alias|description|depends'
alias:          wmi:ABBC0F72-8EA1-11D1-00A0-C90629100000
description:    Eee PC WMI Hotkey Driver
depends:        asus-wmi

> I will get the lsmod and find output for 4.20.3 later, when I can
> reboot the machine into that kernel (it's currently running 4.19.15).
> I can get 4.19.15 /sys/devices output now if that would be helpful.

Since you are reporting a regression, attaching both is a good idea:

sys-devices-hwmon-fan-4.19.15.txt
sys-devices-hwmon-fan-4.20.3.txt

Comment 4 Steve 2019-01-25 20:14:06 UTC
(In reply to Steve from comment #3)
...
> Since you are reporting a regression, attaching both is a good idea:
> 
> sys-devices-hwmon-fan-4.19.15.txt
> sys-devices-hwmon-fan-4.20.3.txt

There was an unrelated bug recently in which the loaded modules were different for different kernels, so could you also attach the lsmod output for both kernels?

$ lsmod > lsmod-4.19.15.txt
$ find /sys/devices/ -name '*hwmon*' -o -name '*fan*' > sys-devices-hwmon-fan-4.19.15.txt

$ lsmod > lsmod-4.20.3.txt
$ find /sys/devices/ -name '*hwmon*' -o -name '*fan*' > sys-devices-hwmon-fan-4.20.3.txt

Comment 5 Chris Siebenmann 2019-01-25 22:04 UTC
Created attachment 1523601 [details]
4.19.15 lsmod

Comment 6 Chris Siebenmann 2019-01-25 22:04 UTC
Created attachment 1523602 [details]
4.19.15 /sys/devices hwmon and fan

Comment 7 Chris Siebenmann 2019-01-25 22:05 UTC
Created attachment 1523603 [details]
4.20.3 lsmod

Comment 8 Chris Siebenmann 2019-01-25 22:06 UTC
Created attachment 1523604 [details]
4.20.3 /sys/devices hwmon and fan

Comment 9 Steve 2019-01-26 00:48:58 UTC
Thanks for the attachments. This confirms what you said:

$ diff -u --label '4.19.15' --label '4.20.3' sys-devices-hwmon-fan-4.19.15-sort.txt sys-devices-hwmon-fan-4.20.3-sort.txt
--- 4.19.15
+++ 4.20.3
@@ -41,10 +41,6 @@
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan5_pulses
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan5_target
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan5_tolerance
-/sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_alarm
-/sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_input
-/sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_min
-/sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_pulses
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_target
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_tolerance
 /sys/devices/virtual/thermal/thermal_zone0/hwmon0

(NB: I sorted the files first.)

The lsmod output doesn't show any changes in the loaded modules other than sizes.
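
The sort-then-diff comparison above can also be done without intermediate files. The following is a rough sketch using Python's standard `difflib`; the function name `removed_paths` is hypothetical, and the path lists contain only the fan6 entries shown in the diff, not the full attachments.

```python
import difflib

def removed_paths(old_paths, new_paths):
    """Return lines that a unified diff marks as removed going from old to new."""
    diff = difflib.unified_diff(sorted(old_paths), sorted(new_paths), lineterm="")
    return [line[1:] for line in diff
            if line.startswith("-") and not line.startswith("---")]

base = "/sys/devices/platform/nct6775.656/hwmon/hwmon3/"
old = [base + n for n in ("fan6_alarm", "fan6_input", "fan6_min",
                          "fan6_pulses", "fan6_target", "fan6_tolerance")]
new = [base + n for n in ("fan6_target", "fan6_tolerance")]

for path in removed_paths(old, new):
    print(path)
```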

Comment 10 Steve 2019-01-26 02:28:25 UTC
This could be related:

hwmon: (nct6775) Only display fan speed tolerance conditionally
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/hwmon/nct6775.c?h=v4.20.3&id=61b6c66a8f740b5025ac49ddf1c2e29091a1274e

Could you attach the output for:

$ grep . /sys/devices/platform/nct6775.*/hwmon/hwmon*/fan* > grep-sys-devices-nct6775-fan-4.19.15.txt
$ grep . /sys/devices/platform/nct6775.*/hwmon/hwmon*/fan* > grep-sys-devices-nct6775-fan-4.20.3.txt
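
For anyone who prefers a script, here is an approximate Python equivalent of the `grep .` commands above. It is a sketch only: `dump_attrs` is a hypothetical helper name, and skipping unreadable files is an assumption about how one might want to handle read errors, not something grep does.

```python
import glob

def dump_attrs(pattern):
    """Return 'path:line' strings for every non-empty line in files matching pattern."""
    out = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                for line in f:
                    line = line.rstrip("\n")
                    if line:
                        out.append(f"{path}:{line}")
        except OSError:
            # Assumption: silently skip attributes that cannot be read.
            continue
    return out

# Same glob pattern as the grep commands in this comment.
for entry in dump_attrs("/sys/devices/platform/nct6775.*/hwmon/hwmon*/fan*"):
    print(entry)
```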

For reference, here is a list of commits for nct6775.c in 4.20.3:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/hwmon/nct6775.c?h=v4.20.3

Comment 11 Chris Siebenmann 2019-01-26 04:32:42 UTC
It looks like there is a potentially significant difference in the code
introduced here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/hwmon/nct6775.c?h=v4.20.3&id=2d99925a15b639026b67bd96419df6f9d760b212

Here is the code before the change, and I think we care about fan6pin:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/hwmon/nct6775.c?h=v4.20.3&id=7dcdbdeb1b45b9071ad986bf20d8c2da6a057eb6#n3532

After the change:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/hwmon/nct6775.c?h=v4.20.3&id=2d99925a15b639026b67bd96419df6f9d760b212#n3532

We change from:
    fan6pin = !dsw_en && (cr2d & BIT(1));
    fan6pin |= creb & BIT(3);

To just:
    fan6pin = creb & BIT(3);

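The two versions differ only when dsw_en is false, bit 1 of cr2d is set,
and bit 3 of creb is clear: the old code then sets fan6pin and the new
code does not. The 4.19.15 code is differently structured but, I believe,
has the same calculation for fan6pin as the old code: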
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/hwmon/nct6775.c?h=v4.19.15&id=e3185123541204ca4f715eeaaa1f9929c09ff3b4#n3514

The old code runs:
   if (!dsw_en) {
     fan6pin = regval & BIT(1);
     ...
   }
   ...
   if (!fan6pin)
     fan6pin = regval_eb & BIT(3);

regval is the current cr2d, regval_eb is the current creb. dsw_en
is 'bool dsw_en = cr2f & BIT(3);', but I don't know if there's any
way of figuring out its state from outside the driver.
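
To make the comparison concrete, here is a small Python model of the two fan6pin calculations quoted in this comment. It models only those quoted lines, not the rest of the driver, and the function names `fan6pin_old` and `fan6pin_new` are hypothetical. It enumerates the three relevant bits and prints the combinations for which the two versions disagree.

```python
from itertools import product

def bit(n):
    """Python stand-in for the kernel's BIT(n) macro."""
    return 1 << n

def fan6pin_old(cr2d, cr2f, creb):
    """fan6pin as in the 'change from' lines quoted above."""
    dsw_en = bool(cr2f & bit(3))
    pin = (not dsw_en) and bool(cr2d & bit(1))
    return pin or bool(creb & bit(3))

def fan6pin_new(creb):
    """fan6pin as in the 'to just' line quoted above."""
    return bool(creb & bit(3))

# Enumerate only the three bits that matter and list where the results differ.
for dsw_en, cr2d_b1, creb_b3 in product([False, True], repeat=3):
    cr2f = bit(3) if dsw_en else 0
    cr2d = bit(1) if cr2d_b1 else 0
    creb = bit(3) if creb_b3 else 0
    if fan6pin_old(cr2d, cr2f, creb) != fan6pin_new(creb):
        print(f"dsw_en={dsw_en} cr2d.bit1={cr2d_b1} creb.bit3={creb_b3}")
```

The model says nothing about which register values this particular board actually reports; it only shows which inputs would make the old and new code give different answers.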

Comment 12 Chris Siebenmann 2019-01-26 05:02 UTC
Created attachment 1523634 [details]
4.19.15 grep-sys-devices-nct6775-fan-4.19.15.txt

Comment 13 Chris Siebenmann 2019-01-26 05:03 UTC
Created attachment 1523635 [details]
4.20.3 grep-sys-devices-nct6775-fan-4.20.3.txt

Comment 14 Steve 2019-01-26 05:12:16 UTC
(In reply to Chris Siebenmann from comment #11)
...
> We change from:
>     fan6pin = !dsw_en && (cr2d & BIT(1));
>     fan6pin |= creb & BIT(3);
> 
> To just:
>     fan6pin = creb & BIT(3);
...

Good catch. It looks like the first fan6pin assignment didn't get included in the refactoring.

I tried to add the maintainer, Guenter Roeck, to the CC list, but BZ won't accept his email address, because he is not registered. If you want to, you could try emailing him:

$ modinfo nct6775 | grep author
author:         Guenter Roeck <linux@roeck-us.net>

Comment 15 Steve 2019-01-26 05:41:33 UTC
I suggest adding this to the beginning of the bug summary: "nct6775: ".

Comment 16 Chris Siebenmann 2019-01-27 23:43:14 UTC
I've now sent email to Guenter Roeck describing the issue and so on (and
giving the URL of this bug).

Comment 17 Steve 2019-01-29 06:53:10 UTC
(In reply to Chris Siebenmann from comment #16)
> I've now sent email to Guenter Roeck describing the issue and so on (and
> giving the URL of this bug).

Guenter has a fix in his git repo:

hwmon: (nct6775) Fix fan6 detection for NCT6793D
https://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git/commit/?h=hwmon&id=2a2ec4aa0577ec0b7df2d1bde5c84ed39a8637cb

Comment 18 Chris Siebenmann 2019-02-05 04:49:52 UTC
I've hand-built a version of the fixed module for 4.20.5-200.fc29.x86_64
and can verify that it works for me; it detects all six fans on my
motherboard.

Comment 19 Steve 2019-02-10 00:44:41 UTC
(In reply to Chris Siebenmann from comment #18)
> I've hand-built a version of the fixed module for 4.20.5-200.fc29.x86_64
> and can verify that it works for me; it detects all six fans on my
> motherboard.

Thanks for testing the patch. There appears to be a minor glitch in the commit:

Subject: linux-next: Fixes tag needs some work in the hwmon-fixes tree
https://lkml.org/lkml/2019/1/27/177

The commit ID in that message is different from the one in Comment 17, so here is a link:

hwmon: (nct6775) Fix fan6 detection for NCT6793D
https://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git/commit/?id=315fd42cc1f8837a134c3b671dde9b132c46ddcb

Comment 20 Steve 2019-02-10 04:00:43 UTC
(In reply to Steve from comment #19)
... 
> The commit ID in that message is different from the one in Comment 17, so here is a link:
...

OK, I figured it out -- commit 2a2ec4aa0577 has the "Fixes" tag on one line, and it is in the "linux-next" git repo:

hwmon: (nct6775) Fix fan6 detection for NCT6793D
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=2a2ec4aa0577ec0b7df2d1bde5c84ed39a8637cb

Comment 21 Steve 2019-02-19 02:35:16 UTC
The fix is in kernel 5.0-rc7:

Merge tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.0-rc7&id=991b9eb4243b53e6dcaeda94e515d713ca7ddd2e

