Bug 1669564
Summary: | nct6775: Kernel 4.20.3-200.fc29.x86_64 doesn't detect all fan sensors on an Asus PRIME Z370-A motherboard when prior kernels did
---|---
Product: | [Fedora] Fedora
Reporter: | Chris Siebenmann <cks-rhbugzilla>
Component: | kernel
Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED EOL
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified
Priority: | unspecified
Version: | 28
CC: | airlied, bskeggs, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, steved, y9t7sypezp
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | If docs needed, set a value
Last Closed: | 2019-05-28 22:25:57 UTC
Type: | Bug
Thanks for your report. The attached log shows:

$ grep fans dmesg-4.20.3-200.fc29.x86_64
[ 3.755678] asus_wmi: Number of fans: 1

Could you attach the output from:

$ lsmod > lsmod-1.txt
$ find /sys/devices/ -name '*hwmon*' -o -name '*fan*' > sys-devices-hwmon-fan-1.txt

Also, have you had any problems with fan noise? Compare:

Bug 1665750 - ASUS ROG laptop: asus_wmi: Number of fans: 0 (Bug 1665750, Comment 10)
Bug 1663927 - I installed Fedora 29 to Asus X542U. Now fan is working always and I can not control it.

I haven't had any problems with fan noise in either 4.19.15 or 4.20.3 (when I was running it), and 'sensors' reports that fans it can see have plausible RPMs. All of my fans that are connected to the motherboard sensor points at all are visible in 4.19.15 through the nct6775 driver (and I have no other fans in this desktop, apart from one in the power supply, which as far as I know cannot be read at all). Under both 4.19.15 and 4.20.3, the asus_wmi data reports a fan RPM of '0' (I have logged sensors output from both). On at least 4.19.15, the actual /sys/devices device directory for asus_wmi is 'platform/eeepc-wmi'.

I will get the lsmod and find output for 4.20.3 later, when I can reboot the machine into that kernel (it's currently running 4.19.15). I can get 4.19.15 /sys/devices output now if that would be helpful.

(In reply to Chris Siebenmann from comment #2)
> I haven't had any problems with fan noise ...

OK.

> All of my fans that are connected to the motherboard
> sensor points at all are visible in 4.19.15 through the nct6775 driver
> (and I have no other fans in this desktop, apart from one in the power
> supply, which as far as I know cannot be read at all). Under both 4.19.15
> and 4.20.3, the asus_wmi data reports a fan RPM of '0' (I have logged
> sensors output from both). On at least 4.19.15, the actual /sys/devices
> device directory for asus_wmi is 'platform/eeepc-wmi'.

Thanks for pointing that out:

$ modinfo eeepc_wmi | egrep 'alias|description|depends'
alias: wmi:ABBC0F72-8EA1-11D1-00A0-C90629100000
description: Eee PC WMI Hotkey Driver
depends: asus-wmi

> I will get the lsmod and find output for 4.20.3 later, when I can
> reboot the machine into that kernel (it's currently running 4.19.15).
> I can get 4.19.15 /sys/devices output now if that would be helpful.

Since you are reporting a regression, attaching both is a good idea:

sys-devices-hwmon-fan-4.19.15.txt
sys-devices-hwmon-fan-4.20.3.txt

(In reply to Steve from comment #3)
...
> Since you are reporting a regression, attaching both is a good idea:
>
> sys-devices-hwmon-fan-4.19.15.txt
> sys-devices-hwmon-fan-4.20.3.txt

There was an unrelated bug recently in which the loaded modules were different for different kernels, so could you also attach the lsmod output for both kernels?

$ lsmod > lsmod-4.19.15.txt
$ find /sys/devices/ -name '*hwmon*' -o -name '*fan*' > sys-devices-hwmon-fan-4.19.15.txt

$ lsmod > lsmod-4.20.3.txt
$ find /sys/devices/ -name '*hwmon*' -o -name '*fan*' > sys-devices-hwmon-fan-4.20.3.txt

Created attachment 1523601 [details]
4.19.15 lsmod
Created attachment 1523602 [details]
4.19.15 /sys/devices hwmon and fan
Created attachment 1523603 [details]
4.20.3 lsmod
Created attachment 1523604 [details]
4.20.3 /sys/devices hwmon and fan
Thanks for the attachments. This confirms what you said:

$ diff -u --label '4.19.15' --label '4.20.3' sys-devices-hwmon-fan-4.19.15-sort.txt sys-devices-hwmon-fan-4.20.3-sort.txt
--- 4.19.15
+++ 4.20.3
@@ -41,10 +41,6 @@
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan5_pulses
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan5_target
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan5_tolerance
-/sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_alarm
-/sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_input
-/sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_min
-/sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_pulses
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_target
 /sys/devices/platform/nct6775.656/hwmon/hwmon3/fan6_tolerance
 /sys/devices/virtual/thermal/thermal_zone0/hwmon0

(NB: I sorted the files first.)

The lsmod output doesn't show any changes in the loaded modules other than sizes.

This could be related:

hwmon: (nct6775) Only display fan speed tolerance conditionally
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/hwmon/nct6775.c?h=v4.20.3&id=61b6c66a8f740b5025ac49ddf1c2e29091a1274e

Could you attach the output for:

$ grep . /sys/devices/platform/nct6775.*/hwmon/hwmon*/fan* > grep-sys-devices-nct6775-fan-4.19.15.txt
$ grep . /sys/devices/platform/nct6775.*/hwmon/hwmon*/fan* > grep-sys-devices-nct6775-fan-4.20.3.txt

For reference, here is a list of commits for nct6775.c in 4.20.3:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/hwmon/nct6775.c?h=v4.20.3

It looks like there is a potentially significant difference in the code introduced here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/hwmon/nct6775.c?h=v4.20.3&id=2d99925a15b639026b67bd96419df6f9d760b212

Here is the code before the change, and I think we care about fan6pin:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/hwmon/nct6775.c?h=v4.20.3&id=7dcdbdeb1b45b9071ad986bf20d8c2da6a057eb6#n3532

After the change:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/hwmon/nct6775.c?h=v4.20.3&id=2d99925a15b639026b67bd96419df6f9d760b212#n3532

We change from:
  fan6pin = !dsw_en && (cr2d & BIT(1));
  fan6pin |= creb & BIT(3);

To just:
  fan6pin = creb & BIT(3);

If dsw_en is false, this could create a different result.

The 4.19.15 code is differently structured but, I believe, has the same calculation for fan6pin:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/hwmon/nct6775.c?h=v4.19.15&id=e3185123541204ca4f715eeaaa1f9929c09ff3b4#n3514

The old code runs:
  if (!dsw_en) {
    fan6pin = regval & BIT(1);
    ...
  }
  ...
  if (!fan6pin)
    fan6pin = regval_eb & BIT(3);

regval is the current cr2d, regval_eb is the current creb. dsw_en is 'bool dsw_en = cr2f & BIT(3);', but I don't know if there's any way of figuring out its state from outside the driver.

Created attachment 1523634 [details]
4.19.15 grep-sys-devices-nct6775-fan-4.19.15.txt
Created attachment 1523635 [details]
4.20.3 grep-sys-devices-nct6775-fan-4.20.3.txt
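The difference between the two fan6pin calculations quoted in the analysis above can be checked by enumerating the three inputs involved. This is a minimal userspace sketch in C, not the driver code: the function names fan6pin_before, fan6pin_after, and count_mismatches are hypothetical, BIT() is redefined locally, and the register values are passed in as plain parameters instead of being read from the chip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro (assumption: userspace build). */
#define BIT(n) (1U << (n))

/* fan6pin as quoted for the code before the refactoring. */
bool fan6pin_before(uint8_t cr2d, uint8_t creb, bool dsw_en)
{
	bool fan6pin = !dsw_en && (cr2d & BIT(1));

	fan6pin |= creb & BIT(3);
	return fan6pin;
}

/* fan6pin as quoted for the code after the refactoring. */
bool fan6pin_after(uint8_t creb)
{
	return creb & BIT(3);
}

/* Count the input combinations for which the two versions disagree. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int dsw_en = 0; dsw_en <= 1; dsw_en++) {
		for (int b1 = 0; b1 <= 1; b1++) {
			for (int b3 = 0; b3 <= 1; b3++) {
				uint8_t cr2d = b1 ? BIT(1) : 0;
				uint8_t creb = b3 ? BIT(3) : 0;
				bool before = fan6pin_before(cr2d, creb, dsw_en);
				bool after = fan6pin_after(creb);

				/* The new expression never detects a pin the old one missed. */
				assert(before || !after);
				if (before != after)
					mismatches++;
			}
		}
	}
	return mismatches;
}
```

Under these assumptions, count_mismatches() returns 1. The only differing case is dsw_en false, bit 1 of cr2d set, and bit 3 of creb clear: the old expression reports a fan6 pin there and the new one does not.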
(In reply to Chris Siebenmann from comment #11)
...
> We change from:
> fan6pin = !dsw_en && (cr2d & BIT(1));
> fan6pin |= creb & BIT(3);
>
> To just:
> fan6pin = creb & BIT(3);
...

Good catch. It looks like the first fan6pin assignment didn't get included in the refactoring.

I tried to add the maintainer, Guenter Roeck, to the CC list, but BZ won't accept his email address, because he is not registered. If you want to, you could try emailing him:

$ modinfo nct6775 | grep author
author: Guenter Roeck <linux>

I suggest adding this to the beginning of the bug summary: "nct6775: ".

I've now sent email to Guenter Roeck describing the issue and so on (and giving the URL of this bug).

(In reply to Chris Siebenmann from comment #16)
> I've now sent email to Guenter Roeck describing the issue and so on (and
> giving the URL of this bug).

Guenter has a fix in his git repo:

hwmon: (nct6775) Fix fan6 detection for NCT6793D
https://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git/commit/?h=hwmon&id=2a2ec4aa0577ec0b7df2d1bde5c84ed39a8637cb

I've hand-built a version of the fixed module for 4.20.5-200.fc29.x86_64 and can verify that it works for me; it detects all six fans on my motherboard.

(In reply to Chris Siebenmann from comment #18)
> I've hand-built a version of the fixed module for 4.20.5-200.fc29.x86_64
> and can verify that it works for me; it detects all six fans on my
> motherboard.

Thanks for testing the patch.

There appears to be a minor glitch in the commit:

Subject linux-next: Fixes tag needs some work in the hwmon-fixes tree
https://lkml.org/lkml/2019/1/27/177

The commit ID in that message is different from the one in Comment 17, so here is a link:

hwmon: (nct6775) Fix fan6 detection for NCT6793D
https://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git/commit/?id=315fd42cc1f8837a134c3b671dde9b132c46ddcb

(In reply to Steve from comment #19)
...
> The commit ID in that message is different from the one in Comment 17, so here is a link:
...

OK, I figured it out -- commit 2a2ec4aa0577 has the "Fixes" tag on one line, and it is in the "linux-next" git repo:

hwmon: (nct6775) Fix fan6 detection for NCT6793D
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=2a2ec4aa0577ec0b7df2d1bde5c84ed39a8637cb

The fix is in kernel 5.0-rc7:

Merge tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.0-rc7&id=991b9eb4243b53e6dcaeda94e515d713ca7ddd2e

This message is a reminder that Fedora 28 is nearing its end of life. On 2019-May-28 Fedora will stop maintaining and issuing updates for Fedora 28. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 28 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Two notes: first, this was accidentally filed or classified against Fedora 28, although I was running Fedora 29 at the time I found it. Second, this is fixed in at least the recent Fedora 29 kernels (I can't speak for Fedora 28 ones). It's certainly fixed in 5.0.7-200.fc29.x86_64, and I believe it was also fixed in several prior Fedora 29 5.0.x kernels. So I think that this can be closed as 'fixed in errata'.

Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
Created attachment 1523544 [details]
4.20.3 boot dmesg

1. Please describe the problem:

I have an Asus PRIME Z370-A based desktop. In kernels before 4.20.3-200, the nct6775 hardware sensors module correctly detected and reported all six motherboard fan sensors (including in 4.19.15-300). In 4.20.3, only the first five fan sensors are detected.

In both kernels, the nct6775 driver reports detecting the same chip:

nct6775: Enabling hardware monitor logical device mappings.
nct6775: Found NCT6793D or compatible chip at 0x2e:0x290

(This is the correct chip for this motherboard.)

In 4.19.15, /sys/devices/platform/nct6775.656/hwmon/hwmon3 contains a variety of files for fan6:

fan6_alarm fan6_input fan6_min fan6_pulses fan6_target fan6_tolerance

In 4.20.3, this directory only has fan6_target and fan6_tolerance. This is the only difference between the files in the directory in the two versions.

To get results from the nct6775 driver on this motherboard requires booting the kernel with 'acpi_enforce_resources=lax'. In both 4.19.15 and 4.20.3, booting the kernel and bringing up the driver (with or without the option) produces some ACPI messages:

ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\_GPE.HWM) (20181003/utaddress-213)
ACPI: This conflict may cause random problems and system instability
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

Reproduction of this issue is particularly visible to me because fan6 on this motherboard is the second chassis fan.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

Yes in both 4.20.3 and 4.19.15; WireGuard from COPR jdoss/wireguard, and ZFS on Linux (0.8.0-rc3). The two kernels have the same versions of the modules installed (through DKMS) and I have used various versions of both for a long time on previous kernel versions without problems.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

I've attached a dmesg dump from 4.20.3 from boot through me logging in, verifying that sensors did not see the sixth fan, and capturing dmesg output. I can provide a 4.19.3 dmesg dump from the same situation. As far as I can tell they have no substantial differences.
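
The symptom described in this report (fan6_input present in one kernel and missing in the other) can also be checked programmatically. The following is a minimal sketch in C under stated assumptions: the helper names fan_has_input and count_fan_inputs are hypothetical, and the caller supplies the hwmon directory, for example the /sys/devices/platform/nct6775.656/hwmon/hwmon3 path from this report, which may differ on other systems or boots.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Return 1 if <hwmon_dir>/fan<n>_input exists, 0 otherwise. */
int fan_has_input(const char *hwmon_dir, int n)
{
	char path[512];
	int len;

	assert(hwmon_dir != NULL);
	len = snprintf(path, sizeof(path), "%s/fan%d_input", hwmon_dir, n);
	if (len < 0 || (size_t)len >= sizeof(path))
		return 0;	/* path did not fit in the buffer; treat as absent */
	return access(path, F_OK) == 0;
}

/* Count how many of fan1..fan<max_fans> expose an input file. */
int count_fan_inputs(const char *hwmon_dir, int max_fans)
{
	int count = 0;

	for (int n = 1; n <= max_fans; n++)
		count += fan_has_input(hwmon_dir, n);
	return count;
}
```

Going by the file listings in this report, a call such as count_fan_inputs(dir, 6) on the affected board would be expected to give 6 on 4.19.15 and 5 on 4.20.3. That is an inference from the listings, not something measured with this code.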