Bug 1599642

Summary: Reported temperature of nvidia card with nouveau driver is wrong
Product: [Fedora] Fedora
Reporter: Jirka Novak <j.novak>
Component: lm_sensors
Assignee: Ondřej Lysoněk <olysonek>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 29
CC: hdegoede, jaromir.capik, olysonek, pknirsch
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2019-10-23 07:20:55 UTC
Type: Bug

Description Jirka Novak 2018-07-10 09:36:01 UTC
Description of problem:
When I use the sensors tool, or any other tool that shows the temperature of system components, I see an obviously wrong temperature for my NVIDIA card, which is controlled by the nouveau driver:

nouveau-pci-0100
Adapter: PCI adapter
temp1:       +511.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

My guess is that the temperature is multiplied by 10.
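
To make the factor-of-10 guess concrete: the kernel hwmon interface exposes temperatures as integers in millidegrees Celsius (files such as temp1_input), and user-space tools divide by 1000 for display. The sketch below only illustrates that arithmetic. The function name is hypothetical, and the raw values are inferred from the displayed reading rather than taken from the affected machine.

```python
def millidegrees_to_celsius(raw: int) -> float:
    """Convert a hwmon temp*_input value (millidegrees C) to degrees C."""
    return raw / 1000.0


# Raw value implied by the displayed +511.0°C (an inference, not a measurement).
raw_implied = int(511.0 * 1000)

# If the driver really scales by an extra factor of 10, the true raw value
# would be ten times smaller.
suspected_true_raw = raw_implied // 10

print(raw_implied, "->", millidegrees_to_celsius(raw_implied), "°C")
print(suspected_true_raw, "->", millidegrees_to_celsius(suspected_true_raw), "°C")
```

If the raw sysfs value is already 511000, the display tool is showing faithfully what the kernel reports, which would point at the driver rather than lm_sensors.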

Version-Release number of selected component (if applicable):
Linux p3530 4.17.3-200.fc28.x86_64 #1 SMP Tue Jun 26 14:17:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
lm_sensors-3.4.0-13.fc28.x86_64
lm_sensors-libs-3.4.0-13.fc28.x86_64

How reproducible:
During system operation.

Steps to Reproduce:
1. Run the system

Actual results:
nouveau-pci-0100
Adapter: PCI adapter
temp1:       +511.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

Expected results:
Probably +51.0 or +51.1 in place of +511.0

Additional info:
My HW is:
Product Name:           Precision 3530
Vendor:                 Dell Inc.
BIOS Version:           1.2.5
System ID:              0x0820
Service Tag:            7Z694Q2

Comment 1 Ben Cotton 2019-05-02 21:20:24 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 reaches end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Ondřej Lysoněk 2019-05-03 10:11:48 UTC
Can you still reproduce it? If so, let's determine if this is a problem on the lm_sensors side or the kernel side. What is the output of the following, when the reported temperature is wrong?
grep . $(dirname $(grep -l nouveau /sys/class/hwmon/*/name))/temp*
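
For anyone who prefers a script, the following is a rough Python equivalent of the grep command above: it finds hwmon devices whose name file mentions the driver and dumps the raw contents of their temp* files. It is a sketch, not part of lm_sensors. The function name and its parameters are hypothetical, and the root argument exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def dump_hwmon_temps(driver="nouveau", root="/sys/class/hwmon"):
    """Return {path: raw contents} for temp* files of matching hwmon devices."""
    results = {}
    for name_file in sorted(Path(root).glob("*/name")):
        try:
            # Substring match, mirroring `grep -l nouveau`.
            if driver not in name_file.read_text():
                continue
        except OSError:
            continue
        for temp_file in sorted(name_file.parent.glob("temp*")):
            if not temp_file.is_file():
                continue
            try:
                results[str(temp_file)] = temp_file.read_text().strip()
            except OSError:
                # Some attributes can fail to read; skip them.
                pass
    return results


if __name__ == "__main__":
    for path, value in dump_hwmon_temps().items():
        print(f"{path}:{value}")
```

The values are printed unconverted on purpose, so that they can be compared directly with what sensors displays.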

Comment 3 Jirka Novak 2019-10-23 05:22:29 UTC
Sorry, I missed your comment.
In the meantime I switched nouveau off entirely, as it was not able to suspend/resume on the latest kernels in FC30. Before I did that (in the middle of this year), I checked the temperature and it was wrong after resume on 5.1.11-300.fc30.x86_64.
I have now switched to the Intel GPU, so I can't validate it again.

Best regards,

Jirka Novak

Comment 4 Ondřej Lysoněk 2019-10-23 07:20:55 UTC
Ok. Let's close the bug for now. Note however that this is most likely a problem on the driver side.