Bug 1884016

Summary: intel_gpu_top segfaults on startup due to a failed assert in tr_pmu_name
Product: [Fedora] Fedora
Reporter: Chris Siebenmann <cks-rhbugzilla>
Component: igt-gpu-tools
Assignee: Lyude <lyude>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 31
CC: airlied, lyude
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: igt-gpu-tools-1.25-1.20201012gitd5f40f0.fc31 igt-gpu-tools-1.25-1.20201012gitd5f40f0.fc32 igt-gpu-tools-1.25-1.20201012gitd5f40f0.fc33
Last Closed: 2020-10-21 19:58:26 UTC
Type: Bug

Description Chris Siebenmann 2020-09-30 18:57:43 UTC
Description of problem:
intel_gpu_top segfaults on startup:

   # intel_gpu_top
   intel_gpu_top: ../tools/intel_gpu_top.c:1297: tr_pmu_name: Assertion `ret == (bufsize-1)' failed.
   abort--core dumped

This is happening on an Intel Core i7-8700K where I am using integrated graphics (and thus the on-chip GPU is active). 'intel_gpu_top -L' lists only the onboard device (in three variants), none of which work. For example:

  # intel_gpu_top sys:/sys/devices/pci0000:00/0000:00:02.0
  intel_gpu_top: ../tools/intel_gpu_top.c:1297: tr_pmu_name: Assertion `ret == (bufsize-1)' failed.
  abort--core dumped

Running intel_gpu_top under valgrind reports:
  ==1582376== Conditional jump or move depends on uninitialised value(s)
  ==1582376==    at 0x402CE3: main (intel_gpu_top.c:1408)

(and then it works.)

Based on inspection of the source code, this occurs because the 'struct
igt_device_card card' in main() is never explicitly zeroed. The code
appears to assume that card.pci_slot_name[0] will be zero if it has never
been set (it's set by either igt_device_find_first_i915_discrete_card() or
igt_device_card_match()), but as an uninitialized local variable its
contents are indeterminate, and in this build they are not zero.
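
The pattern described above can be illustrated with a small sketch. This is not the igt-gpu-tools source: 'struct card_info', SLOT_NAME_LEN, card_init(), find_card() and card_is_set() are hypothetical stand-ins, and only the pci_slot_name field name is taken from the code discussed here. The point is that testing pci_slot_name[0] is only meaningful if the struct was zeroed before any lookup that may leave it untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed size: room for a slot name such as "0000:00:02.0" plus the NUL. */
#define SLOT_NAME_LEN 13

/* Hypothetical stand-in for the real struct; only pci_slot_name is modeled. */
struct card_info {
    char pci_slot_name[SLOT_NAME_LEN];
};

/* Return a card whose bytes are all zero, so "never set" is detectable. */
struct card_info card_init(void)
{
    struct card_info card = {0};
    return card;
}

/*
 * Hypothetical lookup: fills in the card only on success and leaves it
 * untouched on failure, which is why the caller must zero it beforehand.
 */
bool find_card(struct card_info *card, const char *slot)
{
    assert(card != NULL);
    if (slot == NULL)
        return false;
    snprintf(card->pci_slot_name, sizeof(card->pci_slot_name), "%s", slot);
    return true;
}

/* The "has a card been selected?" test that relies on zero initialization. */
bool card_is_set(const struct card_info *card)
{
    return card->pci_slot_name[0] != '\0';
}
```

If the caller instead declares 'struct card_info card;' with no initializer, the first byte is indeterminate and card_is_set() may report a card that was never found, which would be consistent with the valgrind warning above.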

Version-Release number of selected component (if applicable):

igt-gpu-tools-1.25-1.20200920git0ec9620.fc31.x86_64

How reproducible:

100%.

Steps to Reproduce:
1. Try to run 'intel_gpu_top'

Actual results:

Assertion failure and segfault.

Expected results:

Proper operation.

Comment 1 Lyude 2020-09-30 19:49:12 UTC
Hi, I wrote up a patch that should fix this and cc'd it to your email. I've also launched a brew build with the patch applied. Could you test this and let me know if it fixes your issue? If it does, I'll go ahead and start up a new update for igt.

Comment 2 Chris Siebenmann 2020-09-30 21:27:33 UTC
I think I need a test RPM build to reproduce this. My personal builds of intel-gpu-tools
from the git upstream have never exhibited this problem, probably because the default
source tree build uses different options than the RPM build, so I don't have confidence
that I can properly build and test a patch.

Comment 3 Lyude 2020-10-01 18:52:44 UTC
Whoops, I meant to cc you on the patch, but I also meant to post a link to the RPM build that I already created for this yesterday: https://koji.fedoraproject.org/koji/taskinfo?taskID=52540860

Comment 4 Chris Siebenmann 2020-10-02 14:15:17 UTC
This RPM build appears to work for me; intel_gpu_top doesn't fail its assertion and seems to run fine (i.e., it produces plausible results and so on).

Comment 5 Fedora Update System 2020-10-12 23:15:37 UTC
FEDORA-2020-29c7e1b7b5 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-29c7e1b7b5

Comment 6 Fedora Update System 2020-10-12 23:15:38 UTC
FEDORA-2020-4874238986 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-4874238986

Comment 7 Fedora Update System 2020-10-13 20:18:48 UTC
FEDORA-2020-93017310c0 has been pushed to the Fedora 32 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-93017310c0`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-93017310c0

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 8 Fedora Update System 2020-10-13 21:12:02 UTC
FEDORA-2020-29c7e1b7b5 has been pushed to the Fedora 31 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-29c7e1b7b5`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-29c7e1b7b5

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 9 Fedora Update System 2020-10-13 22:39:15 UTC
FEDORA-2020-4874238986 has been pushed to the Fedora 33 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-4874238986`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-4874238986

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 10 Fedora Update System 2020-10-21 19:58:26 UTC
FEDORA-2020-29c7e1b7b5 has been pushed to the Fedora 31 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 11 Fedora Update System 2020-10-21 19:58:44 UTC
FEDORA-2020-93017310c0 has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make note of it in this bug report.

Comment 12 Fedora Update System 2020-10-23 22:12:57 UTC
FEDORA-2020-4874238986 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make note of it in this bug report.