Bug 1758259 - long execution times with draw program after updating to 7.6 intel graphics driver
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: xorg-x11-drv-intel
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Adam Jackson
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-03 17:07 UTC by Joe Wright
Modified: 2022-01-05 22:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-11 21:50:42 UTC
Target Upstream Version:
Embargoed:


Attachments
reproducer program (41.95 KB, application/x-executable)
2019-10-03 17:08 UTC, Joe Wright

Description Joe Wright 2019-10-03 17:07:42 UTC
Description of problem:
- long execution time and poor output performance with xorg-x11-drv-intel-2.99.917-28.20180530.el7


Version-Release number of selected component (if applicable):
xorg-x11-drv-intel-2.99.917-28.20180530.el7

How reproducible:


Steps to Reproduce:
1. run the attached reproducer program
2. downgrade to previous driver
3. re-run the attached reproducer and compare execution time and visible behavior

Actual results:

Note how long it takes to run the draw program:
RHEL 7.6 with the old xorg intel driver (xorg-x11-drv-intel-2.99.917-22.20151206.el7) takes 1m40s to run and the output is smooth.
[root@localhost ~]# time ./draw

real	1m40.234s
user	0m5.192s
sys	0m0.355s
[root@localhost ~]# rpm -qa |grep intel
xorg-x11-drv-intel-2.99.917-22.20151206.el7.x86_64

The same system running RHEL 7.6 with the new xorg intel driver (xorg-x11-drv-intel-2.99.917-28.20180530.el7) took 12m3s to run with very slow and jerky output.
[root@localhost ~]# time ./draw

real	12m3.152s
user	0m2.983s
sys	0m0.286s
[root@localhost ~]# rpm -qa |grep intel
xorg-x11-drv-intel-2.99.917-28.20180530.el7.x86_64

There is no difference between the two except for the xorg-x11-drv-intel package version.
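
For reference, the slowdown implied by the two "real" times above can be worked out with a small shell sketch. The helper names to_seconds and slowdown are made up for illustration and are not part of the reproducer.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "12m3.152s" into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of the slow run to the fast run, to one decimal place.
slowdown() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

# "real" values from the two runs quoted above.
slowdown 12m3.152s 1m40.234s   # prints 7.2
```

So the newer configuration is roughly seven times slower in wall-clock terms, while user and sys time stay small in both runs.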

Expected results:


Additional info:
Reproducible on 7.7 as well

00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Device [8086:7270]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at 98400000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at 80000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 6130 [size=8]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [d0] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: i915
        Kernel modules: i915

00:02.1 Display controller [0380]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a43] (rev 07)
        Subsystem: Intel Corporation Device [8086:7270]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 0: Memory at 9b800000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [d0] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

Comment 2 Joe Wright 2019-10-03 17:08:24 UTC
Created attachment 1622368
reproducer program

Comment 8 Adam Jackson 2020-02-17 16:38:23 UTC
Is this reproducible with the "modesetting" driver? If you uninstall xorg-x11-drv-intel
(which may also remove xorg-x11-drivers, which is harmless) and restart X you should
fall back to the generic modesetting driver.

The modesetting driver does have some limitations relative to the intel driver, so this
may not be an acceptable workaround, but it would help to isolate where the guilty code
change is to compare the two.
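
One way to confirm which driver X actually picked after such a change is to look at the screen-0 message prefixes in the Xorg log. This is only a sketch: it assumes the log is at /var/log/Xorg.0.log (it may live elsewhere, for example under ~/.local/share/xorg/ for a rootless server) and that the intel driver tags its lines with "intel(0):" while modesetting uses "modeset(0):". The name active_driver is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Report which DDX driver produced screen-0 messages in an Xorg log.
# Prints "intel", "modesetting", or "unknown".
active_driver() {
    if grep -q ' intel(0): ' "$1"; then
        echo intel
    elif grep -q ' modeset(0): ' "$1"; then
        echo modesetting
    else
        echo unknown
    fi
}

# Usage (log path is an assumption):
#   active_driver /var/log/Xorg.0.log
```

Running it before and after removing xorg-x11-drv-intel should show whether the server really switched drivers between the two timing runs.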

Comment 9 Benjamin Winter 2020-02-18 02:14:31 UTC
It is not reproducible if I use "nomodeset" on the grub line.  However, the limitations we get with that make it an unacceptable workaround.  I have not tried uninstalling the driver as you suggest.  Will try that next.

Comment 10 Adam Jackson 2020-03-12 19:15:15 UTC
(In reply to Benjamin Winter from comment #9)
> It is not reproducible if I use "nomodeset" on the grub line.  However, the
> limitations we get with that make it an unacceptable workaround.  I have not
> tried uninstalling the driver as you suggest.  Will try that next.

Ping, any update on this?

Comment 12 Benjamin Winter 2020-03-23 21:34:24 UTC
It appears to be reproducible when I uninstall the xorg-x11-drv-intel driver.  Similar long execution time.

Another data point - I used an HP Z420 system with an NVidia Quadro K600 graphics card using the xorg-x11-drv-nouveau driver.
RHEL 7.6 - 6m22.365s
nomodeset - 1m40.183s
yum erase xorg-x11-drv-nouveau - 6m15.549s
nomodeset - 1m40.133s

Comment 14 Adam Jackson 2020-10-07 14:48:42 UTC
So... I have questions.

The two sosreports attached to this bug are actually comparing two entirely different drivers, not just two builds of the intel driver. In both cases Xorg.0.log says that the server version is 1.17.2-22.el7, which corresponds to 7.3. The "new" configuration fails to load the intel driver at all:

[    14.689] (II) Loading /usr/lib64/xorg/modules/drivers/intel_drv.so
[    14.696] (II) Module intel: vendor="X.Org Foundation"
[    14.696] 	compiled for 1.20.0, module version = 2.99.917
[    14.697] 	Module class: X.Org Video Driver
[    14.697] 	ABI class: X.Org Video Driver, version 24.0
[    14.697] (EE) module ABI major version (24) doesn't match the server's version (19)
[    14.697] (II) UnloadModule: "intel"
[    14.697] (II) Unloading intel
[    14.697] (EE) Failed to load module "intel" (module requirement mismatch, 0)

And from that point it proceeds with the generic "modesetting" driver. Now, I don't quite understand how you got to that point, because the driver and server packages have rpm-level version interlocks that should have prevented you from installing a driver that's incompatible with the installed server. I suspect what happened was:

- Test with 7.6, notice performance was bad
- Backrev to 7.3, find performance acceptable
- Copy the intel_drv.so from the 7.6 package into an otherwise 7.3 system
- 7.3's xserver refuses to load a 7.6 driver, falls back to modesetting
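
The load failure shown in the log excerpt above can be pulled out of an Xorg log with a couple of sed filters. This is a rough sketch that only matches the two (EE) message formats quoted above; the function names are hypothetical and the default log path is an assumption.

```shell
#!/bin/sh
# Print the names of modules that the X server failed to load.
failed_modules() {
    sed -n 's/.*(EE) Failed to load module "\([^"]*\)".*/\1/p' "$1"
}

# Print the ABI versions from a mismatch line as "module=N server=M".
abi_mismatch() {
    sed -n "s/.*(EE) module ABI major version (\([0-9]*\)) doesn't match the server's version (\([0-9]*\)).*/module=\1 server=\2/p" "$1"
}

# Default log location is an assumption; pass another path as $1 if needed.
LOG="${1:-/var/log/Xorg.0.log}"
if [ -r "$LOG" ]; then
    failed_modules "$LOG"
    abi_mismatch "$LOG"
fi
```

On the "new" sosreport log this would be expected to report the intel module and the 24 versus 19 ABI versions, which is the sign of a driver built for a newer server than the one installed.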

Now, in 7.4, we changed the default driver for _most_ Intel graphics chips to be modesetting instead of intel, as noted in the 7.4 release notes:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.4_release_notes/new_features_desktop

We did this for a number of reasons but primarily because Intel have largely failed to make releases for their driver, which means we've been on our own for testing and validation of arbitrary git snapshots. (You can see this reflected in the xorg-x11-drv-intel package version number. 2.99.917 was the last nominal release, in December 2014, and everything has been random snapshots since.) Since the modesetting driver uses roughly identical OpenGL-based acceleration paths for _all_ supported GPUs - be they Intel or AMD or NVIDIA or whatever - it made sense to have just one set of bugs. The problem then is that the intel driver is finely tuned for Intel's hardware, so performance suffers. Suffering by a factor of 7x like you're seeing here is admittedly quite a lot worse than expected.

The good news is that this change is only a default setting, it can be overridden by explicitly asking for the intel driver in /etc/X11/xorg.conf. The following snippet should be all you need to revert to using the intel driver on 7.4 and later:

Section "Device"
    Identifier "Integrated GPU"
    Driver "intel"
EndSection

Please give that a try and see if it restores performance to an acceptable level.

Comment 15 Benjamin Winter 2020-10-07 18:53:56 UTC
Yes, this works and gives me the proper performance again.  Thanks!

This is what I did:
1) Install a fresh copy of RHEL 7.6
2) Install fltk since the reproducer program uses libfltk
3) Run the reproducer program - it took 12 minutes to run
4) Create an /etc/X11/xorg.conf.d/test.conf file with the device section specifying the intel driver. Reboot
5) Run the reproducer program - it took 1 min 40 secs to run
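
Step 4 can be scripted roughly as follows. This is a sketch, not a supported procedure: the function name write_intel_conf is made up here, and the file name test.conf is simply the one used in the steps above. The Device section is the one given in comment 14.

```shell
#!/bin/sh
# Write the Device section from comment 14 into DIR/test.conf.
write_intel_conf() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/test.conf" <<'EOF'
Section "Device"
    Identifier "Integrated GPU"
    Driver "intel"
EndSection
EOF
}

# Usage (as root), followed by a reboot or an X restart:
#   write_intel_conf /etc/X11/xorg.conf.d
```

Taking the directory as an argument makes it possible to try the function against a scratch directory before touching /etc/X11.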

So we clearly need to continue using the intel driver for performance.  The slowness of the modesetting driver is unacceptable.  What is the support path for the intel driver in the future?
How do I make a similar change for RHEL 8 since RHEL 8 uses Wayland instead of Xorg?

Comment 16 Adam Jackson 2020-10-14 19:49:05 UTC
> So we clearly need to continue using the intel driver for performance.
> The slowness of the modesetting driver is unacceptable.  What is the
> support path for the intel driver in the future?

The intel driver is supported, but not the default, for both RHEL 7 and 8. I suspect that part of the performance problem here is that Intel GPUs underwent a major architecture shift between the Cantiga you're using here, where the memory controller and GPU live on the northbridge, and Sandybridge, where they live on the same die as the CPU. OpenGL performance got _way_ better between those two points as a result, and the acceleration path used in both modesetting and the Xwayland server is OpenGL-based.

> How do I make a similar change for RHEL 8 since RHEL 8 uses Wayland instead of Xorg?

The wayland session is the default in RHEL8, but the Xorg session is still available. So the change would be to use the same xorg.conf file as in RHEL7, and also install gnome-session-xsession and select it from the gdm login screen (little gear icon, either next to the login button or in the bottom right corner, I think). I'm sure there's a gsettings(1) invocation you can make to apply that choice automatically at install time.
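
A different route from picking the session at the login screen is to turn Wayland off in gdm's own configuration. The sketch below uncomments the WaylandEnable=false line that /etc/gdm/custom.conf normally ships commented out. It is not the gsettings approach mentioned above, the function name force_xorg_session is hypothetical, and it assumes the commented line is present in the file.

```shell
#!/bin/sh
# Uncomment "WaylandEnable=false" in a gdm custom.conf-style file.
# The file is rewritten in place so its ownership and mode are kept.
force_xorg_session() {
    conf="$1"
    scratch="$(mktemp)" || return 1
    sed 's/^#[[:space:]]*WaylandEnable=false/WaylandEnable=false/' "$conf" > "$scratch" &&
        cat "$scratch" > "$conf"
    status=$?
    rm -f "$scratch"
    return "$status"
}

# Usage (as root), then restart gdm or reboot:
#   force_xorg_session /etc/gdm/custom.conf
```

With Wayland disabled this way, gdm should fall back to Xorg for the login screen and the sessions it starts, so the xorg.conf Device section would then apply.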

Now, Cantiga isn't technically supported in RHEL8 (the minimum was Haswell IIRC), so the question is somewhat academic: any machine you'd be running RHEL8 on would post-date the above architecture changes. But it's entirely likely that you'd encounter similar but maybe lesser performance problems even on a supported GPU.

I'll try to get the reproducer program running on a newer machine using the GL-based acceleration code to see which path through the X server it's taking; there are probably some optimizations we can make to speed things up.

Comment 18 Chris Williams 2020-11-11 21:50:42 UTC
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage, the remaining Bugzillas do not appear to meet the inclusion criteria for Maintenance Phase 2 and will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

