1254699 – Panic in i915 driver from 2.6.32-573.1.1.el6.x86_64 with nomodeset


Summary: Panic in i915 driver from 2.6.32-573.1.1.el6.x86_64 with nomodeset

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.7
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Rob Clark
QA Contact:	Desktop QE
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1255892 (view as bug list)
Depends On:
Blocks:

Reported:	2015-08-18 17:00 UTC by Kevin Stange
Modified:	2017-12-06 12:56 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1255892 (view as bug list)
Environment:
Last Closed:	2017-12-06 12:56:26 UTC
Target Upstream Version:
Embargoed:

Attachments
Photo of panic message (2.47 MB, image/jpeg) 2015-08-18 17:00 UTC, Kevin Stange

Links
System	ID	Private	Priority	Status	Summary	Last Updated
CentOS	9209	0	None	None	None	Never

Description Kevin Stange 2015-08-18 17:00:59 UTC

Created attachment 1064379
Photo of panic message

Description of problem:

Since the CentOS 6.7 release kernel 2.6.32-573.1.1.el6.x86_64, we are seeing a panic on some systems with the following hardware:

00:02.0 VGA compatible controller: Intel Corporation 82945G/GZ Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Super Micro Computer Inc Device 0602
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at fe880000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at bc00 [size=8]
        Memory at e0000000 (32-bit, prefetchable) [size=256M]
        Memory at fe840000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Kernel modules: i915

This panic occurs in the i915 driver if the nomodeset option is specified.  Removing the option (or blacklisting the driver) allows the system to boot.  I haven't been able to get kdump configured on this hardware due to a strange storage configuration, but I've attached a photo of the panic message.

The problem is unchanged in 2.6.32-573.3.1.el6.x86_64.  The bug was already reported to the CentOS tracker; based on the problem description, it seems likely to originate upstream.

More copies of the panic messages are found on the CentOS bug.

Version-Release number of selected component (if applicable):

2.6.32-573.1.1.el6.x86_64 and 2.6.32-573.3.1.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Install kernel 2.6.32-573.(1|3).1.el6.x86_64
2. Reboot with nomodeset on the kernel command line
3. Panic

Comment 2 Kevin Stange 2015-08-18 17:52:32 UTC

I didn't mean to put this under Desktop -> Graphics.  This is a server which is mostly run headless, so it doesn't belong to the X/OpenGL team.  Please reassign accordingly.

Comment 3 Kevin Stange 2015-08-21 19:46:58 UTC

I couldn't get this bug to assign to something else, so I just created a new copy as bug 1255892 as an "other" kernel bug.  Please close this one as a duplicate.

Comment 4 Rob Clark 2016-01-15 15:43:50 UTC

I think you may want i915.modeset=0 ?

Comment 5 Kevin Stange 2016-01-15 15:51:16 UTC

We are already aware of workarounds that avoid the problem.  Additional ones are not really helpful at this point.

This is a kernel panic with the default kernel options.  I feel this should not require adding custom boot parameters of any kind, especially since it's a regression from the previous 6.6 kernel where no panic occurred at all.

#1255892 and #1254699 are duplicates due to my misunderstanding of the categories.  Please close one as a duplicate so everything can be discussed in a single bug.  Right now there is conversation going on in both.

Comment 6 Kevin Stange 2016-01-15 15:54:31 UTC

Also, I'm copying this comment (3) from the duplicate bug:

> The CentOS bug was just updated with the following additional information, including a patch:
>
>> Investigated and came up with a patch:
>> It seems that 
>> drivers/gpu/drm/i915/intel_ringbuffer.c:intel_cleanup_ring_buffer()
>> doesn't check NULL pointer and thus panics when "nomodeset" kernel
>> option is set.
>> 
>> Patch attached:
>> https://bugs.centos.org/file_download.php?file_id=7744&type=bug
>> 
>> At least this patch will make "nomodeset" boot properly. Xserver works.
>> I'm not sure whether the intel_cleanup_ring_buffer() itself should or shouldn't 
>> be called when "nomodeset" is in effect.
>
> Hopefully this helps with resolving the issue.

Comment 7 Rob Clark 2016-01-19 14:47:23 UTC

*** Bug 1255892 has been marked as a duplicate of this bug. ***

Comment 8 Rob Clark 2016-01-19 15:24:24 UTC

(In reply to Kevin Stange from comment #5)
> We are already aware of workarounds that avoid the problem.  Additional ones
> are not really helpful at this point.
> 
> This is a kernel panic with the default kernel options.  I feel this should
> not require adding custom boot parameters of any kind, especially since it's
> a regression from the previous 6.6 kernel where no panic occurred at all.

Technically, "nomodeset" is a custom boot parameter as well.

That said, "nomodeset" works fine for me with the rhel67 kernel (2.6.32-573[1]).  And if nomodeset was specified (and not overridden by i915.modeset), the driver would bail out in its module_init function and you'd never get into intel_cleanup_ring_buffer() in the first place (or even have a drm device file to close).  So unless your kernel is something different from what I expect, I really have no idea how you manage to get it to crash like that.

[1] note, I have 2.6.32-573.el6, not 2.6.32-573.1.1.el6.. not sure if there are some centos customizations which may be to blame?

> #1255892 and #1254699 are duplicates due to my misunderstanding of the
> categories.  Please close one as a duplicate so everything can be discussed
> in a single bug.  Right now there is conversation going on in both.

I've closed the dup.

I'm tempted to close this bug as well, but if you can provide the kernel cmdline which you are using, and any other customizations, I can try to reproduce again.

Also, why are you using "nomodeset" in the first place?  If there is an actual issue loading i915, then I can look into that.

Comment 9 Kevin Stange 2016-01-19 16:03:09 UTC

According to a CentOS developer, "nomodeset" was a common workaround for a problem that used to occur in older EL6 kernels, so it's very commonly found on kernel cmdlines.  It's apparently a workaround employed so long ago that I don't even remember when we started using it.  It's just always been there as far as I remember.

It only *crashes* with i915 hardware from what I can tell, and only since the 6.7 kernel releases.

CentOS makes no changes to the kernel code or build options.  The CentOS Plus kernels fill that role.  That means the crash exists in the RHEL kernel implicitly, when using this hardware configuration and nomodeset.

If nomodeset is non-default, then I accept removing that option as a valid solution to the problem, though I think a newly introduced kernel panic bug is worth fixing, given this is a regression in behavior.

Clearly, when nomodeset is specified, the i915 driver still inits and finds its way to intel_cleanup_ring_buffer().  If that's not supposed to happen (and the documentation for the nomodeset option says it shouldn't), some kind of bug is at play.

Comment 10 Rob Clark 2016-01-19 16:55:34 UTC

(In reply to Kevin Stange from comment #9)
> According to a CentOS developer, "nomodeset" was a common workaround to a
> problem that used to occur in older EL6 kernels so it's very commonly on
> kernel cmdlines.   It's apparently a workaround employed so long ago that I
> don't even remember that we started using it.  It's just always been there
> as far as I remember.

well, fwiw, I'm not even sure if RHEL QA tests with 'nomodeset'.. if there are still issues which need workaround w/ 'nomodeset' then we would like to know..

> It only *crashes* with i915 hardware from what I can tell, and only since
> the 6.7 kernel releases.

I guess you mean "certain i915 hardware"?  I do have a theory (see below) that this is an issue affecting older i915 hardware.

> CentOS makes no changes to the kernel code or build options.  The CentOS
> Plus kernels fill that role.  That means the crash exists in the RHEL kernel
> implicitly, when using this hardware configuration and nomodeset.
> 
> If nomodeset is non-default, then I accept removing that option as a valid
> solution to the problem, though I think a newly introduced kernel panic bug
> is worth fixing, given this is a regression in behavior.
> 
> Clearly, when nomodeset is specified, the i915 driver still inits and finds
> its way to intel_cleanup_ring_buffer().  If that's not supposed to happen
> (and the documentation for the nomodeset option says it shouldn't), some kind
> of bug is at play.

So I realized that rhel67 does still have UMS enabled, which is removed in later kernels upstream (so the code and config option will be removed in rhel68).  I was looking at upstream and rhel68 earlier, which do not call drm_pci_init() in nomodeset case (ie, vgacon_text_force() returns true).

Newer hw requires KMS, and so even for rhel67 kernel, probe would fail in nomodeset case, but at a later point in the driver probe.  However, for hw which does support user mode setting (ie. hw older than what I have tested with), the driver will still probe w/ 'nomodeset'.  I suspect that is how you end up in this state (and why I cannot reproduce).

The easy thing to do is disable CONFIG_DRM_I915_UMS in the kernel configuration.  Userspace has used KMS since (I think) rhel6.0, so there should be no need to keep UMS enabled.
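
The probe behaviour described in this comment can be summarised as a small decision model. This is only a sketch of the reasoning above, not code from the i915 driver: the enum, the function name, and the three boolean inputs are hypothetical, and the model ignores the i915.modeset override mentioned earlier in the thread.

```c
#include <stdbool.h>

/* Hypothetical outcomes for this model; these are not names from the i915 source. */
enum probe_outcome {
    PROBE_KMS,           /* no nomodeset: normal kernel mode setting path */
    PROBE_UMS,           /* driver still binds, for user mode setting */
    PROBE_FAILS_LATE,    /* driver registers, but probe rejects KMS-only hardware */
    PROBE_NOT_ATTEMPTED  /* driver never registers, so nothing can be probed */
};

/*
 * Model of the behaviour described in the comment:
 * - without UMS built in (upstream, rhel68), nomodeset means the driver
 *   does not register at all;
 * - with UMS built in (rhel67), hardware that needs KMS fails later in
 *   probe, while older hardware that supports UMS still probes.
 */
static enum probe_outcome model_i915_probe(bool nomodeset,
                                           bool ums_built_in,
                                           bool hw_supports_ums)
{
    if (!nomodeset)
        return PROBE_KMS;
    if (!ums_built_in)
        return PROBE_NOT_ATTEMPTED;
    if (!hw_supports_ums)
        return PROBE_FAILS_LATE;
    return PROBE_UMS;
}
```

In this model the reported panic corresponds to the last branch (nomodeset, UMS built in, older hardware), which would also explain why newer test hardware does not reproduce it.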

Comment 11 Kevin Stange 2016-01-19 20:50:25 UTC

Yes, I think it's probably limited to only certain i915, but I only have one system board with i915 that I've encountered recently.  It's older, in that the hardware is probably from 2011 or so, something Supermicro still sells.

Do you happen to have a test kernel around that I could install on one of these machines from rhel68 to confirm if the removal of UMS resolves the crash?

Comment 12 Rob Clark 2016-01-19 21:32:39 UTC

(In reply to Kevin Stange from comment #11)
>
> Do you happen to have a test kernel around that I could install on one of
> these machines from rhel68 to confirm if the removal of UMS resolves the
> crash?

tbh, I'm not entirely sure how to share a kernel outside of RH.  But that said, it would be safer right now to stick w/ 6.7 kernel (which has actually been QA'd, etc) and simply change the configuration to disable CONFIG_DRM_I915_UMS.
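
For anyone checking whether a given kernel build has the option disabled, the following is a minimal sketch that classifies an option in kernel config text. It relies only on the standard config file syntax (`CONFIG_FOO=y` for enabled, `# CONFIG_FOO is not set` for disabled); the enum and function names are hypothetical and chosen for illustration.

```c
#include <string.h>

/* Hypothetical result type for this sketch. */
enum option_state { OPT_ENABLED, OPT_DISABLED, OPT_ABSENT };

/*
 * Scan kernel config text line by line and report how `option`
 * (for example "CONFIG_DRM_I915_UMS") appears in it.
 */
static enum option_state config_option_state(const char *config_text,
                                             const char *option)
{
    const char *line = config_text;
    size_t olen = strlen(option);

    while (*line != '\0') {
        size_t llen = strcspn(line, "\n");

        /* "CONFIG_FOO=..." means the option is set. */
        if (strncmp(line, option, olen) == 0 && line[olen] == '=')
            return OPT_ENABLED;

        /* "# CONFIG_FOO is not set" means it is explicitly disabled. */
        if (strncmp(line, "# ", 2) == 0 &&
            strncmp(line + 2, option, olen) == 0 &&
            strncmp(line + 2 + olen, " is not set", 11) == 0)
            return OPT_DISABLED;

        /* Advance to the start of the next line. */
        line += llen;
        if (*line == '\n')
            line++;
    }
    return OPT_ABSENT;
}
```

On RHEL and CentOS the config of an installed kernel is typically found under /boot as config-<version>, so that file's contents would be the input. A kernel where the option has been removed from the source entirely would report OPT_ABSENT rather than OPT_DISABLED.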

Comment 13 Akemi Yagi 2016-01-20 07:43:04 UTC

A test kernel that has CONFIG_DRM_I915_UMS disabled is available from:

http://people.centos.org/toracat/kernel/6/distro/bug9209/

(Please note that the packages are not signed and are provided for testing purposes only.)

Comment 14 BugMasta 2016-02-01 07:43:52 UTC

We've just been hit by this bug. Luckily I was in the office when i updated to the new kernel, so i was able to power cycle the machine (no ILO) and fix the problem.

The problem is described here:
http://vega.sra-tohoku.co.jp/~kabe/vsd/c6-i586/ts.html

And the chap who describes the problem there has a kernel patch to fix the bug, here:

http://vega.sra-tohoku.co.jp/~kabe/vsd/c6-i586/repo/patch/patch-i915-nomodeset-panic.patch

I am pasting the patch in here, for posterity:

"On boot, i915.ko panics when "nomodeset" or "i915.modeset=-1" kernel option is set.
Workaround: add bogus "i915.crash=0" option to not load i915.ko .
Patch below: properly check the NULL pointer dereference.

diff -up linux-2.6.32-573.3.1.el6.emu686.v19.i586/drivers/gpu/drm/i915/intel_ringbuffer.c.v19+ linux-2.6.32-573.3.1.el6.emu686.v19.i586/drivers/gpu/drm/i915/intel_ringbuffer.c
--- linux-2.6.32-573.3.1.el6.emu686.v19.i586/drivers/gpu/drm/i915/intel_ringbuffer.c.v19+	2015-08-10 22:16:43.000000000 +0900
+++ linux-2.6.32-573.3.1.el6.emu686.v19.i586/drivers/gpu/drm/i915/intel_ringbuffer.c	2015-09-18 13:25:30.000000000 +0900
@@ -1821,12 +1821,18 @@ error:
 
 void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 {
-	struct drm_i915_private *dev_priv = to_i915(ring->dev);
-	struct intel_ringbuffer *ringbuf = ring->buffer;
+	struct drm_i915_private *dev_priv;
+	struct intel_ringbuffer *ringbuf;
 
+	/* do not dereference NULL! */
+	if (ring == NULL)
+		return;
 	if (!intel_ring_initialized(ring))
 		return;
 
+	dev_priv = to_i915(ring->dev);	/*ring->dev->private_*/
+	ringbuf = ring->buffer;
+
 	intel_stop_ring_buffer(ring);
 	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
"


Looks to me like this bug should be fixed properly, by cleaning up the NULL pointer dereference - ASAP. It should not be masked by just disabling CONFIG_DRM_I915_UMS as proposed above.
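
The shape of the quoted patch (check first, dereference afterwards) can be shown in isolation with mock types. This is a userspace sketch, not the kernel code: the `mock_*` structs and functions are hypothetical stand-ins, and it assumes that an uninitialized ring is one whose `dev` pointer is NULL, which the thread itself does not state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct mock_private { int cleanup_count; };
struct mock_device  { struct mock_private *priv; };
struct mock_ring    { struct mock_device *dev; };

/* Assumption of this sketch: a ring that was never set up has dev == NULL. */
static bool mock_ring_initialized(const struct mock_ring *ring)
{
    return ring->dev != NULL;
}

/* Returns true if cleanup actually ran, false if it was skipped. */
static bool mock_cleanup_ring(struct mock_ring *ring)
{
    struct mock_private *priv;

    /* Guard before any dereference, as the patch does. */
    if (ring == NULL)
        return false;
    if (!mock_ring_initialized(ring))
        return false;

    /* Only now is it safe to follow ring->dev. */
    priv = ring->dev->priv;
    priv->cleanup_count++;
    ring->dev = NULL;   /* mark the ring as torn down */
    return true;
}
```

The unpatched function corresponds to computing `ring->dev->priv` in the declaration, before either guard has run, which is where a ring that was never set up would fault.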

Comment 15 Rob Clark 2016-02-01 15:20:46 UTC

(In reply to BugMasta from comment #14)
> We've just been hit by this bug. Luckily I was in the office when i updated
> to the new kernel, so i was able to power cycle the machine (no ILO) and fix
> the problem.
> 
> The problem is described here:
> http://vega.sra-tohoku.co.jp/~kabe/vsd/c6-i586/ts.html
> 
> And the chap who describes the problem there has a kernel patch to fix the
> bug, here:
> 
> http://vega.sra-tohoku.co.jp/~kabe/vsd/c6-i586/repo/patch/patch-i915-nomodeset-panic.patch
> 
[snip]

I am still waiting for someone to confirm that the test kernel mentioned in #c13 solves the issue.

> Looks to me like this bug should be fixed properly, by cleaning up the NULL
> pointer dereference - ASAP. It should not be masked by just disabling
> CONFIG_DRM_I915_UMS as proposed above.

Yes, I have seen the patch that adds a NULL check.  Although I prefer the config change since that guarantees that the driver won't load and you won't hit other potential unexpected null-ptr codepaths.

Note that the UMS support has been completely removed now upstream (and before the option was removed, I suspect it was not well tested).  And RHEL6 never used UMS for i915.

Hence my preference for the config change.

Comment 16 Akemi Yagi 2016-02-01 16:32:23 UTC

The current CentOSPlus kernel set released by CentOS now has CONFIG_DRM_I915_UMS disabled ( kernel-plus-3.10.0-327.4.5.el7.centos.plus.x86_64 ). It is at:

http://mirror.centos.org/centos/7/centosplus/x86_64/Packages/

Comment 17 Akemi Yagi 2016-02-01 16:39:57 UTC

(In reply to Akemi Yagi from comment #16)
> The current CentOSPlus kernel set released by CentOS now has
> CONFIG_DRM_I915_UMS disabled (
> kernel-plus-3.10.0-327.4.5.el7.centos.plus.x86_64 ). It is at:
> 
> http://mirror.centos.org/centos/7/centosplus/x86_64/Packages/

Sorry, scratch that. It is planned for el6 (but has not happened).

By the way, the DRM_I915_UMS option is marked "deprecated" in Kconfig.

Comment 18 BugMasta 2016-02-01 23:43:43 UTC

(In reply to Rob Clark from comment #15)
> Yes, I have seen the patch that adds a NULL check.  Although I prefer the
> config change since that guarantees that the driver won't load and you won't
> hit other potential unexpected null-ptr codepaths.
> 
> Note that the UMS support has been completely removed now upstream (and
> before the option was removed, I suspect it was not well tested).  And RHEL6
> never used UMS for i915.
> 
> Hence my preference for the config change.

So we're going to leave a known bug in the kernel, hidden by a disabled config option - so anyone optimistic enough to enable that config option gets screwed by this bug for years to come. Cool.

Or maybe we can just debate the various options here for another 20 years, so as many people as possible have the pleasure of making their acquaintance with this delightful bug.

Great work. Carry on.

Comment 19 Rob Clark 2016-02-02 01:29:07 UTC

(In reply to BugMasta from comment #18)
> (In reply to Rob Clark from comment #15)
> > Yes, I have seen the patch that adds a NULL check.  Although I prefer the
> > config change since that guarantees that the driver won't load and you won't
> > hit other potential unexpected null-ptr codepaths.
> > 
> > Note that the UMS support has been completely removed now upstream (and
> > before the option was removed, I suspect it was not well tested).  And RHEL6
> > never used UMS for i915.
> > 
> > Hence my preference for the config change.
> 
> So we're going to leave a known bug in the kernel, hidden by a disabled
> config option - so anyone optimistic enough to enable that config option
> gets screwed by this bug for years to come. Cool.

Like I've said already, the config option and corresponding codepaths are already gone upstream.  There is nothing remaining to fix.

Comment 20 BugMasta 2016-02-02 01:41:13 UTC

Nothing left to fix eh.
So how did i hit this bug when i did a yum update yesterday?

Comment 21 Rob Clark 2016-02-02 01:49:27 UTC

(In reply to BugMasta from comment #20)
> Nothing left to fix eh.
> So how did i hit this bug when i did a yum update yesterday?

I think your answer is in #c17

Comment 22 Kevin Stange 2016-02-02 02:00:07 UTC

He's saying the code that causes the panic is removed from the EL 6.8 kernel tree for Red Hat and from all future versions of the kernel from kernel.org, so it will not be able to resurface later.

Thus the claim is, fixing this by releasing a patch on top of code that goes away in the next EL release would be a waste of time.  There's enough bureaucracy in releasing a kernel change in RHEL that it's probably not a good battle to fight for something that has an easy workaround.

The obvious workaround as we've determined is to stop using nomodeset or to disable the code for now at build time until it goes away for good.

You might check if you still need nomodeset (and/or what caused it to be added to your kernel cmdline).  In my case it turned out not to be relevant to my config; it was a legacy workaround we didn't even need anymore.  Now that I know it is a non-standard, non-default option, I am satisfied with just having that information.

Obviously, if you can reproduce the crash with nomodeset removed, or with the UMS feature disabled, then you still have a bug that needs review, but it's likely a different one than this.
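
As a rough aid for the check suggested above, here is a minimal sketch that tests whether a kernel command line contains a given parameter as a whole whitespace-separated token. The function name is hypothetical; on a running system the input string would typically be the contents of /proc/cmdline.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `param` appears in `cmdline` as a complete
 * whitespace-separated token (so "nomodesetx" does not match "nomodeset").
 */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return false;

    while (*p != '\0') {
        size_t tlen;

        p += strspn(p, " \t\n");      /* skip separators */
        tlen = strcspn(p, " \t\n");   /* length of the next token */
        if (tlen == plen && strncmp(p, param, plen) == 0)
            return true;
        p += tlen;
    }
    return false;
}
```

Calling it with "nomodeset" and again with "i915.modeset=0" covers the two options discussed in this bug.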

Comment 23 Rob Clark 2016-02-02 02:06:28 UTC

(In reply to BugMasta from comment #20)
> Nothing left to fix eh.
> So how did i hit this bug when i did a yum update yesterday?

btw, looking over my earlier reply, I realize I did not really explain the reasons for my preference for the config change vs the other patch.  The short version is that the UMS=y codepaths have really not been tested at all... the option was deprecated upstream for the version of drivers/gpu/drm in rhel67, and was later removed completely (which will be in rhel68 kernel, fwiw).  I'm not really sure fixing a NULL ptr issue in this one particular case will be sufficient.  I really prefer if people using a rhel67 based kernel disable UMS support completely to avoid other issues cropping up later.  Since we removed i915 UMS support from rhel6 userspace before rhel 6.0 (afaict), I think it is safest to have UMS support in the kernel disabled.

Comment 24 BugMasta 2016-02-02 02:13:02 UTC

Ah ok that sounds good. Cheers.

I was able to remove the nomodeset option and boot OK in this instance. I didn't install the OS on this machine, so I'm not sure whether nomodeset was added manually to the kernel command line to fix an old kernel that wouldn't boot without it, or if it was added by anaconda during install as a generic precaution to avoid problems on this chipset. It's not needed anymore with newer kernels on this machine though, obviously.

But yeah, nomodeset is a very, very common boot option, so yeah, it's good this dirty code is out of the rhel kernel, and hopefully 6.8 with fixed kernel will be released before it bites other people.

Comment 25 Rob Clark 2016-02-02 02:32:54 UTC

In the meantime, I'd strongly recommend disabling the kernel option in centos kernel.  That is what we *should* have done in rhel67 kernel, in retrospect.

Comment 26 Akemi Yagi 2016-02-02 20:52:30 UTC

(In reply to Rob Clark from comment #25)
> In the meantime, I'd strongly recommend disabling the kernel option in
> centos kernel.  That is what we *should* have done in rhel67 kernel, in
> retrospect.

CentOS distro kernel cannot be patched because it is supposed to be a feature-for-feature and bug-for-bug rebuild of the RHEL kernel.

Instead, CentOS offers a custom kernel, centosplus kernel, that can accommodate patches and modified features. This is what is quoted in comment #17. The upcoming release (that is, the next kernel update) will have the option disabled.

Comment 27 Akemi Yagi 2016-02-21 04:17:27 UTC

As promised, the latest centosplus kernel now has CONFIG_DRM_I915_UMS disabled.

kernel-2.6.32-573.18.1.el6.centos.plus

Please give it a shot if you have affected hardware.

Comment 28 Akemi Yagi 2016-11-24 19:37:57 UTC

In RHEL 6.8, the DRM_I915_UMS option does not exist in the config. So, I presume the issue has been "resolved" for 6.8 users. Can someone confirm this?

Comment 29 Rob Clark 2016-11-28 15:29:47 UTC

(In reply to Akemi Yagi from comment #28)
> In RHEL 6.8, the DRM_I915_UMS option does not exist in the config. So, I
> presume the issue has been "resolved" for 6.8 users. Can someone confirm
> this?

yes, it should be

Comment 30 Jan Kurik 2017-12-06 12:56:26 UTC

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/
