Bug 474255

Summary: Oops on NULL dereference in i915_kernel_lost_context and X hard lockup
Product: [Fedora] Fedora
Reporter: Eric Rannaud <eric.rannaud>
Component: kernel
Assignee: Dave Airlie <airlied>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 10
CC: eric.rannaud, fdc, james, kernel-maint, me, nhruby, sullivanshea
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-12-18 07:06:37 UTC
Attachments:
dmesg-eric (flags: none)
/var/log/Xorg.0.log.old-Eric (a session that crashed) (flags: none)
/var/log/Xorg.0.log-eric DRI=False (flags: none)

Description Eric Rannaud 2008-12-02 23:38:20 UTC
Description of problem:

Oops on NULL dereference in i915_kernel_lost_context, resulting in a graphics lockup. Sending SIGKILL (kill -9) to X over SSH does not help: the display remains frozen after Xorg terminates. A reboot was necessary.


Version-Release number of selected component (if applicable):

Linux nc050 2.6.27.5-117.fc10.x86_64 #1 SMP Tue Nov 18 11:58:53 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

X.Org X Server 1.5.3
Release Date: 5 November 2008
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.18-92.1.10.el5 x86_64
Current Operating System: Linux nc050 2.6.27.5-117.fc10.x86_64 #1 SMP Tue Nov 18 11:58:53 EST 2008 x86_64
Build Date: 16 November 2008  08:28:40PM
Build ID: xorg-x11-server 1.5.3-5.fc10


How reproducible / Steps to Reproduce:
Unclear. I was doing nothing special, just moving a gvim window in GNOME.


Actual results: n/a
Expected results: n/a
Additional info:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffffa0369101>] i915_kernel_lost_context+0x19/0x77 [i915]
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: rfcomm btusb oprofile vfat fat usb_storage fuse i915 drm bridge stp bnep sco l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath uinput snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq arc4 ecb snd_pcm_oss snd_usb_audio iwl3945 snd_mixer_oss snd_usb_lib snd_pcm b44 snd_rawmidi uvcvideo snd_timer ssb rfkill snd_seq_device snd_page_alloc compat_ioctl32 snd_hwdep iTCO_wdt i2c_i801 dcdbas videodev mac80211 mii snd iTCO_vendor_support i2c_core v4l1_compat joydev video output soundcore wmi cfg80211 battery ac ata_generic pata_acpi sha256_generic cbc aes_x86_64 aes_generic dm_crypt crypto_blkcipher [last unloaded: pcspkr]
Pid: 5676, comm: Xorg Not tainted 2.6.27.5-117.fc10.x86_64 #1
RIP: 0010:[<ffffffffa0369101>]  [<ffffffffa0369101>] i915_kernel_lost_context+0x19/0x77 [i915]
RSP: 0018:ffff8800b3c03a58  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800b62d7000 RCX: ffff8800b62d7000
RDX: 0000000000000000 RSI: ffff8800b3c03a58 RDI: ffff8800b62d6000
RBP: ffff8800b3c03a58 R08: 0000000000000000 R09: ffffffff8156be90
R10: ffffffffa0369788 R11: ffff8800b3c03b18 R12: ffff8800b62d7000
R13: ffff880092495540 R14: ffff8800bb88dfc0 R15: ffff8800b62d6000
FS:  00007fc2f1957780(0000) GS:ffffffff8155d100(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process Xorg (pid: 5676, threadinfo ffff8800b3c02000, task ffff8800af05c530)
Stack:  ffff8800b3c03ab8 ffffffffa036f642 ffff8800b3c03aa8 0000000000000202
 ffff8800b62d6010 ffff8800a9c8aa10 ffff8800b3c03ad8 ffff8800b62d7000
 ffff8800b62d6000 ffff880092495540 ffff8800bb88dfc0 ffff8800b62d6128
Call Trace:
 [<ffffffffa036f642>] i915_gem_idle+0x79/0x314 [i915]
 [<ffffffffa036f8e6>] i915_gem_lastclose+0x9/0x26 [i915]
 [<ffffffffa03697be>] i915_driver_lastclose+0x1b/0x3e [i915]
 [<ffffffffa0339ed4>] drm_lastclose+0x48/0x2a7 [drm]
 [<ffffffffa033a786>] drm_release+0x454/0x475 [drm]
 [<ffffffff810c0d7f>] __fput+0xca/0x16d
 [<ffffffff810c0e37>] fput+0x15/0x17
 [<ffffffff810be2e1>] filp_close+0x67/0x72
 [<ffffffff8104328c>] put_files_struct+0x74/0xc8
 [<ffffffff81043328>] exit_files+0x48/0x51
 [<ffffffff81044c89>] do_exit+0x26a/0x8a0
 [<ffffffff81045341>] do_group_exit+0x82/0xaf
 [<ffffffff8104e70e>] get_signal_to_deliver+0x2b0/0x2dc
 [<ffffffff81010349>] ? sysret_signal+0x42/0x71
 [<ffffffff8100f42f>] do_notify_resume+0x90/0x93f
 [<ffffffff8103a5e4>] ? wake_up_process+0x10/0x12
 [<ffffffff81330dd0>] ? __mutex_unlock_slowpath+0x3a/0x40
 [<ffffffff81330b81>] ? mutex_unlock+0xe/0x10
 [<ffffffffa036e187>] ? i915_gem_throttle_ioctl+0x40/0x4a [i915]
 [<ffffffffa0339d1d>] ? drm_ioctl+0x1d6/0x25e [drm]
 [<ffffffff813321dc>] ? unlock_kernel+0x2f/0x32
 [<ffffffff810cb7a2>] ? vfs_ioctl+0x66/0x78
 [<ffffffff810cba01>] ? do_vfs_ioctl+0x24d/0x26a
 [<ffffffff810c71b3>] ? path_put+0x1d/0x21
 [<ffffffff81010349>] ? sysret_signal+0x42/0x71
 [<ffffffff81010707>] ptregscall_common+0x67/0xb0


Code: be 03 00 00 00 e8 ca e3 e0 e0 31 c0 5b 41 5c c9 c3 90 48 8b 87 08 04 00 00 55 48 8b 8f c8 03 00 00 48 89 e5 48 8b 80 68 02 00 00 <48> 8b b0 a0 00 00 00 48 8b 41 08 48 05 34 20 00 00 8b 00 25 fc
RIP  [<ffffffffa0369101>] i915_kernel_lost_context+0x19/0x77 [i915]
 RSP <ffff8800b3c03a58>
CR2: 00000000000000a0
---[ end trace ed93bb6e606c2930 ]---
Fixing recursive fault but reboot is needed!
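
Note on the faulting instruction: the register dump above is consistent with a load through a NULL pointer held in RAX, since RAX is 0 and CR2 (the faulting address) is 0xa0. As a rough check, the sketch below decodes the instruction at the byte the oops marks with <..> in the Code: line. It is a minimal decoder written for this note and handles only one x86-64 encoding (REX.W MOV r64, [base + disp32], no SIB byte). The helper names faulting_bytes and decode_mov_load are hypothetical and do not come from any library.

```python
# Register names indexed by the 3-bit ModRM fields (no REX.R/REX.B extension).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes of an oops 'Code:' line, starting at the <xx>-marked byte."""
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_mov_load(insn):
    """Decode only 'REX.W 8B /r' with mod=10 (register base + disp32).

    Returns (dest, base, disp); raises ValueError for any other encoding.
    """
    if len(insn) < 7 or insn[0] != 0x48 or insn[1] != 0x8B:
        raise ValueError("not a plain REX.W MOV r64, r/m64")
    modrm = insn[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only [base + disp32] without a SIB byte is handled")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REGS[reg], REGS[rm], disp


# Code: line copied from the oops in the description.
CODE = ("Code: be 03 00 00 00 e8 ca e3 e0 e0 31 c0 5b 41 5c c9 c3 90 48 8b 87 "
        "08 04 00 00 55 48 8b 8f c8 03 00 00 48 89 e5 48 8b 80 68 02 00 00 "
        "<48> 8b b0 a0 00 00 00 48 8b 41 08 48 05 34 20 00 00 8b 00 25 fc")

dest, base, disp = decode_mov_load(faulting_bytes(CODE))
print(f"mov {hex(disp)}(%{base}),%{dest}")
```

For the Code: line above this prints mov 0xa0(%rax),%rsi, that is, a read at offset 0xa0 from the pointer in RAX, which matches CR2 when RAX is 0. It does not say which structure field was NULL; that would need the matching i915 source or debuginfo.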

Comment 1 Chris Fleming 2009-01-14 18:45:39 UTC
I've just seen pretty much the exact same thing happen:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffffa0403101>] i915_kernel_lost_context+0x19/0x77 [i915]
PGD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: lp fuse i915 drm i2c_core rfcomm sco bridge stp bnep l2cap vmnet vmblock vmci vmmon sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath sha256_generic aes_x86_64 aes_generic cbc dm_crypt uinput pata_pcmcia btusb bluetooth arc4 ecb ppdev crypto_blkcipher snd_hda_intel snd_seq_dummy snd_seq_oss iwlagn snd_seq_midi_event snd_seq video iwlcore snd_seq_device output snd_pcm_oss snd_mixer_oss sdhci_pci sdhci ricoh_mmc snd_pcm yenta_socket rfkill joydev pcspkr snd_timer snd_page_alloc snd_hwdep snd rsrc_nonstatic firewire_ohci serio_raw mac80211 mmc_core e1000e tpm_infineon firewire_core cfg80211 wmi tpm tpm_bios soundcore crc_itu_t battery parport_pc parport ac [last unloaded: microcode]
Pid: 7945, comm: scalc.bin Not tainted 2.6.27.9-159.fc10.x86_64 #1
RIP: 0010:[<ffffffffa0403101>]  [<ffffffffa0403101>] i915_kernel_lost_context+0x19/0x77 [i915]
RSP: 0018:ffff8800058b9a68  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88012f164000 RCX: ffff88012f164000
RDX: 0000000000000000 RSI: ffff8800058b9a68 RDI: ffff880131e27000
RBP: ffff8800058b9a68 R08: 0000000000000000 R09: ffffffff8156ce98
R10: ffffffff810d003c R11: ffff8800058b9b18 R12: ffff880131e27000
R13: ffff8800bb4eea80 R14: ffff88013b1fc3e8 R15: ffff880131e27000
FS:  00007f7c94d06950(0000) GS:ffff88013fc04980(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scalc.bin (pid: 7945, threadinfo ffff8800058b8000, task ffff880103c54530)
Stack:  ffff8800058b9ab8 ffffffffa04096de ffff880131e27000 ffff880129c63dd0
 ffff880131e27020 ffff88012f164000 ffff880131e27000 ffff8800bb4eea80
 ffff88013b1fc3e8 ffff880131e27138 ffff8800058b9ac8 ffffffffa04099b8
Call Trace:
 [<ffffffffa04096de>] i915_gem_idle+0x7c/0x34d [i915]
 [<ffffffffa04099b8>] i915_gem_lastclose+0x9/0x26 [i915]
 [<ffffffffa04037be>] i915_driver_lastclose+0x1b/0x3e [i915]
 [<ffffffffa03d3e97>] drm_lastclose+0x48/0x2a7 [drm]
 [<ffffffffa03d4766>] drm_release+0x454/0x475 [drm]
 [<ffffffff810c1047>] __fput+0xca/0x16d
 [<ffffffff810c10ff>] fput+0x15/0x17
 [<ffffffff810be5a9>] filp_close+0x67/0x72
 [<ffffffff810432dc>] put_files_struct+0x74/0xc8
 [<ffffffff81043378>] exit_files+0x48/0x51
 [<ffffffff81044cd9>] do_exit+0x26a/0x8a0
 [<ffffffff81332aba>] ? _spin_lock+0x9/0xc
 [<ffffffff81045391>] do_group_exit+0x82/0xaf
 [<ffffffff8104e75e>] get_signal_to_deliver+0x2b0/0x2dc
 [<ffffffff8100f42f>] do_notify_resume+0x90/0x93f
 [<ffffffff81031033>] ? need_resched+0x1e/0x28
 [<ffffffff81331302>] ? _cond_resched+0x9/0x38
 [<ffffffff8133188e>] ? mutex_lock+0x22/0x33
 [<ffffffff810eba1b>] ? inotify_dev_queue_event+0x118/0x129
 [<ffffffff810eaab6>] ? inotify_inode_queue_event+0xca/0xd9
 [<ffffffff810eb04b>] ? inotify_dentry_parent_queue_event+0x79/0x92
 [<ffffffff810bfc2a>] ? fsnotify_modify+0x62/0x6a
 [<ffffffff810c749b>] ? path_put+0x1d/0x21
 [<ffffffff81010b13>] retint_signal+0x65/0xc2


Code: be 03 00 00 00 e8 ba 4b d7 e0 31 c0 5b 41 5c c9 c3 90 48 8b 87 18 04 00 00 55 48 8b 8f d8 03 00 00 48 89 e5 48 8b 80 68 02 00 00 <48> 8b b0 a0 00 00 00 48 8b 41 08 48 05 34 20 00 00 8b 00 25 fc
RIP  [<ffffffffa0403101>] i915_kernel_lost_context+0x19/0x77 [i915]
 RSP <ffff8800058b9a68>
CR2: 00000000000000a0
---[ end trace 3d7b0ec2372cb639 ]---
Fixing recursive fault but reboot is needed!
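
Note on the two traces: both oopses reach i915_kernel_lost_context through the same teardown path, even though the exiting process differs (Xorg in the description, scalc.bin here). A minimal sketch for checking this mechanically follows, assuming the trace lines keep the format shown in this report. Frames marked with ? are treated as unreliable and skipped. The two excerpts are shortened copies of the traces above, and reliable_frames and common_prefix are hypothetical helper names.

```python
import re

# Matches e.g. " [<ffffffffa036f642>] i915_gem_idle+0x79/0x314 [i915]"
# and captures an optional "? " marker plus the symbol name.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace_text):
    """Return symbol names of frames that are not marked '?' in a call trace."""
    names = []
    for line in trace_text.splitlines():
        m = FRAME.search(line)
        if m and not m.group(1):
            names.append(m.group(2))
    return names


def common_prefix(a, b):
    """Return the leading run of entries shared by both lists."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out


# Excerpt from the description (Xorg).
TRACE_A = """
 [<ffffffffa036f642>] i915_gem_idle+0x79/0x314 [i915]
 [<ffffffffa036f8e6>] i915_gem_lastclose+0x9/0x26 [i915]
 [<ffffffffa03697be>] i915_driver_lastclose+0x1b/0x3e [i915]
 [<ffffffffa0339ed4>] drm_lastclose+0x48/0x2a7 [drm]
 [<ffffffffa033a786>] drm_release+0x454/0x475 [drm]
 [<ffffffff810c0d7f>] __fput+0xca/0x16d
 [<ffffffff81010349>] ? sysret_signal+0x42/0x71
"""

# Excerpt from comment 1 (scalc.bin).
TRACE_B = """
 [<ffffffffa04096de>] i915_gem_idle+0x7c/0x34d [i915]
 [<ffffffffa04099b8>] i915_gem_lastclose+0x9/0x26 [i915]
 [<ffffffffa04037be>] i915_driver_lastclose+0x1b/0x3e [i915]
 [<ffffffffa03d3e97>] drm_lastclose+0x48/0x2a7 [drm]
 [<ffffffffa03d4766>] drm_release+0x454/0x475 [drm]
 [<ffffffff810c1047>] __fput+0xca/0x16d
 [<ffffffff81332aba>] ? _spin_lock+0x9/0xc
"""

shared = common_prefix(reliable_frames(TRACE_A), reliable_frames(TRACE_B))
print(" <- ".join(shared))
```

For these excerpts the shared frames are i915_gem_idle, i915_gem_lastclose, i915_driver_lastclose, drm_lastclose, drm_release and __fput, so in both reports the oops happens while the last DRM file descriptor is being released during process exit.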

Comment 2 François Cami 2009-02-19 01:07:46 UTC
Eric or Chris,

Could you add the full dmesg output and /var/log/Xorg.0.log as uncompressed plain-text attachments to this bug?

Thanks

Comment 3 Eric Rannaud 2009-03-02 01:21:30 UTC
Created attachment 333671
dmesg-eric

Comment 4 Eric Rannaud 2009-03-02 01:25:13 UTC
Created attachment 333672
/var/log/Xorg.0.log.old-Eric (a session that crashed)

Comment 5 Eric Rannaud 2009-03-02 01:44:27 UTC
A similar crash occurred again with the following versions (no stack trace this time, but the same symptoms and apparent cause):

Linux nc050 2.6.27.12-170.2.5.fc10.x86_64 #1 SMP Wed Jan 21 01:33:24 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

X.Org X Server 1.5.3
Release Date: 5 November 2008
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.18-92.1.18.el5 x86_64 
Current Operating System: Linux nc050 2.6.27.12-170.2.5.fc10.x86_64 #1 SMP Wed Jan 21 01:33:24 EST 2009 x86_64
Build Date: 11 December 2008  05:27:30PM
Build ID: xorg-x11-server 1.5.3-6.fc10 


(I see this crash every week or two)

Comment 6 François Cami 2009-03-02 07:49:32 UTC
Eric,

Thank you for the information.
Could you also add the output of
rpm -q xorg-x11-server-Xorg xorg-x11-drv-intel
and try running with an xorg.conf containing
Option "DRI" "off"

The following lines are of interest:
(WW) intel(0): ESR is 0x00000001, instruction error
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): Existing errors found in hardware state.

---
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 7 Eric Rannaud 2009-03-02 08:22:57 UTC
$ rpm -q xorg-x11-server-Xorg xorg-x11-drv-intel
xorg-x11-server-Xorg-1.5.3-13.fc10.x86_64
package xorg-x11-drv-intel is not installed

Will try Option "DRI" "off", thanks.

Comment 8 François Cami 2009-03-02 17:10:39 UTC
Apparently the package is named xorg-x11-drv-i810 in F10, sorry.
Could you post the output of
rpm -q xorg-x11-drv-i810
please?

Switching to assigned.


Comment 9 Eric Rannaud 2009-03-02 17:48:12 UTC
$ rpm -q xorg-x11-drv-i810
xorg-x11-drv-i810-2.5.0-4.fc10.x86_64

Comment 10 Eric Rannaud 2009-03-03 20:28:17 UTC
With DRI set to "False", X crashes on start-up (/var/log/Xorg.0.log attached).

When GDM comes up, the screen freezes (except for the mouse pointer) before the login screen is fully rendered: the background is there, but the login dialog is not drawn completely. The keyboard is unresponsive. I could log in through SSH and reboot; killing Xorg did not seem to have any effect.

Xorg.0.log contains a bunch of extra warnings of the form:

(WW) intel(0): Register 0x70024 (PIPEASTAT) changed from 0x80000203 to 0x00000000
(WW) intel(0): PIPEASTAT before: status: FIFO_UNDERRUN VSYNC_INT_STATUS VBLANK_INT_STATUS OREG_UPDATE_STATUS
(WW) intel(0): PIPEASTAT after: status:


Thanks.
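
Note on the PIPEASTAT warning: the value 0x80000203 has exactly four bits set (31, 9, 1 and 0), and the log names exactly four status flags. The sketch below decodes the value under the assumption that the flags map to those bits in the order printed. This mapping is not stated anywhere in this report and should be checked against the driver's register header before relying on it. The names PIPEASTAT_STATUS_BITS and decode_status are hypothetical.

```python
# ASSUMED bit positions for the four flags named in the Xorg log;
# inferred for illustration, not taken from this bug report.
PIPEASTAT_STATUS_BITS = {
    31: "FIFO_UNDERRUN",
    9: "VSYNC_INT_STATUS",
    1: "VBLANK_INT_STATUS",
    0: "OREG_UPDATE_STATUS",
}


def decode_status(value):
    """Split a register value into known flag names and any unexplained bits."""
    known = [name for bit, name in PIPEASTAT_STATUS_BITS.items()
             if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in PIPEASTAT_STATUS_BITS)
    leftover = value & ~known_mask
    return known, leftover


for label, value in (("before", 0x80000203), ("after", 0x00000000)):
    names, leftover = decode_status(value)
    print(f"PIPEASTAT {label}: {' '.join(names) or '(none)'} "
          f"leftover={hex(leftover)}")
```

Under that assumption the "before" value decodes to the same four flags the log lists with no bits left over, and the "after" value decodes to no flags, which agrees with the empty "status:" line.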

Comment 11 Eric Rannaud 2009-03-03 20:29:19 UTC
Created attachment 333923
/var/log/Xorg.0.log-eric DRI=False

Comment 12 Bug Zapper 2009-11-18 09:37:24 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Bug Zapper 2009-12-18 07:06:37 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.