Bug 857979

Summary: [ivb gt1] dell optiplex 7010, GPU hung
Product: [Fedora] Fedora
Reporter: Rex Dieter <rdieter>
Component: mesa
Assignee: Adam Jackson <ajax>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 17
CC: airlied, ajax, xgl-maint
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-09-26 09:14:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rex Dieter 2012-09-17 16:41:42 UTC
Rats. I got one Dell OptiPlex 7010 desktop test model and it worked great. I then got a small batch of small form-factor 7010s, and they bomb out on any attempt to enable compositing. :( Tested on both Fedora 17 and F18 Alpha (live) so far.

Once the bombing starts, the screen gets corrupted a bit, blinks on/off a few times, then stays off, and the box is hung tight.

Any hints or suggestions?  (I'm very close to sending these back for the known-working-better desktop edition)

Is it worth submitting this upstream anywhere, e.g. freedesktop.org?

rpm -q kernel libdrm mesa-dri-drivers
kernel-3.3.4-5.fc17.x86_64
kernel-3.5.3-1.fc17.x86_64
libdrm-2.4.37-1.fc17.x86_64
mesa-dri-drivers-8.0.3-3.fc17.x86_64

Smolt profile:
http://www.smolts.org/client/show/pub_7028b1a3-d6f2-4d6a-b63f-5ff2eabb39ae

Here's /var/log/messages after it failed most recently,
(I can install more -debuginfo and repeat if it would be helpful)

Sep 17 11:15:26 math-066 kernel: [ 2410.772881] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Sep 17 11:15:26 math-066 kernel: [ 2410.772885] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Sep 17 11:15:26 math-066 kernel: [ 2410.775724] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
Sep 17 11:15:32 math-066 kernel: [ 2417.661264] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Sep 17 11:15:32 math-066 kernel: [ 2417.661426] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
Sep 17 11:15:39 math-066 kernel: [ 2423.723463] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Sep 17 11:15:39 math-066 kernel: [ 2423.723646] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
Sep 17 11:15:45 math-066 kernel: [ 2429.877483] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Sep 17 11:15:45 math-066 kernel: [ 2429.878132] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
Sep 17 11:17:34 math-066 kernel: [ 2539.093387] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Sep 17 11:17:34 math-066 kernel: [ 2539.095648] [drm] Enabling RC6 states: RC6 on, RC6p on, RC6pp off
Sep 17 11:17:39 math-066 kernel: [ 2539.177650] ------------[ cut here ]------------
Sep 17 11:17:39 math-066 kernel: [ 2539.177669] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3084!
Sep 17 11:17:39 math-066 kernel: [ 2539.177685] invalid opcode: 0000 [#1] SMP 
Sep 17 11:17:39 math-066 kernel: [ 2539.177699] CPU 1 
Sep 17 11:17:39 math-066 kernel: [ 2539.177706] Modules linked in: nfs nfs_acl auth_rpcgss fscache bnep bluetooth rfkill fuse lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq lpc_ich mfd_core snd_seq_device mei serio_raw snd_pcm snd_page_alloc snd_timer snd coretemp i2c_i801 dcdbas e1000e soundcore kvm_intel kvm microcode crc32c_intel ghash_clmulni_intel i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
Sep 17 11:17:39 math-066 kernel: [ 2539.177890] 
Sep 17 11:17:39 math-066 kernel: [ 2539.177893] Pid: 4777, comm: X Not tainted 3.5.3-1.fc17.x86_64 #1 Dell Inc. OptiPlex 7010/0GXM1W
Sep 17 11:17:39 math-066 kernel: [ 2539.177918] RIP: 0010:[<ffffffffa00843df>]  [<ffffffffa00843df>] i915_gem_object_unpin+0x4f/0x60 [i915]
Sep 17 11:17:39 math-066 kernel: [ 2539.177950] RSP: 0018:ffff88020dd39b38  EFLAGS: 00010246
Sep 17 11:17:39 math-066 kernel: [ 2539.177964] RAX: ffff88020e7adc00 RBX: ffff88020e5d1800 RCX: ffffffffa00d174a
Sep 17 11:17:39 math-066 kernel: [ 2539.177982] RDX: 0000000003020402 RSI: 0000000000070008 RDI: ffff880191c23600
Sep 17 11:17:39 math-066 kernel: [ 2539.177999] RBP: ffff88020dd39b38 R08: ffffffffa00c68c0 R09: 0000000000000258
Sep 17 11:17:39 math-066 kernel: [ 2539.178016] R10: ffffffffa00c6820 R11: 0000000000000000 R12: ffff88020e5d2020
Sep 17 11:17:39 math-066 kernel: [ 2539.178033] R13: ffff88020e7ac000 R14: ffff88020e5d2000 R15: ffff88020dd39c88
Sep 17 11:17:39 math-066 kernel: [ 2539.178051] FS:  00007fa59799c8c0(0000) GS:ffff88021e280000(0000) knlGS:0000000000000000
Sep 17 11:17:39 math-066 kernel: [ 2539.178070] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 17 11:17:39 math-066 kernel: [ 2539.178084] CR2: 00007fb83e25ef30 CR3: 000000014feac000 CR4: 00000000001407e0
Sep 17 11:17:39 math-066 kernel: [ 2539.178102] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 17 11:17:39 math-066 kernel: [ 2539.178119] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 17 11:17:39 math-066 kernel: [ 2539.178137] Process X (pid: 4777, threadinfo ffff88020dd38000, task ffff8801e9252e20)
Sep 17 11:17:39 math-066 kernel: [ 2539.178155] Stack:
Sep 17 11:17:39 math-066 kernel: [ 2539.178162]  ffff88020dd39b48 ffffffffa0094dbf ffff88020dd39b78 ffffffffa00981bc
Sep 17 11:17:39 math-066 kernel: [ 2539.178184]  ffff88020e5d2438 ffff88020e5d1800 ffffffffa00d74c0 ffff88020e5d2468
Sep 17 11:17:39 math-066 kernel: [ 2539.178206]  ffff88020dd39ba8 ffffffffa0066795 0000000000000cf6 ffff88020e5d1800
Sep 17 11:17:39 math-066 kernel: [ 2539.178229] Call Trace:
Sep 17 11:17:39 math-066 kernel: [ 2539.178244]  [<ffffffffa0094dbf>] intel_unpin_fb_obj+0x3f/0x50 [i915]
Sep 17 11:17:39 math-066 kernel: [ 2539.178265]  [<ffffffffa00981bc>] intel_crtc_disable+0x8c/0xb0 [i915]
Sep 17 11:17:39 math-066 kernel: [ 2539.178283]  [<ffffffffa0066795>] drm_helper_disable_unused_functions+0x115/0x170 [drm_kms_helper]
Sep 17 11:17:39 math-066 kernel: [ 2539.178307]  [<ffffffffa00680ba>] drm_crtc_helper_set_config+0x96a/0xb30 [drm_kms_helper]
Sep 17 11:17:39 math-066 kernel: [ 2539.178330]  [<ffffffff81126556>] ? __generic_file_aio_write+0x236/0x440
Sep 17 11:17:39 math-066 kernel: [ 2539.178349]  [<ffffffff8106943f>] ? mod_timer+0x4f/0x2a0
Sep 17 11:17:39 math-066 kernel: [ 2539.178368]  [<ffffffffa0021fa6>] drm_framebuffer_cleanup+0xd6/0x160 [drm]
Sep 17 11:17:39 math-066 kernel: [ 2539.178390]  [<ffffffffa0091c61>] intel_user_framebuffer_destroy+0x21/0x80 [i915]
Sep 17 11:17:39 math-066 kernel: [ 2539.178413]  [<ffffffffa0025a1d>] drm_mode_rmfb+0xed/0xf0 [drm]
Sep 17 11:17:39 math-066 kernel: [ 2539.178432]  [<ffffffffa00154f3>] drm_ioctl+0x4d3/0x580 [drm]
Sep 17 11:17:39 math-066 kernel: [ 2539.178451]  [<ffffffffa0025930>] ? drm_mode_addfb2+0x6b0/0x6b0 [drm]
Sep 17 11:17:39 math-066 kernel: [ 2539.178470]  [<ffffffff811878e6>] ? do_sync_write+0xe6/0x120
Sep 17 11:17:39 math-066 kernel: [ 2539.178486]  [<ffffffff8146dc0c>] ? input_event_to_user+0x5c/0x90
Sep 17 11:17:39 math-066 kernel: [ 2539.178503]  [<ffffffff811c514b>] ? fsnotify+0x24b/0x340
Sep 17 11:17:39 math-066 kernel: [ 2539.178517]  [<ffffffff811996c9>] do_vfs_ioctl+0x99/0x580
Sep 17 11:17:39 math-066 kernel: [ 2539.178532]  [<ffffffff8127948a>] ? inode_has_perm.isra.31.constprop.61+0x2a/0x30
Sep 17 11:17:39 math-066 kernel: [ 2539.178551]  [<ffffffff8127aa67>] ? file_has_perm+0x97/0xb0
Sep 17 11:17:39 math-066 kernel: [ 2539.178565]  [<ffffffff81199c49>] sys_ioctl+0x99/0xa0
Sep 17 11:17:39 math-066 kernel: [ 2539.178580]  [<ffffffff81614ae9>] system_call_fastpath+0x16/0x1b
Sep 17 11:17:39 math-066 kernel: [ 2539.178595] Code: e2 ff 1f fe ff c1 e8 0d 83 c0 0f 83 e0 0f 89 c1 83 e1 0f c1 e1 0d 09 ca 84 c0 89 97 f0 00 00 00 75 07 80 a7 f2 00 00 00 f7 5d c3 <0f> 0b 0f 0b 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 
Sep 17 11:17:39 math-066 kernel: [ 2539.178728] RIP  [<ffffffffa00843df>] i915_gem_object_unpin+0x4f/0x60 [i915]
Sep 17 11:17:39 math-066 kernel: [ 2539.178751]  RSP <ffff88020dd39b38>
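
For anyone triaging a similar log, here is a rough Python sketch that condenses output like the above into a hang count, the kernel BUG location, and the stack frames not marked with "?". The function name summarize and the regular expressions are illustrative assumptions based only on the line shapes seen in this log; the sample text is a handful of lines copied from it, not a general parser for kernel oopses.

```python
import re

# A few lines copied from the log above; in practice the text would
# come from /var/log/messages or similar.
SAMPLE = """\
Sep 17 11:15:26 math-066 kernel: [ 2410.772881] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Sep 17 11:15:32 math-066 kernel: [ 2417.661264] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Sep 17 11:17:39 math-066 kernel: [ 2539.177669] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3084!
Sep 17 11:17:39 math-066 kernel: [ 2539.178244]  [<ffffffffa0094dbf>] intel_unpin_fb_obj+0x3f/0x50 [i915]
Sep 17 11:17:39 math-066 kernel: [ 2539.178330]  [<ffffffff81126556>] ? __generic_file_aio_write+0x236/0x440
Sep 17 11:17:39 math-066 kernel: [ 2539.178728] RIP  [<ffffffffa00843df>] i915_gem_object_unpin+0x4f/0x60 [i915]
"""

HANG_RE = re.compile(r"\[drm:i915_hangcheck_hung\].*GPU hung")
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
# Frames look like "[<addr>] symbol+0xOFF/0xSIZE"; a leading "?" marks
# a frame the kernel could not verify.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize(text):
    """Return hang count, BUG location, and non-'?' call-trace symbols."""
    hangs = 0
    bug = None
    frames = []
    for line in text.splitlines():
        if HANG_RE.search(line):
            hangs += 1
            continue
        m = BUG_RE.search(line)
        if m:
            bug = (m.group(1), int(m.group(2)))
            continue
        m = FRAME_RE.search(line)
        # Skip the RIP lines; they repeat the faulting address rather
        # than adding a call-trace entry.
        if m and m.group(1) is None and "RIP" not in line:
            frames.append(m.group(2))
    return {"gpu_hangs": hangs, "bug_at": bug, "reliable_frames": frames}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run against the full log, this should make it easier to see that the repeated hangcheck errors come first and the BUG in i915_gem.c is hit later, on the framebuffer teardown path.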

Comment 1 Rex Dieter 2012-09-18 23:21:19 UTC
According to the upstream report I filed, 
mesa-8.0.4 + xf86-video-intel-2.20.3 is apparently the magic combination that can help fix this.


$ koji latest-pkg f17-updates-testing mesa xorg-x11-drv-intel
Build                                     Tag                   Built by
----------------------------------------  --------------------  ----------------
mesa-8.0.3-3.fc17                         f17-updates           ajax
xorg-x11-drv-intel-2.20.7-1.fc17          f17-updates-testing   ajax

So it looks like we're missing the mesa fix; triaging to mesa for now.

I'll see about rebasing to 8.0.4 and confirm that it actually helps before nagging any further.
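
To check a batch of machines for the older mesa, something like the following Python sketch could be run against the strings printed by rpm -q. The helpers parse_nvr, version_tuple, and has_fix are hypothetical names, and the 8.0.4 threshold is only the version the upstream report suggested. The comparison is a plain numeric tuple comparison that assumes dotted numeric versions; it is not rpm's own version-comparison algorithm.

```python
def parse_nvr(nvr):
    """Split 'name-version-release[.arch]' into (name, version, release).

    The name may itself contain dashes (e.g. mesa-dri-drivers), so split
    from the right.
    """
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def version_tuple(version):
    """Turn a purely numeric dotted version such as '8.0.4' into (8, 0, 4)."""
    return tuple(int(part) for part in version.split("."))


def has_fix(nvr, minimum="8.0.4"):
    """True if the package version is at least `minimum` (assumed threshold)."""
    _, version, _ = parse_nvr(nvr)
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    for pkg in ("mesa-dri-drivers-8.0.3-3.fc17.x86_64", "mesa-8.0.4-1.fc17"):
        print(pkg, has_fix(pkg))
```

This ignores epochs, release numbers, and non-numeric version parts, so it is only suitable for a quick sanity check like this one.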

Comment 2 Dave Airlie 2012-09-19 03:06:00 UTC
I'm building an 8.0.4 in koji now

http://koji.fedoraproject.org/koji/taskinfo?taskID=4501554

Comment 3 Fedora Update System 2012-09-19 03:21:51 UTC
mesa-8.0.4-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/mesa-8.0.4-1.fc17

Comment 4 Fedora Update System 2012-09-22 00:09:48 UTC
Package mesa-8.0.4-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing mesa-8.0.4-1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-14497/mesa-8.0.4-1.fc17
then log in and leave karma (feedback).

Comment 5 Fedora Update System 2012-09-26 09:14:09 UTC
mesa-8.0.4-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.