877461 – Xorg/gdm/gnome-shell broken

Summary: Xorg/gdm/gnome-shell broken

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	xorg-x11-drv-intel
Sub Component:
Version:	19
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Adam Jackson
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-11-16 15:34 UTC by Tom London
Modified:	2015-02-17 14:34 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2015-02-17 14:34:20 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
output of 'dmesg' after graphics froze.... (79.51 KB, text/plain) 2012-11-20 15:29 UTC, Tom London	no flags
Output of 'dmesg' after freeze/X/gdm crash (80.88 KB, text/plain) 2012-11-22 16:36 UTC, Tom London	no flags
Contents of /sys/kernel/debug/dri/0/i915_error_state after freeze/crash. (1.40 MB, text/plain) 2012-11-22 16:38 UTC, Tom London	no flags
Another dmesg output after another freeze/borkage (81.96 KB, text/plain) 2012-11-24 20:02 UTC, Tom London	no flags
Another /sys/kernel/debug/dri/0/i915_error_state from another freeze/crash (1.40 MB, text/plain) 2012-11-24 20:03 UTC, Tom London	no flags
Output of 'dmesg' after Xorg/gdm/gnome-shell borkage (77.89 KB, text/plain) 2012-11-30 14:39 UTC, Tom London	no flags
Copy of i915_error_state after borkage to Xorg/gdm/gnome-shell (1.40 MB, text/plain) 2012-11-30 14:39 UTC, Tom London	no flags
Output of 'dmesg' when 'i915_hangcheck_hung' (77.84 KB, text/plain) 2012-12-01 16:05 UTC, Tom London	no flags
Contents of /sys/kernel/debug/dri/0/i915_error_state after 'i915_hangcheck_error' (1.40 MB, text/plain) 2012-12-01 16:10 UTC, Tom London	no flags
Output of 'dmesg' when gnome-shell/Xorg/gdm break and kernel oops (83.46 KB, text/plain) 2012-12-01 19:15 UTC, Tom London	no flags
Contents of i915_error_state when i915_hangcheck_hung (1.40 MB, text/plain) 2012-12-05 15:33 UTC, Tom London	no flags
Output of 'dmesg' when graphical interface crashed. (127.11 KB, text/plain) 2012-12-05 15:34 UTC, Tom London	no flags
Output of 'dmesg' showing drm borkage and kernel page allocation failure. (81.93 KB, text/plain) 2012-12-09 21:19 UTC, Tom London	no flags
Contents of /sys/kernel/debug/dri/0/i915_error_state (1.40 MB, text/plain) 2012-12-16 23:26 UTC, Tom London	no flags
tar.gz with Xorg.0.log, gdm/:0.log, dmesg, and (zero-length) i915_error_state (44.27 KB, application/octet-stream) 2012-12-20 15:27 UTC, Tom London	no flags
Another tar.gz file containing i915_error_state, dmesg, Xorg.0.log, gdm/:0.log (298.87 KB, application/x-tar) 2012-12-22 19:08 UTC, Tom London	no flags
tar.gz containing dmesg, Xorg.0.log, i915_error_state, etc. (296.76 KB, application/x-gzip) 2013-01-08 14:49 UTC, Tom London	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
FreeDesktop.org	57136	0	None	None	None	Never

Description Tom London 2012-11-16 15:34:29 UTC

Description of problem:
I've been getting X/gdm breakage with updates to the texlive packages.

Here is the last one:

[  3908.467] (WW) intel(0): I830DRI2GetMSC:1360 get vblank counter failed: Invalid argument
[  3916.049] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[  3916.049] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.
[  3919.248] (WW) intel(0): I830DRI2GetMSC:1360 get vblank counter failed: Invalid argument
[  3919.250] (WW) intel(0): I830DRI2GetMSC:1360 get vblank counter failed: Invalid argument
[  3927.941] (WW) intel(0): I830DRI2GetMSC:1360 get vblank counter failed: Invalid argument
[  3927.942] (WW) intel(0): I830DRI2GetMSC:1360 get vblank counter failed: Invalid argument

and from /var/log/messages:

Nov 16 07:14:03 tlondon systemd[1]: Started CUPS Printing Service.
Nov 16 07:16:12 tlondon kernel: [ 3908.256097] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 16 07:16:12 tlondon kernel: [ 3908.256110] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Nov 16 07:16:12 tlondon kernel: [ 3908.563834] traps: gnome-shell[1240] trap int3 ip:361084eb67 sp:7fff92c116b0 error:0
Nov 16 07:16:16 tlondon gnome-session[1021]: WARNING: Detected that screensaver has left the bus
Nov 16 07:16:16 tlondon gnome-session[1021]: WARNING: Application 'gnome-shell.desktop' killed by signal 5
Nov 16 07:16:18 tlondon kernel: [ 3914.488036] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 16 07:16:18 tlondon kernel: [ 3914.539048] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
Nov 16 07:16:20 tlondon kernel: [ 3916.040044] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 16 07:16:20 tlondon kernel: [ 3916.040190] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Nov 16 07:16:20 tlondon kernel: [ 3916.040192] [drm:i915_reset] *ERROR* Failed to reset chip.
Nov 16 07:16:29 tlondon dbus-daemon[553]: dbus[553]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.93" (uid=1000 pid=17743 comm="/usr/bin/gnome-shell ") interface="org.freedesktop.DBus.Properties" member="GetAll" error name="(unset)" requested_reply="0" destination=":1.19" (uid=0 pid=802 comm="/usr/sbin/console-kit-daemon --no-daemon ")
Nov 16 07:16:29 tlondon dbus[553]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.93" (uid=1000 pid=17743 comm="/usr/bin/gnome-shell ") interface="org.freedesktop.DBus.Properties" member="GetAll" error name="(unset)" requested_reply="0" destination=":1.19" (uid=0 pid=802 comm="/usr/sbin/console-kit-daemon --no-daemon ")
Nov 16 07:16:32 tlondon kernel: [ 3928.336912] gnome-shell[17743]: segfault at 230 ip 00007ffa38c64e8f sp 00007fff2e48eaf0 error 4 in i965_dri.so[7ffa38c12000+b4000]
Nov 16 07:16:33 tlondon gnome-session[1021]: WARNING: Detected that screensaver has left the bus
Nov 16 07:16:33 tlondon gnome-session[1021]: WARNING: Application 'gnome-shell.desktop' killed by signal 11

What else should I provide?

Version-Release number of selected component (if applicable):
texlive-2012-8.20121115_r28267.fc19.x86_64
and 900+ other packages

How reproducible:
Every update of the texlive packages.

Steps to Reproduce:
1. yum update texlive\*
  
Actual results:
"Oops something is wrong" screen, or screen freeze, or...

Restarting X/gdm seems to fix it.

Expected results:


Additional info:

Comment 1 Tom London 2012-11-16 15:37:09 UTC

Here's a snippet from /var/log/messages during an earlier update:

Nov 12 06:25:28 tlondon yum[2952]: Updated: 1:texlive-rsfs-0.svn15878-7.fc19.noarch
Nov 12 06:25:35 tlondon yum[2952]: Updated: 1:texlive-cm-super-0.svn15878-7.fc19.noarch
Nov 12 06:25:36 tlondon yum[2952]: Updated: 1:texlive-teubner-3.3a.svn27651-7.fc19.noarch
Nov 12 06:25:36 tlondon yum[2952]: Updated: 1:texlive-texdef-1.7b.svn26420-7.fc19.noarch
Nov 12 06:25:37 tlondon yum[2952]: Updated: 1:texlive-texdef-bin-2012-0.svn21802.7.20121111_r28233.fc19.noarch
Nov 12 06:25:38 tlondon kernel: [ 1228.280060] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 12 06:25:38 tlondon kernel: [ 1228.280067] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Nov 12 06:25:38 tlondon yum[2952]: Updated: 1:texlive-was-0.svn21439-7.fc19.noarch
Nov 12 06:25:38 tlondon yum[2952]: Updated: 1:texlive-xstring-1.5d.svn17614-7.fc19.noarch
Nov 12 06:25:39 tlondon yum[2952]: Updated: 1:texlive-ctanify-1.1.svn26318-7.fc19.noarch
Nov 12 06:25:40 tlondon yum[2952]: Updated: 1:texlive-ctanify-bin-2012-0.svn24061.7.20121111_r28233.fc19.noarch

Comment 2 Jindrich Novy 2012-11-20 07:30:00 UTC

Hi,

this looks like a kernel issue to me. It happened to me as well, but under different circumstances: on F18, the graphics subsystem failed every time after I booted the 3.6.6 kernel and logged in to any hardware-accelerated session. One thing I'm sure of is that it is not texlive-related, so I'm reassigning.

Comment 3 Josh Boyer 2012-11-20 13:47:27 UTC

Please attach the dmesg output as a text/plain attachment.  Also, please provide the contents of /debug/dri/0/i915_error_state if it exists.

Comment 4 Tom London 2012-11-20 14:25:23 UTC

I'll try to reproduce this, probably tomorrow.

I've previously tried to look at /debug/dri/0/i915_error_state, but that path does not exist.

Is there some 'magic' I need to do to make it appear?

Comment 5 Tom London 2012-11-20 15:29:59 UTC

Created attachment 648604 [details]
output of 'dmesg' after graphics froze....

OK, so it's not related to updating texlive after all. :-)

I got this while running qemu-kvm along with the usual user-level applications (e.g., firefox, rhythmbox).

Again, /debug does not exist on my system. How do I generate it?

Comment 6 Josh Boyer 2012-11-20 15:50:24 UTC

(In reply to comment #5)
> Created attachment 648604 [details]
> output of 'dmesg' after graphics froze....
> 
> OK. Not related to updating texlive.... :-)
> 
> Got this running qemu-kvm and associated user-level stuff (e.g., firefox,
> rhythmbox, etc.).
> 
> Again, /debug does not exist on my system.  How to generate?

Oh, I forgot we mount that in a different location. You might want to check for /sys/kernel/debug/dri/0/i915_error_state instead.
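To avoid retyping the paths after every hang, the two requested items (the full dmesg and i915_error_state) can be collected in one step. The Python sketch below is a hypothetical helper, not part of any Fedora tool: the name `capture_gpu_hang` and the timestamped directory layout are assumptions, the debugfs path is the one suggested above, and reading it normally requires root.

```python
import subprocess
import time
from pathlib import Path

# Path suggested in this bug; debugfs is assumed to be mounted at /sys/kernel/debug.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def capture_gpu_hang(dest_root, error_state=ERROR_STATE, dmesg_cmd=("dmesg",)):
    """Hypothetical helper: copy i915_error_state and dmesg into a new timestamped dir."""
    dest = Path(dest_root) / time.strftime("gpu-hang-%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=False)

    # debugfs files usually report st_size == 0, so read until EOF
    # instead of trusting the reported file size.
    with open(error_state, "rb") as src, open(dest / "i915_error_state", "wb") as out:
        while chunk := src.read(64 * 1024):
            out.write(chunk)

    # Save the full kernel log next to the error state.
    log = subprocess.run(list(dmesg_cmd), capture_output=True, check=True).stdout
    (dest / "dmesg.txt").write_bytes(log)
    return dest
```

For example, `capture_gpu_hang("/root/gpu-hangs")` run as root right after a hang. This does not avoid the page allocation failure reported later in Comments 18 and 21: those traces show the failing allocation inside the kernel's seq_read, not in the copying program.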

Comment 7 Tom London 2012-11-22 16:36:24 UTC

Created attachment 649932 [details]
Output of 'dmesg' after freeze/X/gdm crash

I got another freeze/crash.

This time I captured dmesg and i915_error_state.  I also tar-ed up the entire /sys/kernel/debug/dri/0 directory.

Here is the dmesg output. I'll post next i915_error_state.

Let me know if the other files in the debug directory are helpful.

Comment 8 Tom London 2012-11-22 16:38:08 UTC

Created attachment 649933 [details]
Contents of /sys/kernel/debug/dri/0/i915_error_state after freeze/crash.

As above, I have the other files in the debug directory. Let me know if they are needed/useful.

Comment 9 Tom London 2012-11-24 20:02:13 UTC

Created attachment 651184 [details]
Another dmesg output after another freeze/borkage

Got another freeze and gdm "oops" screen.

Here is the dmesg output.

[/sys/kernel/debug/dri/0/i915_error_state attached next.]

Comment 10 Tom London 2012-11-24 20:03:28 UTC

Created attachment 651185 [details]
Another /sys/kernel/debug/dri/0/i915_error_state from another freeze/crash

Another i915_error_state .....

Comment 11 Tom London 2012-11-24 20:09:42 UTC

Not sure if it is helpful, but here is the backtrace from the core file generated by the segfault in gnome-shell/i965_dri.so:

Core was generated by `/usr/bin/gnome-shell'.
Program terminated with signal 11, Segmentation fault.
#0  brw_update_renderbuffer_surface (brw=0x1b994b0, rb=0x2ada0f0, unit=0)
    at brw_wm_surface_state.c:1109
1109	   region = irb->mt->region;
Missing separate debuginfos, use: debuginfo-install glib2-2.35.2-1.fc19.x86_64 json-c-0.10-2.fc19.x86_64 libcanberra-0.30-2.fc18.x86_64 libcanberra-gtk3-0.30-2.fc18.x86_64
(gdb) set pagination off
(gdb) bt full
#0  brw_update_renderbuffer_surface (brw=0x1b994b0, rb=0x2ada0f0, unit=0) at brw_wm_surface_state.c:1109
        intel = 0x1b994b0
        ctx = 0x1b994b0
        irb = 0x2ada0f0
        mt = 0x0
        region = <optimized out>
        surf = <optimized out>
        tile_x = 0
        tile_y = 32736
        format = 0
        rb_format = MESA_FORMAT_XRGB8888
        __FUNCTION__ = "brw_update_renderbuffer_surface"
#1  0x00007f20f8a47840 in brw_update_renderbuffer_surfaces (brw=0x1b994b0) at brw_wm_surface_state.c:1205
        intel = 0x1b994b0
        ctx = 0x1b994b0
        i = <optimized out>
#2  0x00007f20f8a31892 in brw_upload_state (brw=brw@entry=0x1b994b0) at brw_state_upload.c:498
        atom = <optimized out>
        ctx = 0x1b994b0
        intel = 0x1b994b0
        state = 0x1bafb1c
        i = <optimized out>
        dirty_count = 0
#3  0x00007f20f8a1ef77 in brw_try_draw_prims (max_index=3, min_index=1499407640, ib=<optimized out>, nr_prims=<optimized out>, prim=0x7fff595f2500, arrays=<optimized out>, ctx=0x1b994b0) at brw_draw.c:493
        estimated_max_prim_size = 4096
        brw = 0x1b994b0
        retval = <optimized out>
        i = <optimized out>
        intel = 0x1b994b0
        fail_next = false
#4  brw_draw_prims (ctx=0x1b994b0, prim=0x7fff595f2500, nr_prims=<optimized out>, ib=<optimized out>, index_bounds_valid=<optimized out>, min_index=0, max_index=3, tfb_vertcount=0x0) at brw_draw.c:589
        arrays = <optimized out>
#5  0x00007f20f85734e4 in vbo_draw_arrays (ctx=0x1b994b0, mode=7, start=0, count=4, numInstances=1, baseInstance=<optimized out>) at ../../../src/mesa/vbo/vbo_exec_array.c:645
        vbo = 0x1c26300
        exec = 0x1c26f48
        prim = {{mode = 7, indexed = 0, begin = 1, end = 1, weak = 0, no_current_update = 0, pad = 0, start = 0, count = 4, basevertex = 0, num_instances = 1, base_instance = 0}, {mode = 0, indexed = 0, begin = 0, end = 0, weak = 0, no_current_update = 0, pad = 0, start = 0, count = 0, basevertex = 0, num_instances = 0, base_instance = 0}}
#6  0x0000003ffde59bde in _cogl_journal_flush_modelview_and_entries (batch_start=<optimized out>, batch_len=1, data=0x7fff595f26e0) at ./cogl-journal.c:309
        state = 0x7fff595f26e0
        framebuffer = 0x2ad94b0
        attributes = 0x38b2130
        draw_flags = (COGL_DRAW_SKIP_JOURNAL_FLUSH | COGL_DRAW_SKIP_PIPELINE_VALIDATION | COGL_DRAW_SKIP_FRAMEBUFFER_FLUSH | COGL_DRAW_SKIP_LEGACY_STATE | COGL_DRAW_COLOR_ATTRIBUTE_IS_OPAQUE)
        ctx = 0x2824ab0
#7  0x0000003ffde596fc in _cogl_journal_flush_vbo_offsets_and_entries (batch_start=0x38b20e0, batch_len=1, data=<optimized out>) at ./cogl-journal.c:647
        state = <optimized out>
        ctx = 0x2824ab0
        stride = 32
        i = <optimized out>
        attribute_entry = <optimized out>
#8  0x0000003ffde5ab48 in _cogl_journal_flush (journal=0x1a8a8b0) at ./cogl-journal.c:1353
        framebuffer = <optimized out>
        ctx = <optimized out>
        state = {journal = 0x1a8a8b0, attribute_buffer = 0x3849240, attributes = 0x1b81120, current_attribute = 0, stride = 32, array_offset = 0, current_vertex = 0, indices = 0x7f21012a9770, indices_type_size = 274842270776, pipeline = 0x3a389a0}
        i = <optimized out>
#9  0x0000003ffde5bdbc in _cogl_framebuffer_flush_journal (framebuffer=framebuffer@entry=0x2ad94b0) at ./cogl-framebuffer.c:632
No locals.
#10 0x0000003ffde5d58f in cogl_framebuffer_clear4f (framebuffer=0x2ad94b0, buffers=buffers@entry=2, red=0.180392161, green=0.203921571, blue=0.211764708, alpha=1) at ./cogl-framebuffer.c:418
        clip_stack = 0x0
        scissor_x0 = 0
        scissor_y0 = 0
        scissor_x1 = 2147483647
        scissor_y1 = 2147483647
#11 0x0000003ffde5d81c in cogl_framebuffer_clear (framebuffer=<optimized out>, buffers=buffers@entry=2, color=color@entry=0x7fff595f2870) at ./cogl-framebuffer.c:484
No locals.
#12 0x0000003ffde22895 in cogl_clear (color=color@entry=0x7fff595f2870, buffers=buffers@entry=2) at ./cogl.c:137
No locals.
#13 0x0000003ffeaa524e in clutter_stage_paint (self=0x2ad6c80) at ./clutter-stage.c:711
        priv = 0x2ad6fc0
        clear_flags = <optimized out>
        bg_color = {red = 46 '.', green = 52 '4', blue = 54 '6', alpha = 255 '\377'}
        stage_color = {private_member_red = 46 '.', private_member_green = 52 '4', private_member_blue = 54 '6', private_member_alpha = 255 '\377', private_member_padding0 = 0, private_member_padding1 = 9363749, private_member_padding2 = 32545}
        iter = {dummy1 = 0x7fff595f2ab0, dummy2 = 0x1a7e430, dummy3 = 0xffffffff, dummy4 = 9363749, dummy5 = 0x3ffeaa51a0 <clutter_stage_paint>}
        child = 0x1a7e430
        real_alpha = <optimized out>
#14 0x00007f21008c8850 in g_closure_invoke () from /lib64/libgobject-2.0.so.0
No symbol table info available.
#15 0x00007f21008da880 in signal_emit_unlocked_R () from /lib64/libgobject-2.0.so.0
No symbol table info available.
#16 0x00007f21008e2e5f in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
No symbol table info available.
#17 0x00007f21008e3042 in g_signal_emit () from /lib64/libgobject-2.0.so.0
No symbol table info available.
#18 0x0000003ffea46a0d in clutter_actor_continue_paint (self=self@entry=0x2ad6c80) at ./clutter-actor.c:3867
        priv = <optimized out>
        __PRETTY_FUNCTION__ = "clutter_actor_continue_paint"
#19 0x0000003ffea52543 in clutter_actor_paint (self=0x2ad6c80) at ./clutter-actor.c:3791
        priv = 0x2ad6cc0
        pick_mode = CLUTTER_PICK_NONE
        clip_set = 0
        shader_applied = 0
#20 0x0000003ffeaa9399 in _clutter_stage_do_paint (stage=stage@entry=0x2ad6c80, clip=clip@entry=0x0) at ./clutter-stage.c:669
        priv = <optimized out>
        clip_poly = {0, 0, 1680, 0, 1680, 1050, 0, 1050}
        geom = {x = 0, y = 0, width = 1680, height = 1050}
#21 0x0000003ffea3cd8a in clutter_stage_cogl_redraw (stage_window=0x1a3ecb0) at cogl/clutter-stage-cogl.c:404
        stage_cogl = 0x1a3ecb0
        may_use_clipped_redraw = <optimized out>
        use_clipped_redraw = 0
        can_blit_sub_buffer = <optimized out>
        wrapper = 0x2ad6c80
#22 0x0000003ffeaa7ded in clutter_stage_do_redraw (stage=0x2ad6c80) at ./clutter-stage.c:1170
        backend = 0x1a28080
        actor = 0x2ad6c80
        priv = 0x2ad6fc0
#23 _clutter_stage_do_update (stage=0x2ad6c80) at ./clutter-stage.c:1228
        priv = 0x2ad6fc0
#24 0x0000003ffea8c70d in master_clock_update_stages (stages=0x3634b50 = {...}, master_clock=0x297b0f0) at ./clutter-master-clock.c:386
        stages_updated = <optimized out>
        l = 0x3634b50 = {0x2ad6c80}
        start = 1800442921
#25 clutter_clock_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ./clutter-master-clock.c:520
        clock_source = <optimized out>
        master_clock = 0x297b0f0
        stage_manager = <optimized out>
        stages_updated = 0
        stages = 0x3634b50 = {0x2ad6c80}
#26 0x00007f21005dca85 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
No symbol table info available.
#27 0x00007f21005dcdc8 in g_main_context_iterate.isra.25 () from /lib64/libglib-2.0.so.0
No symbol table info available.
#28 0x00007f21005dd222 in g_main_loop_run () from /lib64/libglib-2.0.so.0
No symbol table info available.
#29 0x0000003ff6253d97 in meta_run () at core/main.c:545
        log_domains = {0x0, 0x3ff629a497 "mutter", 0x3ff6299822 "Gtk", 0x3ff6299826 "Gdk", 0x3ff629982a "GLib", 0x3ff629982f "Pango", 0x3ff6299835 "GLib-GObject", 0x3ff6299842 "GThread"}
        i = <optimized out>
#30 0x0000000000401db7 in main (argc=1, argv=0x7fff595f34d8) at main.c:414
        ctx = <optimized out>
        error = 0x0
        ecode = <optimized out>
        sender = 0x2959120
(gdb)
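The kernel's one-line segfault reports (Description and Comments 18, 21, 22) can be lined up with this backtrace by turning the instruction pointer into an offset inside i965_dri.so. The Python sketch below does that and decodes the x86 page-fault error code; `decode_segfault` and its regex are illustrative assumptions fitted to the lines quoted in this report.

```python
import re

# Fitted to kernel lines such as:
# "gnome-shell[17743]: segfault at 230 ip ... sp ... error 4 in i965_dri.so[7ffa38c12000+b4000]"
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>[0-9a-f]+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: locate the faulting instruction inside the mapped library."""
    m = SEGV_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip, base, size = (int(m.group(k), 16) for k in ("ip", "base", "size"))
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    return {
        "lib": m.group("lib"),
        "fault_addr": int(m.group("addr"), 16),
        "offset": offset,
        # x86 page-fault error code: bit 0 set = protection violation
        # (clear = page not present), bit 1 = write access, bit 2 = user mode.
        "protection": bool(err & 1),
        "write": bool(err & 2),
        "user": bool(err & 4),
    }
```

Applied to the segfault lines in the Description and in Comment 18, it gives the same offset, 0x52e8f, for a user-mode read of a non-present page at address 0x230; Comments 21 and 22 both give 0x52e9f, possibly from a different build of i965_dri.so. A fault address that close to zero is consistent with frame #0 reading through `mt = 0x0`, although mapping the offset to a source line would need the matching debuginfo.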

Comment 12 Tom London 2012-11-27 16:43:09 UTC

Got this again with kernel-3.7.0-0.rc7.git0.2.fc19.x86_64:

Nov 27 07:25:06 tlondon kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 27 07:25:06 tlondon kernel: [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Nov 27 07:25:12 tlondon kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 27 07:25:12 tlondon kernel: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
Nov 27 07:25:14 tlondon kernel: [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Nov 27 07:25:14 tlondon kernel: [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Nov 27 07:25:14 tlondon kernel: [drm:i915_reset] *ERROR* Failed to reset chip.

By the time I checked, i915_error_state was empty.

Comment 13 Tom London 2012-11-30 14:38:27 UTC

Xorg/gdm/gnome-shell borkage continues with:

xorg-x11-drv-intel-2.20.14-1.fc19.x86_64
gdm-3.7.2-1.fc19.x86_64
xorg-x11-server-utils-7.5-14.fc18.x86_64
xorg-x11-server-devel-1.13.0-10.fc19.x86_64
xorg-x11-server-Xorg-1.13.0-10.fc19.x86_64
xorg-x11-server-Xephyr-1.13.0-10.fc19.x86_64
xorg-x11-server-common-1.13.0-10.fc19.x86_64
gnome-shell-3.7.2-1.fc19.x86_64

I was running kernel-3.7.0-0.rc7.git1.2.fc19.x86_64.

Below I attach the output of 'dmesg' and a copy of 'i915_error_state'.

Anything more I can do?

Comment 14 Tom London 2012-11-30 14:39:17 UTC

Created attachment 655086 [details]
Output of 'dmesg' after Xorg/gdm/gnome-shell borkage

Comment 15 Tom London 2012-11-30 14:39:55 UTC

Created attachment 655087 [details]
Copy of i915_error_state after borkage to Xorg/gdm/gnome-shell

Comment 16 Tom London 2012-12-01 16:05:39 UTC

Created attachment 655613 [details]
Output of 'dmesg' when 'i915_hangcheck_hung'

Got another event, but this time gnome-shell did not crash:

[drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Will post output of 'dmesg' and contents of i915_error_state.

First, dmesg output.

Comment 17 Tom London 2012-12-01 16:10:25 UTC

Created attachment 655614 [details]
Contents of /sys/kernel/debug/dri/0/i915_error_state after 'i915_hangcheck_error'

Copy of i915_error_state after the 'i915_hangcheck_hung' error.

Comment 18 Tom London 2012-12-01 19:15:14 UTC

Created attachment 655695 [details]
Output of 'dmesg' when gnome-shell/Xorg/gdm break and kernel oops

Uhhh, this one looks different.

I was running an "rsync" job to back up my hard drive and left it running for a while to complete.

When I returned, the gdm "Oops, something went wrong" screen was displayed.

Since the rsync had not completed (I could hear the USB drive still chattering), I switched to a text console with Ctrl-Alt-F2, logged in as root, and tried to capture 'dmesg' and /debug/dri/0/i915_error_state.

I got the dmesg output just fine, but copying i915_error_state triggered the following kernel page allocation failure.

I attach the complete dmesg output; I could not copy i915_error_state.


[ 5170.856032] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5170.856039] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 5181.788016] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5181.844029] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 5183.352023] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5183.352132] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 5183.352134] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 5205.940710] gnome-shell[3194]: segfault at 230 ip 00007f94a1676e8f sp 00007fffe56a4d20 error 4 in i965_dri.so[7f94a1624000+b4000]
[ 6208.669721] cp: page allocation failure: order:9, mode:0x40d0
[ 6208.669756] Pid: 3526, comm: cp Not tainted 3.7.0-0.rc7.git2.2.fc19.x86_64 #1
[ 6208.669786] Call Trace:
[ 6208.669806]  [<ffffffff81135759>] warn_alloc_failed+0xe9/0x150
[ 6208.669834]  [<ffffffff811380a6>] ? drain_local_pages+0x16/0x20
[ 6208.669860]  [<ffffffff811397c6>] __alloc_pages_nodemask+0x736/0x990
[ 6208.669891]  [<ffffffff811766d0>] alloc_pages_current+0xb0/0x120
[ 6208.669919]  [<ffffffff8113464a>] __get_free_pages+0x2a/0x80
[ 6208.669947]  [<ffffffff811805a9>] kmalloc_order_trace+0x39/0xb0
[ 6208.669973]  [<ffffffff81180789>] __kmalloc+0x169/0x1a0
[ 6208.669995]  [<ffffffff8117fbff>] ? kfree+0x15f/0x170
[ 6208.670038]  [<ffffffff811b646e>] seq_read+0x10e/0x3b0
[ 6208.670064]  [<ffffffff81195649>] vfs_read+0xa9/0x180
[ 6208.670085]  [<ffffffff81195772>] sys_read+0x52/0xa0
[ 6208.670111]  [<ffffffff8163951e>] ? do_page_fault+0xe/0x10
[ 6208.670138]  [<ffffffff8163db59>] system_call_fastpath+0x16/0x1b
[ 6208.670166] Mem-Info:
[ 6208.670177] Node 0 DMA per-cpu:
[ 6208.670195] CPU    0: hi:    0, btch:   1 usd:   0
[ 6208.670218] CPU    1: hi:    0, btch:   1 usd:   0
[ 6208.670239] Node 0 DMA32 per-cpu:
[ 6208.670257] CPU    0: hi:  186, btch:  31 usd:   0
[ 6208.670280] CPU    1: hi:  186, btch:  31 usd:   0
[ 6208.670301] Node 0 Normal per-cpu:
[ 6208.671332] CPU    0: hi:  186, btch:  31 usd:   0
[ 6208.672363] CPU    1: hi:  186, btch:  31 usd:   0
[ 6208.673423] active_anon:112612 inactive_anon:128418 isolated_anon:0
 active_file:181030 inactive_file:473129 isolated_file:0
 unevictable:34 dirty:45 writeback:0 unstable:0
 free:38889 slab_reclaimable:15999 slab_unreclaimable:9067
 mapped:17198 shmem:17557 pagetables:8628 bounce:0
 free_cma:0
[ 6208.679634] Node 0 DMA free:15816kB min:264kB low:328kB high:396kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:80kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 6208.683194] lowmem_reserve[]: 0 2947 3892 3892
[ 6208.684443] Node 0 DMA32 free:111404kB min:50976kB low:63720kB high:76464kB active_anon:335380kB inactive_anon:371652kB active_file:518808kB inactive_file:1609232kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:3018404kB mlocked:16kB dirty:116kB writeback:0kB mapped:40252kB shmem:13880kB slab_reclaimable:39564kB slab_unreclaimable:6244kB kernel_stack:528kB pagetables:9676kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 6208.689622] lowmem_reserve[]: 0 0 945 945
[ 6208.690968] Node 0 Normal free:27104kB min:16340kB low:20424kB high:24508kB active_anon:115068kB inactive_anon:142020kB active_file:205312kB inactive_file:284496kB unevictable:120kB isolated(anon):0kB isolated(file):0kB present:967680kB mlocked:120kB dirty:64kB writeback:0kB mapped:28540kB shmem:56348kB slab_reclaimable:24432kB slab_unreclaimable:30016kB kernel_stack:2072kB pagetables:24836kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 6208.696428] lowmem_reserve[]: 0 0 0 0
[ 6208.697813] Node 0 DMA: 0*4kB 1*8kB 2*16kB 1*32kB 2*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15816kB
[ 6208.699231] Node 0 DMA32: 1593*4kB 835*8kB 741*16kB 425*32kB 267*64kB 114*128kB 51*256kB 37*512kB 7*1024kB 1*2048kB 0*4096kB = 111404kB
[ 6208.700672] Node 0 Normal: 2384*4kB 646*8kB 364*16kB 118*32kB 12*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 26608kB
[ 6208.702184] 678209 total pagecache pages
[ 6208.703624] 5949 pages in swap cache
[ 6208.705077] Swap cache stats: add 39488, delete 33539, find 6656/7912
[ 6208.706491] Free swap  = 6044828kB
[ 6208.707966] Total swap = 6127612kB
[ 6208.726016] 1032176 pages RAM
[ 6208.727555] 47518 pages reserved
[ 6208.729122] 644039 pages shared
[ 6208.730515] 878274 pages non-shared
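In the failure above, "order:9" means the kernel asked for one physically contiguous block of 2^9 = 512 pages, i.e. 2048 kB with 4 kB pages. The three per-zone "Node 0 ..." lines ending in a kB total list how many free blocks of each size existed. The Python sketch below (the helper name `buddy_summary` is hypothetical) sums such a line and counts the free blocks of at least 2048 kB; it does not model the kernel's watermark or zone-fallback rules.

```python
import re

# Matches "count*sizekB" entries such as "2384*4kB" in the buddy free-list lines.
BUDDY_RE = re.compile(r"(\d+)\*(\d+)kB")


def buddy_summary(line, order=9, page_kb=4):
    """Return (free kB on the line, kB needed for `order`, free blocks that large)."""
    blocks = [(int(n), int(kb)) for n, kb in BUDDY_RE.findall(line)]
    total_kb = sum(n * kb for n, kb in blocks)
    need_kb = page_kb * 2 ** order  # order 9 -> 512 pages -> 2048 kB
    big_enough = sum(n for n, kb in blocks if kb >= need_kb)
    return total_kb, need_kb, big_enough
```

For this dump it reproduces the printed totals (26608 kB for Normal, 111404 kB for DMA32) and finds no free 2048 kB block in the Normal zone and only one in DMA32, even though about 150 MB was free overall, so the failure appears to be caused by fragmentation rather than by a shortage of free memory.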

Comment 19 Tom London 2012-12-05 15:33:06 UTC

Created attachment 658267 [details]
Contents of i915_error_state  when i915_hangcheck_hung

Got another freeze/crash with kernel-3.7.0-0.rc7.git3.2.fc19.x86_64, etc.

I'm attaching the contents of i915_error_state here and will attach the output of dmesg below.

Comment 20 Tom London 2012-12-05 15:34:25 UTC

Created attachment 658268 [details]
Output of 'dmesg' when graphical interface crashed.

Anything more I can provide?

Anything I can test?

Comment 21 Tom London 2012-12-09 21:19:03 UTC

Created attachment 660448 [details]
Output of 'dmesg' showing drm borkage and kernel page allocation failure.

Got another graphical crash, followed by another kernel page allocation failure when I tried to copy /sys/kernel/debug/dri/0/i915_error_state.

I'm running xorg-x11-drv-intel-2.20.14-1.fc19.x86_64 and kernel-3.7.0-0.rc8.git0.2.fc19.x86_64.

The complete output of 'dmesg' is attached. Here is the tail of it:

[ 5671.204025] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5671.204031] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 5678.944094] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5678.996035] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 5680.500078] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 5680.500255] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 5680.500260] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 5712.873898] gnome-shell[11352]: segfault at 230 ip 00007eff11204e9f sp 00007fff45c57480 error 4 in i965_dri.so[7eff111b2000+b4000]
[ 5771.373061] cp: page allocation failure: order:9, mode:0x40d0
[ 5771.373097] Pid: 11557, comm: cp Not tainted 3.7.0-0.rc8.git0.2.fc19.x86_64 #1
[ 5771.373131] Call Trace:
[ 5771.373152]  [<ffffffff81135779>] warn_alloc_failed+0xe9/0x150
[ 5771.373183]  [<ffffffff811380c6>] ? drain_local_pages+0x16/0x20
[ 5771.373209]  [<ffffffff8113980a>] __alloc_pages_nodemask+0x75a/0x9e0
[ 5771.373257]  [<ffffffff81176760>] alloc_pages_current+0xb0/0x120
[ 5771.373288]  [<ffffffff8113466a>] __get_free_pages+0x2a/0x80
[ 5771.373318]  [<ffffffff81180639>] kmalloc_order_trace+0x39/0xb0
[ 5771.373344]  [<ffffffff81180819>] __kmalloc+0x169/0x1a0
[ 5771.373367]  [<ffffffff8117fc8f>] ? kfree+0x15f/0x170
[ 5771.373391]  [<ffffffff811b65ae>] seq_read+0x10e/0x3b0
[ 5771.373419]  [<ffffffff81195779>] vfs_read+0xa9/0x180
[ 5771.373441]  [<ffffffff811958a2>] sys_read+0x52/0xa0
[ 5771.373469]  [<ffffffff816390de>] ? do_page_fault+0xe/0x10
[ 5771.373499]  [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
[ 5771.374541] Mem-Info:
[ 5771.375559] Node 0 DMA per-cpu:
[ 5771.376456] CPU    0: hi:    0, btch:   1 usd:   0
[ 5771.377346] CPU    1: hi:    0, btch:   1 usd:   0
[ 5771.378231] Node 0 DMA32 per-cpu:
[ 5771.379132] CPU    0: hi:  186, btch:  31 usd: 179
[ 5771.380106] CPU    1: hi:  186, btch:  31 usd:   0
[ 5771.380973] Node 0 Normal per-cpu:
[ 5771.381870] CPU    0: hi:  186, btch:  31 usd:  40
[ 5771.382778] CPU    1: hi:  186, btch:  31 usd:   0
[ 5771.383689] active_anon:449260 inactive_anon:192609 isolated_anon:0
 active_file:102060 inactive_file:141052 isolated_file:0
 unevictable:30 dirty:47 writeback:0 unstable:0
 free:30851 slab_reclaimable:18166 slab_unreclaimable:10562
 mapped:30233 shmem:19862 pagetables:8905 bounce:0
 free_cma:0
[ 5771.389083] Node 0 DMA free:15856kB min:264kB low:328kB high:396kB active_anon:24kB inactive_anon:16kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:46 all_unreclaimable? no
[ 5771.392060] lowmem_reserve[]: 0 2947 3892 3892
[ 5771.393093] Node 0 DMA32 free:81880kB min:50976kB low:63720kB high:76464kB active_anon:1540312kB inactive_anon:513540kB active_file:361656kB inactive_file:399964kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3018404kB mlocked:0kB dirty:176kB writeback:0kB mapped:92952kB shmem:21140kB slab_reclaimable:49644kB slab_unreclaimable:10176kB kernel_stack:536kB pagetables:10480kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:23 all_unreclaimable? no
[ 5771.397474] lowmem_reserve[]: 0 0 945 945
[ 5771.398660] Node 0 Normal free:25668kB min:16340kB low:20424kB high:24508kB active_anon:256704kB inactive_anon:256880kB active_file:46584kB inactive_file:164244kB unevictable:120kB isolated(anon):0kB isolated(file):0kB present:967680kB mlocked:120kB dirty:12kB writeback:0kB mapped:27980kB shmem:58308kB slab_reclaimable:23020kB slab_unreclaimable:32064kB kernel_stack:2032kB pagetables:25140kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:18 all_unreclaimable? no
[ 5771.403978] lowmem_reserve[]: 0 0 0 0
[ 5771.405784] Node 0 DMA: 2*4kB 3*8kB 3*16kB 1*32kB 2*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15856kB
[ 5771.407713] Node 0 DMA32: 3516*4kB 1045*8kB 872*16kB 484*32kB 185*64kB 96*128kB 15*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 81880kB
[ 5771.409689] Node 0 Normal: 1162*4kB 248*8kB 178*16kB 106*32kB 26*64kB 20*128kB 5*256kB 8*512kB 1*1024kB 1*2048kB 0*4096kB = 25544kB
[ 5771.411756] 264791 total pagecache pages
[ 5771.413757] 1780 pages in swap cache
[ 5771.415686] Swap cache stats: add 30466, delete 28686, find 3531/4147
[ 5771.419437] Free swap  = 6050720kB
[ 5771.423211] Total swap = 6127612kB
[ 5771.448693] 1032176 pages RAM
[ 5771.450628] 47517 pages reserved
[ 5771.452555] 770570 pages shared
[ 5771.454468] 770496 pages non-shared

Comment 22 Tom London 2012-12-15 17:49:24 UTC

Got another hang; this time I was running qemu-kvm.

I noticed a posting on fedora-devel referring to xf86-video-intel 2.20.16. Related?



[ 1664.279645] kvm: SMP vm created on host with unstable TSC; guest TSC will not be reliable
[ 1699.388052] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1699.388352] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1720.653774] traps: gnome-shell[1294] trap int3 ip:3fc364ed77 sp:7fffa6044880 error:0
[ 1726.500061] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1726.552070] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 1726.954135] gnome-shell (1294) used greatest stack depth: 1616 bytes left
[ 1728.572059] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1728.573539] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 1728.573542] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 1753.544146] gnome-shell[2352]: segfault at 230 ip 00007fe516a91e9f sp 00007fff15c0ec10 error 4 in i965_dri.so[7fe516a3f000+b4000]
[ 1828.772568] cp: page allocation failure: order:9, mode:0x40d0
[ 1828.772606] Pid: 2565, comm: cp Not tainted 3.7.0-2.fc19.x86_64 #1
[ 1828.772638] Call Trace:
[ 1828.772658]  [<ffffffff81167469>] warn_alloc_failed+0xe9/0x150
[ 1828.772690]  [<ffffffff8116a090>] ? page_alloc_cpu_notify+0x50/0x50
[ 1828.772721]  [<ffffffff810d8b6d>] ? trace_hardirqs_on+0xd/0x10
[ 1828.772752]  [<ffffffff8116bc25>] __alloc_pages_nodemask+0x8b5/0xb40
[ 1828.772787]  [<ffffffff811ad460>] alloc_pages_current+0xb0/0x120
[ 1828.772819]  [<ffffffff8116991e>] ? __free_pages_ok.part.54+0x9e/0xe0
[ 1828.772849]  [<ffffffff8116632a>] __get_free_pages+0x2a/0x80
[ 1828.772880]  [<ffffffff811b9c89>] kmalloc_order_trace+0x39/0x190
[ 1828.772911]  [<ffffffff811ba07d>] __kmalloc+0x29d/0x2d0
[ 1828.772938]  [<ffffffff811f8fcf>] seq_read+0x11f/0x3e0
[ 1828.772967]  [<ffffffff811d320c>] vfs_read+0xac/0x180
[ 1828.772991]  [<ffffffff811d3335>] sys_read+0x55/0xa0
[ 1828.773061]  [<ffffffff816fbd19>] system_call_fastpath+0x16/0x1b
[ 1828.773101] Mem-Info:
[ 1828.773114] Node 0 DMA per-cpu:
[ 1828.773135] CPU    0: hi:    0, btch:   1 usd:   0
[ 1828.774322] CPU    1: hi:    0, btch:   1 usd:   0
[ 1828.775452] Node 0 DMA32 per-cpu:
[ 1828.776570] CPU    0: hi:  186, btch:  31 usd:   0
[ 1828.777659] CPU    1: hi:  186, btch:  31 usd:   0
[ 1828.778746] Node 0 Normal per-cpu:
[ 1828.779842] CPU    0: hi:  186, btch:  31 usd:   0
[ 1828.780928] CPU    1: hi:  186, btch:  31 usd:  59
[ 1828.782026] active_anon:489192 inactive_anon:169170 isolated_anon:32
 active_file:62733 inactive_file:90191 isolated_file:0
 unevictable:30 dirty:9 writeback:1 unstable:0
 free:37065 slab_reclaimable:40958 slab_unreclaimable:59058
 mapped:24849 shmem:24027 pagetables:9197 bounce:0
 free_cma:0
[ 1828.788550] Node 0 DMA free:15820kB min:264kB low:328kB high:396kB active_anon:24kB inactive_anon:44kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:17 all_unreclaimable? no
[ 1828.792126] lowmem_reserve[]: 0 2947 3892 3892
[ 1828.793278] Node 0 DMA32 free:111580kB min:50976kB low:63720kB high:76464kB active_anon:1714660kB inactive_anon:440668kB active_file:212044kB inactive_file:313080kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:3018404kB mlocked:16kB dirty:20kB writeback:4kB mapped:65900kB shmem:56804kB slab_reclaimable:115032kB slab_unreclaimable:49992kB kernel_stack:1032kB pagetables:19476kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1828.798035] lowmem_reserve[]: 0 0 945 945
[ 1828.799295] Node 0 Normal free:20860kB min:16340kB low:20424kB high:24508kB active_anon:242084kB inactive_anon:235968kB active_file:38888kB inactive_file:47684kB unevictable:104kB isolated(anon):128kB isolated(file):0kB present:967680kB mlocked:104kB dirty:16kB writeback:0kB mapped:33496kB shmem:39304kB slab_reclaimable:48800kB slab_unreclaimable:186224kB kernel_stack:1560kB pagetables:17312kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 1828.804631] lowmem_reserve[]: 0 0 0 0
[ 1828.806021] Node 0 DMA: 1*4kB 1*8kB 2*16kB 1*32kB 2*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15820kB
[ 1828.807492] Node 0 DMA32: 129*4kB 1529*8kB 1503*16kB 1051*32kB 269*64kB 97*128kB 29*256kB 4*512kB 2*1024kB 0*2048kB 0*4096kB = 111580kB
[ 1828.808996] Node 0 Normal: 75*4kB 156*8kB 325*16kB 211*32kB 43*64kB 24*128kB 6*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20860kB
[ 1828.810530] 178689 total pagecache pages
[ 1828.812013] 1751 pages in swap cache
[ 1828.813486] Swap cache stats: add 8761, delete 7010, find 1227/1425
[ 1828.815035] Free swap  = 6102832kB
[ 1828.816510] Total swap = 6127612kB
[ 1828.833588] 1032176 pages RAM
[ 1828.835092] 52602 pages reserved
[ 1828.836561] 648834 pages shared
[ 1828.838024] 875627 pages non-shared

Comment 23 Tom London 2012-12-16 20:28:28 UTC

Per discussion on fedora-devel, I locally built xorg-x11-drv-intel-2.20.16-0.tbl.fc19.x86_64 (which includes xf86-video-intel-2.20.16.tar.bz2).

The system seems much snappier, and I have not yet been able to reproduce the hang by running my usual crashers: qemu-kvm with an image configured to use 2 cores, etc.

Will continue to stress it.

Comment 24 Tom London 2012-12-16 23:26:40 UTC

Created attachment 664593
Contents of /sys/kernel/debug/dri/0/i915_error_state

Sigh... Spoke too soon.

It took quite a bit longer to hang/crash, but it's still there.

This is with my local version of xorg-x11-drv-intel as above.

Sorry for not posting the contents of i915_error_state earlier, but I consistently get page allocation failures when I try to copy it.

'cat ...  >/tmp/i915_error_state' does seem to work.  I've attached that output.


[ 2824.139642] kvm: SMP vm created on host with unstable TSC; guest TSC will not be reliable
[ 4463.980063] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 4463.980075] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 7327.556060] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 7327.608028] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 7329.112073] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 7329.112239] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 7329.112245] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 7384.700497] gnome-shell[29134]: segfault at 230 ip 00007f2738637e9f sp 00007fff743e8650 error 4 in i965_dri.so[7f27385e5000+b4000]
[13414.182671] cp: page allocation failure: order:9, mode:0x40d0
[13414.182708] Pid: 30353, comm: cp Not tainted 3.7.0-0.rc8.git0.2.fc19.x86_64 #1
[13414.182744] Call Trace:
[13414.182764]  [<ffffffff81135779>] warn_alloc_failed+0xe9/0x150
[13414.182795]  [<ffffffff811380c6>] ? drain_local_pages+0x16/0x20
[13414.182823]  [<ffffffff8113980a>] __alloc_pages_nodemask+0x75a/0x9e0
[13414.182871]  [<ffffffff81176760>] alloc_pages_current+0xb0/0x120
[13414.182904]  [<ffffffff8113466a>] __get_free_pages+0x2a/0x80
[13414.182934]  [<ffffffff81180639>] kmalloc_order_trace+0x39/0xb0
[13414.182960]  [<ffffffff81180819>] __kmalloc+0x169/0x1a0
[13414.182983]  [<ffffffff8117fc8f>] ? kfree+0x15f/0x170
[13414.183028]  [<ffffffff811b65ae>] seq_read+0x10e/0x3b0
[13414.183058]  [<ffffffff81195779>] vfs_read+0xa9/0x180
[13414.183080]  [<ffffffff811958a2>] sys_read+0x52/0xa0
[13414.183111]  [<ffffffff816390de>] ? do_page_fault+0xe/0x10
[13414.183147]  [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
[13414.183183] Mem-Info:
[13414.183195] Node 0 DMA per-cpu:
[13414.183214] CPU    0: hi:    0, btch:   1 usd:   0
[13414.183238] CPU    1: hi:    0, btch:   1 usd:   0
[13414.183266] Node 0 DMA32 per-cpu:
[13414.183285] CPU    0: hi:  186, btch:  31 usd: 184
[13414.183310] CPU    1: hi:  186, btch:  31 usd:   0
[13414.183337] Node 0 Normal per-cpu:
[13414.183357] CPU    0: hi:  186, btch:  31 usd:  27
[13414.183381] CPU    1: hi:  186, btch:  31 usd:   0
[13414.183406] active_anon:446251 inactive_anon:189095 isolated_anon:0
 active_file:130235 inactive_file:121581 isolated_file:0
 unevictable:26 dirty:18 writeback:0 unstable:0
 free:30503 slab_reclaimable:14413 slab_unreclaimable:9910
 mapped:16361 shmem:22037 pagetables:9413 bounce:0
 free_cma:0
[13414.183537] Node 0 DMA free:15896kB min:264kB low:328kB high:396kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[13414.183684] lowmem_reserve[]: 0 2947 3892 3892
[13414.184718] Node 0 DMA32 free:80296kB min:50976kB low:63720kB high:76464kB active_anon:1560080kB inactive_anon:520188kB active_file:384936kB inactive_file:358396kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3018404kB mlocked:0kB dirty:40kB writeback:0kB mapped:40816kB shmem:32288kB slab_reclaimable:34628kB slab_unreclaimable:9992kB kernel_stack:608kB pagetables:12348kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:62 all_unreclaimable? no
[13414.188997] lowmem_reserve[]: 0 0 945 945
[13414.190138] Node 0 Normal free:25820kB min:16340kB low:20424kB high:24508kB active_anon:224924kB inactive_anon:236192kB active_file:136004kB inactive_file:127928kB unevictable:104kB isolated(anon):0kB isolated(file):0kB present:967680kB mlocked:104kB dirty:32kB writeback:0kB mapped:24628kB shmem:55860kB slab_reclaimable:23024kB slab_unreclaimable:29640kB kernel_stack:1968kB pagetables:25304kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:13 all_unreclaimable? no
[13414.195157] lowmem_reserve[]: 0 0 0 0
[13414.196530] Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
[13414.198048] Node 0 DMA32: 2108*4kB 785*8kB 487*16kB 428*32kB 309*64kB 116*128kB 27*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 80296kB
[13414.199543] Node 0 Normal: 1586*4kB 609*8kB 191*16kB 39*32kB 9*64kB 19*128kB 2*256kB 5*512kB 4*1024kB 0*2048kB 0*4096kB = 25696kB
[13414.201037] 280848 total pagecache pages
[13414.202680] 6949 pages in swap cache
[13414.204128] Swap cache stats: add 90845, delete 83896, find 6341/7853
[13414.205556] Free swap  = 5883716kB
[13414.206979] Total swap = 6127612kB
[13414.225054] 1032176 pages RAM
[13414.226485] 47517 pages reserved
[13414.227915] 756240 pages shared
[13414.229378] 763118 pages non-shared
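
The `order:9` in these page allocation failures means the `kmalloc` inside `seq_read()` asked for 2^9 = 512 physically contiguous 4 KiB pages, i.e. a single 2 MiB block. The buddy lines in each Mem-Info dump (`Node 0 <zone>: N*4kB ... = TOTALkB`) show how many free blocks of each size every zone had at that moment. Below is a minimal Python sketch for reading those lines, assuming 4 KiB base pages; `parse_buddy_line` and `can_serve_order` are hypothetical helper names. It only reports what the snapshot shows and ignores the watermark, `lowmem_reserve` and compaction logic that decide whether the kernel actually grants the request. In the dump above, for example, DMA32 still lists one 2048kB block even though the request failed.

```python
import re

# One "<count>*<size>kB" bucket from a dmesg Mem-Info buddy line.
_BUCKET = re.compile(r"(\d+)\*(\d+)kB")

PAGE_KB = 4  # assumption: 4 KiB base pages (x86_64)


def parse_buddy_line(line):
    """Parse e.g. '[...] Node 0 Normal: 75*4kB 156*8kB ... = 20860kB'.

    Returns (zone, total_kb, largest_order); largest_order is the highest
    order with at least one free block, or None if every bucket is empty.
    """
    head, _, rest = line.partition(":")
    zone = head.split()[-1]
    buckets = rest.split("=")[0]  # drop the kernel's own "= TOTALkB"
    total_kb = 0
    largest_order = None
    for count_s, size_s in _BUCKET.findall(buckets):
        count, size_kb = int(count_s), int(size_s)
        total_kb += count * size_kb
        if count > 0:
            # block size = PAGE_KB * 2**order
            order = (size_kb // PAGE_KB).bit_length() - 1
            if largest_order is None or order > largest_order:
                largest_order = order
    return zone, total_kb, largest_order


def can_serve_order(line, order=9):
    """True if the snapshot lists a free block of `order` or higher."""
    _, _, largest = parse_buddy_line(line)
    return largest is not None and largest >= order


normal = ("[13414.199543] Node 0 Normal: 1586*4kB 609*8kB 191*16kB 39*32kB "
          "9*64kB 19*128kB 2*256kB 5*512kB 4*1024kB 0*2048kB 0*4096kB = 25696kB")
print(parse_buddy_line(normal))  # ('Normal', 25696, 8)
```

The computed total can be compared with the kernel's `= TOTALkB` figure as a sanity check on the parsing.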

Comment 25 Tom London 2012-12-20 15:27:53 UTC

Created attachment 666736
tar.gz with Xorg.0.log, gdm/:0*.log, dmesg, and (zero-length) i915_error_state

This continues to hang for me.

I'm running:

xorg-x11-drv-intel-2.20.16-1.fc19.x86_64
xorg-x11-server-common-1.13.1-1.fc19.x86_64
kernel-3.7.0-0.rc8.git0.2.fc19.x86_64

I only get this crash when the system is under "significant" memory/system load. In this case, I was running qemu-kvm with a Win7 guest configured to use 2 cores. (I believe my host only has 1 core/2 hyperthreads.)

I also cannot get the contents of /sys/kernel/debug/dri/0/i915_error_state: the kernel logs a "page allocation failure" each time I try to cat or cp this file.

Is there more I can provide? More I can do to help here?

Is there a better place for me to file this?


Here is snippet from dmesg output:

[ 3426.304055] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3426.304067] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 3432.328034] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3432.380064] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 3433.936058] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 3433.936249] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 3433.936254] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 3472.579861] gnome-shell[5913]: segfault at 230 ip 00007f6e4b157e9f sp 00007fffdc9b4580 error 4 in i965_dri.so[7f6e4b105000+b4000]
[ 3514.830427] cat: page allocation failure: order:9, mode:0x40d0
[ 3514.831306] Pid: 5994, comm: cat Not tainted 3.7.0-0.rc8.git0.2.fc19.x86_64 #1
[ 3514.832301] Call Trace:
[ 3514.833254]  [<ffffffff81135779>] warn_alloc_failed+0xe9/0x150
[ 3514.834220]  [<ffffffff811380c6>] ? drain_local_pages+0x16/0x20
[ 3514.835157]  [<ffffffff8113980a>] __alloc_pages_nodemask+0x75a/0x9e0
[ 3514.836440]  [<ffffffff81176760>] alloc_pages_current+0xb0/0x120
[ 3514.837701]  [<ffffffff8113466a>] __get_free_pages+0x2a/0x80
[ 3514.838925]  [<ffffffff81180639>] kmalloc_order_trace+0x39/0xb0
[ 3514.840092]  [<ffffffff81180819>] __kmalloc+0x169/0x1a0
[ 3514.841177]  [<ffffffff8117fc8f>] ? kfree+0x15f/0x170
[ 3514.842477]  [<ffffffff811b65ae>] seq_read+0x10e/0x3b0
[ 3514.843639]  [<ffffffff81195779>] vfs_read+0xa9/0x180
[ 3514.844796]  [<ffffffff811958a2>] sys_read+0x52/0xa0
[ 3514.845915]  [<ffffffff816390de>] ? do_page_fault+0xe/0x10
[ 3514.847032]  [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
[ 3514.848155] Mem-Info:
[ 3514.849275] Node 0 DMA per-cpu:
[ 3514.850367] CPU    0: hi:    0, btch:   1 usd:   0
[ 3514.851480] CPU    1: hi:    0, btch:   1 usd:   0
[ 3514.852559] Node 0 DMA32 per-cpu:
[ 3514.853646] CPU    0: hi:  186, btch:  31 usd:   3
[ 3514.854748] CPU    1: hi:  186, btch:  31 usd:   0
[ 3514.855644] Node 0 Normal per-cpu:
[ 3514.856483] CPU    0: hi:  186, btch:  31 usd:  51
[ 3514.857346] CPU    1: hi:  186, btch:  31 usd:   0
[ 3514.858163] active_anon:444404 inactive_anon:188842 isolated_anon:0
 active_file:113299 inactive_file:120543 isolated_file:0
 unevictable:30 dirty:243 writeback:0 unstable:0
 free:51257 slab_reclaimable:20840 slab_unreclaimable:10403
 mapped:25385 shmem:25602 pagetables:8953 bounce:0
 free_cma:0
[ 3514.863058] Node 0 DMA free:15896kB min:264kB low:328kB high:396kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15648kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 3514.865772] lowmem_reserve[]: 0 2947 3892 3892
[ 3514.866736] Node 0 DMA32 free:159076kB min:50976kB low:63720kB high:76464kB active_anon:1465380kB inactive_anon:442856kB active_file:396024kB inactive_file:425680kB unevictable:16kB isolated(anon):0kB isolated(file):0kB present:3018404kB mlocked:16kB dirty:872kB writeback:0kB mapped:76224kB shmem:31964kB slab_reclaimable:58536kB slab_unreclaimable:13004kB kernel_stack:792kB pagetables:14416kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 3514.871201] lowmem_reserve[]: 0 0 945 945
[ 3514.872414] Node 0 Normal free:30056kB min:16340kB low:20424kB high:24508kB active_anon:312236kB inactive_anon:312512kB active_file:57172kB inactive_file:56492kB unevictable:104kB isolated(anon):0kB isolated(file):0kB present:967680kB mlocked:104kB dirty:100kB writeback:0kB mapped:25316kB shmem:70444kB slab_reclaimable:24824kB slab_unreclaimable:28600kB kernel_stack:1824kB pagetables:21396kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 3514.877714] lowmem_reserve[]: 0 0 0 0
[ 3514.879101] Node 0 DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
[ 3514.880571] Node 0 DMA32: 6167*4kB 2566*8kB 1586*16kB 692*32kB 537*64kB 174*128kB 18*256kB 4*512kB 1*1024kB 1*2048kB 0*4096kB = 159084kB
[ 3514.882079] Node 0 Normal: 1616*4kB 433*8kB 80*16kB 27*32kB 51*64kB 41*128kB 5*256kB 4*512kB 4*1024kB 1*2048kB 0*4096kB = 30056kB
[ 3514.883633] 261387 total pagecache pages
[ 3514.885119] 1936 pages in swap cache
[ 3514.886584] Swap cache stats: add 19278, delete 17342, find 2112/2506
[ 3514.888081] Free swap  = 6081440kB
[ 3514.889550] Total swap = 6127612kB
[ 3514.907123] 1032176 pages RAM
[ 3514.908601] 47517 pages reserved
[ 3514.910070] 674733 pages shared
[ 3514.911509] 839493 pages non-shared
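
Most of the reports in this bug show the same kernel-side sequence: hangcheck fires, the render ring fails to reinitialize, `i915_reset` gives up with "declaring wedged", and gnome-shell then segfaults in `i965_dri.so`, sometimes followed by the order-9 allocation failure when the error state is read. Below is a minimal Python sketch for pulling that sequence, with timestamps, out of a saved dmesg or /var/log/messages. The regexes simply match the message text quoted in this bug; `scan_dmesg` and the event labels are hypothetical names.

```python
import re

# "[ 3426.304055] message"; search() also finds it inside syslog lines.
_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)$")

# Signature messages that recur in this bug (labels are arbitrary).
_EVENTS = [
    ("hang", re.compile(r"i915_hangcheck_hung.*GPU hung")),
    ("ring_init_failed", re.compile(r"render ring initialization failed")),
    ("wedged", re.compile(r"GPU hanging too fast, declaring wedged")),
    ("dri_segfault", re.compile(r"segfault at .* in i965_dri\.so")),
    ("alloc_failure", re.compile(r"page allocation failure: order:\d+")),
]


def scan_dmesg(text):
    """Return [(timestamp_seconds, label), ...] for signature lines in `text`."""
    events = []
    for raw in text.splitlines():
        m = _LINE.search(raw)
        if not m:
            continue  # stack frames like "[<ffffffff...>]" have no "s.us" stamp
        ts, msg = float(m.group(1)), m.group(2)
        for label, pattern in _EVENTS:
            if pattern.search(msg):
                events.append((ts, label))
                break
    return events
```

Comparing the first "hang" timestamp with the "wedged" one gives the few seconds the driver spent trying to recover before giving up.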

Comment 26 Tom London 2012-12-22 19:08:24 UTC

Created attachment 667774
Another tar.gz file containing i915_error_state dmesg, Xorg.0.log, gdm/0:*.log

Got another hang/crash. Here is the tail of 'dmesg':

[ 7747.352021] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 7747.352027] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 7753.372040] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 7753.423049] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[ 7754.960065] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 7754.960165] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[ 7754.960167] [drm:i915_reset] *ERROR* Failed to reset chip.
[ 7757.992866] gnome-shell[4677]: segfault at 230 ip 00007f90c3b9289f sp 00007fff19c0e6a0 error 4 in i965_dri.so[7f90c3b40000+b3000]

This time the system successfully grabbed /sys/kernel/debug/dri/0/i915_error_state. It is included in the attached tar.gz file.

I had just started 'digikam', and it was churning the disk while generating its database of my Pictures directory.

System was running:
xorg-x11-drv-intel-2.20.16-1.fc19.x86_64
xorg-x11-server-Xorg-1.13.1-1.fc19.x86_64
kernel-3.7.1-1.fc19.x86_64

More I can provide/test?

Comment 27 Tom London 2012-12-22 19:29:31 UTC

Could this be related: https://bugs.freedesktop.org/show_bug.cgi?id=57136

Any value in me attempting to build a kernel with the referenced patch?

Comment 28 Tom London 2012-12-24 16:25:10 UTC

As posted on https://bugs.freedesktop.org/show_bug.cgi?id=57136, I've built a couple of local kernels with the proposed patch, but the problem persists.

It appears to be related to heavy disk traffic, memory pressure, ...

I believe upstream is treating this as a kernel issue.

Comment 29 Tom London 2013-01-05 18:59:51 UTC

Updated to xorg-x11-drv-intel-2.20.17-1.fc19.x86_64, reran my "disk load" test ("cat bigfiles >/dev/null"), and waited.
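
The "disk load" test here just streams large files through the page cache and throws the data away. For anyone trying to reproduce it from a script rather than a shell, below is a minimal Python equivalent of `cat bigfiles >/dev/null`. The helper name `stream_files` and the 1 MiB chunk size are arbitrary choices, not something taken from this bug; like `cat`, ordinary buffered reads go through the page cache, which is what creates the memory pressure.

```python
import sys


def stream_files(paths, chunk_size=1 << 20):
    """Read every file in `paths` to EOF and discard the data.

    Rough equivalent of `cat FILE... >/dev/null`; returns total bytes read.
    """
    total = 0
    for path in paths:
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break  # EOF
                total += len(chunk)
    return total


if __name__ == "__main__":
    # e.g. python3 stream.py /path/to/bigfile1 /path/to/bigfile2
    print(stream_files(sys.argv[1:]))
```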

Within about 2 minutes gdm/Xorg hard crashed, the screen was black, and the system was unresponsive to the usual keyboard entries (i.e., ctrl-alt-F2, ctrl-alt-bksp, ctrl-alt-delete).

I did not get the "gdm Ooops something has gone wrong" screen.

I had to hard power reset the system.

After rebooting, I see this in /var/log/messages:


Jan  5 10:27:59 tlondon kernel: [ 2017.404040] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan  5 10:27:59 tlondon kernel: [ 2017.404047] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jan  5 10:28:05 tlondon kernel: [ 2023.424023] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan  5 10:28:05 tlondon kernel: [ 2023.475044] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
Jan  5 10:28:06 tlondon kernel: [ 2025.140021] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan  5 10:28:06 tlondon kernel: [ 2025.140106] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jan  5 10:28:06 tlondon kernel: [ 2025.140108] [drm:i915_reset] *ERROR* Failed to reset chip.
Jan  5 10:28:07 tlondon kernel: [ 2025.214077] ------------[ cut here ]------------
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3476!
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] invalid opcode: 0000 [#1] SMP 
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] Modules linked in: fuse(F) ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) ipt_MASQUERADE(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) xt_CHECKSUM(F) iptable_mangle(F) bridge(F) stp(F) llc(F) lockd(F) sunrpc(F) snd_usb_audio(F) snd_hda_codec_conexant(F) snd_usbmidi_lib(F) arc4(F) iwldvm(F) snd_hda_intel(F) snd_hda_codec(F) uvcvideo(F) snd_hwdep(F) snd_rawmidi(F) snd_seq(F) snd_seq_device(F) mac80211(F) videobuf2_vmalloc(F) videobuf2_memops(F) videobuf2_core(F) videodev(F) snd_pcm(F) thinkpad_acpi(F) iwlwifi(F) snd_page_alloc(F) media(F) snd_timer(F) snd(F) cfg80211(F) soundcore(F) e1000e(F) btusb(F) iTCO_wdt(F) bluetooth(F) coretemp(F) iTCO_vendor_support(F) mei(F) tpm_tis(F) tpm(F) lpc_ich(F) rfkill(F) mfd_core(F) i2c_i801(F) tpm_bios(F) microcode(F) vhost_net(F) tun(F) macvtap(F) macvlan(F) kvm_intel(F) kvm(F) binfmt_misc(F) uinput(F) i915(F) i2c_algo_bit(F) drm_kms_helper(F) drm(F) i2c_core(F) wmi(F) video(F)
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] CPU 0 
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] Pid: 660, comm: Xorg Tainted: GF            3.7.1-1.local2.fc19.x86_64 #1 LENOVO 74585FU/74585FU
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] RIP: 0010:[<ffffffffa009c847>]  [<ffffffffa009c847>] i915_gem_object_unpin+0x47/0x50 [i915]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] RSP: 0018:ffff880134be7938  EFLAGS: 00010246
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] RAX: ffff880130a78000 RBX: ffff880130da3800 RCX: 0000000000000000
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] RDX: 0000000000000002 RSI: 0000000000070008 RDI: ffff8801262db400
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] RBP: ffff880134be7938 R08: 0000000000000030 R09: 0000000000000006
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880130da0800
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] R13: ffff880130da0820 R14: 0000000000000000 R15: ffff880130da0800
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] FS:  00007fc5f1d5f940(0000) GS:ffff88013bc00000(0000) knlGS:0000000000000000
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] CR2: 00000000008054bc CR3: 0000000130822000 CR4: 00000000000007f0
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] Process Xorg (pid: 660, threadinfo ffff880134be6000, task ffff880130964560)
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] Stack:
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  ffff880134be7948 ffffffffa00adf5e ffff880134be7978 ffffffffa00b17e6
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  ffff8801338497d8 ffff880130da3800 0000000000000001 ffff880130da0c50
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  ffff880134be7c08 ffffffffa00b43d2 ffff880100000001 000000008121ac18
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] Call Trace:
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa00adf5e>] intel_unpin_fb_obj+0x3e/0x40 [i915]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa00b17e6>] intel_crtc_disable+0x96/0x130 [i915]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa00b43d2>] intel_set_mode+0x262/0xa50 [i915]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff8121d26c>] ? ext4_dirty_inode+0x3c/0x60
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff8125b182>] ? jbd2_journal_stop+0x1b2/0x2a0
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff81237dc6>] ? __ext4_journal_stop+0x76/0xa0
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff8121badd>] ? ext4_da_write_end+0x9d/0x350
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff812f1a31>] ? vsnprintf+0x461/0x600
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff812f1c74>] ? snprintf+0x34/0x40
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa00b4d11>] ? intel_crtc_set_config+0x151/0x970 [i915]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa00b52d6>] intel_crtc_set_config+0x716/0x970 [i915]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff81633af6>] ? __schedule+0x3c6/0x7a0
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa0037286>] drm_framebuffer_remove+0xc6/0x150 [drm]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa003ac75>] drm_mode_rmfb+0xd5/0xe0 [drm]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa002a4a3>] drm_ioctl+0x4d3/0x580 [drm]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff811d3402>] ? send_to_group+0x182/0x250
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffffa003aba0>] ? drm_mode_addfb2+0x6d0/0x6d0 [drm]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff811d372f>] ? fsnotify+0x25f/0x340
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff811a6649>] do_vfs_ioctl+0x99/0x580
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff8128b94a>] ? inode_has_perm.isra.31.constprop.61+0x2a/0x30
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff8128cd17>] ? file_has_perm+0x97/0xb0
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff811a6bc1>] sys_ioctl+0x91/0xb0
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff810dc8cc>] ? __audit_syscall_exit+0x3ec/0x450
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  [<ffffffff8163d9d9>] system_call_fastpath+0x16/0x1b
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] Code: 00 74 2a 89 d0 83 e2 0f c0 e8 04 83 e8 01 83 e0 0f 89 c1 c1 e1 04 09 ca 84 c0 88 97 e9 00 00 00 75 07 80 a7 ea 00 00 00 fb 5d c3 <0f> 0b 0f 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 
Jan  5 10:28:07 tlondon kernel: [ 2025.215017] RIP  [<ffffffffa009c847>] i915_gem_object_unpin+0x47/0x50 [i915]
Jan  5 10:28:07 tlondon kernel: [ 2025.215017]  RSP <ffff880134be7938>

Comment 30 Tom London 2013-01-05 21:27:11 UTC

This just popped up in dmesg:


[10213.840108] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[10213.841102] i915: render error detected, EIR: 0x00000010
[10213.841102] i915:   IPEIR: 0x00000000
[10213.841102] i915:   IPEHR: 0x69040000
[10213.841102] i915:   INSTDONE_0: 0xffffffff
[10213.841102] i915:   INSTDONE_1: 0xbfbbffff
[10213.841102] i915:   INSTDONE_2: 0x00000000
[10213.841102] i915:   INSTDONE_3: 0x00000000
[10213.841102] i915:   INSTPS: 0x8001e025
[10213.841102] i915:   ACTHD: 0x055b608c
[10213.841102] i915: page table error
[10213.841102] i915:   PGTBL_ER: 0x00000001
[10213.841102] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking


i915_error_state was empty:

[root@tlondon dri]# ls -l i915_error_state 
-rw-r--r--. 1 root root 0 Jan  5 13:25 i915_error_state
[root@tlondon dri]#

Comment 31 Tom London 2013-01-08 14:49:13 UTC

Created attachment 674875
tar.gz containing dmesg, Xorg.0.log, i915_error_state, etc.

Hang/crash continues with kernel-3.8.0-0.rc2.git2.2.fc19.x86_64 and xorg-x11-drv-intel-2.20.17-1.fc19.x86_64.


[  368.708039] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  368.708047] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  376.708382] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  376.759026] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 00003000 tail 00000000 start 00003000
[  378.704039] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  378.704541] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[  378.704543] [drm:i915_reset] *ERROR* Failed to reset chip.
[  403.384220] gnome-shell[1981]: segfault at 230 ip 00007f40b0fb989f sp 00007fff1db58130 error 4 in i965_dri.so[7f40b0f67000+b3000]

Here are the first 20 lines of i915_error_state:

Time: 1357655991 s 435476 us
PCI ID: 0x2a42
EIR: 0x00000000
IER: 0x02028c53
PGTBL_ER: 0x00000000
CCID: 0x00000000
  fence[0] = 00000000
  fence[1] = 00000000
  fence[2] = 00000000
  fence[3] = 591e0000511f0dd
  fence[4] = 00000000
  fence[5] = 00000000
  fence[6] = 00000000
  fence[7] = 00000000
  fence[8] = 00000000
  fence[9] = 00000000
  fence[10] = 00000000
  fence[11] = 00000000
  fence[12] = 00000000
  fence[13] = 00000000
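
The only non-zero fence in this header, fence[3], is a 64-bit register value printed without zero padding. Below is a minimal Python sketch for decoding it, assuming the gen4 (i965/G4X) fence layout programmed by `i965_write_fence_reg()` in 3.x kernels. I believe that layout applies to this chipset (PCI ID 0x2a42), but it is an assumption to check against the kernel source, not something stated in this bug; `parse_fences` and `decode_i965_fence` are hypothetical helpers.

```python
# Assumption: gen4 (i965/G4X) 64-bit fence layout, as written by
# i965_write_fence_reg() in 3.x kernels:
#   low dword : bits 31:12 start address, bits 11:2 (pitch/128 - 1),
#               bit 1 Y-tiling, bit 0 valid
#   high dword: bits 31:12 address of the last 4 KiB page in the fence
FENCE_VALID = 1 << 0
FENCE_TILING_Y_SHIFT = 1
FENCE_PITCH_SHIFT = 2


def parse_fences(text):
    """Return {index: value} for the non-zero 'fence[N] = HEX' lines."""
    fences = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("fence["):
            continue
        name, _, value = line.partition("=")
        index = int(name.strip()[len("fence["):-1])
        val = int(value.strip(), 16)
        if val:
            fences[index] = val
    return fences


def decode_i965_fence(val):
    """Split one 64-bit fence value into fields (assumed layout above)."""
    lo, hi = val & 0xFFFFFFFF, val >> 32
    start = lo & 0xFFFFF000
    last_page = hi & 0xFFFFF000
    pitch_field = (lo >> FENCE_PITCH_SHIFT) & 0x3FF
    return {
        "valid": bool(lo & FENCE_VALID),
        "tiling": "Y" if (lo >> FENCE_TILING_Y_SHIFT) & 1 else "X",
        "start": start,
        "size": last_page + 0x1000 - start,  # end page is inclusive
        "stride": (pitch_field + 1) * 128,
    }


header = "  fence[2] = 00000000\n  fence[3] = 591e0000511f0dd\n"
for idx, val in parse_fences(header).items():
    d = decode_i965_fence(val)
    print(idx, d["tiling"], hex(d["start"]), d["size"], d["stride"])
# prints: 3 X 0x511f000 8388608 7168
```

Under that assumed layout, fence[3] describes a valid, X-tiled 8 MiB region starting at GTT offset 0x511f000 with a 7168-byte stride.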


I've attached a tar.gz containing the dmesg output, /var/log/gdm/*.log, Xorg.0.log and i915_error_state.

More I can do?

Comment 32 Tom London 2013-01-13 16:50:25 UTC

As posted on https://bugs.freedesktop.org/show_bug.cgi?id=57136 there is an updated intel-drm-fixes patch that "works for me".

Comment 33 Tom London 2013-01-16 14:57:19 UTC

"Works for me" with kernel-3.8.0-0.rc3.git1.2.fc19.x86_64.

System is now stable with no graphical/GPU hangs/crashes.

Close?

Here is the output of 'vmstat 10' while I was running my "crasher" ('cat 42GB-files >/dev/null'):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  1      0 1852848 130352 821272    0    0   529    75  671 1160 16  4 66 14
 0  2      0 1147996 130368 1524040    0    0 70442   143 1177 2143  7  4 38 50
 0  1      0 431280 130388 2238688    0    0 71458     6 1129 2050  8  4 47 41
 1  3      0 151168  36672 2616048    0    0 81748     1 1393 2173  8  6 53 33
 0  2      0 143580  36672 2629852    0    0 85529     5 1676 2244  8  6 54 31
 0  1      0 147388  36836 2630740    0    0 83891     0 1445 2309  8  6 48 37
 0  1      0 147620  36740 2635532    0    0 85707     2 1384 2308  8  6 49 36
 0  2      0 146096  36732 2642012    0    0 87822     0 1371 2251  8  6 50 36
 0  2      0 150412  36724 2642648    0    0 84706     2 1384 2229  7  6 46 41
 1  2      0 147460  36720 2648628    0    0 94431     0 1408 2372  8  6 45 41
 0  1      0 141492  36672 2654760    0    0 102380     1 1493 2375  8  8 51 33
 0  2      0 150200  36728 2645664    0    0 89871   148 1495 2340 11  6 36 47
 0  2      0 146064  36776 2658700    0    0 95709    44 1739 2957 15  7 36 42
 1  0      0 146684  36744 2672228    0    0 105561    23 1554 2426  8  7 45 39
 0  2      0 151012  36744 2666768    0    0 104404    53 1482 2407  8  7 54 30
 0  1      0 150700  36756 2666620    0    0 93882    20 1525 2406  9  8 53 31
 1  0      0 145636  36748 2671992    0    0 100744     3 1478 2391  8  7 48 36
 1  2      0 150644  36744 2664192    0    0 94840     9 1467 2389  8  7 47 38
 0  2      0 143776  36752 2672888    0    0 93978     2 1422 2478  7  6 27 60
 0  2      0 150496  36760 2665064    0    0 78910     1 1429 2505  7  5 27 61
 2  1      0 148284  37056 2658988    0    0 73410    26 1706 3397 10  8 24 58
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  2      0 146576  37184 2638560    0    0 65906     8 1911 3643 15  9 18 58
 1  2      0 146624  37136 2658168    0    0 72342    36 1981 4086 14 10 22 54
 0  3      0 145204  37144 2656428    0    0 72634    23 1578 3020 11  6 25 59
 1  2      0 148944  37132 2640172    0    0 63786   212 1687 3123 13  6 19 62
 0  2      0 150448  37324 2655008    0    0 67948    37 1925 3646 14  9 21 56
 1  2      0 147208  37592 2659720    0    0 70613     2 1499 2713  8  6 26 61
 0  3      0 145172  37540 2660880    0    0 71466  1201 1512 2611 10  7 26 57
 0  2      0 148232  37592 2654312    0    0 68422   126 1488 2691  9  6 29 57
 0  2      0 146068  37732 2657372    0    0 63763     8 1416 2368  7  4 28 61
 1  3      0 149812  37752 2644272    0    0 64321    26 1573 2715 10  6 24 60
 0  2      0 149136  37716 2639300    0    0 68466   185 1872 2844 17  6 21 56
 0  2      0 147104  37828 2655020    0    0 66119     4 1626 2881 10  5 26 59
 1  2      0 145264  37672 2651160    0    0 67910    16 1373 2395  8  5 28 59
 0  2      0 152316  37528 2639272    0    0 67469   146 1930 3998 13 10 22 56
 0  2      0 141692  37552 2652232    0    0 66947     4 1402 2483  8  5 27 61
 0  2      0 149348  37516 2641464    0    0 69679    16 1389 2468  8  5 27 61
 1  2      0 148852  37448 2640848    0    0 68531    20 1678 3021 10  7 23 59
 0  2      0 146760  37444 2642788    0    0 68535     6 1450 2591  8  5 28 59
 0  2      0 151068  37492 2637484    0    0 68310    16 1483 2661  8  6 26 59
 1  2      0 129352  37464 2658936    0    0 72455    20 1307 2214  7  5 28 60
 0  2      0 145968  38244 2641356    0    0 70990     7 1470 2492  8  7 28 57
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  1      0 148588  38268 2639224    0    0 72073     4 1277 2173  7  4 27 62
 1  2      0 144888  38244 2641832    0    0 71859     0 1549 2807  9  7 27 58
 0  2      0 150864  38280 2633736    0    0 70260    26 1482 2742  9  6 26 59
 1  2      0 151960  38292 2634240    0    0 67687    18 1370 2291  8  5 27 60
 0  2      0 149004  38244 2637348    0    0 70388     4 1256 2157  6  4 28 61
 1  2      0 147892  38264 2637420    0    0 66363    30 1267 2160  7  4 26 63
 0  2      0 145828  37624 2640700    0    0 66344     3 1270 2107  6  5 28 60
 1  1      0 135580  38580 2648904    0    0 37212     0 2134 2321  7 34 14 45
 4  0      0 165432  38504 2622408    0    0    61     2  858 1437  9  3 87  1
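To compare the 10-second intervals above at a glance, a short script can average selected columns. This is a minimal sketch and not part of the original report: the function name `summarize_vmstat` and its parameters are hypothetical, and it assumes the default `vmstat` column layout shown above (r b swpd free buff cache si so bi bo in cs us sy id wa). It skips the header lines that vmstat reprints periodically and, by default, drops the first data row, because vmstat's first report is an average since boot rather than a measurement of the interval.

```python
def summarize_vmstat(text, columns=("free", "cache", "bi", "wa"), skip_first=True):
    """Return (sample_count, {column: mean}) for plain `vmstat N` output.

    Hypothetical helper; assumes the default vmstat column names.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line or the group-label line (procs/memory/swap/...)
        if fields[0] == "r":
            header = fields  # column-name line; vmstat repeats it periodically
            continue
        if header is None or len(fields) != len(header):
            continue  # ignore anything that does not look like a data row
        rows.append(dict(zip(header, (int(f) for f in fields))))
    if skip_first:
        rows = rows[1:]  # first report is an average since boot, not an interval
    if not rows:
        return 0, {}
    means = {c: sum(r[c] for r in rows) / len(rows) for c in columns}
    return len(rows), means
```

For example, saving the log above to a file and passing its contents to `summarize_vmstat` returns the number of samples together with the mean free memory, page cache, blocks read in (`bi`) and I/O-wait percentage (`wa`) across the run.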

Comment 34 Fedora End Of Life 2013-04-03 13:34:37 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 35 Fedora End Of Life 2015-01-09 17:28:55 UTC

This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 reached end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 36 Fedora End Of Life 2015-02-17 14:34:20 UTC

Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
