214635 – dom0: panic w/X on i965 (appears to be agpgart?)

Summary: dom0: panic w/X on i965 (appears to be agpgart?)

Keywords:
Status:	CLOSED DUPLICATE of bug 217715
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.0
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Rik van Riel
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	214287 214650
Depends On:
Blocks:	217715

Reported:	2006-11-08 18:43 UTC by Bill Nottingham
Modified:	2014-03-17 03:03 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-01-10 20:26:17 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description Bill Nottingham 2006-11-08 18:43:20 UTC

Description of problem:

Running X on i965 under dom0 often panics the kernel, which then spontaneously
reboots the machine.

Operations that can cause this:

1) starting X (has happened in gdm)
2) running a GL app
3) typing in a terminal (!)

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/mm/pageattr-xen.c:309
invalid opcode: 0000 [1] SMP 
last sysfs file: /class/drm/card0/dev
CPU 1 
Modules linked in: nfs lockd fscache nfs_acl i915 drm netconsole bridge netloop
netbk blktap blkbk autofs4 hidp rfcomm l2cap bluetooth sunrpc
ip_conntrack_netbios_ns ipt_REJECT iptable_filter ip_tables xt_state
ip_conntrack nfnetlink xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 video
sbs i2c_ec button battery asus_acpi ac parport_pc lp parport intel_rng
snd_hda_intel snd_hda_codec sr_mod cdrom snd_seq_dummy snd_seq_oss sg
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm
i2c_i801 e1000 i2c_core pcspkr serio_raw serial_core snd_timer snd soundcore
shpchp snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod
scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 3050, comm: Xorg Not tainted 2.6.18-1.2745.el5xen #1
RIP: e030:[<ffffffff8024f677>]  [<ffffffff8024f677>] __change_page_attr+0x78c/0xa5e
RSP: e02b:ffff88006ec67dd8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88000102b000 RCX: 0000000000000023
RDX: 7fffffffffffffff RSI: 0000000000000067 RDI: ffff880000000000
RBP: 0000000000000000 R08: ffff880001418968 R09: 0000000000000001
R10: 0000000000007f00 R11: 8000000000000063 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff80201880
FS:  00002aaaac62afb0(0000) GS:ffffffff8058d080(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process Xorg (pid: 3050, threadinfo ffff88006ec66000, task ffff8800607d0860)
Stack:  ffffffff8027549e  8000000000000063  000000000005e846  ffffffff8026073d 
 0000000080000000  ffffffff802601d0  0000000000000000  0000000000000000 
 ffff880000000000  0000000000000000 
Call Trace:
 [<ffffffff8027549e>] flush_kernel_map+0x0/0x11
 [<ffffffff8026073d>] _spin_lock_irq+0x9/0x14
 [<ffffffff802601d0>] __down_write_nested+0x35/0x9a
 [<ffffffff80275b68>] change_page_attr_addr+0x91/0x11a
 [<ffffffff80381250>] agp_generic_destroy_page+0x4e/0x7a
 [<ffffffff80381130>] agp_free_memory+0x65/0x90
 [<ffffffff80380347>] agp_release+0x9f/0x18a
 [<ffffffff802124e1>] __fput+0xae/0x198
 [<ffffffff80223669>] filp_close+0x5c/0x64
 [<ffffffff8021d22a>] sys_close+0x88/0xa2
 [<ffffffff8025c65d>] tracesys+0xa7/0xb2


Code: 0f 0b 68 bd 77 46 80 c2 35 01 48 ff c8 49 89 40 10 eb 0a 0f 
RIP  [<ffffffff8024f677>] __change_page_attr+0x78c/0xa5e
 RSP <ffff88006ec67dd8>
 <0>Kernel panic - not syncing: Fatal exception
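
The "Code:" bytes can be cross-checked against the "Kernel BUG at ..." line. A minimal Python sketch, assuming the 2.6.18-era x86-64 BUG() encoding as we understand it (`ud2`, then `pushq $<filename address>`, then `ret $<line>`), pulls the filename pointer and line number out of the dump. The function name `decode_bug_frame` is hypothetical and not part of any kernel tool.

```python
import struct
from typing import Tuple


def decode_bug_frame(code_hex: str) -> Tuple[int, int]:
    """Return (filename_address, line) from an oops 'Code:' dump that
    starts at a BUG() site encoded as ud2 / pushq imm32 / ret imm16
    (assumed layout)."""
    raw = bytes.fromhex(code_hex)  # whitespace between byte pairs is ignored
    # Offsets: 0-1 ud2 (0f 0b), 2 pushq opcode (0x68), 3-6 imm32,
    # 7 ret-imm16 opcode (0xc2), 8-9 imm16.
    if len(raw) < 10 or raw[0:2] != b"\x0f\x0b" or raw[2] != 0x68 or raw[7] != 0xC2:
        raise ValueError("not a ud2/pushq/ret BUG() frame")
    (file_ptr,) = struct.unpack_from("<i", raw, 3)  # signed 32-bit, sign-extended by pushq
    (line,) = struct.unpack_from("<H", raw, 8)      # unsigned 16-bit line number
    # Show the pointer as the 64-bit kernel address it sign-extends to.
    return file_ptr & 0xFFFFFFFFFFFFFFFF, line


addr, line = decode_bug_frame("0f 0b 68 bd 77 46 80 c2 35 01 48 ff c8 49 89 40 10 eb 0a 0f")
print(hex(addr), line)
```

For the dump above this yields line 309, matching `arch/x86_64/mm/pageattr-xen.c:309`; the dump in Comment 6 decodes the same way to line 587, matching `mm/rmap.c:587`.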

Version-Release number of selected component (if applicable):

2.6.18-1.2745.el5xen

How reproducible:

Often. Basically, X + Xen is unusable.

Steps to Reproduce:
1. Install the tree.
2. Boot Xen domain0.
3. Start X.
4. Use X for a while, restart X, or just wait.

Comment 1 Bill Nottingham 2006-11-08 19:30:03 UTC

Fixing the subject, as Xen reboots on panic by default.

Comment 2 Adam Jackson 2006-11-08 19:34:08 UTC

As a wild guess, agp_release isn't paranoid enough to nop away double-frees or
frees of invalid regions.

Comment 3 Bill Nottingham 2006-11-09 16:50:51 UTC

A kernel built with most CONFIG_DEBUG_* options enabled yields:

Xorg: Corrupted page table at address a7f300
PGD 6706e067 PUD 695bf067 PMD 5d39c067 PTE 74992fff
Bad pagetable: 000f [1] SMP 
last sysfs file: /class/drm/card0/dev
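
The "000f" after "Bad pagetable" is the x86-64 page-fault error code, and the PGD/PUD/PMD/PTE values are raw page-table entries. A minimal Python sketch, assuming the standard x86-64 bit layouts, splits both into named bits. The flag names are illustrative (loosely following the kernel's PF_* and _PAGE_* naming) and the helper functions are hypothetical.

```python
from typing import List, Tuple

# Page-fault error-code bits (x86-64).
PF_BITS = [
    (1 << 0, "PROT"),   # 1 = protection violation, 0 = page not present
    (1 << 1, "WRITE"),  # 1 = write access, 0 = read
    (1 << 2, "USER"),   # 1 = fault taken in user mode
    (1 << 3, "RSVD"),   # 1 = reserved bit set in a paging-structure entry
    (1 << 4, "INSTR"),  # 1 = instruction fetch
]

# Low 12 bits of a last-level 4 KiB PTE. In upper-level entries bit 7 is PS,
# not PAT, so the "PAT" label only applies to the final PTE.
PTE_BITS = [
    (1 << 0, "PRESENT"), (1 << 1, "RW"), (1 << 2, "USER"), (1 << 3, "PWT"),
    (1 << 4, "PCD"), (1 << 5, "ACCESSED"), (1 << 6, "DIRTY"), (1 << 7, "PAT"),
    (1 << 8, "GLOBAL"), (1 << 9, "AVL9"), (1 << 10, "AVL10"), (1 << 11, "AVL11"),
]


def decode_error_code(code: int) -> List[str]:
    """Names of the error-code bits that are set."""
    return [name for bit, name in PF_BITS if code & bit]


def decode_pte(entry: int) -> Tuple[int, List[str]]:
    """Split an entry into its frame number (bits 12..51) and low flag bits.
    NX (bit 63) and bits 52..62 are ignored in this sketch."""
    frame = (entry >> 12) & ((1 << 40) - 1)
    flags = [name for bit, name in PTE_BITS if entry & bit]
    return frame, flags


print(decode_error_code(0x000F))
print(decode_pte(0x74992FFF))
```

Applied to the values above, `decode_error_code(0x000f)` gives `['PROT', 'WRITE', 'USER', 'RSVD']`. `decode_pte(0x74992fff)` gives frame `0x74992` with all twelve low bits set, whereas the upper-level entries (e.g. `0x6706e067`) have only PRESENT, RW, USER, ACCESSED and DIRTY.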

Comment 4 Bill Nottingham 2006-11-09 20:25:25 UTC

*** Bug 214287 has been marked as a duplicate of this bug. ***

Comment 5 Bill Nottingham 2006-11-09 20:27:10 UTC

An interesting data point (though I have no idea what it means):

If you boot into runlevel 5, this happens with the first X instance (if you're
using GL).

If you boot into runlevel 3 and run 'telinit 5', it doesn't crash until the
second or third time X is started.

Comment 6 Bill Nottingham 2006-11-09 21:20:05 UTC

And here's one with DRI disabled:

swap_free: Bad swap offset entry 1000000000000
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/rmap.c:587
invalid opcode: 0000 [1] SMP 
last sysfs file: /class/xen/blktap0/dev
CPU 0 
Modules linked in: nfs lockd fscache nfs_acl xt_physdev x_tables netconsole
bridge netloop netbk blktap blkbk autofs4 hidp rfcomm l2cap bluetooth sunrpc
ipv6 video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport
intel_rng snd_hda_intel snd_hda_codec sr_mod cdrom snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sg snd_pcm
snd_timer snd soundcore snd_page_alloc i2c_i801 e1000 serial_core i2c_core
shpchp pcspkr serio_raw dm_snapshot dm_zero dm_mirror dm_mod ahci libata sd_mod
scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 3806, comm: python Not tainted 2.6.18-1.2746.el5xen #1
RIP: e030:[<ffffffff8020adfa>]  [<ffffffff8020adfa>] page_remove_rmap+0x13/0x2c
RSP: e02b:ffff880039dc9c40  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffff880001a165b8 RCX: 030000001f1d1d1d
RDX: 0000000000000000 RSI: 0000000057ad6120 RDI: ffff880001a165b8
RBP: 000000001f1d1d00 R08: 0000000000074992 R09: 0000000000006400
R10: 0000003eec201000 R11: 0000000000000000 R12: 0000003eec201000
R13: ffff880041f94008 R14: ffff880038c93b00 R15: 0000003eec208000
FS:  00002aaaaaabdf40(0000) GS:ffffffff8058e000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process python (pid: 3806, threadinfo ffff880039dc8000, task ffff8800411f9100)
Stack:  ffffffff80207972  0000000000000000  ffff880039dc9d18  ffffffffffffffff 
 0000000000000000  ffff880037d6d5d0  ffff880039dc9d20  000000000003931e 
 0000000000000000  0000000138c93b00 
Call Trace:
 [<ffffffff80207972>] unmap_vmas+0x793/0xae7
 [<ffffffff80239cea>] exit_mmap+0x7d/0xf8
 [<ffffffff8023c02a>] mmput+0x30/0x83
 [<ffffffff80214fd5>] do_exit+0x288/0x89a
 [<ffffffff80247aa3>] cpuset_exit+0x0/0x6b
 [<ffffffff8022aaab>] get_signal_to_deliver+0x439/0x46c
 [<ffffffff8025a182>] do_notify_resume+0x9c/0x7b4
 [<ffffffff80280afb>] task_rq_lock+0x3f/0x71
 [<ffffffff802458c4>] try_to_wake_up+0x365/0x376
 [<ffffffff8028d912>] signal_wake_up+0x1e/0x2d
 [<ffffffff8028e40b>] specific_send_sig_info+0xa4/0xaf
 [<ffffffff8028e682>] force_sig_info+0xa9/0xb3
 [<ffffffff80269537>] do_stack_segment+0x84/0x8b
 [<ffffffff8025cace>] retint_signal+0x5d/0xb7


Code: 0f 0b 68 21 b3 46 80 c2 4b 02 8b 77 18 83 f6 01 83 e6 01 e9 
RIP  [<ffffffff8020adfa>] page_remove_rmap+0x13/0x2c
 RSP <ffff880039dc9c40>
 <1>Fixing recursive fault but reboot is needed!

Comment 7 Bill Nottingham 2006-11-15 21:52:15 UTC

Proposing as RC blocker. Xen dom0 + X + i965 = panic-o-rama.

Comment 8 Bill Nottingham 2006-11-29 22:23:01 UTC

Aha, we have an interesting data point. This is x86-64-specific: the i386 Xen
kernel is fine.

Comment 9 Glauber Costa 2006-12-11 14:09:10 UTC

*** Bug 214650 has been marked as a duplicate of this bug. ***

Comment 10 Adam Jackson 2006-12-14 16:53:37 UTC

This really shouldn't be assigned to me; I don't understand the agpgart code
at all.

Comment 11 Rik van Riel 2007-01-10 20:26:17 UTC

Should be fixed by adding the include/asm-x86_64/mach-xen/asm/agp.h file, as in
bug 217715.

*** This bug has been marked as a duplicate of 217715 ***
