Bug 207432

Summary:

X/drm kernel VM corruptions

Product:

[Fedora] Fedora

Reporter:

Naoki <naoki>

Component:

kernel-xen

Assignee:

Steven Rostedt <srostedt>

Status:

CLOSED WONTFIX

QA Contact:

Virtualization Bugs <virt-bugs>

Severity:

high

Docs Contact:

Priority:

medium

Version:

CC:

bstein, davide_bolcioni, dm, katzj, notting, srostedt, xen-maint

Target Milestone:

---

Keywords:

Reopened

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Last Closed:

2008-02-26 23:24:39 UTC

Bug Depends On:

217715

Bug Blocks:

Attachments:

Description	Flags
Output of kernel errors when X starts up.	none
messages after boot : kernel-xen-2.6.18-1.2849.fc6 selinux=0, i810 X driver	none

Description Naoki 2006-09-21 01:49:33 UTC

Description of problem:
"2.6.17-1.2647.fc6 #1 SMP Wed Sep 13 12:51:28 EDT 2006 x86_64 x86_64 x86_64
GNU/Linux" boots but will crash soon after logging into GDM. This problem does
not occur with the same non-xen kernel.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.17-1.2647.fc6

How reproducible:
Always.

Steps to Reproduce:
1. boot from said kernel.
2. Log into X (if possible).
  
Actual results:
The boot process appears to work normally.
Graphical boot completes.
GDM will accept a login (although it might not always get this far).
Shortly thereafter the screen will either go blank or graphic corruption will
occur.  I have also seen my screen display the message "Out of range signal, set
monitor to 1680x1050", the native resolution to which it is always set.

A reboot to non-xen kernel fixes this and the box remains stable for days.

Expected results:
GDM accepts login and gnome fires up as normal.

Additional info:
Sep 21 10:22:10 dragon kernel: [drm] Initialized drm 1.0.1 20051102
Sep 21 10:22:10 dragon kernel: ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16
(level, low) -> IRQ 16
Sep 21 10:22:10 dragon kernel: [drm] Initialized i915 1.5.0 20060119 on minor 0
Sep 21 10:22:12 dragon kernel: Unable to handle kernel paging request at
ffff8800012d5c60 RIP: 
Sep 21 10:22:12 dragon kernel:  [<ffffffff80250c86>] __change_page_attr+0xa1e/0xa8e
Sep 21 10:22:12 dragon kernel: PGD 10d3067 PUD 10d4067 PMD 10de067 PTE 12d5065
Sep 21 10:22:12 dragon kernel: Oops: 0003 [1] SMP 
Sep 21 10:22:12 dragon kernel: last sysfs file: /class/drm/card0/dev
Sep 21 10:22:12 dragon kernel: CPU 1 
Sep 21 10:22:12 dragon kernel: Modules linked in: i915 drm bridge netloop netbk
blkbk autofs4 sunrpc ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT
xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables
acpi_cpufreq video sbs i2c_ec button battery asus_acpi ac ipv6 parport_pc lp
parport snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss ide_cd pcspkr
snd_mixer_oss snd_pcm tg3 intel_rng cdrom sg snd_timer snd soundcore
snd_page_alloc serio_raw i2c_i801 i2c_core shpchp dm_snapshot dm_zero dm_mirror
dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Sep 21 10:22:12 dragon kernel: Pid: 1309, comm: Xorg Not tainted
2.6.17-1.2647.fc6xen #1
Sep 21 10:22:12 dragon kernel: RIP: e030:[<ffffffff80250c86>] 
[<ffffffff80250c86>] __change_page_attr+0xa1e/0xa8e
Sep 21 10:22:12 dragon kernel: RSP: e02b:ffff880072723b98  EFLAGS: 00010282
Sep 21 10:22:12 dragon kernel: RAX: 80000000718000e3 RBX: ffff880001462ff8 RCX:
0000000000000023
Sep 21 10:22:12 dragon kernel: RDX: ffff8800012d5c60 RSI: 0000000000001462 RDI:
0000000000000067
Sep 21 10:22:12 dragon kernel: RBP: ffff8800719ff000 R08: ffff8800014d9570 R09:
0000000000000000
Sep 21 10:22:12 dragon kernel: R10: 8000000000000063 R11: 80000000000000e3 R12:
00000000719ff000
Sep 21 10:22:12 dragon kernel: R13: 0000000000000c60 R14: 0000000000000008 R15:
ffffffff80201880
Sep 21 10:22:12 dragon kernel: FS:  00002aaaab49ff80(0000)
GS:ffffffff8064f080(0000) knlGS:0000000000000000
Sep 21 10:22:12 dragon kernel: CS:  e033 DS: 0000 ES: 0000
Sep 21 10:22:12 dragon kernel: Process Xorg (pid: 1309, threadinfo
ffff880072722000, task ffff880073b31040)
Sep 21 10:22:12 dragon kernel: Stack:  0000000000000003  ffff8800719ff000 
ffffffff804c4770  ffffffff80262279 
Sep 21 10:22:12 dragon kernel:  ffffffff804c4770  ffffffff80261d15 
0000000000000000  ffff8800719ff000 
Sep 21 10:22:12 dragon kernel:  00000000000719ff  00000000719ff000 
Sep 21 10:22:12 dragon kernel: Call Trace:
Sep 21 10:22:12 dragon kernel:  [<ffffffff80262279>] _spin_unlock_irq+0x9/0x10
Sep 21 10:22:12 dragon kernel:  [<ffffffff80261d15>] __down_write_nested+0x34/0x96
Sep 21 10:22:12 dragon kernel:  [<ffffffff8027a40d>]
change_page_attr_addr+0x7b/0x12c
Sep 21 10:22:12 dragon kernel:  [<ffffffff8038782a>]
agp_generic_destroy_page+0x4e/0x7a
Sep 21 10:22:12 dragon kernel:  [<ffffffff80387708>] agp_free_memory+0x65/0x92
Sep 21 10:22:12 dragon kernel:  [<ffffffff80386937>] agp_release+0x9f/0x18a
Sep 21 10:22:12 dragon kernel:  [<ffffffff8021296e>] __fput+0xbf/0x1aa
Sep 21 10:22:12 dragon kernel:  [<ffffffff80223f7b>] filp_close+0x5c/0x64
Sep 21 10:22:12 dragon kernel:  [<ffffffff80239c67>] put_files_struct+0x6c/0xc4
Sep 21 10:22:12 dragon kernel:  [<ffffffff80215591>] do_exit+0x2d0/0x8a3
Sep 21 10:22:12 dragon kernel:  [<ffffffff80248ca8>] debug_mutex_init+0x0/0xd
Sep 21 10:22:12 dragon kernel:  [<ffffffff8022b52d>]
get_signal_to_deliver+0x42d/0x45d
Sep 21 10:22:12 dragon kernel:  [<ffffffff8025b6c4>] do_notify_resume+0x9c/0x7c1
Sep 21 10:22:12 dragon kernel:  [<ffffffff802434fc>] sys_rt_sigreturn+0x322/0x355
Sep 21 10:22:12 dragon kernel:  [<ffffffff8025db49>] sysret_signal+0x38/0x43
Sep 21 10:22:12 dragon kernel:  [<ffffffff8025de29>] ptregscall_common+0x3d/0x64
Sep 21 10:22:12 dragon kernel: 
Sep 21 10:22:12 dragon kernel: 
Sep 21 10:22:12 dragon kernel: Code: 48 89 02 31 c0 eb 5a 48 89 da 48 b8 ff ff
ff 7f ff ff ff ff 
Sep 21 10:22:12 dragon kernel: RIP  [<ffffffff80250c86>]
__change_page_attr+0xa1e/0xa8e
Sep 21 10:22:12 dragon kernel:  RSP <ffff880072723b98>
Sep 21 10:22:12 dragon kernel: CR2: ffff8800012d5c60
Sep 21 10:22:12 dragon kernel:  <3>BUG: sleeping function called from invalid
context at kernel/rwsem.c:20
Sep 21 10:22:12 dragon kernel: in_atomic():0, irqs_disabled():1
Sep 21 10:22:12 dragon kernel: 
Sep 21 10:22:12 dragon kernel: Call Trace:
Sep 21 10:22:12 dragon kernel:  [<ffffffff8029b5c7>] down_read+0x15/0x23
Sep 21 10:22:12 dragon kernel:  [<ffffffff80293b99>]
blocking_notifier_call_chain+0x13/0x36
Sep 21 10:22:12 dragon kernel:  [<ffffffff802152e0>] do_exit+0x1f/0x8a3
Sep 21 10:22:12 dragon kernel:  [<ffffffff80264e32>] do_page_fault+0x1136/0x11e2
Sep 21 10:22:12 dragon kernel:  [<ffffffff80262396>]
_spin_unlock_irqrestore+0x9/0x19
Sep 21 10:22:12 dragon kernel:  [<ffffffff80207138>] kmem_cache_free+0x77/0xca
Sep 21 10:22:12 dragon kernel:  [<ffffffff8032f78c>] radix_tree_delete+0x150/0x187
Sep 21 10:22:12 dragon kernel:  [<ffffffff802119ad>] do_select+0x445/0x462
Sep 21 10:22:12 dragon kernel:  [<ffffffff80201880>] init_level4_pgt+0x880/0x1000
Sep 21 10:22:12 dragon kernel:  [<ffffffff8025e13b>] error_exit+0x0/0x6e
Sep 21 10:22:12 dragon kernel:  [<ffffffff80201880>] init_level4_pgt+0x880/0x1000
Sep 21 10:22:12 dragon kernel:  [<ffffffff80250c86>] __change_page_attr+0xa1e/0xa8e
Sep 21 10:22:12 dragon kernel:  [<ffffffff80262313>] _spin_lock_irqsave+0x1a/0x23
Sep 21 10:22:12 dragon kernel:  [<ffffffff80262279>] _spin_unlock_irq+0x9/0x10
Sep 21 10:22:12 dragon kernel:  [<ffffffff80261d15>] __down_write_nested+0x34/0x96
Sep 21 10:22:12 dragon kernel:  [<ffffffff8027a40d>]
change_page_attr_addr+0x7b/0x12c
Sep 21 10:22:12 dragon kernel:  [<ffffffff8038782a>]
agp_generic_destroy_page+0x4e/0x7a
Sep 21 10:22:12 dragon kernel:  [<ffffffff80387708>] agp_free_memory+0x65/0x92
Sep 21 10:22:12 dragon kernel:  [<ffffffff80386937>] agp_release+0x9f/0x18a
Sep 21 10:22:12 dragon kernel:  [<ffffffff8021296e>] __fput+0xbf/0x1aa
Sep 21 10:22:12 dragon kernel:  [<ffffffff80223f7b>] filp_close+0x5c/0x64
Sep 21 10:22:12 dragon kernel:  [<ffffffff80239c67>] put_files_struct+0x6c/0xc4
Sep 21 10:22:12 dragon kernel:  [<ffffffff80215591>] do_exit+0x2d0/0x8a3
Sep 21 10:22:12 dragon kernel:  [<ffffffff80248ca8>] debug_mutex_init+0x0/0xd
Sep 21 10:22:12 dragon kernel:  [<ffffffff8022b52d>]
get_signal_to_deliver+0x42d/0x45d
Sep 21 10:22:12 dragon kernel:  [<ffffffff8025b6c4>] do_notify_resume+0x9c/0x7c1
Sep 21 10:22:12 dragon kernel:  [<ffffffff802434fc>] sys_rt_sigreturn+0x322/0x355
Sep 21 10:22:12 dragon kernel:  [<ffffffff8025db49>] sysret_signal+0x38/0x43
Sep 21 10:22:12 dragon kernel:  [<ffffffff8025de29>] ptregscall_common+0x3d/0x64
Sep 21 10:22:12 dragon kernel: 
Sep 21 10:22:12 dragon kernel: Fixing recursive fault but reboot is needed!

Not long after the box will hang outright.

I am using :
xorg-x11-server-Xorg-1.1.1-38.fc6
metacity-2.16.0-2.fc6

With the intel driver :
(II) LoadModule: "intel"
(II) Loading /usr/lib64/xorg/modules/drivers/intel_drv.so

I have set the priority to high as xen is unusable on this platform.

Comment 1 Stephen Tweedie 2006-09-25 12:05:32 UTC

*** Bug 207883 has been marked as a duplicate of this bug. ***

Comment 2 Stephen Tweedie 2006-09-25 20:16:33 UTC

*** Bug 208003 has been marked as a duplicate of this bug. ***

Comment 3 Naoki 2006-09-26 07:15:35 UTC

Just tested kernel-xen-2.6.18-1.2693.fc6 and it fails to boot at all.  Just past
the USB, UHCI initialization I get :
"GSI 21 sharing vector 0xc8 and IRQ 21" then nothing, just waits until I power
cycle.  This is the exact problem I reported here :
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=206099
So I have re-opened that ticket.

I am still finding this in my messages file :

Sep 26 15:57:21 localhost gdm[2831]: GDM restarting ...
Sep 26 15:57:21 localhost kernel: Unable to handle kernel paging request at
ffff880001225c98 RIP:
Sep 26 15:57:21 localhost kernel:  [<ffffffff80250c4d>]
__change_page_attr+0xa1e/0xa8e
Sep 26 15:57:21 localhost kernel: PGD 1023067 PUD 1024067 PMD 102e067 PTE 759ac065
Sep 26 15:57:21 localhost kernel: Oops: 0003 [1] SMP
Sep 26 15:57:21 localhost kernel: last sysfs file: /class/drm/card0/dev
Sep 26 15:57:21 localhost kernel: CPU 1
Sep 26 15:57:21 localhost kernel: Modules linked in: i915 drm bridge netloop
netbk blkbk autofs4 sunrpc ip_conntrack_ftp ip_conntrack_netbios_ns ipt_REJECT
xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables
acpi_cpufreq video sbs i2c_ec button battery asus_acpi ac ipv6 parport_pc lp
parport sg snd_intel8x0 snd_ac97_codec snd_ac97_bus ide_cd snd_seq_dummy
snd_seq_oss snd_seq_midi_event cdrom snd_seq snd_seq_device i2c_i801 snd_pcm_oss
snd_mixer_oss i2c_core snd_pcm tg3 intel_rng pcspkr serio_raw shpchp snd_timer
snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ata_piix
libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Sep 26 15:57:21 localhost kernel: Pid: 1300, comm: Xorg Not tainted
2.6.18-1.2693.fc6xen #1
Sep 26 15:57:21 localhost kernel: RIP: e030:[<ffffffff80250c4d>] 
[<ffffffff80250c4d>] __change_page_attr+0xa1e/0xa8e
Sep 26 15:57:21 localhost kernel: RSP: e02b:ffff880072539b98  EFLAGS: 00010282
Sep 26 15:57:21 localhost kernel: RAX: 80000000726000e3 RBX: ffff8800013b99f8
RCX: 0000000000000023
Sep 26 15:57:21 localhost kernel: RDX: ffff880001225c98 RSI: 00000000000013b9
RDI: 0000000000000067
Sep 26 15:57:21 localhost kernel: RBP: ffff88007273f000 R08: ffff880001427078
R09: 0000000000000000
Sep 26 15:57:21 localhost kernel: R10: 8000000000000063 R11: 80000000000000e3
R12: 000000007273f000
Sep 26 15:57:21 localhost kernel: R13: 0000000000000c98 R14: 0000000000000008
R15: ffffffff80201880
Sep 26 15:57:21 localhost kernel: FS:  00002aaaaacd8e00(0000)
GS:ffffffff8059d080(0000) knlGS:0000000000000000
Sep 26 15:57:21 localhost kernel: CS:  e033 DS: 0000 ES: 0000
Sep 26 15:57:21 localhost kernel: Process Xorg (pid: 1300, threadinfo
ffff880072538000, task ffff880072c59040)
Sep 26 15:57:21 localhost kernel: Stack:  0000000000000003  ffff88007273f000 
ffffffff804c4770  ffffffff80262239
Sep 26 15:57:21 localhost kernel:  ffffffff804c4770  ffffffff80261cd5 
0000000000000608  ffff88007273f000
Sep 26 15:57:21 localhost kernel:  000000000007273f  000000007273f000
Sep 26 15:57:21 localhost kernel: Call Trace:
Sep 26 15:57:21 localhost kernel:  [<ffffffff80262239>] _spin_unlock_irq+0x9/0x10
Sep 26 15:57:21 localhost kernel:  [<ffffffff80261cd5>]
__down_write_nested+0x34/0x96
Sep 26 15:57:21 localhost kernel:  [<ffffffff8027a42a>]
change_page_attr_addr+0x7b/0x12c
Sep 26 15:57:21 localhost kernel:  [<ffffffff803878aa>]
agp_generic_destroy_page+0x4e/0x7a
Sep 26 15:57:21 localhost kernel:  [<ffffffff8038778a>] agp_free_memory+0x65/0x90
Sep 26 15:57:21 localhost kernel:  [<ffffffff803869a1>] agp_release+0x9f/0x18a
Sep 26 15:57:21 localhost kernel:  [<ffffffff8021296e>] __fput+0xbf/0x1aa
Sep 26 15:57:21 localhost kernel:  [<ffffffff80223f43>] filp_close+0x5c/0x64
Sep 26 15:57:21 localhost kernel:  [<ffffffff80239c2e>] put_files_struct+0x6c/0xc4
Sep 26 15:57:21 localhost kernel:  [<ffffffff80215591>] do_exit+0x2d0/0x8a3
Sep 26 15:57:21 localhost kernel:  [<ffffffff80248c6f>] debug_mutex_init+0x0/0xd
Sep 26 15:57:21 localhost kernel:  [<ffffffff8022b4f4>]
get_signal_to_deliver+0x42d/0x45d
Sep 26 15:57:21 localhost kernel:  [<ffffffff8025b687>] do_notify_resume+0x9c/0x7c1
Sep 26 15:57:21 localhost kernel:  [<ffffffff802434c3>] sys_rt_sigreturn+0x322/0x355
Sep 26 15:57:21 localhost kernel:  [<ffffffff8025db09>] sysret_signal+0x38/0x43
Sep 26 15:57:21 localhost kernel:  [<ffffffff8025dde9>] ptregscall_common+0x3d/0x64
Sep 26 15:57:21 localhost kernel:
Sep 26 15:57:21 localhost kernel:
Sep 26 15:57:21 localhost kernel: Code: 48 89 02 31 c0 eb 5a 48 89 da 48 b8 ff
ff ff 7f ff ff ff ff
Sep 26 15:57:21 localhost kernel: RIP  [<ffffffff80250c4d>]
__change_page_attr+0xa1e/0xa8e
Sep 26 15:57:21 localhost kernel:  RSP <ffff880072539b98>
Sep 26 15:57:21 localhost kernel: CR2: ffff880001225c98
Sep 26 15:57:21 localhost kernel:  <3>BUG: sleeping function called from invalid
context at kernel/rwsem.c:20
Sep 26 15:57:21 localhost kernel: in_atomic():0, irqs_disabled():1
Sep 26 15:57:21 localhost kernel:
Sep 26 15:57:21 localhost kernel: Call Trace:
Sep 26 15:57:21 localhost kernel:  [<ffffffff8029b612>] down_read+0x15/0x23
Sep 26 15:57:21 localhost kernel:  [<ffffffff80293bbc>]
blocking_notifier_call_chain+0x13/0x36
Sep 26 15:57:21 localhost kernel:  [<ffffffff802152e0>] do_exit+0x1f/0x8a3
Sep 26 15:57:21 localhost kernel:  [<ffffffff80264def>] do_page_fault+0x1130/0x11dc
Sep 26 15:57:21 localhost kernel:  [<ffffffff80262356>]
_spin_unlock_irqrestore+0x9/0x19
Sep 26 15:57:21 localhost kernel:  [<ffffffff80207138>] kmem_cache_free+0x77/0xca
Sep 26 15:57:21 localhost kernel:  [<ffffffff8032f7f7>]
radix_tree_delete+0x150/0x187
Sep 26 15:57:21 localhost kernel:  [<ffffffff802119ad>] do_select+0x445/0x462
Sep 26 15:57:21 localhost kernel:  [<ffffffff80201880>] init_level4_pgt+0x880/0x1000
Sep 26 15:57:21 localhost kernel:  [<ffffffff8025e0fb>] error_exit+0x0/0x6e
Sep 26 15:57:21 localhost kernel:  [<ffffffff80201880>] init_level4_pgt+0x880/0x1000
Sep 26 15:57:21 localhost kernel:  [<ffffffff80250c4d>]
__change_page_attr+0xa1e/0xa8e
Sep 26 15:57:21 localhost kernel:  [<ffffffff802622d3>] _spin_lock_irqsave+0x1a/0x23
Sep 26 15:57:21 localhost kernel:  [<ffffffff80262239>] _spin_unlock_irq+0x9/0x10
Sep 26 15:57:21 localhost kernel:  [<ffffffff80261cd5>]
__down_write_nested+0x34/0x96
Sep 26 15:57:21 localhost kernel:  [<ffffffff8027a42a>]
change_page_attr_addr+0x7b/0x12c
Sep 26 15:57:21 localhost kernel:  [<ffffffff803878aa>]
agp_generic_destroy_page+0x4e/0x7a
Sep 26 15:57:21 localhost kernel:  [<ffffffff8038778a>] agp_free_memory+0x65/0x90
Sep 26 15:57:21 localhost kernel:  [<ffffffff803869a1>] agp_release+0x9f/0x18a
Sep 26 15:57:21 localhost kernel:  [<ffffffff8021296e>] __fput+0xbf/0x1aa
Sep 26 15:57:21 localhost kernel:  [<ffffffff80223f43>] filp_close+0x5c/0x64
Sep 26 15:57:21 localhost kernel:  [<ffffffff80239c2e>] put_files_struct+0x6c/0xc4
Sep 26 15:57:21 localhost kernel:  [<ffffffff80215591>] do_exit+0x2d0/0x8a3
Sep 26 15:57:21 localhost kernel:  [<ffffffff80248c6f>] debug_mutex_init+0x0/0xd
Sep 26 15:57:21 localhost kernel:  [<ffffffff8022b4f4>]
get_signal_to_deliver+0x42d/0x45d
Sep 26 15:57:21 localhost kernel:  [<ffffffff8025b687>] do_notify_resume+0x9c/0x7c1
Sep 26 15:57:21 localhost kernel:  [<ffffffff802434c3>] sys_rt_sigreturn+0x322/0x355
Sep 26 15:57:21 localhost kernel:  [<ffffffff8025db09>] sysret_signal+0x38/0x43
Sep 26 15:57:21 localhost kernel:  [<ffffffff8025dde9>] ptregscall_common+0x3d/0x64
Sep 26 15:57:21 localhost kernel:
Sep 26 15:57:21 localhost kernel: Fixing recursive fault but reboot is needed!

Comment 4 Naoki 2006-09-26 09:21:47 UTC

As Stephen Tweedie suggested in #206099 I removed 'rhgb' from the grub boot
option and now I do not have the "Fixing recursive fault but reboot is needed!"
errors.  My problem with X (video card?) crashing has not been solved however.
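
For anyone repeating this workaround, here is a minimal sketch of the edit, assuming the parameter sits on the boot line for the dom0 kernel in grub.conf. The sample line, including its path and root device, is a hypothetical placeholder and is not taken from this report.

```python
def drop_boot_param(line, param):
    """Remove one whitespace-separated parameter from a boot line.

    Note: this also collapses runs of whitespace and drops indentation.
    """
    return " ".join(tok for tok in line.split() if tok != param)


# Hypothetical grub.conf line; the path and root device are placeholders.
line = "module /vmlinuz-2.6.18-1.2693.fc6xen ro root=/dev/VolGroup00/LogVol00 rhgb quiet"
print(drop_boot_param(line, "rhgb"))
```

Only an exact token match is removed, so a parameter such as "rhgb=1" would be left alone.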

Comment 6 Naoki 2006-10-19 08:22:11 UTC

I am seeing this when running X under 2.6.18-1.2798.fc6xen:

Oct 19 12:37:43 localhost kernel: Unable to handle kernel paging request at
fffffffffffffffe RIP: 
Oct 19 12:37:43 localhost kernel:  [<ffffffff88565a3f>]
:drm:drm_agp_enable+0x46/0x52
Oct 19 12:37:43 localhost kernel: PGD 203067 PUD 2da7067 PMD 0 
Oct 19 12:37:43 localhost kernel: Oops: 0002 [1] SMP 
Oct 19 12:37:43 localhost kernel: last sysfs file: /class/drm/card0/dev

See the attachment for full details.

Comment 7 Naoki 2006-10-19 08:23:52 UTC

Created attachment 138866 [details]
Output of kernel errors when X starts up.

Comment 8 Naoki 2006-10-19 08:28:12 UTC

Note:  This kernel functions perfectly normally during init 3.  I can create,
install, use domains and no errors or lockups are seen.

Also, X works perfectly well under the same non-xen kernel version.

Only when the two meet does it all go pear shaped.

Comment 9 Daniel Malmgren 2006-10-19 11:26:27 UTC

*** Bug 210277 has been marked as a duplicate of this bug. ***

Comment 10 Steven Rostedt 2006-10-19 18:24:09 UTC

naoki, in the latest report it looks like X crashes but the system is still
running, right? If so, can you capture the output of "xm dmesg"?

Comment 11 Naoki 2006-10-20 05:50:21 UTC

Hello Steven,

I tried this :

1. Boot to kernel-xen init 3.
2. Run my two guest domains.
3. Create an SSH session from another box.
4. Run "init 5".
5. As soon as graphics mode begins to initialize, switch back to the console
(CTRL-ALT-F1).

The system is still up to the point where I can switch virtual terminals.
I even managed to type 'root' on one of them, but nothing ever came back. 
My SSH connection became non-responsive.
Switching into X resulted in full system lockup.


What did appear on the console during this was :

BUG Spinlock recursion on CPU#1 cpp/3348
ide-cd: cmd 0x3 timed out.
hda: lost interrupt.

If I made any errors there I'm sorry but it was a case of jotting it down on a
post-it note.

The messages file is once again filled with entries such as :

Oct 19 12:37:43 localhost kernel: PGD 203067 PUD 2da7067 PMD 0 
Oct 19 12:37:43 localhost kernel: Oops: 0002 [1] SMP 
Oct 19 12:37:43 localhost kernel: last sysfs file: /class/drm/card0/dev

Oct 19 12:37:43 localhost kernel:  <3>BUG: sleeping function called from invalid
context at kernel/rwsem.c:20
Oct 19 12:37:43 localhost kernel: in_atomic():0, irqs_disabled():1

Oct 19 12:37:43 localhost kernel: Bad pte = 759ac145, process = gdm-binary,
vm_flags = 100071, vaddr = 362fc81000
Oct 19 12:37:43 localhost gdm[4770]: gdm_cleanup_children: child 4846 crashed of
signal 11
Oct 19 12:37:43 localhost kernel:  
Oct 19 12:37:43 localhost gdm[4770]: gdm_cleanup_children: Slave crashed,
killing its children

Oct 19 12:37:43 localhost kernel: swap_free: Bad swap file entry 8020202020202020
Oct 19 12:37:43 localhost last message repeated 4 times
Oct 19 12:37:43 localhost kernel: swap_free: Bad swap file entry 802f3c2020202020
Oct 19 12:37:43 localhost kernel: Bad pte = 759ac145, process = gdm-binary,
vm_flags = 100071, vaddr = 362fc81000

Comment 12 Stephen Tweedie 2006-10-27 19:51:10 UTC

For future reference, please do not reopen a closed bug with a completely
different bug --- it makes it impossible to track things properly.  The symptoms
may be similar in that X fails, but the change_page_attr bug looks very
different from the new Unable to handle kernel paging request bug.

For now, I'll change the subject to reflect the new bug in this bugzilla.
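
One way to see the difference between the two oopses is to decode the hex value printed after "Oops:", which on x86 is the page-fault error code. The sketch below is illustrative only and assumes the standard x86 meaning of the low three bits; the function name is made up for this example.

```python
# x86 page-fault error code bits, as printed in hex after "Oops:".
PF_PRESENT = 0x1  # 0: page not present, 1: protection violation
PF_WRITE = 0x2    # 0: read access, 1: write access
PF_USER = 0x4     # 0: kernel mode, 1: user mode


def decode_fault(code_hex):
    """Describe an x86 page-fault error code given as a hex string."""
    code = int(code_hex, 16)
    mode = "user" if code & PF_USER else "kernel"
    access = "write" if code & PF_WRITE else "read"
    cause = "protection violation" if code & PF_PRESENT else "page not present"
    return f"{mode}-mode {access}, {cause}"


print(decode_fault("0003"))  # original report, in __change_page_attr
print(decode_fault("0002"))  # comment 6, in drm_agp_enable
```

Read this way, the original oops is a kernel write to a page that is mapped but not writable, while the newer one is a kernel write to an unmapped address (fffffffffffffffe).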

Comment 14 Naoki 2006-10-30 09:04:12 UTC

Cheers.  

I've moved from rawhide to FC6.  I wanted to test it again this time changing
from the "intel" X driver to "i810" to see if that was my issue (I'll try
anything at this point). 

However the kernel (2798) doesn't even boot (just sits at "RedHat nash.." and I
can't even CTRL-ALT-DEL out of it), and since it never mounts the FS there isn't
anything in the messages file to report.

Meaning I'm again stuck in the position of not being able to do any further
testing unless I work out how to debug the kernel via a console.

Comment 15 Daniel Malmgren 2006-10-30 10:26:06 UTC

(In reply to comment #14)
> I've moved from rawhide to FC6.  I wanted to test it again this time changing
> from the "intel" X driver to "i810" to see if that was my issue (I'll try
> anything at this point). 

Aha! I think this put me on the right track. I'm also on an Intel gfx chip
(GMA950 onboard on 945-GM). Since "intel" in xorg.conf doesn't seem to work for
me (no matter what kernel I use) I'm running i810 driver which works perfectly
in non-xen kernel, but hangs entire system in xen kernel. However when I tried
simply replacing "i810" in xorg.conf with "vesa" everything works perfectly fine
(except of course 3d accelerated stuff). Could this simply be some kind of
conflict between xen and intel/i810 drivers?
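
For reference, the driver swap described above can be sketched as a small text rewrite. This assumes the conventional xorg.conf form of a Driver line inside a "Device" section; the sample section and its Identifier are hypothetical and do not come from my actual config.

```python
import re

# Matches a Driver line that names the i810 or intel driver.
DRIVER_LINE = re.compile(r'^([ \t]*Driver[ \t]+)"(?:i810|intel)"', re.MULTILINE)


def use_vesa(conf_text):
    """Return conf_text with an i810/intel Driver line switched to vesa."""
    return DRIVER_LINE.sub(r'\1"vesa"', conf_text)


# Hypothetical Device section, for illustration only.
sample = '''Section "Device"
    Identifier  "Videocard0"
    Driver      "i810"
EndSection
'''
print(use_vesa(sample))
```

Input device Driver lines (for example "kbd" or "mouse") do not match the pattern and are left untouched.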

Comment 16 Daniel Malmgren 2006-10-30 11:03:32 UTC

I forgot to mention that I'm not experiencing any difference between 2798 kernel
and any of the earlier 26xx or 27xx kernels.

Also, I don't know if I'm completely stumbling in the dark here, but maybe this
could be related: https://bugzilla.novell.com/show_bug.cgi?id=177465

Comment 17 Stephen Tweedie 2006-10-30 12:29:56 UTC

naoki,

Please open a separate bug if nash hangs --- that's a completely different
symptom and does not sound directly X-related.  We can certainly show you how to
get serial console output to see if that shows up anything.

Comment 18 Naoki 2006-10-31 07:08:23 UTC

Naturally.  Although throughout the duration of this issue the boot issues have
not been reliably repeatable, the X problem has been.  So I'm focusing on that
for now and will mark a bug against nash if that interferes with testing. 

Sounds like we have a confirmation that the 'intel' driver is partly to blame so
I'll try and reproduce that.

Comment 19 Daniel Malmgren 2006-11-03 07:45:33 UTC

Any more news on this?

I've also posted this bug in xorg bugzilla at
https://bugs.freedesktop.org/show_bug.cgi?id=8872

Comment 20 Daniel Malmgren 2006-11-13 15:03:32 UTC

Just for the record, this bug seems to be the same as 2133346...

Comment 21 Naoki 2006-11-14 05:32:03 UTC

Still blocked on "RedHat nash" hang problem and so cannot test further as the
system doesn't even boot to single user mode, let alone getting to X.

I have opened a new BZ for that particular problem and will need to clear it
first as it blocks all other testing of kernel-xen : 
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=215464

Comment 22 Naoki 2006-11-14 05:45:44 UTC

I just tried with 'selinux=0' and I have a working system.  I will monitor for
any kernel oopses and other strangeness, but right now:

* I am running 2.6.18-1.2849.fc6xen.
* One FC6 guest OS.
* Installing a rawhide based guest OS via VNC.
* And running X windows.

So far so good.  

Will continue to watch it and then will try to reproduce once I'm confident the
system is stable without selinux enabled.

Comment 23 Naoki 2006-11-14 06:11:03 UTC

I have now noticed system instability.

OpenOffice apps refuse to start :
$ oowriter 
/usr/lib64/openoffice.org2.0/program/soffice: line 147:  4678 Segmentation fault
     "$sd_prog/$sd_binary" "$@"

Nov 14 14:53:11 localhost kernel: sdraw.bin[4395] trap invalid opcode
rip:3fa4d7e170 rsp:7fff890bd808 error:0
Nov 14 14:53:40 localhost kernel: sdraw.bin[4408] trap invalid opcode
rip:3fa4d7e170 rsp:7fff351f97c8 error:0
Nov 14 14:55:16 localhost kernel: swriter.bin[4435]: segfault at
0000000000000002 rip 0000003fa4d7e170 rsp 00007fffa0674c38 error 4



Applications such as gnome-panel, gnome-screenshot and gedit are also crashing:

System: Linux 2.6.18-1.2849.fc6xen #1 SMP Fri Nov 10 12:57:36 EST 2006 x86_64
X Vendor: The X.Org Foundation
X Vendor Release: 70101000
Selinux: No
Accessibility: Disabled
----------- .xsession-errors ---------------------
No stack.
Xlib: sequence lost (0x111aa > 0x11aa) in reply type 0x0!
Xlib: unexpected async reply (sequence 0x11ae)!
Window manager warning: Received a _NET_WM_MOVERESIZE message for 0x4400003 (Bug
Buddy); these messages lack timestamps and therefore suck.
/usr/lib64/openoffice.org2.0/program/soffice: line 147:  4395 Illegal
instruction     "$sd_prog/$sd_binary" "$@"
Unable to connect to yum-updatesd...
** (bug-buddy:4604): WARNING **: Couldn't load icon for Open Folder
warning: Can't read pathname for load map: Input/output error.
Cannot access memory at address 0xffffffffffffffff
/home/naoki/4593: No such file or directory.
No stack.
** (bug-buddy:4606): WARNING **: Couldn't load icon for Open Folder
--------------------------------------------------

Memory status: size: 370352128 vsize: 370352128 resident: 13078528 share:
1732608 rss: 13078528 rss_rlim: -1
CPU usage: start_time: 1163483973 rtime: 129 utime: 122 stime: 7 cutime:0
cstime: 0 timeout: 0 it_real_value: 0 frequency: 100

Backtrace was generated from '/usr/bin/gnome-panel-screenshot'

(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".

And :

System: Linux 2.6.18-1.2849.fc6xen #1 SMP Fri Nov 10 12:57:36 EST 2006 x86_64
X Vendor: The X.Org Foundation
X Vendor Release: 70101000
Selinux: No
Accessibility: Disabled
----------- .xsession-errors ---------------------
(evolution:3227): gtkhtml-WARNING **: No such file or directory
20061102T110000 is a recurrence instance
20061005T110000 is a recurrence instance
20061005T110000 is a recurrence instance
20061102T110000 is a recurrence instance
(firefox-bin:3367): Gtk-WARNING **: Unable to locate theme engine in
module_path: "clearlooks",
Xlib: unexpected async reply (sequence 0x16)!
Unable to connect to yum-updatesd...
Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message with a
timestamp of 0 for 0x4000003 (Open conne)
Window manager warning: meta_window_activate called by a pager with a 0
timestamp; the pager needs to be fixed.
libGL warning: 3D driver claims to not support visual 0x4b
/usr/lib64/openoffice.org2.0/program/sdraw.bin: symbol lookup error:
/usr/lib64/openoffice.org2.0/program/libvclplug_gtk680lx.so: undefined symbol:
_ZN14X11SalGraphics4InitEP8SalFramem
** (bug-buddy:4344): WARNING **: Couldn't load icon for Open Folder
--------------------------------------------------

Memory status: size: 395489280 vsize: 395489280 resident: 13058048 share:
1724416 rss: 13058048 rss_rlim: -1
CPU usage: start_time: 1163483445 rtime: 2 utime: 2 stime: 0 cutime:0 cstime: 0
timeout: 0 it_real_value: 0 frequency: 100

Backtrace was generated from '/usr/bin/gnome-panel-screenshot'

(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".

I have also found some graphic corruption which is why I attempted to get a
screen shot.

Comment 24 Naoki 2006-11-14 06:16:19 UTC

More of the same..

Nov 14 15:10:42 localhost kernel: python[4851] general protection rip:3b5f4ba2a9
rsp:7fff9445bb90 error:0
Nov 14 15:10:49 localhost kernel: python[4854] general protection rip:3ed0e6ed9c
rsp:7fff95557ce0 error:0
Nov 14 15:10:51 localhost kernel: pygrub[4856] general protection rip:3b5f4ba2a9
rsp:7fff2fe36520 error:0

So far it seems to me :
SElinux is causing a problem for kernel-xen / nash booting.
Possibly not an "X" thing if console apps are also failing intermittently -

# xm
Traceback (most recent call last):
  File "/usr/sbin/xm", line 8, in ?
    from xen.xm import main
  File "/usr/lib64/python2.4/site-packages/xen/xm/main.py", line 24, in ?
    import os
ImportError: __import__ not found
Segmentation fault

# xm 
Usage: xm <subcommand> [args]

Control, list, and manipulate Xen guest instances.

Comment 25 Naoki 2006-11-16 01:44:40 UTC

I tested the theory that the 'intel' X driver is the problem by using the 'i810'
driver instead, and if anything it's worse; see the attachment for details.
Highlights are :

$ grep BUG boot-xen.log 
Nov 16 10:23:28 localhost kernel: Kernel BUG at lib/list_debug.c:70
Nov 16 10:23:28 localhost kernel:  <3>BUG: sleeping function called from invalid
context at kernel/rwsem.c:20

And gconfd-2 exploding along with other apps :
Nov 16 10:22:57 localhost kernel: Pid: 3511, comm: gconfd-2 Not tainted
2.6.18-1.2849.fc6xen #1
Nov 16 10:22:57 localhost kernel: RIP: e030:[<ffffffff80209900>] 
[<ffffffff80209900>] __d_lookup+0xe6/0x110
Nov 16 10:22:57 localhost kernel: RSP: e02b:ffff880051aa5c08  EFLAGS: 00010286
Nov 16 10:22:57 localhost kernel: RAX: ff8a94c6ff8a94c6 RBX: ffff880060160078
RCX: 0000000000000012
Nov 16 10:22:57 localhost kernel: RDX: 0000000000024e5b RSI: ffff880051aa5cb8
RDI: ffff880066df8988
Nov 16 10:22:57 localhost kernel: RBP: ff8a94c6ff8a94c6 R08: 0000000000000000
R09: 0000000000000000
Nov 16 10:22:57 localhost kernel: R10: ffff88005bb95eb0 R11: ffffffff8022c6ea
R12: ffff88005e8458c8
Nov 16 10:22:57 localhost kernel: R13: ffff880066df8988 R14: ffff880051aa5cb8
R15: 00000000f027435f
Nov 16 10:22:57 localhost kernel: FS:  00002aaaaaac1cf0(0000)
GS:ffffffff80593080(0000) knlGS:0000000000000000
Nov 16 10:22:57 localhost kernel: CS:  e033 DS: 0000 ES: 0000
Nov 16 10:22:57 localhost kernel: Process gconfd-2 (pid: 3511, threadinfo
ffff880051aa4000, task ffff8800561a7080)
Nov 16 10:22:57 localhost kernel: Stack:  00000000005ca806  0000000900000000 
ffff880051891018  00000000000041c0 
Nov 16 10:22:57 localhost kernel:  0000000000000000  ffff88005e8458c8 
ffff880074df40c0  ffff880051aa5e48 
Nov 16 10:22:57 localhost kernel:  ffff880051aa5cb8  ffffffff8020cf1f 
Nov 16 10:22:57 localhost kernel: Call Trace:
Nov 16 10:22:57 localhost kernel:  [<ffffffff8020cf1f>] do_lookup+0x2c/0x1c3
Nov 16 10:22:57 localhost kernel:  [<ffffffff80209cdc>] __link_path_walk+0x3b2/0xf62
Nov 16 10:22:57 localhost kernel:  [<ffffffff8020e906>] link_path_walk+0x5c/0xe5
Nov 16 10:22:57 localhost kernel:  [<ffffffff8020cd73>] do_path_lookup+0x274/0x2f0
Nov 16 10:22:57 localhost kernel:  [<ffffffff80212701>] getname+0x15b/0x1c1
Nov 16 10:22:57 localhost kernel:  [<ffffffff80223b22>] __user_walk_fd+0x37/0x4c
Nov 16 10:22:57 localhost kernel:  [<ffffffff80228a9a>] vfs_stat_fd+0x1b/0x4a
Nov 16 10:22:57 localhost kernel:  [<ffffffff80223889>] sys_newstat+0x19/0x31
Nov 16 10:22:57 localhost kernel:  [<ffffffff8025d4a6>] system_call+0x86/0x8b
Nov 16 10:22:57 localhost kernel:  [<ffffffff8025d420>] system_call+0x0/0x8b
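
A plain grep for BUG misses several of the other kernel complaints quoted in this report. As a rough sketch, the messages file could be summarized by symptom like this. The patterns are copied from log lines in this bug, the labels are my own, and the code assumes each syslog entry is on a single unwrapped line.

```python
import re
from collections import Counter

# Patterns taken from kernel log lines quoted in this report.
PATTERNS = {
    "paging request": re.compile(r"Unable to handle kernel paging request"),
    "kernel BUG": re.compile(r"Kernel BUG at"),
    "sleeping in atomic": re.compile(r"BUG: sleeping function called"),
    "bad pte": re.compile(r"Bad pte ="),
    "bad swap entry": re.compile(r"swap_free: Bad swap file entry"),
    "general protection": re.compile(r"general protection rip:"),
}


def summarize(lines):
    """Count kernel log lines per symptom; non-kernel lines are skipped."""
    counts = Counter()
    for line in lines:
        if " kernel: " not in line:
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
    return counts


sample = [
    "Nov 16 10:23:28 localhost kernel: Kernel BUG at lib/list_debug.c:70",
    "Nov 16 10:23:28 localhost kernel:  <3>BUG: sleeping function called "
    "from invalid context at kernel/rwsem.c:20",
    "Oct 19 12:37:43 localhost gdm[4770]: gdm_cleanup_children: Slave crashed,",
]
print(summarize(sample))
```

To run it against a real log, pass an open file object (for example the boot-xen.log used above) instead of the sample list.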

Comment 26 Naoki 2006-11-16 01:46:08 UTC

Created attachment 141330 [details]
messages after boot : kernel-xen-2.6.18-1.2849.fc6 selinux=0, i810 X driver

Comment 27 Daniel Malmgren 2006-11-16 10:37:10 UTC

As mentioned above, I was running the i810 driver. Now I've also got the intel
(modesetting branch version of i810) driver to work in non-xen, but I can
confirm that neither i810 nor intel works in xen (using latest rawhide of
everything).

Comment 28 Naoki 2006-11-30 10:05:52 UTC

I was pleased to hear Dan B's announcement of an Xorg fix that would
stop the high frequency random crashing under certain intel chipsets but
here are my results on my Dell GX620.

Booting up is fine, all the way to GDM.  At this point, without logging
in, I can SSH to the machine and check no application failures, and "xm
list" shows my dom0 as expected. However immediately upon successful
login the system reboots.  I tried three times and have now fallen back
to the non-xen kernel. I was trying with "selinux=0" if that's of use.

There is nothing in the messages file to indicate what the problem might
be, the messages file simply contains the next kernel boot message.

Comment 29 Bill Nottingham 2006-11-30 16:54:18 UTC

*** Bug 206032 has been marked as a duplicate of this bug. ***

Comment 30 Davide Bolcioni 2007-02-22 17:07:16 UTC

Bugs #229480 and #229536 might be actually the same as this bug.

Comment 31 Red Hat Bugzilla 2007-07-25 01:33:47 UTC

change QA contact

Comment 32 Chris Lalancette 2008-02-26 23:24:39 UTC

This report targets FC6, which is now end-of-life.

Please re-test against Fedora 7 or later, and if the issue persists, open a new bug.

Thanks