526627 – F12 Xen DomU unstable (2.6.31)

Bug 526627 - F12 Xen DomU unstable (2.6.31)

Summary: F12 Xen DomU unstable (2.6.31)

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	12
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	F11VirtTarget
TreeView+	depends on / blocked

Reported:	2009-10-01 05:26 UTC by Kevin Bowling
Modified:	2010-12-04 07:35 UTC (History)
CC List:	19 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2010-12-04 07:35:51 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Check to make sure clocksource returns minimally monotone results (946 bytes, patch) 2009-10-14 16:40 UTC, Jeremy Fitzhardinge	no flags	Details \| Diff
View All

Description Kevin Bowling 2009-10-01 05:26:11 UTC

Description of problem:
Xen is unstable with 2.6.30 rebase.  I've attached relevant dmesg and kernel timeouts.

Version-Release number of selected component (if applicable):
kernel 2.6.30
  
Actual results:
System hang.

Expected results:
Stable system.

Additional info:

[kev009@gnucapplus ~]$ dmesg
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.30.8-64.fc11.x86_64 (mockbuild.phx.redhat.com) (gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) ) #1 SMP Fri Sep 25 04:43:32 EDT 2009
Command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
ACPI in unprivileged domain disabled
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 00000000000a0000 (usable)
 Xen: 00000000000a0000 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000002166000 (usable)
 Xen: 0000000002166000 - 0000000002369000 (reserved)
 Xen: 0000000002369000 - 0000000040000000 (usable)
DMI not present or invalid.
last_pfn = 0x40000 max_arch_pfn = 0x100000000
init_memory_mapping: 0000000000000000-0000000040000000
 0000000000 - 0040000000 page 4k
kernel direct mapping tables up to 40000000 @ 100000-302000
RAMDISK: 019f2000 - 02166000
No NUMA configuration found
Faking a node at 0000000000000000-0000000040000000
Bootmem setup node 0 0000000000000000-0000000040000000
  NODE_DATA [0000000000008000 - 000000000001cfff]
  bootmap [000000000001d000 -  0000000000024fff] pages 8
(6 early reservations) ==> bootmem [0000000000 - 0040000000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0002369000 - 000237e000]   XEN PAGETABLES ==> [0002369000 - 000237e000]
  #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #3 [0001000000 - 00019d11a0]    TEXT DATA BSS ==> [0001000000 - 00019d11a0]
  #4 [00019f2000 - 0002166000]          RAMDISK ==> [00019f2000 - 0002166000]
  #5 [0000100000 - 00002ea000]          PGTABLE ==> [0000100000 - 00002ea000]
Zone PFN ranges:
  DMA      0x00000000 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    0: 0x00000000 -> 0x000000a0
    0: 0x00000100 -> 0x00002166
    0: 0x00002369 -> 0x00040000
On node 0 totalpages: 261533
  DMA zone: 56 pages used for memmap
  DMA zone: 493 pages reserved
  DMA zone: 3451 pages, LIFO batch:0
  DMA32 zone: 3528 pages used for memmap
  DMA32 zone: 254005 pages, LIFO batch:31
SMP: Allowing 2 CPUs, 0 hotplug CPUs
No local APIC present
nr_irqs_gsi: 16
PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
PM: Registered nosave memory: 0000000002166000 - 0000000002369000
Allocating PCI resources starting at 50000000 (gap: 40000000:c0000000)
NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:2 nr_node_ids:1
PERCPU: Allocated 20 4k pages, static data 79584 bytes
trying to map vcpu_info 0 at ffffc2000000b020, mfn 1ed7cf, offset 32
cpu 0 using vcpu_info at ffffc2000000b020
trying to map vcpu_info 1 at ffffc20000021020, mfn 1ed7bb, offset 32
cpu 1 using vcpu_info at ffffc20000021020
Xen: using vcpu_info placement
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 257456
Policy zone: DMA32
Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet
Initializing CPU#0
------------[ cut here ]------------
WARNING: at arch/x86/xen/enlighten.c:482 cvt_gate_to_trap+0x86/0xbe() (Not tainted)
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.30.8-64.fc11.x86_64 #1
Call Trace:
 [<ffffffff810571a8>] warn_slowpath_common+0x95/0xc3
 [<ffffffff810571fd>] warn_slowpath_null+0x27/0x3d
 [<ffffffff8100abc0>] cvt_gate_to_trap+0x86/0xbe
 [<ffffffff8100ac50>] xen_convert_trap_info+0x58/0x8d
 [<ffffffff8100b056>] xen_load_idt+0x4c/0x70
 [<ffffffff8148f238>] cpu_init+0x11f/0x2ff
 [<ffffffff8100ad93>] ? xen_write_idt_entry+0x46/0xa9
 [<ffffffff8175c9d1>] trap_init+0x257/0x274
 [<ffffffff8122433b>] ? sort_extable+0x32/0x4b
 [<ffffffff81757e62>] start_kernel+0x240/0x433
 [<ffffffff817572d0>] x86_64_start_reservations+0xbb/0xd6
 [<ffffffff8175b4e0>] xen_start_kernel+0x4b8/0x4bc
---[ end trace a7919e7f17c0a725 ]---
NR_IRQS:4352 nr_irqs:288
PID hash table entries: 4096 (order: 12, 32768 bytes)
Detected 1991.513 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
console [hvc0] enabled
allocated 10485760 bytes of page_cgroup
please try cgroup_disable=memory option if you don't want
Checking aperture...
No AGP bridge found
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
Memory: 1001400k/1048576k available (4736k kernel code, 2444k absent, 44732k reserved, 2498k data, 1348k init)
SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Xen: using vcpuop timer interface
installing Xen timer for CPU 0
Calibrating delay loop (skipped), value calculated using timer frequency.. 3983.02 BogoMIPS (lpj=1991513)
Security Framework initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0/0x0 -> Node 0
tseg: 0000000000
SMP alternatives: switching to UP code
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 19374 entries in 76 pages
installing Xen timer for CPU 1
------------[ cut here ]------------
WARNING: at arch/x86/xen/enlighten.c:482 cvt_gate_to_trap+0x86/0xbe() (Tainted: G        W )
Modules linked in:
Pid: 1, comm: swapper Tainted: G        W  2.6.30.8-64.fc11.x86_64 #1
Call Trace:
 [<ffffffff810571a8>] warn_slowpath_common+0x95/0xc3
 [<ffffffff810571fd>] warn_slowpath_null+0x27/0x3d
 [<ffffffff8100abc0>] cvt_gate_to_trap+0x86/0xbe
 [<ffffffff8100ac50>] xen_convert_trap_info+0x58/0x8d
 [<ffffffff8100aca9>] xen_copy_trap_info+0x24/0x26
 [<ffffffff8148d271>] xen_cpu_up+0x193/0x35f
 [<ffffffff814939ef>] _cpu_up+0xbc/0x157
 [<ffffffff81493af9>] cpu_up+0x6f/0x90
 [<ffffffff81757827>] kernel_init+0xc8/0x24b
 [<ffffffff8101313a>] child_rip+0xa/0x20
 [<ffffffff81012afd>] ? restore_args+0x0/0x30
 [<ffffffff81013130>] ? child_rip+0x0/0x20
---[ end trace a7919e7f17c0a726 ]---
SMP alternatives: switching to SMP code
Initializing CPU#1
------------[ cut here ]------------
WARNING: at arch/x86/xen/enlighten.c:482 cvt_gate_to_trap+0x86/0xbe() (Tainted: G        W )
Modules linked in:
Pid: 0, comm: swapper Tainted: G        W  2.6.30.8-64.fc11.x86_64 #1
Call Trace:
 [<ffffffff810571a8>] warn_slowpath_common+0x95/0xc3
 [<ffffffff810571fd>] warn_slowpath_null+0x27/0x3d
 [<ffffffff8100abc0>] cvt_gate_to_trap+0x86/0xbe
 [<ffffffff8100ac50>] xen_convert_trap_info+0x58/0x8d
 [<ffffffff8100b056>] xen_load_idt+0x4c/0x70
 [<ffffffff8148f236>] cpu_init+0x11d/0x2ff
 [<ffffffff8148cfc5>] cpu_bringup+0x22/0xac
 [<ffffffff8148d0c3>] cpu_bringup_and_idle+0x21/0x3c
---[ end trace a7919e7f17c0a727 ]---
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/0x0 -> Node 0
Brought up 2 CPUs
sizeof(vma)=176 bytes
sizeof(page)=56 bytes
sizeof(inode)=576 bytes
sizeof(dentry)=192 bytes
sizeof(ext3inode)=784 bytes
sizeof(buffer_head)=104 bytes
sizeof(skbuff)=232 bytes
sizeof(task_struct)=5960 bytes
CPU0 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 1 0
net_namespace: 1952 bytes
Booting paravirtualized kernel on Xen
Xen version: 3.4.0 (preserve-AD)
Grant table initialized
regulator: core version 0.5
Time: 165:165:165  Date: 165/165/65
NET: Registered protocol family 16
PCI: Fatal: No config space access function found
bio: create slab <bio-0> at 0
ACPI: Interpreter disabled.
xen_balloon: Initialising balloon driver.
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: System does not support PCI
PCI: System does not support PCI
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
pnp: PnP ACPI: disabled
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 1
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 7632k freed
platform rtc_cmos: registered platform RTC device (no PNP device found)
audit: initializing netlink socket (disabled)
type=2000 audit(1254332332.712:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
msgmni has been set to 2062
SELinux:  Registering netfilter hooks
alg: No test for stdrng (krng)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
brd: module loaded
loop: module loaded
input: Macintosh mouse button emulation as /devices/virtual/input/input0
Driver 'sd' needs updating - please use bus_type methods
Driver 'sr' needs updating - please use bus_type methods
Fixed MDIO Bus: probed
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
uhci_hcd: USB Universal Host Controller Interface driver
PNP: No PS/2 controller found. Probing ports directly.
mice: PS/2 mouse device common for all mice
rtc_cmos: probe of rtc_cmos failed with error -16
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel
cpuidle: using governor ladder
cpuidle: using governor menu
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
nf_conntrack version 0.5.0 (8192 buckets, 32768 max)
CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
nf_conntrack.acct=1 kernel paramater, acct=1 nf_conntrack module option or
sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
ip_tables: (C) 2000-2006 Netfilter Core Team
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
PM: Resume from disk failed.
registered taskstats version 1
XENBUS: Device with no driver: device/console/0
XENBUS: Device with no driver: device/vbd/51712
XENBUS: Device with no driver: device/vif/0
  Magic number: 1:252:3141
Initalizing network drop monitor service
Freeing unused kernel memory: 1348k freed
Write protecting the kernel read-only data: 6728k
 xvda: xvda1 xvda2
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: dm-0: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 16579
ext3_orphan_cleanup: deleting unreferenced inode 16530
ext3_orphan_cleanup: deleting unreferenced inode 16476
ext3_orphan_cleanup: deleting unreferenced inode 16469
ext3_orphan_cleanup: deleting unreferenced inode 16455
ext3_orphan_cleanup: deleting unreferenced inode 16423
ext3_orphan_cleanup: deleting unreferenced inode 16416
EXT3-fs: dm-0: 7 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
type=1404 audit(1254332342.191:2): selinux=0 auid=4294967295 ses=4294967295
udev: starting version 141
Initialising Xen virtual ethernet driver.
input: PC Speaker as /devices/platform/pcspkr/input/input1
device-mapper: multipath: version 1.0.5 loaded
EXT3 FS on dm-0, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on xvda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1671160k swap on /dev/mapper/VolGroup00-LogVol01.  Priority:-1 extents:1 across:1671160k
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
eth0: no IPv6 routers present


----------


[root@buildbox-a2 ~]# xm console 8
INFO: task kjournald:61 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff88003e932000     0    61      2
 ffff88003e919d40 0000000000000246 ffffffff8100e45c 0000000000000000
 000000001cee5db8 ffff88003e919d20 ffffffff8100ee82 0000000000000202
 ffff88003e9c83a8 000000000000e2e8 ffff88003e9c83a8 0000000000012d00
Call Trace:
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff8100ee82>] ? check_events+0x12/0x20
 [<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
 [<ffffffff81496bf6>] schedule+0x21/0x49
 [<ffffffff811b8b33>] journal_commit_transaction+0x13d/0xe42
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff810632bc>] ? try_to_del_timer_sync+0x69/0x87
 [<ffffffff811bcdf7>] kjournald+0xfd/0x253
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff811bccfa>] ? kjournald+0x0/0x253
 [<ffffffff81070709>] kthread+0x6d/0xae
 [<ffffffff8101313a>] child_rip+0xa/0x20
 [<ffffffff81012afd>] ? restore_args+0x0/0x30
 [<ffffffff81013130>] ? child_rip+0x0/0x20
INFO: task ns-slapd:1034 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ns-slapd      D ffffc20000000000     0  1034      1
 ffff88003dd87908 0000000000000282 ffff88003dd87868 ffffffff8100ed0d
 ffff88003dd86000 00000000e59205a0 ffff88003dd87888 ffffffff8107957a
 ffff88003d4fe0e8 000000000000e2e8 ffff88003d4fe0e8 0000000000012d00
Call Trace:
 [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x32
 [<ffffffff8107957a>] ? clocksource_read+0x22/0x38
 [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81496bf6>] schedule+0x21/0x49
 [<ffffffff81496c62>] io_schedule+0x44/0x6c
 [<ffffffff8113b141>] sync_buffer+0x53/0x6b
 [<ffffffff81497294>] __wait_on_bit_lock+0x55/0xb2
 [<ffffffff810d2d1f>] ? find_get_page+0x64/0xa3
 [<ffffffff8149736e>] out_of_line_wait_on_bit_lock+0x7d/0x9c
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a
 [<ffffffff8113b380>] __lock_buffer+0x3d/0x53
 [<ffffffff811b6eda>] lock_buffer+0x49/0x64
 [<ffffffff811b7a15>] do_get_write_access+0x82/0x3f3
 [<ffffffff811bbdb3>] ? journal_add_journal_head+0xce/0x162
 [<ffffffff811b7dc0>] journal_get_write_access+0x3a/0x65
 [<ffffffff8118c209>] __ext3_journal_get_write_access+0x34/0x74
 [<ffffffff8117e464>] ext3_reserve_inode_write+0x50/0xaa
 [<ffffffff8117e50d>] ext3_mark_inode_dirty+0x4f/0x80
 [<ffffffff8117e6b8>] ext3_dirty_inode+0x79/0xa7
 [<ffffffff81135095>] __mark_inode_dirty+0x45/0x190
 [<ffffffff81129603>] file_update_time+0xc0/0x113
 [<ffffffff810eb167>] do_wp_page+0x610/0x658
 [<ffffffff81INFO: task kjournald:61 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff88003e932000     0    61      2
 ffff88003e919d40 0000000000000246 ffffffff8100e45c 0000000000000000
 000000001cee5db8 ffff88003e919d20 ffffffff8100ee82 0000000000000202
 ffff88003e9c83a8 000000000000e2e8 ffff88003e9c83a8 0000000000012d00
Call Trace:
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff8100ee82>] ? check_events+0x12/0x20
 [<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
 [<ffffffff81496bf6>] schedule+0x21/0x49
 [<ffffffff811b8b33>] journal_commit_transaction+0x13d/0xe42
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff810632bc>] ? try_to_del_timer_sync+0x69/0x87
 [<ffffffff811bcdf7>] kjournald+0xfd/0x253
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff811bccfa>] ? kjournald+0x0/0x253
 [<ffffffff81070709>] kthread+0x6d/0xae
 [<ffffffff8101313a>] child_rip+0xa/0x20
 [<ffffffff81012afd>] ? restore_args+0x0/0x30
 [<ffffffff81013130>] ? child_rip+0x0/0x20
INFO: task ns-slapd:1034 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ns-slapd      D ffffc20000000000     0  1034      1
 ffff88003dd87908 0000000000000282 ffff88003dd87868 ffffffff8100ed0d
 ffff88003dd86000 00000000e59205a0 ffff88003dd87888 ffffffff8107957a
 ffff88003d4fe0e8 000000000000e2e8 ffff88003d4fe0e8 0000000000012d00
Call Trace:
 [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x32
 [<ffffffff8107957a>] ? clocksource_read+0x22/0x38
 [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81496bf6>] schedule+0x21/0x49
 [<ffffffff81496c62>] io_schedule+0x44/0x6c
 [<ffffffff8113b141>] sync_buffer+0x53/0x6b
 [<ffffffff81497294>] __wait_on_bit_lock+0x55/0xb2
 [<ffffffff810d2d1f>] ? find_get_page+0x64/0xa3
 [<ffffffff8149736e>] out_of_line_wait_on_bit_lock+0x7d/0x9c
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a
 [<ffffffff8113b380>] __lock_buffer+0x3d/0x53
 [<ffffffff811b6eda>] lock_buffer+0x49/0x64
 [<ffffffff811b7a15>] do_get_write_access+0x82/0x3f3
 [<ffffffff811bbdb3>] ? journal_add_journal_head+0xce/0x162
 [<ffffffff811b7dc0>] journal_get_write_access+0x3a/0x65
 [<ffffffff8118c209>] __ext3_journal_get_write_access+0x34/0x74
 [<ffffffff8117e464>] ext3_reserve_inode_write+0x50/0xaa
 [<ffffffff8117e50d>] ext3_mark_inode_dirty+0x4f/0x80
 [<ffffffff8117e6b8>] ext3_dirty_inode+0x79/0xa7
 [<ffffffff81135095>] __mark_inode_dirty+0x45/0x190
 [<ffffffff81129603>] file_update_time+0xc0/0x113
 [<ffffffff810eb167>] do_wp_page+0x610/0x658
 [<ffffffff8100bc21>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff810eccd9>] handle_mm_fault+0x6a2/0x72e
 [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
 [<ffffffff8149be99>] do_page_fault+0x226/0x24f
 [<ffffffff81499965>] page_fault+0x25/0x30
INFO: task ns-slapd:1040 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ns-slapd      D ffff88003e932024     0  1040      1
 ffff88003bc119f8 0000000000000282 ffffffff8100e45c ffffc20000025410
 00000000f1efb74c ffff88003bc119d8 ffffffff8100ee82 0000000000000004
 ffff88003bc0b248 000000000000e2e8 ffff88003bc0b248 0000000000012d00
Call Trace:
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff8100ee82>] ? check_events+0x12/0x20
 [<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
 [<ffffffff8100ee82>] ? check_events+0x12/0x20
 [<ffffffff81496bf6>] schedule+0x21/0x49
 [<ffffffff811b841d>] start_this_handle+0x2d4/0x373
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff811b865d>] journal_start+0xb7/0x106
 [<ffffffff81187903>] ext3_journal_start_sb+0x62/0x78
 [<ffffffff8117d60b>] ext3_journal_start+0x28/0x3e
 [<ffffffff8117e67d>] ext3_dirty_inode+0x3e

Comment 1 Andrew Jones 2009-10-01 08:24:29 UTC

Moving this to the kernel component, but leaving xen-maint assigned.

Comment 2 Kevin Bowling 2009-10-04 04:13:31 UTC

I've had a stable system thus far by appending "clocksource=jiffies" to the kernel boot line.  The default clocksource is otherwise "xen".

The dmesg boot warnings in my initial report still occur (enlighten.c).

Comment 3 Jeremy Fitzhardinge 2009-10-09 21:05:56 UTC

That's interesting.  I guess there are two possibilities here:
 1. there's something wrong with the Xen/KVM pvclock clocksource that's upsetting the IO scheduling code somehow (maybe non-monotony or something), or
 2. there's a bug in IO scheduling when it has very high-res timing info

It could well be 1.  The tsc clocksource has some code to specifically guarantee monotonicity; maybe we should do the same...

Comment 4 Jeremy Fitzhardinge 2009-10-14 16:40:47 UTC

Created attachment 364778 [details]
Check to make sure clocksource returns minimally monotone results

Please try this and see if it makes any difference.  I'm not sure that it will, but it looks like a difference between the normal tsc-based clocksources and the Xen one (there's an equivalent kvm patch too).

Comment 5 Kevin Bowling 2009-10-22 07:57:16 UTC

I haven't tested the patch yet since I don't have facilities to build a Fedora kernel installed at the moment, but I do have another piece of information.

When running with xen clocksource, the system is stable if I get a message such as this one once: hrtimer: interrupt too slow, forcing clock min delta to 19541868 ns

If the hrtimer message does not appear, the system will crash usually fairly quickly.

Comment 6 Mitchell Berger 2009-10-24 17:03:17 UTC

I've built a 2.6.30.8-64.fc11.x86_64 kernel containing this patch,
and am running it on a server that experiences this issue.  Testing
results will be forthcoming as soon as they exist.

Also, this issue appears to only manifest on guests that are running
with multiple vcpus on AMD hardware - one vcpu on AMD is fine, and
multiple vcpus on Intel is fine.  Kevin, is the host you're seeing
this problem on AMD-based as well?

Comment 7 Kevin Bowling 2009-10-24 23:45:40 UTC

Yes, the box is a dual opterton 246 and the guest has two VCPUs.  I have an identically configured box that does not exhibit these problems.

So far, the patched monotone kernel has been running well for 17 hours.  I will update if anything changes.

Before applying the monotone patch, I upgraded to RHEL 5.4 and Xen 3.4.1 on the dom0 but that had no positive impact for this bug.

Comment 8 Mitchell Berger 2009-10-25 22:44:36 UTC

Unfortunately, it seems that the server I was testing this on
experienced the same old lockup 4.25 hours after it was started
under the patched kernel.

Since then, it's been running for a bit over 26 hours without
incident (we've essentially eliminated all the user load that
is usually on the server, to see if it still fails under no load).

So, while the patch looks like a strictly good thing, I don't
think we've actually pegged and fixed the cause of this lockup.

Note that we experienced similar lockups under 2.6.29 on multi-vcpu
AMD guests, but instead of the backtraces mostly showing things
line xen_clocksource_get_cycles, clocksource_read, and check_events,
the prominent entry in the backtraces in 2.6.29 was getnstimeofday.
Other than the kernel upgrade, the configuration of the machines
I see this issue on has not changed, so I strongly suspect that the
same trigger for this lockup exists in both cases, and the kernel
version bump has affected the way in which it manifests and how
the backtraces look.  Perhaps these hints about what it looked
like in two successive kernel versions will give Jeremy a better
idea of what could be behind it.

Comment 9 Jeremy Fitzhardinge 2009-10-26 20:36:05 UTC

What clocksource does Linux use on this system if you boot it natively (ie, without Xen?).

Your symptoms are pointing to something like mis-synchronized tscs between sockets in some way which either Xen isn't detecting and compensating for, or not passing the info to the guests, or perhaps the guest is not using/misusing that info.

However, I mentioned this problem to Jes Axboe last week and he said that the IO scheduler should not be sensitive to these kinds of timing upsets.  Certainly not going into indefinite sulks.

What IO scheduler are you using (CFQ by the looks of it)?  What happens if you switch to no-op? (Boot with elevator=noop on the kernel command line.)

(noop should be preferred for guests anyway, since they have no real insight into the underlying IO devices; dom0 is in a better position to do a good job.)

Comment 10 Kevin Bowling 2009-10-27 03:07:01 UTC

Based on my results, I'd say it is an interaction with the Dom0 timing or hardware.  One of my identical servers is perfectly stable with the unpatched domU kernels.  Both boxes selected 'jiffies' as dom0 current_clocksource.  This is the RHEL 5.4 xen dom0 kernel.

FWIW the monotone patch is working well on my F11 domU.. 60 hours uptime, but it has very light disk workload..

Comment 11 Mitchell Berger 2009-10-27 03:17:23 UTC

The host in question here, when booted into the native Ubuntu 8.04
2.6.24-25 kernel, has a clocksource0 of acpi_pm.  Indeed, the
guest had been using the CFQ scheduler.  We switched it to the noop
scheduler with no change in behavior - the guest hung with the
same usual blocked task output after just over an hour under its
usual load.

We've now attempted adding clocksource=tsc to the host's kernel
line, and are running the guest with the noop scheduler (and the
F11 2.6.30.8-64 kernel with your monotonicity patch) to see if a
change in host clocksource has any effect.

Comment 12 Mitchell Berger 2009-10-27 03:20:14 UTC

One other difference from the configuration that Kevin sees is
that our dom0 reports that its clocksource0 is xen rather than
jiffies.

Comment 13 Edward Z. Yang 2009-10-27 06:53:58 UTC

The results of our further tests indicate that clocksource=tsc and elevator=noop did not change behavior; the guest hung about five hours after we had rebooted it with the characteristic backtraces.  We have rebooted the kernel with the default IO scheduler (keeping the host with clocksource=tsc) but are not optimistic about the outcome.

Comment 14 Edward Z. Yang 2009-10-28 18:22:22 UTC

Default IO scheduler and host with clocksource=tsc hung after 34 hours.

Comment 15 Kevin Bowling 2009-10-30 00:38:17 UTC

Crash and burn, but it took several days with monotone patch.

[root@buildbox-a2 ~]# xm console gnucapplus                  
awstats.pl[9839]: segfault at 7fff470cdff8 ip 00007fff47945005 sp 00007fff470ce000 error 6 in libperl.so[7fff47895000+196000]                                                     
INFO: task pdflush:29 blocked for more than 120 seconds.                                 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.                
pdflush       D ffffc20000000000     0    29      2                                      
 ffff88003eeb5a20 0000000000000246 ffff88003eeb5980 ffffffff8100ed0d                     
 ffff88003f824be8 00000000dcba5f53 ffff88003eeb59a0 ffffffff8107957a                     
 ffff88003ff160e8 000000000000e2e8 ffff88003ff160e8 0000000000012d00                     
Call Trace:                                                                              
 [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x40                             
 [<ffffffff8107957a>] ? clocksource_read+0x22/0x38                                       
 [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d                                           
 [<ffffffff810d300b>] ? sync_page+0x0/0x71                                               
 [<ffffffff81496c56>] schedule+0x21/0x49                                                 
 [<ffffffff81496cc2>] io_schedule+0x44/0x6c                                              
 [<ffffffff810d3064>] sync_page+0x59/0x71                                                
 [<ffffffff814972f4>] __wait_on_bit_lock+0x55/0xb2                                       
 [<ffffffff810d2f53>] __lock_page+0x73/0x8e                                              
 [<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a                                       
 [<ffffffff810db339>] write_cache_pages+0x21b/0x40c                                      
 [<ffffffff810dab42>] ? __writepage+0x0/0x56                                             
 [<ffffffff81498d04>] ? trace_hardirqs_on_thunk+0x3a/0x3c                                
 [<ffffffff8100bc21>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e                          
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36                              
 [<ffffffff810db55f>] generic_writepages+0x35/0x4b                                       
 [<ffffffff810db5b6>] do_writepages+0x41/0x60                                            
 [<ffffffff811346b8>] __writeback_single_inode+0x168/0x2a8                               
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36                              
 [<ffffffff81134c2a>] generic_sync_sb_inodes+0x21a/0x361                                 
 [<ffffffff81134fe7>] writeback_inodes+0xb0/0x119                                        
 [<ffffffff810db7e8>] wb_kupdate+0xc0/0x14a                                              
 [<ffffffff810dc8ae>] pdflush+0x16d/0x272                                                
 [<ffffffff810db728>] ? wb_kupdate+0x0/0x14a                                             
 [<ffffffff810dc741>] ? pdflush+0x0/0x272                                                
 [<ffffffff81070709>] kthread+0x6d/0xae                                                  
 [<ffffffff8101313a>] child_rip+0xa/0x20                                                 
 [<ffffffff81012afd>] ? restore_args+0x0/0x30                                            
 [<ffffffff81013130>] ? child_rip+0x0/0x20                                               
INFO: task kjournald:61 blocked for more than 120 seconds.                               
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.                
kjournald     D ffff88003e81a3c0     0    61      2                                      
 ffff88003e949bc0 0000000000000246 ffff88003e949b20 ffffffff8100ed0d                     
 0000000064706931 0000000064706931 ffff88003e949b40 ffffffff8107957a                     
 ffff88003e9c03a8 000000000000e2e8 ffff88003e9c03a8 0000000000012d00                     
Call Trace:                                                                              
 [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x40                             
 [<ffffffff8107957a>] ? clocksource_read+0x22/0x38                                       
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36                              
 [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d                                           
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b                                             
 [<ffffffff81496c56>] schedule+0x21/0x49                                                 
 [<ffffffff81496cc2>] io_schedule+0x44/0x6c                                              
 [<ffffffff8113b141>] sync_buffer+0x53/0x6b                                              
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b                                             
 [<ffffffff81497444>] __wait_on_bit+0x57/0x9e                                            
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36                              
 [<ffffffff81497508>] out_of_line_wait_on_bit+0x7d/0x9c                                  
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b                                             
 [<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a                                       
 [<ffffffff8113aff2>] __wait_on_buffer+0x37/0x4d                                         
 [<ffffffff811b88b2>] wait_on_buffer+0x4a/0x65                                           
 [<ffffffff811b8e8d>] journal_commit_transaction+0x497/0xe42                             
 [<ffffffff8100ee7f>] ? xen_restore_fl_direct_end+0x0/0x1                                
 [<ffffffff8149943e>] ? _spin_unlock_irqrestore+0x4e/0x64                                
 [<ffffffff810632bc>] ? try_to_del_timer_sync+0x69/0x87                                  
 [<ffffffff811bcdf7>] kjournald+0xfd/0x253                                               
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f                                
 [<ffffffff811bccfa>] ? kjournald+0x0/0x253                                              
 [<ffffffff81070709>] kthread+0x6d/0xae                                                  
 [<ffffffff8101313a>] child_rip+0xa/0x20                                                 
 [<ffffffff81012afd>] ? restore_args+0x0/0x30                                            
 [<ffffffff81013130>] ? child_rip+0x0/0x20                                               
INFO: task ns-slapd:1034 blocked for more than 120 seconds.                              
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.                
ns-slapd      D ffffc20000016000     0  1034      1                                      
 ffff88003d549c68 0000000000000286 ffff88003d549bc8 ffffffff8100ed0d                     
 ffffffff8113cdf2 00000000585e4218 ffff88003d549be8 ffffffff8107957a                     
 ffff88003e069af8 000000000000e2e8 ffff88003e069af8 0000000000012d00                     
Call Trace:                                                                              
 [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x40                             
 [<ffffffff8113cdf2>] ? end_buffer_async_write+0x0/0x16c                                 
 [<ffffffff8107957a>] ? clocksource_read+0x22/0x38                                       
 [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d                                           
 [<ffffffff810d300b>] ? sync_page+0x0/0x71                                               
 [<ffffffff81496c56>] schedule+0x21/0x49                                                 
 [<ffffffff81496cc2>] io_schedule+0x44/0xINFO: task pdflush:29 blocked for more than 120 seconds.                                                                                 
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.                
pdflush       D ffffc20000000000     0    29      2                                      
 ffff88003eeb5a20 0000000000000246 ffff88003eeb5980 ffffffff8100ed0d                     
 ffff88003f824be8 00000000dcba5f53 ffff88003eeb59a0 ffffffff8107957a                     
 ffff88003ff160e8 000000000000e2e8 ffff88003ff160e8 0000000000012d00                     
Call Trace:                                                                              
 [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x40                             
 [<ffffffff8107957a>] ? clocksource_read+0x22/0x38                                       
 [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d                                           
 [<ffffffff810d300b>] ? sync_page+0x0/0x71                                               
 [<ffffffff81496c56>] schedule+0x21/0x49                                                 
 [<ffffffff81496cc2>] io_schedule+0x44/0x6c                                              
 [<ffffffff810d3064>] sync_page+0x59/0x71                                                
 [<ffffffff814972f4>] __wait_on_bit_lock+0x55/0xb2                                       
 [<ffffffff810d2f53>] __lock_page+0x73/0x8e                                              
 [<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a                                       
 [<ffffffff810db339>] write_cache_pages+0x21b/0x40c                                      
 [<ffffffff810dab42>] ? __writepage+0x0/0x56                                             
 [<ffffffff81498d04>] ? trace_hardirqs_on_thunk+0x3a/0x3c                                
 [<ffffffff8100bc21>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e                          
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36                              
 [<ffffffff810db55f>] generic_writepages+0x35/0x4b                                       
 [<ffffffff810db5b6>] do_writepages+0x41/0x60                                            
 [<ffffffff811346b8>] __writeback_single_inode+0x168/0x2a8                               
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36                              
 [<ffffffff81134c2a>] generic_sync_sb_inodes+0x21a/0x361                                 
 [<ffffffff81134fe7>] writeback_inodes+0xb0/0x119                                        
 [<ffffffff810db7e8>] wb_kupdate+0xc0/0x14a                                              
 [<ffffffff810dc8ae>] pdflush+0x16d/0x272                                                
 [<ffffffff810db728>] ? wb_kupdate+0x0/0x14a                                             
 [<ffffffff810dc741>] ? pdflush+0x0/0x272                                                
 [<ffffffff81070709>] kthread+0x6d/0xae                                                  
 [<ffffffff8101313a>] child_rip+0xa/0x20                                                 
 [<ffffffff81012afd>] ? restore_args+0x0/0x30                                            
 [<ffffffff81013130>] ? child_rip+0x0/0x20
INFO: task kjournald:61 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff88003e81a3c0     0    61      2
 ffff88003e949bc0 0000000000000246 ffff88003e949b20 ffffffff8100ed0d
 0000000064706931 0000000064706931 ffff88003e949b40 ffffffff8107957a
 ffff88003e9c03a8 000000000000e2e8 ffff88003e9c03a8 0000000000012d00
Call Trace:
 [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x40
 [<ffffffff8107957a>] ? clocksource_read+0x22/0x38
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81496c56>] schedule+0x21/0x49
 [<ffffffff81496cc2>] io_schedule+0x44/0x6c
 [<ffffffff8113b141>] sync_buffer+0x53/0x6b
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81497444>] __wait_on_bit+0x57/0x9e
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff81497508>] out_of_line_wait_on_bit+0x7d/0x9c
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a
 [<ffffffff8113aff2>] __wait_on_buffer+0x37/0x4d
 [<ffffffff811b88b2>] wait_on_buffer+0x4a/0x65
 [<ffffffff811b8e8d>] journal_commit_transaction+0x497/0xe42
 [<ffffffff8100ee7f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8149943e>] ? _spin_unlock_irqrestore+0x4e/0x64
 [<ffffffff810632bc>] ? try_to_del_timer_sync+0x69/0x87
 [<ffffffff811bcdf7>] kjournald+0xfd/0x253
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff811bccfa>] ? kjournald+0x0/0x253
 [<ffffffff81070709>] kthread+0x6d/0xae
 [<ffffffff8101313a>] child_rip+0xa/0x20
 [<ffffffff81012afd>] ? restore_args+0x0/0x30
 [<ffffffff81013130>] ? child_rip+0x0/0x20

Comment 16 Kevin Bowling 2009-11-25 10:57:54 UTC

Mitchell/Edward,

Have you tried F12 yet?  I am gearing up this week to see how it performs on these hosts.

F11 with jiffies forced works well otherwise (24 days and counting).

Comment 17 Kevin Bowling 2010-02-09 00:11:47 UTC

Just saw this with F12 (kernel 2.6.31)!!

Any ideas what is going on here?  This is bringing down production VMs for me.

Fedora release 12 (Constantine)
Kernel 2.6.31.12-174.2.3.fc12.x86_64 on an x86_64 (/dev/hvc0)

buildmaster login: type=1305 audit(1265446533.138:11726): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=kernel res=1
INFO: task kjournald:242 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff88002c73fb80     0   242      2 0x00000000
 ffff8800031d1c00 0000000000000246 000906996fc8e4d9 ffff88000315a318
 0000000001311302 ffffffff8111c908 ffff8800031d1cb0 0000000000000002
 ffff8800037861c8 000000000000f8f0 ffff8800037861c8 0000000000015600
Call Trace:
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8106e8e3>] ? clocksource_read+0xf/0x11
 [<ffffffff8106acf2>] ? ktime_get_ts+0x4e/0x53
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8141bf53>] io_schedule+0x31/0x42
 [<ffffffff8111c948>] sync_buffer+0x40/0x44
 [<ffffffff8141c465>] __wait_on_bit+0x48/0x7b
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141c506>] out_of_line_wait_on_bit+0x6e/0x79
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff8111c86e>] __wait_on_buffer+0x24/0x26
 [<ffffffff81199272>] wait_on_buffer+0x3b/0x3f
 [<ffffffff81199c59>] journal_commit_transaction+0x92e/0xe55
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8105c19b>] ? try_to_del_timer_sync+0x5a/0x66
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8119ce1f>] kjournald+0xeb/0x230
 [<ffffffff81067b37>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8119cd34>] ? kjournald+0x0/0x230
 [<ffffffff810677b5>] kthread+0x91/0x99
 [<ffffffff81012daa>] child_rip+0xa/0x20
 [<ffffffff81011f97>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101271d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81012da0>] ? child_rip+0x0/0x20
INFO: task java:1446 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff88002c73fb80     0  1446      1 0x00000080
 ffff88002d8cbb38 0000000000000282 00003ffffffff000 ffff88002d8c00d8
 ffff88002f271188 00000000000000ab ffffffff8100e139 ffff88002d8cbb08
 ffff88002d864a48 000000000000f8f0 ffff88002d864a48 0000000000015600
Call Trace:
 [<ffffffff8100e139>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff81198743>] do_get_write_access+0x1d4/0x3bc
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff8119c095>] ? journal_add_journal_head+0xbf/0x13c
 [<ffffffff81198956>] journal_get_write_access+0x2b/0x40
 [<ffffffff81164539>] __ext3_journal_get_write_access+0x25/0x51
 [<ffffffff81157b4f>] ext3_reserve_inode_write+0x43/0x87
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff81157bd3>] ext3_mark_inode_dirty+0x40/0x5d
 [<ffffffff81157d32>] ext3_dirty_inode+0x6a/0x81
 [<ffffffff811173d1>] __mark_inode_dirty+0x34/0xe3
 [<ffffffff8110de79>] file_update_time+0xae/0xed
 [<ffffffff810d74a8>] do_wp_page+0x60c/0x640
 [<ffffffff8100c3e5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
 [<ffffffff8100c3a9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff810d8de8>] handle_mm_fault+0x690/0x705
 [<ffffffff8100e139>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8141fba1>] do_page_fault+0x281/0x299
 [<ffffffff8141db35>] page_fault+0x25/0x30
INFO: task kjournald:242 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff88002c73fb80     0   242      2 0x00000000
 ffff8800031d1c00 0000000000000246 000906996fc8e4d9 ffff88000315a318
 0000000001311302 ffffffff8111c908 ffff8800031d1cb0 0000000000000002
 ffff8800037861c8 000000000000f8f0 ffff8800037861c8 0000000000015600
Call Trace:
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8106e8e3>] ? clocksource_read+0xf/0x11
 [<ffffffff8106acf2>] ? ktime_get_ts+0x4e/0x53
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8141bf53>] io_schedule+0x31/0x42
 [<ffffffff8111c948>] sync_buffer+0x40/0x44
 [<ffffffff8141c465>] __wait_on_bit+0x48/0x7b
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141c506>] out_of_line_wait_on_bit+0x6e/0x79
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff8111c86e>] __wait_on_buffer+0x24/0x26
 [<ffffffff81199272>] wait_on_buffer+0x3b/0x3f
 [<ffffffff81199c59>] journal_commit_transaction+0x92e/0xe55
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8105c19b>] ? try_to_del_timer_sync+0x5a/0x66
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8119ce1f>] kjournald+0xeb/0x230
 [<ffffffff81067b37>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8119cd34>] ? kjournald+0x0/0x230
 [<ffffffff810677b5>] kthread+0x91/0x99
 [<ffffffff81012daa>] child_rip+0xa/0x20
 [<ffffffff81011f97>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101271d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81012da0>] ? child_rip+0x0/0x20
INFO: task java:1446 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff88002c73fb80     0  1446      1 0x00000080
 ffff88002d8cbb38 0000000000000282 00003ffffffff000 ffff88002d8c00d8
 ffff88002f271188 00000000000000ab ffffffff8100e139 ffff88002d8cbb08
 ffff88002d864a48 000000000000f8f0 ffff88002d864a48 0000000000015600
Call Trace:
 [<ffffffff8100e139>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff81198743>] do_get_write_access+0x1d4/0x3bc
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff8119c095>] ? journal_add_journal_head+0xbf/0x13c
 [<ffffffff81198956>] journal_get_write_access+0x2b/0x40
 [<ffffffff81164539>] __ext3_journal_get_write_access+0x25/0x51
 [<ffffffff81157b4f>] ext3_reserve_inode_write+0x43/0x87
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff81157bd3>] ext3_mark_inode_dirty+0x40/0x5d
 [<ffffffff81157d32>] ext3_dirty_inode+0x6a/0x81
 [<ffffffff811173d1>] __mark_inode_dirty+0x34/0xe3
 [<ffffffff8110de79>] file_update_time+0xae/0xed
 [<ffffffff810d74a8>] do_wp_page+0x60c/0x640
 [<ffffffff8100c3e5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
 [<ffffffff8100c3a9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff810d8de8>] handle_mm_fault+0x690/0x705
 [<ffffffff8100e139>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8141fba1>] do_page_fault+0x281/0x299
 [<ffffffff8141db35>] page_fault+0x25/0x30
INFO: task kjournald:242 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff88002c73fb80     0   242      2 0x00000000
 ffff8800031d1c00 0000000000000246 000906996fc8e4d9 ffff88000315a318
 0000000001311302 ffffffff8111c908 ffff8800031d1cb0 0000000000000002
 ffff8800037861c8 000000000000f8f0 ffff8800037861c8 0000000000015600
Call Trace:
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8106e8e3>] ? clocksource_read+0xf/0x11
 [<ffffffff8106acf2>] ? ktime_get_ts+0x4e/0x53
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8141bf53>] io_schedule+0x31/0x42
 [<ffffffff8111c948>] sync_buffer+0x40/0x44
 [<ffffffff8141c465>] __wait_on_bit+0x48/0x7b
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141c506>] out_of_line_wait_on_bit+0x6e/0x79
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff8111c86e>] __wait_on_buffer+0x24/0x26
 [<ffffffff81199272>] wait_on_buffer+0x3b/0x3f
 [<ffffffff81199c59>] journal_commit_transaction+0x92e/0xe55
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8105c19b>] ? try_to_del_timer_sync+0x5a/0x66
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8119ce1f>] kjournald+0xeb/0x230
 [<ffffffff81067b37>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8119cd34>] ? kjournald+0x0/0x230
 [<ffffffff810677b5>] kthread+0x91/0x99
 [<ffffffff81012daa>] child_rip+0xa/0x20
 [<ffffffff81011f97>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101271d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81012da0>] ? child_rip+0x0/0x20
INFO: task rsyslogd:3571 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsyslogd      D ffff880003797480     0  3571      1 0x00000080
 ffff88001b0cbd28 0000000000000282 ffffffff8100e812 ffff88002d961f58
 0000000001314f32 ffff88001b0cbdc8 0000000000000002 ffffffff810c154d
 ffff88000e0383c8 000000000000f8f0 ffff88000e0383c8 0000000000015600
Call Trace:
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff810c154d>] ? sync_page+0x0/0x4a
 [<ffffffff8106e8e3>] ? clocksource_read+0xf/0x11
 [<ffffffff8106acf2>] ? ktime_get_ts+0x4e/0x53
 [<ffffffff810c154d>] ? sync_page+0x0/0x4a
 [<ffffffff8141bf53>] io_schedule+0x31/0x42
 [<ffffffff810c1593>] sync_page+0x46/0x4a
 [<ffffffff8141c35b>] __wait_on_bit_lock+0x46/0x8f
 [<ffffffff810e4381>] ? read_swap_cache_async+0x41/0x121
 [<ffffffff810c14f5>] __lock_page+0x66/0x6d
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff810d681d>] lock_page+0x3a/0x3e
 [<ffffffff810d8b77>] handle_mm_fault+0x41f/0x705
 [<ffffffff8100e139>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141fba1>] do_page_fault+0x281/0x299
 [<ffffffff8141db35>] page_fault+0x25/0x30
INFO: task java:1446 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D ffff88002c73fb80     0  1446      1 0x00000080
 ffff88002d8cbb38 0000000000000282 00003ffffffff000 ffff88002d8c00d8
 ffff88002f271188 00000000000000ab ffffffff8100e139 ffff88002d8cbb08
 ffff88002d864a48 000000000000f8f0 ffff88002d864a48 0000000000015600
Call Trace:
 [<ffffffff8100e139>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff81198743>] do_get_write_access+0x1d4/0x3bc
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff8119c095>] ? journal_add_journal_head+0xbf/0x13c
 [<ffffffff81198956>] journal_get_write_access+0x2b/0x40
 [<ffffffff81164539>] __ext3_journal_get_write_access+0x25/0x51
 [<ffffffff81157b4f>] ext3_reserve_inode_write+0x43/0x87
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff81157bd3>] ext3_mark_inode_dirty+0x40/0x5d
 [<ffffffff81157d32>] ext3_dirty_inode+0x6a/0x81
 [<ffffffff811173d1>] __mark_inode_dirty+0x34/0xe3
 [<ffffffff8110de79>] file_update_time+0xae/0xed
 [<ffffffff810d74a8>] do_wp_page+0x60c/0x640
 [<ffffffff8100c3e5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e
 [<ffffffff8100c3a9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff810d8de8>] handle_mm_fault+0x690/0x705
 [<ffffffff8100e139>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8141fba1>] do_page_fault+0x281/0x299
 [<ffffffff8141db35>] page_fault+0x25/0x30
INFO: task kjournald:242 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff88002c73fb80     0   242      2 0x00000000
 ffff8800031d1c00 0000000000000246 000906996fc8e4d9 ffff88000315a318
 0000000001311302 ffffffff8111c908 ffff8800031d1cb0 0000000000000002
 ffff8800037861c8 000000000000f8f0 ffff8800037861c8 0000000000015600
Call Trace:
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8106e8e3>] ? clocksource_read+0xf/0x11
 [<ffffffff8106acf2>] ? ktime_get_ts+0x4e/0x53
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8141bf53>] io_schedule+0x31/0x42
 [<ffffffff8111c948>] sync_buffer+0x40/0x44
 [<ffffffff8141c465>] __wait_on_bit+0x48/0x7b
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141c506>] out_of_line_wait_on_bit+0x6e/0x79
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff8111c86e>] __wait_on_buffer+0x24/0x26
 [<ffffffff81199272>] wait_on_buffer+0x3b/0x3f
 [<ffffffff81199c59>] journal_commit_transaction+0x92e/0xe55
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8105c19b>] ? try_to_del_timer_sync+0x5a/0x66
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8119ce1f>] kjournald+0xeb/0x230
 [<ffffffff81067b37>] ? autoremove_wake_function+0x0/0x39
 [<ffffffff8141d6e9>] ? _spin_unlock_irqrestore+0x19/0x1b
 [<ffffffff8119cd34>] ? kjournald+0x0/0x230
 [<ffffffff810677b5>] kthread+0x91/0x99
 [<ffffffff81012daa>] child_rip+0xa/0x20
 [<ffffffff81011f97>] ? int_ret_from_sys_call+0x7/0x1b
 [<ffffffff8101271d>] ? retint_restore_args+0x5/0x6
 [<ffffffff81012da0>] ? child_rip+0x0/0x20
INFO: task rsyslogd:3571 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rsyslogd      D ffff880003797480     0  3571      1 0x00000080
 ffff88001b0cbd28 0000000000000282 ffffffff8100e812 ffff88002d961f58
 0000000001314f32 ffff88001b0cbdc8 0000000000000002 ffffffff810c154d
 ffff88000e0383c8 000000000000f8f0 ffff88000e0383c8 0000000000015600
Call Trace:
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff810c154d>] ? sync_page+0x0/0x4a
 [<ffffffff8106e8e3>] ? clocksource_read+0xf/0x11
 [<ffffffff8106acf2>] ? ktime_get_ts+0x4e/0x53
 [<ffffffff810c154d>] ? sync_page+0x0/0x4a
 [<ffffffff8141bf53>] io_schedule+0x31/0x42
 [<ffffffff810c1593>] sync_page+0x46/0x4a
 [<ffffffff8141c35b>] __wait_on_bit_lock+0x46/0x8f
 [<ffffffff810e4381>] ? read_swap_cache_async+0x41/0x121
 [<ffffffff810c14f5>] __lock_page+0x66/0x6d
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff810d681d>] lock_page+0x3a/0x3e
 [<ffffffff810d8b77>] handle_mm_fault+0x41f/0x705
 [<ffffffff8100e139>] ? xen_force_evtchn_callback+0xd/0xf
 [<ffffffff8100e812>] ? check_events+0x12/0x20
 [<ffffffff8100e7ff>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff8141fba1>] do_page_fault+0x281/0x299
 [<ffffffff8141db35>] page_fault+0x25/0x30
INFO: task ntpd:1005 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ntpd          D ffff88002c73c380     0  1005      1 0x00000080
 ffff88002d4aba18 0000000000000282 ffff88002d4abc38 ffff88002d4f5c48
 0000000001317516 ffff88002d4abac8 0000000000000002 ffffffff8111c908
 ffff8800030e1b48 000000000000f8f0 ffff8800030e1b48 0000000000015600
Call Trace:
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8106e8e3>] ? clocksource_read+0xf/0x11
 [<ffffffff8106acf2>] ? ktime_get_ts+0x4e/0x53
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff8141bf53>] io_schedule+0x31/0x42
 [<ffffffff8111c948>] sync_buffer+0x40/0x44
 [<ffffffff8141c35b>] __wait_on_bit_lock+0x46/0x8f
 [<ffffffff8141c412>] out_of_line_wait_on_bit_lock+0x6e/0x79
 [<ffffffff8111c908>] ? sync_buffer+0x0/0x44
 [<ffffffff81067b70>] ? wake_bit_function+0x0/0x33
 [<ffffffff8111cac8>] __lock_buffer+0x2a/0x2c
 [<ffffffff81197bd3>] lock_buffer+0x3a/0x3f
 [<ffffffff811985e2>] do_get_write_access+0x73/0x3bc
 [<ffffffff8119c01a>] ? journal_add_journal_head+0x44/0x13c
 [<ffffffff81197848>] ? __journal_file_buffer+0xcc/0x177
 [<ffffffff8119c095>] ? journal_add_journal_head+0xbf/0x13c
 [<ffffffff81198956>] journal_get_write_access+0x2b/0x40
 [<ffffffff81164539>] __ext3_journal_get_write_access+0x25/0x51
 [<ffffffff8115664f>] ext3_new_inode+0x608/0x9da
 [<ffffffff811990bf>] ? journal_start+0x65/0xe1
 [<ffffffff810f36eb>] ? kmem_cache_alloc+0xa7/0x11c
 [<ffffffff8115d534>] ext3_create+0xa0/0x126
 [<ffffffff81105f20>] vfs_create+0x78/0x9a
 [<ffffffff81106e02>] do_filp_open+0x314/0x967
 [<ffffffff811074e0>] ? getname+0x31/0x1c3
 [<ffffffff810401ab>] ? __might_sleep+0x27/0xe8
 [<ffffffff81204c64>] ? might_fault+0x1f/0x21
 [<ffffffff81204d63>] ? __strncpy_from_user+0x1e/0x49
 [<ffffffff8111045f>] ? alloc_fd+0x7b/0x124
 [<ffffffff810fae02>] do_sys_open+0x63/0x10f
 [<ffffffff810faee1>] sys_open+0x20/0x22
 [<ffffffff81011cf2>] system_call_fastpath+0x16/0x1b

Comment 18 Kevin Bowling 2010-02-09 00:14:37 UTC

boot dmesg from this system:

[root@buildmaster ~]# dmesg 
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.31.12-174.2.3.fc12.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.4.2 20091222 (Red Hat 4.4.2-20) (GCC) ) #1 SMP Mon Jan 18 19:52:07 UTC 2010
Command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us 
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
ACPI in unprivileged domain disabled
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 00000000000a0000 (usable)
 Xen: 00000000000a0000 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000030000000 (usable)
DMI not present or invalid.
last_pfn = 0x30000 max_arch_pfn = 0x400000000
initial memory mapped : 0 - 20000000
init_memory_mapping: 0000000000000000-0000000030000000
 0000000000 - 0030000000 page 4k
kernel direct mapping tables up to 30000000 @ 100000-282000
RAMDISK: 019d1000 - 03921000
No NUMA configuration found
Faking a node at 0000000000000000-0000000030000000
Bootmem setup node 0 0000000000000000-0000000030000000
  NODE_DATA [0000000000008000 - 000000000001dfff]
  bootmap [000000000001e000 -  0000000000023fff] pages 6
(7 early reservations) ==> bootmem [0000000000 - 0030000000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0003aa4000 - 0003ac5000]   XEN PAGETABLES ==> [0003aa4000 - 0003ac5000]
  #2 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #3 [0001000000 - 00019b0b40]    TEXT DATA BSS ==> [0001000000 - 00019b0b40]
  #4 [00019d1000 - 0003921000]          RAMDISK ==> [00019d1000 - 0003921000]
  #5 [0003921000 - 0003aa4000]   XEN START INFO ==> [0003921000 - 0003aa4000]
  #6 [0000100000 - 000025e000]          PGTABLE ==> [0000100000 - 000025e000]
Zone PFN ranges:
  DMA      0x00000000 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000000 -> 0x000000a0
    0: 0x00000100 -> 0x00030000
On node 0 totalpages: 196512
  DMA zone: 56 pages used for memmap
  DMA zone: 353 pages reserved
  DMA zone: 3591 pages, LIFO batch:0
  DMA32 zone: 2632 pages used for memmap
  DMA32 zone: 189880 pages, LIFO batch:31
SMP: Allowing 2 CPUs, 0 hotplug CPUs
No local APIC present
APIC: disable apic facility
nr_irqs_gsi: 16
PM: Registered nosave memory: 00000000000a0000 - 0000000000100000
Allocating PCI resources starting at 30000000 (gap: 30000000:d0000000)
NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:2 nr_node_ids:1
PERCPU: Allocated 22 4k pages, static data 90080 bytes
trying to map vcpu_info 0 at ffffc9000000b020, mfn 1ec2b9, offset 32
cpu 0 using vcpu_info at ffffc9000000b020
trying to map vcpu_info 1 at ffffc90000023020, mfn 1ec2a3, offset 32
cpu 1 using vcpu_info at ffffc90000023020
Xen: using vcpu_info placement
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 193471
Policy zone: DMA32
Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us 
PID hash table entries: 4096 (order: 12, 32768 bytes)
Initializing CPU#0
Checking aperture...
No AGP bridge found
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
Memory: 729800k/786432k available (4238k kernel code, 384k absent, 56248k reserved, 2924k data, 1328k init)
SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
Hierarchical RCU implementation.
NR_IRQS:4352 nr_irqs:288
Detected 1991.509 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
console [hvc0] enabled
allocated 7864320 bytes of page_cgroup
please try 'cgroup_disable=memory' option if you don't want memory cgroups
Xen: using vcpuop timer interface
installing Xen timer for CPU 0
Calibrating delay loop (skipped), value calculated using timer frequency.. 3983.01 BogoMIPS (lpj=1991509)
Security Framework initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys memory
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
Initializing cgroup subsys net_cls
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0/0x1 -> Node 0
tseg: 0000000000
Performance Counters: AMD PMU driver.
------------[ cut here ]------------
WARNING: at arch/x86/kernel/apic/apic.c:247 native_apic_write_dummy+0x35/0x41() (Not tainted)
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.31.12-174.2.3.fc12.x86_64 #1
Call Trace:
 [<ffffffff81051710>] warn_slowpath_common+0x84/0x9c
 [<ffffffff8105173c>] warn_slowpath_null+0x14/0x16
 [<ffffffff81026175>] native_apic_write_dummy+0x35/0x41
 [<ffffffff8101cdaa>] perf_counters_lapic_init+0x33/0x35
 [<ffffffff8171ebc0>] init_hw_perf_counters+0x300/0x39d
 [<ffffffff8171e77b>] identify_boot_cpu+0x3c/0x3e
 [<ffffffff8171e89c>] check_bugs+0x9/0x2d
 [<ffffffff81716db8>] start_kernel+0x3e0/0x3fa
 [<ffffffff817162a1>] x86_64_start_reservations+0xac/0xb0
 [<ffffffff81719cf1>] xen_start_kernel+0x4e4/0x4e8
---[ end trace a7919e7f17c0a725 ]---
... version:                 0
... bit width:               48
... generic counters:        4
... value mask:              0000ffffffffffff
... max period:              00007fffffffffff
... fixed-purpose counters:  0
... counter mask:            000000000000000f
SMP alternatives: switching to UP code
ftrace: converting mcount calls to 0f 1f 44 00 00
ftrace: allocating 20894 entries in 82 pages
installing Xen timer for CPU 1
SMP alternatives: switching to SMP code
Initializing CPU#1
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1/0x1 -> Node 0
Brought up 2 CPUs
sizeof(vma)=176 bytes
sizeof(page)=56 bytes
sizeof(inode)=584 bytes
sizeof(dentry)=192 bytes
sizeof(ext3inode)=776 bytes
sizeof(buffer_head)=104 bytes
sizeof(skbuff)=248 bytes
sizeof(task_struct)=6008 bytes
CPU0 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 0 1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level CPU
  groups: 1 0
Booting paravirtualized kernel on Xen
Xen version: 3.4.1 (preserve-AD)
Grant table initialized
regulator: core version 0.5
Time: 165:165:165  Date: 165/165/65
NET: Registered protocol family 16
PCI: Fatal: No config space access function found
bio: create slab <bio-0> at 0
ACPI: Interpreter disabled.
xen_balloon: Initialising balloon driver.
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: System does not support PCI
PCI: System does not support PCI
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
Switched to high resolution mode on CPU 0
pnp: PnP ACPI: disabled
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
Switched to high resolution mode on CPU 1
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
NET: Registered protocol family 1
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 32064k freed
platform rtc_cmos: registered platform RTC device (no PNP device found)
audit: initializing netlink socket (disabled)
type=2000 audit(1265674112.410:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
msgmni has been set to 1598
SELinux:  Registering netfilter hooks
alg: No test for stdrng (krng)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
brd: module loaded
loop: module loaded
input: Macintosh mouse button emulation as /devices/virtual/input/input0
Fixed MDIO Bus: probed
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
uhci_hcd: USB Universal Host Controller Interface driver
PNP: No PS/2 controller found. Probing ports directly.
mice: PS/2 mouse device common for all mice
rtc_cmos: probe of rtc_cmos failed with error -16
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel
cpuidle: using governor ladder
cpuidle: using governor menu
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
nf_conntrack version 0.5.0 (6144 buckets, 24576 max)
CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
ip_tables: (C) 2000-2006 Netfilter Core Team
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
PM: Resume from disk failed.
registered taskstats version 1
No TPM chip found, activating TPM-bypass!
XENBUS: Device with no driver: device/console/0
XENBUS: Device with no driver: device/vbd/51712
XENBUS: Device with no driver: device/vif/0
  Magic number: 1:252:3141
drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
Initalizing network drop monitor service
Freeing unused kernel memory: 1328k freed
Write protecting the kernel read-only data: 6356k
dracut: dracut-004-4.fc12
udev: starting version 145
dracut: Starting plymouth daemon
 xvda: xvda1 xvda2
dracut: Scanning devices xvda2  for LVM volume groups 
dracut: Reading all physical volumes. This may take a while...
dracut: Found volume group "VolGroup00" using metadata type lvm2
dracut: 2 logical volume(s) in volume group "VolGroup00" now active
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: dm-0: orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 1289059
ext3_orphan_cleanup: deleting unreferenced inode 1736760
ext3_orphan_cleanup: deleting unreferenced inode 1289563
ext3_orphan_cleanup: deleting unreferenced inode 1290658
ext3_orphan_cleanup: deleting unreferenced inode 1286573
ext3_orphan_cleanup: deleting unreferenced inode 1736944
ext3_orphan_cleanup: deleting unreferenced inode 1289670
ext3_orphan_cleanup: deleting unreferenced inode 1657226
ext3_orphan_cleanup: deleting unreferenced inode 1657225
ext3_orphan_cleanup: deleting unreferenced inode 1657224
ext3_orphan_cleanup: deleting unreferenced inode 1657223
ext3_orphan_cleanup: deleting unreferenced inode 1657222
ext3_orphan_cleanup: deleting unreferenced inode 1657221
ext3_orphan_cleanup: deleting unreferenced inode 1657220
ext3_orphan_cleanup: deleting unreferenced inode 1657219
ext3_orphan_cleanup: deleting unreferenced inode 1657218
ext3_orphan_cleanup: deleting unreferenced inode 1657217
ext3_orphan_cleanup: deleting unreferenced inode 1657216
ext3_orphan_cleanup: deleting unreferenced inode 1736792
ext3_orphan_cleanup: deleting unreferenced inode 1290604
ext3_orphan_cleanup: deleting unreferenced inode 1288243
ext3_orphan_cleanup: deleting unreferenced inode 1736926
ext3_orphan_cleanup: deleting unreferenced inode 1287216
ext3_orphan_cleanup: deleting unreferenced inode 1290092
ext3_orphan_cleanup: deleting unreferenced inode 1287519
ext3_orphan_cleanup: deleting unreferenced inode 1287269
ext3_orphan_cleanup: deleting unreferenced inode 1736791
ext3_orphan_cleanup: deleting unreferenced inode 1289235
ext3_orphan_cleanup: deleting unreferenced inode 1393164
ext3_orphan_cleanup: deleting unreferenced inode 1287902
ext3_orphan_cleanup: deleting unreferenced inode 1736775
ext3_orphan_cleanup: deleting unreferenced inode 1507843
ext3_orphan_cleanup: deleting unreferenced inode 1507366
ext3_orphan_cleanup: deleting unreferenced inode 1507841
ext3_orphan_cleanup: deleting unreferenced inode 1384449
ext3_orphan_cleanup: deleting unreferenced inode 1507837
ext3_orphan_cleanup: deleting unreferenced inode 1507836
ext3_orphan_cleanup: deleting unreferenced inode 1507835
ext3_orphan_cleanup: deleting unreferenced inode 1507331
ext3_orphan_cleanup: deleting unreferenced inode 1507833
ext3_orphan_cleanup: deleting unreferenced inode 1507618
ext3_orphan_cleanup: deleting unreferenced inode 1507410
ext3_orphan_cleanup: deleting unreferenced inode 1507831
ext3_orphan_cleanup: deleting unreferenced inode 1287648
ext3_orphan_cleanup: deleting unreferenced inode 1507830
ext3_orphan_cleanup: deleting unreferenced inode 1507829
ext3_orphan_cleanup: deleting unreferenced inode 1287398
ext3_orphan_cleanup: deleting unreferenced inode 1507825
ext3_orphan_cleanup: deleting unreferenced inode 1287301
ext3_orphan_cleanup: deleting unreferenced inode 1287248
ext3_orphan_cleanup: deleting unreferenced inode 1287220
ext3_orphan_cleanup: deleting unreferenced inode 1286853
ext3_orphan_cleanup: deleting unreferenced inode 1287219
ext3_orphan_cleanup: deleting unreferenced inode 1507726
ext3_orphan_cleanup: deleting unreferenced inode 1507676
ext3_orphan_cleanup: deleting unreferenced inode 1507335
ext3_orphan_cleanup: deleting unreferenced inode 1287100
ext3_orphan_cleanup: deleting unreferenced inode 1287012
ext3_orphan_cleanup: deleting unreferenced inode 1286960
ext3_orphan_cleanup: deleting unreferenced inode 1507666
ext3_orphan_cleanup: deleting unreferenced inode 1507658
ext3_orphan_cleanup: deleting unreferenced inode 1507657
ext3_orphan_cleanup: deleting unreferenced inode 1507656
ext3_orphan_cleanup: deleting unreferenced inode 1507652
ext3_orphan_cleanup: deleting unreferenced inode 1507644
ext3_orphan_cleanup: deleting unreferenced inode 1507425
ext3_orphan_cleanup: deleting unreferenced inode 1286865
ext3_orphan_cleanup: deleting unreferenced inode 1286813
ext3_orphan_cleanup: deleting unreferenced inode 1286738
ext3_orphan_cleanup: deleting unreferenced inode 1507582
ext3_orphan_cleanup: deleting unreferenced inode 1507545
ext3_orphan_cleanup: deleting unreferenced inode 1507536
ext3_orphan_cleanup: deleting unreferenced inode 1507461
ext3_orphan_cleanup: deleting unreferenced inode 1507441
ext3_orphan_cleanup: deleting unreferenced inode 1286653
ext3_orphan_cleanup: deleting unreferenced inode 1507430
ext3_orphan_cleanup: deleting unreferenced inode 1507414
ext3_orphan_cleanup: deleting unreferenced inode 1507384
ext3_orphan_cleanup: deleting unreferenced inode 1507379
ext3_orphan_cleanup: deleting unreferenced inode 1507367
ext3_orphan_cleanup: deleting unreferenced inode 1507350
ext3_orphan_cleanup: deleting unreferenced inode 1507342
ext3_orphan_cleanup: deleting unreferenced inode 1507341
ext3_orphan_cleanup: deleting unreferenced inode 1507337
EXT3-fs: dm-0: 84 orphan inodes deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
dracut: Mounted root filesystem /dev/mapper/VolGroup00-LogVol00
dracut: Switching root
udev: starting version 145
Initialising Xen virtual ethernet driver.
device-mapper: multipath: version 1.1.0 loaded
EXT3 FS on dm-0, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on xvda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1671160k swap on /dev/mapper/VolGroup00-LogVol01.  Priority:-1 extents:1 across:1671160k 
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
eth0: no IPv6 routers present

Comment 19 Norman Gaywood 2010-02-09 06:36:41 UTC

Is clocksource=jiffies help in this case?

Similar Xen systems that might be having these problems are bug #551552 and bug #550724

But I'm only guessing.

Comment 20 Kevin Bowling 2010-02-09 07:04:25 UTC

(In reply to comment #19)
> Is clocksource=jiffies help in this case?
> 
> Similar Xen systems that might be having these problems are bug #551552 and bug
> #550724
> 
> But I'm only guessing.    

It appears F12/Kernel 2.6.31 have no option for jiffies, just xen clocksource.

bug #551552 looks like a dupe of this, the other one could be different but related.

Comment 21 Norman Gaywood 2010-02-16 12:06:20 UTC

just installed kernel-2.6.30.10-105.2.23.fc11.x86_64 on my F12 domU system similar to your setup (see #550724) and set clocksource=jiffies

Too early to tell if this is more stable, but the clock gains time very quickly.
ntpd does not seem to be keeping it stable. Perhaps I have ntpd setup wrong.

Comment 22 Norman Gaywood 2010-02-16 12:23:27 UTC

Well that didn't last long. 2.6.30.10-105.2.23 crashed pretty quickly. So 2.6.30 with clocksource=jiffies was not stable for me.

Comment 23 Norman Gaywood 2010-05-04 01:30:17 UTC

Neen quite here for awhile, are you still seeing this problem?

Could you also post the dmesg of your dom0?

Comment 24 Norman Gaywood 2010-10-13 05:42:39 UTC

I suspect that 2.6.32.23-170.fc12.x86_64 fixes this problem.

See bug 636534 and bug 550724

Comment 25 Bug Zapper 2010-11-04 09:43:39 UTC

This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 26 Bug Zapper 2010-12-04 07:35:51 UTC

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.