Bug 551552

Summary: FC12 2.6.31.9-174.fc12.x86_64 hangs under heavy disk I/O
Product: [Fedora] Fedora
Reporter: Matteo Brancaleoni <mbrancaleoni>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: low
Version: 12
CC: anton, dougsland, drjones, gansalmon, itamar, kernel-maint, mishu, ngaywood
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment: Xen PV domU
Last Closed: 2010-12-04 01:01:52 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Attachments (Description / Flags):
FC12 traces when processes hang / none
Same result with upcoming 2.6.32.2-1.fc12.x86_64 / none

Description Matteo Brancaleoni 2009-12-31 08:36:19 UTC
Created attachment 381067 [details]
FC12 traces when processes hang

Description of problem:
I am running an x86_64 FC12 PV domU under an x86_64 Xen host.
The kernel version is 2.6.31.9-174.fc12.x86_64.

Under heavy disk I/O (the system runs a koji build environment), the kernel hangs.
Sometimes the hang is permanent (the machine responds to ping, but nothing else can be done); other times only some processes hang.

In all cases, many processes are reported as "blocked for more than 120 seconds". Sometimes it is httpd; more frequently it is pdflush, kswapd and kjournald.

The only cure is to hard reset the vm.

Initially I suspected filesystem issues, so I moved from ext4 to ext3, but the result is the same.

The domU is running on disk images (xvd block devices), not on a physical device.

If the machine is converted to HVM, the issue no longer occurs (but it is very slow).

Kernel call traces captured when the system hangs are attached.
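
The hung-task reports described above can be tallied from a saved dmesg log with a short script. This is only a sketch: it assumes the log contains the kernel's standard "INFO: task NAME:PID blocked for more than N seconds" lines, and the function name `count_hung_tasks` is made up for illustration.

```python
import re
from collections import Counter

# Matches the kernel hung-task report, with or without a leading timestamp.
HUNG_TASK_RE = re.compile(
    r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds"
)


def count_hung_tasks(log_text):
    """Return a Counter mapping task name to its number of hung-task reports."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_hung_tasks(open(path).read()).most_common()` on a saved log (the path is whatever file the dmesg output was written to) lists the most frequently blocked tasks first, which makes it easier to see whether pdflush, kswapd or kjournald dominate.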

Version-Release number of selected component (if applicable):
The domU is an FC12 PV VM with kernel 2.6.31.9-174.fc12.x86_64.
The dom0 kernel version is 2.6.18-164.9.1.el5xen.

The dom0 (Xen host) is CentOS 5.4 with the latest CentOS Xen:
[root@xen2 images]# xm info
host                   : xen2
release                : 2.6.18-164.9.1.el5xen
version                : #1 SMP Tue Dec 15 21:31:37 EST 2009
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2333
hw_caps                : bfebfbff:20000800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 8189
free_memory            : 38
node_to_cpu            : node0:0-7
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-164.9.1.el5
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
cc_compile_by          : mockbuild
cc_compile_domain      : centos.org
cc_compile_date        : Tue Dec 15 20:50:26 EST 2009
xend_config_format     : 2

domU xen config is:
name = "koji.voismart.net"
uuid = "7fc52711-974d-f475-c268-e6a8f495be5d"
maxmem = 3096
memory = 3096
vcpus = 4
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=it" ]
disk = [ "tap:aio:/var/lib/xen/images/koji.voismart.net.img,xvda,w", "tap:aio:/var/lib/xen/images/koji.voismart.net.disk2.img,xvdb,w", "tap:aio:/var/lib/xen/images/koji.voismart.net.disk3.img.img,xvdc,w" ]
vif = [ "mac=00:16:36:02:ae:f1,bridge=xenbr0,script=vif-bridge" ]
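
For readers unfamiliar with the `disk` entries above, each string has the shape `backend:path,device,mode`. The following sketch splits one entry into its parts; the helper name `parse_disk_spec` and the returned dictionary keys are illustrative assumptions, not part of any Xen tool.

```python
def parse_disk_spec(spec):
    """Split an xm-style disk string into backend, path, device and mode."""
    uname, device, mode = spec.split(",")
    if uname.startswith("tap:"):
        # blktap entries carry a driver name, e.g. "tap:aio:/path/to/image"
        _, driver, path = uname.split(":", 2)
        backend = "tap:" + driver
    else:
        # e.g. "phy:/dev/sdb1" or "file:/path/to/image"
        backend, path = uname.split(":", 1)
    return {"backend": backend, "path": path, "device": device, "mode": mode}
```

Applied to the first entry of this config, the sketch reports the `tap:aio` backend, the image file path, the guest device `xvda` and the mode `w`, which shows that all three disks are file-backed images rather than physical devices.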

Comment 1 Norman Gaywood 2010-01-01 07:51:21 UTC
Bug 551552 looks very similar to this one.

Comment 2 Norman Gaywood 2010-01-01 07:52:37 UTC
Sorry, that should be bug 550724

Comment 3 Matteo Brancaleoni 2010-01-05 13:06:35 UTC
Created attachment 381749 [details]
Same result with upcoming 2.6.32.2-1.fc12.x86_64

I've tested kernel 2.6.32.2-1.fc12.x86_64, from the Fedora Project koji.

The system seems more stable (it takes longer to hang and needs heavier I/O), but the hang still happens.

dmesg log attached.

Comment 4 Norman Gaywood 2010-02-25 02:47:35 UTC
As I mentioned before, I think I am seeing this in bug #550724.

Do you have a test that can reliably produce this problem?

I can't get it to happen on a test system, only on my production LTSP server.

This bug will get more attention, I'm sure, if a reliable test case can be found.
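
As a possible starting point for such a test case, the sketch below generates parallel write-and-fsync load on a directory. It is an assumption that this kind of load resembles the koji workload; it has not been confirmed to trigger the hang, and the function names and default sizes are illustrative only.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _write_files(directory, worker_id, files_per_worker, file_size):
    """Write, fsync and delete files one after another; return bytes written."""
    written = 0
    block = os.urandom(min(file_size, 1 << 20))
    for i in range(files_per_worker):
        path = os.path.join(directory, f"w{worker_id}_f{i}.bin")
        with open(path, "wb") as fh:
            remaining = file_size
            while remaining > 0:
                chunk = block[:remaining]
                fh.write(chunk)
                remaining -= len(chunk)
            fh.flush()
            os.fsync(fh.fileno())  # force the data out to the block device
        written += file_size
        os.remove(path)
    return written


def stress_io(directory, workers=4, files_per_worker=8, file_size=1 << 20):
    """Run several writers in parallel and return the total bytes written."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(_write_files, directory, w, files_per_worker, file_size)
            for w in range(workers)
        ]
        return sum(f.result() for f in futures)
```

Pointing `stress_io` at a directory on the affected xvd device, with larger values for `workers`, `files_per_worker` and `file_size`, would increase the load; whether any setting reproduces the hang would have to be checked on an affected system.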

Comment 5 Bug Zapper 2010-11-04 02:10:00 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Bug Zapper 2010-12-04 01:01:52 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.