Bug 690039
Summary: | Test suite "bonnie" generates call trace with RHEL 5.6 guest
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | IBM Bug Proxy <bugproxy>
Component: | kernel
Assignee: | Red Hat Kernel Manager <kernel-mgr>
Status: | CLOSED WONTFIX
QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | medium
Priority: | unspecified
Version: | 5.6
CC: | akong, gcosta, jasowang, kwolf, leiwang, mkenneth, qwan, rhod, shuang, tburke, virt-maint, yuzhang
Target Milestone: | rc
Hardware: | x86_64
OS: | All
Doc Type: | Bug Fix
Last Closed: | 2012-01-12 15:58:41 UTC
Bug Blocks: | 580948
Description
IBM Bug Proxy
2011-03-23 05:40:53 UTC
Created attachment 486954 [details]
sos report for rhel5.6 guest running on rhel6 host
Did you have the chance to test on a RHEL6 host as well?

I'm a bit confused, as this bug seems to be two bugs in one report:
- on a rhel5.6 host we get a softlockup in ext3's fsync. Given how slow ext3 fsync is, that doesn't surprise me in general.
- on a rhel6 host you get a warning from the ATA driver. Is that the same guest migrated to a rhel6 host, or a different installation of the same guest image? Note that the 0xec ATA command is WIN_IDENTIFY, which is something we normally don't do under load.

Also note that the config seems to use cache=writethrough, which could explain all kinds of timeouts if backed by ext3's incredibly slow O_SYNC code. Does the problem go away if you add cache=none to the qemu command line, or use a different filesystem like xfs or ext4?

Can reproduce with cache=none.
1. cmd
/usr/libexec/qemu-kvm -monitor stdio -serial unix:'/tmp/serial-20110718-171628-6Bso',server,nowait -drive file='/home/images/RHEL-Server-5.7-64-virtio.qcow2',index=0,if=ide,media=disk,cache=none,format=qcow2 -net nic,vlan=0,model=rtl8139,macaddr='9a:ba:99:1b:74:e0' -net tap,vlan=0,ifname='t0-171628-6Bso',script='/home/qemu-ifup-switch',downscript='no' -m 2048 -smp 4,cores=1,threads=1,sockets=4 -cpu qemu64,+sse2 -soundhw ac97 -vnc :0 -rtc-td-hack -M rhel5.6.0 -boot c -usbdevice tablet -no-kvm-pit-reinjection
2. host
2.6.18-274.el5
kvm-83-239.el5
3. guest
rhel5.7.64
4.
INFO: task syslogd:2377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd D ffff810002536420 0 2377 1 2380 2364 (NOTLB)
 ffff810074ad1d98 0000000000000082 ffff810074ad1d18 0000000000000246
 0000000000000000 0000000000000009 ffff81007d248080 ffffffff80314b60
 0000037ef15f64ad 00000000000058a9 ffff81007d248268 00000000880317be
Call Trace:
 [<ffffffff88036e14>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff800a2e5d>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8803179a>] :jbd:journal_stop+0x1d3/0x203
 [<ffffffff8002fa5f>] __writeback_single_inode+0x1dd/0x31c
 [<ffffffff800e3463>] do_readv_writev+0x26e/0x291
 [<ffffffff800f64f4>] sync_inode+0x24/0x33
 [<ffffffff8804c37e>] :ext3:ext3_sync_file+0xce/0xf8
 [<ffffffff8004fe95>] do_fsync+0x52/0xa4
 [<ffffffff800e3cf0>] __do_fsync+0x23/0x36
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

INFO: task syslogd:2377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd D ffff810002536420 0 2377 1 2380 2364 (NOTLB)
 ffff810074ad1d98 0000000000000082 ffff810074ad1d18 0000000000000246
 0000000000000000 0000000000000009 ffff81007d248080 ffffffff80314b60
 0000037ef15f64ad 00000000000058a9 ffff81007d248268 00000000880317be
Call Trace:
 [<ffffffff88036e14>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff800a2e5d>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8803179a>] :jbd:journal_stop+0x1d3/0x203
 [<ffffffff8002fa5f>] __writeback_single_inode+0x1dd/0x31c
 [<ffffffff800e3463>] do_readv_writev+0x26e/0x291
 [<ffffffff800f64f4>] sync_inode+0x24/0x33
 [<ffffffff8804c37e>] :ext3:ext3_sync_file+0xce/0xf8
 [<ffffffff8004fe95>] do_fsync+0x52/0xa4
 [<ffffffff800e3cf0>] __do_fsync+0x23/0x36
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Just to clarify things: are these warnings in the guest or in the host? Either way it just seems like ext3 fsync being the pain it is and causing soft lockup warnings. I can't see how this is related to kvm, except for an added layer of indirection not helping to improve an already painful experience.
I would recommend not using ext3 for virt setups, but if you really strongly care about it, open a bug against the ext3 driver in the kernel.

Moved to the kernel component per chellwig recommendation

------- Comment From prem.karat.ibm.com 2011-08-30 01:31 EDT-------
(In reply to comment #25)
> Moved to the kernel component per chellwig recommendation

Hi Redhat,

Did you get a chance to look into this?

Cheers,
Prem

------- Comment From prem.karat.ibm.com 2011-09-13 09:02 EDT-------
Hi Redhat,

We are awaiting your response on this.

Cheers,
Prem

(In reply to comment #10)
> ------- Comment From prem.karat.ibm.com 2011-09-13 09:02 EDT-------
> Hi Redhat,
>
> We are awaiting your response on this.
>
> Cheers,
> Prem

Christoph was asking you to try out ext4 instead. It seems it is not a hypervisor issue but a file system one.

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.8, and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.

------- Comment From prem.karat.ibm.com 2012-01-09 23:38 EDT-------
Hi RH,

You can close this request.

-prem
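
For anyone who wants to follow up on the suggestion in this thread to compare filesystems (ext3 vs. ext4 or xfs) or cache modes, a rough way to see fsync stalls from inside the guest is to time individual fsync() calls on the filesystem in question. The following is only a minimal sketch, not something that was run as part of this bug: the function name `fsync_latencies` and its parameters are made up for illustration, and it assumes Python 3 (for `time.monotonic`), which is not the stock interpreter on a RHEL 5 guest.

```python
import os
import sys
import tempfile
import time


def fsync_latencies(directory, writes=20, chunk_size=4096):
    """Append small chunks to a scratch file in `directory` and time each fsync().

    Hypothetical helper for illustration; returns a list of latencies in seconds.
    """
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        chunk = b"x" * chunk_size
        for _ in range(writes):
            os.write(fd, chunk)
            start = time.monotonic()
            os.fsync(fd)  # blocks until the data reaches stable storage
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.unlink(path)  # remove the scratch file again
    return latencies


if __name__ == "__main__":
    # Directory on the filesystem under test; defaults to the system temp dir.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    samples = fsync_latencies(target)
    print("fsync calls:  %d" % len(samples))
    print("max latency:  %.3f s" % max(samples))
    print("mean latency: %.3f s" % (sum(samples) / len(samples)))
```

Running it against a directory on the ext3 volume while bonnie is generating load, and again on an ext4 or xfs volume, would give a rough comparison. It only measures fsync latency as seen by one process and says nothing about where in the stack the time is spent.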