Bug 509220 - i386 rhel4.8 kvm guests crashes in virtio during installation
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: high
Severity: high
: rc
: ---
Assigned To: Chris Lalancette
Virtualization Bugs
: ZStream
Depends On:
Blocks: 582911
Reported: 2009-07-01 15:35 EDT by Gurhan Ozen
Modified: 2013-11-03 20:42 EST
13 users

Doc Type: Bug Fix
Last Closed: 2011-02-16 10:49:09 EST

Attachments
Backport of linux-2.6 patch for dynamic scatterlist (4.41 KB, patch)
2009-08-13 08:57 EDT, Chris Lalancette

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Legacy) 29700 None None None Never

Description Gurhan Ozen 2009-07-01 15:35:48 EDT
Description of problem:
------------[ cut here ]------------
kernel BUG at drivers/block/virtio_blk.c:157!
invalid operand: 0000 [#1]
Modules linked in: dm_snapshot dm_mirror dm_zero dm_mod ext3 jbd msdos raid6 raid5 xor raid1 raid0 virtio_blk virtio_net virtio_pci virtio virtio_ring uhci_hcd sr_mod sd_mod scsi_mod edd floppy loop nfs nfs_acl lockd sunrpc vfat fat cramfs
CPU:    0
EIP:    0060:[<f88352a0>]    Not tainted VLI
EFLAGS: 00010082   (2.6.9-89.EL) 
EIP is at do_virtblk_request+0x1a/0x85 [virtio_blk]
eax: c3137400   ebx: dfe54f4c   ecx: 00000000   edx: dfe57530
esi: f74a6028   edi: 00000000   ebp: f754c000   esp: f7471cf0
ds: 007b   es: 007b   ss: 0068
Process anaconda (pid: 542, threadinfo=f7471000 task=f74860f0)
Stack: f74a6028 00000018 c3137400 dfe54f4c c025e1ab f74a6028 c025e228 00000000 
       c025ef6e 00000001 00000000 f74a6028 f74860f0 c0121d93 f7471d28 f7471d28 
       f7500880 0000021e f754f2cc 00000010 dfe54f4c f7500880 f754f2cc c02670b6 
Call Trace:
 [<c025e1ab>] __generic_unplug_device+0x2b/0x2d
 [<c025e228>] generic_unplug_device+0x7b/0xe0
 [<c025ef6e>] blk_execute_rq+0xbb/0xe3
 [<c0121d93>] autoremove_wake_function+0x0/0x2d
 [<c02670b6>] cfq_set_request+0x33/0x6b
 [<c0267083>] cfq_set_request+0x0/0x6b
 [<c025cef6>] elv_set_request+0xa/0x17
 [<c025eb13>] get_request+0x395/0x39f
 [<c0262ec2>] sg_scsi_ioctl+0x2c2/0x3c4
 [<c0263397>] scsi_cmd_ioctl+0x3d3/0x478
 [<c01ed3a9>] kobject_get+0xf/0x13
 [<c0262218>] get_disk+0x29/0x62
 [<c0261bb3>] exact_lock+0x7/0xd
 [<c025b10e>] kobj_lookup+0x123/0x185
 [<c0261ba5>] exact_match+0x0/0x7
 [<c017840b>] do_open+0x1b2/0x4ec
 [<c018f64a>] wake_up_inode+0x6/0x23
 [<c01787c7>] blkdev_open+0x1a/0x42
 [<c016dad2>] __dentry_open+0xcf/0x188
 [<c016d9a2>] filp_open+0x51/0x65
 [<f88353f7>] virtblk_ioctl+0x76/0x80 [virtio_blk]
 [<c026164e>] blkdev_ioctl+0x32b/0x337
 [<c0178aca>] block_ioctl+0x11/0x13
 [<c0183485>] sys_ioctl+0x297/0x336
 [<c0322f1b>] syscall_call+0x7/0xb
Code: 78 04 b8 01 00 00 00 89 57 04 89 3a 5b 5e 5f 5d c3 55 31 ed 57 31 ff 56 89 c6 53 eb 5a 66 81 7b 54 83 00 8b 43 48 8b 68 38 76 08 <0f> 0b 9d 00 bd 57 83 f8 89 d9 89 ea 89 f0 e8 8f fe ff ff 84 c0 
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
RHEL4-U8 i386 released distro.

How reproducible:
Very. Only happens in i386 guests.

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
Comment 1 Dor Laor 2009-07-06 11:24:12 EDT
Seems like it is a guest driver issue. Changing the component.
Comment 3 Chris Lalancette 2009-07-15 15:33:31 EDT
I'll take a look at this tomorrow.

Chris Lalancette
Comment 4 Chris Lalancette 2009-08-13 06:09:04 EDT
OK, I took a look at this.  First, I wasn't able to reproduce with an F-11 based host, so I'll try a RHEL-5 based one next.  Gurhan, could you give me details about which piece(s) of hardware you were able to reproduce this on?

Looking at the stack trace, we hit this:

0x2a0 is in do_virtblk_request (drivers/block/virtio_blk.c:157).
152		struct request *req;
153		unsigned int issued = 0;
155		while ((req = elv_next_request(q)) != NULL) {
156			vblk = req->rq_disk->private_data;
157			BUG_ON(req->nr_phys_segments > ARRAY_SIZE(vblk->sg));
159			/* If this request fails, stop queue and wait for something to
160			   finish to restart it. */
161			if (!do_req(q, vblk, req)) {

Which means (I think) that we were handed more segments than the ring can handle.  I'll have to look further into how we got into that situation.

Chris Lalancette
Comment 5 Chris Lalancette 2009-08-13 08:06:24 EDT
OK, I have a thought as to how this might happen.  I think the problem might be in the size of our scatterlist vs. what the host told us.  In the current RHEL-4 code, there is a hardcoded scatterlist size of (3+MAX_PHYS_SEGMENTS).  However, I think it is possible for the host to tell us to use more than that, and if it does so, then we set up the block layer to give us whatever the host tells us, irrespective of our internally hardcoded size.  If that happens, then we'll run into this BUG().

Luckily, upstream has moved to a dynamically allocated scatterlist.  Assuming my above analysis is right, then this should fix the problem.  I've done a backport of that patch to RHEL-4, and now I just need a place to test it.  Hopefully Gurhan can provide me with the machines I need to do that.

Chris Lalancette
Comment 6 Chris Lalancette 2009-08-13 08:57:07 EDT
Created attachment 357314
Backport of linux-2.6 patch for dynamic scatterlist

This patch is what I have in mind.  I've lightly tested it, and it seems to work (at least basically), but I'll still need to test it on the problem machine.

Chris Lalancette
Comment 12 Chris Lalancette 2009-08-20 06:13:20 EDT
Hm, yeah, I'm wondering if Dor's last comment is why things are different.  In any case, I've now done a test of the installer with the patch in place, and things look pretty good; the install with that particular ks.cfg no longer fails.  I'll get this patch queued up for 4.9.

Chris Lalancette
Comment 13 RHEL Product and Program Management 2009-08-20 06:39:53 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.
Comment 14 Vivek Goyal 2009-08-25 15:23:32 EDT
Committed in 89.11.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Comment 15 Dan Yasny 2010-04-09 08:52:33 EDT
The test kernel has been tested by a customer, who verified that the issue is no longer reproducible: see https://enterprise.redhat.com/issue-tracker/?module=issues&action=view&tid=737433&gid=23498&view_type=lifoall#eid_6681283
Comment 25 errata-xmlrpc 2011-02-16 10:49:09 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

