Description of problem: When installing a guest under F17 dom0, using virt-install: virt-install -l http://mirrors.rit.edu/fedora/linux//development/17/i386/os/ --ram 1024 --disk /var/lib/xen/images/F17-virt-install.img,size=10 --name test_1 --network bridge:br0 I get this crash in the guest kernel when it is starting to format the parition in ext4: [ 172.803917] BUG: unable to handle kernel paging request at 000b0860 [ 172.804640] IP: [<c064263d>] blk_dump_rq_flags+0xcd/0x120 [ 172.804901] *pdpt = 000000001fbdf027 *pde = 0000000000000000 [ 172.804901] Oops: 0000 [#1] SMP [ 172.804901] Modules linked in: fcoe libfcoe libfc scsi_transport_fc scsi_tgt xen_kbdfront xen_netfront xen_blkfront sunrpc xts lrw gf128mul sha256_generic dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 iscsi_ibft iscs i_boot_sysfs scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs [last unloaded: scsi_wai t_scan] [ 172.804901] [ 172.804901] Pid: 0, comm: swapper/0 Not tainted 3.3.4-5.fc17.i686.PAE #1 [ 172.804901] EIP: 0061:[<c064263d>] EFLAGS: 00010006 CPU: 0 [ 172.804901] EIP is at blk_dump_rq_flags+0xcd/0x120 [ 172.804901] EAX: c1dd6b60 EBX: c1dd6a80 ECX: 00005a3e EDX: 000b085c [ 172.804901] ESI: 00000000 EDI: 00000000 EBP: c0b61d8c ESP: c0b61d70 [ 172.804901] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069 [ 172.804901] Process swapper/0 (pid: 0, ti=c0b60000 task=c0b67020 task.ti=c0b60000) [ 172.804901] Stack: [ 172.804901] c0ad41b8 c0ad3594 c0ace238 00000000 c056ff60 c1dd6b60 00000000 c0b61dd4 [ 172.804901] c0645349 c21c8000 00000000 c0b61dac c06457a0 00000000 0000eb5d c21c8000 [ 172.804901] c0645822 c1dd6aa0 00000000 c1dd6a80 00000000 00000000 c1dd6a80 00000000 [ 172.804901] Call Trace: [ 172.804901] [<c056ff60>] ? create_empty_buffers+0xb0/0xb0 [ 172.804901] [<c0645349>] blk_update_request+0x269/0x420 [ 172.804901] [<c06457a0>] ? freed_request+0x40/0x60 [ 172.804901] [<c0645822>] ? __blk_put_request+0x62/0xa0 [ 172.804901] [<c064551a>] blk_update_bidi_request+0x1a/0x70 [ 172.804901] [<c0646d12>] __blk_end_bidi_request+0x22/0x50 [ 172.804901] [<c0646d63>] __blk_end_request_all+0x23/0x40 [ 172.804901] [<ed95db72>] blkif_interrupt+0x172/0x330 [xen_blkfront] [ 172.804901] [<c0476d00>] ? sched_clock_local+0xf0/0x1e0 [ 172.804901] [<c04c44ec>] handle_irq_event_percpu+0x4c/0x1d0 [ 172.804901] [<c0664a4a>] ? radix_tree_lookup+0xa/0x10 [ 172.804901] [<c04c46a3>] handle_irq_event+0x33/0x60 [ 172.804901] [<c04c71c2>] handle_edge_irq+0x62/0xe0 [ 172.804901] [<c06ee47d>] __xen_evtchn_do_upcall+0x15d/0x200 [ 172.804901] [<c06f0450>] xen_evtchn_do_upcall+0x20/0x40 [ 172.804901] [<c094b417>] xen_do_upcall+0x7/0xc [ 172.804901] [<c049007b>] ? second_overflow+0x2b/0x180 [ 172.804901] [<c04023a7>] ? hypercall_page+0x3a7/0x1000 [ 172.804901] [<c040871b>] ? xen_safe_halt+0x1b/0x30 [ 172.804901] [<c0418e98>] default_idle+0x68/0x160 [ 172.804901] [<c0410958>] cpu_idle+0xa8/0xe0 Curiously, if I use a different backend: virt-install -l http://mirrors.rit.edu/fedora/linux//development/17/i386/os/ --ram 1024 --disk /dev/sdb --name test_1 --network bridge:br0 it works just great. Version-Release number of selected component (if applicable): xen-4.1.2-15.fc17.x86_64 xen-libs-4.1.2-15.fc17.x86_64 xen-runtime-4.1.2-15.fc17.x86_64 xen-hypervisor-4.1.2-15.fc17.x86_64 xen-licenses-4.1.2-15.fc17.x86_64 How reproducible: 100% Steps to Reproduce: 1. Install F17 on a machine 2. yum install xen libvirt virt-manager virt-install virt-viewer 3. reboot 4. Pick Xen 5. Configure network (I used a bridge setup outlined in http://wiki.libvirt.org/page/Networking 6. virt-install -l http://mirrors.rit.edu/fedora/linux//development/17/i386/os/ --ram 1024 --disk /var/lib/xen/images/F17-virt-install.img,size=10 --name test_1 --network bridge:br0 7. (xm console test_1; tail -f /var/log/messages; See crash). Actual results: At the top of the bug report Expected results: None of the mentioned problem Additional info:
Created attachment 586473 [details] xenstore-ls output .. and from the guest's dmesg: [root@tst013 konrad]# more pv-bad [ 4.326808] blkfront: xvda: flush diskcache: enabled [ 4.405910] xvda: xvda1 [root@tst013 konrad]# more pv-ok [ 4.413322] blkfront: xvda: barrier or flush: disabled [ 4.430195] xvda: xvda1 xvda2 The test_1 is using phy, while test_2 is using the loop device.
I'll propose this as a blocker just to ensure we don't miss it, if it should be. So AIUI the impact here is that Xen fails with a disk image but passes if using an actual physical disk?
Does this fail only with F17 as Dom0? If so, I'm -1 blocker since the Xen criterion only covers usage as a DomU.
I'm experiencing the same using F16 as Dom0 (In reply to comment #3) > Does this fail only with F17 as Dom0? If so, I'm -1 blocker since the Xen > criterion only covers usage as a DomU.
(In reply to comment #4) > I'm experiencing the same using F16 as Dom0 > > (In reply to comment #3) > > Does this fail only with F17 as Dom0? If so, I'm -1 blocker since the Xen > > criterion only covers usage as a DomU. The current F16 kernel should be almost identical to the F17 kernel right now, so that doesn't surprise me a whole lot. The interesting thing is going to be if this still happens with an older F16 kernel (or something else not running 3.3)
(In reply to comment #4) > I'm experiencing the same using F16 as Dom0 > > (In reply to comment #3) > > Does this fail only with F17 as Dom0? If so, I'm -1 blocker since the Xen > > criterion only covers usage as a DomU. And here they are some more details about my situation and the error I'm getting. I seems definitely the same Konrad is reporting, as it happens only for file:///, while using an actual LVM volume just works perfectly. Also, it manifests itself right when the installer tries to write the partition table to the disk image (I also tried changing the fs type, but the crash is still there). However, I'm getting a different stack trace. How reproducible: 100% Additional info: host: Intel(R) Core(TM) i7-2720QM Fedora release 16 (Verne) $ uname -a Linux Abyss 3.3.5-2.fc16.x86_64 #1 SMP Installed Packages Name : xen Arch : x86_64 Version : 4.1.2 Release : 7.fc16 Name : kernel Arch : x86_64 Version : 3.3.5 Release : 2.fc16 [ 419.873230] BUG: unable to handle kernel NULL pointer dereference at 000000e4 [ 419.873240] IP: [<c0646d4c>] __blk_end_request_all+0xc/0x40 [ 419.873250] *pdpt = 000000001b1e2027 *pde = 0000000000000000 [ 419.873256] Oops: 0000 [#1] SMP [ 419.873261] Modules linked in: fcoe libfcoe libfc scsi_transport_fc scsi_tgt xen_kbdfront xen_netfront xen_blkfront sunrpc xts lrw gf128mul sha256_generic dm_crypt dm_round_robin dm_multipath linear raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 iscsi_ibft iscsi_boot_sysfs scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi squashfs cramfs [last unloaded: scsi_wait_scan] [ 419.873308] [ 419.873311] Pid: 0, comm: swapper/0 Not tainted 3.3.4-5.fc17.i686.PAE #1 [ 419.873318] EIP: 0061:[<c0646d4c>] EFLAGS: 00010086 CPU: 0 [ 419.873323] EIP is at __blk_end_request_all+0xc/0x40 [ 419.873327] EAX: 00000000 EBX: c22692c8 ECX: 00000005 EDX: 00000000 [ 419.873331] ESI: 00000018 EDI: ebf90000 EBP: c0b61e08 ESP: c0b61e00 [ 419.873336] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069 [ 419.873340] Process swapper/0 (pid: 0, ti=c0b60000 task=c0b67020 task.ti=c0b60000) [ 419.873344] Stack: [ 419.873347] c20939c0 c22692c8 c0b61e50 ed95db72 00000000 00000d56 00000000 00001d50 [ 419.873356] 00000000 ec5e0190 ed9600dc 00000547 00000000 00000546 ffffffff c22692c8 [ 419.873365] c0476d00 c20939c0 00000018 c18296c0 c0b61e8c c04c44ec 00000018 00000000 [ 419.873374] Call Trace: [ 419.873380] [<ed95db72>] blkif_interrupt+0x172/0x330 [xen_blkfront] [ 419.873388] [<c0476d00>] ? sched_clock_local+0xf0/0x1e0 [ 419.873395] [<c04c44ec>] handle_irq_event_percpu+0x4c/0x1d0 [ 419.873402] [<c0664a4a>] ? radix_tree_lookup+0xa/0x10 [ 419.873407] [<c04c46a3>] handle_irq_event+0x33/0x60 [ 419.873412] [<c04c71c2>] handle_edge_irq+0x62/0xe0 [ 419.873418] [<c06ee47d>] __xen_evtchn_do_upcall+0x15d/0x200 [ 419.873423] [<c06f0450>] xen_evtchn_do_upcall+0x20/0x40 [ 419.873429] [<c094b417>] xen_do_upcall+0x7/0xc [ 419.873435] [<c049007b>] ? second_overflow+0x2b/0x180 [ 419.873440] [<c04023a7>] ? hypercall_page+0x3a7/0x1000 [ 419.873446] [<c040871b>] ? xen_safe_halt+0x1b/0x30 [ 419.873452] [<c0418e98>] default_idle+0x68/0x160 [ 419.873457] [<c0410958>] cpu_idle+0xa8/0xe0 [ 419.873463] [<c0923bb5>] rest_init+0x5d/0x68 [ 419.873469] [<c0bd9797>] start_kernel+0x357/0x35d [ 419.873474] [<c0bd91c5>] ? loglevel+0x2b/0x2b [ 419.873478] [<c0bd90ac>] i386_start_kernel+0x9b/0xa2 [ 419.873484] [<c0bdba7a>] xen_start_kernel+0x57f/0x587 [ 419.873489] [<c04090c7>] ? check_events+0x3/0xc [ 419.873492] Code: 89 ec 5d c3 8d 76 00 89 d8 89 f2 e8 2f eb ff ff 8b 5d f8 31 c0 8b 75 fc 89 ec 5d c3 8d 76 00 55 89 e5 53 83 ec 04 3e 8d 74 26 00 <8b> 88 e4 00 00 00 85 c9 75 17 31 db 8b 48 30 89 1c 24 e8 8d ff [ 419.873546] EIP: [<c0646d4c>] __blk_end_request_all+0xc/0x40 SS:ESP 0069:c0b61e00 [ 419.873553] CR2: 00000000000000e4 [ 419.873557] ---[ end trace 2020e6e471dc9528 ]--- # dmesg | grep xvda [ 5.205011] blkfront: xvda: flush diskcache: enabled [ 5.206492] xvda: xvda1
(In reply to comment #5) > The current F16 kernel should be almost identical to the F17 kernel right > now, so that doesn't surprise me a whole lot. > I see. > The interesting thing is going to be if this still happens with an older F16 > kernel (or something else not running 3.3) > I'm installing kernel-3.2.10-3.fc16.x86_64. Will now reboot and retry...
I attempted to reproduce with a virthost using the release F16 kernel (3.1.0-7.fc16.x86_64) and xen 4.1.1-8.fc16. The VM was created using an exploded F17 x86_64 DVD as installation source over http. The install goes on without a hitch and everything seems to be working. Unless I'm missing something, this confirms that it is a Dom0 problem instead of a problem with the use of F17 RC4 as a DomU and thus, is not a blocker by the Fedora 17 Final criterion [1]: The release must boot successfully as Xen DomU with releases providing a functional, supported Xen Dom0 and widely used cloud providers utilizing Xen. This does not include any issues limited to the release functioning as Xen Dom0 [1] http://fedoraproject.org/wiki/Fedora_17_Final_Release_Criteria Note that I am not saying anything along the lines of "this isn't a bug". At the moment, I am more interested in whether or not it could block release of F17.
Tim, just to be 100% sure: can you state specifically that you used a disk image file as the storage backend, file:/// ?
(In reply to comment #9) > Tim, just to be 100% sure: can you state specifically that you used a disk > image file as the storage backend, file:/// ? Yes, I used a disk image file as the storage backend in my test. virt-install -l http://192.168.0.5/cblr/ks_mirror/F17-rc4-x86_64/ --ram 1024 --disk /var/lib/xen/images/F17-virt-install.img,size=10 --name f17test1 --network bridge:br0
Was selinux enabled or not during the original failure? There seem to be some selinux issues using xen through the libvirt package, so it is probably worth eliminating that. If that don't help, I am also interested whether the original F16 kernel with the most recent F16 xen works or not (F16 and F17 xen are basically the same apart from some systemd differences) as that would indicate whether the root problem is with xen or the kernel.
Discussed during the 2012-05-24 Fedora 17 final go/no-go meeting. Rejected as a blocker for Fedora 17 final because it appears to be a Dom0 issue instead of a DomU issue which does not violate any of the Fedora 17 release criteria. In addition, the use of file:// storage backends is not recommended and phy:// storage backends do not seem affected by this bug
This sucks. I can reproduce this with a F17 (installed on a phy) and another disk (file:// - xvdb), doing this simple command: mkfs.ext4 -t ext4 /dev/xvdb1 and boom!
This simple test-case triggers it: #define _GNU_SOURCE #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <linux/fs.h> #include <errno.h> #include <string.h> int main(void) { char *name = "/dev/xvdb1"; int disk; __u64 range[2]; int rc; disk = open(name, O_RDWR | O_EXCL | O_LARGEFILE); if (disk == -1) { fprintf(stderr,"Cannot open %s [%s]\n", name, strerror(errno)); return errno; } range[0] = 0; range[1] = 4096; rc = ioctl(disk, BLKDISCARD, &range); close(disk); return rc; } So it is certainly with the kernel. The weird part is that I recall discard working fine with SSDs - but I never got to check the loop back (or at that time it was fallocate functionality). o Interestingly enough, if I ran that test-code on F16 domU I do _not_ get the crash. This is due to it not working well: ioctl(4, BLKDISCARD, {0, 0}) = -1 EOPNOTSUPP (Operation not supported so .. still not sure if blkfront or blkback is doing something silly.
Good news - the bug is in dom0 and this patch fixes the issue in blkback. diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index d0ee7ed..7304059 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -263,6 +263,7 @@ static inline void blkif_get_x86_32_req(struct blkif_request *dst, break; case BLKIF_OP_DISCARD: dst->u.discard.flag = src->u.discard.flag; + dst->u.discard.id = src->u.discard.id; dst->u.discard.sector_number = src->u.discard.sector_number; dst->u.discard.nr_sectors = src->u.discard.nr_sectors; break; @@ -293,6 +294,7 @@ static inline void blkif_get_x86_64_req(struct blkif_request *dst, break; case BLKIF_OP_DISCARD: dst->u.discard.flag = src->u.discard.flag; + dst->u.discard.id = src->u.discard.id; dst->u.discard.sector_number = src->u.discard.sector_number; dst->u.discard.nr_sectors = src->u.discard.nr_sectors; break;
Fixed upstream with commit 8c9ce606a60e4a0cb447bdc082ce383b96b227b4. This was CC'd to stable, which actually doesn't work anymore. The proper tag is CC: <stable.org> I'll look at getting it backported to 3.4 until Konrad pokes Greg about getting it into the 3.4.x stable series.
Committed to Fedora git. Will be in the next kernel update.
kernel-3.4.5-2.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/kernel-3.4.5-2.fc17
kernel-3.4.5-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.4.5-1.fc16
Package kernel-3.4.5-1.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.4.5-1.fc16' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-10764/kernel-3.4.5-1.fc16 then log in and leave karma (feedback).
kernel-3.4.5-2.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.4.6-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/kernel-3.4.6-1.fc16
kernel-3.4.6-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.