Description of problem:
host crash while doing migration

Version-Release number of selected component (if applicable):
2.6.32-71.18.1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. cmd:
   qemu-kvm \
     -drive file='/usr/images/RHEL-Server-6.0-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native \
     -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 \
     -device virtio-net-pci,netdev=idS61yuA,mac=9a:f1:48:07:df:b8,netdev=idS61yuA,id=ndev00idS61yuA,bus=pci.0,addr=0x3 \
     -netdev tap,id=idS61yuA,vhost=on,script='/usr/scripts/qemu-ifup-switch',downscript='no' \
     -m 2048 -smp 2,cores=1,threads=1,sockets=2 \
     -cpu cpu64-rhel6,+sse2,+x2apic \
     -vnc :1 -rtc base=utc,clock=host,driftfix=none \
     -M rhel6.0.0 -boot order=cdn,once=c,menu=off \
     -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm \
     -incoming tcp:0:5200
2.
3.

Actual results:

Expected results:

Additional info:
1. host
   processor        : 3
   vendor_id        : AuthenticAMD
   cpu family       : 16
   model            : 2
   model name       : AMD Phenom(tm) 9600B Quad-Core Processor
   stepping         : 3
   cpu MHz          : 1150.000
   flags            : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
   bogomips         : 4587.44
   TLB size         : 1024 4K pages
   clflush size     : 64
   cache_alignment  : 64
   address sizes    : 48 bits physical, 48 bits virtual
   power management : ts ttp tm stc 100mhzsteps hwpstate

2. can not reproduce in rhel6.1 host 2.6.32-118.el6.x86_64

3. crash info:
   crash: invalid kernel virtual address: 7180 type: "possible"
   WARNING: cannot read cpu_possible_map
   crash: seek error: kernel virtual address: ffffffff8208e980 type: "xtime"

   BUG: unable to handle kernel paging request at 0000000000001000
   IP: [<ffffffff814a024a>] __packet_get_status+0x3a/0x40
   PGD 21252b067 PUD 2147bd067 CE: hpet increasing min_delta_ns to 15000 nsec
   PMD 0
   Oops: 0000 [#1] SMP
   last sysfs file: /sys/devices/virtual/net/t0-122919-IluH/flags
   CPU 0
   Modules linked in: nls_utf8 vhost_net macvtap macvlan tun nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand powernow_k8 freq_table bridge stp llc ipv6 dm_mirror dm_region_hash dm_log kvm_amd kvm tpm_infineon wmi serio_raw edac_core edac_mce_amd snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 sg tg3 shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]

   Modules linked in: nls_utf8 vhost_net macvtap macvlan tun nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand powernow_k8 freq_table bridge stp llc ipv6 dm_mirror dm_region_hash dm_log kvm_amd kvm tpm_infineon wmi serio_raw edac_core edac_mce_amd snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 sg tg3 shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]
   Pid: 30156, comm: tcpdump Not tainted 2.6.32-71.18.1.el6.x86_64 #1 HP Compaq dc5850 Microtower
   RIP: 0010:[<ffffffff814a024a>] [<ffffffff814a024a>] __packet_get_status+0x3a/0x40
   RSP: 0018:ffff880214febaa8 EFLAGS: 00010213
   RAX: 0000780000001000 RBX: 0000000000001000 RCX: ffff880214c924c0
According to our earlier test results, this works on 2.6.32-71.12.1.el6.x86_64.
Created attachment 481528
debug
(In reply to comment #3)
> from the result we tested before, it works in 2.6.32-71.12.1.el6.x86_64

Do you mean it is a regression?
Does it happen without vhost loaded?
(In reply to comment #5)
> (In reply to comment #3)
> > from the result we tested before, it works in 2.6.32-71.12.1.el6.x86_64
>
> Do you mean it is a regression?

According to our earlier acceptance-testing results, it works on 2.6.32-71.12.1.el6.x86_64, but that kernel has since been deleted, so I cannot test it any more.

This issue can also be reproduced on 2.6.32-71.14.1.el6.x86_64.

I am testing with vhost and trying to get a complete log. Will report the result soon.
can reproduce with vhost=on

1. cmd:
   qemu-kvm \
     -drive file='/usr/images/RHEL-Server-6.0-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native \
     -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 \
     -device virtio-net-pci,netdev=idvx5Ue1,mac=9a:f1:48:07:aa:1f,id=ndev00idvx5Ue1,bus=pci.0,addr=0x3 \
     -netdev tap,id=idvx5Ue1,vhost=on,script='/usr/scripts/qemu-ifup-switch',downscript='no' \
     -m 2048 -smp 2,cores=1,threads=1,sockets=2 \
     -cpu cpu64-rhel6,+sse2,+x2apic \
     -vnc :1 -rtc base=utc,clock=host,driftfix=none \
     -M rhel6.0.0 -boot order=cdn,once=c,menu=off \
     -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm \
     -incoming tcp:0:5200

2. vmcore:
   PID: 9495  TASK: ffff88020e9f54e0  CPU: 1  COMMAND: "tcpdump"
    #0 [ffff880215949790] machine_kexec at ffffffff8103697b
    #1 [ffff8802159497f0] crash_kexec at ffffffff810b9078
    #2 [ffff8802159498c0] oops_end at ffffffff814cc900
    #3 [ffff8802159498f0] no_context at ffffffff8104652b
    #4 [ffff880215949940] __bad_area_nosemaphore at ffffffff810467b5
    #5 [ffff880215949990] bad_area_nosemaphore at ffffffff81046883
    #6 [ffff8802159499a0] do_page_fault at ffffffff814ce388
    #7 [ffff8802159499f0] page_fault at ffffffff814cbc75
       [exception RIP: __packet_get_status+58]
       RIP: ffffffff814a024a  RSP: ffff880215949aa8  RFLAGS: 00010213
       RAX: 0000780000001000  RBX: 0000000000001000  RCX: ffff8802141aed80
       RDX: 0000000000000000  RSI: 0000000000001000  RDI: 0000000000001000
       RBP: ffff880215949ab8   R8: ffff880215948000   R9: 0000000000000000
       R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000001000
       R13: ffff8802155d7cc4  R14: ffff88021472aec0  R15: 0000000000000000
       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #8 [ffff880215949ac0] packet_lookup_frame at ffffffff814a0288
    #9 [ffff880215949ae0] packet_poll at ffffffff814a0d0c
   #10 [ffff880215949b10] sock_poll at ffffffff813fb5ca
   #11 [ffff880215949b20] do_sys_poll at ffffffff8118274b
   #12 [ffff880215949f40] sys_poll at ffffffff81182bcc
   #13 [ffff880215949f80] system_call_fastpath at ffffffff81013172
       RIP: 00007fad0e30cdf8  RSP: 00007fff3a7d2d50  RFLAGS: 00010286
       RAX: 0000000000000007  RBX: ffffffff81013172  RCX: ffffffffffffffff
       RDX: 00000000000003e8  RSI: 0000000000000001  RDI: 00007fff3a7d3830
       RBP: 00000000000003e8   R8: 0000000000000000   R9: 0000000000000001
       R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
       R13: 0000000000451980  R14: 00007fff3a7d3830  R15: 0000000001322360
       ORIG_RAX: 0000000000000007  CS: 0033  SS: 002b
   crash>
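Note for readers of the backtrace above: the call chain that led to the fault can be pulled out of the crash `bt` output mechanically. The following is only a minimal sketch, not part of the original report. The helper names `parse_frames` and `syscall_path` are hypothetical, and the regular expression assumes frame lines of the form `#N [stack] symbol at address` as they appear in this vmcore.

```python
import re

# Matches crash "bt" frame lines such as:
#   #8 [ffff880215949ac0] packet_lookup_frame at ffffffff814a0288
# (format assumed from the backtrace in this report)
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")


def parse_frames(bt_text):
    """Return (frame_number, symbol, address) tuples in the order they appear."""
    return [(int(n), sym, int(addr, 16))
            for n, _sp, sym, addr in FRAME_RE.findall(bt_text)]


def syscall_path(frames, fault_handler="page_fault"):
    """Symbols from the syscall entry down to the frame that took the fault.

    Frames up to and including the fault handler (oops and kexec handling)
    are dropped, and the rest are reversed into call order.
    """
    names = [sym for _n, sym, _addr in frames]
    if fault_handler in names:
        names = names[names.index(fault_handler) + 1:]
    return list(reversed(names))


# Frames #7 to #13 copied from the vmcore backtrace in this report.
SAMPLE = """
 #7 [ffff8802159499f0] page_fault at ffffffff814cbc75
 #8 [ffff880215949ac0] packet_lookup_frame at ffffffff814a0288
 #9 [ffff880215949ae0] packet_poll at ffffffff814a0d0c
#10 [ffff880215949b10] sock_poll at ffffffff813fb5ca
#11 [ffff880215949b20] do_sys_poll at ffffffff8118274b
#12 [ffff880215949f40] sys_poll at ffffffff81182bcc
#13 [ffff880215949f80] system_call_fastpath at ffffffff81013172
"""

if __name__ == "__main__":
    print(" -> ".join(syscall_path(parse_frames(SAMPLE))))
```

Run on the frames above, this lists the path from `system_call_fastpath` through `sys_poll`, `sock_poll` and `packet_poll` to `packet_lookup_frame`, which is the frame that called `__packet_get_status` (the exception RIP) in the tcpdump process.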
I am confused, sorry.
Which host kernel has the problem?
Which host kernel does not?

You list one qemu command. Since this is during

You say:
> 2. can not reproduce in rhel6.1 host
> 2.6.32-118.el6.x86_64

So on which host does it reproduce?
2.6.32-71.18.1.el6.x86_64?
Also, does it or does it not reproduce without vhost=on?
(In reply to comment #12)
> I am confused, sorry.
> Which host kernel does have a problem?
> Which host kernel does not?
>
> You list one qemu command. Since this is during
>
> You say:
> >2. can not reproduce in rhel6.1 host
> >2.6.32-118.el6.x86_64
>
> so in which host does it reproduce?
> 2.6.32-71.18.1.el6.x86_64?

It reproduces on 2.6.32-71.18.1.el6.x86_64.
So is it a duplicate of
https://bugzilla.redhat.com/show_bug.cgi?id=623915
?
Does it happen without vhost=on or not?
(In reply to comment #15)
> So it's a duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=623915

It blocks RHEL6.0Z migration testing. Can you clone it to RHEL6.0Z, or change this one to RHEL6.0Z?

> ?
> Does it happen without vhost=on or not?

Repeated 10 times without vhost=on; cannot reproduce.
So it is definitely a duplicate of 623915. Marking as such.

*** This bug has been marked as a duplicate of bug 623915 ***
Re comment 16: please do not enable vhost in 6.0 at all.