Bug 680864

Summary: __packet_get_status unable to handle kernel paging request
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.0
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Reporter: Suqin Huang <shuang>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: khong, mst, tburke
Target Milestone: rc
Keywords: TestBlocker
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-03-03 08:07:53 UTC
Bug Depends On: 580951
Attachments: debug (flags: none)

Description Suqin Huang 2011-02-28 08:46:12 UTC
Description of problem:
The host crashes while a guest migration is in progress.

Version-Release number of selected component (if applicable):
2.6.32-71.18.1.el6.x86_64

How reproducible:
100% 

Steps to Reproduce:
1. Start qemu-kvm in incoming-migration mode:
qemu-kvm -drive file='/usr/images/RHEL-Server-6.0-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idS61yuA,mac=9a:f1:48:07:df:b8,netdev=idS61yuA,id=ndev00idS61yuA,bus=pci.0,addr=0x3 -netdev tap,id=idS61yuA,vhost=on,script='/usr/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :1 -rtc base=utc,clock=host,driftfix=none -M rhel6.0.0 -boot order=cdn,once=c,menu=off   -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm  -incoming tcp:0:5200

2.
3.
  
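Actual results:
The host kernel oopses in __packet_get_status (see the crash info below).

Expected results:
Migration completes without crashing the host.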

Additional info:
1. host
processor	: 3
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 2
model name	: AMD Phenom(tm) 9600B Quad-Core Processor
stepping	: 3
cpu MHz		: 1150.000
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips	: 4587.44
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

2. can not reproduce in rhel6.1 host
2.6.32-118.el6.x86_64

3.
crash info:

crash: invalid kernel virtual address: 7180  type: "possible"
WARNING: cannot read cpu_possible_map
crash: seek error: kernel virtual address: ffffffff8208e980  type: "xtime"

BUG: unable to handle kernel paging request at 0000000000001000
IP: [<ffffffff814a024a>] __packet_get_status+0x3a/0x40
PGD 21252b067 PUD 2147bd067
CE: hpet increasing min_delta_ns to 15000 nsec
PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/t0-122919-IluH/flags
CPU 0
Modules linked in: nls_utf8 vhost_net macvtap macvlan tun nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand powernow_k8 freq_table bridge stp llc ipv6 dm_mirror dm_region_hash dm_log kvm_amd kvm tpm_infineon wmi serio_raw edac_core edac_mce_amd snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 sg tg3 shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]

Modules linked in: nls_utf8 vhost_net macvtap macvlan tun nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand powernow_k8 freq_table bridge stp llc ipv6 dm_mirror dm_region_hash dm_log kvm_amd kvm tpm_infineon wmi serio_raw edac_core edac_mce_amd snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 sg tg3 shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: scsi_wait_scan]
Pid: 30156, comm: tcpdump Not tainted 2.6.32-71.18.1.el6.x86_64 #1 HP Compaq dc5850 Microtower
RIP: 0010:[<ffffffff814a024a>]  [<ffffffff814a024a>] __packet_get_status+0x3a/0x40
RSP: 0018:ffff880214febaa8  EFLAGS: 00010213
RAX: 0000780000001000 RBX: 0000000000001000 RCX: ffff880214c924c0
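
For reference, the faulting location in the oops above is printed as symbol+offset/size. The following is a minimal Python sketch, not taken from the report, that splits such a line into its parts and derives the start address of the function. The helper name parse_oops_ip and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Matches the "[<address>] symbol+0xoffset/0xsize" part of an oops line, e.g.
#   IP: [<ffffffff814a024a>] __packet_get_status+0x3a/0x40
OOPS_IP_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_oops_ip(line):
    """Return symbol, address, offset, size and function start, or None."""
    m = OOPS_IP_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "symbol": m.group("sym"),
        "address": addr,
        "offset": off,                    # bytes into the function
        "size": int(m.group("size"), 16), # total size of the function
        "function_start": addr - off,     # address of the symbol itself
    }


if __name__ == "__main__":
    info = parse_oops_ip("IP: [<ffffffff814a024a>] __packet_get_status+0x3a/0x40")
    print(info["symbol"], hex(info["offset"]), hex(info["function_start"]))
```

For the IP line in this report, the offset 0x3a is 58 bytes into a 0x40 (64) byte function, so the fault is close to the end of __packet_get_status.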

Comment 3 Suqin Huang 2011-02-28 09:29:37 UTC
from the result we tested before, it works in 2.6.32-71.12.1.el6.x86_64

Comment 4 Suqin Huang 2011-03-01 06:11:37 UTC
Created attachment 481528 [details]
debug

Comment 5 Dor Laor 2011-03-01 12:19:29 UTC
(In reply to comment #3)
> from the result we tested before, it works in 2.6.32-71.12.1.el6.x86_64

Do you mean it is a regression?

Comment 6 Dor Laor 2011-03-01 12:21:38 UTC
Does it also happen without vhost loaded?

Comment 10 Suqin Huang 2011-03-02 10:13:20 UTC
(In reply to comment #5)
> (In reply to comment #3)
> > from the result we tested before, it works in 2.6.32-71.12.1.el6.x86_64
> 
> Do you mean it is a regression?

According to our earlier acceptance testing, it works in 2.6.32-71.12.1.el6.x86_64, but that kernel has since been deleted, so I cannot test it any more. This issue also reproduces in 2.6.32-71.14.1.el6.x86_64.

I am testing with vhost and trying to get a complete log.

I will report the result soon.

Comment 11 Suqin Huang 2011-03-02 11:06:00 UTC
Can reproduce with vhost=on.
1. cmd:
qemu-kvm -drive file='/usr/images/RHEL-Server-6.0-64-virtio.qcow2',index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idvx5Ue1,mac=9a:f1:48:07:aa:1f,id=ndev00idvx5Ue1,bus=pci.0,addr=0x3 -netdev tap,id=idvx5Ue1,vhost=on,script='/usr/scripts/qemu-ifup-switch',downscript='no' -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :1 -rtc base=utc,clock=host,driftfix=none -M rhel6.0.0 -boot order=cdn,once=c,menu=off   -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm  -incoming tcp:0:5200


2. vmcore:

PID: 9495   TASK: ffff88020e9f54e0  CPU: 1   COMMAND: "tcpdump"
 #0 [ffff880215949790] machine_kexec at ffffffff8103697b
 #1 [ffff8802159497f0] crash_kexec at ffffffff810b9078
 #2 [ffff8802159498c0] oops_end at ffffffff814cc900
 #3 [ffff8802159498f0] no_context at ffffffff8104652b
 #4 [ffff880215949940] __bad_area_nosemaphore at ffffffff810467b5
 #5 [ffff880215949990] bad_area_nosemaphore at ffffffff81046883
 #6 [ffff8802159499a0] do_page_fault at ffffffff814ce388
 #7 [ffff8802159499f0] page_fault at ffffffff814cbc75
    [exception RIP: __packet_get_status+58]
    RIP: ffffffff814a024a  RSP: ffff880215949aa8  RFLAGS: 00010213
    RAX: 0000780000001000  RBX: 0000000000001000  RCX: ffff8802141aed80
    RDX: 0000000000000000  RSI: 0000000000001000  RDI: 0000000000001000
    RBP: ffff880215949ab8   R8: ffff880215948000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000001000
    R13: ffff8802155d7cc4  R14: ffff88021472aec0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff880215949ac0] packet_lookup_frame at ffffffff814a0288
 #9 [ffff880215949ae0] packet_poll at ffffffff814a0d0c
#10 [ffff880215949b10] sock_poll at ffffffff813fb5ca
#11 [ffff880215949b20] do_sys_poll at ffffffff8118274b
#12 [ffff880215949f40] sys_poll at ffffffff81182bcc
#13 [ffff880215949f80] system_call_fastpath at ffffffff81013172
    RIP: 00007fad0e30cdf8  RSP: 00007fff3a7d2d50  RFLAGS: 00010286
    RAX: 0000000000000007  RBX: ffffffff81013172  RCX: ffffffffffffffff
    RDX: 00000000000003e8  RSI: 0000000000000001  RDI: 00007fff3a7d3830
    RBP: 00000000000003e8   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000451980  R14: 00007fff3a7d3830  R15: 0000000001322360
    ORIG_RAX: 0000000000000007  CS: 0033  SS: 002b
crash>
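
The backtrace above can be reduced to a call chain to make the path from poll() into the packet socket code easier to read. This is a minimal Python sketch under the assumption that the input is the text of a crash "bt" listing in the format shown above; the function name call_chain is hypothetical and not part of the crash utility.

```python
import re

# Frame lines look like: " #8 [ffff880215949ac0] packet_lookup_frame at ffffffff814a0288"
FRAME_RE = re.compile(r"^\s*#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")
# The faulting instruction is reported as: "[exception RIP: __packet_get_status+58]"
# (the offset in this line is decimal).
EXC_RE = re.compile(r"\[exception RIP: (\w+)\+(\d+)\]")


def call_chain(bt_text):
    """Return (function names in frame order, (faulting symbol, offset) or None)."""
    funcs = []
    fault = None
    for line in bt_text.splitlines():
        frame = FRAME_RE.match(line)
        if frame:
            funcs.append(frame.group(2))
            continue
        exc = EXC_RE.search(line)
        if exc and fault is None:
            fault = (exc.group(1), int(exc.group(2)))
    return funcs, fault


if __name__ == "__main__":
    sample = """\
 #7 [ffff8802159499f0] page_fault at ffffffff814cbc75
    [exception RIP: __packet_get_status+58]
 #8 [ffff880215949ac0] packet_lookup_frame at ffffffff814a0288
 #9 [ffff880215949ae0] packet_poll at ffffffff814a0d0c
"""
    funcs, fault = call_chain(sample)
    print(" <- ".join(funcs), "| fault:", fault)
```

The decimal offset 58 in the "exception RIP" line is the same location as the +0x3a reported in the oops in the description.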

Comment 12 Michael S. Tsirkin 2011-03-02 11:24:31 UTC
I am confused, sorry.
Which host kernel does have a problem?
Which host kernel does not?

You list one qemu command. Since this is during

You say:
>2. can not reproduce in rhel6.1 host
>2.6.32-118.el6.x86_64

so in which host does it reproduce?
2.6.32-71.18.1.el6.x86_64?

Comment 13 Michael S. Tsirkin 2011-03-02 11:24:59 UTC
Also, does it or does it not reproduce without vhost=on?

Comment 14 Suqin Huang 2011-03-03 04:37:58 UTC
(In reply to comment #12)
> I am confused, sorry.
> Which host kernel does have a problem?
> Which host kernel does not?
> 
> You list one qemu command. Since this is during
> 
> You say:
> >2. can not reproduce in rhel6.1 host
> >2.6.32-118.el6.x86_64
> 
> so in which host does it reproduce?
> 2.6.32-71.18.1.el6.x86_64?

It reproduces in 2.6.32-71.18.1.el6.x86_64.

Comment 15 Michael S. Tsirkin 2011-03-03 07:27:13 UTC
So it's a duplicate of
https://bugzilla.redhat.com/show_bug.cgi?id=623915
?
Does it happen without vhost=on or not?

Comment 16 Suqin Huang 2011-03-03 07:44:39 UTC
(In reply to comment #15)
> So it's a duplicate of
> https://bugzilla.redhat.com/show_bug.cgi?id=623915

It blocks RHEL6.0Z migration testing. Can you clone it to RHEL6.0Z, or change this one to RHEL6.0Z?
> ?
> Does it happen without vhost=on or not?

Repeated 10 times without vhost=on; cannot reproduce.

Comment 17 Michael S. Tsirkin 2011-03-03 08:07:53 UTC
So this is definitely a duplicate of 623915.
Marking it as such.

*** This bug has been marked as a duplicate of bug 623915 ***

Comment 18 Michael S. Tsirkin 2011-03-03 08:11:49 UTC
Re comment 16: please do not enable vhost in 6.0 at all.
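
As an illustration of that advice, a test harness that builds the qemu-kvm command line could force vhost off in the -netdev value before launching the guest. This is a minimal Python sketch; the helper name disable_vhost is hypothetical, and it assumes the option string contains no escaped commas (",,").

```python
def disable_vhost(netdev_arg):
    """Return a -netdev option string with vhost set to off.

    Assumes a plain comma-separated option list without ",," escapes.
    Only the exact key "vhost" is touched, not vhostfd or vhostforce.
    """
    parts = netdev_arg.split(",")
    seen = False
    for i, part in enumerate(parts):
        if part.startswith("vhost="):
            parts[i] = "vhost=off"
            seen = True
    if not seen:
        parts.append("vhost=off")
    return ",".join(parts)


if __name__ == "__main__":
    arg = ("tap,id=idvx5Ue1,vhost=on,"
           "script='/usr/scripts/qemu-ifup-switch',downscript='no'")
    print(disable_vhost(arg))
```

Applied to the -netdev value from comment 11, this only changes vhost=on to vhost=off and leaves the other options in place.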