Bug 604974

Summary: 32 bit Guest kernel panic when onlining cpu
Product: Red Hat Enterprise Linux 6 Reporter: Joy Pu <ypu>
Component: kernel Assignee: Prarit Bhargava <prarit>
Status: CLOSED DUPLICATE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: high    
Version: 6.0 CC: dzickus, emcnabb, peterm
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-07-16 18:33:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 599016    

Description Joy Pu 2010-06-17 07:40:44 UTC
Description:
Hotplugging a CPU in a RHEL6-32 guest causes a kernel panic. This reproduces with the 2.6.32-36 and 2.6.32-33 kernels, but not with the 2.6.32-25 kernel.

Version-Release number of selected component (if applicable):
host kernel: 2.6.32-33.el6.x86_64 
guest kernel: 2.6.32-36.el6.i686
# rpm -qa | grep qemu
qemu-kvm-0.12.1.2-2.68.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.68.el6.x86_64
qemu-img-0.12.1.2-2.68.el6.x86_64
gpxe-roms-qemu-0.9.7-6.3.el6.noarch
qemu-kvm-tools-0.12.1.2-2.68.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Boot an SMP RHEL-6.0-32 guest.
2. Listen on the guest's serial console with nc:
# nc -U /tmp/serial-20100617-130306-gcxW
3. Take cpu1 offline and bring it back online with echo:
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 1 > /sys/devices/system/cpu/cpu1/online
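
The offline/online cycle in step 3 can be wrapped in a small script for repeated runs. This is a minimal sketch, assuming it is run as root inside the guest; the function name cycle_cpu and the SYSFS_CPU_ROOT override are hypothetical and not part of the original report. The override exists only so the logic can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): take a CPU offline and
# bring it back online through sysfs, as in step 3 above.
# SYSFS_CPU_ROOT is an assumed override for dry runs; it defaults to the
# real sysfs location.
cycle_cpu() {
    cpu="$1"
    root="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"
    ctl="$root/cpu$cpu/online"

    # Refuse to continue if the control file is missing or not writable.
    [ -w "$ctl" ] || { echo "cannot write $ctl" >&2; return 1; }

    echo 0 > "$ctl" || return 1   # offline the CPU
    echo 1 > "$ctl" || return 1   # online it again (the step that panics here)
    cat "$ctl"                    # report the final state
}
```

Calling `cycle_cpu 1` in the guest repeats the two echo commands; on an affected kernel the panic is expected on the second write, so the final state is never printed.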

Actual results:
The guest kernel panics when cpu1 is brought online.

Expected results:
The guest brings cpu1 online successfully.

Additional info:
1. The command line:
#  /root/work/autotest/client/tests/kvm/qemu -name vm1 -monitor tcp:0:6001,server,nowait -drive file=/root/work/autotest/client/tests/kvm/images/RHEL-Server-6.0-32-virtio.qcow2,if=virtio,cache=none,boot=on,aio=native -net nic,vlan=0,model=virtio,macaddr=02:30:0D:20:0b:95 -net tap,vlan=0,ifname=virtio_0_6001,script=/root/work/autotest/client/tests/kvm/scripts/qemu-ifup-switch,downscript=no,vhost=on -m 4096 -smp 2 -soundhw ac97 -redir tcp:5000::22 -vnc :0 -spice port=8000,disable-ticketing -usbdevice tablet -rtc-td-hack -cpu qemu64,+sse2 -no-kvm-pit-reinjection -serial unix:/tmp/serial-20100617-130306-gcxW,server,nowait -no-hpet

2. Host cpuinfo
model           : 2
model name      : AMD Phenom(tm) 8750 Triple-Core Processor
stepping        : 3
cpu MHz         : 1200.000
cache size      : 512 KB
physical id     : 0
siblings        : 3
core id         : 2
cpu cores       : 3
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16
popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw ibs
bogomips        : 4809.90
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

3. Kernel panic info
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu1/online
Modules linked in: autofs4(U) sunrpc(U) ip6t_REJECT(U) nf_conntrack_ipv6(U) ip6table_filter(U) ip6_tables(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) snd_intel8x0(U) snd_ac97_codec(U) ac97_bus(U) snd_seq(U) snd_seq_device(U) snd_pcm(U) ppdev(U) i2c_piix4(U) snd_timer(U) parport_pc(U) i2c_core(U) parport(U) snd(U) soundcore(U) snd_page_alloc(U) sg(U) ext4(U) mbcache(U) jbd2(U) sr_mod(U) cdrom(U) ata_generic(U) pata_acpi(U) virtio_blk(U) virtio_net(U) virtio_pci(U) virtio_ring(U) virtio(U) ata_piix(U) dm_mod(U) [last unloaded: scsi_wait_scan]

Pid: 1652, comm: bash Tainted: G S      W  (2.6.32-36.el6.i686 #1) Bochs
EIP: 0060:[<c0443dc1>] EFLAGS: 00210046 CPU: 0
EIP is at scheduler_tick+0xe1/0x240
EAX: 00000000 EBX: c0825b80 ECX: f4500000 EDX: c1e09080
ESI: c1e09080 EDI: 000095a6 EBP: 000009fa ESP: f4501d30
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process bash (pid: 1652, ti=f4500000 task=f45d5a90 task.ti=f4500000)
Stack:
 00000e31 00000000 f45d5a90 f45d5a90 00000000 00000000 c1e04fc0 c045f71f
<0> f4501e18 7109d7f0 0000000e c047f209 00000000 c04735ad c1e04fc0 c1e04f30
<0> c1e04f00 c047f1b0 c0473c07 00000000 3d10eb68 3d10eb68 f4501de8 33e22200
Call Trace:
 [<c045f71f>] ? update_process_times+0x3f/0x60
 [<c047f209>] ? tick_sched_timer+0x59/0xd0
 [<c04735ad>] ? __remove_hrtimer+0x2d/0xa0
 [<c047f1b0>] ? tick_sched_timer+0x0/0xd0
 [<c0473c07>] ? __run_hrtimer+0x77/0x190
 [<c0473fb9>] ? hrtimer_interrupt+0x129/0x2a0
 [<c04b1425>] ? rcu_process_callbacks+0x35/0x40
 [<c0456935>] ? __do_softirq+0xb5/0x1b0
 [<c044e396>] ? copy_process+0x796/0xff0
 [<c042599f>] ? smp_apic_timer_interrupt+0x4f/0x90
 [<c040a335>] ? apic_timer_interrupt+0x31/0x38
 [<c044e396>] ? copy_process+0x796/0xff0
 [<c0819edf>] ? text_poke+0x1af/0x200
 [<c040ef26>] ? alternatives_smp_switch+0xe6/0x190
 [<c081d756>] ? _etext+0x0/0x2
 [<c081187d>] ? native_cpu_up+0x1a1/0xaa1
 [<c081227e>] ? do_fork_idle+0x0/0x17
 [<c08136bf>] ? _cpu_up+0x99/0x111
 [<c04f80a2>] ? handle_mm_fault+0x132/0x1d0
 [<c081377f>] ? cpu_up+0x48/0x57
 [<c0805af8>] ? store_online+0x58/0x80
 [<c0805aa0>] ? store_online+0x0/0x80
 [<c06a0825>] ? sysdev_store+0x25/0x40
 [<c0574b69>] ? sysfs_write_file+0x99/0x100
 [<c0574ad0>] ? sysfs_write_file+0x0/0x100
 [<c051b5a0>] ? vfs_write+0xa0/0x190
 [<c051c031>] ? sys_write+0x41/0x70
 [<c04098fb>] ? sysenter_do_call+0x12/0x28
Code: 38 04 00 00 39 d0 74 11 89 c1 29 d1 89 86 ac 04 00 00 f0 01 0d cc 99 af c0 8b 44 24 08 31 c9 8b 58 28 89 c2 89 f0 ff 53 44 89 f2 <f0> 66 c7 02 00 00 8b 44 24 08 e8 d0 ea 08 00 b8 80 70 ad c0 8b 
EIP: [<c0443dc1>] scheduler_tick+0xe1/0x240 SS:ESP 0068:f4501d30
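
In the Code: line of an oops, the byte in angle brackets is the one at EIP, so the bytes from that marker onward are the instruction that faulted. A minimal sketch for pulling those bytes out of a saved console log; the function name faulting_bytes is hypothetical and not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): read oops text on stdin
# and print the opcode bytes starting at the <xx> marker in the "Code:" line,
# i.e. the bytes at and after the faulting EIP.
faulting_bytes() {
    sed -n \
        -e 's/ *$//' \
        -e 's/^Code: .*<\([0-9a-f]\{2\}\)> *\(.*\)$/\1 \2/p'
}
```

Feeding the captured serial log to `faulting_bytes` prints the bytes beginning with `f0 66 c7 02 00 00`, which can then be passed to a disassembler of choice.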

Comment 2 RHEL Program Management 2010-06-17 07:53:20 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 4 Peter Martuccelli 2010-06-25 15:23:19 UTC
This is a CPU online problem, not a hotplug issue.

Comment 6 Don Zickus 2010-07-16 18:20:32 UTC
I believe this is a duplicate of bz581722.

The hrtimers do not get shut down correctly. The patch hasn't made it into any kernel yet; otherwise I would just have you try it. The other bz has an attached patch, but I can whip up a scratch kernel with that patch too, if you want to verify that it fixes your problem.

Comment 7 Prarit Bhargava 2010-07-16 18:33:49 UTC
Don, for now I'm dup'ing to 581722.  If it turns out that this is not a dup, I can undup and I can go from there...

P.

*** This bug has been marked as a duplicate of bug 581722 ***