Created attachment 481547 [details]
seario info

Description of problem:
rhel6.32 guest installation causes the B95 host to reboot

Version-Release number of selected component (if applicable):
kvm-83-226.el5

How reproducible:
100%

Steps to Reproduce:
1. cmd
qemu-kvm \
  -drive file='/usr/images/RHEL-Server-6.0-64-virtio.qcow2',index=0,if=virtio,media=disk,cache=none,format=qcow2 \
  -net nic,vlan=0,model=virtio,macaddr='9a:42:40:18:c8:b2' \
  -net tap,vlan=0,script='/usr/scripts/qemu-ifup-switch',downscript='no' \
  -m 2048 -smp 2,cores=1,threads=1,sockets=2 \
  -drive file='/usr/isos/linux/RHEL6.0-Server-x86_64.iso',media=cdrom,index=1 \
  -drive file='/usr/images/rhel60-64/ks.iso',media=cdrom,index=2 \
  -cpu qemu64,+sse2 -soundhw ac97 \
  -kernel '/usr/images/rhel60-64/vmlinuz' \
  -initrd '/usr/images/rhel60-64/initrd.img' \
  -vnc :0 -rtc-td-hack -M rhel5.6.0 -boot n -usbdevice tablet \
  -no-kvm-pit-reinjection \
  --append 'ks=cdrom nicdelay=60 console=ttyS0,115200 console=tty0
2.
3.

Actual results:

Expected results:

Additional info:

1. host:
kernel: 2.6.18-238.el5
cpu:
processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 B95 Processor
stepping : 2
cpu MHz : 800.000
cache size : 512 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
bogomips : 5984.92
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate [8]

2. can install rhel6.32 on another host
3. can install winxp, win7, win2008, rhel6.64 successfully
(In reply to comment #0)
> Additional info:
>
> 1. host:
> kernel: 2.6.18-238.el5

host kernel should be 2.6.18-245.el5

I can reproduce in 2.6.18-238.el5 & kvm-83-224.el5
Serial output does not look complete. Also try to enable kdump.
Created attachment 482580 [details]
seario info

No core file is produced with kdump enabled.
At what stage does the host crash? Immediately after the guest kernel boots, or while installing packages?
What other AMD CPUs have you tried to reproduce on? Provide cpuinfo please.
(In reply to comment #6)
> At what stage does the host crash? Immediately after the guest kernel boots,
> or while installing packages?

At the "Starting installation process" step.
Can install successfully on the following host:

processor : 11
vendor_id : AuthenticAMD
cpu family : 16
model : 8
model name : Six-Core AMD Opteron(tm) Processor 2427
stepping : 0
cpu MHz : 800.000
cache size : 512 KB
physical id : 1
siblings : 6
core id : 5
cpu cores : 6
apicid : 13
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc nonstop_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw
Can you check if bios update is available for this machine?
Can still reproduce after updating the host BIOS.
Potential duplicate of Bug 713636 - both AMD without NPT.
Does RHEL 6.latest show the same behaviour?
Repeated 150 times; cannot reproduce it on RHEL 6 with qemu-kvm-0.12.1.2-2.199.el6.x86_64.
No time to fix it for RHEL5.8. Moving to 5.9.

The installed guest is RHEL6.0, and if it is the only problematic guest then we can close this bug (probably not the case, since it looks like a high load issue).

Suqin,
Please test it with RHEL5.8 host and RHEL6.2 guest to see if the problem still exists.

Thanks.
I have submitted a job to test this bug and will update with the result once the job finishes.
Tested with a RHEL5.8 host and a RHEL6.2 guest; the bug still reproduces. The host rebooted automatically after installing the guest two times.

Host info:
kernel-2.6.18-290.el5
kvm-83-243.el5

Guest info:
kernel-2.6.32-216.el6
Will look for errata in this area.
Possible relevant errata:

319 Inaccurate Temperature Measurement

Description
The internal thermal sensor used for CurTmp (F3xA4[31:21]), hardware thermal control (HTC), software thermal control (STC) thermal zone, and the sideband temperature sensor interface (SB-TSI) may report inconsistent values. For CPUID Fn0000_0001_EAX[7:4] (Model) 4 and higher, this temperature inconsistency will occur only on AM2r2, Fr2, Fr5 and Fr6 package processors

Potential Effect on System
HTC, STC thermal zone, and SB-TSI do not provide reliable thermal protection. This does not affect THERMTRIP or the use of the STC-active state using StcPstateLimit or StcPstateEn (F3x68[30:28, 5]).

-----------------------

346 System May Hang if Core Frequency is Even Divisor of Northbridge Clock

Description
When one processor core is operating at a clock frequency that is higher than the northbridge clock frequency, and another processor core is operating at a clock frequency that is an even divisor of the northbridge clock frequency, the northbridge may fail to complete a cache probe.

Potential Effect on System
System hang.

Suggested Workaround
System software should set F3x188[22] to 1b.

Fix Planned
Please try retesting with reduced core frequency:

For each core:

cd /sys/devices/system/cpu/cpuX/cpufreq
echo -n userspace > scaling_governor
cat scaling_min_freq > scaling_setspeed

Run the test with this. Please monitor scaling_cur_freq for all cores to make sure no silly daemon flips them back.
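The monitoring step could be scripted roughly as follows. This is only a sketch, not something that was run on the affected host: `check_freqs` is a hypothetical helper name, and the optional directory argument is an assumption added so the function can be tried against a copy of the sysfs tree. It compares `scaling_cur_freq` with `scaling_min_freq` for every core and reports any core that has drifted.

```shell
#!/bin/sh
# Hypothetical helper: report cores whose current frequency differs
# from their minimum frequency.
# $1: cpu sysfs root (assumed default: /sys/devices/system/cpu).
# Returns 0 when every core is at its minimum, 1 otherwise.
check_freqs() {
    base=${1:-/sys/devices/system/cpu}
    drifted=0
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        # Skip cores without a readable cpufreq directory.
        [ -r "$dir/scaling_cur_freq" ] || continue
        cur=$(cat "$dir/scaling_cur_freq")
        min=$(cat "$dir/scaling_min_freq")
        if [ "$cur" != "$min" ]; then
            echo "${dir%/cpufreq}: cur=$cur min=$min"
            drifted=1
        fi
    done
    return "$drifted"
}
```

Running `while check_freqs; do sleep 5; done` alongside the installation loop would stop and print the offending core as soon as something changes the frequency.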
Hi Avi,

Tested as per comment #22. The bug still reproduces: the host reboots automatically during guest installation.

My steps:

# grep processor /proc/cpuinfo | wc -l
4
# cd /sys/devices/system/cpu/
# ls
cpu0 cpu1 cpu2 cpu3 sched_mc_power_savings
# cat cpu0/cpufreq/scaling_governor
ondemand
# for i in 0 1 2 3; do echo -n userspace > cpu$i/cpufreq/scaling_governor; done
# for i in 0 1 2 3; do cat cpu$i/cpufreq/scaling_governor; done
userspace
userspace
userspace
userspace
# for i in 0 1 2 3; do cat cpu$i/cpufreq/scaling_min_freq > cpu$i/cpufreq/scaling_setspeed; done
# for i in 0 1 2 3; do cat cpu$i/cpufreq/scaling_setspeed; done
800000
800000
800000
800000
# for i in 0 1 2 3; do cat cpu$i/cpufreq/scaling_cur_freq ; done
800000
800000
800000
800000

Then I ran the job to install the guest in a loop.

If there is a problem with my steps, please correct me. Thanks.
It looks okay.

Please provide the output of

lspci -xxxx -s 00:18.3

(checking for erratum 346)
Also, the output of plain 'lspci'. Device 18 should be something like "Host bridge: Advanced Micro Devices [AMD] Family 10h Processor".
(In reply to comment #24)
> It looks okay.
>
> Please provide the output of
>
> lspci -xxxx -s 00:18.3
>
> (checking for erratum 346)

[root@amd-B95-8-2 ~]# lspci -xxxx -s 00:18.3
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00: 22 10 03 12 00 00 10 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00
40: ff ff ff 3f 5c 00 b0 4a 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 04 00 3f 34 00 00 00 30 51 80 01 60
70: 51 11 32 60 01 01 98 00 14 0c 20 00 11 08 07 00
80: 81 e6 00 e6 e6 41 e6 01 08 00 00 00 00 60 58 00
90: 03 00 00 00 02 00 00 00 00 0d 1f 02 00 00 00 00
a0: 96 08 16 a0 80 18 0c 12 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 43 51 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 26 0f 81 c8 16 0f 2e 03 22 63 47 01
e0: 00 00 00 00 30 13 00 1e 59 7f 07 02 00 00 00 00
f0: 0f 00 10 00 00 00 00 00 00 00 00 00 42 0f 10 00
Looks like a really old lspci. Was that from RHEL 5? Please try RHEL 6 or latest Fedora, should give a lot more output, in particular a line beginning with 180:.
(In reply to comment #27)
> Looks like a really old lspci. Was that from RHEL 5?

Yes, that came from the RHEL 5 host. The host info is as follows:
kernel-2.6.18-290.el5
kvm-83-243.el5

From comment #15, this bug can only be reproduced on RHEL5; it cannot be reproduced on RHEL6 (same host with a different OS).

> Please try RHEL 6 or
> latest Fedora, should give a lot more output, in particular a line beginning
> with 180:.

Hi Avi,

Do you mean I should reinstall the above host with RHEL6 and then take the lspci info?
Yes. Or you can try to build pciutils from source if that's easier.
Created attachment 534154 [details]
AMD host lspci info

Attached the host lspci info.
It looks like 0x188[22] is set, so it's not erratum 346.
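For anyone repeating this check on another machine, the bit test can be sketched as below. The helper names `reg_188` and `bit_is_set` are hypothetical, and the sample `180:` line is made up for illustration; it is not taken from attachment 534154. The sketch assumes the usual `lspci -xxxx` layout of an offset label followed by 16 space-separated bytes, and that configuration space is little-endian.

```shell
#!/bin/sh
# Hypothetical helper: print the 32-bit value at offset 0x188 from the
# "180:" line of an `lspci -xxxx` dump. Bytes 0x188..0x18b are stored
# little-endian, so they are emitted in reverse order.
reg_188() {
    set -- $1        # split the line into "180:" plus 16 byte fields
    shift 9          # drop the label and bytes 0x180..0x187
    echo "$4$3$2$1"
}

# Hypothetical helper: succeed when bit $2 of the hex value $1 is set.
bit_is_set() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Made-up sample line, for illustration only.
sample="180: 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00"
value=$(reg_188 "$sample")
if bit_is_set "$value" 22; then
    echo "F3x188 = 0x$value, bit 22 set (erratum 346 workaround applied)"
else
    echo "F3x188 = 0x$value, bit 22 clear"
fi
```

On a host where pciutils can reach extended configuration space, `setpci -s 00:18.3 188.l` should print the same dword directly, which could then be passed to `bit_is_set`.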
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Affects specific, outdated, hardware. Closing.