Bug 628875

Summary: rhel5u5-32 guest became unresponsive during unattended_install from CD on amd-1216
Product: Red Hat Enterprise Linux 6
Reporter: Cao, Chen <kcao>
Component: qemu-kvm
Assignee: Zachary Amsden <zamsden>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: low
Version: 6.0
CC: akong, ddumas, ehabkost, gcosta, gleb, Jes.Sorensen, knoel, llim, michen, mkenneth, plyons, shuang, tburke, virt-maint, zamsden
Keywords: Triaged, ZStream
Target Release: 6.1
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-04-13 13:21:12 UTC
Bug Blocks: 580951, 693756

Attachments:

Description	Flags
guest stuck while installing with aio=native	none
strace output with -cpu cpu64-rhel5	none
guest kernel panic when trying -cpu cpu64-rhel6	none
screenshot of `sendkey ctrl-alt-f3` when installation stuck	none
screenshot of `sendkey ctrl-alt-f4` when installation stuck	none
screenshot of `sendkey ctrl-alt-f4` when installation stuck, without "busy inodes" info	none

Description Cao, Chen 2010-08-31 09:15:29 UTC

Description of problem:
The guest became unresponsive during installation, using a private bridge,
booting from a TFTP/PXE server set up on localhost, with the kickstart file on a floppy.

NOTE:
1. Only reproduced on the AMD machine; cannot reproduce on Intel boxes.
2. Installation completes successfully with aio=threads.


Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.113.el6.x86_64
kernel-2.6.32-70.el6.x86_64


How reproducible:
>50%


Steps to Reproduce:
1. Set up the private bridge:
/usr/sbin/brctl addbr vbr0; \
echo 1 > /proc/sys/net/ipv6/conf/vbr0/disable_ipv6; \
echo 1 > /proc/sys/net/ipv4/ip_forward; \
/usr/sbin/brctl stp vbr0 on; \
/usr/sbin/brctl setfd vbr0 0; \
ifconfig vbr0 192.168.58.1; \
ifconfig vbr0 up; \
iptables -t nat -A POSTROUTING \
-s 192.168.58.254/24 ! -d 192.168.58.254/24 -j MASQUERADE;

2. Set up dnsmasq:
dnsmasq --strict-order --bind-interfaces \
--listen-address 192.168.58.1 \
--dhcp-range 192.168.58.2,192.168.58.254 \
--enable-tftp --tftp-root /home/tests/kvm/images/tftpboot \
--dhcp-boot pxelinux.0 --dhcp-no-override

3. Start the VM using:
qemu-kvm -name vm1 \
-chardev socket,id=human_monitor_YiMl,path=/tmp/monitor-humanmonitor1-20100831-145056-oH0p,server,nowait \
-mon chardev=human_monitor_YiMl,mode=readline \
-chardev socket,id=serial_XSNF,path=/tmp/serial-20100831-145056-oH0p,server,nowait \
-device isa-serial,chardev=serial_XSNF \
-drive file=/home/tests/kvm/images/RHEL-Server-5.5-32-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,format=qcow2,aio=native \
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 \
-device virtio-net-pci,netdev=idDEaLvO,id=ndev00idDEaLvO,mac=02:2D:4B:53:e3:73,bus=pci.0,addr=0x3 \
-netdev tap,id=idDEaLvO,ifname=virtio_0_8000,script=/home/tests/kvm/scripts/qemu-ifup-vbr0,downscript=no \
-m 2048 -smp 2 \
-drive file=/home/tests/kvm/isos/linux/RHEL-Server-5.5-i386-DVD.iso,index=1,if=none,id=drive-ide0-0-0,media=cdrom,readonly=on,format=raw \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-cpu qemu64,+x2apic \
-fda /home/tests/kvm/images/floppy.img \
-vnc :0 -spice port=8000,disable-ticketing \
-rtc base=utc,clock=host,driftfix=none \
-M rhel6.0.0 -usbdevice tablet -no-kvm-pit-reinjection -boot n

  
Actual results:
The guest stops responding during installation.


Expected results:
Installation completes successfully.


Additional info:
1.
# cat /proc/cpuinfo
[snip]
processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 67
model name      : Dual-Core AMD Opteron(tm) Processor 1216
stepping        : 3
cpu MHz         : 2400.000
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips        : 4822.39
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc


2. Contents of the floppy:
floppy.img
`-- ks.cfg

3. TFTP root:
# ls -F tftpboot/
initrd.img  pxelinux.0  pxelinux.cfg/  vmlinuz

4. Nothing unusual in dmesg or /sys/kernel/debug/tracing/trace_pipe.

5. The mmu_* counters are all 0 after the guest gets stuck.

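
Item 5 refers to KVM's mmu_* debugfs statistics. A minimal sketch for snapshotting them so that two snapshots (for example, one right after the hang and one a little later) can be compared; it assumes debugfs is mounted at /sys/kernel/debug and that the host kernel exposes the counters as files under /sys/kernel/debug/kvm/. The helper name dump_kvm_mmu_stats is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every mmu_* KVM counter as name=value.
# Assumes debugfs is mounted at /sys/kernel/debug; pass another directory as $1.
dump_kvm_mmu_stats() {
  local dir=${1:-/sys/kernel/debug/kvm} f
  for f in "$dir"/mmu_*; do
    # An unmatched glob stays literal, so stop if no counter files exist.
    [ -e "$f" ] || { echo "no mmu_* counters under $dir" >&2; return 1; }
    printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
  done
}
```

For example, `dump_kvm_mmu_stats > mmu.before`, wait while the guest is stuck, `dump_kvm_mmu_stats > mmu.after`, then `diff mmu.before mmu.after` shows whether any MMU counter moved at all.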

Comment 1 Cao, Chen 2010-08-31 09:17:33 UTC

Created attachment 442140 [details]
guest stuck while installing with aio=native

The guest gets stuck at a random point, i.e. not always at 10% of the installation.

Comment 2 Cao, Chen 2010-08-31 09:26:21 UTC

It is very easy to trigger
Bug 615839 - kernel panic when install guest with -fda
when starting the VM without spice (is this related to spice at all, or just a coincidence?)

Comment 3 Dor Laor 2010-08-31 09:58:32 UTC

(In reply to comment #2)
> it is very easy to trigger
> Bug 615839 - kernel panic when install guest with -fda
> when starting vm without spice (has nothing to do with spice? accident?)

Why do you mention the above? That bug is a kernel floppy issue, not related to this one.

Can you please remove all the non-relevant devices from the qemu command line so we can isolate this issue:

 - remove spice, remove fda, remove the network setup.
If it is a block-related issue, let's start by focusing on that.

Comment 4 Jes Sorensen 2010-08-31 12:58:33 UTC

Chen.

Can you reproduce this without PXE boot, i.e. when booting from a local
image instead?

Thanks,
Jes

Comment 5 Jes Sorensen 2010-08-31 13:47:46 UTC

Chen,

While at it, can you attach a copy of the ks.cfg file you have on
the floppy image?

Thanks,
Jes

Comment 6 Cao, Chen 2010-08-31 15:16:23 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > it is very easy to trigger
> > Bug 615839 - kernel panic when install guest with -fda
> > when starting vm without spice (has nothing to do with spice? accident?)
> 
> Why you state the above? The above bug is a kernel floppy thing, not related to
> this.

Because the guest sometimes panics when I try to reproduce this bug.

> 
> Can you please remove all the none relevant devices from the qemu cmdline so
> we'll isolate this issue- 
> 
>  - remove spice, remove fda, remove the network setup.
> If it is a block related issue, let's start by focusing on it.

I have removed spice and fda and used the -kernel and -initrd parameters to do the
installation; it is very hard to reproduce that way (1/20).
Will try to provide more info later.

Cancelling the blocker and 6.0 request.


(In reply to comment #4)
Jes,

I have tried the following command and can still reproduce it, but less frequently:

qemu-kvm -name vm1 \
-chardev socket,id=human_monitor_YiMl,path=/tmp/monitor-humanmonitor1-20100831-145056-oH0p,server,nowait \
-mon chardev=human_monitor_YiMl,mode=readline \
-chardev socket,id=serial_XSNF,path=/tmp/serial-20100831-145056-oH0p,server,nowait \
-device isa-serial,chardev=serial_XSNF \
-drive file=/home/tests/kvm/images/RHEL-Server-5.5-32-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,format=qcow2,aio=native \
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 \
-device virtio-net-pci,netdev=idDEaLvO,id=ndev00idDEaLvO,mac=02:2D:4B:53:e3:73,bus=pci.0,addr=0x3 \
-netdev tap,id=idDEaLvO,ifname=virtio_0_8000,script=/home/tests/kvm/scripts/qemu-ifup-vbr0,downscript=no \
-m 2048 -smp 2 \
-drive file=/home/tests/kvm/isos/linux/RHEL-Server-5.5-i386-DVD.iso,index=1,if=none,id=drive-ide0-0-0,media=cdrom,readonly=on,format=raw \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-cpu qemu64,+x2apic \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=none \
-M rhel6.0.0 -usbdevice tablet -no-kvm-pit-reinjection \
-kernel tftpboot/vmlinuz -initrd tftpboot/initrd.img -append ks=http://10.66.91.153/ks.cfg

Comment 7 RHEL Program Management 2010-08-31 15:17:54 UTC

Thank you for your bug report. This issue was evaluated for inclusion
in the current release of Red Hat Enterprise Linux. Unfortunately, we
are unable to address this request in the current release. Because we
are in the final stage of Red Hat Enterprise Linux 6 development, only
significant, release-blocking issues involving serious regressions and
data corruption can be considered.

If you believe this issue meets the release blocking criteria as
defined and communicated to you by your Red Hat Support representative,
please ask your representative to file this issue as a blocker for the
current release. Otherwise, ask that it be evaluated for inclusion in
the next minor release of Red Hat Enterprise Linux.

Comment 9 Jes Sorensen 2010-08-31 16:54:23 UTC

Chen,

Does the problem disappear if you change the following (you can try with the
easiest-to-reproduce case above):

 -cpu qemu64,+x2apic

to
 -cpu cpu64-rhel5

or
 -cpu cpu64-rhel6

or
 -cpu Opteron_G1

Thanks,
Jes

Comment 11 Cao, Chen 2010-09-01 05:42:37 UTC

(In reply to comment #9)
> Chen,
> 
> Does the problem disappear if you change (you can try with the most
> easy to reproduce case above):
> 
>  -cpu qemu64,+x2apic
> 
> to
>  -cpu cpu64-rhel5
> 

Reproduced 1 time out of 10.

Attaching the strace output in the following comment.

> or
>  -cpu cpu64-rhel6

Kernel panic; attaching the screenshot in the following comment.

Comment 12 Cao, Chen 2010-09-01 05:46:28 UTC

Created attachment 442328 [details]
strace output with -cpu cpu64-rhel5

Comment 13 Cao, Chen 2010-09-01 05:47:09 UTC

Created attachment 442329 [details]
guest kernel panic when trying -cpu cpu64-rhel6

Comment 14 Jes Sorensen 2010-09-01 10:02:33 UTC

Hi Chen,

Any chance you could try this brew build:
https://brewweb.devel.redhat.com/taskinfo?taskID=2723893

It should make the guest kernel panic go away with cpu64-rhel6, but
I am curious if it has any impact on the other problem ... it may not,
but it would be good to know.

Thanks,
Jes

Comment 15 Cao, Chen 2010-09-01 15:20:00 UTC

(In reply to comment #14)
> Hi Chen,
> 
> Any chance you could try this brew build:
> https://brewweb.devel.redhat.com/taskinfo?taskID=2723893
> 
> It should make the guest kernel panic go away with cpu64-rhel6, but
> I am curious if it has any impact on the other problem ... it may not,
> but it would be good to know.
> 

Hi, Jes,

I have tested your build: no panic in the guest, but it still becomes unresponsive
while installing, about 1 time out of 10.


Cao, Chen


--
1. Command to create the image:
 qemu-img create -f qcow2 -opreallocation=metadata -ocluster_size=2M RHEL-Server-5.5-32-virtio.qcow2 15G

2. Command to start the VM:

qemu-kvm -name vm1 -cpu cpu64-rhel6 \
-chardev socket,id=human_monitor_YiMl,path=/tmp/monitor-humanmonitor1-20100831-145056-oH0p,server,nowait \
-mon chardev=human_monitor_YiMl,mode=readline \
-chardev socket,id=serial_XSNF,path=/tmp/serial-20100831-145056-oH0p,server,nowait \
-device isa-serial,chardev=serial_XSNF \
-drive file=/home/tests/kvm/images/RHEL-Server-5.5-32-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,boot=on,format=qcow2,aio=native \
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 \
-device virtio-net-pci,netdev=idDEaLvO,id=ndev00idDEaLvO,mac=02:2D:4B:53:e3:73,bus=pci.0,addr=0x3 \
-netdev tap,id=idDEaLvO,ifname=virtio_0_8000,script=/home/tests/kvm/scripts/qemu-ifup-vbr0,downscript=no \
-m 2048 -smp 2 \
-drive file=/home/tests/kvm/isos/linux/RHEL-Server-5.5-i386-DVD.iso,index=1,if=none,id=drive-ide0-0-0,media=cdrom,readonly=on,format=raw \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-vnc :0 \
-rtc base=utc,clock=host,driftfix=none \
-M rhel6.0.0 -usbdevice tablet -no-kvm-pit-reinjection \
-kernel tftpboot/vmlinuz -initrd tftpboot/initrd.img -append ks=http://10.66.91.153/ks.cfg

Comment 16 Jes Sorensen 2010-09-01 15:31:12 UTC

Thanks, so we are looking at two different bugs.

Is it possible for you to try to reduce the test case to a simpler setup
that doesn't involve tftp/pxe?

Cheers,
Jes

Comment 17 Cao, Chen 2010-09-02 09:56:46 UTC

(In reply to comment #16)
> Thanks, so we are looking at two different bugs.
> 
> Is it possible for you to try to reduce the test case to a simpler setup
> that doesn't involve tftp/pxe?
> 

Jes,

I have already been testing without TFTP/PXE, as described in
comment 15: I just used the files under that directory, without booting
from the network.


Besides, I have also tried on amd-9600b and cannot reproduce the bug there,
so it seems to be triggered only on the AMD 1216 machine.

Also, since we found "-1 EAGAIN (Resource temporarily unavailable)"
in the `strace` output, could that be related to the bug?
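
One way to judge whether the EAGAIN results matter is to see which system calls return them and how often. A sketch that tallies EAGAIN failures by syscall name; it assumes strace was run without timestamp options (-t/-tt), handles the "[pid N] " and "N " prefixes that strace -f adds, and does not count "<... resumed>" lines. The name eagain_by_syscall is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count syscalls that failed with EAGAIN in strace output.
# Reads the files given as arguments, or stdin if none are given.
eagain_by_syscall() {
  # Strip "[pid N] " / "N " prefixes, then keep the syscall name of EAGAIN lines.
  sed -n -e 's/^\[pid *[0-9]*] //' \
         -e 's/^[0-9][0-9]*  *//' \
         -e '/= -1 EAGAIN/s/^\([a-z_0-9]*\)(.*/\1/p' "$@" |
    sort | uniq -c | sort -rn
}
```

Running it on the attached strace log prints one "count syscall" line per syscall, most frequent first.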



Attaching the cpuinfo of amd-9600b:
--
processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 2
model name      : AMD Phenom(tm) 9600B Quad-Core Processor
stepping        : 3
cpu MHz         : 1150.000
cache size      : 512 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs npt lbrv svm_lock
bogomips        : 4609.53
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
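
Comparing this flags line with the Opteron 1216 line in the description: every 1216 flag is also present here, while the 9600B additionally reports, among others, constant_tsc, nonstop_tsc and npt. constant_tsc means the TSC ticks at a fixed rate regardless of P-state and nonstop_tsc means it keeps running in deep C-states, so their absence on the 1216 fits the unstable-TSC explanation offered in comment 45. A sketch of the comparison, with hypothetical helper names; comm needs both lists sorted the same way, which the helper does itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print flags that appear in flags string $2 but not in $1.
flags_only_in_second() {
  # One flag per line, blanks dropped, sorted so that comm can compare them.
  comm -13 <(tr ' ' '\n' <<<"$1" | sed '/^$/d' | sort -u) \
           <(tr ' ' '\n' <<<"$2" | sed '/^$/d' | sort -u)
}

# Hypothetical helper: report whether this host advertises the TSC flags.
host_tsc_flags() {
  local f
  for f in constant_tsc nonstop_tsc; do
    if grep -qw "$f" /proc/cpuinfo; then
      echo "$f: yes"
    else
      echo "$f: no"
    fi
  done
}
```

Passing the 1216 and 9600B flags values (the text after "flags :") as $1 and $2 prints the 14 flags only the 9600B has; host_tsc_flags can be run directly on any other host under test.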

Comment 18 Jes Sorensen 2010-09-02 10:13:05 UTC

Hi Chen,

Looks like it could be related to older AMD processors then.
Would you be able to test it on the 9600 box but do this first:

rmmod kvm-amd
modprobe kvm-amd npt=0

and see if the problem shows up there then? It would be interesting
to see if the NPT support makes the problem disappear or not.

Thanks,
Jes

Comment 19 Cao, Chen 2010-09-02 14:39:58 UTC

(In reply to comment #18)
> Hi Chen,
> 
> Looks like it could be related to older AMD processors then.
> Would you be able to test it on the 9600 box but do this first:
> 
> rmmod kvm-amd
> modprobe kvm-amd npt=0
> 
> and see if the problem shows up there then? It would be interesting
> to see if the NPT support makes the problem disappear or not.

Hi Jes,

I have tried 20 times with npt=0 and cannot reproduce this bug.
I can try more later.


Cao, Chen


# cat /sys/module/kvm_amd/parameters/npt
0

Comment 20 Cao, Chen 2010-09-03 01:43:50 UTC

Oops: I tried again with aio=threads

and hit the problem 5 out of 50 times, so I am changing the summary
to "guest becomes unresponsive when installing on old amd machine".

Comment 21 Jes Sorensen 2010-09-06 10:28:38 UTC

Interesting: so it looks to be related to older AMD hosts, but not to NPT or
the AIO threading model.

Avi or Gleb, do you have any ideas what might cause this?

Cheers,
Jes

Comment 22 Jes Sorensen 2010-09-14 12:25:02 UTC

When the guest gets stuck, could you please type the following in the
monitor:

info cpus
info registers

Thanks,
Jes

Comment 23 Jes Sorensen 2010-09-14 12:45:18 UTC

In addition, can you capture the guest's dmesg output? I.e. try to switch to
the virtual console in VNC and grab it.

Thanks,
Jes

Comment 24 Jes Sorensen 2010-09-14 12:51:55 UTC

Another question, is this at all reproducible with -smp 1?

Comment 25 Cao, Chen 2010-09-15 09:50:58 UTC

1.
(qemu) info cpus
* CPU #0: pc=0x00000000c0416913 thread_id=29721 
  CPU #1: pc=0x00000000c042c744 thread_id=29722 

2.
(qemu) info registers
EAX=00000000 EBX=dff88ce0 ECX=c068b880 EDX=000000fb
ESI=00000001 EDI=00000001 EBP=00000000 ESP=dff88cac
EIP=c0416917 EFL=00000297 [--S-APC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =0000 00000000 ffffffff 00000000
GS =0000 b7fb86c0 ffffffff 00000000
LDT=0088 c0746020 00000027 00008200 DPL=0 LDT
TR =0080 c2009400 00002073 00008b00 DPL=0 TSS32-busy
GDT=     c2018000 000000ff
IDT=     c06f6000 000007ff
CR0=8005003b CR2=b73ce000 CR3=00742000 CR4=000006d0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=3820 [ST=7] FTW=80 MXCSR=00000000
FPR0=c661800000000000 4017 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=fffffffe00000000 c01d
FPR6=0000000000000000 0000 FPR7=fffffffe00000000 401d
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000


3. Guest's dmesg output?
Cannot get output from the guest, since the guest hangs.

4. Reproducible with -smp 1?
No; tried 100 times in total
(50 times with aio=native and 50 times with aio=threads).

Comment 26 Jes Sorensen 2010-09-15 10:24:00 UTC

For 3., please use -serial stdio, and console=ttyS0 as a kernel boot argument,
so we can get the console output from the guest.

Thanks,
Jes

Comment 27 Cao, Chen 2010-09-15 17:37:23 UTC

(In reply to comment #26)
> For 3. please use -serial stdio and console=ttyS0 as kernel boot argument,
> so we can get the console output from the guest.
> 
> Thanks,
> Jes

It got stuck once while anaconda was starting; no unusual output on the "ctrl+alt+f3" console.

Hard to reproduce this time; will try later.

Comment 28 Cao, Chen 2010-09-16 09:28:52 UTC

Reproduced a few more times.
Since the console is set to ttyS0, I can get the installation text UI
on the serial output, showing the installation progress.

After the guest begins installing, I issue the monitor commands "sendkey ctrl-alt-f3"
and "sendkey ctrl-alt-f4" to try to get info from the other consoles;
the screenshots are attached in the following comments.

In addition, here are `info cpus` and `info registers`.

When stuck at:
selinux-policy-targeted-2.4.6-279.el5-noarch 100%
total: 55%

Major differences from the previous dump:
1. One vCPU is halted.
2. CR0 == 8005002b (the "Extension Type" bit is cleared?)

(I have found that the problems happen mostly when installing
glibc-common and selinux-policy-targeted.)
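
To check which CR0 bits actually differ between dumps, a small decoder helps. This sketch uses the architectural CR0 bit positions (PE=0, MP=1, EM=2, TS=3, ET=4, NE=5, WP=16, AM=18, NW=29, CD=30, PG=31); the function name decode_cr0 is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the CR0 flags set in a hex value as printed by
# "info registers" (e.g. 8005003b); a leading 0x is accepted.
decode_cr0() {
  local val=$(( 16#${1#0x} )) i
  local -a out=()
  local -a bits=(0 1 2 3 4 5 16 18 29 30 31)
  local -a names=(PE MP EM TS ET NE WP AM NW CD PG)
  for i in "${!bits[@]}"; do
    # Test each architectural bit and collect the names of those that are set.
    if (( (val >> bits[i]) & 1 )); then
      out+=("${names[i]}")
    fi
  done
  echo "${out[*]}"
}
```

`decode_cr0 8005003b` prints `PE MP TS ET NE WP AM PG`, while `decode_cr0 8005002b` prints the same list without ET, so bit 4 (ET) is the only difference; decoding the XOR of the two values, `decode_cr0 "$(printf '%x' $(( 16#8005003b ^ 16#8005002b )))"`, prints just `ET`. The later value 80050023 (comment 36) also has TS clear.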


(qemu) info cpus
* CPU #0: pc=0x00000000c0403be1 (halted) thread_id=12884
  CPU #1: pc=0x00000000c042c729 thread_id=12885

(qemu) info registers
EAX=00000000 EBX=00000000 ECX=c0403bb0 EDX=c0704000
ESI=c06359e3 EDI=c2015088 EBP=00000020 ESP=c0704fd4
EIP=c0403be1 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=1
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =0000 00000000 ffffffff 00000000
GS =0000 b7fca6d0 ffffffff 00000000
LDT=0088 c0746020 00000027 00008200 DPL=0 LDT
TR =0080 c2009400 00002073 00008b00 DPL=0 TSS32-busy
GDT=     c2018000 000000ff
IDT=     c06f6000 000007ff
CR0=8005002b CR2=b7fba000 CR3=00742000 CR4=000006d0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=3820 [ST=7] FTW=80 MXCSR=00000000
FPR0=c661800000000000 4017 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=fffffffe00000000 c01d
FPR6=0000000000000000 0000 FPR7=fffffffe00000000 401d
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000



And another 50 runs with -smp 1: all installations completed successfully.
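
The halted state can be extracted from `info cpus` output mechanically, which helps when comparing many dumps. A sketch with a hypothetical helper name; it assumes the line format shown in this report (`CPU #N: pc=0x... [(halted)] thread_id=...`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "info cpus" output on stdin and print one line per
# vCPU: index, guest pc, and whether QEMU marks it as halted.
vcpu_states() {
  sed -n 's/^[* ]*CPU #\([0-9]*\): pc=\(0x[0-9a-f]*\)\(.*\)$/\1 \2 \3/p' |
    while read -r idx pc rest; do
      case $rest in
        *"(halted)"*) echo "$idx $pc halted" ;;
        *)            echo "$idx $pc not halted" ;;
      esac
    done
}
```

Applied to the dumps in this report: neither vCPU is halted in comment 25, vCPU 0 is halted here, and vCPU 1 is halted in comment 37, so the halted vCPU is not always the same one.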

Comment 29 Cao, Chen 2010-09-16 09:30:01 UTC

Created attachment 447689 [details]
screenshot of `sendkey ctrl-alt-f3` when installation stuck

Comment 30 Cao, Chen 2010-09-16 09:30:40 UTC

Created attachment 447690 [details]
screenshot of `sendkey ctrl-alt-f4` when installation stuck

Comment 31 Cao, Chen 2010-09-16 09:31:41 UTC

Created attachment 447692 [details]
screenshot of `sendkey ctrl-alt-f4` when installation stuck, without "busy inodes" info

Comment 33 Gleb Natapov 2010-10-12 14:16:24 UTC

Can you try reproducing with -cpu cpu64-rhel5,-kvmclock?
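
When comparing runs with and without -kvmclock, it can help to confirm from inside the guest which clocksource the kernel actually selected. A sketch, assuming the guest kernel exposes the usual clocksource files under /sys/devices/system/clocksource/clocksource0/; kvm-clock is the name mainline Linux uses for the KVM paravirtual clock, and whether the RHEL 5.5 guest kernel reports it the same way is an assumption. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, meant to run inside the guest. Assumes the guest
# kernel exposes the standard clocksource sysfs files (override the dir with $1).
show_clocksource() {
  local dir=${1:-/sys/devices/system/clocksource/clocksource0}
  printf 'current=%s\n' "$(cat "$dir/current_clocksource")"
  printf 'available=%s\n' "$(cat "$dir/available_clocksource")"
}

# Succeeds if the guest is currently using the KVM paravirtual clock.
guest_uses_kvmclock() {
  local dir=${1:-/sys/devices/system/clocksource/clocksource0}
  [ "$(cat "$dir/current_clocksource")" = "kvm-clock" ]
}
```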

Comment 34 Cao, Chen 2010-10-14 02:01:43 UTC

(In reply to comment #33)
> Can you try reproducing with -cpu cpu64-rhel5,-kvmclock.

I have tried 40 times and cannot reproduce it, but I will try more.

Comment 35 Gleb Natapov 2010-10-14 07:48:11 UTC

(In reply to comment #34)
> (In reply to comment #33)
> > Can you try reproducing with -cpu cpu64-rhel5,-kvmclock.
> 
> I have tried 40 times and cannot reproduce, but i will try more.

And please make sure that you can still reproduce it without the -kvmclock part.

Comment 36 Cao, Chen 2010-10-18 02:39:52 UTC

(In reply to comment #35)
> (In reply to comment #34)
> > (In reply to comment #33)
> > > Can you try reproducing with -cpu cpu64-rhel5,-kvmclock.
> > 
> > I have tried 40 times and cannot reproduce, but i will try more.
> 
> And make sure that you can still reproduce without -kvmclock part please.

Reproduced without -kvmclock:

1.
info cpus
* CPU #0: pc=0x00000000c0429c70 thread_id=21824
  CPU #1: pc=0x00000000c042c723 thread_id=21825

2.
info registers
EAX=00029001 EBX=f716b1dc ECX=c20fb8c0 EDX=00000001
ESI=1afce141 EDI=03cb7184 EBP=dfb2be3c ESP=dfb2be24
EIP=c0429c70 EFL=00000202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0060 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0068 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =0000 00000000 ffffffff 00000000
GS =0033 b7f1d6c0 ffffffff 00d0f300 DPL=3 DS   [-WA]
LDT=0088 c0746020 00000027 00008200 DPL=0 LDT
TR =0080 c2009400 00002073 00008b00 DPL=0 TSS32-busy
GDT=     c2018000 000000ff
IDT=     c06f6000 000007ff
CR0=80050023 CR2=b74e9000 CR3=1fb88000 CR4=000006d0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=0020 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=fffffffe00000000 c01d FPR5=94b2000000000000 400d
FPR6=ea1575143cf97000 4147 FPR7=a000000000000000 4002
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000



And I have run another 10 times with -kvmclock; the installations all
completed successfully.

Now running with cpu64-rhel6; will update the results here later.

Comment 37 Cao, Chen 2010-10-21 02:28:52 UTC

With -cpu cpu64-rhel5,-kvmclock: 200 runs, cannot reproduce.
With -cpu cpu64-rhel6: reproduced within 15 runs (1/13).
With -cpu cpu64-rhel6,-kvmclock: reproduced within 70 runs (1/66).
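
These counts can be read with a simple independence model: if each run hangs with probability p, the chance that n runs all pass is (1 - p)^n. A sketch of that arithmetic; the helper name is hypothetical, and independent runs with a constant hang rate is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: chance that N independent runs all pass when the
# per-run hang rate is HITS/RUNS (e.g. "1 10" for "1 time out of 10").
# Assumes independent runs and a constant hang rate.
p_all_pass() {
  awk -v h="$1" -v r="$2" -v n="$3" 'BEGIN { printf "%.3g\n", (1 - h / r) ^ n }'
}
```

With the 1-in-10 rate reported for cpu64-rhel5 in comment 11, `p_all_pass 1 10 200` is about 7e-10, so 200 clean runs with -kvmclock are very unlikely to be luck at that rate. If a residual rate of 1 in 66 (as measured with cpu64-rhel6,-kvmclock) were still present, `p_all_pass 1 66 200` is about 0.047, so 200 clean runs alone would not rule it out.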

`info cpus` and `info registers` with -cpu cpu64-rhel6,-kvmclock:

(qemu) info cpus
info cpus
* CPU #0: pc=0x00000000010b0073 thread_id=17716 
  CPU #1: pc=0x00000000c0403be1 (halted) thread_id=17717 
(qemu) info registers
info registers
EAX=00000000 EBX=010e6cc4 ECX=00000000 EDX=b76d0de0
ESI=b7643c90 EDI=b76d0e40 EBP=bf91b3e8 ESP=bf91b350
EIP=010b0073 EFL=00000246 [---Z-P-] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
CS =0073 00000000 08048fff 00c0fb00 DPL=3 CS32 [-RA]
SS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
DS =007b 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
FS =0000 00000000 ffffffff 00000000
GS =0033 b7f556c0 ffffffff 00d0f300 DPL=3 DS   [-WA]
LDT=0088 c0746020 00000027 00008200 DPL=0 LDT
TR =0080 c2009400 00002073 00008b00 DPL=0 TSS32-busy
GDT=     c2018000 000000ff
IDT=     c06f6000 000007ff
CR0=8005002b CR2=00760cd0 CR3=0214c000 CR4=000006d0
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000 
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=3800 [ST=7] FTW=80 MXCSR=00000000
FPR0=c661800000000000 4017 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000

Comment 45 Zachary Amsden 2010-10-27 01:08:42 UTC

This sounds a lot like the complaints and issues I see with kvmclock- or TSC-based guests on hosts with an unstable TSC, specifically on AMD machines. I believe these commits (listed in reverse order) need to be backported to the host kernel for TSC-based clocks to remain stable and ticking.

commit 47008cd887c1836bcadda123ba73e1863de7a6c4
Author: Zachary Amsden <zamsden>
Date:   Thu Aug 19 22:07:19 2010 -1000

    KVM: x86: Move TSC reset out of vmcb_init

    The VMCB is reset whenever we receive a startup IPI, so Linux is setting
    TSC back to zero happens very late in the boot process and destabilizing
    the TSC.  Instead, just set TSC to zero once at VCPU creation time.

    Why the separate patch?  So git-bisect is your friend.

    Signed-off-by: Zachary Amsden <zamsden>
    Signed-off-by: Marcelo Tosatti <mtosatti>

commit 58877679fd393d3ef71aa383031ac7817561463d
Author: Zachary Amsden <zamsden>
Date:   Thu Aug 19 22:07:18 2010 -1000

    KVM: x86: Fix SVM VMCB reset

    On reset, VMCB TSC should be set to zero.  Instead, code was setting
    tsc_offset to zero, which passes through the underlying TSC.

    Signed-off-by: Zachary Amsden <zamsden>
    Signed-off-by: Marcelo Tosatti <mtosatti>

commit bfb3f3326c915b1800dc65d10ca09fbd548353d2
Author: Zachary Amsden <zamsden>
Date:   Sat Sep 18 14:38:15 2010 -1000

    KVM: x86: TSC catchup mode

    Negate the effects of AN TYM spell while kvm thread is preempted by tracking
    conversion factor to the highest TSC rate and catching the TSC up when it has
    fallen behind the kernel view of time.  Note that once triggered, we don't
    turn off catchup mode.

    A slightly more clever version of this is possible, which only does catchup
    when TSC rate drops, and which specifically targets only CPUs with broken
    TSC, but since these all are considered unstable_tsc(), this patch covers
    all necessary cases.

    Signed-off-by: Zachary Amsden <zamsden>
    Signed-off-by: Marcelo Tosatti <mtosatti>

commit 1377ff23ae2bf49c76f8f498ca81050878b9666a
Author: Zachary Amsden <zamsden>
Date:   Sat Sep 18 14:38:14 2010 -1000

    KVM: x86: Rename timer function

    This just changes some names to better reflect the usage they
    will be given.  Separated out to keep confusion to a minimum.

    Signed-off-by: Zachary Amsden <zamsden>
    Signed-off-by: Marcelo Tosatti <mtosatti>

commit 9a088cc32488cfb9f60dca5972155ba13f39eb83
Author: Zachary Amsden <zamsden>
Date:   Sat Sep 18 14:38:13 2010 -1000

    KVM: x86: Make math work for other scales

    The math in kvm_get_time_scale relies on the fact that
    NSEC_PER_SEC < 2^32.  To use the same function to compute
    arbitrary time scales, we must extend the first reduction
    step to shrink the base rate to a 32-bit value, and
    possibly reduce the scaled rate into a 32-bit as well.

    Note we must take care to avoid an arithmetic overflow
    when scaling up the tps32 value (this could not happen
    with the fixed scaled value of NSEC_PER_SEC, but can
    happen with scaled rates above 2^31.

    Signed-off-by: Zachary Amsden <zamsden>
    Signed-off-by: Marcelo Tosatti <mtosatti>
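
The last commit generalizes the fixed-point scaling KVM uses to publish the TSC-to-nanosecond conversion to kvmclock guests. As a rough illustration of the original fixed-scale case (scaling to NSEC_PER_SEC only), here is a bash sketch modeled on our understanding of the pre-patch kvm_set_time_scale() and the guest-side pvclock scaling (shift the TSC delta, then multiply by a 32.32 fixed-point factor). It is not the kernel code; the helper names are hypothetical, and because bash integers are signed 64-bit, it is only valid while delta * mul stays below 2^63.

```bash
#!/usr/bin/env bash
# Sketch of kvmclock's fixed-scale TSC -> ns parameters (hypothetical helpers).
# Bash integers are signed 64-bit, so keep delta * mul below 2^63.
NSEC_PER_SEC=1000000000

# Print "shift mul" for a TSC frequency given in kHz.
kvm_time_scale() {
  local tps64=$(( $1 * 1000 )) s=0 tps32
  # Halve the tick rate until it is at most twice the nanosecond rate.
  while (( tps64 > NSEC_PER_SEC * 2 )); do
    tps64=$(( tps64 >> 1 )); s=$(( s - 1 ))
  done
  tps32=$tps64
  # Double it until it exceeds the nanosecond rate, so mul / 2^32 < 1.
  while (( tps32 <= NSEC_PER_SEC )); do
    tps32=$(( tps32 << 1 )); s=$(( s + 1 ))
  done
  # 32.32 fixed-point multiplier: (NSEC_PER_SEC << 32) / tps32.
  echo "$s $(( (NSEC_PER_SEC << 32) / tps32 ))"
}

# Convert a TSC delta to ns: apply the shift, then (delta * mul) >> 32.
pvclock_ns() {
  local delta=$1 s=$2 mul=$3
  if (( s < 0 )); then
    delta=$(( delta >> -s ))
  else
    delta=$(( delta << s ))
  fi
  echo $(( (delta * mul) >> 32 ))
}
```

For the nominal 2400 MHz of the Opteron 1216 (the "cpu MHz" line in the description), `kvm_time_scale 2400000` prints `-1 3579139413`, and `pvclock_ns 2400000000 -1 3579139413` gives 999999999 ns for one second of ticks (the fixed-point factor rounds down). These parameters are only right while the TSC actually ticks at that rate; on a host without constant_tsc that assumption can break, which is the situation the catchup-mode commit addresses.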

Comment 46 Gleb Natapov 2010-11-25 15:11:25 UTC

*** Bug 654539 has been marked as a duplicate of this bug. ***

Comment 47 Zachary Amsden 2011-02-03 14:32:13 UTC

I'm hopeful this has been addressed by my recent TSC patches, which were posted for 6.1.  Moving to POST state.

It's possible this is a separate issue, so we'll need to reproduce.  I'll try to create a brew build with my patches so we can test to see if they are effective in that regard.

Comment 49 Zachary Amsden 2011-02-03 15:53:31 UTC

Patches have been posted which I believe resolve this bug, and at least one was marked for this particular BZ (I used the "BZ: num, num" syntax, which might have confused the state tracking).

It's entirely possible this is a duplicate, but I did post at least one patch against this BZ.

It's entirely possible also that this is a separate bug not even related to my patches, which is why it should stay open for now.

Comment 50 Gleb Natapov 2011-02-06 08:07:27 UTC

*** Bug 654539 has been marked as a duplicate of this bug. ***

Comment 51 Zachary Amsden 2011-02-23 21:36:56 UTC

I have outstanding patches flagged with this BZ number which are in RHEL6.1 queue.

Comment 52 Zachary Amsden 2011-03-28 21:27:27 UTC

Patches to fix AMD-specific hangs have been posted; please retest.

Comment 54 Miya Chen 2011-04-08 06:33:57 UTC

Moving it back to POST, as the patch has still not been applied.

Comment 56 Dor Laor 2011-04-13 13:14:50 UTC

(In reply to comment #54)
> move it back to POST as the patch is still not applied.

Can you reproduce the issue?

Comment 57 Dor Laor 2011-04-13 13:21:12 UTC

New info: this bug is fixed by BZ 651635 (already committed).
I'll mark it as a duplicate, but I do ask QE to verify this as well with the fix for 651635.

*** This bug has been marked as a duplicate of bug 651635 ***