Bug 999175 - Clock Issue with rhel 5 i386 [NEEDINFO]
Clock Issue with rhel 5 i386
Status: CLOSED DEFERRED
Product: Fedora
Classification: Fedora
Component: qemu
Version: 18
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2013-08-20 17:12 EDT by Joe Ryner
Modified: 2013-11-19 20:27 EST
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-17 15:31:06 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
mtosatti: needinfo? (jryner)


Attachments: None
Description Joe Ryner 2013-08-20 17:12:49 EDT
Description of problem:
I seem to be having clock issues with RHEL 5.9 when running as a guest on Fedora 18.  Time will freeze for several hours although the virt continues to live.  Network IO to/from the virt is very poor: 5Mb/s on a network with 10Gb/s and 1Gb/s switches.


RHEL 6 virts run perfectly on the same Fedora 18 host.


Version-Release number of selected component (if applicable):
Fedora 18 x86_64 - kernel version 3.10.7-100.fc18.x86_64
            Qemu - qemu-1.2.2-13.fc18
RHEL 5.9 - kernel - 2.6.18-348.6.1.el5PAE




How reproducible:
Run a RHEL 5.9 virt on Fedora 18 using kvm, libvirt, qemu
Comment 1 Richard W.M. Jones 2013-08-21 04:54:42 EDT
Are you using kvmclock?  In qemu?  In the guest?

What is the complete libvirt XML?  What is the command line?
Are there any messages in the libvirt log file?

Please provide lots more detail.
https://fedoraproject.org/wiki/How_to_debug_Virtualization_problems
Comment 2 Joe Ryner 2013-08-21 11:58:59 EDT
dmesg | grep clock
kvm-clock: cpu 0, msr 0:74c9e1, boot clock
kvm-clock: cpu 0, msr 0:3351c61, primary cpu clock
Time: kvm-clock clocksource has been installed.
Time: hpet clocksource has been installed.

dmesg | tail
Bluetooth: HCI socket layer initialized
Bluetooth: L2CAP ver 2.8
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
Bluetooth: HIDP (Human Interface Emulation) ver 1.1
eth0: no IPv6 routers present
TSC appears to be running slowly. Marking it as unstable
Time: hpet clocksource has been installed.

Note:  I switched to the hpet clocksource to see if the situation improved; it did not.

libvirt xml - mars108vca.xml
<domain type='kvm' id='25'>
  <name>mars108vca</name>
  <uuid>f1cd7449-7e80-ba55-23ae-51035ac4d1de</uuid>
  <description>mars 1 v8 devel web server – group a</description>
  <memory unit='KiB'>4096000</memory>
  <currentMemory unit='KiB'>4096000</currentMemory>
  <vcpu placement='static' current='1'>3</vcpu>
  <os>
    <type arch='x86_64' machine='pc-1.2'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>cpu64-rhel5</model>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='threads'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='XXXXXXXXX'/>
      </auth>
      <source protocol='rbd' name='images/mars108vca:rbd_cache=1'>
        <host name='10.66.4.4' port='6789'/>
        <host name='10.66.4.2' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <lease>
      <lockspace>__LIBVIRT__DISKS__</lockspace>
      <key>mars108vca</key>
      <target path='/dev/rbd/locks/mars108vca'/>
    </lease>
    <interface type='bridge'>
      <mac address='52:54:00:af:38:ab'/>
      <source bridge='br-yweb'/>
      <target dev='vnet10'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/10'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/10'>
      <source path='/dev/pts/10'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5910' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c14,c703</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c14,c703</imagelabel>
  </seclabel>
</domain>

qemu command line
/usr/bin/qemu-kvm -name mars108vca -S -M pc-1.2 -cpu Opteron_G1 -enable-kvm -m 4000 -smp 1,maxcpus=3,sockets=3,cores=1,threads=1 -uuid f1cd7449-7e80-ba55-23ae-51035ac4d1de -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mars108vca.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=cd,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=rbd:images/mars108vca:rbd_cache=1:id=libvirt:key=XXXXXX==:auth_supported=cephx\;none:mon_host=10.66.4.4\:6789\;10.66.4.2\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:af:38:ab,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:10 -vga cirrus -incoming tcp:0.0.0.0:49176 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4

The log only shows the virt being brought up and shut down.
Comment 3 Richard W.M. Jones 2013-08-21 12:20:12 EDT
(In reply to Joe Ryner from comment #0)
> Time will freeze for several hours although the virt will
> continue to live.  Network IO to/from the virt is very poor. 5Mb/s on a
> network with a 10gb/s switches and 1gb/s switches.  

So the symptoms are that 'date' in the guest returns the same time?
How about 'uptime' (in the guest)?  How about 'cat /proc/uptime'?
Comment 4 Joe Ryner 2013-08-21 12:38:01 EDT
I am switching to a different guest that is currently broken, but it has the same setup as reported above.

[root@neptune203dca ~]# date
Tue Aug 20 04:10:03 CDT 2013
[root@neptune203dca ~]# date
Tue Aug 20 04:10:12 CDT 2013
[root@neptune203dca ~]# date
Tue Aug 20 04:10:13 CDT 2013
[root@neptune203dca ~]# cat /proc/uptime 
486800.73 599000.51
[root@neptune203dca ~]# 

/proc/uptime looks weird because the idle seconds are greater than the uptime.  Is that right?
Comment 5 Joe Ryner 2013-08-21 12:38:53 EDT
Just checked ntpd is not running on the neptune203dca virt
Comment 6 Richard W.M. Jones 2013-08-21 13:03:10 EDT
(In reply to Joe Ryner from comment #4)
> I am switching to a different guest that is currently broken but it has the
> same setup as reported before.
> 
> [root@neptune203dca ~]# date
> Tue Aug 20 04:10:03 CDT 2013
> [root@neptune203dca ~]# date
> Tue Aug 20 04:10:12 CDT 2013
> [root@neptune203dca ~]# date
> Tue Aug 20 04:10:13 CDT 2013

So this is working?

> [root@neptune203dca ~]# cat /proc/uptime 
> 486800.73 599000.51
> [root@neptune203dca ~]# 
> 
> /proc/uptime looks weird because the idle seconds is greater than uptime. 
> Is that right?

On my machine (F19) the second number is much larger than
the first number:

$ cat /proc/uptime 
2702520.89 9982018.09

That appears to contradict the documentation, but I'll have
to look at the kernel source to see what it really means.

Does /proc/uptime count upwards?

I'm still a bit confused about what precisely is the issue
you're seeing.
Comment 7 Richard W.M. Jones 2013-08-21 13:06:22 EDT
It appears the output of /proc/uptime is
<uptime> <idle_time>

However, the trick is that <idle_time> is the sum of
idle time across all CPUs in the system, so on my
4-core laptop that number should be divided by 4
to get the average idle time per CPU (2495504),
which is less than the uptime (2702520).
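The per-CPU arithmetic above can be sketched in a few lines of Python; `idle_per_cpu` is a hypothetical helper name, and the sample string is the /proc/uptime output quoted in this comment:

```python
def idle_per_cpu(proc_uptime: str, ncpus: int) -> tuple[float, float]:
    """Split a /proc/uptime line into (uptime, idle-per-CPU) seconds.

    The second field of /proc/uptime is the *sum* of idle time across
    all CPUs, so divide it by the CPU count before comparing it
    against the uptime.
    """
    uptime, idle_total = (float(field) for field in proc_uptime.split())
    return uptime, idle_total / ncpus

# The figures from this comment, on a 4-core machine:
uptime, idle = idle_per_cpu("2702520.89 9982018.09", ncpus=4)
assert idle < uptime  # per-CPU idle is indeed below the uptime
```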
Comment 8 Joe Ryner 2013-08-21 14:23:50 EDT
My issue is twofold.

1.)  Time seems to stop periodically.  Our web server logs point to zillions of operations happening in one second and then nothing for hours, after which time starts again.

2.) Network IO to the virt is painfully slow for large transfers.  For example, scp'ing a 600MB file to the virt runs at about 0.5 Mb/s on a network where 500-600 Mb/s is the norm.

I find this strange because our RHEL 6 virts on the same hosts run just fine, for both timing and network transfers.

Below is my cpuinfo.  I only have one CPU in this virt.


cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 6
model name	: AMD Opteron 240 (Gen 1 Class Opteron)
stepping	: 1
cpu MHz		: 2659.998
cache size	: 4096 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc up pni
bogomips	: 5319.99
Comment 9 Cole Robinson 2013-10-31 18:03:00 EDT
The recommended clock configuration for all guests is:

<clock offset='utc'>
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
</clock>

Not sure if it will make a difference in this case, but please test and report back.
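For anyone scripting this change across several guests, here is a minimal sketch that rewrites the <clock> element of a domain XML (e.g. output of `virsh dumpxml`) with the timers recommended above. `apply_recommended_clock` is a hypothetical helper name, not part of libvirt:

```python
import xml.etree.ElementTree as ET

# The timer settings recommended in this comment.
RECOMMENDED_TIMERS = [
    {"name": "rtc", "tickpolicy": "catchup"},
    {"name": "pit", "tickpolicy": "delay"},
    {"name": "hpet", "present": "no"},
]

def apply_recommended_clock(domain_xml: str) -> str:
    """Return the domain XML with the recommended <clock> timer children."""
    root = ET.fromstring(domain_xml)
    clock = root.find("clock")
    if clock is None:
        clock = ET.SubElement(root, "clock", offset="utc")
    # Replace any existing <timer> children with the recommended set.
    for timer in clock.findall("timer"):
        clock.remove(timer)
    for attrs in RECOMMENDED_TIMERS:
        ET.SubElement(clock, "timer", **attrs)
    return ET.tostring(root, encoding="unicode")
```

The rewritten XML would then be fed back through `virsh define` (the change takes effect on the guest's next cold start, not on a running domain).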

That guest config you reported above is a bit weird though: you have

  <cpu mode='custom' match='exact'>
    <model fallback='allow'>cpu64-rhel5</model>
  </cpu>

in the XML, but

 -cpu Opteron_G1

on the command line...

ccing marcelo
Comment 10 Cole Robinson 2013-11-17 15:31:06 EST
No response for 2 weeks, and given that F18 reaches end of life in a couple of months, I'm closing this. Joe, if you are still seeing issues, please update to F20 or F19 and try again, and please clarify the confusion pointed out in Comment #9.
Comment 11 Marcelo Tosatti 2013-11-19 20:27:35 EST
(In reply to Joe Ryner from comment #0)
> Description of problem:
> I seem to be having clock issues with rhel 5.9 when running as a guest on
> Fedora 18.  Time will freeze for several hours although the virt will
> continue to live.  Network IO to/from the virt is very poor. 5Mb/s on a
> network with a 10gb/s switches and 1gb/s switches.  
> 
> 
> RHEL 6 virts run perfect on the same Fedora 18 host.  
> 
> 
> Version-Release number of selected component (if applicable):
> Fedora 18 x86 64 - kernel version 3.10.7-100.fc18.x86_64
>             Qemu - qemu-1.2.2-13.fc18
> RHEL 5.9 - kernel - 2.6.18-348.6.1.el5PAE
> 
> 
> 
> 
> How reproducible:
> Run a RHEL 5.9 virt on Fedora 18 using kvm, libvirt, qemu

Joe,

Can you provide dmesg from the host machine? (Even though the bug has been closed, this would confirm whether or not it's a duplicate.) Thanks.
