Bug 558855 - /distribution/virt/install fails with libvirtError: internal error unable to start guest: char device redirected to /dev/pts/1
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: python-virtinst
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.5
Assignee: Cole Robinson
QA Contact: Virtualization Bugs
Blocks: 563151
 
Reported: 2010-01-26 15:18 UTC by Barry Marson
Modified: 2023-09-14 01:20 UTC
CC List: 10 users

Fixed In Version: python-virtinst-0.400.3-8.el5
Doc Type: Bug Fix
Clones: 563151 (view as bug list)
Last Closed: 2010-03-30 08:51:13 UTC


Attachments
RHTS XML script for queuing this job. Note TESTDISTRO needs to be set (4.14 KB, text/xml)
2010-01-26 15:18 UTC, Barry Marson
Open nonsparse image files with O_DSYNC (568 bytes, patch)
2010-02-03 15:23 UTC, Jiri Denemark


Links:
Red Hat Product Errata RHBA-2010:0282 (normal, SHIPPED_LIVE): python-virtinst bug fix and enhancement update, last updated 2010-03-29 14:02:31 UTC

Description Barry Marson 2010-01-26 15:18:32 UTC
Created attachment 386851 [details]
RHTS XML script for queuing this job.  Note TESTDISTRO needs to be set

Description of problem:

I have had multiple virt installs fail in RHTS against the 20091227 and 20100117 RHEL 5.5 distros with the message in the summary.

The guest1_install log says:

Mon, 25 Jan 2010 14:21:28 ERROR    internal error unable to start guest: char device redirected to /dev/pts/1
char device redirected to /dev/pts/2

Domain installation may not have been
 successful.  If it was, you can restart your domain
 by running 'virsh start guest1'; otherwise, please
 restart your installation.
Mon, 25 Jan 2010 14:21:28 ERROR    internal error unable to start guest: char device redirected to /dev/pts/1
char device redirected to /dev/pts/2
Traceback (most recent call last):
  File "/usr/sbin/virt-install", line 889, in ?
    main()
  File "/usr/sbin/virt-install", line 751, in main
    start_time, guest.start_install)
  File "/usr/sbin/virt-install", line 813, in do_install
    dom = install_func(conscb, progresscb, wait=(not wait))
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 541, in start_install
    return self._do_install(consolecb, meter, removeOld, wait)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 633, in _do_install
    self.domain = self.conn.createLinux(install_xml, 0)
  File "/usr/lib64/python2.4/site-packages/libvirt.py", line 974, in createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed', conn=self)
libvirtError: internal error unable to start guest: char device redirected to /dev/pts/1
char device redirected to /dev/pts/2

A Libvirt error occurred during installation. 


The RHTS job that failed (one of them) is

http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=120318

  The second recipe set.

Version-Release number of selected component (if applicable):

RHEL5.5-Server-20100117.0 

How reproducible:

Almost always, though it seems to depend on the box; sometimes it works.

Steps to Reproduce:
1. Queue the attached RHTS XML-style job (note TESTDISTRO needs to be set)

Actual results:


Expected results:


Additional info:

Comment 1 Barry Marson 2010-01-28 22:25:53 UTC
Can someone look into this? My ISV testing in virt environments runs into this 80% of the time. Since it works sometimes, it's suspected to be a race condition.

Thanks,
Barry

Comment 2 Daniel Veillard 2010-01-29 12:50:45 UTC
Can we get the XML for the domain used? The attachment provided doesn't even make it possible to tell which arch or which hypervisor is being used!


Daniel

Comment 3 Barry Marson 2010-01-29 14:46:19 UTC
Take a look at the RHTS job

http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=328768

Look at the second recipe set. The job is still holding the machine because it has a built-in reserve_workflow for the host system. pw:rhts.

Barry

Comment 4 Barry Marson 2010-01-29 14:47:40 UTC
Scratch that, wrong job ... though this one died too. That's job

http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=121429

Barry

Comment 5 Martin Jenner 2010-01-29 15:09:22 UTC
test host is ibm-x3950m2-01.rhts.eng.bos.redhat.com

- the virt-install command did not get far enough along to create a guest

# virsh list --all
 Id Name                 State
----------------------------------

the virt-install command used was

# virt-install --name guest1 --mac 00:16:3E:53:E4:2A --cdrom /var/lib/libvirt/images/guest1.iso --file /var/lib/libvirt/images/guest1.img -s 80 --nonsparse --hvm --debug --extra-args ks=http://lab2.rhts.eng.bos.redhat.com/cblr/svc/op/ks/system/guest-83-228.rhts.bos.redhat.com --prompt --accelerate --os-variant=virtio26 --network bridge:br0   --noreboot --vnc

Comment 6 Daniel Veillard 2010-01-29 15:21:39 UTC
I see the process running:

/usr/libexec/qemu-kvm -S -M pc -m 512 -smp 1 -name guest1 -uuid 07021de3-8e93-d39e-9710-53453564c384 -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//guest1.pid -no-reboot -boot d -drive file=/var/lib/libvirt/images/guest1.img,if=virtio,index=0,cache=none -drive file=/var/lib/libvirt/images/guest1.iso,if=ide,media=cdrom,index=2 -net nic,macaddr=00:16:3e:53:e4:2a,vlan=0,model=virtio -net tap,fd=16,script=,vlan=0,ifname=vnet0 -serial pty -parallel none -usb -vnc 127.0.0.1:0 -k en-us

I'm attaching the XML I extracted for it

Daniel

Comment 7 Martin Jenner 2010-01-29 15:46:29 UTC
this issue was also seen on system intel-s3ea2-02.rhts.eng.bos.redhat.com

virt-install --name guest1 --mac 00:16:3E:53:4A:DE --cdrom /var/lib/libvirt/images/guest1.iso --file /var/lib/libvirt/images/guest1.img -s 180 --nonsparse --hvm --debug --extra-args ks=http://lab2.rhts.eng.bos.redhat.com/cblr/svc/op/ks/system/guest-83-74.rhts.bos.redhat.com --prompt --accelerate --os-variant=virtio26 --network bridge:br0 --noreboot --vnc
Thu, 28 Jan 2010 17:10:51 DEBUG    Received libvirt URI 'qemu:///system'
Thu, 28 Jan 2010 17:10:51 DEBUG    Requesting virt method 'hvm'
Thu, 28 Jan 2010 17:10:51 DEBUG    Received virt method 'hvm'
Thu, 28 Jan 2010 17:10:51 DEBUG    Hypervisor name is 'kvm'
How much RAM should be allocated (in megabytes)? 512 
Thu, 28 Jan 2010 17:10:51 DEBUG    Setting os type to 'linux' for variant 'virtio26'
Thu, 28 Jan 2010 17:10:51 DEBUG    DistroInstaller location is a local file/path: /var/lib/libvirt/images/guest1.iso
Thu, 28 Jan 2010 17:10:51 DEBUG    Setting size for existing storage to '0.00978851318359'
Thu, 28 Jan 2010 17:10:51 DEBUG    Detected storage as type 'file'


Starting install...
Thu, 28 Jan 2010 17:10:51 DEBUG    Setting size for existing storage to '0.00978851318359'
Thu, 28 Jan 2010 17:10:51 DEBUG    Detected storage as type 'file'

Creating | 142 MB
Creating | 266 MB
Creating | 390 MB

...

Creating | 180 GB
Creating
Thu, 28 Jan 2010 17:56:41 DEBUG    Creating guest from:
<domain type='kvm'>
  <name>guest1</name>
  <currentMemory>524288</currentMemory>
  <memory>524288</memory>
  <uuid>1cd3150e-3ed0-b967-2d4b-a00299bca836</uuid>
  <os>
    <type arch='x86_64'>hvm</type>
    <boot dev='cdrom'/>
  </os>
  <features>
    <acpi/><apic/><pae/>
  </features>
  <clock offset="utc"/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <vcpu>1</vcpu>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <console type='pty'/>
    <disk type='file' device='disk'>
      <driver name='qemu' cache='none'/>
      <source file='/var/lib/libvirt/images/guest1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/var/lib/libvirt/images/guest1.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
    </disk>
    <interface type='bridge'>
      <source bridge='br0'/>
      <mac address='00:16:3E:53:4A:DE'/>
      <model type='virtio'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' keymap='en-us'/>
  </devices>
</domain>

Thu, 28 Jan 2010 17:58:30 ERROR    internal error unable to start guest: 
Domain installation may not have been
 successful.  If it was, you can restart your domain
 by running 'virsh start guest1'; otherwise, please
 restart your installation.
Thu, 28 Jan 2010 17:58:30 ERROR    internal error unable to start guest: 
Traceback (most recent call last):
  File "/usr/sbin/virt-install", line 889, in ?
    main()
  File "/usr/sbin/virt-install", line 751, in main
    start_time, guest.start_install)
  File "/usr/sbin/virt-install", line 813, in do_install
    dom = install_func(conscb, progresscb, wait=(not wait))
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 541, in
start_install
    return self._do_install(consolecb, meter, removeOld, wait)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 633, in
_do_install
    self.domain = self.conn.createLinux(install_xml, 0)
  File "/usr/lib64/python2.4/site-packages/libvirt.py", line 974, in
createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed',
conn=self)
libvirtError: internal error unable to start guest: 
A Libvirt error occurred during installation.

Comment 10 Martin Jenner 2010-01-29 20:45:28 UTC
This issue is definitely compounded by creating the guest image using the --nonsparse option.

I found I could reproduce the issue by kicking off a load prior to starting the virt-install command

- First make sure the image file from any prior run no longer exists
# rm -f /var/lib/libvirt/images/vm1.img

- start a small load
# /bin/bash /etc/cron.daily/rpm

- run the install command - my example is below
# virt-install --hvm --name vm1 --ram 1024 --vnc --cdrom http://qafiler.bos.redhat.com/redhat/rel-eng/RHEL5.5-Server-20100117.0/5/x86_64/os --file /var/lib/libvirt/images/vm1.img --file-size 12 --accelerate --os-variant=virtio26 --debug --nonsparse
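
For illustration only, a minimal Python stand-in for this kind of background load (the path, size, and chunk size below are arbitrary; any heavy buffered write, such as dd, works the same way):

import os

CHUNK = "\0" * (1 << 20)  # 1 MiB of zero bytes

def dirty_page_load(path="/var/tmp/load.img", total_mb=2048):
    # Buffered writes pile up as dirty pages in the page cache, which
    # the kernel must later flush, delaying other I/O: the same
    # pressure a --nonsparse image creation produces.
    f = open(path, "wb")
    try:
        for _ in range(total_mb):
            f.write(CHUNK)
    finally:
        f.close()
        os.unlink(path)

if __name__ == "__main__":
    dirty_page_load()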

On some runs I saw slightly different output from libvirt, but the symptom of qemu exiting or timing out was still there.

The alternate libvirt trace was:

Fri, 29 Jan 2010 14:58:58 DEBUG    Removing /var/lib/libvirt/boot/virtinst-boot.iso.5OdWNI
Fri, 29 Jan 2010 14:58:58 ERROR    internal error unable to start guest: 
Domain installation may not have been
 successful.  If it was, you can restart your domain
 by running 'virsh start vm1'; otherwise, please
 restart your installation.
Fri, 29 Jan 2010 14:58:58 ERROR    internal error unable to start guest: 
Traceback (most recent call last):
  File "/usr/sbin/virt-install", line 889, in ?
    main()
  File "/usr/sbin/virt-install", line 751, in main
    start_time, guest.start_install)
  File "/usr/sbin/virt-install", line 813, in do_install
    dom = install_func(conscb, progresscb, wait=(not wait))
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 541, in start_install
    return self._do_install(consolecb, meter, removeOld, wait)
  File "/usr/lib/python2.4/site-packages/virtinst/Guest.py", line 633, in _do_install
    self.domain = self.conn.createLinux(install_xml, 0)
  File "/usr/lib64/python2.4/site-packages/libvirt.py", line 974, in createLinux
    if ret is None:raise libvirtError('virDomainCreateLinux() failed', conn=self)
libvirtError: internal error unable to start guest:

Comment 11 Martin Jenner 2010-01-29 20:47:35 UTC
Actually, I could only reproduce the issue with the '--nonsparse' switch; if I left this option off the virt-install line, the guest always started for me.

Comment 13 Jiri Denemark 2010-01-31 21:02:55 UTC
Yeah, it sounds like bug 456926, which we faced with xen. The problem there was that dom0 was so busy flushing out dirty cache pages that it was very slow at doing other stuff, especially reading from the disk, which significantly delayed the execution of new processes. That case was xen-specific, though: xend was waiting, with a timeout, for an asynchronous event triggered by udev. I hope qemu does not do something similarly stupid.

Comment 14 Cole Robinson 2010-02-01 15:35:20 UTC
It might be a libvirt timeout in this case. There are several timeouts in the libvirt qemu driver's startup code: qemu could legitimately be taking too long because of the cache issues, and libvirt might be timing out.

Comment 15 Barry Marson 2010-02-01 15:54:04 UTC
If heavy dirty page writing is the issue, my questions are:

1.  Is the image container being created with direct I/O? We all know that page cache writes on memory-starved systems can be quite slow, and we are asking for a 180GB image.

2.  Who sets the timeouts for the libvirt qemu driver?

Barry

Comment 16 Daniel Veillard 2010-02-03 08:56:28 UTC
Timeouts in the libvirt qemu driver are hardcoded, not manually configurable at
this point (and really they shouldn't become yet another knob that needs
tweaking for things to work well!)

Daniel

Comment 17 Jiri Denemark 2010-02-03 13:52:50 UTC
I was able to reproduce it locally using the virt-install command from comment #10. Sometimes I don't need to do anything special; sometimes running sync just before the image creation finishes helps trigger the issue. And running dd at the end of image creation makes it reliably reproducible.

I haven't seen the error described by the reporter; I only hit the one described in comment #10. But they both happen in qemudReadLogOutput(), as a result of some unusual behavior when libvirt tries to read qemu's output.

In my case the root error in libvirt was

error : internal error Timed out while reading console log output

which happens in qemudReadLogOutput(), called from qemudWaitForMonitor() with a timeout of 3 seconds.

By modifying virt-install to open the image file with O_DSYNC, we can avoid this issue when virt-install is the only process doing heavy I/O. When, however, another process is writing a lot to a disk, I guess the only solution is increasing the timeout, although starting qemu can take an arbitrarily long time in such a situation.

As installing a guest in parallel with running an I/O-bound process is probably quite uncommon, we could be fine with just fixing virt-install.
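
For what it's worth, qemudReadLogOutput() is implemented in C inside libvirt; the sketch below is only a Python illustration of the timeout logic described above, and the function name, chunk size, and retry interval are assumptions. Any delay longer than the deadline surfaces as the "Timed out while reading console log output" error:

import os, time, errno

def read_log_output(logfd, timeout=3.0, want="char device redirected"):
    # Poll a non-blocking log fd until the expected qemu output shows
    # up or the deadline passes; under heavy I/O load qemu's output
    # can easily arrive later than a fixed 3-second deadline.
    deadline = time.time() + timeout
    buf = ""
    while time.time() < deadline:
        try:
            buf += os.read(logfd, 1024)
        except OSError, e:
            if e.errno != errno.EAGAIN:
                raise
        if want in buf:
            return buf
        time.sleep(0.1)  # retry at a fixed interval
    raise RuntimeError("Timed out while reading console log output")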

Comment 18 Daniel Veillard 2010-02-03 14:46:18 UTC
Hum, that's complex ...
What flag is virt-install using now for opening the image file?

We should probably split this bug, because clearly there isn't one magic change which would fix the issue forever, but incremental improvements in the two components may help get rid of this in most cases:

  - fix the virt-install opening flags, if the semantics of O_DSYNC do not create any risk
  - possibly change the libvirt timeout, either to a longer value or to something more accommodating of delays under load.

We have an update of python-virtinst for 5.5

https://errata.devel.redhat.com/errata/show/9266

but it may be too late to push this.

Daniel

Comment 19 Jiri Denemark 2010-02-03 15:18:15 UTC
virt-install creates the image with O_WRONLY | O_CREAT. I don't see how adding O_DSYNC could introduce any risk.

Barry, could you try the python-virtinst package from https://brewweb.devel.redhat.com/taskinfo?taskID=2241466

I think it could solve the issue for you. It solved it for me.

Comment 20 Jiri Denemark 2010-02-03 15:23:31 UTC
Created attachment 388551 [details]
Open nonsparse image files with O_DSYNC

This is a patch for python-virtinst which should solve the most common cause of this bug. If it really fixes it, I'll move this bug to python-virtinst and clone it back for libvirt to address the other cause.
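
As a rough sketch of the approach (the function name and chunk size here are illustrative, not the actual virtinst code): opening the image with O_DSYNC makes each zero block reach the disk before the next write returns, so dirty pages never pile up:

import os

def create_nonsparse_image(path, size_bytes):
    # O_DSYNC added to the original O_WRONLY | O_CREAT flags: each
    # write returns only after the data reaches the disk, so the
    # allocation loop no longer floods the page cache.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0644)
    try:
        block = "\0" * (1024 * 1024)
        left = size_bytes
        while left > 0:
            n = min(len(block), left)
            left -= os.write(fd, block[:n])
    finally:
        os.close(fd)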

Comment 21 Cole Robinson 2010-02-03 16:35:36 UTC
We should also make a similar change in the libvirt storage code, since virtinst/virt-manager will use that instead for many configurations.

Comment 22 Barry Marson 2010-02-03 20:41:24 UTC
First test with the bits in https://brewweb.devel.redhat.com/taskinfo?taskID=2241466 succeeded, for both i386 and x86_64. I want to run it on more I/O-challenged systems before I declare victory.

Barry

Comment 24 edison 2010-02-05 19:23:58 UTC
I also hit this "internal error ... timed out reading console log" error: in my stress tests I just start/stop 4 VMs on one physical machine every 20 seconds.

The bug occurred roughly once an hour. When I manually increased the waiting time from 3 seconds to 10 seconds in libvirt, the bug seemed to be gone.

Comment 25 Barry Marson 2010-02-09 14:55:38 UTC
I tested this fix on single-spindle host machines with both i386 and x86_64 guests. In both cases the install succeeded after updating python-virtinst.

Barry

Comment 26 Cole Robinson 2010-02-09 20:03:48 UTC
Patch from comment #20 built into python-virtinst-0.400.3-8.el5. Thanks Jiri!

Comment 28 Nan Zhang 2010-03-04 07:34:04 UTC
Verified; fixed with the following packages:

libvirt-0.6.3-33.el5
python-virtinst-0.400.3-9.el5
kvm-83-161.el5

Comment 31 errata-xmlrpc 2010-03-30 08:51:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0282.html

Comment 32 Red Hat Bugzilla 2023-09-14 01:20:03 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

