250206 – Virtual Machine Manager fails to start guests

Summary: Virtual Machine Manager fails to start guests

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	selinux-policy
Sub Component:
Version:	7
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Daniel Walsh
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-07-31 01:23 UTC by Srihari Vijayaraghavan
Modified:	2008-01-30 19:18 UTC
CC List:	2 users
Fixed In Version:	Current
Clone Of:
Environment:
Last Closed:	2008-01-30 19:18:50 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
SELinux restorecon failure messages (104.85 KB, text/plain) 2007-08-13 23:31 UTC, Srihari Vijayaraghavan

Description Srihari Vijayaraghavan 2007-07-31 01:23:14 UTC

Description of problem:
After the recent updates of libvirt, libvirt-python & python-virtinst, Virtual
Machine Manager is unable to start any of the guest images, failing with error
message:
virDomainCreate() failed Failed to add tap interface 'vnet48' to bridge 'vnet0'
: Operation not supported

Details:
Unable to start virtual machine '<class 'libvirt.libvirtError'>
virDomainCreate() failed Failed to add tap interface 'vnet48' to bridge 'vnet0'
: Operation not supported
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/console.py", line 348, in control_vm_run
    self.vm.startup()
  File "/usr/share/virt-manager/virtManager/domain.py", line 361, in startup
    self.vm.create()
  File "/usr/lib64/python2.5/site-packages/libvirt.py", line 228, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: virDomainCreate() failed Failed to add tap interface 'vnet48' to
bridge 'vnet0' : Operation not supported
'

Version-Release number of selected component (if applicable):
libvirt.x86_64 0.3.1-2.fc7
libvirt-python.x86_64 0.3.1-2.fc7
python-virtinst.noarch 0.200.0-2.fc7

How reproducible:
Always

Steps to Reproduce:
1. Start Virtual Machine Manager
2. Try starting a guest (in my case it's a KVM/Qemu fully virtualised Linux i686
image under x86-64 host)
3. Observe it fails with the above error message
  
Actual results:
Guest image doesn't run.

Expected results:
Guest image to start & run successfully.

Additional info:
I think the vnet* interfaces aren't getting created. I observe no vnet0 (for
virtual network 1), vnet1 (for virtual network 2), etc. created when
/etc/init.d/libvirtd is started. In the past I've observed libvirtd creating the
vnet* interfaces. I suspect a regression introduced in libvirtd by today's
updates is the reason behind the failure.

Thanks

Comment 1 Daniel Veillard 2007-07-31 08:26:56 UTC

Were you running libvirt 0.3.0 before, or did you upgrade directly from
0.2.3? Can you make sure the libvirt daemon gets properly restarted (you
may need to kill an old libvirt_qemu one)?
Can you also provide the versions of qemu and the kernel, so we can be sure we
have a matching environment, and attach an XML description of the guest failing
to start (use virsh dumpxml guestname)?

  thanks !

Daniel

Comment 2 Srihari Vijayaraghavan 2007-07-31 08:56:34 UTC

Upgraded from libvirt 0.3.0 to this newer version. (I regularly run "yum update"
every few days)

Made sure that after /etc/init.d/libvirtd stop, there was no "libvirtd --daemon"
process left. Didn't help though.

# uname -a
Linux desktop 2.6.22.1-33.fc7 #1 SMP Mon Jul 23 16:59:15 EDT 2007 x86_64 x86_64
x86_64 GNU/Linux

# rpm -qa|egrep 'kernel|qemu|kvm'
kernel-2.6.22.1-33.fc7
kvm-24-1
qemu-0.9.0-2.fc7

The virsh tool fails with error messages:
# virsh dumpxml fedora7
error: failed to connect to the hypervisor
error: no valid connection

(I've never used virsh before, so I don't know whether it's a new regression.)

Thanks

Comment 3 Daniel Veillard 2007-07-31 09:16:17 UTC

virsh is the command-line tool associated with libvirt. When using QEMU you
need to pass a connection option, because when run as root it tries to use Xen
by default:

virsh -c qemu://system/ dumpxml 

Daniel

Comment 4 Srihari Vijayaraghavan 2007-07-31 09:30:25 UTC

(Sorry mate, still looks like no go)

# virsh -c qemu://system/ dumpxml fedora7
libvir: Remote error : Cannot access CA certificate '/etc/pki/CA/cacert.pem': No
such file or directory (2)
error: failed to connect to the hypervisor
error: no valid connection

I'm very happy to provide various xml configuration files for inspection, if
needed (the whole /etc/libvirt/ if needed)

Thanks

Comment 5 Srihari Vijayaraghavan 2007-07-31 09:59:51 UTC

Actually, I managed to work out how to use virsh:
Here's the impacted system:
# virsh -c qemu:///system
virsh # dumpxml fedora1
<domain type='kvm'>
  <name>fedora1</name>
  <uuid>772e1c9c-aab9-1e65-e69f-0272445f5b0e</uuid>
  <memory>262144</memory>
  <currentMemory>262144</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/vm/guests/fedora1'/>
      <target dev='hda'/>
    </disk>
    <interface type='network'>
      <mac address='00:16:3e:2b:42:72'/>
      <source network='net1'/>
      <target dev='vnet%d'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' listen='127.0.0.1'/>
  </devices>
</domain>

Thanks

Comment 6 Daniel Veillard 2007-07-31 10:35:15 UTC

Hmm, <target dev='vnet%d'/>: the %d looks suspicious to me.
Can you try to change this to a constant value?

virsh -c qemu://system/ dumpxml fedora7 > /tmp/fedora7.xml

edit /tmp/fedora7.xml replacing vnet%d with vnet1 for example

virsh -c qemu://system/ undefine fedora7
virsh -c qemu://system/ define /tmp/fedora7.xml

Does the guest start then?

Daniel
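
The edit step suggested above (replacing the vnet%d template with a constant device name in the dumped XML) can also be scripted. What follows is only a minimal sketch in Python using the standard xml.etree.ElementTree module; the function name pin_tap_names and the trimmed-down sample XML are hypothetical and are not part of libvirt or virt-manager. It is a diagnostic aid only: as the later comments show, a fixed name did not make the guest start in this case.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample, modelled on the dumpxml output in comment 5.
DOMAIN_XML = """<domain type='kvm'>
  <name>fedora1</name>
  <devices>
    <interface type='network'>
      <mac address='00:16:3e:2b:42:72'/>
      <source network='net1'/>
      <target dev='vnet%d'/>
    </interface>
  </devices>
</domain>"""


def pin_tap_names(xml_text, first_index=1):
    """Replace each 'vnet%d'-style template with a constant device name."""
    root = ET.fromstring(xml_text)
    index = first_index
    for target in root.findall("./devices/interface/target"):
        dev = target.get("dev", "")
        if "%d" in dev:
            # vnet%d -> vnet1, vnet2, ... one constant name per interface
            target.set("dev", dev.replace("%d", str(index)))
            index += 1
    return ET.tostring(root, encoding="unicode")


edited = pin_tap_names(DOMAIN_XML)
print(edited)
```

The result would be written to a file such as /tmp/fedora7.xml and passed to virsh define as described above.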

Comment 7 Srihari Vijayaraghavan 2007-07-31 11:50:26 UTC

Followed the above instructions using virsh:
1. Dumped the fedora1's xml file
2. Undefined fedora1
3. Replaced vnet%d with vnet1 in the dumped xml file
4. Defined fedora1 with the edited xml file
5. Restarted /etc/init.d/libvirtd
(Observed no vnet0 interface getting created in ifconfig; bad sign)
6. Tried to bring up the guest fedora1; failed with this error:

virDomainCreate() failed Failed to add tap interface 'vnet1' to bridge 'vnet0':
No such device

Details:

Unable to start virtual machine '<class 'libvirt.libvirtError'>
virDomainCreate() failed Failed to add tap interface 'vnet1' to bridge 'vnet0' :
No such device
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/console.py", line 348, in control_vm_run
    self.vm.startup()
  File "/usr/share/virt-manager/virtManager/domain.py", line 361, in startup
    self.vm.create()
  File "/usr/lib64/python2.5/site-packages/libvirt.py", line 228, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: virDomainCreate() failed Failed to add tap interface 'vnet1' to
bridge 'vnet0' : No such device
'

(I think that whenever /etc/init.d/libvirtd is started, it used to create one
vnetN interface per virtual network; that's not happening now. That would explain
why the above message complains that vnet0 is unavailable when trying to add
vnet1 to the bridge vnet0. My theory.)

Thanks

Comment 8 Richard W.M. Jones 2007-07-31 13:05:59 UTC

Hmm guys, the URI should be qemu:///system
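
The difference between the two spellings is easy to see with a generic URI parser. The sketch below uses Python's standard urllib.parse, not libvirt's own URI handling, so it only illustrates the general URI rules (the helper name describe is hypothetical). With qemu://system/ the word "system" lands in the host position, which would be consistent with the remote/TLS certificate error in comment 4; with qemu:///system the host is empty and "/system" is the path.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI into the parts that matter for this comparison."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


wrong = describe("qemu://system/")   # two slashes: "system" is parsed as a host
right = describe("qemu:///system")   # three slashes: empty host, path "/system"

print(wrong)
print(right)
```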

Comment 9 Srihari Vijayaraghavan 2007-07-31 13:35:51 UTC

Correct. In my case it is somewhat like this:

# virsh -c qemu:///system dumpxml fedora1 > /tmp/fedora1.xml
< alters /tmp/fedora1.xml >
# virsh -c qemu:///system undefine fedora1
# virsh -c qemu:///system define /tmp/fedora1.xml
 
(or by now I've learned the finer arts of virsh :-))

# virsh -c qemu:///system
virsh # help
...
virsh # dumpxml fedora1
...
virsh # undefine fedora1
...
virsh # define /tmp/fedora1.xml
...
virsh # quit

Thanks

Comment 10 Srihari Vijayaraghavan 2007-08-01 12:48:40 UTC

I think these error messages in /var/log/messages might explain the reason
behind the problem:
Aug  1 22:31:53 localhost setroubleshoot:      SELinux is preventing
/usr/sbin/brctl (brctl_t) "getattr" to /sys/class/net/vnet0/bridge/forward_delay
(sysfs_t).      For complete SELinux messages. run sealert -l
3717d1ca-db63-428b-80bb-474d33e499aa

$ sudo sealert -l 3717d1ca-db63-428b-80bb-474d33e499aa
Summary
    SELinux is preventing /usr/sbin/brctl (brctl_t) "getattr" to
    /sys/class/net/vnet0/bridge/forward_delay (sysfs_t).

Detailed Description
    SELinux denied access requested by /usr/sbin/brctl. It is not expected that
    this access is required by /usr/sbin/brctl and this access may signal an
    intrusion attempt. It is also possible that the specific version or
    configuration of the application is causing it to require additional access.

Allowing Access
    Sometimes labeling problems can cause SELinux denials.  You could try to
    restore the default system file context for
    /sys/class/net/vnet0/bridge/forward_delay, restorecon -v
    /sys/class/net/vnet0/bridge/forward_delay If this does not work, there is
    currently no automatic way to allow this access. Instead,  you can generate
    a local policy module to allow this access - see
    http://fedora.redhat.com/docs/selinux-faq-fc5/#id2961385 Or you can disable
    SELinux protection altogether. Disabling SELinux protection is not
    recommended. Please file a http://bugzilla.redhat.com/bugzilla/enter_bug.cgi
    against this package.

Additional Information

Source Context                user_u:system_r:brctl_t
Target Context                system_u:object_r:sysfs_t
Target Objects                /sys/class/net/vnet0/bridge/forward_delay [ file ]
Affected RPM Packages         bridge-utils-1.1-2 [application]
Policy RPM                    selinux-policy-2.6.4-29.fc7
Selinux Enabled               True
Policy Type                   targeted
MLS Enabled                   True
Enforcing Mode                Enforcing
Plugin Name                   plugins.catchall_file
Host Name                     desktop
Platform                      Linux desktop 2.6.22.1-33.fc7 #1 SMP Mon Jul 23
                              16:59:15 EDT 2007 x86_64 x86_64
Alert Count                   24
First Seen                    Tue Jul 31 10:51:54 2007
Last Seen                     Wed Aug  1 22:31:51 2007
Local ID                      3717d1ca-db63-428b-80bb-474d33e499aa
Line Numbers

Raw Audit Messages

avc: denied { getattr } for comm="brctl" dev=sysfs egid=0 euid=0
exe="/usr/sbin/brctl" exit=-13 fsgid=0 fsuid=0 gid=0 items=0
name="forward_delay" path="/sys/class/net/vnet0/bridge/forward_delay" pid=3226
scontext=user_u:system_r:brctl_t:s0 sgid=0 subj=user_u:system_r:brctl_t:s0
suid=0 tclass=file tcontext=system_u:object_r:sysfs_t:s0 tty=(none) uid=0


To prove the theory, I've booted with selinux=permissive, and indeed libvirtd
and cousins (even with the newest Fedora 7 libvirt updates & kernel updates)
work just fine :-).

(It appears it had nothing to do with libvirt/virtual-machine-manager etc.; good
old SELinux was playing up, I suspect :-). Perhaps the recent 2.6.22 kernel
updates or the SELinux policy updates broke libvirt. May I request that the
SELinux policy be updated if this turns out to be a genuine SELinux layer problem?)

Thanks
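
For readers unfamiliar with AVC records, the raw audit message above already names everything a local policy rule would need: the source type (brctl_t), the target type (sysfs_t), the object class (file) and the denied permission (getattr). The following is a rough Python sketch of that mapping, similar in spirit to what the audit2allow tool does; the function avc_to_allow is hypothetical and is not a substitute for audit2allow or for a proper policy update.

```python
import re

# The raw audit message from this comment, joined onto one line.
AVC = ('avc: denied { getattr } for comm="brctl" dev=sysfs egid=0 euid=0 '
       'exe="/usr/sbin/brctl" exit=-13 fsgid=0 fsuid=0 gid=0 items=0 '
       'name="forward_delay" path="/sys/class/net/vnet0/bridge/forward_delay" '
       'pid=3226 scontext=user_u:system_r:brctl_t:s0 sgid=0 '
       'subj=user_u:system_r:brctl_t:s0 suid=0 tclass=file '
       'tcontext=system_u:object_r:sysfs_t:s0 tty=(none) uid=0')


def avc_to_allow(line):
    """Build a type-enforcement allow rule from one AVC denial line."""
    # Permissions are listed between the braces after "denied".
    perms = re.search(r"denied\s+\{([^}]*)\}", line).group(1).split()
    # Every other field is a key=value pair; values may be quoted.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    # A context is user:role:type[:level]; the type is the third element.
    source_type = fields["scontext"].split(":")[2]
    target_type = fields["tcontext"].split(":")[2]
    if len(perms) == 1:
        perm_text = perms[0]
    else:
        perm_text = "{ " + " ".join(perms) + " }"
    return "allow %s %s:%s %s;" % (
        source_type, target_type, fields["tclass"], perm_text)


rule = avc_to_allow(AVC)
print(rule)
```

For this record the sketch prints "allow brctl_t sysfs_t:file getattr;".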

Comment 11 Srihari Vijayaraghavan 2007-08-01 13:54:50 UTC

Reproduced the same problem on a Fedora 7 x86 system also (with the same SELinux
AVC error messages in the system logs). Once again, worked around the problem by
switching SELinux to permissive mode (with selinux=permissive).

Thanks

Comment 12 Adam Huffman 2007-08-02 16:00:45 UTC

I have the same problem on an F7 x86_64 system, without any AVC errors.  The
system has just been rebooted after installing the new kernel.

Comment 13 Daniel Veillard 2007-08-06 15:48:51 UTC

Based on comment #11, this bug looks related to an selinux-policy problem,
so reassigning to that component.

Daniel

Comment 14 Daniel Walsh 2007-08-06 23:22:23 UTC

Should be fixed in selinux-policy-2.6.4-30.fc7

Comment 15 Srihari Vijayaraghavan 2007-08-07 08:45:29 UTC

That's good to hear. One more question for the SELinux experts: once I have
booted with selinux=permissive, subsequent boots without that parameter leave the
system unbootable, as SELinux stops the system from booting (verified on both
laptop and desktop, using the latest official kernel and SELinux policy for
Fedora 7).

Is there a trick involved in getting it to boot with SELinux again? A URL or
pointer would be appreciated.

Thanks

Comment 16 Daniel Walsh 2007-08-07 13:27:12 UTC

What kind of avc messages are you seeing in /var/log/audit/audit.log?

Comment 17 Srihari Vijayaraghavan 2007-08-07 21:54:52 UTC

(Many, many lines of errors scroll past on the screen, so what I'm posting here
is what remains on the screen.)

[many many lines of selinux error messages]
...
udevd-event[PID]: selinux_setfilecon: matchpathcon(/dev/loop3) failed
...
udevd-event[PID]: selinux_setscreatecon: matchpathcon(/dev/input/event5) failed
ditto for /dev/input/event4
ditto for /dev/input/event6
Setting up Logical Volume Management [OK]
Checking filesystems [FAILED]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
*** Warning -- SELinux is active
*** Disabling security enforcement for system recovery.
*** Run 'setenforce 1' to reenable.
Give root password for maintenance
(or type Control-D to continue):

Rebooting it by pressing Control-D & passing on selinux=permissive gets the
system booting again.

(Even as a system administrator of 250 Linux servers, I've no clue how to fix
this SELinux problem; SELinux is something to learn after my graduation at the
University in a week :-))

Thanks

Comment 18 Daniel Walsh 2007-08-09 20:08:25 UTC

This sounds like a badly mislabeled file system and udev is running with the
wrong context. You could try relabeling in permissive mode

touch /.autorelabel; reboot

Comment 19 Srihari Vijayaraghavan 2007-08-13 23:31:38 UTC

Created attachment 161237
SELinux restorecon failure messages

Comment 20 Srihari Vijayaraghavan 2007-08-13 23:34:58 UTC

(Stupid bugzilla didn't include the comments I typed before attaching the
attachment. Ho hum.)

Here are some relevant remarks about the above attachment:
Actually, I had already tried that on the local file systems with no success. (I
think the auto-relabel procedure is what fails first; see the error messages below.)

With the new SELinux policy, things have improved slightly (well, it no longer
fills up the kernel dmesg buffer to the point where the many initial messages are lost):
Linux version 2.6.22.1-41.fc7 (kojibuilder.phx.redhat.com)
(gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)) #1 SMP Fri Jul 27 18:10:34 EDT 2007
...
SELinux:  Starting in permissive mode
...
audit(1187046658.940:3): policy loaded auid=4294967295
audit(1187046660.043:4): avc:  denied  { read } for  pid=504 comm="restorecon"
name="ld.so.cache" dev=dm-0 ino=427697
scontext=system_u:system_r:restorecon_t:s0 tcontext=system_u:object_r:file_t:s0
tclass=file
...

(The complete dmesg is attached.)

It seems like restorecon is unable to do what it wants to.

Thanks

Comment 21 Daniel Walsh 2007-08-14 10:43:15 UTC

Well, it should be all right now.  I will fix this in future releases. But you ran
restorecon in permissive mode, so it should have relabeled everything.

Comment 22 Srihari Vijayaraghavan 2007-08-14 12:11:07 UTC

I don't know how to explain the problem (lack of SELinux knowledge/experience).
I'll try to recap a few points now:
    I suspect /.autorelabel doesn't force an automatic relabeling, possibly as of
recent Fedora 7 updates.

    E.g., when I do "restorecon -R /" I see the hard drive spinning madly; no such
thing happens with /.autorelabel, which leads me to believe that either my
failure occurs well before the autorelabel procedure can begin, or the whole
.autorelabel procedure has stopped working, perhaps for the root file system. I
don't know.

(My theory could be completely wrong, as I have no real SELinux skills.)

In any case, I manually forced a 'restorecon /blahblah' for every one of my
mounted file systems while booted in selinux=permissive mode. (It's a pity that
initially I neither knew the restorecon tool properly nor understood what to
expect from .autorelabel.) Now things are working just fine without
selinux=permissive mode.

Thanks for your assistance.

Comment 23 Daniel Walsh 2007-08-15 10:28:58 UTC

Well, touch /.autorelabel would only relabel file systems that are in the
fstab, so if you mount file systems elsewhere this could be a problem.  But when
you touch /.autorelabel, the relabel should take 5-10 minutes or a lot longer, and
you should see something on the boot screen.
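
One way to check for the situation described here (file systems that are mounted but not listed in the fstab) is to compare the two tables. The Python sketch below works on the text of /etc/fstab and /proc/mounts; the helper names and the sample data, including the /vm mount, are hypothetical and only illustrate the comparison.

```python
def mount_points(table_text):
    """Collect the mount-point column (2nd field) of an fstab-style table."""
    points = set()
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def mounted_but_not_in_fstab(fstab_text, mounts_text):
    """Mount points currently mounted that have no fstab entry."""
    return sorted(mount_points(mounts_text) - mount_points(fstab_text))


# Hypothetical sample data; on a real system read /etc/fstab and /proc/mounts.
FSTAB = """
# hypothetical /etc/fstab
/dev/VolGroup00/LogVol00 /      ext3 defaults 1 1
LABEL=/boot              /boot  ext3 defaults 1 2
"""
MOUNTS = """
/dev/mapper/VolGroup00-LogVol00 / ext3 rw 0 0
/dev/sda1 /boot ext3 rw 0 0
/dev/sdb1 /vm ext3 rw 0 0
"""

missing = mounted_but_not_in_fstab(FSTAB, MOUNTS)
print(missing)
```

Anything the sketch reports would, per the explanation above, be a candidate for a manual restorecon -R run.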

Comment 24 Adam Huffman 2007-08-21 11:37:24 UTC

I can confirm that relabelling fixes this problem for me.  Odd that I didn't see
any AVC messages, but anyway, glad it's working.

Comment 25 Daniel Walsh 2008-01-30 19:18:50 UTC

Bulk closing all bugs in Fedora updates in the modified state.  If your bug is
not fixed, please reopen.
