Bug 479736

Summary: libvirtd won't auto-start domains/virtual machines.
Product: [Community] Virtualization Tools Reporter: Sir Woody Hackswell <hackswell>
Component: libvirt Assignee: Daniel Veillard <veillard>
Status: CLOSED NOTABUG QA Contact:
Severity: high Docs Contact:
Priority: low    
Version: unspecified CC: crobinso, markmc
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2009-01-14 10:55:47 UTC Type: ---
Regression: --- Mount Type: ---
Attachments:
Description Flags
strace dump file none

Description Sir Woody Hackswell 2009-01-12 19:51:03 UTC
Created attachment 328777
strace dump file

Description of problem:

libvirtd won't auto-start domains/virtual machines.

Version-Release number of selected component (if applicable):

Fedora 10, x86_64, AMD dual core
libvirt-0.5.1-2.fc10.x86_64


How reproducible:

Steps to Reproduce:
1. service libvirtd stop
2. service libvirtd start
3. virsh list shows no active virtual machines.
Machine is symlinked in /etc/libvirt/qemu/autostart/tLDE.xml, and starting tLDE through virt-manager manually works fine.
  
Actual results:

tLDE VM is in a stopped state.

Expected results:

tLDE is in a started state, or a log file tells me why it didn't start up.

Additional info:

No activity in log files.
Attached is an strace dump (with SELinux and iptables off).

Comment 1 Mark McLoughlin 2009-01-13 11:42:23 UTC
(kernel version is 2.6.27.9-159.fc10.x86_64)

Okay, so we thought the interesting bit was here:

19:15:32.393540 close(6)                = 0

 - this is the closedir() in virDomainLoadAllConfigs()

19:15:32.393580 getuid()                = 0
19:15:32.393630 clone(Process 6397 attached
child_stack=0x163edb0, flags=CLONE_NEWNS|0x3c000000|SIGCHLD) = 6397
[pid  6396] 19:15:32.394072 wait4(6397, Process 6396 suspended
 <unfinished ...>
[pid  6397] 19:15:32.394104 getpid()    = 1
[pid  6397] 19:15:32.394199 exit_group(0) = ?
Process 6396 resumed
Process 6397 detached
19:15:32.399246 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 6397
19:15:32.399273 --- SIGCHLD (Child exited) @ 0 (0) ---

 - and this is lxcContainerStart()

This means that qemudAutostartConfigs() is being called, but there are no VMs with vm->autostart set.

So, backing up a bit to where we check the autostart symlink:

19:15:32.389966 stat("/etc/libvirt/qemu/autostart/tLDE.xml", {st_dev=makedev(253, 0), st_ino=302049, st_mode=S_IFREG|0600, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1127, st_atime=2009/01/12-19:08:42, st_mtime=2009/01/12-18:54:40, st_ctime=2009/01/12-18:54:40}) = 0
19:15:32.390070 stat("/etc/libvirt/qemu/tLDE.xml", {st_dev=makedev(253, 0), st_ino=302049, st_mode=S_IFREG|0600, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1127, st_atime=2009/01/12-19:08:42, st_mtime=2009/01/12-18:54:40, st_ctime=2009/01/12-18:54:40}) = 0

  - the link exists
  - st_dev and st_ino are the same for both
  - virFileLinkPointsTo() should return 1 for these
  - vm->autostart should be getting set to 1
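
For anyone who wants to repeat the st_dev/st_ino comparison above without reading strace output, here is a minimal sketch in Python. It is not libvirt's virFileLinkPointsTo() implementation; the function name link_points_to is made up for illustration. It only checks that both paths stat() to the same device and inode, which is the comparison done by eye on the two stat() lines.

```python
import os

def link_points_to(link_path, target_path):
    """Return True if both paths resolve to the same file.

    Hypothetical helper, not libvirt code: it compares the
    (st_dev, st_ino) pair that stat() reports for each path.
    """
    try:
        link_stat = os.stat(link_path)      # os.stat follows symlinks
        target_stat = os.stat(target_path)
    except OSError:
        # A missing or unreadable path cannot match anything.
        return False
    return (link_stat.st_dev, link_stat.st_ino) == (
        target_stat.st_dev,
        target_stat.st_ino,
    )

if __name__ == "__main__":
    # Paths taken from the strace output in this report.
    print(link_points_to("/etc/libvirt/qemu/autostart/tLDE.xml",
                         "/etc/libvirt/qemu/tLDE.xml"))
```

If this prints True for the autostart link and the domain still does not start, the link itself is probably not the problem.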

hackswell: I think you'll need to install libvirt-debuginfo and run libvirtd in gdb; set a breakpoint on qemudAutostartConfigs() and virDomainLoadConfig() and see if you can figure out what's going on

Comment 2 Sir Woody Hackswell 2009-01-13 15:37:32 UTC
Ah HA!  Problem stemmed from the following:

* Two different XML config files declared the same domain name
* and/or the same UUID.

User error.

HOWEVER... would it make sense to check for duplicate name/uuid and at least log the fact that two config files were "duplicates"?  For foolish users like me? ;)

Comment 3 Mark McLoughlin 2009-01-13 15:45:45 UTC
Could you give us a simple set of commands and XML files to reproduce this?

Comment 4 Sir Woody Hackswell 2009-01-13 15:52:38 UTC
* Find any xml config file in /etc/libvirt/qemu/ that is symlinked in the autostart directory.

* Copy it to a different name, but don't change the name or uuid field.

In my case the non-running filename came "Before" the autostart config file.

ex:

Fedora10-i386 (no autostart, name Fedora10-i386, uuid PLUGH)
fedora10.xml  (no autostart, name tLDE, uuid XYZZY)
tLDE.xml      (linked to in autostart, name tLDE, uuid XYZZY)

Upon startup of libvirtd, tLDE would not autostart because fedora10 was NOT autostarting. 

Is this as clear as mud now? :O
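
A rough way to spot this situation from outside libvirtd is sketched below in Python. It is a read-only scan, not a libvirt feature; the function name find_duplicate_domains is made up for illustration, and it assumes each config file is a domain XML document with <name> and <uuid> elements directly under the root element.

```python
import glob
import os
import xml.etree.ElementTree as ET
from collections import defaultdict

def find_duplicate_domains(config_dir):
    """Map each duplicated name or UUID to the files that declare it.

    Hypothetical helper: only reads *.xml files in config_dir and
    never modifies anything.
    """
    seen = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(config_dir, "*.xml"))):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        for field in ("name", "uuid"):
            value = root.findtext(field)  # direct child of the root element
            if value:
                seen[(field, value.strip())].append(os.path.basename(path))
    # Keep only values declared by more than one file.
    return {key: files for key, files in seen.items() if len(files) > 1}

if __name__ == "__main__":
    for (field, value), files in find_duplicate_domains("/etc/libvirt/qemu").items():
        print("duplicate %s %r in: %s" % (field, value, ", ".join(files)))
```

An empty result means no two config files in that directory share a name or UUID.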

Comment 5 Mark McLoughlin 2009-01-14 10:55:47 UTC
Let that be a lesson to you :-)

The format of the /etc/libvirt/qemu directory is private and you should only modify domain configuration using virsh or other libvirt tools.

I took a quick look at whether we could add a simple check to catch this, but not really - libvirtd treats the loading of the second config file as if you were re-defining the config of that domain.

This can't happen unless the user manually futzes around with the files. When the configuration is only modified through libvirt tools, the filename always matches the domain name.