Description of problem:

jakoubek:~ $ sudo -i
[sudo] password for matej:
jakoubek:~# systemctl status libvirtd.service
libvirtd.service - LSB: daemon for libvirt virtualization API
          Loaded: loaded (/etc/rc.d/init.d/libvirtd)
          Active: active (running) since Thu, 17 Feb 2011 08:44:54 +0100; 24h ago
        Main PID: 3669 (libvirtd)
          CGroup: name=systemd:/system/libvirtd.service
                  ├ 1403 /usr/sbin/dnsmasq --strict-order --bind-interfaces...
                  └ 3669 libvirtd --daemon
jakoubek:~# virsh list --all
 Id Name                 Status
----------------------------------
  - rawhide              switched-off
  - santiago             switched-off
  - tikanga              switched-off

jakoubek:~# virsh start santiago
error: I couldn't start a domain santiago
error: I cannot create a cgroup for santiago: Director or file doesn't exist

jakoubek:~# systemctl restart libvirtd.service
jakoubek:~# virsh start santiago
Domain santiago started

jakoubek:~#

Version-Release number of selected component (if applicable):
selinux-policy-3.9.15-1.fc16.noarch
systemd-18-1.fc16.x86_64
libvirt-0.8.8-1.fc16.x86_64

How reproducible:
100% (given a long enough delay between starting the service and attempting to start a virtual machine)

Steps to Reproduce:
1. start the machine or restart the libvirtd service
2. wait for a long time (couple of hours?)
3. virsh start <domain>

Actual results:
error as shown above

Expected results:
running domain
> jakoubek:~# virsh start santiago
> error: I couldn't start a domain santiago
> error: I cannot create a cgroup for santiago: Director or file doesn't exist

This indicates that libvirtd's cgroups do not exist, either because libvirtd was started *before* the cgconfig service ran, or because something deleted libvirtd's cgroups.

> Steps to Reproduce:
> 1. start machine or restart libvirtd service
> 2. wait for long time (couple of hours?)
> 3. virsh start <domain>

If you immediately try to start a guest, after a restart of libvirtd, does it work ? If so, this suggests that something is deleting libvirtd's cgroups in that 'step 2 wait for long time'.

I wonder if systemd periodically deletes empty cgroups ? Or is there some other cgroup related daemon running ?
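A quick way to test the "cgroups do not exist" theory is to look for the directory directly. The sketch below is only an illustration: the function name libvirt_cgroup_present is made up here, and the path layout (<controller mount>/system/libvirtd.service/libvirt/qemu) is an assumption taken from the directory listing posted later in this report, so it may differ on other setups.

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt or systemd).
# Succeeds if the qemu cgroup directory that libvirtd creates is present
# under the given controller mount point.
#   $1: controller mount point, e.g. /sys/fs/cgroup/cpu (assumed layout)
#   $2: unit name, defaults to libvirtd.service
libvirt_cgroup_present() {
    [ -d "$1/system/${2:-libvirtd.service}/libvirt/qemu" ]
}
```

For example, `libvirt_cgroup_present /sys/fs/cgroup/cpu || echo "libvirt cgroup is missing"` run right before `virsh start` would show whether the directory was already gone at that point.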
(In reply to comment #1)
> I wonder if systemd periodically deletes empty cgroups ? Or is there some other
> cgroup related daemon running ?

Yupp, from time to time we go through the tree and kill empty cgroups. I figure you want us to stop doing that?
Yes, purging libvirtd's cgroups is rather unkind ! Even if a cgroup does not have any processes in it, it is still useful for it to exist, because it will have an impact on future child cgroups which will contain processes.

libvirt creates a 3 level hierarchy, starting from the location in which libvirtd itself is placed.

  [cgroup where libvirtd process is placed by systemd/init]
    |
    +- libvirt
        |
        +- qemu
        |   |
        |   +- qemuguest1
        |   +- qemuguest1
        |   +- qemuguest1
        |   +- qemuguest1
        |
        +- lxc
            |
            +- lxcguest1

Only the leaf nodes actually contain processes. The first two levels are just there to allow admins to set limits that will apply to later child processes.
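To make the three levels concrete, here is a small sketch that builds the same shape with ordinary directories. It is an illustration only: make_libvirt_tree is a made-up name, and on a real host libvirtd itself creates these directories inside the cgroup filesystem rather than anything like this script.

```shell
#!/bin/sh
# Hypothetical illustration of the 3 level layout: BASE/libvirt/DRIVER/GUEST.
#   $1: base directory (stands in for the cgroup where libvirtd is placed)
#   $2: driver name, e.g. qemu or lxc
#   remaining arguments: guest names
make_libvirt_tree() {
    tree_base=$1
    tree_driver=$2
    shift 2
    # Levels 1 and 2 exist even when no guest has been started yet.
    mkdir -p "$tree_base/libvirt/$tree_driver" || return 1
    # Level 3: one leaf per guest; only these would ever hold processes.
    for tree_guest in "$@"; do
        mkdir -p "$tree_base/libvirt/$tree_driver/$tree_guest" || return 1
    done
}
```

Calling `make_libvirt_tree /tmp/demo qemu guest1 guest2` followed by `make_libvirt_tree /tmp/demo lxc` yields two empty intermediate levels plus two leaf directories, which is the situation where removing "empty" groups takes away the levels that the next guest start depends on.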
Hmm, so we "trim" the cgroup hierarchies at four places:

- When a user session ends we trim his entire hierarchy
- When all processes of a service exited we trim the service's hierarchy
- When a service entered "dead" or "failed" mode (i.e. is stopped) we trim the service's hierarchy. (This is often the same as the previous case)
- Before we start a service we trim its hierarchy

Now, this basically boils down to the fact that in real life we should never conflict with libvirt: we never interfere with the tree as long as the daemon is still running. We only trim before and after it is running.

So, unless there's a bug lurking here I don't think systemd is at fault.
There is a bug lurking here.
You missed one place where the cgroup hierarchies are trimmed - when reloading the daemon:

manager_reload()
 -> manager_clear_jobs_and_units()
  -> unit_free()
   -> cgroup_bonding_free_list()
    -> cgroup_bonding_free()
     -> cg_trim()

Steps to reproduce:
1. service libvirtd restart
2. find /sys/fs/cgroup -path '*libvirt*' -type d > cg-list-1
3. systemctl daemon-reload
4. find /sys/fs/cgroup -path '*libvirt*' -type d > cg-list-2
5. diff -Nu cg-list-*

Actual result:
--- cg-list-1 2011-04-14 10:38:03.138854891 +0200
+++ cg-list-2 2011-04-14 10:38:09.644854896 +0200
@@ -2,7 +2,4 @@
 /sys/fs/cgroup/blkio/libvirt/lxc
 /sys/fs/cgroup/blkio/libvirt/qemu
 /sys/fs/cgroup/cpu/system/libvirtd.service
-/sys/fs/cgroup/cpu/system/libvirtd.service/libvirt
-/sys/fs/cgroup/cpu/system/libvirtd.service/libvirt/lxc
-/sys/fs/cgroup/cpu/system/libvirtd.service/libvirt/qemu
 /sys/fs/cgroup/systemd/system/libvirtd.service

A workaround is to set "DefaultControllers=" in /etc/systemd/system.conf
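The reproduction steps above can be wrapped into two small helpers, which may be handy when checking other operations for the same effect. This is a sketch, not a polished tool: the function names are made up here, and the output is sorted (with LC_ALL=C) so that comm can be used to print only the directories that disappeared.

```shell
#!/bin/sh
# Hypothetical helpers built around the find command from the steps above.

# snapshot_cgroups ROOT: list libvirt-related directories below ROOT, sorted.
snapshot_cgroups() {
    find "$1" -path '*libvirt*' -type d | LC_ALL=C sort
}

# report_removed BEFORE AFTER: print directories listed in the BEFORE
# snapshot file that are no longer present in the AFTER snapshot file.
report_removed() {
    LC_ALL=C comm -23 "$1" "$2"
}
```

A possible run: `snapshot_cgroups /sys/fs/cgroup > before`, then `systemctl daemon-reload`, then `snapshot_cgroups /sys/fs/cgroup > after`, and finally `report_removed before after`. An empty result would mean nothing was trimmed.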
> - When a user session ends we trim his entire hierarchy

Could you clarify what you mean by 'user session' here ? eg does it mean the hierarchy is trimmed when the user logs out of X ?

As well as the privileged, per-host libvirtd, there is an unprivileged libvirtd daemon that is run per user ID. This isn't tied to the user X session - it is just spawned on demand from any application, whether logged in via X, or ssh, or cron, etc

Thus we don't necessarily want to kill libvirtd & its VMs when the user logs out of X, and thus wouldn't really want its cgroups trimmed either.

> - When all processes of a service exited we trim the service's hierarchy
> - When a service entered "dead" or "failed" mode (i.e. is stopped) we trim the service's hierarchy. (This is often the same as the previous case)
> - Before we start a service we trim its hierarchy

These three cases should all be fine, because libvirtd will re-create anything it needs at startup, and any VMs still running when libvirtd is stopped, will mean the cgroups are not empty & thus not trimmable.

- Daemon reload

This sounds like the main problem people are hitting in this bug.
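The "empty, thus trimmable" distinction can be pictured with plain directories. The following is only a simulation under loud assumptions: it is not systemd's cg_trim(), the name trim_empty is made up, and a guest with running processes is represented by a directory containing a marker file, whereas a real cgroup counts as empty when it has no tasks and no child groups. It relies on the -mindepth and -empty options of GNU find.

```shell
#!/bin/sh
# Simulation only: remove every empty directory below ROOT, deepest first,
# so a parent whose children were all removed is removed as well.
#   $1: root directory of the simulated hierarchy (never removed itself)
trim_empty() {
    find "$1" -mindepth 1 -depth -type d -empty -exec rmdir {} \;
}
```

With libvirt/qemu/guest1 holding a marker file and libvirt/lxc empty, trim_empty removes only lxc. With no guests at all, the whole libvirt subtree goes away, which matches what was seen after daemon-reload.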
I'm seeing this also, and I feel it's a release blocker. Proposed criteria:

"The release must boot successfully as a virtual guest in a situation where the virtual host is running the same release (using Fedora's current preferred virtualization technology)"

We can't guarantee that.
(In reply to comment #5)
> There is a bug lurking here.
> You missed one place where the cgroup hierarchies are trimmed - when reloading
> the daemon:
> manager_reload()
>  -> manager_clear_jobs_and_units()
>   -> unit_free()
>    -> cgroup_bonding_free_list()
>     -> cgroup_bonding_free()
>      -> cg_trim()

Ah, indeed. Fixed now in git. I hope there's not another cg_trim() call lurking somewhere.

> Could you clarify what you mean by 'user session' here ? eg does it mean the
> hierarchy is trimmed when the user logs out of X ?

Yes, this is what happens.
*** Bug 698027 has been marked as a duplicate of this bug. ***
systemd-25-1.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/systemd-25-1.fc15
Package systemd-25-1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-25-1.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/systemd-25-1.fc15
then log in and leave karma (feedback).
is https://bugzilla.redhat.com/show_bug.cgi?id=666130 the same as this?
Discussed in the 2011-04-21 blocker bug review meeting. This does come close to the alpha release criteria "When booting a system installed without a graphical environment, or when using a correct configuration setting to cause an installed system to boot in non-graphical mode, the system should boot to a state where it is possible to log in through at least one of the default virtual consoles" However, it is fixable with an update and doesn't happen every time to every user so rejected as a release blocker. Since it is a major issue, accepted as NTH for final.
*** Bug 699886 has been marked as a duplicate of this bug. ***
*** Bug 666130 has been marked as a duplicate of this bug. ***
*** Bug 699932 has been marked as a duplicate of this bug. ***
systemd-25-1.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
The release version of Fedora 15 seems to be doing this still.
I still see it too in Fedora 15 gold, using systemd-26-1.fc15.x86_64.
Reopening. It's been fixed for "systemctl daemon-reload", but it is still reproducible using "systemctl daemon-reexec". It's a less common operation than daemon-reload. It is used when systemd or glibc packages are updated.
*** Bug 711703 has been marked as a duplicate of this bug. ***
*** Bug 696218 has been marked as a duplicate of this bug. ***
Seems to be fixed with the latest round of updates in Fedora 15 but I'm not sure which update did it. Now I don't have to restart libvirtd every time I want to run a new virtual machine.
(In reply to comment #23)
> Seems to be fixed with the latest round of updates in Fedora 15

I don't think so.

> Now I don't have to restart libvirtd every time I want to run a new
> virtual machine.

"every time"? That was never the case. See comment #20.
It used to break if something (usually an RPM scriptlet) ran 'systemctl daemon-reload' (which a lot of them do). This is fixed, but the 'daemon-reexec' case remains.
Sorry my comment was a bit out of order there. I was confirming the fix for the "systemctl daemon-reload" case. The "every time" comment was how it appeared to me in passing observation but I did not do extensive testing. Carry on ;)
I am still getting this error just about every time I try to start a VM. systemd-26-4.fc15.x86_64
I'm also seeing this and so far have been unable to create a VM at all in Fedora 15. Even being careful to start the cgconfig then libvirt, I still get this error whenever attempting to create a new guest in virt-manager. systemd-26-5.fc15.x86_64 libvirt-0.8.8-4.fc15.x86_64
Please ignore last comment. Restarting cgconfig works. Helps to make sure the terminal window used to restart the service isn't actually an ssh session on another machine. Doh!
*** Bug 716436 has been marked as a duplicate of this bug. ***
*** Bug 714407 has been marked as a duplicate of this bug. ***
Fixed upstream:
http://cgit.freedesktop.org/systemd/commit/?id=38a285d776cc0bf4440efe79fc7691032bcf3d67
http://cgit.freedesktop.org/systemd/commit/?id=a75560529663e5fd055884e32ab9c73f47f8aaa5
systemd-26-7.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/systemd-26-7.fc15
Package systemd-26-7.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-26-7.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/systemd-26-7.fc15
then log in and leave karma (feedback).
*** Bug 709076 has been marked as a duplicate of this bug. ***
systemd-26-8.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.