Bug 714271

Summary:	libvirt pinned to single CPU after suspend/resume cycle -> all VMs running on the same single core
Product:	[Fedora] Fedora	Reporter:	Ronald Wahl <rwahl>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED ERRATA	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	17	CC:	adel.gadllah, ajia, aquini, belegdol, berrange, cfergeau, clalance, crobinso, dallan, djasa, drago01, fdanapfe, gansalmon, hous3y, itamar, j2, jadams1217, jforbes, jistone, john.newman.0, jonathan, kchamart, kernel-maint, knoel, laine, madhu.chinakonda, marcandre.lureau, mprivozn, mschmidt, pasteur, rs, srivatsa.bhat, unicell, veillard, virt-maint
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:	cpuset suspend virt
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-08-05 21:24:37 UTC	Type:	---
Bug Depends On:	813228
Bug Blocks:

Description Ronald Wahl 2011-06-17 18:26:09 UTC

Description of problem:
After a suspend/resume cycle libvirt is pinned to CPU 0, which causes all VMs to run on the same single core. Rebooting the system removes the constraint. Alternatively, stopping libvirtd, removing the cpuset hierarchy for libvirt, and restarting libvirtd works as well.

Version-Release number of selected component (if applicable):
libvirt-0.8.8-4.fc15.x86_64

How reproducible:
always

Steps to Reproduce:
1. Start with a freshly booted machine and call:

# cat /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus 
0-7

This may vary but the range should be all your available CPU cores.

2. Put the machine into suspend mode.

3. Resume by pressing the power button.

4. Call the same command again:

# cat /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus 
0

Now libvirtd and all of its child processes are pinned to CPU 0. When you start VMs you will see that they all share the same CPU core while the others are idle.
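
The check in steps 1 and 4 can be scripted. The following is a minimal sketch and not part of the original report: check_cpuset is a hypothetical helper name, and it assumes the libvirt cpuset is expected to span all online CPUs, which it compares as plain strings.

```shell
#!/bin/sh
# check_cpuset CPUSET_FILE ONLINE_FILE
# Hypothetical helper: compares the CPU list in a cgroup's cpuset.cpus
# file with a reference list (for example the kernel's list of online CPUs).
# Prints "ok" and returns 0 when both lists are identical strings,
# prints a "restricted" message and returns 1 when they differ,
# and returns 2 when either file cannot be read.
check_cpuset() {
    cpuset_file=$1
    online_file=$2
    have=$(cat "$cpuset_file") || return 2
    want=$(cat "$online_file") || return 2
    if [ "$have" = "$want" ]; then
        echo "ok"
    else
        echo "restricted: cpuset has '$have', reference list is '$want'"
        return 1
    fi
}
```

On a system laid out like the one in this report, it could be called as check_cpuset /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus /sys/devices/system/cpu/online before and after a suspend/resume cycle.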

Actual results:
libvirt is pinned to single CPU core after suspend/resume

Expected results:
Resume should restore the exact state that was in place before suspend was called.

Additional info:

Comment 1 Fedora Admin XMLRPC Client 2011-09-22 17:52:33 UTC

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 7 Avi Kivity 2012-03-06 00:13:41 UTC

Biting me too, making virt unusable on a laptop.

Comment 8 John Newman 2012-03-06 04:03:59 UTC

Yeah, for now there's at least a semi-ok workaround.  

1. Set libvirt to save the VMs to disk when the service is stopped.
2. Stop the service in your suspend script.
3. On wakeup, restart the service.

I don't believe any of this is actually necessary, but I have it commented out in the resume block:

#cpus="0-7"
#for i in `find /sys/fs/cgroup/cpuset/libvirt/ -iname "cpuset.cpus"` ; do echo "${cpus}" | sudo tee $i ; cat $i ; done
#for i in `pgrep libvirt` ; do taskset -c -p $i ; done
#for i in `pgrep qemu` ; do taskset -c -p $i ; done


The only drawback is suspend/resume takes a lot longer, and if you have a VM running that is not marked to auto start, you have to remember to start it on wakeup; or add the bits to the script to detect and then start it for you.
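
The stop/restart part of this workaround could be wired into a suspend hook roughly as follows. This is a sketch under assumptions: it presumes pm-utils, which runs hooks from /etc/pm/sleep.d with the event name (suspend, hibernate, resume, thaw) as the first argument, and it presumes the guests are already configured to be saved when the service stops. The function name libvirt_sleep_hook and the SERVICE_CMD override are hypothetical and exist only to make the logic easy to try out.

```shell
#!/bin/sh
# Hypothetical pm-utils hook body, e.g. for a file under /etc/pm/sleep.d/.
# SERVICE_CMD is an illustrative override so the logic can be exercised
# without touching the real service; by default it is the "service" command.
SERVICE_CMD=${SERVICE_CMD:-service}

# libvirt_sleep_hook EVENT
# Stops libvirtd before the machine sleeps and starts it again on wakeup.
libvirt_sleep_hook() {
    case "$1" in
        suspend|hibernate)
            "$SERVICE_CMD" libvirtd stop
            ;;
        resume|thaw)
            "$SERVICE_CMD" libvirtd start
            ;;
        *)
            # Any other event is ignored.
            ;;
    esac
}
```

In an actual hook file the last line would be libvirt_sleep_hook "$1", so that the event passed in by pm-utils selects the action.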

Unfortunately, it looks like I'm running into another bug where, on rare occasions, all of the restored VMs become totally unresponsive.  I haven't had the time to look into this yet, but I've seen it more frequently when using the vhost_net module.

Also, I vaguely recall a discussion about adding the power management API to the VM, so before suspending the host, you'd suspend all of the guests to ram, and wake them up.  I would think just pausing the emulation would effectively do the same thing, but if I recall, until that is added, suspending the host is not officially supported.

Comment 9 Avi Kivity 2012-03-06 15:20:56 UTC

Looks like a fix was committed for Linux 3.3.

Comment 10 Dave Allan 2012-03-07 01:24:34 UTC

Avi, do you have a pointer to the fix?  Also, do you think there is anything that libvirt can do to mitigate this?

Comment 11 Avi Kivity 2012-03-07 16:24:42 UTC

(In reply to comment #10)
> Avi, do you have a pointer to the fix? 

linux.git 8f2f748b0656257153bc

> Also, do you think there is anything
> that libvirt can do to mitigate this?

If there is an API that allows you to receive notifications on resume events, you can use that to reconfigure cpusets.  But it's a kernel problem, so we should just backport the fix IMO.
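
Reconfiguring the cpusets after a resume event could look like the sketch below. It is an illustration under assumptions, not something libvirt is known to do: restore_cpusets is a hypothetical helper, and the CPU list has to be supplied by the caller. It walks directories parent-first because in the cgroup v1 cpuset controller a child's cpuset.cpus must be a subset of its parent's, so parents have to be widened before their children.

```shell
#!/bin/sh
# restore_cpusets TOP_DIR CPU_LIST
# Hypothetical helper: writes CPU_LIST into the cpuset.cpus file of TOP_DIR
# and of every directory below it. find lists a directory before its
# contents (no -depth), so each parent is updated before its children.
restore_cpusets() {
    top=$1
    cpus=$2
    find "$top" -type d | while IFS= read -r dir; do
        # Skip directories that are not cpuset cgroups.
        if [ -f "$dir/cpuset.cpus" ]; then
            printf '%s\n' "$cpus" > "$dir/cpuset.cpus"
        fi
    done
}
```

For the hierarchy in this report, that would be restore_cpusets /sys/fs/cgroup/cpuset/libvirt 0-7, run as root after resume, with 0-7 replaced by the CPU range of the machine in question.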

Comment 12 Dave Allan 2012-03-07 16:35:29 UTC

Agreed, that seems like a lot of work for a workaround.  I've moved the BZ to kernel.

Comment 13 Josh Boyer 2012-03-07 16:49:38 UTC

That commit was CC'd to stable.  It should hopefully get picked up in 3.2.10.

Comment 14 Josh Boyer 2012-03-07 17:05:36 UTC

(In reply to comment #13)
> That commit was CC'd to stable.  It should hopefully get picked up in 3.2.10.

Actually, looking further it seems that Linus is going to revert that patch from 3.3.

http://thread.gmane.org/gmane.linux.kernel/1262802

Comment 15 David Jaša 2012-03-20 14:41:03 UTC

*** Bug 749191 has been marked as a duplicate of this bug. ***

Comment 16 Michal Privoznik 2012-04-18 14:19:08 UTC

There has been a workaround proposed:

https://www.redhat.com/archives/libvir-list/2012-April/msg00777.html

Comment 17 Michal Schmidt 2012-04-25 17:48:33 UTC

Recent findings by Srivatsa S. Bhat:
http://thread.gmane.org/gmane.linux.kernel/1262802/focus=1286289

Comment 18 Srivatsa S. Bhat 2012-05-08 06:57:32 UTC

Hi,

Recently, I posted a new set of patches to fix this issue in the kernel.
http://thread.gmane.org/gmane.linux.documentation/4805

It is still under discussion.

Regards,
Srivatsa S. Bhat

Comment 19 Cole Robinson 2012-06-05 15:10:58 UTC

*** Bug 820625 has been marked as a duplicate of this bug. ***

Comment 20 Cole Robinson 2012-06-05 15:11:17 UTC

*** Bug 787467 has been marked as a duplicate of this bug. ***

Comment 21 Cole Robinson 2012-06-05 15:15:15 UTC

Since F15 is approaching end of life, moving this to F17 where it is still an issue. F16 is also affected, so any fix should be pushed there as well.

Comment 22 drago01 2012-06-05 15:34:34 UTC

(In reply to comment #18)
> Hi,
> 
> Recently, I posted a new set of patches to fix this issue in the kernel.
> http://thread.gmane.org/gmane.linux.documentation/4805
> 
> It is still under discussion.

Seems like those patches got NAKed and we are still left with the "slow after suspend" issue.

How do we move forward with this bug?

Comment 23 Srivatsa S. Bhat 2012-06-05 17:06:39 UTC

Hi,

v6 of the patchset was posted here:
http://thread.gmane.org/gmane.linux.kernel/1302893

And Peter Zijlstra (the maintainer) said that he has queued them up
to push them to mainline later.
http://thread.gmane.org/gmane.linux.kernel/1302893/focus=1303390

It hasn't hit mainline yet though.

Regards,
Srivatsa S. Bhat

Comment 24 Adel Gadllah 2012-06-05 18:13:48 UTC

(In reply to comment #23)
> Hi,
> 
> v6 of the patchset was posted here:
> http://thread.gmane.org/gmane.linux.kernel/1302893
> 
> And Peter Zijlstra (the maintainer) said that he has queued them up
> to push them to mainline later.
> http://thread.gmane.org/gmane.linux.kernel/1302893/focus=1303390
> 
> It hasn't hit mainline yet though.

Ah nice.

Josh can we backport them and ship them in F17/16 or are they considered too invasive?

Comment 25 Srivatsa S. Bhat 2012-06-05 18:19:39 UTC

Oh by the way, I forgot to mention this:

The real "bug-fix" is just patch 1 only. All the remaining patches (2, 3, and 4)
are only cleanups/optimizations. The cover-letter (0/4) explains this
structuring.

So I guess backporting becomes easy since the fix is contained in just one patch.

Regards,
Srivatsa S. Bhat

Comment 26 Josh Boyer 2012-06-05 18:25:34 UTC

(In reply to comment #25)
> Oh by the way, I forgot to mention this:
> 
> The real "bug-fix" is just patch 1 only. All the remaining patches (2, 3,
> and 4)
> are only cleanups/optimizations. The cover-letter (0/4) explains this
> structuring.
> 
> So I guess backporting becomes easy since the fix is contained in just one
> patch.

It's CC'd to stable.  It'll get into 3.4.2 or whatever as soon as it's in Linus' tree and we'll pick it up automatically.

Comment 27 Daniel Berrangé 2012-06-20 09:51:46 UTC

*** Bug 833655 has been marked as a duplicate of this bug. ***

Comment 28 Josh Boyer 2012-07-10 18:47:38 UTC

OK, so the tip bot commit which is here:

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=0c1508129adc051fabaf8debefea79baa2f1a81b

doesn't have stable CC'd.  Is that an oversight on the upstream tip maintainer's part?

Srivatsa, does this bug really just need that single commit or are all 4 patches needed?

Comment 29 Julian Sikorski 2012-07-10 19:52:15 UTC

IIRC stable CC was stripped because this is not a regression fix, so it is not allowed to add this patch to stable.

Comment 30 Eric Blake 2012-07-23 19:23:54 UTC

*** Bug 842406 has been marked as a duplicate of this bug. ***

Comment 31 Srivatsa S. Bhat 2012-07-23 19:32:10 UTC

(In reply to comment #28)
> OK, so the tip bot commit which is here:
> 
> http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commitdiff;h=0c1508129adc051fabaf8debefea79baa2f1a81b
> 
> doesn't have stable CC'd.  Is that an oversight on the upstream tip
> maintainer's part?
> 

Well, I had added the CC to stable while submitting the patchset. But
Ingo Molnar stripped it saying that it is not a regression fix.
This is what he said:
http://thread.gmane.org/gmane.linux.kernel/1302893/focus=1316019

> Srivatsa, does this bug really just need that single commit or are all 4
> patches needed?

Just that single commit. The remaining 3 patches are cleanups and optimizations.
Only the first patch is the bug-fix.

Regards,
Srivatsa S. Bhat

Comment 32 Josh Boyer 2012-07-24 15:25:08 UTC

I've added a backport of the single commit needed to the following scratch build.  I'd appreciate it if someone could test this kernel once it finishes building and let us know if it works as expected.  If so, we'll roll this patch into Fedora.

http://koji.fedoraproject.org/koji/taskinfo?taskID=4325884

Comment 33 Robert Story 2012-07-24 16:38:25 UTC

Josh - got a F16 version?

Comment 34 Josh Boyer 2012-07-24 17:06:36 UTC

(In reply to comment #33)
> Josh - got a F16 version?

http://koji.fedoraproject.org/koji/taskinfo?taskID=4326270

when it finishes building.  Will be a bit yet.

Comment 35 Robert Story 2012-07-24 20:09:03 UTC

(In reply to comment #34)
> (In reply to comment #33)
> > Josh - got a F16 version?
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=4326270

This works for me. On F16 x86_64, starting a VM after suspend/resume will pin/use all CPUs.

Comment 36 Josh Boyer 2012-07-24 20:11:30 UTC

(In reply to comment #35)
> (In reply to comment #34)
> > (In reply to comment #33)
> > > Josh - got a F16 version?
> > 
> > http://koji.fedoraproject.org/koji/taskinfo?taskID=4326270
> 
> This works for me.. F16 x86_64, starting a vm after suspend/resume will
> pin/use all cpus.

OK, great.  Thanks for testing.

Comment 37 Srivatsa S. Bhat 2012-07-24 20:16:21 UTC

(In reply to comment #35)
> (In reply to comment #34)
> > (In reply to comment #33)
> > > Josh - got a F16 version?
> > 
> > http://koji.fedoraproject.org/koji/taskinfo?taskID=4326270
> 
> This works for me.. F16 x86_64, starting a vm after suspend/resume will
> pin/use all cpus.

Thanks for testing! Good to know it works for you :-)

Regards,
Srivatsa S. Bhat

Comment 38 Josh Stone 2012-07-24 20:29:00 UTC

(In reply to comment #32)
> I've added a backport of the single commit needed to the following scratch
> build.  I'd appreciate it if someone could test this kernel once it finishes
> building and let us know if it works as expected.  If so, we'll roll this
> patch into Fedora.
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=4325884

I just tried F17 x86_64, and it also looks good.  The cpusets for libvirt and qemu are now maintained after a suspend/resume.  Thanks!

Comment 39 Josh Boyer 2012-07-25 12:08:24 UTC

OK, fixed in Fedora git.  It will be included in the next update.  Bodhi will leave a comment in the bug when it is available.

Comment 40 Julian Sikorski 2012-07-26 22:37:25 UTC

F17 x86_64 version seems to work for me too.

Comment 41 Fedora Update System 2012-07-31 14:15:50 UTC

kernel-3.4.7-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.4.7-1.fc16

Comment 42 Fedora Update System 2012-08-01 18:23:29 UTC

Package kernel-3.4.7-1.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.4.7-1.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-11348/kernel-3.4.7-1.fc16
then log in and leave karma (feedback).

Comment 43 Fedora Update System 2012-08-05 21:24:37 UTC

kernel-3.4.7-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.