Description of problem:
Due to the following kernel bug:
https://bugzilla.redhat.com/show_bug.cgi?id=714271
when a host goes into the suspend/hibernate state, the kernel clears out all CPUs from the cpuset cgroup. This means that upon resume, all processes in any cpuset cgroup are pinned to just one physical CPU core.

Since this kernel bug has existed for so long and the kernel devs show no signs of being able to fix it in the foreseeable future, we really need a temporary userspace workaround for the breakage. Using a pm-utils hook, we can save the current cgroup cpuset affinity before suspend/hibernate and restore it upon resume. We need this for libvirt, but the fix we came up with is general purpose and applies to any usage of the cgroups cpuset controller. Thus I think the workaround ought to be done by systemd, or perhaps the libcgroup RPM.

Version-Release number of selected component (if applicable):
systemd-44-4.fc17

How reproducible:
Always

Steps to Reproduce:
1. cd /sys/fs/cgroup/cpuset
2. mkdir foo
3. cd foo
4. echo "0-1" > cpuset.cpus
5. pm-suspend
6. cat cpuset.cpus

Actual results:
0

Expected results:
0-1

Additional info:
Created attachment 577965 [details]
pm-utils hook to preserve cpuset affinity across suspend/hibernate

The attached patch was developed by Srivatsa S. Bhat as part of this libvirt discussion of the problem:
https://www.redhat.com/archives/libvir-list/2012-April/msg00777.html
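The attachment contents are not reproduced in this comment. Purely as an illustration of the approach described above (save each cgroup's cpuset.cpus on suspend, replay the values on resume), here is a minimal sketch of such a pm-utils sleep hook. The CGROUP_ROOT and STATEFILE variables are assumptions added so the logic can be exercised on any directory tree; this is not the attached patch itself.

```shell
#!/bin/sh
# Sketch of a pm-utils sleep hook: save cpuset.cpus values before
# suspend/hibernate, write them back on resume/thaw.
# CGROUP_ROOT and STATEFILE are illustrative parameters, not part of
# the real attachment.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup/cpuset}
STATEFILE=${STATEFILE:-/run/saved_cpusets.txt}

save_cpusets() {
    # find prints each parent directory before its children, which is
    # exactly the order the values must be written back in.
    find -L "$CGROUP_ROOT" -mindepth 1 -type d | while read -r d; do
        printf '%s %s\n' "$d/cpuset.cpus" "$(cat "$d/cpuset.cpus")"
    done > "$STATEFILE"
}

restore_cpusets() {
    while read -r path value; do
        echo "$value" > "$path"
    done < "$STATEFILE"
    rm -f "$STATEFILE"
}

case "$1" in
    suspend|hibernate) save_cpusets ;;
    resume|thaw)       restore_cpusets ;;
esac
```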
oh
+-^

while read line
do
    cpuset_path=`echo $line | cut -d' ' -f1`
    value=`echo $line | cut -d' ' -f2`
    echo "$value" > $cpuset_path
done < saved_cpusets.txt
(In reply to comment #2)
> oh
> +-^
>
> while read line
> do
>     cpuset_path=`echo $line | cut -d' ' -f1`
>     value=`echo $line | cut -d' ' -f2`
>     echo "$value" > $cpuset_path
> done < saved_cpusets.txt

This should really be:

while read cpuset_path value
do
    echo "$value" > $cpuset_path
done < saved_cpusets.txt

Someone does not really know shell...
Created attachment 578035 [details]
simplified script

I noticed that too. Here's an improved version of the script. I'm expressing no opinion yet on adding it to a package.
To make the script in attachment 578035 [details] actually work, you have to change the mindepth in the find command from "2" to "1". Otherwise restoring cpusets will not work, because when run with depth 2 the result is this:

/sys/fs/cgroup/cpuset/libvirt/lxc/cpuset.cpus 0-3
/sys/fs/cgroup/cpuset/libvirt/qemu/cpuset.cpus 0-3

After a suspend/resume cycle you cannot restore the cpusets from the above configuration, as /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus is still set to "0" and you will get a permission-denied error. Changing the depth to "1" instead produces the following result:

/sys/fs/cgroup/cpuset/libvirt/cpuset.cpus 0-3
/sys/fs/cgroup/cpuset/libvirt/lxc/cpuset.cpus 0-3
/sys/fs/cgroup/cpuset/libvirt/qemu/cpuset.cpus 0-3

That configuration can be restored on resume without a problem. After placing the modified script with depth 1 into /etc/pm/sleep.d/01cpusets.sh, libvirtd and all machines work as expected after a suspend/resume cycle.
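The ordering point above can be checked on any directory tree: without -depth, find traverses pre-order, so with -mindepth 1 each parent group is listed (and therefore restored) before its children. A quick sketch, using a throwaway temporary directory that mimics the libvirt layout in place of /sys/fs/cgroup/cpuset:

```shell
# Recreate the libvirt cgroup layout in a temp directory and show that
# find emits the parent directory before its children -- the write
# order a restore needs to avoid the permission-denied error.
tmp=$(mktemp -d)
mkdir -p "$tmp/libvirt/lxc" "$tmp/libvirt/qemu"
find -L "$tmp" -mindepth 1 -type d
# The parent ($tmp/libvirt) is printed first, then lxc and qemu.
rm -rf "$tmp"
```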
Proposing as an f17 nice-to-have
I am -1 on NTH for the following reasons:
- The bug is not at all related to the release criteria.
- Not many users are affected.
- Those who really care can place the workaround script on their systems.
- It can be fixed in a post-release update.
- Srivatsa S. Bhat is making progress on a proper fix in the kernel:
  http://thread.gmane.org/gmane.linux.kernel/1262802/focus=1286289
During suspend, when cpuset.cpus is flushed, all processes from LXC containers are moved into the sysdefault cgroup and are not restored after resume. So effectively on my system all cgroups are empty except sysdefault. Here is a small modification to the workaround script:

SAVEDIR=/run/ugly-hack-for-bz813228-saved_cpusets

save_cpusets() {
    mkdir -p $SAVEDIR
    find -L /sys/fs/cgroup/cpuset -mindepth 1 -type d | while read cspath; do
        mkdir -p $SAVEDIR/$cspath
        cp $cspath/cpuset.cpus $cspath/tasks $SAVEDIR/$cspath/
    done
}

restore_cpusets() {
    cd $SAVEDIR
    find -L . -type d | while read cspath; do
        [ -f $cspath/cpuset.cpus ] && cp $cspath/cpuset.cpus /$cspath/cpuset.cpus
        [ -f $cspath/tasks ] && while read pid; do echo $pid > /$cspath/tasks; done < $cspath/tasks
    done
    rm -rf $SAVEDIR
}

Unfortunately, some of the processes inside the cgroups may already be dead right after resume, so this error shows up in the logs: "echo: write error: No such process". Also, if new processes were spawned, they are not placed into the corresponding cgroup.
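One way to silence the "No such process" error would be to check that a PID still exists before re-attaching it. The restore_tasks helper below is a hypothetical tweak, not part of the posted script; it takes the saved tasks file and the destination as arguments so the skip logic can be exercised outside a real cgroup.

```shell
# Hypothetical helper: replay a saved tasks file, skipping PIDs whose
# process died during suspend (no /proc/<pid> entry), so the restore
# does not log "echo: write error: No such process".
restore_tasks() {
    saved=$1   # saved copy of the cgroup's tasks file
    dest=$2    # live tasks file to write surviving PIDs into
    while read -r pid; do
        if [ -d "/proc/$pid" ]; then
            # One PID per write; ">>" is used so a plain test file
            # accumulates entries the way a cgroup tasks file would.
            echo "$pid" >> "$dest"
        fi
    done < "$saved"
    return 0
}
```

This only addresses the dead-PID noise; processes spawned after the save still end up outside their cgroup, which the comment above already notes.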
(In reply to comment #8)
> Unfortunately, some of the processes inside the cgroups may already be dead
> right after resume, so this error shows up in the logs:
> "echo: write error: No such process". Also, if new processes were spawned,
> they are not placed into the corresponding cgroup.

Which demonstrates the futility of trying to work around kernel bugs in userspace.