Created attachment 485534 [details] cpu cgroup

Description of problem:

The sandbox tool can launch an application within a cgroup, but I am getting errors from firefox and evince when they try to execute:

firefox
sched_setscheduler(13401, SCHED_FIFO, { 0 }) = -1 EINVAL (Invalid argument)

evince
GThread-ERROR **: file gthread-posix.c: line 348 (g_thread_create_posix_impl): error 'Invalid argument' during 'pthread_attr_setschedparam (&attr, &sched)'

cat cgroup
10:blkio:/
9:net_cls:/
8:freezer:/
7:devices:/
6:memory:/sandbox
5:cpuacct:/
4:cpu:/sandbox
3:ns:/13204
2:cpuset:/
1:name=systemd:/user/dwalsh/1
Created attachment 485535 [details] memory cgroup
Created attachment 485536 [details] firefox strace
Created attachment 485537 [details] evince strace
You can see this happening by executing

    sandbox -C -W metacity -X xterm

and then running firefox or evince in the xterm. If you instead run

    sandbox -W metacity -X xterm

and run the same commands, they should work.
Looks like this is happening because we have CONFIG_RT_GROUP_SCHED enabled in our kernel config. Explicit task groups need to have RT bandwidth allocated to them before you can set SCHED_FIFO for their tasks. See Documentation/scheduler/sched-rt-group.txt in the kernel-doc package for how to do it, but after a quick read I think we may just want to disable this feature.
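As a rough way to check whether a process sits in a cpu group with no RT budget, the sketch below parses the `/proc/<pid>/cgroup` format shown in the description and builds the path to that group's `cpu.rt_runtime_us` file. The mount point `/sys/fs/cgroup/cpu` and the helper names are assumptions for illustration, not something taken from the sandbox tool or from systemd; adjust the mount point to wherever the cpu hierarchy is mounted on the system.

```python
import os

# Assumed cgroup v1 mount point for the cpu controller; this varies by system.
CPU_CGROUP_ROOT = "/sys/fs/cgroup/cpu"


def cpu_group_of(proc_cgroup_text):
    """Return the cpu-controller group from /proc/<pid>/cgroup content, or None."""
    for line in proc_cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        _hierarchy_id, controllers, path = parts
        # Match the controller name exactly so "cpuacct" and "cpuset" are skipped.
        if "cpu" in controllers.split(","):
            return path
    return None


def rt_runtime_file(group_path, root=CPU_CGROUP_ROOT):
    """Build the path of the group's cpu.rt_runtime_us file."""
    return os.path.join(root, group_path.lstrip("/"), "cpu.rt_runtime_us")


def read_rt_budget(group_path, root=CPU_CGROUP_ROOT):
    """Return the group's RT runtime in microseconds, or None if unreadable."""
    try:
        with open(rt_runtime_file(group_path, root)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__" and os.path.exists("/proc/self/cgroup"):
    with open("/proc/self/cgroup") as f:
        group = cpu_group_of(f.read())
    if group is None:
        print("no cpu controller entry found")
    else:
        print("cpu group:", group, "rt budget (us):", read_rt_budget(group))
```

For the cgroup listing in the description, `cpu_group_of` returns `/sandbox`. If `cpu.rt_runtime_us` in that group reads 0, the group has no RT budget, which would be consistent with the SCHED_FIFO failures reported here.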
So should this be moved to a kernel issue?
The fundamental issue is, the whole _concepts_ of "share CPU time fairly between these groups" and "run this task with realtime priority" are incompatible with each other. This is not an implementation issue, but a question of the kernel being asked to fulfill contradictory requirements. Running a realtime task can only be done in a realtime task group (or outside of the task group system), not in a timeshared task group.
For these reasons we are currently not really using the 'cpu' hierarchy in systemd much: if you create a group and want to allow the processes in it to get RT scheduling, then you have to assign the group an RT budget, which is realistically not doable, since we wouldn't know what amount to assign if we did this automatically. What I'd really like to see is that the RT budget and the non-RT shares could be configured in different hierarchies, so that people can set cpu.shares without necessarily implying cpu.rt_runtime_us=0.
Is the problem here that the apps are not getting permission denied, which they probably handle, but are instead getting EINVAL (Invalid argument), which they don't expect, and so they crash?
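To illustrate the kind of handling being asked about, here is a minimal sketch of an application-side fallback that treats both EPERM and EINVAL as "realtime scheduling not available" and carries on with the default policy. This is not what firefox or evince actually do; `request_realtime` and its `setter` parameter are hypothetical names, and the parameter exists only so the behaviour can be exercised without changing the scheduling of a real process.

```python
import errno
import os


def request_realtime(priority=1, setter=None):
    """Try to switch the calling process to SCHED_FIFO.

    Returns True on success, False if the kernel refuses with EPERM or
    EINVAL. Any other error is re-raised.
    """
    if setter is None:
        setter = os.sched_setscheduler  # available on Linux only
    try:
        # pid 0 means "the calling process".
        setter(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError as exc:
        if exc.errno in (errno.EPERM, errno.EINVAL):
            # Fall back to the default scheduling policy instead of aborting.
            return False
        raise
```

A caller would use it as `if not request_realtime(): ...` and continue with normal scheduling, so neither error code would be fatal.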
I just found bug 442959, which Lennart filed asking us to enable the RT group scheduler, so I don't think we can just disable it to fix this problem.
Did anyone come to some kind of resolution for this issue? We still have CONFIG_RT_GROUP_SCHED=y in the F15 kernel configs and it seems the consensus was that running RT processes in a cgroup without RT budget just wasn't feasible...