Description of problem:
docker run --cpu-shares 2 --rm -ti fedora:27 date
works on Fedora 27, fails on rawhide.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. docker run --cpu-shares 2 --rm -ti fedora:27 date
Actual results:
/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\"".
Expected results:
Wed Apr 25 10:40:31 UTC 2018
which I get on Fedora 27 with
This is based on findings in https://github.com/kubernetes/kubernetes/issues/61474 that suggest that the issue is in systemd.
Interesting. #1568594 is similar, but it should not occur in systemd-238-7.fc29.1.x86_64. So this seems to be something else.
points to whitespace between functions, and I'm not sure how to find the right version of the sources.
I'll reassign this to docker, because the issue in #1568594 seems to be resolved by the band-aid in systemd-238-7.fc29.1.x86_64, so this appears to be something else, and docker-specific knowledge is necessary to diagnose what exactly is going wrong.
Yes, this needs to be fixed in Docker.
The problem is that docker->runc->libcontainer sends CpuShares as an int64 ('x') on D-Bus, while systemd expects to deserialize it as a uint64 ('t'), and that type mismatch causes the ENXIO ("No such device or address").
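To make the mismatch concrete, here is a toy model in Python. It is only a sketch, not systemd or runc code: the reader function, the tuple layout, and the error class are hypothetical, and the property name is used for illustration. The only facts taken from D-Bus itself are the type codes 'x' (signed 64-bit) and 't' (unsigned 64-bit). A strict reader that insists on 't' rejects the same numeric value when it arrives tagged as 'x'.

```python
import errno

# D-Bus type codes: 'x' is a signed 64-bit integer, 't' an unsigned one.
DBUS_INT64 = "x"
DBUS_UINT64 = "t"


class DBusTypeError(OSError):
    """Raised by the toy reader when the wire type is not the expected one."""


def make_property(name, type_code, value):
    """Build a (name, variant) pair; the variant is modelled as (type_code, value)."""
    return (name, (type_code, value))


def read_uint64_property(prop):
    """Hypothetical strict reader: accepts only a 't' variant, fails with ENXIO otherwise."""
    name, (type_code, value) = prop
    if type_code != DBUS_UINT64:
        raise DBusTypeError(errno.ENXIO, f"{name}: expected 't', got '{type_code}'")
    return value


def try_read(prop):
    """Return ('ok', value) on success or ('error', errno) on a type mismatch."""
    try:
        return ("ok", read_uint64_property(prop))
    except DBusTypeError as exc:
        return ("error", exc.errno)


# Same value, different wire type: what the sender used vs. what the reader wants.
signed = make_property("CPUShares", DBUS_INT64, 2)
unsigned = make_property("CPUShares", DBUS_UINT64, 2)

print(try_read(signed))
print(try_read(unsigned))
```

The value 2 is perfectly valid in both cases; only the declared type differs, which is why the fix belongs on the sending side.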
I managed to fix that by applying https://github.com/projectatomic/runc/pull/10 to the runc inside docker-1.13.1-51 package.
That PR to runc (in projectatomic) is a backport of https://github.com/opencontainers/runc/pull/1375. I noticed there were follow-ups to that later in runc/libcontainer, so I think *probably* someone should look at those and see whether they're relevant too... (Moving closer to upstream runc might be a good idea too.)
I didn't track down why this happened... It's possible systemd moved CpuShares from signed int64 to unsigned uint64 at some point... Someone with more time could do some git archeology on both sides to understand why this used to work and now doesn't...
cc'ing myself here, so feel free to ask me further questions if you have them.
@Felipe, thanks. I'll CC you on any similar stuff in the future.
It's been defined as "t" at least since https://github.com/systemd/systemd/commit/4ad490007b (v204-210-g4ad490007b). But there were various changes to the details of implementation on the way, in particular https://github.com/systemd/systemd/commit/d53d94743c looks like it might have tightened up the parsing.
*** Bug 1568570 has been marked as a duplicate of this bug. ***
Related kube and systemd fixes have been merged, moving to POST for verification
Is docker the correct component at all? On my Fedora 28 with
the reproducer command from comment 0 now passes.
For the record, this also affects Fedora 27. Adjusting version to the earliest affected version.
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 27 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
This was fixed a long time ago.