Bug 1571736 - docker run --cpu-shares 2 no longer works
Summary: docker run --cpu-shares 2 no longer works
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: docker
Version: 27
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Antonio Murdaca
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1568570
Depends On:
Blocks: 1590364
Reported: 2018-04-25 10:41 UTC by Jan Pazdziora (Red Hat)
Modified: 2018-11-27 22:22 UTC
CC List: 24 users

Fixed In Version:
Clone Of:
Clones: 1590364
Environment:
Last Closed: 2018-11-27 22:22:59 UTC
Type: Bug
Embargoed:


Description Jan Pazdziora (Red Hat) 2018-04-25 10:41:17 UTC
Description of problem:

Running

  docker run --cpu-shares 2 --rm -ti fedora:27 date

works on Fedora 27 but fails on Rawhide.

Version-Release number of selected component (if applicable):

systemd-238-7.fc29.1.x86_64
docker-1.13.1-52.git89b0e65.fc29.x86_64

How reproducible:

Deterministic.

Steps to Reproduce:
1. docker run --cpu-shares 2 --rm -ti fedora:27 date

Actual results:

/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"No such device or address\"".

Expected results:

Wed Apr 25 10:40:31 UTC 2018

which I get on Fedora 27 with

systemd-234-10.git5f8984e.fc27.x86_64
docker-1.13.1-51.git4032bd5.fc27.x86_64

Additional info:

This is based on findings in https://github.com/kubernetes/kubernetes/issues/61474 that suggest that the issue is in systemd.

Comment 1 Zbigniew Jędrzejewski-Szmek 2018-04-25 11:15:11 UTC
Interesting. #1568594 is similar, but it should not occur in systemd-238-7.fc29.1.x86_64. So this seems to be something else.

https://github.com/docker/libcontainer/blob/master/process_linux.go#L258
points to whitespace between functions, and I'm not sure how to find the right version of the sources.

I'll reassign this to docker, because the issue in #1568594 seems to be resolved by the band-aid in systemd-238-7.fc29.1.x86_64, so this appears to be something else, and docker-specific knowledge is necessary to diagnose what exactly is going wrong.

Comment 2 Filipe Brandenburger 2018-04-25 14:33:03 UTC
Hi,

Yes, this needs to be fixed in Docker.

The problem is that docker->runc->libcontainer is sending CpuShares as an int64 ('x') on D-Bus, while systemd expects to deserialize it as a uint64 ('t'); that type mismatch causes the ENXIO ("No such device or address") error.

I managed to fix that by applying https://github.com/projectatomic/runc/pull/10 to the runc inside docker-1.13.1-51 package.

That PR to runc (in projectatomic) is a backport of https://github.com/opencontainers/runc/pull/1375. I noticed there were follow-ups to it later in runc/libcontainer, so I think *probably* someone should look at those and see whether they're relevant too... (Moving closer to upstream runc might be a good idea too.)

I didn't track down why this happened... It's possible systemd moved CpuShares from signed int64 to unsigned uint64 at some point... Someone with more time could do some git archeology on both sides to understand why this used to work and now doesn't...

cc'ing myself here, so feel free to ask me further questions if you have them.

Cheers!
Filipe

Comment 3 Zbigniew Jędrzejewski-Szmek 2018-04-25 15:13:06 UTC
@Filipe, thanks. I'll CC you on any similar stuff in the future.

It's been defined as "t" at least since https://github.com/systemd/systemd/commit/4ad490007b (v204-210-g4ad490007b). But there were various changes to the details of implementation on the way, in particular https://github.com/systemd/systemd/commit/d53d94743c looks like it might have tightened up the parsing.

Comment 4 David Sastre Medina 2018-05-07 13:01:34 UTC
*** Bug 1568570 has been marked as a duplicate of this bug. ***

Comment 5 Antonio Murdaca 2018-05-29 10:53:36 UTC
Related kube and systemd fixes have been merged; moving to POST for verification.

Comment 6 Jan Pazdziora (Red Hat) 2018-05-29 12:08:16 UTC
Is docker the correct component at all? On my Fedora 28 with

systemd-238-8.git0e0aa59.fc28.x86_64
docker-1.13.1-51.git4032bd5.fc28.x86_64
runc-1.0.0-31.git0cbfd83.fc28.x86_64

the reproducer command from comment 0 now passes.

Comment 7 Martin Pitt 2018-06-11 11:22:52 UTC
For the record, this also affects Fedora 27. Adjusting version to the earliest affected version.

Comment 8 Ben Cotton 2018-11-27 15:11:30 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Martin Pitt 2018-11-27 22:22:59 UTC
This was fixed a long time ago.

