Bug 1795376

Summary: docker cannot be updated to 108 on rhos13 as a container fails to start with "pivot_root invalid argument" error.
Product: Red Hat Enterprise Linux 7
Reporter: Sofer Athlan-Guyot <sathlang>
Component: docker
Assignee: Daniel Walsh <dwalsh>
Status: CLOSED ERRATA
QA Contact: atomic-bugs <atomic-bugs>
Severity: urgent
Priority: urgent
Docs Contact:
Version: 7.8
CC: ajia, akaris, amurdaca, bcafarel, bdobreli, bshephar, djuran, dornelas, jnovy, lsm5, rrasouli, tsweeney
Target Milestone: rc
Keywords: Extras
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: docker-1.13.1-156.gitcccb291.el7_8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-01 00:26:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1797119    
Bug Blocks: 1186913, 1744505    

Description Sofer Athlan-Guyot 2020-01-27 19:33:55 UTC
Description of problem: While updating OSP13 we have a couple of issues with the neutron_ovs_agent [1][2].

We see this error in CI:

2020-01-18 00:32:17 |         "Error running ['docker', 'run', '--name', 'neutron_dhcp', '--label', 'config_id=tripleo_step4', '--label', 'container_name=neutron_dhcp', '--label', 'managed_by=paunch', '--label', 'config_data={\"start_order\": 10, \"ulimit\": [\"nofile=16384\"], \"hea
lthcheck\": {\"test\": \"/openstack/healthcheck 5672\"}, \"image\": \"192.168.24.1:8787/rhosp13/openstack-neutron-dhcp-agent:2019-12-12.1rhel7.8\", \"pid\": \"host\", \"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=ee55e15e1aabdbdf501d3b59099c2f7b\"], \
"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.c
rt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/log/containers/ne
utron:/var/log/neutron\", \"/var/lib/kolla/config_files/neutron_dhcp.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro\", \"/lib/modules:/lib/modules:ro\", \"/run/openvswitch:/run/openvswitch\", \"/v
ar/lib/neutron:/var/lib/neutron\", \"/run/netns:/run/netns:shared\", \"/var/lib/openstack:/var/lib/openstack\", \"/var/lib/neutron/dnsmasq_wrapper:/usr/local/bin/dnsmasq:ro\", \"/var/lib/neutron/dhcp_haproxy_wrapper:/usr/local/bin/haproxy:ro\"], \"net\": \"host\", \"privileged\": true, \"restart\": \"always\"}', '--detach=true', '--env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS', '--env=TRIPLEO_CONFIG_HASH=ee55e15e1aabdbdf501d3b59099c2f7b', '--net=host', '--pid=host', '--ulimit=nofile=16384', '--health-cmd=/openstack/healthcheck 5672', '--privileged=true', '--restart=always', '--volume=/etc/hosts:/etc/hosts:ro', '--volume=/etc/localtime:/etc/localtime:ro', '--volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro', '--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume=/dev/log:/dev/log', '--volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro', '--volume=/etc/puppet:/etc/puppet:ro', '--volume=/var/log/containers/neutron:/var/log/neutron', '--volume=/var/lib/kolla/config_files/neutron_dhcp.json:/var/lib/kolla/config_files/config.json:ro', '--volume=/var/lib/config-data/puppet-generated/neutron/:/var/lib/kolla/config_files/src:ro', '--volume=/lib/modules:/lib/modules:ro', '--volume=/run/openvswitch:/run/openvswitch', '--volume=/var/lib/neutron:/var/lib/neutron', '--volume=/run/netns:/run/netns:shared', '--volume=/var/lib/openstack:/var/lib/openstack', '--volume=/var/lib/neutron/dnsmasq_wrapper:/usr/local/bin/dnsmasq:ro', '--volume=/var/lib/neutron/dhcp_haproxy_wrapper:/usr/local/bin/haproxy:ro', '192.168.24.1:8787/rhosp13/openstack-neutron-dhcp-agent:2019-12-12.1rhel7.8']. [125]", 
2020-01-18 00:32:17 |         "stdout: 57cea0fdb732ae7a4576bb3b0ddaa80d965b2b6c12b2442b78eec2b78068186a", 
2020-01-18 00:32:17 |         "stderr: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:359: container init caused \\\"rootfs_linux.go:89: jailing process inside rootfs caused \\\\\\\"pivot_root invalid argument\\\\\\\"\\\"\".",


(this is from [2])
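One possible angle (an assumption on our side, not a confirmed diagnosis): pivot_root(2) fails with EINVAL when the new root, or the mount it sits on, has shared propagation, and runc surfaces that as exactly this "pivot_root invalid argument" message. A quick way to inspect the propagation flags of the root mount on an affected host:

```shell
# Print the optional fields (e.g. "shared:1") of the root mount from
# /proc/self/mountinfo: field 5 is the mount point, and the optional
# fields run from field 7 up to the "-" separator. A "shared:N" tag
# here means the mount has shared propagation.
awk '$5 == "/" { for (i = 7; $i != "-"; i++) print $i }' /proc/self/mountinfo
```

Running the same check against /var/lib/docker (or wherever the storage graph lives) on an affected controller might be even more telling.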

Let us know what you need to move forward with that issue.

Thanks.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1793455
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1793634


Version-Release number of selected component (if applicable):


How reproducible: Update an OSP13 environment from docker-1.13.1-75.git8633870.el7_5.x86_64 to docker-1.13.1-108.git4ef4b30.el7.x86_64.

Comment 3 Sofer Athlan-Guyot 2020-01-28 13:13:45 UTC
Hmm, I noticed this morning that with the latest repo (puddle) I get 1.13.1-155.gitcccb291.el7_8; I will test with that. I've also noticed that the 108 build disappeared from the history in that build; is there any specific reason?

Comment 4 Tom Sweeney 2020-01-28 15:37:27 UTC
Lokesh, do you know the status of Docker 108?

Comment 10 Bernard Cafarelli 2020-02-24 15:55:11 UTC
*** Bug 1793455 has been marked as a duplicate of this bug. ***

Comment 11 Bernard Cafarelli 2020-02-24 15:59:49 UTC
*** Bug 1793634 has been marked as a duplicate of this bug. ***

Comment 15 Sofer Athlan-Guyot 2020-03-02 13:36:11 UTC
Hi,

sorry, there was some miscommunication from my colleague here: the job https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-ga-HA-ipv4/106/ which is yellow runs with rhel-7.7, and this docker:

docker.x86_64                    2:1.13.1-109.gitcccb291.el7_7
@rhelosp-rhel-7.7-extras

The job that was set up with rhel7.8 failed, but on another error. I am currently debugging that issue and checking whether we can call this verified.

Comment 18 Sofer Athlan-Guyot 2020-03-05 15:47:42 UTC
Hi Jindrich,

OK, recap:
 - my colleague validated this bz based on a job [1] that runs rhel7.7 (anything you see there is unrelated to this bz), so I un-validated it
 - so I took job [1] and changed it to update to rhel7.8 in [2]
   - automation is blocked by [3]
   - so I finished the update run in [2] "manually"

> Sofer, we need some evidence how comment #15 is docker related.

The validated state was wrong, as the job didn't use the right version of docker (older than the Fixed In Version, and not rhel7.8). While setting up the job for rhel7.8 I hit another issue (bz1810119) which is not docker related at all, but blocks CI since it happens before the containers get updated.

> I don't understand what your CI job does and see just python backtrace.

The job does a full update of OSP16; it is a functional test.

In any case, after setting up a dedicated job and finishing it manually, I can move this to validated:

1. update of controller and compute was successful

[stack@undercloud-0 ~]$ tail overcloud_update_run-Controller.sh.log
PLAY RECAP *********************************************************************
controller-0               : ok=301  changed=143  unreachable=0    failed=0
controller-1               : ok=294  changed=141  unreachable=0    failed=0
controller-2               : ok=294  changed=141  unreachable=0    failed=0

Thursday 05 March 2020  08:22:35 -0500 (0:00:00.060)       2:16:13.877 ********
===============================================================================

Updated nodes - Controller
Success

[stack@undercloud-0 ~]$ tail  overcloud_update_run-Compute.sh.log

PLAY RECAP *********************************************************************
compute-0                  : ok=178  changed=71   unreachable=0    failed=0
compute-1                  : ok=178  changed=71   unreachable=0    failed=0

Thursday 05 March 2020  06:44:40 -0500 (0:00:00.133)       0:21:43.374 ********
===============================================================================
Updated nodes - Compute
Success

2. we have the right version of docker (it is newer than the Fixed In Version)

[root@controller-0 ~]# rpm -qa | grep ^docker
docker-1.13.1-161.git64e9980.el7_8.x86_64
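
As a sanity check, release strings like these can be ordered with GNU sort's version sort. This is only a rough approximation of rpm's own comparison (rpmvercmp differs in edge cases), but it is enough to confirm the installed build sorts after the Fixed In Version:

```shell
# Compare the installed docker build against the Fixed In Version (both
# taken from this bz) using GNU sort -V; the smaller version sorts first.
fiv="1.13.1-156.gitcccb291.el7_8"
installed="1.13.1-161.git64e9980.el7_8"
lowest=$(printf '%s\n%s\n' "$fiv" "$installed" | sort -V | head -n 1)
if [ "$lowest" = "$fiv" ]; then
    echo "installed >= FIV"
else
    echo "installed < FIV"
fi
```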

3. neutron_dhcp has been properly updated from tag 2018-06-21.2 to 2019-12-12.1rhel7.8 (that's the one that was choking before)
[root@controller-0 ~]# docker ps | grep neutron_dhcp
49ffa2838103        192.168.24.1:8787/rhosp13/openstack-neutron-dhcp-agent:2019-12-12.1rhel7.8          "dumb-init --singl..."   2 hours ago         Up 2 hours (healthy)                       neutron_dhcp

Thanks,

[1] https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/upgrades/view/update/job/DFG-upgrades-updates-13-from-ga-HA-ipv4/
[2] http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-13-from-ga-HA-ipv4/
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1810119

Comment 23 Sofer Athlan-Guyot 2020-03-20 18:07:22 UTC
*** Bug 1796666 has been marked as a duplicate of this bug. ***

Comment 25 errata-xmlrpc 2020-04-01 00:26:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1234