Bug 1774076

Summary: OSP15 update fails during overcloud update with "No such log driver json-file"
Product: Red Hat OpenStack
Reporter: Sofer Athlan-Guyot <sathlang>
Component: openstack-tripleo-heat-templates
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED ERRATA
QA Contact: Sofer Athlan-Guyot <sathlang>
Severity: urgent
Priority: urgent
Version: 15.0 (Stein)
CC: amcleod, cjeanner, jfrancoa, mburns
Target Milestone: z2
Keywords: Triaged, ZStream
Target Release: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-10.6.2-0.20191125220521.97cc159.el8ost
Doc Type: Bug Fix
Doc Text:
Previously, there was a change to the log parameter in the podman interface that introduced an issue with tripleo-heat-templates, which caused updates to fail. With this update, the issue has been resolved and updates pass successfully.
Last Closed: 2020-03-05 12:00:58 UTC
Type: Bug

Description Sofer Athlan-Guyot 2019-11-19 14:07:56 UTC
Comment 1 Sofer Athlan-Guyot 2019-11-19 14:21:57 UTC
Here comes the description.

During the OSP15 update from 8.0/GA to 8.1/RHOS_TRUNK-15.0-RHEL-8-20191115.n.0, the update fails with:

2019-11-15 05:29:44 |         "2019-11-15 05:28:06,747 INFO: 178644 -- Pulling image:",
2019-11-15 05:29:44 |         "2019-11-15 05:28:10,139 ERROR: 178645 -- ['/usr/bin/podman', 'run', '--user', 'root', '--name', 'container-puppet-haproxy', '--env', 'PUPPET_TAGS=file,file_line,concat,augeas,cron,haproxy_config', '--env', 'NAME=haproxy', '--env', 'HOSTNAME=controller-0', '--env', 'NO_ARCHIVE=', '--env', 'STEP=6', '--env', 'NET_HOST=true', '--log-driver', 'json-file', '--volume', '/etc/localtime:/etc/localtime:ro', '--volume', '/tmp/tmpkkpdgv7j:/etc/config.pp:ro', '--volume', '/etc/puppet/:/tmp/puppet-etc/:ro', '--volume', '/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume', '/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume', '/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume', '/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume', '/var/lib/config-data:/var/lib/config-data/:rw', '--volume', '/var/lib/container-puppet/puppetlabs/facter.conf:/etc/puppetlabs/facter/facter.conf:ro', '--volume', '/var/lib/container-puppet/puppetlabs/:/opt/puppetlabs/:ro', '--volume', '/dev/log:/dev/log:rw', '--log-opt', 'path=/var/log/containers/stdouts/container-puppet-haproxy.log', '--security-opt', 'label=disable', '--volume', '/usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro', '--volume', '/etc/pki/tls/private/overcloud_endpoint.pem:/etc/pki/tls/private/overcloud_endpoint.pem:ro', '--entrypoint', '/var/lib/container-puppet/container-puppet.sh', '--net', 'host', '--volume', '/etc/hosts:/etc/hosts:ro', '--volume', '/var/lib/container-puppet/container-puppet.sh:/var/lib/container-puppet/container-puppet.sh:ro', '
ck-haproxy:20191106.1'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         "Error: write child: broken pipe",
2019-11-15 05:29:44 |         " attempt(s): 1",
2019-11-15 05:29:44 |         "2019-11-15 05:28:10,140 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:16,325 ERROR: 178645 -- ['/usr/bin/podman', 'start', '-a', 'container-puppet-haproxy'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         "Error: unable to start container faae0f9449ae1fb795511196ceb47b3b7ea04b357d95088dbecde8508bf6622b: write child: broken pipe",
2019-11-15 05:29:44 |         " attempt(s): 2",
2019-11-15 05:29:44 |         "2019-11-15 05:28:16,326 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,166 ERROR: 178645 -- ['/usr/bin/podman', 'start', '-a', 'container-puppet-haproxy'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         " attempt(s): 3",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,167 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,167 ERROR: 178645 -- Failed running container for haproxy",
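
To make the failing arguments easier to spot in that long command line, here is a minimal Python sketch that pulls only the logging-related flags out of a podman argument list. The helper name logging_options is hypothetical, and the argv below is an abridged copy of the command from the log (env and volume arguments omitted); it is an illustration, not part of paunch or tripleo-heat-templates.

```python
def logging_options(argv):
    """Return (flag, value) pairs for the --log-driver / --log-opt flags."""
    flags = ("--log-driver", "--log-opt")
    # Stop one element early so argv[i + 1] is always a valid index.
    return [(arg, argv[i + 1]) for i, arg in enumerate(argv[:-1]) if arg in flags]


# Abridged copy of the failing command from the log above.
argv = [
    "/usr/bin/podman", "run", "--user", "root",
    "--name", "container-puppet-haproxy",
    "--log-driver", "json-file",
    "--log-opt", "path=/var/log/containers/stdouts/container-puppet-haproxy.log",
    "--security-opt", "label=disable",
]

for flag, value in logging_options(argv):
    print(flag, value)
```

The '--log-driver', 'json-file' pair is the one that the conmon error message refers to.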

We got similar logs for other containers:

(undercloud) [stack@undercloud-0 ~]$ grep 'start.*run failed after \[conmon:e\]: No such log driver json-file' overcloud_update_run_Controller.log  | cut -d' ' -f20-20 | cut -d- -f-3 | sort -u
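
The same list can be produced without depending on field positions. This is a minimal Python sketch, assuming the log lines have the shape shown above; the function name failed_containers is hypothetical and the sample is two of the retry lines copied from this report.

```python
import re

# Matches the "podman start -a <name>" retry lines and captures the container name.
FAILED_START = re.compile(
    r"'start', '-a', '(container-puppet-[^']+)'\] run failed after "
    r"\[conmon:e\]: No such log driver json-file"
)


def failed_containers(log_text):
    """Return the sorted, de-duplicated names of containers that hit the error."""
    return sorted(set(FAILED_START.findall(log_text)))


# Two retry lines copied from the log in this report.
sample = """
2019-11-15 05:29:44 |         "2019-11-15 05:28:16,325 ERROR: 178645 -- ['/usr/bin/podman', 'start', '-a', 'container-puppet-haproxy'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,166 ERROR: 178645 -- ['/usr/bin/podman', 'start', '-a', 'container-puppet-haproxy'] run failed after [conmon:e]: No such log driver json-file",
"""

print(failed_containers(sample))
```

Unlike cut -d- -f-3, the regular expression keeps the full name of containers whose service name itself contains a hyphen.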

This is a known issue that has already been fixed, but the fix does not take effect during an update. In particular, the fixes for:

 - https://bugzilla.redhat.com/show_bug.cgi?id=1761867 (paunch) (note that paunch in that version was tested too https://bugzilla.redhat.com/show_bug.cgi?id=1769209)
 - https://bugzilla.redhat.com/show_bug.cgi?id=1769291 (tht)

are included in the target repo.

Comment 2 Sofer Athlan-Guyot 2019-11-19 15:06:40 UTC
So, as described in the Launchpad bug, this boils down to container-puppet.py not being regenerated during the update. As a result, the fix is not in place at that stage of the update process, which runs just before the configuration is regenerated (common_deploy_steps_tasks) and after the update and host prep tasks.

Comment 5 Sofer Athlan-Guyot 2019-12-17 17:59:01 UTC

The job http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-15-from-GA-HA_no_ceph-ipv4/7/ ran through the whole update from GA to RHOS_TRUNK-15.0-RHEL-8-20191212.n.0. It still fails, but for a different reason, at reboot. The stage that this review fixes runs successfully.

Comment 6 Alex McLeod 2020-02-19 12:48:57 UTC
