Bug 1774076

Summary: OSP15 update fails during overcloud update with "No such log driver json-file"
Product: Red Hat OpenStack
Reporter: Sofer Athlan-Guyot <sathlang>
Component: openstack-tripleo-heat-templates
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED ERRATA
QA Contact: Sofer Athlan-Guyot <sathlang>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 15.0 (Stein)
CC: amcleod, cjeanner, jfrancoa, mburns
Target Milestone: z2
Keywords: Triaged, ZStream
Target Release: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-10.6.2-0.20191125220521.97cc159.el8ost
Doc Type: Bug Fix
Doc Text:
Previously, there was a change to the log parameter in the podman interface that introduced an issue with tripleo-heat-templates, which caused updates to fail. With this update, the issue has been resolved and updates pass successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-05 12:00:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sofer Athlan-Guyot 2019-11-19 14:07:56 UTC

Comment 1 Sofer Athlan-Guyot 2019-11-19 14:21:57 UTC
Here comes the description.

During the OSP15 update from 8.0/GA to 8.1//RHOS_TRUNK-15.0-RHEL-8-20191115.n.0, the update fails with:

2019-11-15 05:29:44 |         "2019-11-15 05:28:06,747 INFO: 178644 -- Pulling image: 192.168.24.1:8787/rh-osbs/rhosp15-openstack-glance-api:20191106.1",
2019-11-15 05:29:44 |         "2019-11-15 05:28:10,139 ERROR: 178645 -- ['/usr/bin/podman', 'run', '--user', 'root', '--name', 'container-puppet-haproxy', '--env', 'PUPPET_TAGS=file,file_line,concat,augeas,cron,haproxy_config', '--env', 'NAME=haproxy', '--env', 'HOSTNAME=controller-0', '--env', 'NO_ARCHIVE=', '--env', 'STEP=6', '--env', 'NET_HOST=true', '--log-driver', 'json-file', '--volume', '/etc/localtime:/etc/localtime:ro', '--volume', '/tmp/tmpkkpdgv7j:/etc/config.pp:ro', '--volume', '/etc/puppet/:/tmp/puppet-etc/:ro', '--volume', '/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume', '/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume', '/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume', '/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume', '/var/lib/config-data:/var/lib/config-data/:rw', '--volume', '/var/lib/container-puppet/puppetlabs/facter.conf:/etc/puppetlabs/facter/facter.conf:ro', '--volume', '/var/lib/container-puppet/puppetlabs/:/opt/puppetlabs/:ro', '--volume', '/dev/log:/dev/log:rw', '--log-opt', 'path=/var/log/containers/stdouts/container-puppet-haproxy.log', '--security-opt', 'label=disable', '--volume', '/usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro', '--volume', '/etc/pki/tls/private/overcloud_endpoint.pem:/etc/pki/tls/private/overcloud_endpoint.pem:ro', '--entrypoint', '/var/lib/container-puppet/container-puppet.sh', '--net', 'host', '--volume', '/etc/hosts:/etc/hosts:ro', '--volume', '/var/lib/container-puppet/container-puppet.sh:/var/lib/container-puppet/container-puppet.sh:ro', '192.168.24.1:8787/rh-osbs/rhosp15-openstack-haproxy:20191106.1'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         "Error: write child: broken pipe",
2019-11-15 05:29:44 |         " attempt(s): 1",
2019-11-15 05:29:44 |         "2019-11-15 05:28:10,140 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:16,325 ERROR: 178645 -- ['/usr/bin/podman', 'start', '-a', 'container-puppet-haproxy'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         "Error: unable to start container faae0f9449ae1fb795511196ceb47b3b7ea04b357d95088dbecde8508bf6622b: write child: broken pipe",
2019-11-15 05:29:44 |         " attempt(s): 2",
2019-11-15 05:29:44 |         "2019-11-15 05:28:16,326 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,166 ERROR: 178645 -- ['/usr/bin/podman', 'start', '-a', 'container-puppet-haproxy'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         " attempt(s): 3",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,167 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,167 ERROR: 178645 -- Failed running container for haproxy",
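
For triage, the log driver can be pulled straight out of the failing command line rather than read by eye. The following is only a rough sketch: log_driver_from_error and ARGV_RE are hypothetical names, and it assumes the failed command is always logged as a Python-style list followed by "run failed after", as in the excerpt above.

```python
import ast
import re

# Captures the Python-style argv list that precedes "run failed after".
ARGV_RE = re.compile(r"-- (\[.*?\]) run failed after")


def log_driver_from_error(line):
    """Return the --log-driver value of a failed podman command, or None."""
    match = ARGV_RE.search(line)
    if match is None:
        return None
    argv = ast.literal_eval(match.group(1))  # evaluates literals only
    for i, arg in enumerate(argv):
        if arg == "--log-driver" and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--log-driver="):
            return arg.split("=", 1)[1]
    return None


# Abbreviated form of the failing "podman run" line (most arguments dropped).
sample = (
    "ERROR: 178645 -- ['/usr/bin/podman', 'run', '--name', "
    "'container-puppet-haproxy', '--log-driver', 'json-file'] "
    "run failed after [conmon:e]: No such log driver json-file"
)
print(log_driver_from_error(sample))
```

The retry lines ('podman start -a ...') carry no --log-driver argument, so the function returns None for them; the driver is only visible on the initial 'podman run' line.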

We got similar logs for other containers:

(undercloud) [stack@undercloud-0 ~]$ grep 'start.*run failed after \[conmon:e\]: No such log driver json-file' overcloud_update_run_Controller.log  | cut -d' ' -f20-20 | cut -d- -f-3 | sort -u
'container-puppet-aodh
'container-puppet-ceilometer
'container-puppet-cinder
'container-puppet-clustercheck']
'container-puppet-crond
'container-puppet-glance_api
'container-puppet-gnocchi
'container-puppet-haproxy']
'container-puppet-heat
'container-puppet-heat_api
'container-puppet-heat_api_cfn
'container-puppet-horizon
'container-puppet-iscsid
'container-puppet-keystone
'container-puppet-memcached
'container-puppet-mysql']
'container-puppet-neutron
'container-puppet-nova
'container-puppet-nova_metadata
'container-puppet-nova_placement
'container-puppet-ovn_controller
'container-puppet-panko
'container-puppet-rabbitmq']
'container-puppet-redis']
'container-puppet-swift
'container-puppet-swift_ringbuilder
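
The same list can be produced without the stray quote and bracket characters that the cut pipeline leaves behind. This is a minimal sketch, not what was actually run: failed_containers and FAILED_START_RE are hypothetical names, and the pattern assumes the 'podman start -a <name>' failure lines look exactly like the ones quoted earlier.

```python
import re

# Container name in a failed "podman start -a <name>" line.
FAILED_START_RE = re.compile(
    r"'start', '-a', '(container-puppet-[\w-]+)'\] "
    r"run failed after \[conmon:e\]: No such log driver json-file"
)


def failed_containers(lines):
    """Return sorted, de-duplicated names of containers that failed to start."""
    names = set()
    for line in lines:
        match = FAILED_START_RE.search(line)
        if match:
            names.add(match.group(1))
    return sorted(names)
```

It accepts any iterable of lines, so an open file handle on overcloud_update_run_Controller.log can be passed directly.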


This is a known issue that has already been fixed, but the fix does not take effect during an update. In particular, the fixes for:

 - https://bugzilla.redhat.com/show_bug.cgi?id=1761867 (paunch) (note that paunch in that version was tested too https://bugzilla.redhat.com/show_bug.cgi?id=1769209)
 - https://bugzilla.redhat.com/show_bug.cgi?id=1769291 (tht)

are included in the target repo.

Comment 2 Sofer Athlan-Guyot 2019-11-19 15:06:40 UTC
So, as described in the Launchpad bug, this boils down to container-puppet.py not being regenerated during the update. As a result, the fix is not in place at the stage of the update process where it is needed: after the update and host prep tasks, and just before the configuration is regenerated (common_deploy_steps_tasks).
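
One way to confirm this on a node is to check whether the deployed script still passes the old log driver. The sketch below is an assumption-heavy illustration: uses_json_file_driver and script_is_stale are hypothetical helpers, the default path is a guess based on the /var/lib/container-puppet/ directory seen in the failing command, and it assumes the stale script contains the json-file driver literally.

```python
import re

# Matches "--log-driver json-file" in CLI form, "=" form, or as adjacent
# items of a Python list such as ['--log-driver', 'json-file'].
STALE_DRIVER_RE = re.compile(r"--log-driver['\"]?\s*[=,\s]\s*['\"]?json-file")

# Assumed location of the deployed script; adjust for the actual node layout.
DEFAULT_SCRIPT = "/var/lib/container-puppet/container-puppet.py"


def uses_json_file_driver(script_text):
    """Return True if the text still requests the json-file log driver."""
    return STALE_DRIVER_RE.search(script_text) is not None


def script_is_stale(path=DEFAULT_SCRIPT):
    """Read the script at `path` and report whether it looks un-regenerated."""
    with open(path, encoding="utf-8") as handle:
        return uses_json_file_driver(handle.read())
```

If script_is_stale() returns True after the update and host prep tasks have run, the script was most likely not regenerated.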

Comment 5 Sofer Athlan-Guyot 2019-12-17 17:59:01 UTC
Hi,

The job http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-15-from-GA-HA_no_ceph-ipv4/7/ ran through the whole update from GA to RHOS_TRUNK-15.0-RHEL-8-20191212.n.0. It still fails, but for a different reason, at reboot. The stage that this review fixes runs successfully.

Comment 6 Alex McLeod 2020-02-19 12:48:57 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 8 errata-xmlrpc 2020-03-05 12:00:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0643