Bug 1774076 - OSP15 update fails during overcloud update with "No such log driver json-file"
Summary: OSP15 update fails during overcloud update with "No such log driver json-file"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z2
Target Release: 15.0 (Stein)
Assignee: Sofer Athlan-Guyot
QA Contact: Sofer Athlan-Guyot
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-19 14:07 UTC by Sofer Athlan-Guyot
Modified: 2020-03-05 12:01 UTC
CC List: 4 users

Fixed In Version: openstack-tripleo-heat-templates-10.6.2-0.20191125220521.97cc159.el8ost
Doc Type: Bug Fix
Doc Text:
Previously, there was a change to the log parameter in the podman interface that introduced an issue with tripleo-heat-templates, which caused updates to fail. With this update, the issue has been resolved and updates pass successfully.
Clone Of:
Environment:
Last Closed: 2020-03-05 12:00:58 UTC
Target Upstream Version:
Embargoed:



Links
 - Launchpad 1853156 (last updated 2019-11-19 15:06:39 UTC)
 - OpenStack gerrit 695861, MERGED: "Make sure we apply all deploy step-0 during update." (last updated 2020-03-02 13:47:55 UTC)
 - Red Hat Product Errata RHBA-2020:0643 (last updated 2020-03-05 12:01:33 UTC)

Description Sofer Athlan-Guyot 2019-11-19 14:07:56 UTC

Comment 1 Sofer Athlan-Guyot 2019-11-19 14:21:57 UTC
Here comes the description.

During an OSP15 update from 8.0/GA to 8.1/RHOS_TRUNK-15.0-RHEL-8-20191115.n.0, the update fails with:

2019-11-15 05:29:44 |         "2019-11-15 05:28:06,747 INFO: 178644 -- Pulling image: 192.168.24.1:8787/rh-osbs/rhosp15-openstack-glance-api:20191106.1",
2019-11-15 05:29:44 |         "2019-11-15 05:28:10,139 ERROR: 178645 -- ['/usr/bin/podman', 'run', '--user', 'root', '--name', 'container-puppet-haproxy', '--env', 'PUPPET_TAGS=file,file_line,concat,augeas,cron,haproxy_config', '--env', 'NAME=haproxy', '--env', 'HOSTNAME=controller-0', '--env', 'NO_ARCHIVE=', '--env', 'STEP=6', '--env', 'NET_HOST=true', '--log-driver', 'json-file', '--volume', '/etc/localtime:/etc/localtime:ro', '--volume', '/tmp/tmpkkpdgv7j:/etc/config.pp:ro', '--volume', '/etc/puppet/:/tmp/puppet-etc/:ro', '--volume', '/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro', '--volume', '/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro', '--volume', '/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro', '--volume', '/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro', '--volume', '/var/lib/config-data:/var/lib/config-data/:rw', '--volume', '/var/lib/container-puppet/puppetlabs/facter.conf:/etc/puppetlabs/facter/facter.conf:ro', '--volume', '/var/lib/container-puppet/puppetlabs/:/opt/puppetlabs/:ro', '--volume', '/dev/log:/dev/log:rw', '--log-opt', 'path=/var/log/containers/stdouts/container-puppet-haproxy.log', '--security-opt', 'label=disable', '--volume', '/usr/share/openstack-puppet/modules/:/usr/share/openstack-puppet/modules/:ro', '--volume', '/etc/pki/tls/private/overcloud_endpoint.pem:/etc/pki/tls/private/overcloud_endpoint.pem:ro', '--entrypoint', '/var/lib/container-puppet/container-puppet.sh', '--net', 'host', '--volume', '/etc/hosts:/etc/hosts:ro', '--volume', '/var/lib/container-puppet/container-puppet.sh:/var/lib/container-puppet/container-puppet.sh:ro', '192.168.24.1:8787/rh-osbs/rhosp15-openstack-haproxy:20191106.1'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         "Error: write child: broken pipe",
2019-11-15 05:29:44 |         " attempt(s): 1",
2019-11-15 05:29:44 |         "2019-11-15 05:28:10,140 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:16,325 ERROR: 178645 -- ['/usr/bin/podman', 'start', '-a', 'container-puppet-haproxy'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         "Error: unable to start container faae0f9449ae1fb795511196ceb47b3b7ea04b357d95088dbecde8508bf6622b: write child: broken pipe",
2019-11-15 05:29:44 |         " attempt(s): 2",
2019-11-15 05:29:44 |         "2019-11-15 05:28:16,326 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,166 ERROR: 178645 -- ['/usr/bin/podman', 'start', '-a', 'container-puppet-haproxy'] run failed after [conmon:e]: No such log driver json-file",
2019-11-15 05:29:44 |         " attempt(s): 3",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,167 WARNING: 178645 -- Retrying running container: haproxy",
2019-11-15 05:29:44 |         "2019-11-15 05:28:25,167 ERROR: 178645 -- Failed running container for haproxy",
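
Each failure above has the same shape: podman is asked for `--log-driver json-file` and conmon rejects that driver. As an illustration only (this is not TripleO or paunch code), the Python sketch below extracts the `--log-driver` value from an argv list like the logged one and flags it when it falls outside a set of drivers assumed to be available. `SUPPORTED_DRIVERS`, `log_driver` and `unsupported_log_driver` are hypothetical names, and the contents of `SUPPORTED_DRIVERS` are an assumption to check against the podman documentation for the installed version.

```python
from typing import List, Optional

# Assumption: log drivers accepted by the podman build on the overcloud node.
# k8s-file and journald are podman log drivers; whether json-file is accepted
# depends on the podman version, which is what this bug ran into.
SUPPORTED_DRIVERS = {"k8s-file", "journald"}


def log_driver(argv: List[str]) -> Optional[str]:
    """Return the value given to --log-driver, or None if the flag is absent."""
    for i, arg in enumerate(argv):
        # Two-token form, as in the logged command: '--log-driver', 'json-file'
        if arg == "--log-driver" and i + 1 < len(argv):
            return argv[i + 1]
        # Single-token form: '--log-driver=json-file'
        if arg.startswith("--log-driver="):
            return arg.split("=", 1)[1]
    return None


def unsupported_log_driver(argv: List[str]) -> Optional[str]:
    """Return the requested driver if it is not in SUPPORTED_DRIVERS, else None."""
    driver = log_driver(argv)
    if driver is not None and driver not in SUPPORTED_DRIVERS:
        return driver
    return None
```

With the argv from the `podman run` line above, `unsupported_log_driver` returns `'json-file'`. The `podman start -a` retries carry no `--log-driver` flag, so `log_driver` returns `None` for them; the driver was recorded when the container was created by the initial `podman run`, which is consistent with the retries failing with the same error.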

We got similar errors for other containers:

(undercloud) [stack@undercloud-0 ~]$ grep 'start.*run failed after \[conmon:e\]: No such log driver json-file' overcloud_update_run_Controller.log  | cut -d' ' -f20-20 | cut -d- -f-3 | sort -u
'container-puppet-aodh
'container-puppet-ceilometer
'container-puppet-cinder
'container-puppet-clustercheck']
'container-puppet-crond
'container-puppet-glance_api
'container-puppet-gnocchi
'container-puppet-haproxy']
'container-puppet-heat
'container-puppet-heat_api
'container-puppet-heat_api_cfn
'container-puppet-horizon
'container-puppet-iscsid
'container-puppet-keystone
'container-puppet-memcached
'container-puppet-mysql']
'container-puppet-neutron
'container-puppet-nova
'container-puppet-nova_metadata
'container-puppet-nova_placement
'container-puppet-ovn_controller
'container-puppet-panko
'container-puppet-rabbitmq']
'container-puppet-redis']
'container-puppet-swift
'container-puppet-swift_ringbuilder
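
The `grep | cut` pipeline above takes the 20th space-separated field and keeps the first three dash-separated parts, so quoting from the logged argv leaks into the output (some names keep a trailing `']`), and its `start.*` pattern targets the `podman start -a` retry lines. A hypothetical Python equivalent, assuming one log record per line as in `overcloud_update_run_Controller.log`, matches the quoted `container-puppet-*` argv element on every line that carries the error:

```python
import re
from typing import Iterable, List

# Error text reported in this bug.
ERROR_MARKER = "No such log driver json-file"

# The container name appears as a quoted argv element, e.g. 'container-puppet-haproxy'.
# Paths such as '/var/lib/container-puppet/container-puppet.sh' do not match,
# because the name must directly follow a quote character.
NAME_RE = re.compile(r"'(container-puppet-[\w-]+)'")


def failing_containers(lines: Iterable[str]) -> List[str]:
    """Return sorted, de-duplicated container names from lines with the error."""
    names = set()
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        # The first quoted container-puppet-* element is the --name value for
        # 'podman run' and the container argument for 'podman start -a'.
        match = NAME_RE.search(line)
        if match:
            names.add(match.group(1))
    return sorted(names)
```

For example, iterating over `failing_containers(open("overcloud_update_run_Controller.log"))` yields clean names such as `container-puppet-haproxy`. Because it keys on the error text rather than on `start.*`, it also covers the initial `podman run` attempts.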


This is a known issue that has already been fixed, but the fix does not take effect during the update, even though both of the following fixes are included in the target repo:

 - https://bugzilla.redhat.com/show_bug.cgi?id=1761867 (paunch); that paunch version was also tested, see https://bugzilla.redhat.com/show_bug.cgi?id=1769209
 - https://bugzilla.redhat.com/show_bug.cgi?id=1769291 (tht)

Comment 2 Sofer Athlan-Guyot 2019-11-19 15:06:40 UTC
As described in the Launchpad bug, the root cause is that container-puppet.py is not regenerated during the update. As a result, the fix is not yet in place at the stage of the update where it is needed: after the update and host prep tasks, just before the configuration is regenerated (common_deploy_steps_tasks).

Comment 5 Sofer Athlan-Guyot 2019-12-17 17:59:01 UTC
Hi,

This job, http://staging-jenkins2-qe-playground.usersys.redhat.com/job/DFG-upgrades-updates-15-from-GA-HA_no_ceph-ipv4/7/, ran through the whole update from GA to RHOS_TRUNK-15.0-RHEL-8-20191212.n.0. It fails at reboot, but for another reason; the stage addressed by this review's fix ran successfully.

Comment 6 Alex McLeod 2020-02-19 12:48:57 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 8 errata-xmlrpc 2020-03-05 12:00:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0643

