Bug 1728678
| Summary: | OSP 14->15: haproxy_init_bundle fails with "unable to get cib" | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jiri Stransky <jstransk> |
| Component: | openstack-tripleo-heat-templates | Assignee: | RHOS Maint <rhos-maint> |
| Status: | CLOSED WORKSFORME | QA Contact: | Sasha Smolyak <ssmolyak> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 15.0 (Stein) | CC: | bperkins, mburns |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-07-11 12:01:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1727807 | ||
Description
Jiri Stransky
2019-07-10 12:23:23 UTC
I was able to get this "minimal" reproducer outside the deployment/upgrade tooling. First, I check that the cluster is running on the node:

```
[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 2.0.1-4.el8_0.3-0eb7991564) - partition with quorum
Last updated: Thu Jul 11 08:57:25 2019
Last change: Wed Jul 10 10:20:34 2019 by root via cibadmin on controller-0

1 node configured
0 resources configured

Online: [ controller-0 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

Then I look at the command used for the haproxy_init_bundle container:

```
[root@controller-0 ~]# paunch debug --action print-cmd --file /var/lib/tripleo-config/container-startup-config-step_2.json --container haproxy_init_bundle
podman run --name haproxy_init_bundle-2l2qbc2v --conmon-pidfile=/var/run/haproxy_init_bundle.pid --env=TRIPLEO_DEPLOY_IDENTIFIER=1562750152 --net=host --ipc=host --privileged=true --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/var/lib/container-config-scripts/container_puppet_apply.sh:/container_puppet_apply.sh:ro --volume=/etc/puppet:/tmp/puppet-etc:ro --volume=/usr/share/openstack-puppet/modules:/usr/share/openstack-puppet/modules:ro --volume=/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-haproxy:latest /container_puppet_apply.sh 2 file,file_line,concat,augeas,pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ip,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation include ::tripleo::profile::base::pacemaker; include ::tripleo::profile::pacemaker::haproxy_bundle
```

And I edit the command to run just `pcs status` instead of Puppet:

```
[root@controller-0 ~]# podman run --rm -ti --name haproxy_init_bundle-test \
    --conmon-pidfile=/var/run/haproxy_init_bundle.pid \
    --env=TRIPLEO_DEPLOY_IDENTIFIER=1562750152 \
    --net=host --ipc=host --privileged=true --user=root \
    --volume=/etc/hosts:/etc/hosts:ro \
    --volume=/etc/localtime:/etc/localtime:ro \
    --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro \
    --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro \
    --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro \
    --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro \
    --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro \
    --volume=/dev/log:/dev/log \
    --volume=/var/lib/container-config-scripts/container_puppet_apply.sh:/container_puppet_apply.sh:ro \
    --volume=/etc/puppet:/tmp/puppet-etc:ro \
    --volume=/usr/share/openstack-puppet/modules:/usr/share/openstack-puppet/modules:ro \
    --volume=/etc/corosync/corosync.conf:/etc/corosync/corosync.conf:ro \
    brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhosp15/openstack-haproxy:latest \
    pcs status
Error: cluster is not currently running on this node
```

It reports that the cluster is not running on this node, but it is running there :).

Not sure if it is related, but the environment had a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1726680 applied: before running the upgrade, I ran `userdel hacluster` to force recreation of the user and re-authentication. The workaround itself worked: the hacluster user was recreated and the pcmk cluster was formed correctly, before the upgrade hit the issue with the haproxy and redis init containers.
This is likely caused by the workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1726680, which changes the hacluster UID/GID from the well-known values to "random" ones. We need a different workaround: just change the password instead of deleting the user. That still forces Puppet to refresh its resources without breaking the UID/GID expectations for the hacluster user. I'm testing this, and if it fixes the problem, I'll close this as WFM.

The new workarounds for bug 1726680 don't cause this issue. Closing.
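The UID/GID hypothesis can be checked on an affected node by comparing the hacluster entry on the host with the one baked into the container image. The sketch below is a hypothetical helper (not part of TripleO, paunch or pcs): it only compares the UID and primary GID fields of two `/etc/passwd`-format lines, such as the output of `getent passwd hacluster`.

```shell
# Hypothetical helper, not part of TripleO/paunch/pcs: compare the UID and
# primary GID of two /etc/passwd-format lines (host entry vs. container entry).
compare_passwd_ids() {
  host_line=$1        # e.g. "$(getent passwd hacluster)" run on the host
  container_line=$2   # e.g. the same lookup run inside the container image
  # In passwd format, field 3 is the UID and field 4 is the primary GID.
  host_ids=$(printf '%s\n' "$host_line" | cut -d: -f3,4)
  container_ids=$(printf '%s\n' "$container_line" | cut -d: -f3,4)
  if [ "$host_ids" = "$container_ids" ]; then
    echo "match uid:gid=$host_ids"
  else
    echo "mismatch host=$host_ids container=$container_ids"
  fi
}
```

As a usage example, assuming `IMAGE` holds the openstack-haproxy image name from the reproducer and that `getent` is available inside that image, `compare_passwd_ids "$(getent passwd hacluster)" "$(podman run --rm "$IMAGE" getent passwd hacluster)"` would print `mismatch ...` on a host where `userdel hacluster` led to the user being recreated with different IDs.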