Bug 1911620 - [RHOSP13]Regression blocks minor upgrades for overclouds with TripleO-managed Ceph clusters
Keywords:
Status: CLOSED DUPLICATE of bug 1910842
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-30 12:33 UTC by Alex Stupnikov
Modified: 2024-06-13 23:54 UTC

Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-07 14:01:09 UTC
Target Upstream Version:
Embargoed:


Links
System                             ID         Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker              OSP-32302  0        None      None    None     2024-06-13 23:54:18 UTC
Red Hat Knowledge Base (Solution)  5674631    0        None      None    None     2020-12-30 12:38:47 UTC

Description Alex Stupnikov 2020-12-30 12:33:52 UTC
Description of problem:

A patch for bug #1877815 blocks the minor upgrade procedure for RHOSP 13 clusters with TripleO-controlled Ceph when a new docker RPM is available.

A customer reported that the minor upgrade procedure fails with error [1] on one of the controller nodes. The workaround is simple: manually start the appropriate services and repeat the minor upgrade procedure. The next run then fails on the next controller node.

As a result, this problem is not a complete blocker, but it requires four executions of the "openstack overcloud update run" command.

The customer provided sosreports from the affected controller node and the director node, along with a complete set of mistral logs. From the logs, it looks like the ceph-mon container (and other docker containers) was originally stopped by the "Stop docker" play. As a result, "Double check the mon systemd unit is not consistent with the current mon" finds that the ceph-mon unit is not running, and ansible runs the "Stop mons to make them consistent with systemd" play, which fails because there is no such container.
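
To illustrate the failure mode described above, here is a minimal Python sketch of a stop operation that tolerates a container that is already gone. It is not the fix from bug 1910842 and not how ceph-ansible or tripleo-heat-templates implement this step. The names `stop_container_if_present`, `run_command` and `Runner` are hypothetical. The only facts taken from this report are the `docker stop <name>` command and the "No such container" message in stderr.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes an argv list and returns (exit code, stderr text).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_command(argv: List[str]) -> Tuple[int, str]:
    """Run a command and return its exit code and captured stderr."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def stop_container_if_present(name: str, runner: Runner = run_command) -> str:
    """Stop a docker container, treating a missing container as success.

    Returns "stopped" if docker stopped it, "absent" if docker reported
    "No such container", and raises RuntimeError for any other failure.
    """
    rc, stderr = runner(["docker", "stop", name])
    if rc == 0:
        return "stopped"
    # Assumption: the daemon reports a missing container with this text,
    # as seen in log line [1] of this report.
    if "No such container" in stderr:
        return "absent"
    raise RuntimeError(f"docker stop {name} failed (rc={rc}): {stderr.strip()}")
```

With a check like this, a container that was already removed by an earlier step would be reported as "absent" instead of aborting the run with rc 1. The `runner` parameter exists only so the logic can be exercised without a docker daemon.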

Again, the full set of logs is available in the attachments of the provided case.

[1]
  2020-12-29 16:10:52,071 p=12272 u=mistral |  fatal: [controller01]: FAILED! => {"changed": true, "cmd": "docker stop ceph-mon-controller01", "delta": "0:00:00.032422", "end": "2020-12-29 16:10:52.048778", "msg": "non-zero return code", "rc": 1, "start": "2020-12-29 16:10:52.016356", "stderr": "Error response from daemon: No such container: ceph-mon-controller01", "stderr_lines": ["Error response from daemon: No such container: ceph-mon-controller01"], "stdout": "", "stdout_lines": []}
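
When scanning a large set of mistral logs for this failure, it may help to pull the host, command, return code and stderr out of lines like [1]. The following Python sketch does that under two assumptions: the line has the `fatal: [host]: FAILED! => {...}` shape shown above, and the trailing payload is valid JSON, as it is in [1]. The function name `parse_fatal_line` and the `SAMPLE` constant are hypothetical. `SAMPLE` is abridged from line [1] and omits the timing and `*_lines` fields.

```python
import json
import re
from typing import Any, Dict, Optional

# Matches the host name and the JSON payload of an ansible "fatal" log line.
FATAL_RE = re.compile(
    r"fatal: \[(?P<host>[^\]]+)\]: FAILED! => (?P<payload>\{.*\})\s*$"
)


def parse_fatal_line(line: str) -> Optional[Dict[str, Any]]:
    """Return host, cmd, rc and stderr from a fatal line, or None if no match."""
    match = FATAL_RE.search(line)
    if match is None:
        return None
    result = json.loads(match.group("payload"))
    return {
        "host": match.group("host"),
        "cmd": result.get("cmd"),
        "rc": result.get("rc"),
        "stderr": result.get("stderr", ""),
    }


# Abridged copy of log line [1] from this report.
SAMPLE = (
    '2020-12-29 16:10:52,071 p=12272 u=mistral |  fatal: [controller01]: '
    'FAILED! => {"changed": true, "cmd": "docker stop ceph-mon-controller01", '
    '"rc": 1, "stderr": "Error response from daemon: No such container: '
    'ceph-mon-controller01"}'
)

info = parse_fatal_line(SAMPLE)
print(info["host"], info["cmd"], info["rc"])
```

Lines whose payload is not valid JSON would raise `json.JSONDecodeError` here, so real log scanning would likely need to catch that case.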

Comment 4 John Fulton 2021-01-07 13:24:43 UTC
Please use https://access.redhat.com/solutions/5679791 to work around this issue.
This looks like a duplicate of bug 1910842.

