
Bug 1859173

Summary: [Ceph-ansible][Containers]: Rolling update fails due to monitor service failure while upgrading from RHCS4.1 to 4.1z1
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Ceph-Ansible
Version: 4.1
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: z1
Target Release: 4.1
Hardware: Unspecified
OS: Unspecified
Keywords: Regression
Reporter: Ameena Suhani S H <amsyedha>
Assignee: Dimitri Savineau <dsavinea>
QA Contact: Ameena Suhani S H <amsyedha>
CC: aschoen, ceph-eng-bugs, dsavinea, gabrioux, gmeno, kdreyer, nthomas, r.martinez, tserlin, vashastr, ykaul
Fixed In Version: ceph-ansible-4.0.25.1-1.el8cp, ceph-ansible-4.0.25.1-1.el7cp
Doc Type: If docs needed, set a value
Last Closed: 2020-08-04 18:48:43 UTC
Type: Bug
Attachments:
Ansible log

Description Ameena Suhani S H 2020-07-21 11:39:37 UTC
Created attachment 1701882 [details]
Ansible log

Description of problem:
Rolling update fails with the following error:
 fatal: [magna063]: FAILED! => changed=false 
  invocation:
    module_args:
      daemon_reexec: false
      daemon_reload: true
      enabled: true
      force: null
      masked: null
      name: ceph-mon@magna063
      no_block: false
      scope: null
      state: restarted
      user: null
  msg: |-
    Unable to restart service ceph-mon@magna063: Job for ceph-mon failed because the control process exited with error code.
    See "systemctl status ceph-mon" and "journalctl -xe" for details.


Version-Release number of selected component (if applicable):
ceph-ansible-4.0.25-1.el8cp.noarch
ansible-2.8.13-1.el8ae.noarch

How reproducible:
2/2

Steps to Reproduce:
1. Deploy a RHCS 4.1 containerized cluster.
2. Upgrade it to 4.1z1 with the rolling update playbook (see the example invocation below).
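
For reference, step 2 is normally driven by ceph-ansible's rolling update playbook. A typical invocation on the admin node (the inventory file name is an assumption, not taken from this run; -e ireallymeanit=yes skips the playbook's confirmation prompt):

# cd /usr/share/ceph-ansible
# ansible-playbook -vv -i hosts infrastructure-playbooks/rolling_update.yml -e ireallymeanit=yes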


Actual results:
The rolling update fails.

Expected results:
The rolling update should complete successfully.

Additional info:
# journalctl -xe


Jul 21 11:36:40 magna063 podman[67522]: cluster 2020-07-21 11:36:34.491378 mgr.magna062 (mgr.24174) 5447 : cluster [DBG] pgmap v5443: 152 pgs: 152 active+clean; 8.2 KiB data, 68 MiB used, 8.2 TiB / 8.2 TiB avail; 1.7 KiB/s rd, 1 op/s
Jul 21 11:36:40 magna063 podman[67522]: cluster 2020-07-21 11:36:36.492330 mgr.magna062 (mgr.24174) 5448 : cluster [DBG] pgmap v5444: 152 pgs: 152 active+clean; 8.2 KiB data, 68 MiB used, 8.2 TiB / 8.2 TiB avail; 2.5 KiB/s rd, 2 op/s
Jul 21 11:36:40 magna063 podman[67522]: cluster 2020-07-21 11:36:38.492784 mgr.magna062 (mgr.24174) 5449 : cluster [DBG] pgmap v5445: 152 pgs: 152 active+clean; 8.2 KiB data, 68 MiB used, 8.2 TiB / 8.2 TiB avail; 1.7 KiB/s rd, 1 op/s
Jul 21 11:36:40 magna063 podman[107077]: Error: error creating container storage: the container name "ceph-mon-magna063" is already in use by "6e16d65c24b37ec28a8ef3fd3bee29aa01d83f3fcafcb56fcc5cf57590f18409". You have to remove that con>
Jul 21 11:36:40 magna063 systemd[1]: ceph-mon: Control process exited, code=exited status=125
Jul 21 11:36:40 magna063 systemd[1]: ceph-mon: Failed with result 'exit-code'.
Jul 21 11:36:40 magna063 systemd[1]: Failed to start Ceph Monitor.
-- Subject: Unit ceph-mon has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit ceph-mon has failed.
-- 
-- The result is failed.
Jul 21 11:36:50 magna063 podman[67522]: cluster 2020-07-21 11:36:40.493752 mgr.magna062 (mgr.24174) 5450 : cluster [DBG] pgmap v5446: 152 pgs: 152 active+clean; 8.2 KiB data, 68 MiB used, 8.2 TiB / 8.2 TiB avail; 2.5 KiB/s rd, 2 op/s
Jul 21 11:36:50 magna063 podman[67522]: cluster 2020-07-21 11:36:42.494192 mgr.magna062 (mgr.24174) 5451 : cluster [DBG] pgmap v5447: 152 pgs: 152 active+clean; 8.2 KiB data, 68 MiB used, 8.2 TiB / 8.2 TiB avail; 1.7 KiB/s rd, 1 op/s
Jul 21 11:36:50 magna063 podman[67522]: cluster 2020-07-21 11:36:44.494633 mgr.magna062 (mgr.24174) 5452 : cluster [DBG] pgmap v5448: 152 pgs: 152 active+clean; 8.2 KiB data, 68 MiB used, 8.2 TiB / 8.2 TiB avail; 1.7 KiB/s rd, 1 op/s
Jul 21 11:36:50 magna063 podman[67522]: cluster 2020-07-21 11:36:46.495544 mgr.magna062 (mgr.24174) 5453 : cluster [DBG] pgmap v5449: 152 pgs: 152 active+clean; 8.2 KiB data, 68 MiB used, 8.2 TiB / 8.2 TiB avail; 2.5 KiB/s rd, 2 op/s
Jul 21 11:36:50 magna063 podman[67522]: cluster 2020-07-21 11:36:48.496055 mgr.magna062 (mgr.24174) 5454 : cluster [DBG] pgmap v5450: 152 pgs: 152 active+clean; 8.2 KiB data, 68 MiB used, 8.2 TiB / 8.2 TiB avail; 1.7 KiB/s rd, 1 op/s
Jul 21 11:36:50 magna063 systemd[1]: ceph-mon: Service RestartSec=10s expired, scheduling restart.
Jul 21 11:36:50 magna063 systemd[1]: ceph-mon: Scheduled restart job, restart counter is at 972.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Automatic restarting of the unit ceph-mon has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Jul 21 11:36:50 magna063 systemd[1]: Stopped Ceph Monitor.
-- Subject: Unit ceph-mon has finished shutting down
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit ceph-mon has finished shutting down.
Jul 21 11:36:50 magna063 systemd[1]: ceph-mon: Found left-over process 67522 (podman) in control group while starting unit. Ignoring.
Jul 21 11:36:50 magna063 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 11:36:50 magna063 systemd[1]: Starting Ceph Monitor...
-- Subject: Unit ceph-mon has begun start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit ceph-mon has begun starting up.
Jul 21 11:36:50 magna063 systemd[1]: ceph-mon: Found left-over process 67522 (podman) in control group while starting unit. Ignoring.
Jul 21 11:36:50 magna063 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jul 21 11:36:50 magna063 podman[107100]: Error: error creating container storage: the container name "ceph-mon-magna063" is already in use by "6e16d65c24b37ec28a8ef3fd3bee29aa01d83f3fcafcb56fcc5cf57590f18409". You have to remove that con>
Jul 21 11:36:50 magna063 systemd[1]: ceph-mon: Control process exited, code=exited status=125
Jul 21 11:36:50 magna063 systemd[1]: ceph-mon: Failed with result 'exit-code'.
Jul 21 11:36:50 magna063 systemd[1]: Failed to start Ceph Monitor.
-- Subject: Unit ceph-mon has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit ceph-mon has failed.
-- 
-- The result is failed.
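
The journal output above shows the actual failure: podman cannot create the monitor container because a stale ceph-mon-magna063 container from the previous run was never removed. As a manual workaround sketch on the affected host (container and unit names are taken from the log above; this does not replace the ceph-ansible fix), the stale container can be removed and the unit restarted:

# podman ps -a --filter name=ceph-mon-magna063
# podman rm -f ceph-mon-magna063
# systemctl restart ceph-mon@magna063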

Comment 12 Dimitri Savineau 2020-07-30 15:23:38 UTC
*** Bug 1861522 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2020-08-04 18:48:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3322