Bug 2007298

Summary: cephadm mds upgrade procedure is incomplete
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Patrick Donnelly <pdonnell>
Component: Cephadm
Assignee: Patrick Donnelly <pdonnell>
Status: CLOSED ERRATA
QA Contact: Amarnath <amk>
Severity: urgent
Docs Contact: Karen Norteman <knortema>
Priority: urgent
Version: 5.0
CC: rlepaksh, tserlin, vereddy
Target Milestone: ---
Keywords: Rebase
Target Release: 5.1
Hardware: All
OS: All
Fixed In Version: ceph-16.2.7-4.el8cp
Doc Type: If docs needed, set a value
Last Closed: 2022-04-04 10:21:43 UTC
Type: Bug

Description Patrick Donnelly 2021-09-23 14:13:59 UTC
Description of problem:

See: https://tracker.ceph.com/issues/52654

In particular, standby_replay is not disabled, and the MDS pre-upgrade procedure is not followed for minor releases (which will be the case for 5.0 -> 5.1).
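
For reference, the manual MDS pre-upgrade steps described in the upstream CephFS upgrade documentation can be sketched as a short command list. This is a minimal sketch, not cephadm's implementation: the helper name `mds_pre_upgrade_commands` is hypothetical, and the function only builds the command lines without running them.

```python
from typing import List


def mds_pre_upgrade_commands(fs_name: str) -> List[List[str]]:
    """Build the per-file-system commands to run before upgrading MDS daemons.

    Hypothetical helper for illustration; it does not execute anything.
    """
    return [
        # Disable standby-replay so no standby-replay daemons follow a rank.
        ["ceph", "fs", "set", fs_name, "allow_standby_replay", "false"],
        # Reduce the number of active ranks to 1.
        ["ceph", "fs", "set", fs_name, "max_mds", "1"],
        # Check status repeatedly until only rank 0 remains active.
        ["ceph", "status"],
    ]


if __name__ == "__main__":
    for cmd in mds_pre_upgrade_commands("cephfs"):
        print(" ".join(cmd))
```

After the MDS daemons are upgraded, the original `max_mds` and `allow_standby_replay` values would be restored with the same `ceph fs set` command.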

Comment 1 Sebastian Wagner 2021-11-04 16:20:58 UTC
https://github.com/ceph/ceph/pull/43800 is related, right?

Comment 2 Patrick Donnelly 2021-11-05 19:05:31 UTC
Yes, it is.

Comment 7 Amarnath 2022-01-11 09:02:25 UTC
Hi @pdonnell,

cephadm itself already performs the steps given in the doc below:

https://docs.ceph.com/en/pacific/cephfs/upgrading/

I had 1 file system (cephfs) with 2 active MDS daemons and 1 standby.

I triggered the upgrade without first scaling down the MDS daemons.

Do I need to validate anything else as part of this BZ?

cephadm watch log snippet:
2022-01-11T03:26:08.351387-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Scaling down filesystem cephfs
2022-01-11T03:26:12.400376-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node1-installer: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:12.400467-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node3: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:12.400515-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node2: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:12.434772-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Waiting for fs cephfs to scale down to reach 1 MDS
2022-01-11T03:26:25.446514-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node1-installer: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:25.446563-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node3: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:25.446602-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node2: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:25.466194-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Waiting for fs cephfs to scale down to reach 1 MDS
2022-01-11T03:26:38.376810-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node1-installer: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:38.376859-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node3: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:38.376898-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node2: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:26:38.403195-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: It appears safe to stop mds.cephfs.ceph-amk5-0-x4iy0t-node4.oznwkx
2022-01-11T03:26:38.851637-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Pulling registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:160faad614cca1a59e4277cb2390fe6f06590e8a2a30a9dcdcda066c5b13ce42 on ceph-amk5-0-x4iy0t-node4
2022-01-11T03:27:07.768027-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Updating mds.cephfs.ceph-amk5-0-x4iy0t-node4.oznwkx
2022-01-11T03:27:07.797611-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Deploying daemon mds.cephfs.ceph-amk5-0-x4iy0t-node4.oznwkx on ceph-amk5-0-x4iy0t-node4
2022-01-11T03:27:15.749383-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node1-installer: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:27:15.749453-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node3: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:27:15.749496-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node2: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:27:15.774618-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: It appears safe to stop mds.cephfs.ceph-amk5-0-x4iy0t-node5.oqwcen
2022-01-11T03:27:17.221240-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Updating mds.cephfs.ceph-amk5-0-x4iy0t-node5.oqwcen
2022-01-11T03:27:17.252551-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Deploying daemon mds.cephfs.ceph-amk5-0-x4iy0t-node5.oqwcen on ceph-amk5-0-x4iy0t-node5
2022-01-11T03:27:29.348173-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node1-installer: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:27:29.348330-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node3: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:27:29.348389-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node2: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:27:29.382209-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: It appears safe to stop mds.cephfs.ceph-amk5-0-x4iy0t-node6.qtibap
2022-01-11T03:27:29.949726-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Pulling registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:160faad614cca1a59e4277cb2390fe6f06590e8a2a30a9dcdcda066c5b13ce42 on ceph-amk5-0-x4iy0t-node6
2022-01-11T03:28:06.548533-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Updating mds.cephfs.ceph-amk5-0-x4iy0t-node6.qtibap
2022-01-11T03:28:06.581508-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Deploying daemon mds.cephfs.ceph-amk5-0-x4iy0t-node6.qtibap on ceph-amk5-0-x4iy0t-node6
2022-01-11T03:28:12.053903-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node1-installer: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:28:12.054020-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node3: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:28:12.054068-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Filtered out host ceph-amk5-0-x4iy0t-node2: does not belong to mon public_network (10.0.208.0/22)
2022-01-11T03:28:12.079396-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Setting container_image for all mds
2022-01-11T03:28:12.165909-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Scaling up filesystem cephfs max_mds to 2
2022-01-11T03:28:13.221406-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Setting container_image for all rgw
2022-01-11T03:28:13.244813-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Setting container_image for all rbd-mirror
2022-01-11T03:28:13.272128-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Setting container_image for all iscsi
2022-01-11T03:28:13.321711-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Setting container_image for all nfs
2022-01-11T03:28:13.353968-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Finalizing container_image settings
2022-01-11T03:28:13.638490-0500 mgr.ceph-amk5-0-x4iy0t-node1-installer.alzftf [INF] Upgrade: Complete!
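
To check the order of MDS upgrade steps in a log like the one above without the repeated "Filtered out host" entries, the "Upgrade:" messages can be extracted on their own. The following is a minimal sketch; the function name `upgrade_steps` and the regular expression are assumptions based only on the line layout visible in this snippet (timestamp, mgr name, level in brackets, message).

```python
import re
from typing import List

# Assumed layout: "<timestamp> <mgr name> [<LEVEL>] <message>"
LOG_LINE = re.compile(r"^(\S+) (\S+) \[(\w+)\] (.*)$")


def upgrade_steps(log_text: str) -> List[str]:
    """Return the messages of all 'Upgrade:' log entries, in order."""
    steps = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        message = match.group(4)
        if message.startswith("Upgrade:"):
            steps.append(message[len("Upgrade:"):].strip())
    return steps
```

Applied to the snippet above, the first extracted step is "Scaling down filesystem cephfs" and the step "Scaling up filesystem cephfs max_mds to 2" appears only after all three MDS daemons have been updated.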

Ceph version before upgrade:
[root@ceph-amk5-0-x4iy0t-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.0-150.el8cp (a95dd77b7d968852ee415f1b7dfe08c89414c96a) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.0-150.el8cp (a95dd77b7d968852ee415f1b7dfe08c89414c96a) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.0-150.el8cp (a95dd77b7d968852ee415f1b7dfe08c89414c96a) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.0-150.el8cp (a95dd77b7d968852ee415f1b7dfe08c89414c96a) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.0-150.el8cp (a95dd77b7d968852ee415f1b7dfe08c89414c96a) pacific (stable)": 20
    }
}
[root@ceph-amk5-0-x4iy0t-node7 ~]# ceph -s
  cluster:
    id:     1e55c34c-72ad-11ec-8aa5-fa163e0355b1
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-amk5-0-x4iy0t-node1-installer,ceph-amk5-0-x4iy0t-node3,ceph-amk5-0-x4iy0t-node2 (age 67m)
    mgr: ceph-amk5-0-x4iy0t-node1-installer.alzftf(active, since 70m), standbys: ceph-amk5-0-x4iy0t-node2.sdlbaf
    mds: 2/2 daemons up, 1 standby
    osd: 12 osds: 12 up (since 65m), 12 in (since 66m)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 41 objects, 3.6 KiB
    usage:   70 MiB used, 180 GiB / 180 GiB avail
    pgs:     65 active+clean

Ceph version after upgrade:

[root@ceph-amk5-0-x4iy0t-node7 ~]# ceph versions
{
    "mon": {
        "ceph version 16.2.7-18.el8cp (86cfa49b08da370afb3f98be618f2c3c1eae71fb) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.7-18.el8cp (86cfa49b08da370afb3f98be618f2c3c1eae71fb) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.7-18.el8cp (86cfa49b08da370afb3f98be618f2c3c1eae71fb) pacific (stable)": 12
    },
    "mds": {
        "ceph version 16.2.7-18.el8cp (86cfa49b08da370afb3f98be618f2c3c1eae71fb) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.7-18.el8cp (86cfa49b08da370afb3f98be618f2c3c1eae71fb) pacific (stable)": 20
    }
}
[root@ceph-amk5-0-x4iy0t-node7 ~]# ceph -s
  cluster:
    id:     1e55c34c-72ad-11ec-8aa5-fa163e0355b1
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph-amk5-0-x4iy0t-node1-installer,ceph-amk5-0-x4iy0t-node3,ceph-amk5-0-x4iy0t-node2 (age 108s)
    mgr: ceph-amk5-0-x4iy0t-node1-installer.alzftf(active, since 24m), standbys: ceph-amk5-0-x4iy0t-node2.sdlbaf
    mds: 2/2 daemons up, 1 standby
    osd: 12 osds: 12 up (since 21m), 12 in (since 94m)
 
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 41 objects, 7.5 KiB
    usage:   89 MiB used, 180 GiB / 180 GiB avail
    pgs:     65 active+clean
 
[root@ceph-amk5-0-x4iy0t-node7 ~]#
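
The before/after comparison above can also be checked programmatically. The sketch below assumes the JSON comes from `ceph versions` in the shape shown in this comment; the function names `versions_in_use` and `upgrade_is_uniform` are hypothetical.

```python
import json
from typing import Dict, Set


def versions_in_use(ceph_versions_json: str) -> Set[str]:
    """Collect the distinct version strings reported per daemon type."""
    report: Dict[str, Dict[str, int]] = json.loads(ceph_versions_json)
    found: Set[str] = set()
    for daemon_type, counts in report.items():
        if daemon_type == "overall":
            continue  # "overall" only aggregates the per-type entries
        found.update(counts.keys())
    return found


def upgrade_is_uniform(ceph_versions_json: str) -> bool:
    """True when every daemon type reports exactly the same single version."""
    return len(versions_in_use(ceph_versions_json)) == 1
```

A uniform result only shows that all daemons run the same build; it does not show that the MDS pre-upgrade steps were followed, which is what the log snippet is for.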

Comment 11 errata-xmlrpc 2022-04-04 10:21:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174