Bug 2007516 - [RBD]ISCSI- Ceph cluster goes to error state after performing multiple removal and deployment of ISCSI
Summary: [RBD]ISCSI- Ceph cluster goes to error state after performing multiple removal and deployment of ISCSI
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 5.1
Assignee: Melissa Li
QA Contact: Preethi
URL:
Whiteboard:
Duplicates: 2034789 2049006
Depends On:
Blocks:
 
Reported: 2021-09-24 06:31 UTC by Preethi
Modified: 2022-12-07 17:15 UTC
CC: 11 users

Fixed In Version: ceph-16.2.7-67.el8cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-04 10:21:43 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 53706 0 None None None 2022-01-05 16:52:37 UTC
Red Hat Issue Tracker RHCEPH-1878 0 None None None 2021-09-24 06:35:03 UTC
Red Hat Knowledge Base (Solution) 6726461 0 None None None 2022-02-11 13:54:29 UTC
Red Hat Product Errata RHSA-2022:1174 0 None None None 2022-04-04 10:22:07 UTC

Internal Links: 2034789

Description Preethi 2021-09-24 06:31:10 UTC
Description of problem: The Ceph cluster goes into an error state after performing multiple removals and redeployments of the iSCSI service:
-  health: HEALTH_ERR
            Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'ceph-pnataraj-7ypsv7-node3' does not exist retval: -2



Version-Release number of selected component (if applicable):

[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# ceph version
ceph version 16.2.0-79.el8cp (63c5c96018da6d39383c8f5ae534a0d1523fc274) pacific (stable)

How reproducible:


Steps to Reproduce:
1. Deploy a 5.0 cluster with mgr, mon, and osd services
2. Create a pool and deploy iSCSI with 4 gateways
3. Check "ceph orch ls" for the service status
4. Remove the iSCSI service and redeploy it 2-3 times
5. Check the cluster health and "ceph orch ls"

I copied the keyring, ceph.conf, and the cephadm binary to the primary gateway node, then performed the removal of iSCSI from the gateway node, and vice versa from the bootstrap node.
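
A rough sketch of the remove/redeploy loop from steps 4-5, using only commands that appear in the transcripts below; the spec file from "Additional info" is assumed to be saved as iscsi.yaml (the filename is illustrative):

ceph orch apply -i iscsi.yaml    # deploy the iscsi.iscsi service with 4 gateways
ceph orch ls                     # wait until iscsi.iscsi reports 4/4 running
ceph orch rm iscsi.iscsi         # remove the service
ceph orch apply -i iscsi.yaml    # redeploy; repeat the rm/apply cycle 2-3 times
ceph orch ls                     # iscsi.iscsi can get stuck in <deleting>
ceph status                      # cluster reports HEALTH_ERR with the cephadm module failure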

Actual results:
Seeing the below error in "ceph status", with the iscsi service stuck in <deleting> in "ceph orch ls":
[root@ceph-pnataraj-7ypsv7-node1-installer cephuser]# cephadm shell
Inferring fsid f64f341c-655d-11eb-8778-fa163e914bcc
Inferring config /var/lib/ceph/f64f341c-655d-11eb-8778-fa163e914bcc/mon.ceph-pnataraj-7ypsv7-node1-installer/config
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:b37e99428d2304e11982d192a2a948526dd19c2196685dce656f205f3400de27
[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# ceph status
  cluster:
    id:     f64f341c-655d-11eb-8778-fa163e914bcc
    health: HEALTH_ERR
            Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'ceph-pnataraj-7ypsv7-node3' does not exist retval: -2
 
  services:
    mon: 3 daemons, quorum ceph-pnataraj-7ypsv7-node1-installer,ceph-pnataraj-7ypsv7-node6,ceph-pnataraj-7ypsv7-node2 (age 4h)
    mgr: ceph-pnataraj-7ypsv7-node1-installer.jxhifn(active, since 6d), standbys: ceph-pnataraj-7ypsv7-node2.gzykir
    osd: 12 osds: 12 up (since 6d), 12 in (since 6d)
    rgw: 2 daemons active (2 hosts, 1 zones)
 
  data:
    pools:   9 pools, 233 pgs
    objects: 631 objects, 1.5 GiB
    usage:   13 GiB used, 167 GiB / 180 GiB avail
    pgs:     233 active+clean
 
  io:
    client:   2.5 KiB/s rd, 2 op/s rd, 0 op/s wr
 
[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# ceph orch ls
NAME                       RUNNING  REFRESHED   AGE  PLACEMENT                                                                                                    
alertmanager                   2/2  19h ago     6d   ceph-pnataraj-7ypsv7-node1-installer;ceph-pnataraj-7ypsv7-node2                                              
grafana                        1/1  19h ago     6d   ceph-pnataraj-7ypsv7-node1-installer                                                                         
iscsi.iscsi                    3/4  <deleting>  19h  ceph-pnataraj-7ypsv7-node3;ceph-pnataraj-7ypsv7-node4;ceph-pnataraj-7ypsv7-node5;ceph-pnataraj-7ypsv7-node8  
mgr                            2/2  19h ago     22h  ceph-pnataraj-7ypsv7-node1-installer;ceph-pnataraj-7ypsv7-node2;count:2                                      
mon                            3/3  19h ago     6d   label:mon                                                                                                    
node-exporter                  8/8  19h ago     6d   *                                                                                                            
osd.all-available-devices    12/20  19h ago     6d   *                                                                                                            
prometheus                     1/1  19h ago     6d   ceph-pnataraj-7ypsv7-node1-installer                                                                         
rgw.foo                        2/2  19h ago     6d   ceph-pnataraj-7ypsv7-node6;ceph-pnataraj-7ypsv7-node7;count:2                                                
[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# ceph version
ceph version 16.2.0-79.el8cp (63c5c96018da6d39383c8f5ae534a0d1523fc274) pacific (stable)
[ceph: root@ceph-pnataraj-7ypsv7-node1-installer /]# 


Node details:
10.0.209.88 cephuser@cephuser
root@q

No errors were noticed in the mgr logs.

NOTE: We also noticed the below error while adding the iSCSI gateway; because of that issue we removed the service and redeployed iSCSI, which is when we hit the error described above.

/iscsi-target...-igw/gateways> create ceph-gw-1 10.0.210.8
The first gateway defined must be the local machine
/iscsi-target...-igw/gateways> create ceph-gw-1 10.0.209.227
The first gateway defined must be the local machine
/iscsi-target...-igw/gateways>

Expected results: 
The cluster should not enter an error state, regardless of how many times the iSCSI service is removed and redeployed.

Additional info:

ISCSI spec file for reference:
service_type: iscsi
service_id: iscsi
placement:
  hosts:
   - ceph-pnataraj-7ypsv7-node3
   - ceph-pnataraj-7ypsv7-node4
   - ceph-pnataraj-7ypsv7-node5
   - ceph-pnataraj-7ypsv7-node8
spec:
  pool: iscsi
  trusted_ip_list: "10.0.210.8,10.0.209.227,10.0.210.191,10.0.211.111"
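
For context, a placement spec like this would normally be applied through the orchestrator, e.g. (assuming the spec above is saved as iscsi-spec.yaml; the filename is illustrative):

ceph orch apply -i iscsi-spec.yaml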

Comment 1 Preethi 2021-09-24 06:37:18 UTC
http://pastebin.test.redhat.com/996372 - mgr logs snippet.

Comment 2 Daniel Pivonka 2021-10-07 20:47:42 UTC
upstream tracker: https://tracker.ceph.com/issues/52866
upstream master pr: https://github.com/ceph/ceph/pull/43454

Comment 8 Preethi 2021-12-22 12:09:27 UTC
The issue is still seen with the latest build, ceph version 16.2.7-9.el8cp.




[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch apply iscsi test1 --placement="ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node5" --trusted_ip_list="10.0.211.165,10.0.209.32" admin admin
Scheduled iscsi.test1 update...
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT                                                        
iscsi.test1                           0/2  -          5s   ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5  
mgr                                   1/1  108s ago   1h   ceph-ci-lfir5-kmnh9c-node1-installer                             
mon                                   3/3  108s ago   1h   label:mon                                                        
osd.all-available-devices              16  108s ago   1h   *                                                                
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT                                                        
iscsi.test1                           2/2  -          10s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5  
mgr                                   1/1  112s ago   1h   ceph-ci-lfir5-kmnh9c-node1-installer                             
mon                                   3/3  112s ago   1h   label:mon                                                        
osd.all-available-devices              16  112s ago   1h   *                                                                
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch apply iscsi test1 --placement="ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node5" --trusted_ip_list="10.0.211.165,10.0.209.32" admin admin^C
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch rm iscsi.test1
Removed service iscsi.test1
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED  AGE  PLACEMENT                             
mgr                                   1/1  21s ago    1h   ceph-ci-lfir5-kmnh9c-node1-installer  
mon                                   3/3  32s ago    1h   label:mon                             
osd.all-available-devices              16  73s ago    1h   *                                     
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch apply iscsi test1 --placement="ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node5" --trusted_ip_list="10.0.211.165,10.0.209.32" admin admin
Scheduled iscsi.test1 update...
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch rm iscsi.test1
Removed service iscsi.test1
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT                                                        
iscsi.test1                           1/2  <deleting>  17s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5  
mgr                                   1/1  8s ago      1h   ceph-ci-lfir5-kmnh9c-node1-installer                             
mon                                   3/3  56s ago     1h   label:mon                                                        
osd.all-available-devices              16  97s ago     1h   *                                                                
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT                                                        
iscsi.test1                           1/2  <deleting>  23s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5  
mgr                                   1/1  15s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer                             
mon                                   3/3  62s ago     1h   label:mon                                                        
osd.all-available-devices              16  104s ago    1h   *                                                                
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT                                                        
iscsi.test1                           1/2  <deleting>  25s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5  
mgr                                   1/1  16s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer                             
mon                                   3/3  64s ago     1h   label:mon                                                        
osd.all-available-devices              16  106s ago    1h   *                                                                
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT                                                        
iscsi.test1                           1/2  <deleting>  27s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5  
mgr                                   1/1  18s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer                             
mon                                   3/3  66s ago     1h   label:mon                                                        
osd.all-available-devices              16  107s ago    1h   *                                                                
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# 
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# 
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT                                                        
iscsi.test1                           1/2  <deleting>  48s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5  
mgr                                   1/1  40s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer                             
mon                                   3/3  87s ago     1h   label:mon                                                        
osd.all-available-devices              16  2m ago      1h   *                                                                
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph orch ls
NAME                       PORTS  RUNNING  REFRESHED   AGE  PLACEMENT                                                        
iscsi.test1                           1/2  <deleting>  56s  ceph-ci-lfir5-kmnh9c-node1-installer;ceph-ci-lfir5-kmnh9c-node5  
mgr                                   1/1  48s ago     1h   ceph-ci-lfir5-kmnh9c-node1-installer                             
mon                                   3/3  95s ago     1h   label:mon                                                        
osd.all-available-devices              16  2m ago      1h   *                                                                
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]# ceph status
  cluster:
    id:     f64f341c-655d-11eb-8778-fa163e914bcc
    health: HEALTH_ERR
            Module 'cephadm' has failed: dashboard iscsi-gateway-rm failed: iSCSI gateway 'ceph-ci-lfir5-kmnh9c-node1-installer' does not exist retval: -2
 
  services:
    mon: 3 daemons, quorum ceph-ci-lfir5-kmnh9c-node1-installer,ceph-ci-lfir5-kmnh9c-node2,ceph-ci-lfir5-kmnh9c-node6 (age 25h)
    mgr: ceph-ci-lfir5-kmnh9c-node1-installer.pnbxql(active, since 25h)
    osd: 16 osds: 16 up (since 25h), 16 in (since 25h)
 
  data:
    pools:   8 pools, 201 pgs
    objects: 204 objects, 6.0 KiB
    usage:   590 MiB used, 239 GiB / 240 GiB avail
    pgs:     201 active+clean
 
  io:
    client:   852 B/s rd, 0 op/s rd, 0 op/s wr
 
[ceph: root@ceph-ci-lfir5-kmnh9c-node1-installer /]#

Comment 9 Melissa Li 2022-01-18 22:52:01 UTC
This issue occurs if the iscsi service is removed before the iscsi gateway list is updated with the deployed daemons (i.e. error occurs if `ceph dashboard iscsi-gateway-list` is empty when `ceph orch rm iscsi.iscsi` is run). If enough time has passed so that the gateway list is populated, no error will occur when it's removed. 

This is the upstream PR for this: https://github.com/ceph/ceph/pull/44549
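
In other words, on affected builds a reasonable workaround sketch (not taken from the PR itself) is to confirm the dashboard gateway list is populated before removing the service:

ceph dashboard iscsi-gateway-list   # should list the deployed gateways; wait if it is empty
ceph orch rm iscsi.iscsi            # remove the service only once the list above is populated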

Comment 13 Preethi 2022-02-22 09:25:53 UTC
Working as expected. No errors seen after multiple removals and redeployments of iSCSI. Verified with the latest ceph version:

[ceph: root@magna031 /]# ceph orch ls
NAME               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT                                                        
alertmanager       ?:9093,9094      1/1  18s ago    5M   count:1                                                          
crash                               9/9  6m ago     5M   *                                                                
grafana            ?:3000           1/1  18s ago    5M   count:1                                                          
iscsi.iscsipool                     2/2  -          10s  magna031;magna006                                                
mds.remote                          2/2  5m ago     11d  depressa004.ceph.redhat.com;depressa005.ceph.redhat.com;count:2  
mgr                                 3/3  5m ago     5M   magna031;magna032;magna006;count:3                               
mon                                 3/3  5m ago     5M   magna031;magna032;magna006;count:3                               
node-exporter      ?:9100           9/9  6m ago     5M   *                                                                
osd                                  15  6m ago     -    <unmanaged>                                                      
osd.osd_with_nvme                    12  5m ago     4M   depressa00[4-6].ceph.redhat.com                                  
prometheus         ?:9095           1/1  18s ago    5M   count:1                                                          
rbd-mirror                          1/1  18s ago    5M   magna031                                                         
rgw.foo            ?:80             2/2  5m ago     4M   count:2                                                          
[ceph: root@magna031 /]# ceph status
  cluster:
    id:     d6e5c458-0f10-11ec-9663-002590fc25a4
    health: HEALTH_OK
 
  services:
    mon:        3 daemons, quorum magna031,magna032,magna006 (age 93m)
    mgr:        magna006.vxieja(active, since 94m), standbys: magna031.xqwypm, magna032.lzjsxg
    mds:        1/1 daemons up, 1 standby
    osd:        27 osds: 27 up (since 82m), 27 in (since 4M)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        2 daemons active (2 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   30 pools, 913 pgs
    objects: 151.67k objects, 574 GiB
    usage:   2.9 TiB used, 105 TiB / 108 TiB avail
    pgs:     913 active+clean
 
  io:
    client:   511 B/s rd, 85 B/s wr, 0 op/s rd, 0 op/s wr
 
[ceph: root@magna031 /]# 



[ceph: root@magna031 /]# ceph version
ceph version 16.2.7-67.el8cp (2ff107c73e8642c55c83296928b5102b785ff4e2) pacific (stable)
[ceph: root@magna031 /]#

Comment 14 Guillaume Abrioux 2022-02-23 08:15:55 UTC
*** Bug 2049006 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2022-04-04 10:21:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1174

Comment 17 Bipin Kunal 2022-04-07 05:54:05 UTC
*** Bug 2034789 has been marked as a duplicate of this bug. ***

