Bug 2129252

Summary: cephadm `rm-cluster` fails to clean `/etc/ceph` directory
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Cephadm
Version: 5.3
Target Release: 8.0
Status: NEW
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: Vaibhav Mahajan <vamahaja>
Assignee: Redouane Kachach Elhichou <rkachach>
QA Contact: Vaibhav Mahajan <vamahaja>
CC: cephqe-warriors, rkachach, vereddy, vpapnoi

Description Vaibhav Mahajan 2022-09-23 05:16:33 UTC
Description of problem:
  The ceph.client.1.keyring and podman-auth.json files are still present in the `/etc/ceph` directory even after cleaning the cluster with the `cephadm rm-cluster` command.

Version-Release number of selected component (if applicable):

  $ rpm -qa cephadm
  cephadm-16.2.8-85.el8cp.noarch

  $ sudo cephadm shell -- ceph --version
  Using recent ceph image registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:31fbe18b6f81c53d21053a4a0897bc3875e8ee8ec424393e4d5c3c3afd388274
  ceph version 16.2.8-85.el8cp (0bdc6db9a80af40dd496b05674a938d406a9f6f5) pacific (stable)

How reproducible: Always


Steps to Reproduce:

  Deploy a cluster with the below configuration (a service spec sketch expressing this layout follows):
    4-node cluster (RHEL-8.6)
    3 MONS, 2 MDS, 1 MGR, 3 OSD and 2 RGW service daemon(s)
     Node1 - Mon, Mgr, Installer, OSD, alertmanager, grafana, prometheus, node-exporter
     Node2 - Mon, Mgr, OSD, MDS, RGW, alertmanager, node-exporter, haproxy
     Node3 - Mon, OSD, MDS, RGW, node-exporter, haproxy
     Node4 - RGW
     Node5 - Client
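
  For illustration, a service specification of the kind passed via the apply-spec option in
  step (1) below, expressing a layout like the one above, could contain entries along these
  lines (the file name, hostname, address and labels are hypothetical, not the spec used in
  this run):

    $ cat cluster-spec.yaml
    service_type: host
    hostname: node1
    addr: 10.0.0.11
    labels:
      - mon
      - mgr
      - osd
    ---
    service_type: mon
    placement:
      label: mon
    ---
    service_type: mgr
    placement:
      label: mgr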

  Execute below steps:
    (1) Bootstrap the cluster with the below options (a sample bootstrap invocation is sketched after these steps) -
        - skip-monitoring-stack
        - orphan-initial-daemons
        - fsid : f64f341c-655d-11eb-8778-fa163e914bcc
        - initial-dashboard-user: admin123
        - initial-dashboard-password: admin@123
        - registry-json: registry.json
        - apply-spec: <list of service specification containing multiple admin nodes, mon, mgr, osd and rgw deployment>
        - ssh-user: <ssh user name>
        - ssh-public-key: <path to the custom ssh public key file>
        - ssh-private-key: <path to the custom ssh private key file>
        - mon-ip: <monitor IP address: Required>
    (2) Copy the provided SSH keys to the nodes and add them to the cluster with address and role labels attached.
    (3) Deploy HA proxy service on node2 and node3 using apply spec option.
    (4) Configure the client node by adding ceph.conf and the keyring to it.
    (5) Set up the s3cmd tool and prepare for RGW IO on the client node.
    (6) Run IOs from the s3cmd tool for 20 minutes.
    (7) Kernel Mount:
        - Create the /mnt/cephfs directory and mount CephFS on the client node.
          sudo mount -t ceph 10.8.128.110:6789:/ /mnt/cephfs -o name=client.0,secret=<key>
        - Using the dd command, create files in the /mnt/cephfs directory.
    (8) Disable HAProxy and verify
    (9) Enable HAProxy and verify
    (10) Restart HAProxy and verify
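
  For reference, a bootstrap invocation matching the options in step (1) would look roughly
  as follows. The monitor IP, spec file and SSH user/key paths are placeholders, not values
  captured from this cluster:

    $ sudo cephadm bootstrap \
        --mon-ip <monitor-ip> \
        --fsid f64f341c-655d-11eb-8778-fa163e914bcc \
        --initial-dashboard-user admin123 \
        --initial-dashboard-password admin@123 \
        --registry-json registry.json \
        --apply-spec <service-spec.yaml> \
        --ssh-user <ssh-user> \
        --ssh-public-key <path-to-public-key> \
        --ssh-private-key <path-to-private-key> \
        --skip-monitoring-stack \
        --orphan-initial-daemons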

  Execute the below steps to clean up the cluster:
    (1) Disable cephadm module 
          $ cephadm shell -- ceph mgr module disable cephadm
    (2) Execute cephadm rm-cluster command with the fsid on each cluster node
          $ cephadm rm-cluster --fsid <cluster-fsid> --zap-osds --force
    (3) Validate `/etc/ceph`, `/var/lib/ceph`, `/var/lib/cephadm` directories, ceph disks and running containers
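
  A minimal sketch of how step (3) can be checked on each node, assuming podman is the
  container runtime in use:

    # directories should be empty or absent after rm-cluster
    $ sudo ls -l /etc/ceph /var/lib/ceph /var/lib/cephadm

    # no ceph LVs/OSD devices should remain after --zap-osds
    $ lsblk

    # no rhceph containers should still be running
    $ sudo podman ps --all --filter name=ceph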

Actual results:
    (1) `/etc/ceph` directory contains ceph.client.1.keyring and podman-auth.json files
	  $ cat /etc/hosts | grep ceph | grep -v node5 | awk '{print $2}' | xargs -I{} -t ssh {} "sudo ls /etc/ceph/"
	
	  ssh ceph-vamahaja-khe1em-node4 sudo ls /etc/ceph/ 
	  ls: cannot access '/etc/ceph/': No such file or directory
	
	  ssh ceph-vamahaja-khe1em-node3 sudo ls /etc/ceph/ 
	  podman-auth.json
	
	  ssh ceph-vamahaja-khe1em-node1-installer sudo ls /etc/ceph/ 
	  ceph.client.1.keyring
	  podman-auth.json
	
	  ssh ceph-vamahaja-khe1em-node2 sudo ls /etc/ceph/ 
	  podman-auth.json

Expected results:
    (1) The `/etc/ceph`, `/var/lib/ceph`, and `/var/lib/cephadm` directories should be cleaned, no ceph disks should be present, and all rhceph containers should have exited.