Bug 2207601

Summary: Multus [ipvlan, openshift-storage, public net and cluster net], StorageCluster stuck in Progressing state
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Oded <oviner>
Component: rook
Assignee: Blaine Gardner <brgardne>
Status: CLOSED NOTABUG
QA Contact: Neha Berry <nberry>
Severity: unspecified
Priority: unspecified
Version: 4.13
CC: ebenahar, muagarwa, ocs-bugs, odf-bz-bot
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2023-05-16 20:08:30 UTC

Description Oded 2023-05-16 10:36:51 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Multus deployment fails with this configuration:
type: ipvlan
namespace of NAD: openshift-storage
both public net and cluster net attached

Version of all relevant components (if applicable):
ODF Version: odf-operator.v4.13.0-199.stable
OCP Version: 4.13.0-0.nightly-2023-05-11-225357
Platform: vSphere


Steps to Reproduce:
1. Install an OCP cluster without ODF [4.13.0-0.nightly-2023-05-11-225357]
2. Enable promiscuous mode on the br-ex network interface on the worker nodes:
sh-4.4# ip link set promisc on br-ex

sh-4.4# ifconfig br-ex
br-ex: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST>  mtu 1500
        inet 10.1.161.34  netmask 255.255.254.0  broadcast 10.1.161.255
        ether 00:50:56:8f:30:9c  txqueuelen 1000  (Ethernet)
        RX packets 740525  bytes 1700301432 (1.5 GiB)
        RX errors 0  dropped 43  overruns 0  frame 0
        TX packets 593619  bytes 276630893 (263.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
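
Note: promiscuous mode set with ip link does not survive a node reboot. A minimal sketch of making it persistent with a MachineConfig systemd unit via the Machine Config Operator (the resource name and unit name below are hypothetical, and applying a MachineConfig triggers a rolling reboot of the worker pool):

$ oc apply -f - <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-br-ex-promisc
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: br-ex-promisc.service
          enabled: true
          contents: |
            [Unit]
            Description=Enable promiscuous mode on br-ex for Multus ipvlan
            After=network-online.target
            Wants=network-online.target

            [Service]
            Type=oneshot
            ExecStart=/usr/sbin/ip link set promisc on br-ex

            [Install]
            WantedBy=multi-user.target
EOF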

3. Install ODF Operator [odf-operator.v4.13.0-199.stable]

4. Create the NetworkAttachmentDefinitions in the openshift-storage namespace:
$ oc create -f network-attach.yaml 
networkattachmentdefinition.k8s.cni.cncf.io/public-net created
networkattachmentdefinition.k8s.cni.cncf.io/cluster-net created

$ oc get networkattachmentdefinition.k8s.cni.cncf.io -n openshift-storage
NAME          AGE
cluster-net   42s
public-net    43s

*******************************************************************************
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
 name: public-net
 namespace: openshift-storage
 labels: {}
 annotations: {}
spec:
 config: '{ "cniVersion": "0.3.1", "type": "ipvlan", "master": "br-ex", "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
 name: cluster-net
 namespace: openshift-storage
 labels: {}
 annotations: {}
spec:
 config: '{ "cniVersion": "0.3.1", "type": "ipvlan", "master": "br-ex", "ipam": { "type": "whereabouts", "range": "192.168.30.0/24" } }'
**************************************************************
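
Since the CNI config is an embedded JSON string, a typo is easy to miss. A quick way to eyeball the rendered config (standard oc jsonpath; assumes jq is available on the workstation):

$ oc get net-attach-def public-net -n openshift-storage -o jsonpath='{.spec.config}' | jq .
{
  "cniVersion": "0.3.1",
  "type": "ipvlan",
  "master": "br-ex",
  "mode": "bridge",
  "ipam": {
    "type": "whereabouts",
    "range": "192.168.20.0/24"
  }
}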

5. Create the StorageCluster.

6. Check the StorageCluster status:
$ oc get storagecluster
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   73m   Progressing              2023-05-16T09:11:46Z   4.13.0

Status:
  Conditions:
    Last Heartbeat Time:   2023-05-16T09:11:48Z
    Last Transition Time:  2023-05-16T09:11:48Z
    Message:               Version check successful
    Reason:                VersionMatched
    Status:                False
    Type:                  VersionMismatch
    Last Heartbeat Time:   2023-05-16T10:19:17Z
    Last Transition Time:  2023-05-16T09:11:48Z
    Message:               Error while reconciling: some StorageClasses were skipped while waiting for pre-requisites to be met: [ocs-storagecluster-cephfs,ocs-storagecluster-ceph-rbd]
    Reason:                ReconcileFailed
    Status:                False
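
When the StorageCluster hangs in Progressing, the layer below it usually shows why. A sketch of the drill-down (standard resources in an ODF install; pod and deployment names vary per cluster):

$ oc get cephcluster -n openshift-storage
$ oc get pods -n openshift-storage -o wide
$ oc logs -n openshift-storage deploy/rook-ceph-operator | grep -iE 'multus|error'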


Actual results:
StorageCluster stuck in Progressing state

Expected results:
StorageCluster reaches Ready state

Additional info:
For more info:
https://url.corp.redhat.com/1e60de7

OCS MG:
http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2207601.tar.gz

Comment 3 Oded 2023-05-17 11:17:52 UTC
With this NAD configuration, the StorageCluster moved to Ready state. The only difference from the failing configuration is that public-net no longer sets "mode": "bridge"; "bridge" is not a valid ipvlan mode (the CNI ipvlan plugin accepts l2, l3, and l3s, with l2 as the default). See the diff sketch after the manifests below.

---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
 name: public-net
 namespace: openshift-storage
 labels: {}
 annotations: {}
spec:
 config: '{ "cniVersion": "0.3.1", "type": "ipvlan", "master": "br-ex", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
 name: cluster-net
 namespace: openshift-storage
 labels: {}
 annotations: {}
spec:
 config: '{ "cniVersion": "0.3.1", "type": "ipvlan", "master": "br-ex", "ipam": { "type": "whereabouts", "range": "192.168.30.0/24" } }'


$ oc get storageclusters.ocs.openshift.io 
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   26m   Ready              2023-05-17T10:46:46Z   4.13.0


$ oc get pod rook-ceph-osd-0-775f679f76-d2b8r -o yaml | more
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.129.2.24/23"],"mac_address":"0a:58:0a:81:02:18","gateway_ips":["10.129.2.1"],"ip_address":"10.129.2.24/23","gateway_ip":"10.129.2.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.129.2.24"
          ],
          "mac": "0a:58:0a:81:02:18",
          "default": true,
          "dns": {}
      },{
          "name": "openshift-storage/cluster-net",
          "interface": "net1",
          "ips": [
              "192.168.30.1"
          ],
          "mac": "00:50:56:8f:73:61",
          "dns": {}
      },{
          "name": "openshift-storage/public-net",
          "interface": "net2",
          "ips": [
              "192.168.20.33"
          ],
          "mac": "00:50:56:8f:73:61",
          "dns": {}
      }]
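
Note that net1 and net2 report the same MAC address; that is expected with ipvlan, since sub-interfaces inherit the MAC of the parent interface (br-ex on that node). To pull just this annotation for any pod instead of paging through the full YAML (standard oc jsonpath; dots in the annotation key must be escaped):

$ oc get pod rook-ceph-osd-0-775f679f76-d2b8r -n openshift-storage \
    -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/network-status}'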


sh-5.1$ ceph osd dump
epoch 275
fsid b9b17293-7fa0-40ab-aa64-251899c9c104
created 2023-05-17T10:48:25.818377+0000
modified 2023-05-17T10:57:22.217710+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 10
full_ratio 0.85
backfillfull_ratio 0.8
nearfull_ratio 0.75
require_min_compat_client luminous
min_compat_client jewel
require_osd_release quincy
stretch_mode_enabled false
pool 1 'ocs-storagecluster-cephblockpool' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 74 pgp_num 72 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 275 lfor 0/275/272 flags hashpspool,selfmanaged_snaps stripe_width 0 target_size_ratio 0.49 application rbd
pool 2 'ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 7 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 262 lfor 0/261/259 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 3 'ocs-storagecluster-cephobjectstore.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 6 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 262 lfor 0/246/244 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 4 'ocs-storagecluster-cephobjectstore.rgw.control' replicated size 3 min_size 2 crush_rule 8 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 262 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 5 'ocs-storagecluster-cephobjectstore.rgw.otp' replicated size 3 min_size 2 crush_rule 5 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 262 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 6 'ocs-storagecluster-cephobjectstore.rgw.meta' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 262 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 7 'ocs-storagecluster-cephobjectstore.rgw.log' replicated size 3 min_size 2 crush_rule 3 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 262 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 8 '.rgw.root' replicated size 3 min_size 2 crush_rule 4 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode on last_change 262 flags hashpspool stripe_width 0 pg_num_min 8 application rook-ceph-rgw
pool 9 'ocs-storagecluster-cephfilesystem-metadata' replicated size 3 min_size 2 crush_rule 9 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 129 lfor 0/0/37 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 10 'ocs-storagecluster-cephobjectstore.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 10 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 264 lfor 0/0/37 flags hashpspool stripe_width 0 target_size_ratio 0.49 application rook-ceph-rgw
pool 11 'ocs-storagecluster-cephfilesystem-data0' replicated size 3 min_size 2 crush_rule 11 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 51 lfor 0/0/39 flags hashpspool stripe_width 0 target_size_ratio 0.49 application cephfs
pool 12 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 25 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
max_osd 3
osd.0 up   in  weight 1 up_from 13 up_thru 263 down_at 0 last_clean_interval [0,0) [v2:192.168.20.33:6800/1900524451,v1:192.168.20.33:6801/1900524451] [v2:192.168.30.1:6800/1900524451,v1:192.168.30.1:6801/1900524451] exists,up 6cd5ee55-5444-4425-8e7a-6438102765fd
osd.1 up   in  weight 1 up_from 13 up_thru 266 down_at 0 last_clean_interval [0,0) [v2:192.168.20.34:6800/299980773,v1:192.168.20.34:6801/299980773] [v2:192.168.30.2:6800/299980773,v1:192.168.30.2:6801/299980773] exists,up c36d75a2-2b00-46cd-980e-730184674797
osd.2 up   in  weight 1 up_from 22 up_thru 272 down_at 0 last_clean_interval [0,0) [v2:192.168.20.35:6800/1149546650,v1:192.168.20.35:6801/1149546650] [v2:192.168.30.3:6800/1149546650,v1:192.168.30.3:6801/1149546650] exists,up 6d812b81-37bb-4dda-a136-9c67f2ec667c
blocklist 10.131.2.22:0/4100069071 expires 2023-05-18T10:50:48.144044+0000
blocklist 10.131.2.22:0/1983106250 expires 2023-05-18T10:50:48.144044+0000
blocklist 10.131.2.22:0/1893372324 expires 2023-05-18T10:50:48.144044+0000
blocklist 10.131.2.22:6801/115715351 expires 2023-05-18T10:50:48.144044+0000
blocklist 10.131.2.22:6800/115715351 expires 2023-05-18T10:50:48.144044+0000
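
The dump confirms each OSD binds its front (client-facing) addresses on 192.168.20.0/24 and its back (replication) addresses on 192.168.30.0/24, i.e. both the public and cluster NADs are in use. To cross-check the network settings Rook pushed into Ceph, from the same toolbox shell (real ceph commands; the options may be set at global rather than mon scope):

sh-5.1$ ceph config dump | grep -E 'public_network|cluster_network'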


sh-5.1$ ceph health
HEALTH_OK