Bug 2254605

Summary: Get panic error when trying to create an OpenStackControlPlane object [17.1]
Product: Red Hat OpenStack
Reporter: Juan Pablo Marti <jmarti>
Component: osp-director-operator-container
Assignee: Andrew Bays <abays>
Status: CLOSED MIGRATED
Severity: medium
Priority: low
Docs Contact: Irina <igallagh>
Version: 17.1 (Wallaby)
CC: abays, bshephar, jmarti, jschluet, lmadsen, mschuppe, vwalek
Target Milestone: async
Keywords: Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clone Of:
: 2321300 (view as bug list) Environment:
Last Closed: 2024-12-10 19:14:43 UTC
Type: Bug
Bug Blocks: 2321300

Description Juan Pablo Marti 2023-12-14 19:25:05 UTC
Description of problem:
I'm trying to create an OpenStackControlPlane object with the following YAML file:

~~~
apiVersion: osp-director.openstack.org/v1beta2
kind: OpenStackControlPlane
metadata:
  name: overcloud
  namespace: openstack
spec:
  domainName: overcloud.tlvlab.local
  openStackClientImageURL: 'registry.redhat.io/rhosp-rhel9/openstack-tripleoclient:17.1'
  openStackClientNetworks:
    - ctlplane
    - internal_api
    - external
  openStackClientStorageClass: host-nfs-storageclass
  openStackRelease: '17.1'
  passwordSecret: openstack-root-password
  virtualMachineRoles:
    controller:
      roleName: Controller
      roleCount: 3
      isTripleoRole: true
      ctlplaneInterface: enp2s0
      cores: 6
      memory: 20
      networks:
        - ctlplane
        - internal_api
        - external
        - tenant
        - storage
        - storage_mgmt
      rootDisk:
        name: root
        diskSize: 50
        baseImageVolumeName: openstack-base-img
        storageClass: host-nfs-storageclass
        storageAccessMode: ReadWriteMany
        storageVolumeMode: Filesystem
~~~

I get the following output:

~~~
$ oc create -f openstack-controller.yaml -n openstack
Error from server (InternalError): error when creating "openstack-controller.yaml": Internal error occurred: failed calling webhook "vopenstackcontrolplane.kb.io": failed to call webhook: Post "https://osp-director-operator-controller-manager-service.openstack.svc:4343/validate-osp-director-openstack-org-v1beta2-openstackcontrolplane?timeout=10s": EOF
~~~

In the log of the osp-director-operator-controller-manager-f66c67dbb-jgmmx pod (which runs the webhook service) I see this panic: `http: panic serving 10.128.0.2:33178: runtime error: invalid memory address or nil pointer dereference`

Full output here for reference:

~~~
2023-12-14T18:50:18.337Z INFO controlplane-resource adding network labels: map[ooo-subnetname/ctlplane:true]
2023-12-14T18:50:18.337Z INFO controlplane-resource OpenStackControlPlane overcloud labels set to map[ooo-subnetname/ctlplane:true osnetconfig-ref:openstacknetconfig]
2023-12-14T18:50:18.338Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/mutate-osp-director-openstack-org-v1beta2-openstackcontrolplane", "code": 200, "reason": "", "UID": "47484661-da72-4a8e-817a-e9c05cd40a87", "allowed": true}
2023-12-14T18:50:18.349Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-osp-director-openstack-org-v1beta2-openstackcontrolplane", "UID": "2a4c0ee3-fb2d-49d9-9b0a-05142d804f13", "kind": "osp-director.openstack.org/v1beta2, Kind=OpenStackControlPlane", "resource": {"group":"osp-director.openstack.org","version":"v1beta2","resource":"openstackcontrolplanes"}}
2023-12-14T18:50:18.349Z INFO controlplane-resource validate create {"name": "overcloud"}
2023/12/14 18:50:18 http: panic serving 10.128.0.2:33178: runtime error: invalid memory address or nil pointer dereference
goroutine 231021 [running]:
net/http.(*conn).serve.func1()
/usr/lib/golang/src/net/http/server.go:1850 +0xbf
panic({0x1beea60, 0x319e440})
/usr/lib/golang/src/runtime/panic.go:890 +0x262
github.com/openstack-k8s-operators/osp-director-operator/api/v1beta1.ValidateNetworks({0xc000717100, 0x9}, {0xc001f59860?, 0x6, 0xc0011ba048?})
/remote-source/app/api/v1beta1/common_openstacknet.go:237 +0x1f5
github.com/openstack-k8s-operators/osp-director-operator/api/v1beta2.(*OpenStackControlPlane).ValidateCreate(0xc0006358c0)
/remote-source/app/api/v1beta2/openstackcontrolplane_webhook.go:248 +0x485
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatingHandler).Handle(_, {_, _}, {{{0xc000f292f0, 0x24}, {{0xc001da7e60, 0x1a}, {0xc000716af0, 0x7}, {0xc000e91f38, ...}}, ...}})
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/webhook/admission/validator.go:71 +0x239
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000f292f0, 0x24}, {{0xc001da7e60, 0x1a}, {0xc000716af0, 0x7}, {0xc000e91f38, ...}}, ...}})
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/webhook/admission/webhook.go:169 +0xfd
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc000446680, {0x7fd3b88f9cf8?, 0xc001896b90}, 0xc001fc6900)
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/webhook/admission/http.go:98 +0xed2
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7fd3b88f9cf8, 0xc001896b90}, 0x21db300?)
/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.2/prometheus/promhttp/instrument_server.go:40 +0xd4
net/http.HandlerFunc.ServeHTTP(0x21db378?, {0x7fd3b88f9cf8?, 0xc001896b90?}, 0xc0008d5a68?)
/usr/lib/golang/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x21db378?, 0xc000826700?}, 0xc001fc6900)
/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.2/prometheus/promhttp/instrument_server.go:117 +0xaa
net/http.HandlerFunc.ServeHTTP(0xc0008d59e0?, {0x21db378?, 0xc000826700?}, 0xc0017cb000?)
/usr/lib/golang/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x21db378, 0xc000826700}, 0xc001fc6900)
/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.2/prometheus/promhttp/instrument_server.go:84 +0xbf
net/http.HandlerFunc.ServeHTTP(0xc000826700?, {0x21db378?, 0xc000826700?}, 0x1ec5998?)
/usr/lib/golang/src/net/http/server.go:2109 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc0020ea048?, {0x21db378, 0xc000826700}, 0xc001fc6900)
/usr/lib/golang/src/net/http/server.go:2487 +0x149
net/http.serverHandler.ServeHTTP({0x21cdb80?}, {0x21db378, 0xc000826700}, 0xc001fc6900)
/usr/lib/golang/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc00066bf40, {0x21dc420, 0xc000968780})
/usr/lib/golang/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
/usr/lib/golang/src/net/http/server.go:3102 +0x4db
~~~

Comment 1 Juan Pablo Marti 2023-12-14 19:29:43 UTC
My OpenStackNetConfig was created using this YAML:

~~~
apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackNetConfig
metadata:
  name: openstacknetconfig
spec:
  attachConfigurations:
    br-osp:
      nodeNetworkConfigurationPolicy:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        desiredState:
          interfaces:
          - bridge:
              options:
                stp:
                  enabled: false
              port:
              - name: enp2s0
            description: Linux bridge with enp2s0 as a port
            name: br-osp
            state: up
            type: linux-bridge
            mtu: 1500
    br-vlans:
      nodeNetworkConfigurationPolicy:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        desiredState:
          interfaces:
          - bridge:
              options:
                stp:
                  enabled: false
              port:
              - name: enp3s0
            description: Linux bridge with enp3s0 as a port
            name: br-vlans
            state: up
            type: linux-bridge
            mtu: 1500
  # optional DnsServers list
  dnsServers:
  - 10.47.242.10
  - 10.38.5.26
  # DomainName of the OSP environment
  domainName: overcloud.tlvlab.local
  networks:
  - name: Control
    nameLower: ctlplane
    subnets:
    - name: ctlplane
      ipv4:
        allocationEnd: 192.168.24.250
        allocationStart: 192.168.24.100
        cidr: 192.168.24.0/24
        gateway: 192.168.24.254
      attachConfiguration: br-osp
  - name: Tenant
    nameLower: tenant
    mtu: 1350
    subnets:
    - name: tenant_subnet
      attachConfiguration: br-vlans
      vlan: 101
      ipv4:
        allocationEnd: 172.17.101.250
        allocationStart: 172.17.101.4
        cidr: 172.17.101.0/24
        gateway: 172.17.101.1
  - name: Storage
    nameLower: storage
    mtu: 1350
    subnets:
    - name: storage_subnet
      attachConfiguration: br-vlans
      vlan: 102
      ipv4:
        allocationEnd: 172.17.102.250
        allocationStart: 172.17.102.4
        cidr: 172.17.102.0/24
        gateway: 172.17.102.1
  - name: InternalApi
    nameLower: internal_api
    mtu: 1350
    subnets:
    - name: internal_api_subnet
      attachConfiguration: br-vlans
      vlan: 103
      ipv4:
        allocationEnd: 172.17.103.250
        allocationStart: 172.17.103.4
        cidr: 172.17.103.0/24
        gateway: 172.17.103.1
  - name: StorageMgmt
    nameLower: storage_mgmt
    mtu: 1350
    subnets:
    - name: storage_mgmt_subnet
      attachConfiguration: br-vlans
      vlan: 104
      ipv4:
        allocationEnd: 172.17.104.250
        allocationStart: 172.17.104.4
        cidr: 172.17.104.0/24
        gateway: 172.17.104.1
  - name: External
    nameLower: external
    mtu: 1350
    subnets:
    - name: external_subnet
      attachConfiguration: br-vlans
      vlan: 105
      ipv4:
        allocationEnd: 172.17.200.250
        allocationStart: 172.17.200.4
        cidr: 172.17.200.0/24
        gateway: 172.17.200.1
  reservations:
    controlplane:
      ipReservations:
        ctlplane: 192.168.24.254
        external: 172.17.200.254
        internal_api: 172.17.103.254
        storage: 172.17.102.254
        storage_mgmt: 172.17.104.254
      macReservations: {}
    openstackclient-0:
      ipReservations:
        ctlplane: 192.168.24.253
        external: 172.17.200.253
        internal_api: 172.17.103.253
      macReservations: {}
~~~

After replacing the subnet names with the default values (without the _subnet suffix), the problem went away. (Thanks wladek for helping to figure that out!)

Although the configuration problem is fixed, the panic gives no useful information about what was wrong, so it should still be addressed.

Thanks in advance!

Comment 2 Brendan Shephard 2023-12-15 00:01:24 UTC
The panic is probably because it fails to find a network using the subnet name as a label. So when we try to format the error here:
https://github.com/bshephar/osp-director-operator/blob/master/api/v1beta1/common_openstacknet.go#L237

osnet is nil at that point, which causes the panic.

As to why the _subnet makes a difference, I'm not sure. Maybe someone from the osp-director-operator team will have some more insights for that. My initial suspicion would be that the OpenStackNet object is created using the nameLower as the label, so it can't find any networks labeled with {{ nameLower }}_subnet.
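
To make the suspected failure mode concrete, here is a minimal, self-contained Go sketch. It is not the operator's code: `osNet`, `lookupBySubnetName`, `validateUnsafe`, and `panicMessage` are hypothetical stand-ins, and the map lookup only imitates a label-based lookup that returns a nil object together with a "not found" error. It shows that formatting an error message through the nil result is enough to trigger the same runtime error seen in the log.

~~~go
package main

import (
	"errors"
	"fmt"
)

// osNet is a hypothetical stand-in for the operator's OpenStackNet type.
type osNet struct {
	Kind string
	Name string
}

// errNotFound stands in for the Kubernetes "not found" API error.
var errNotFound = errors.New("not found")

// lookupBySubnetName returns (nil, errNotFound) when no network is
// registered under the given subnet name, imitating a failed label lookup.
func lookupBySubnetName(nets map[string]*osNet, subnet string) (*osNet, error) {
	if n, ok := nets[subnet]; ok {
		return n, nil
	}
	return nil, errNotFound
}

// validateUnsafe builds its error message from the lookup result, which is
// nil on the not-found path, so constructing the message itself panics.
func validateUnsafe(nets map[string]*osNet, subnet string) error {
	n, err := lookupBySubnetName(nets, subnet)
	if err != nil {
		return fmt.Errorf("%s %s not found", n.Kind, subnet) // n is nil here
	}
	return nil
}

// panicMessage runs f and returns the recovered panic text, or "" if f
// returned normally.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return msg
}

func main() {
	// Only the suffixed subnet name exists, as in the reported config.
	nets := map[string]*osNet{
		"internal_api_subnet": {Kind: "OpenStackNet", Name: "internalapi"},
	}
	msg := panicMessage(func() { _ = validateUnsafe(nets, "internal_api") })
	fmt.Println(msg)
}
~~~

In this sketch the request for `internal_api` misses because only `internal_api_subnet` is registered, which matches the suspicion above that the lookup key and the subnet name do not line up.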

Comment 4 Andrew Bays 2024-10-03 10:43:40 UTC
Regardless of proper or improper network config, I think we can at least fix the panic. We are getting a "not found" error here...

https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

...which we've returned from here...

https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

...so we just need to fix this line (as Brendan noted):

https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

We could probably just remove osnet.GetObjectKind().GroupVersionKind().Kind and hardcode OpenStackNet as the Kind, since it is always that anyhow.