This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at the Red Hat Issue Tracker.
Bug 2254605 - Get panic error when trying to create an OpenStackControlPlane object [17.1]
Summary: Get panic error when trying to create an OpenStackControlPlane object [17.1]
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: osp-director-operator-container
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: async
Target Release: 17.1
Assignee: Andrew Bays
QA Contact:
Docs Contact: Irina
URL:
Whiteboard:
Depends On:
Blocks: 2321300
 
Reported: 2023-12-14 19:25 UTC by Juan Pablo Marti
Modified: 2024-12-10 19:15 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 2321300 (view as bug list)
Environment:
Last Closed: 2024-12-10 19:14:43 UTC
Target Upstream Version:
Embargoed:




Links
- Red Hat Issue Tracker OSP-30756 (last updated 2024-12-10 19:14:42 UTC)
- Red Hat Issue Tracker OSP-33184 (last updated 2024-12-10 19:15:44 UTC)

Description Juan Pablo Marti 2023-12-14 19:25:05 UTC
Description of problem:
I'm trying to create an OpenStackControlPlane object with the following YAML file:

~~~
apiVersion: osp-director.openstack.org/v1beta2
kind: OpenStackControlPlane
metadata:
  name: overcloud
  namespace: openstack
spec:
  domainName: overcloud.tlvlab.local
  openStackClientImageURL: 'registry.redhat.io/rhosp-rhel9/openstack-tripleoclient:17.1'
  openStackClientNetworks:
    - ctlplane
    - internal_api
    - external
  openStackClientStorageClass: host-nfs-storageclass
  openStackRelease: '17.1'
  passwordSecret: openstack-root-password
  virtualMachineRoles:
    controller:
      roleName: Controller
      roleCount: 3
      isTripleoRole: true
      ctlplaneInterface: enp2s0
      cores: 6
      memory: 20
      networks:
        - ctlplane
        - internal_api
        - external
        - tenant
        - storage
        - storage_mgmt
      rootDisk:
        name: root
        diskSize: 50
        baseImageVolumeName: openstack-base-img
        storageClass: host-nfs-storageclass
        storageAccessMode: ReadWriteMany
        storageVolumeMode: Filesystem
~~~

I get the following output:

~~~
$ oc create -f openstack-controller.yaml -n openstack
Error from server (InternalError): error when creating "openstack-controller.yaml": Internal error occurred: failed calling webhook "vopenstackcontrolplane.kb.io": failed to call webhook: Post "https://osp-director-operator-controller-manager-service.openstack.svc:4343/validate-osp-director-openstack-org-v1beta2-openstackcontrolplane?timeout=10s": EOF
~~~

In the log of the osp-director-operator-controller-manager-f66c67dbb-jgmmx pod (which runs the webhook service), I see this panic: `http: panic serving 10.128.0.2:33178: runtime error: invalid memory address or nil pointer dereference`

Full output here for reference:

~~~
2023-12-14T18:50:18.337Z INFO controlplane-resource adding network labels: map[ooo-subnetname/ctlplane:true]
2023-12-14T18:50:18.337Z INFO controlplane-resource OpenStackControlPlane overcloud labels set to map[ooo-subnetname/ctlplane:true osnetconfig-ref:openstacknetconfig]
2023-12-14T18:50:18.338Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/mutate-osp-director-openstack-org-v1beta2-openstackcontrolplane", "code": 200, "reason": "", "UID": "47484661-da72-4a8e-817a-e9c05cd40a87", "allowed": true}
2023-12-14T18:50:18.349Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-osp-director-openstack-org-v1beta2-openstackcontrolplane", "UID": "2a4c0ee3-fb2d-49d9-9b0a-05142d804f13", "kind": "osp-director.openstack.org/v1beta2, Kind=OpenStackControlPlane", "resource": {"group":"osp-director.openstack.org","version":"v1beta2","resource":"openstackcontrolplanes"}}
2023-12-14T18:50:18.349Z INFO controlplane-resource validate create {"name": "overcloud"}
2023/12/14 18:50:18 http: panic serving 10.128.0.2:33178: runtime error: invalid memory address or nil pointer dereference
goroutine 231021 [running]:
net/http.(*conn).serve.func1()
/usr/lib/golang/src/net/http/server.go:1850 +0xbf
panic({0x1beea60, 0x319e440})
/usr/lib/golang/src/runtime/panic.go:890 +0x262
github.com/openstack-k8s-operators/osp-director-operator/api/v1beta1.ValidateNetworks({0xc000717100, 0x9}, {0xc001f59860?, 0x6, 0xc0011ba048?})
/remote-source/app/api/v1beta1/common_openstacknet.go:237 +0x1f5
github.com/openstack-k8s-operators/osp-director-operator/api/v1beta2.(*OpenStackControlPlane).ValidateCreate(0xc0006358c0)
/remote-source/app/api/v1beta2/openstackcontrolplane_webhook.go:248 +0x485
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatingHandler).Handle(_, {_, _}, {{{0xc000f292f0, 0x24}, {{0xc001da7e60, 0x1a}, {0xc000716af0, 0x7}, {0xc000e91f38, ...}}, ...}})
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/webhook/admission/validator.go:71 +0x239
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000f292f0, 0x24}, {{0xc001da7e60, 0x1a}, {0xc000716af0, 0x7}, {0xc000e91f38, ...}}, ...}})
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/webhook/admission/webhook.go:169 +0xfd
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc000446680, {0x7fd3b88f9cf8?, 0xc001896b90}, 0xc001fc6900)
/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.1/pkg/webhook/admission/http.go:98 +0xed2
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7fd3b88f9cf8, 0xc001896b90}, 0x21db300?)
/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.2/prometheus/promhttp/instrument_server.go:40 +0xd4
net/http.HandlerFunc.ServeHTTP(0x21db378?, {0x7fd3b88f9cf8?, 0xc001896b90?}, 0xc0008d5a68?)
/usr/lib/golang/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x21db378?, 0xc000826700?}, 0xc001fc6900)
/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.2/prometheus/promhttp/instrument_server.go:117 +0xaa
net/http.HandlerFunc.ServeHTTP(0xc0008d59e0?, {0x21db378?, 0xc000826700?}, 0xc0017cb000?)
/usr/lib/golang/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x21db378, 0xc000826700}, 0xc001fc6900)
/remote-source/deps/gomod/pkg/mod/github.com/prometheus/client_golang.2/prometheus/promhttp/instrument_server.go:84 +0xbf
net/http.HandlerFunc.ServeHTTP(0xc000826700?, {0x21db378?, 0xc000826700?}, 0x1ec5998?)
/usr/lib/golang/src/net/http/server.go:2109 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc0020ea048?, {0x21db378, 0xc000826700}, 0xc001fc6900)
/usr/lib/golang/src/net/http/server.go:2487 +0x149
net/http.serverHandler.ServeHTTP({0x21cdb80?}, {0x21db378, 0xc000826700}, 0xc001fc6900)
/usr/lib/golang/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc00066bf40, {0x21dc420, 0xc000968780})
/usr/lib/golang/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
/usr/lib/golang/src/net/http/server.go:3102 +0x4db
~~~

Comment 1 Juan Pablo Marti 2023-12-14 19:29:43 UTC
My OpenStackNetConfig was created using this YAML:

~~~
apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackNetConfig
metadata:
  name: openstacknetconfig
spec:
  attachConfigurations:
    br-osp:
      nodeNetworkConfigurationPolicy:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        desiredState:
          interfaces:
          - bridge:
              options:
                stp:
                  enabled: false
              port:
              - name: enp2s0
            description: Linux bridge with enp2s0 as a port
            name: br-osp
            state: up
            type: linux-bridge
            mtu: 1500
    br-vlans:
      nodeNetworkConfigurationPolicy:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        desiredState:
          interfaces:
          - bridge:
              options:
                stp:
                  enabled: false
              port:
              - name: enp3s0
            description: Linux bridge with enp3s0 as a port
            name: br-vlans
            state: up
            type: linux-bridge
            mtu: 1500
  # optional DnsServers list
  dnsServers:
  - 10.47.242.10
  - 10.38.5.26
  # DomainName of the OSP environment
  domainName: overcloud.tlvlab.local
  networks:
  - name: Control
    nameLower: ctlplane
    subnets:
    - name: ctlplane
      ipv4:
        allocationEnd: 192.168.24.250
        allocationStart: 192.168.24.100
        cidr: 192.168.24.0/24
        gateway: 192.168.24.254
      attachConfiguration: br-osp
  - name: Tenant
    nameLower: tenant
    mtu: 1350
    subnets:
    - name: tenant_subnet
      attachConfiguration: br-vlans
      vlan: 101
      ipv4:
        allocationEnd: 172.17.101.250
        allocationStart: 172.17.101.4
        cidr: 172.17.101.0/24
        gateway: 172.17.101.1
  - name: Storage
    nameLower: storage
    mtu: 1350
    subnets:
    - name: storage_subnet
      attachConfiguration: br-vlans
      vlan: 102
      ipv4:
        allocationEnd: 172.17.102.250
        allocationStart: 172.17.102.4
        cidr: 172.17.102.0/24
        gateway: 172.17.102.1
  - name: InternalApi
    nameLower: internal_api
    mtu: 1350
    subnets:
    - name: internal_api_subnet
      attachConfiguration: br-vlans
      vlan: 103
      ipv4:
        allocationEnd: 172.17.103.250
        allocationStart: 172.17.103.4
        cidr: 172.17.103.0/24
        gateway: 172.17.103.1
  - name: StorageMgmt
    nameLower: storage_mgmt
    mtu: 1350
    subnets:
    - name: storage_mgmt_subnet
      attachConfiguration: br-vlans
      vlan: 104
      ipv4:
        allocationEnd: 172.17.104.250
        allocationStart: 172.17.104.4
        cidr: 172.17.104.0/24
        gateway: 172.17.104.1
  - name: External
    nameLower: external
    mtu: 1350
    subnets:
    - name: external_subnet
      attachConfiguration: br-vlans
      vlan: 105
      ipv4:
        allocationEnd: 172.17.200.250
        allocationStart: 172.17.200.4
        cidr: 172.17.200.0/24
        gateway: 172.17.200.1
  reservations:
    controlplane:
      ipReservations:
        ctlplane: 192.168.24.254
        external: 172.17.200.254
        internal_api: 172.17.103.254
        storage: 172.17.102.254
        storage_mgmt: 172.17.104.254
      macReservations: {}
    openstackclient-0:
      ipReservations:
        ctlplane: 192.168.24.253
        external: 172.17.200.253
        internal_api: 172.17.103.253
      macReservations: {}
~~~

After replacing the subnet names with the default values (without the _subnet suffix), the problem was solved. (Thanks wladek for helping to figure that out!)
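For reference, a minimal sketch of what that rename looks like for the Tenant entry in the OpenStackNetConfig above (the other VLAN networks follow the same pattern):

~~~
  - name: Tenant
    nameLower: tenant
    mtu: 1350
    subnets:
    - name: tenant          # was: tenant_subnet
      attachConfiguration: br-vlans
      vlan: 101
      ipv4:
        allocationEnd: 172.17.101.250
        allocationStart: 172.17.101.4
        cidr: 172.17.101.0/24
        gateway: 172.17.101.1
~~~

Note the ctlplane network already used a subnet name identical to its nameLower, which is consistent with it not triggering the problem.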

Although the problem was worked around, the panic doesn't give any useful information about the actual misconfiguration, so it should be addressed somehow.

Thanks in advance!

Comment 2 Brendan Shephard 2023-12-15 00:01:24 UTC
The panic probably occurs because the webhook fails to find a network using the subnet name as a label, so when we try to format the error here:
https://github.com/bshephar/osp-director-operator/blob/master/api/v1beta1/common_openstacknet.go#L237

osnet is nil at that point, causing the panic.

As to why the _subnet suffix makes a difference, I'm not sure. Maybe someone from the osp-director-operator team will have more insight. My initial suspicion is that the OpenStackNet object is created using the nameLower value as the label, so the lookup can't find any networks labeled with {{ nameLower }}_subnet.
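The failure mode described here can be reproduced in isolation. Below is a minimal standalone sketch; all names (osNet, getOpenStackNet, validateNetworks) are illustrative stand-ins, not the operator's actual types or signatures:

~~~
package main

import "fmt"

// osNet is a stand-in for the operator's OpenStackNet object.
type osNet struct {
	Name string
}

// getOpenStackNet mimics the label-based lookup: it returns a nil pointer
// together with a "not found" error when no network matches the subnet name.
func getOpenStackNet(nets map[string]*osNet, subnet string) (*osNet, error) {
	if n, ok := nets[subnet]; ok {
		return n, nil
	}
	return nil, fmt.Errorf("OpenStackNet matching subnet %q not found", subnet)
}

// validateNetworks reproduces the buggy pattern: on the "not found" path it
// still dereferences the nil result while formatting the error message,
// which panics with "invalid memory address or nil pointer dereference".
func validateNetworks(nets map[string]*osNet, subnets []string) error {
	for _, s := range subnets {
		osnet, err := getOpenStackNet(nets, s)
		if err != nil {
			// BUG: osnet is nil on this path, so osnet.Name panics.
			return fmt.Errorf("error validating %s %s: %v", osnet.Name, s, err)
		}
	}
	return nil
}

func main() {
	nets := map[string]*osNet{"ctlplane": {Name: "ctlplane"}}
	defer func() {
		if r := recover(); r != nil {
			// Same class of panic as the one the webhook logs.
			fmt.Println("panic:", r)
		}
	}()
	_ = validateNetworks(nets, []string{"ctlplane", "tenant_subnet"})
}
~~~

In the real webhook this panic escapes the handler, net/http recovers it per connection, and the API server only sees the connection drop, hence the opaque "EOF" in the oc create output.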

Comment 4 Andrew Bays 2024-10-03 10:43:40 UTC
Regardless of proper or improper network config, I think we can at least fix the panic. We are getting a "not found" error here...

https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

...which we've returned from here...

https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

...so we just need to fix this line (as Brendan noted):

https://github.com/openstack-k8s-operators/osp-director-operator/blob/562777c57f39[…]fd8a7ad1568874f136d7d0efb482/api/v1beta1/common_openstacknet.go

We could probably just remove osnet.GetObjectKind().GroupVersionKind().Kind and hardcode OpenStackNet as the Kind, since it is always that anyhow.
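Sketching that fix in isolation (names here are illustrative stand-ins, not the operator's actual types or signatures): the error path no longer calls methods on the nil result, and the Kind string is hardcoded.

~~~
package main

import "fmt"

// osNet is a stand-in for the operator's OpenStackNet object.
type osNet struct {
	Name string
}

func getOpenStackNet(nets map[string]*osNet, subnet string) (*osNet, error) {
	if n, ok := nets[subnet]; ok {
		return n, nil
	}
	return nil, fmt.Errorf("no network matches subnet %q", subnet)
}

// validateNetworks with the proposed fix: when the lookup fails, the error
// message is built from the subnet name and a hardcoded "OpenStackNet" kind
// instead of dereferencing the nil result.
func validateNetworks(nets map[string]*osNet, subnets []string) error {
	for _, s := range subnets {
		if _, err := getOpenStackNet(nets, s); err != nil {
			return fmt.Errorf("OpenStackNet for subnet %s not found: %v", s, err)
		}
	}
	return nil
}

func main() {
	nets := map[string]*osNet{"ctlplane": {Name: "ctlplane"}}
	// The caller now gets a readable validation error instead of a panic,
	// so the webhook would reject the object with a clear message.
	fmt.Println(validateNetworks(nets, []string{"ctlplane", "tenant_subnet"}))
}
~~~

With a change along these lines, the admission webhook would return a denial that names the offending subnet, rather than crashing and surfacing as an EOF to the client.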

