Description of problem:

We're seeing that in the gate at the moment, test_update_network_policy is failing very often with:

Traceback (most recent call last):
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/kuryr_tempest_plugin/tests/scenario/base_network_policy.py", line 259, in test_update_network_policy
    self.assertIsNotNone(crd_pod_selector)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/testtools/testcase.py", line 439, in assertIsNotNone
    self.assertThat(observed, matcher, message)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.6/site-packages/testtools/testcase.py", line 502, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: None matches Is(None)

And the culprit is this, circling around in the logs until the test times out:

2020-08-04 20:33:36.495 1 DEBUG kuryr_kubernetes.controller.drivers.lbaasv2 [-] KuryrLoadBalancer for service default/kuryr-service-1926940643 not populated yet. update_lbaas_sg /usr/local/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py:769
2020-08-04 20:33:36.495 1 DEBUG kuryr_kubernetes.handlers.retry [-] Handler KuryrNetworkPolicyHandler failed (attempt 2; ResourceNotReady: Resource not ready: 'kuryr-service-1926940643') _sleep /usr/local/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py:101

The point is that the handler is technically right - that LB is not yet provisioned - yet we wait for it before applying the SG. This should not be necessary: when members get added, they will have the correct SG rules applied anyway.

Version-Release number of selected component (if applicable):

How reproducible:
Only on environments with very slow Octavia Amphora creation; it probably never happens on proper OSP environments - we've only seen it on OpenStack gates.

Steps to Reproduce:
1. Create a service.
2. While the loadbalancer for that service is still being created, try creating a NetworkPolicy.
Actual results:
The KuryrNetworkPolicy CRD doesn't get status.podSelector populated until that LB is created (or never, if LB creation takes over several minutes).

Expected results:
The KuryrNetworkPolicy CRD gets status.podSelector populated correctly even before the LB is up.

Additional info:
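The expected behavior can be sketched as follows. This is a minimal, hypothetical simplification, not the actual kuryr-kubernetes code: the function name update_lbaas_sg and the exception name mirror the log excerpt above, but the CRD shape and the helper's signature are illustrative only.

```python
# Sketch of the proposed fix: skip the SG update instead of retrying
# when the KuryrLoadBalancer CRD is not populated yet.
# (Hypothetical simplification; the real driver code differs.)

class ResourceNotReady(Exception):
    """Stand-in for the exception seen in the retry handler log."""


def update_lbaas_sg(service_name, sg_ids, klb_crd):
    """Apply security groups to the service's LB, if it already exists.

    Returns True if the SGs were applied, False if the LB is not
    provisioned yet and the update was skipped.
    """
    status = klb_crd.get('status') or {}
    if not status.get('loadbalancer'):
        # Previously the driver would raise ResourceNotReady(service_name)
        # here, making KuryrNetworkPolicyHandler retry until the LB was
        # provisioned - and blocking status.podSelector population.
        # Proposed: return early; members added later will get the
        # correct SG rules applied anyway.
        return False
    status['loadbalancer']['security_groups'] = list(sg_ids)
    return True
```

With this shape, NetworkPolicy handling no longer depends on Octavia's Amphora creation speed, so slow gate environments stop hitting the retry timeout.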
Verified on OCP 4.6.0-0.nightly-2020-09-07-224533 over OSP13 2020-09-03.2 with the Amphora provider. The loadbalancer and the NP are created sequentially, and the kuryrloadbalancer resource gets populated before the loadbalancer reaches ACTIVE provisioning_status. The NP is fully operational once the loadbalancer becomes ready.

1. Create the environment:

$ oc new-project test
$ oc run --image kuryr/demo demo-allowed-caller
$ oc run --image kuryr/demo demo-caller
$ oc run --image kuryr/demo demo
$
$ cat np_resource.yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: np
spec:
  podSelector:
    matchLabels:
      run: demo
  ingress:
  - from:
    - podSelector:
        matchLabels:
          run: demo-allowed-caller

2. Create the service and apply the NP to it:

$ oc expose pod/demo --port 80 --target-port 8080 && sleep 1 && oc apply -f np_resource.yaml
service/demo exposed
networkpolicy.networking.k8s.io/np created

3. The loadbalancer is still pending creation while the kuryrloadbalancer already shows the correct status:

(shiftstack) [stack@undercloud-0 ~]$ . overcloudrc && openstack loadbalancer show test/demo
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-09-09T09:49:38                  |
| description         | openshiftClusterID=ostest-jhtjg      |
| flavor              |                                      |
| id                  | 4397f78d-6694-4aea-986c-5e8b4826ec32 |
| listeners           |                                      |
| name                | test/demo                            |
| operating_status    | OFFLINE                              |
| pools               |                                      |
| project_id          | abf184ea0ec84b70ab13de3bfd1ed0cc     |
| provider            | octavia                              |
| provisioning_status | PENDING_CREATE                       |
| updated_at          | None                                 |
| vip_address         | 172.30.97.133                        |
| vip_network_id      | 62d90e55-a172-4da0-8366-0348bbdf88e6 |
| vip_port_id         | 6e73ae1a-7da5-443e-ae59-276459f91c43 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | d164c66c-e705-4200-a5bb-6243d4bd5f9e |
+---------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ oc get knp -n test np -o json | jq ".status.podSelector"
{
  "matchLabels": {
    "run": "demo"
  }
}

(overcloud) [stack@undercloud-0 ~]$ . overcloudrc && openstack loadbalancer show test/demo
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-09-09T09:49:38                  |
| description         | openshiftClusterID=ostest-jhtjg      |
| flavor              |                                      |
| id                  | 4397f78d-6694-4aea-986c-5e8b4826ec32 |
| listeners           |                                      |
| name                | test/demo                            |
| operating_status    | OFFLINE                              |
| pools               |                                      |
| project_id          | abf184ea0ec84b70ab13de3bfd1ed0cc     |
| provider            | octavia                              |
| provisioning_status | PENDING_CREATE                       |
| updated_at          | None                                 |
| vip_address         | 172.30.97.133                        |
| vip_network_id      | 62d90e55-a172-4da0-8366-0348bbdf88e6 |
| vip_port_id         | 6e73ae1a-7da5-443e-ae59-276459f91c43 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | d164c66c-e705-4200-a5bb-6243d4bd5f9e |
+---------------------+--------------------------------------+

(overcloud) [stack@undercloud-0 ~]$ . overcloudrc && openstack loadbalancer show test/demo
+---------------------+--------------------------------------+
| Field               | Value                                |
+---------------------+--------------------------------------+
| admin_state_up      | True                                 |
| created_at          | 2020-09-09T09:49:38                  |
| description         | openshiftClusterID=ostest-jhtjg      |
| flavor              |                                      |
| id                  | 4397f78d-6694-4aea-986c-5e8b4826ec32 |
| listeners           | e30985ec-193a-4231-bbfc-97ff2c6fb4d1 |
| name                | test/demo                            |
| operating_status    | ONLINE                               |
| pools               | 5c4aefe2-3d1d-42ac-b159-c3c08de98956 |
| project_id          | abf184ea0ec84b70ab13de3bfd1ed0cc     |
| provider            | octavia                              |
| provisioning_status | ACTIVE                               |
| updated_at          | 2020-09-09T09:51:05                  |
| vip_address         | 172.30.97.133                        |
| vip_network_id      | 62d90e55-a172-4da0-8366-0348bbdf88e6 |
| vip_port_id         | 6e73ae1a-7da5-443e-ae59-276459f91c43 |
| vip_qos_policy_id   | None                                 |
| vip_subnet_id       | d164c66c-e705-4200-a5bb-6243d4bd5f9e |
+---------------------+--------------------------------------+

4.
The NP is correctly applied to the service:

(overcloud) [stack@undercloud-0 ~]$ oc get all
NAME                      READY   STATUS    RESTARTS   AGE
pod/demo                  1/1     Running   0          11m
pod/demo-allowed-caller   1/1     Running   0          3m41s
pod/demo-caller           1/1     Running   0          11m

NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/demo   ClusterIP   172.30.97.133   <none>        80/TCP    2m38s

(overcloud) [stack@undercloud-0 ~]$ oc rsh pod/demo-allowed-caller curl 172.30.97.133
demo: HELLO! I AM ALIVE!!!

(overcloud) [stack@undercloud-0 ~]$ oc rsh pod/demo-caller curl 172.30.97.133
^Ccommand terminated with exit code 130
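The manual curl checks above (allowed caller gets a response, blocked caller hangs until interrupted with ^C, exit code 130) could be automated with a small helper like the following. This is an illustrative sketch only: the service IP and timeout are taken from the verification run, and the helper would have to run inside the respective caller pod for the NP to apply.

```python
# Sketch of an automated version of the manual connectivity checks:
# an allowed caller should get HTTP 200, while a caller blocked by the
# NetworkPolicy should time out instead of hanging indefinitely.
import socket
import urllib.error
import urllib.request


def can_reach(url, timeout=5):
    """Return True if an HTTP GET to url succeeds within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, socket.timeout, OSError):
        # Connection refused, unreachable, or timed out - treat all of
        # these as "blocked" for the purposes of the NP check.
        return False
```

Run from demo-allowed-caller, can_reach('http://172.30.97.133/') should return True once the loadbalancer is ACTIVE; from demo-caller it should return False within the timeout rather than hanging like the interactive curl did.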
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196