Bug 1924075
| Summary: | kuryr-controller restart when enablePortPoolsPrepopulation = true | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | rlobillo |
| Component: | Networking | Assignee: | Maysa Macedo <mdemaced> |
| Networking sub component: | kuryr | QA Contact: | rlobillo |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | juriarte, mdemaced, mdulko |
| Version: | 4.6.z | | |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 22:37:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1929066 | | |
| Attachments: | | | |

Description
rlobillo
2021-02-02 14:36:51 UTC
Created attachment 1754821 [details]
error on kuryr_controller when sos_report was taken
Assigning it to me as I started taking a look at this one.
Failed on OCP4.8.0-0.nightly-2021-02-21-102854 over OSP13 (2021-01-20.1) with Amphora provider.
Once the kuryrConfig is changed, the kuryr pods remain stable. However, while running tempest tests, kuryr-controller is restarted from time to time with the exception below:
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {'type': 'ADDED', 'object': {'apiVersion': 'openstack.org/v1', 'kind': 'KuryrNetwork', 'metadata': {'creationTimestamp': '2021-02-22T14:40:45Z', 'finalizers': ['kuryrnetwork.finalizers.kuryr.openstack.org'], 'generation': 1, 'managedFields': [{'apiVersion': 'openstack.org/v1', 'fieldsType': 'FieldsV1', 'fieldsV1': {'f:metadata': {'f:finalizers': {'.': {}, 'v:"kuryrnetwork.finalizers.kuryr.openstack.org"': {}}}, 'f:spec': {'.': {}, 'f:nsLabels': {}, 'f:nsName': {}, 'f:projectId': {}}}, 'manager': 'python-requests', 'operation': 'Update', 'time': '2021-02-22T14:40:45Z'}], 'name': 'kuryr-namespace-1148497519', 'namespace': 'kuryr-namespace-1148497519', 'resourceVersion': '491628', 'uid': 'd381b3e2-0382-43e3-93c3-f7b4a9f381dd'}, 'spec': {'nsLabels': {}, 'nsName': 'kuryr-namespace-1148497519', 'projectId': '15145598c81a4fe3a49d9b1676bb2b0f'}}}: KeyError: 'status'
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event, *args, **kwargs)
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 80, in __call__
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event, *args, **kwargs)
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 89, in __call__
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging self.on_added(obj)
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrnetwork_population.py", line 45, in on_added
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging subnet_id = kuryrnet_crd['status'].get('subnetId')
2021-02-22 14:40:45.740 1 ERROR kuryr_kubernetes.handlers.logging KeyError: 'status'
This is presumably leading to the observed failures on the kuryr-tempest tests.
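The failing line in the traceback above can be reproduced outside the controller. The following is a minimal sketch that uses a trimmed copy of the logged KuryrNetwork object; the helper names `subnet_id_strict` and `subnet_id_tolerant` are hypothetical, and the tolerant variant is only one possible way to avoid the exception, not necessarily the change that was shipped.

```python
# Trimmed copy of the KuryrNetwork object from the logged ADDED event.
# It carries 'spec' but no 'status' key yet.
kuryrnet_crd = {
    'apiVersion': 'openstack.org/v1',
    'kind': 'KuryrNetwork',
    'metadata': {'name': 'kuryr-namespace-1148497519'},
    'spec': {
        'nsLabels': {},
        'nsName': 'kuryr-namespace-1148497519',
        'projectId': '15145598c81a4fe3a49d9b1676bb2b0f',
    },
}


def subnet_id_strict(crd):
    """Same access pattern as the line shown in the traceback."""
    return crd['status'].get('subnetId')


def subnet_id_tolerant(crd):
    """Hypothetical defensive variant: a missing 'status' yields None."""
    return crd.get('status', {}).get('subnetId')


try:
    subnet_id_strict(kuryrnet_crd)
    strict_error = None
except KeyError as exc:
    strict_error = exc.args[0]  # name of the missing key

tolerant_result = subnet_id_tolerant(kuryrnet_crd)
print(strict_error, tolerant_result)
```

With the object from the log, the strict access raises `KeyError` for the `status` key, while the tolerant access returns `None` and leaves it to the caller to decide what to do next.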
The config is:
$ oc get networks.operator.openshift.io cluster -o yaml | yq .spec
{
  "clusterNetwork": [
    {
      "cidr": "10.128.0.0/14",
      "hostPrefix": 23
    }
  ],
  "defaultNetwork": {
    "kuryrConfig": {
      "enablePortPoolsPrepopulation": true,
      "poolBatchPorts": 3,
      "poolMaxPorts": 7,
      "poolMinPorts": 1
    },
    "type": "Kuryr"
  },
  "disableNetworkDiagnostics": false,
  "logLevel": "Normal",
  "managementState": "Managed",
  "observedConfig": null,
  "operatorLogLevel": "Normal",
  "serviceNetwork": [
    "172.30.0.0/16"
  ],
  "unsupportedConfigOverrides": null
}
To reproduce it, run, for instance, the test below:
$ python -m testtools.run kuryr_tempest_plugin.tests.scenario.test_network_policy.NetworkPolicyScenario.test_ipblock_network_policy_allow_except
must_gather: http://file.rdu.redhat.com/rlobillo/must_gather_1924075.tgz
sos_report: http://rhos-release.virt.bos.redhat.com/log/bz1924075_2
The issue last reported on comment #5 is unrelated to the issue first reported on this Bugzilla. The new issue is a race condition: the status field is not yet present on the KuryrNetwork CRD (the CRD spec and status are filled in two separate operations), while the KuryrPrePopulation handler expects the CRD to already contain it.
Verified on OCP4.8.0-0.nightly-2021-04-17-044339 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0) with OVN-Octavia. The configuration change is applied and kuryr-controller remains stable. The kuryr-tempest tests passed except test_port_pool, which is missing a change on the test.
Created attachment 1773297 [details]
kuryr tempest results with the fix
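The race condition described in the comments above (spec written first, status written in a separate operation) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the actual kuryr-kubernetes handler: the class `PrepopulationSketch`, its `handle` method, and the `demo-*` values are hypothetical, and it assumes the handler also receives the later event that carries the status.

```python
class PrepopulationSketch:
    """Hypothetical stand-in for a handler that pre-populates port pools."""

    def __init__(self):
        self.populated = []  # subnet IDs for which pools were requested

    def handle(self, event):
        crd = event['object']
        subnet_id = crd.get('status', {}).get('subnetId')
        if subnet_id is None:
            # Status not written yet: skip and wait for a later event.
            return False
        if subnet_id not in self.populated:
            self.populated.append(subnet_id)
        return True


# Event order implied by the report: spec first, status in a later update.
added = {'type': 'ADDED',
         'object': {'kind': 'KuryrNetwork',
                    'spec': {'nsName': 'demo-namespace'}}}
modified = {'type': 'MODIFIED',
            'object': {'kind': 'KuryrNetwork',
                       'spec': {'nsName': 'demo-namespace'},
                       'status': {'subnetId': 'demo-subnet-id'}}}

handler = PrepopulationSketch()
results = [handler.handle(added), handler.handle(modified)]
print(results, handler.populated)
```

The first event is skipped instead of raising, and the pool is only requested once the status with a `subnetId` shows up.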
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438