Bug 2077339

Summary: OpenShift liveness/readiness probes are failing after reboot of an OpenStack compute node
Product: Red Hat OpenStack Reporter: Wilhelm Weber <wweber>
Component: openstack-neutron Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Eran Kuris <ekuris>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 16.1 (Train) CC: averdagu, chrisw, jlibosva, mdemaced, scohen, wweber
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-10-26 06:13:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Wilhelm Weber 2022-04-21 06:03:50 UTC
Description of problem:

OpenShift liveness/readiness probes are failing after reboot of an OpenStack compute node (and therefore all OpenShift worker/master nodes running on that compute node).

After the OpenShift nodes came back up (on the same OpenStack compute node), all liveness and readiness probes failed until the Pods were manually deleted and recreated.

tcpdump output (captured before deleting the Pods) shows that packets leave the OpenShift node towards the OpenStack router, but nothing is sent to the Pods.

The following log entries appear in the kuryr-controller Pod after the reboot:

WARNING kuryr_kubernetes.controller.drivers.utils [-] Port <Port ID> is in DOWN status but still associated to a trunk. This should not happen. Trying to delete it from the trunk.: openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://<openstack api>:13696/v2.0/ports/<Port ID>, Port <Port ID> is currently a subport for trunk <Trunk ID>.
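
To collect the affected port/trunk pairs from a larger kuryr-controller log, something like the following might help. This is only a rough sketch, not part of Kuryr or the OpenStack SDK: the function name parse_subport_conflict and both regular expressions are hypothetical and assume the message wording quoted above, with real IDs in place of the placeholders and optional ANSI color codes at the end of the line.

```python
import re

# ANSI color sequences; in copied logs the escape byte sometimes shows up
# as the literal text "ESC", so both forms are handled (assumption).
ANSI_RE = re.compile(r"(?:\x1b|ESC)\[[0-9;]*m")

# Tail of the 409 conflict message as quoted in this report (assumed wording).
SUBPORT_RE = re.compile(
    r"Port (?P<port>\S+) is currently a subport for trunk (?P<trunk>\S+?)\.?$"
)


def parse_subport_conflict(line):
    """Return (port_id, trunk_id) for a subport conflict line, else None."""
    clean = ANSI_RE.sub("", line).strip()
    match = SUBPORT_RE.search(clean)
    if match is None:
        return None
    return match.group("port"), match.group("trunk")
```

Feeding each log line through the function and keeping the non-None results yields the list of ports that are DOWN but still attached to a trunk, which can then be cross-checked against the Pods with failing probes.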


Version-Release number of selected component (if applicable):

OpenStack v16.1.7
OpenShift v4.8.27

Comment 8 Red Hat Bugzilla 2023-09-18 04:35:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days