Bug 1964399

Summary: OCP installation with kuryr failure - network operator degraded
Product: Red Hat OpenStack Reporter: Kamil Sambor <ksambor>
Component: python-networking-ovnAssignee: Kamil Sambor <ksambor>
Status: CLOSED ERRATA QA Contact: rlobillo
Severity: urgent Docs Contact:
Priority: urgent    
Version: 16.2 (Train)CC: apevec, chrisw, ealcaniz, eduen, ekuris, eolivare, jjoyce, j.thadden, juriarte, ksambor, lhh, ltomasbo, majopela, mdemaced, mdulko, mharri, oblaut, pgrist, rlobillo, scohen, shrjoshi, spower
Target Milestone: AlphaKeywords: AutomationBlocker, Regression, Triaged
Target Release: 16.2 (Train on RHEL 8.4)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-networking-ovn-7.4.2-2.20210601204817.el8ost.4 Doc Type: If docs needed, set a value
Doc Text:
-
Story Points: ---
Clone Of: 1937851 Environment:
Last Closed: 2021-09-15 07:15:18 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 17 rlobillo 2021-07-01 11:05:05 UTC
Verified on RHOS-16.2-RHEL-8-20210623.n.1.

Installing 4.8.0-rc.0 which contains https://github.com/openshift/kuryr-kubernetes/pull/521 so we look for the log line on kuryr-controller pod to identify if the issue has happened:


#!/bin/bash
export KUBECONFIG=/home/stack/ostest/auth/kubeconfig
while(true);
do
  date -u
  oc get pods -n openshift-kuryr -l app=kuryr-controller -o NAME
  echo "trunk ports down:"
  source ~/shiftstackrc && openstack port list --device-owner "trunk:subport" -f value | grep DOWN
  echo "issue detected on kuryr-controller:"
  RESULT=$(oc logs -n openshift-kuryr $(oc get pods -n openshift-kuryr -l app=kuryr-controller -o NAME) | grep "This is a Neutron issue")
  if [[ $RESULT -eq 1 ]]
  then
          echo $RESULT
  fi
  echo "####"
  echo
  sleep 5
done

The issue (w/o fix) was observed randomly on hybrid setup and on DSAL box. Therefore, we tested in both setups for verification (w fix) and the problem was not hit over several OCP installations in loop:

- on DSAL box with RHOS-16.2-RHEL-8-20210623.n.1, we run 65 successful installations and neither the log line on kuryr controller or hung trunk:subports were detected during installation. (Logs on http://file.rdu.redhat.com/rlobillo/monitor_BZ1964399.log.gz)
- on Hybrid setup (titan26) we run 10 successful installations and neither the log line on kuryr controller or hung trunk:subports were detected (logs on http://file.rdu.redhat.com/rlobillo/hybrid_monitor_BZ1964399.tgz).

Comment 21 errata-xmlrpc 2021-09-15 07:15:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:3483