Bug 2227776

Summary: bulk_pull error after compute node reboot resulting in timeout
Product: Red Hat OpenStack
Reporter: Paul Jany <pgodwin>
Component: openstack-neutron
Assignee: Slawek Kaplonski <skaplons>
Status: ASSIGNED ---
QA Contact: Eran Kuris <ekuris>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 13.0 (Queens)
CC: aruffin, chrisw, jlibosva, ralonsoh, scohen
Target Milestone: ---
Flags: skaplons: needinfo? (pgodwin)
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Paul Jany 2023-07-31 12:27:22 UTC
Description of problem:
After a compute node reboot, a bulk_pull error occurs and the instances lose connectivity. The suggestions from Bugzilla 2212348 did not help.

Version-Release number of selected component (if applicable):
13

How reproducible:
The case is re-opened.
Of the suggestions we provided in Comment-3, the customer could only apply the increase of rpc_response_timeout, and that did not help.

Below are the steps the customer has performed:
- Rebooted the compute node; the bulk_pull error was observed.
- Increased rpc_response_timeout and restarted the neutron, ovsagent and neutron-sriov containers. That did not help.
- The customer has to redeploy the instances to restore connectivity.
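
To rule out the possibility that the increased rpc_response_timeout was not picked up after the container restart, the effective value can be checked against the configuration the agent actually loads. The following is only a minimal sketch, not something run on the customer's system: it assumes the option is set in the [DEFAULT] section of the agent's neutron.conf (where oslo.messaging reads it, with a default of 60 seconds), the helper name effective_rpc_timeout is hypothetical, and the value 180 is purely illustrative rather than the value the customer used.

```python
import configparser

# oslo.messaging's default for rpc_response_timeout, in seconds.
DEFAULT_RPC_RESPONSE_TIMEOUT = 60


def effective_rpc_timeout(conf_text: str) -> int:
    """Return rpc_response_timeout from the [DEFAULT] section of INI-style text.

    Hypothetical helper for illustration; falls back to the oslo.messaging
    default when the option is not set.
    """
    # RawConfigParser avoids '%' interpolation; strict=False tolerates
    # duplicate sections or options in the file.
    parser = configparser.RawConfigParser(strict=False)
    parser.read_string(conf_text)
    return parser.getint(
        "DEFAULT", "rpc_response_timeout", fallback=DEFAULT_RPC_RESPONSE_TIMEOUT
    )


# Illustrative content only; the real file lives inside the agent container.
sample = """
[DEFAULT]
rpc_response_timeout = 180
"""

print(effective_rpc_timeout(sample))
```

If the value read from inside the running container still matches the default, the change did not reach the agent and the test of the timeout increase was inconclusive.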

I asked whether an upgrade is possible, as this deployment is not on the latest version of OSP-13. The reply was that the "Red Hat OpenStack software" is running in custom containers and without OSP Director, and that any upgrade has to be performed using their custom CVIM software.