Description of problem:
The customer upgraded one of their RHOSP deployments from 16.2.1 to 16.2.2 and, during the procedure, saw some impact on the VMs running there. It seems that it happened during the ovn-controller container refresh, where at least some of the VMs experienced connection timeouts. The step linked below appears to be the one that caused the issue and, as mentioned, everything auto-recovered after 60-90 seconds.
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/keeping_red_hat_openstack_platform_updated/index#proc_updating-ovn-controller-container_updating-overcloud

Version-Release number of selected component (if applicable):
RHOSP 16.2.1

How reproducible:
Upgrade from RHOSP 16.2.1 to 16.2.2

Steps to Reproduce:
1. Upgrade 16.2.1 to 16.2.2

Actual results:
The upgrade succeeded, but VM connectivity dropped for 60-90 seconds during the data plane upgrade.

Expected results:
A successful upgrade without any downtime.

Additional info:
Hi Ujey,

I confirmed that the correct process was followed, with the OVN controller updated on the compute nodes before the controllers:

Compute: "StartedAt": "2022-05-18T10:58:15.253471631Z"
Controller: "StartedAt": "2022-05-18T20:22:58.251640909Z"

I also confirmed that the container versions match, with both compute and controller nodes using version 16.2.2-15.1651564647 [1].

However, it seems we are missing the relevant OVS and OVN logs in the sos reports. For example:

OVS: 2022-05-19T03:32:01.810Z|01027|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
OVN controller: 2022-05-19T00:22:14.527Z|00014|pinctrl(ovn_pinctrl0)|INFO|DHCPACK $MAC $IP

Even the messages log only starts at `May 18 18:19:27`.

Based on the start time of ovn_controller on the compute node and on what the customer has told us, we should assume the issue occurred around 2022-05-18T10:58:15.

I assume the customer hasn't provided the full rotated logs from the system. Could you please confirm this? I may have missed something. If we don't have the logs from the incident, please check with the customer: the logs may not have been rotated off the server yet, and we could capture them in a tarball of /var/log/.

[1] https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-ovn-northd/5de6c2b4d70cc51644a57382?architecture=amd64&tag=16.2.2-15.1651564647&push_date=1651858373000
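The coverage check described above (does a log start early enough to include the incident?) can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_ts` and `log_covers` are hypothetical, and the timestamps are the ones quoted in this comment, treated here as the earliest available entries, which is an assumption for the OVN controller line.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2022-05-18T10:58:15.253471631Z."""
    body = ts.rstrip("Z")
    if "." in body:
        # strptime's %f accepts at most 6 fractional digits, so truncate
        # nanosecond precision to microseconds.
        head, frac = body.split(".", 1)
        body = f"{head}.{frac[:6]}"
        fmt = "%Y-%m-%dT%H:%M:%S.%f"
    else:
        fmt = "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(body, fmt).replace(tzinfo=timezone.utc)


def log_covers(first_log_entry: str, incident: str) -> bool:
    """Return True if the log begins at or before the incident time."""
    return parse_ts(first_log_entry) <= parse_ts(incident)


# Assumed incident time: the ovn_controller StartedAt on the compute node.
incident = "2022-05-18T10:58:15.253471631Z"

# Earliest entries quoted in this comment (assumed to be the first lines).
first_entries = {
    "ovs-vswitchd": "2022-05-19T03:32:01.810Z",
    "ovn-controller": "2022-05-19T00:22:14.527Z",
}

for name, first in first_entries.items():
    print(f"{name} covers incident: {log_covers(first, incident)}")
```

With the values above, both logs begin after the assumed incident time, which is why the rotated logs are needed.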
Hello,

The customer has attached the ovs-vswitchd logs from that specific compute node to the case. The ovn-controller logs from the same compute node are already attached as well. Please check the attached logs and share your findings.

Also, the customer's concern is that "we will have this same issue on the next minor update so what we are trying to confirm for now is that using one of your reference rhosp deployments(or a lab one) you still don't see any impact during a minor update to the data plane when running that intermediate step of refreshing the ovn-controllers."

Thanks,
Ujey J
Hi Jakub,

Please find the answers to your questions below.

Q: What do you mean by "It seems that it happened during the ovn-controller container refresh"?
Ans: The customer mentioned that they faced the issue after performing the steps in this section of the documentation:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/keeping_red_hat_openstack_platform_updated/index#proc_updating-ovn-controller-container_updating-overcloud

Q: Do you have rough timestamps in which the downtime occurred?
Ans: The impact time is 10:57-10:58 UTC on 18.05.2022.

Q: Were the complete logs requested in comment 1 provided?
Ans: Unfortunately, because some logs were rotated out, the customer is not able to provide the full logs. They did share the ovs-vswitchd logs from that specific compute node on the case, and the OVN controller logs were attached earlier, on May 25.

Q: The customer ticket mentions "upgrade" but the procedure they did was an "update" from 16.2.1 to 16.2.2, is that correct?
Ans: Yes, the customer performed an update from 16.2.1 to 16.2.2.

The customer wants to know the root cause or, failing that, whether the documentation can be corrected, since it does not mention the downtime they experienced. Could you please look into this and provide an update?

Thanks,
Ujey J
*** Bug 2127166 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:8794
Hi,

The follow-up BZ for the update is here: https://bugzilla.redhat.com/show_bug.cgi?id=2151958