Bug 2094265
| Summary: | Data plane disruption during update from 16.2.1, 16.2.0, or any 16.1 release to 16.2.2 or later in ML2/OVN deployments | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Ujey J <ujj> | |
| Component: | openstack-tripleo-heat-templates | Assignee: | Terry Wilson <twilson> | |
| Status: | CLOSED ERRATA | QA Contact: | Fiorella Yanac <fyanac> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | high | |||
| Version: | 16.2 (Train) | CC: | apevec, astupnik, averdagu, dalvarez, ebarrera, fwissing, fyanac, gfidente, jamsmith, jlibosva, jveiraca, lbezdick, ldenny, lhh, ltamagno, majopela, matteo.panella, mburns, mmichels, mtomaska, pratshar, sathlang, scohen, stchen, tvignaud, twilson | |
| Target Milestone: | z4 | Keywords: | Triaged | |
| Target Release: | 16.2 (Train on RHEL 8.4) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | openstack-tripleo-heat-templates-11.6.1-2.20221010235135.el8ost | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2141873 | Environment: | |
| Last Closed: | 2022-12-07 19:23:13 UTC | Type: | Bug | |
| Bug Depends On: | 2089416, 2141873 | |||
| Bug Blocks: | ||||
|
Description
Ujey J
2022-06-07 10:14:27 UTC
Hi Ujey,

I confirmed that the correct process was followed: the OVN controller was updated on the compute nodes before the controllers:

Compute: "StartedAt": "2022-05-18T10:58:15.253471631Z"
Controller: "StartedAt": "2022-05-18T20:22:58.251640909Z"

I also confirmed that the container versions match: both the compute nodes and the controllers are using version 16.2.2-15.1651564647 [1].

However, the relevant OVS and OVN logs appear to be missing from the sos reports, for example:

OVS: 2022-05-19T03:32:01.810Z|01027|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
OVN controller: 2022-05-19T00:22:14.527Z|00014|pinctrl(ovn_pinctrl0)|INFO|DHCPACK $MAC $IP

Even the messages file starts at `May 18 18:19:27`.

Given the start date of ovn_controller on the compute node and what the customer has told us, we should assume the issue occurred around 2022-05-18T10:58:15. I assume the customer has not provided the full rotated logs from the system. Could you please confirm this? I may have missed something. If we do not have the logs from the incident, please check with the customer: they may not have been rotated off the server yet, and we could capture them in a tarball of /var/log/.

[1] https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-ovn-northd/5de6c2b4d70cc51644a57382?architecture=amd64&tag=16.2.2-15.1651564647&push_date=1651858373000

Hello,

The customer has attached the ovs-vswitchd logs from that specific compute node to the case, and the ovn-controller logs from the same compute node are already attached. Please check the attached logs and share your findings.

The customer's concern is that "we will have this same issue on the next minor update so what we are trying to confirm for now is that using one of your reference rhosp deployments(or a lab one) you still don't see any impact during a minor update to the data plane when running that intermediate step of refreshing the ovn-controllers."
Thanks,
Ujey J

Hi Jakub,

Please find the answers to your questions below.

What do you mean by "It seems that it happened during the ovn-controller container refresh"?
Ans: The customer mentioned that they hit the issue after performing the steps in this document: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html-single/keeping_red_hat_openstack_platform_updated/index#proc_updating-ovn-controller-container_updating-overcloud

Do you have rough timestamps in which the downtime occurred?
Ans: The impact time was 10:57-10:58 UTC on 18.05.2022.

Were the whole logs asked for in comment 1 provided?
Ans: Unfortunately, because some logs have rotated out, the customer will not be able to provide full logs. They shared the ovs-vswitchd logs from that specific compute node on the case itself, and the OVN controller logs were attached earlier, on May 25.

The customer ticket mentions "upgrade" but the procedure they did was an "update" from 16.2.1 to 16.2.2, is that correct?
Ans: Yes, the customer updated from 16.2.1 to 16.2.2.

The customer wants to know the root cause, or whether the documentation can be corrected, since it did not mention the downtime that they experienced. Could you please look into this and provide an update?

Thanks,
Ujey J

*** Bug 2127166 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8794

Hi,

The follow-up BZ for the update is here: https://bugzilla.redhat.com/show_bug.cgi?id=2151958 |
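
For anyone triaging a similar case, the two checks made in the first reply of this thread (compute nodes updated before controllers, and whether the supplied logs reach back to the incident) can be sketched roughly as follows. This is only an illustration, not part of the fix: the function names are hypothetical, and the only inputs are the timestamps quoted in the thread.

```python
from datetime import datetime, timezone


def parse_started_at(value: str) -> datetime:
    """Parse a UTC timestamp such as the StartedAt values quoted in this bug.

    The quoted values carry nanosecond precision, which strptime's %f does
    not accept, so the fraction is truncated (or padded) to microseconds.
    """
    value = value.rstrip("Z")
    base, _, frac = value.partition(".")
    micros = (frac + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def compute_updated_first(compute_started: str, controller_started: str) -> bool:
    """True when ovn_controller on the compute node started before the controller's."""
    return parse_started_at(compute_started) < parse_started_at(controller_started)


def log_covers(first_log_entry: str, incident: str) -> bool:
    """True when the earliest available log entry is not later than the incident."""
    return parse_started_at(first_log_entry) <= parse_started_at(incident)


# Values copied from the comments above.
compute = "2022-05-18T10:58:15.253471631Z"
controller = "2022-05-18T20:22:58.251640909Z"

print(compute_updated_first(compute, controller))  # True
print(log_covers("2022-05-19T03:32:01.810Z", "2022-05-18T10:58:15Z"))  # False
```

The second result matches the observation in the thread: the first ovs-vswitchd entry in the sos report is dated after the assumed incident time, so the rotated logs are needed.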