Bug 1788518 - VM instances unreachable during upgrade from OSP13 to 14
Summary: VM instances unreachable during upgrade from OSP13 to 14
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Assaf Muller
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-07 12:06 UTC by Eduardo Olivares
Modified: 2020-01-07 14:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-07 14:19:32 UTC
Target Upstream Version:
Embargoed:


Attachments:
upgrade logs (992.26 KB, application/gzip)
2020-01-07 12:06 UTC, Eduardo Olivares

Description Eduardo Olivares 2020-01-07 12:06:17 UTC
Created attachment 1650373
upgrade logs

Description of problem:
CI job DFG-network-networking-ovn-upgrade-13-14_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable installs OSP13 with OVN and then upgrades to OSP14.
Job execution #136 failed because VM instances were unreachable during the controller upgrade.

While the controller nodes were being upgraded (script overcloud_upgrade_run-Controller.sh), the VM instance did not answer pings for approximately 43 minutes.
According to the ping logs, the problem starts at Tuesday, December 24, 2019 16:59:41.739 GMT:
[1577206781.739397] 64 bytes from 10.0.0.222: icmp_seq=2352 ttl=63 time=0.900 ms
[1577206792.749430] From 10.0.0.82 icmp_seq=2360 Destination Host Unreachable

Connectivity is recovered at Tuesday, December 24, 2019 17:42:53.700 GMT:
[1577209370.681500] From 10.0.0.82 icmp_seq=4940 Destination Host Unreachable
[1577209373.700228] 64 bytes from 10.0.0.222: icmp_seq=4941 ttl=63 time=2022 ms
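
The outage window above can be re-derived from the ping log itself. The following is a minimal sketch, assuming the log uses the bracketed epoch-timestamp format shown in the excerpts; the function names outage_windows and to_utc are hypothetical helpers, not part of the CI job.

```python
import re
from datetime import datetime, timezone

# Matches lines such as "[1577206781.739397] 64 bytes from ...".
LINE_RE = re.compile(r"^\[(\d+\.\d+)\]\s+(.*)$")


def outage_windows(lines):
    """Return (last_reply_ts, next_reply_ts) pairs around each run of failures."""
    windows = []
    last_ok = None    # epoch time of the most recent successful reply
    failing = False   # True while we are inside a run of "Unreachable" lines
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, rest = float(m.group(1)), m.group(2)
        if "bytes from" in rest:
            if failing and last_ok is not None:
                windows.append((last_ok, ts))
            failing = False
            last_ok = ts
        elif "Unreachable" in rest:
            failing = True
    return windows


def to_utc(ts):
    """Format an epoch timestamp as a UTC wall-clock string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# The four ping lines quoted in this report.
sample = [
    "[1577206781.739397] 64 bytes from 10.0.0.222: icmp_seq=2352 ttl=63 time=0.900 ms",
    "[1577206792.749430] From 10.0.0.82 icmp_seq=2360 Destination Host Unreachable",
    "[1577209370.681500] From 10.0.0.82 icmp_seq=4940 Destination Host Unreachable",
    "[1577209373.700228] 64 bytes from 10.0.0.222: icmp_seq=4941 ttl=63 time=2022 ms",
]

for start, end in outage_windows(sample):
    print(to_utc(start), "->", to_utc(end), f"({(end - start) / 60:.1f} min)")
```

On the four quoted lines this reports a single window from 16:59:41 to 17:42:53 UTC, roughly 43 minutes between the last answered ping and the first answered ping after the failures.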

Information from ovn-controller logs (similar in all controllers):
2019-12-24T16:31:08.207Z|00053|reconnect|INFO|tcp:172.17.1.11:6642: connection closed by peer
2019-12-24T16:31:09.208Z|00054|reconnect|INFO|tcp:172.17.1.11:6642: connecting...
2019-12-24T16:31:09.233Z|00055|reconnect|INFO|tcp:172.17.1.11:6642: connection attempt failed (Connection refused)
2019-12-24T16:31:09.233Z|00056|reconnect|INFO|tcp:172.17.1.11:6642: waiting 2 seconds before reconnect
2019-12-24T16:31:11.234Z|00057|reconnect|INFO|tcp:172.17.1.11:6642: connecting...
2019-12-24T16:31:11.249Z|00058|reconnect|INFO|tcp:172.17.1.11:6642: connection attempt failed (Connection refused)
...
2019-12-24T17:37:29.458Z|00432|reconnect|INFO|tcp:172.17.1.11:6642: waiting 8 seconds before reconnect
2019-12-24T17:37:37.458Z|00433|reconnect|INFO|tcp:172.17.1.11:6642: connecting...
2019-12-24T17:37:37.459Z|00434|reconnect|INFO|tcp:172.17.1.11:6642: connected
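
The length of the southbound DB disconnection can be measured from the same log lines. This is a minimal sketch, assuming the pipe-separated line layout shown above and treating only "connection closed by peer" and "connected" as the start and end events; other disconnect messages are not handled, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_line(line):
    """Split one log line into (timestamp, target, event), or return None."""
    parts = line.strip().split("|", 4)
    if len(parts) != 5 or parts[2] != "reconnect":
        return None
    ts = datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # "tcp:172.17.1.11:6642: connected" -> target and event text
    target, _, event = parts[4].partition(": ")
    return ts, target, event


def disconnect_windows(lines):
    """Return (target, closed_at, connected_at) for each completed disconnection."""
    down_since = {}   # target -> time the connection was closed
    windows = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, target, event = parsed
        if event == "connection closed by peer":
            down_since.setdefault(target, ts)
        elif event == "connected" and target in down_since:
            windows.append((target, down_since.pop(target), ts))
    return windows


# Three of the log lines quoted in this report.
sample = [
    "2019-12-24T16:31:08.207Z|00053|reconnect|INFO|tcp:172.17.1.11:6642: connection closed by peer",
    "2019-12-24T16:31:09.233Z|00055|reconnect|INFO|tcp:172.17.1.11:6642: connection attempt failed (Connection refused)",
    "2019-12-24T17:37:37.459Z|00434|reconnect|INFO|tcp:172.17.1.11:6642: connected",
]

for target, start, end in disconnect_windows(sample):
    minutes = (end - start).total_seconds() / 60
    print(f"{target}: {start:%H:%M:%S} -> {end:%H:%M:%S} UTC ({minutes:.1f} min)")
```

On the quoted lines this yields one window for tcp:172.17.1.11:6642, from 16:31:08 to 17:37:37 UTC, about 66.5 minutes.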



After the OVN reconnection succeeds, connectivity to the VM instance is recovered (pings are answered again).


Version-Release number of selected component (if applicable):
PUDDLE_ID=2019-12-13.1
RHOSP_RELEASE='14.0.4 RC (Rocky)'

How reproducible:
Run CI job https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-upgrade-13-14_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/

The job failed during the overcloud-upgrade stage.

The failure was reproduced in run #136, but run #137 did not fail.


Steps to Reproduce:
1. Run CI job

Actual results:
The job failed and was aborted during the overcloud upgrade.


Expected results:



Additional info:

Comment 1 Jakub Libosvar 2020-01-07 14:19:32 UTC
OSP14 is EOL in 3 days and there is no new release planned for OSP14.

