Created attachment 1650373 [details]
upgrade logs

Description of problem:
CI job DFG-network-networking-ovn-upgrade-13-14_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable installs OSP13 with OVN and then upgrades to OSP14.
Job execution #136 failed because VM instances were unreachable during the controllers upgrade.

While the controller nodes were being upgraded (script overcloud_upgrade_run-Controller.sh), the VM instance did not answer pings for approximately 40 minutes.

According to the ping logs, the problem starts at GMT: Tuesday, December 24, 2019 16:59:41.739
[1577206781.739397] 64 bytes from 10.0.0.222: icmp_seq=2352 ttl=63 time=0.900 ms
[1577206792.749430] From 10.0.0.82 icmp_seq=2360 Destination Host Unreachable

And connectivity is recovered at GMT: Tuesday, December 24, 2019 17:42:53.700
[1577209370.681500] From 10.0.0.82 icmp_seq=4940 Destination Host Unreachable
[1577209373.700228] 64 bytes from 10.0.0.222: icmp_seq=4941 ttl=63 time=2022 ms

Information from ovn-controller logs (similar on all controllers):
2019-12-24T16:31:08.207Z|00053|reconnect|INFO|tcp:172.17.1.11:6642: connection closed by peer
2019-12-24T16:31:09.208Z|00054|reconnect|INFO|tcp:172.17.1.11:6642: connecting...
2019-12-24T16:31:09.233Z|00055|reconnect|INFO|tcp:172.17.1.11:6642: connection attempt failed (Connection refused)
2019-12-24T16:31:09.233Z|00056|reconnect|INFO|tcp:172.17.1.11:6642: waiting 2 seconds before reconnect
2019-12-24T16:31:11.234Z|00057|reconnect|INFO|tcp:172.17.1.11:6642: connecting...
2019-12-24T16:31:11.249Z|00058|reconnect|INFO|tcp:172.17.1.11:6642: connection attempt failed (Connection refused)
...
2019-12-24T17:37:29.458Z|00432|reconnect|INFO|tcp:172.17.1.11:6642: waiting 8 seconds before reconnect
2019-12-24T17:37:37.458Z|00433|reconnect|INFO|tcp:172.17.1.11:6642: connecting...
2019-12-24T17:37:37.459Z|00434|reconnect|INFO|tcp:172.17.1.11:6642: connected

After the OVN reconnection succeeds, connectivity to the VM instance is recovered (pings are answered again).

Version-Release number of selected component (if applicable):
PUDDLE_ID=2019-12-13.1
RHOSP_RELEASE='14.0.4 RC (Rocky)'

How reproducible:
Run CI job https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-upgrade-13-14_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/
The job failed during the overcloud-upgrade stage. The failure was reproduced at #136, but the job did not fail at #137.

Steps to Reproduce:
1. Run CI job
2.
3.

Actual results:
Job failed and aborted during overcloud upgrade.

Expected results:

Additional info:
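As a cross-check of the outage window quoted in the description, the bracketed prefixes on the ping lines can be converted to UTC and the gap between answered pings computed. The following is only a sketch: it assumes the prefixes are Unix epoch seconds (as printed, for example, by iputils ping with -D), it uses just the four ping lines quoted above rather than the full attached log, and the helper names (replies, longest_gap, utc) are made up for illustration.

```python
import re
from datetime import datetime, timezone

# The four ping lines quoted in the description (assumed: epoch-second prefixes).
PING_LOG = """\
[1577206781.739397] 64 bytes from 10.0.0.222: icmp_seq=2352 ttl=63 time=0.900 ms
[1577206792.749430] From 10.0.0.82 icmp_seq=2360 Destination Host Unreachable
[1577209370.681500] From 10.0.0.82 icmp_seq=4940 Destination Host Unreachable
[1577209373.700228] 64 bytes from 10.0.0.222: icmp_seq=4941 ttl=63 time=2022 ms
"""

LINE_RE = re.compile(r"^\[(\d+\.\d+)\] (.*)$")
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def replies(log):
    """Yield (epoch_seconds, icmp_seq) for every answered echo request."""
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m and "bytes from" in m.group(2):
            yield float(m.group(1)), int(SEQ_RE.search(m.group(2)).group(1))


def longest_gap(log):
    """Return the pair of consecutive replies with the largest time gap."""
    ok = list(replies(log))
    return max(zip(ok, ok[1:]), key=lambda pair: pair[1][0] - pair[0][0])


def utc(ts):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


(start_ts, start_seq), (end_ts, end_seq) = longest_gap(PING_LOG)
outage_seconds = end_ts - start_ts
unanswered = end_seq - start_seq - 1

print("last reply before outage:", utc(start_ts).isoformat())
print("first reply after outage:", utc(end_ts).isoformat())
print(f"outage: {outage_seconds / 60:.1f} min, {unanswered} echo requests unanswered")
```

Run against the full ping log from the attachment instead of the four quoted lines, the same functions would report the longest gap between answered pings in the whole run.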
OSP14 is EOL in 3 days and there is new release planned for OSP14.