Bug 1878688

Summary: [updates] 0.08% packet loss OSP16 Z updates after l3 connectivity check
Product: Red Hat OpenStack Reporter: Ronnie Rasouli <rrasouli>
Component: openvswitch Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED CURRENTRELEASE QA Contact: Eran Kuris <ekuris>
Severity: urgent Docs Contact:
Priority: high    
Version: 16.0 (Train) CC: apevec, chrisw, jfrancoa, rhos-maint, sathlang
Target Milestone: z3 Keywords: TestBlocker, TestOnly, Triaged
Target Release: 16.0 (Train on RHEL 8.1)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-10-05 12:47:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ronnie Rasouli 2020-09-14 10:56:08 UTC
Description of problem:

The Z update test of OSP16 GA (IPv4, IPv6 composable) fails at the stop-l3-agent connectivity check:

STDOUT:

12718 packets transmitted, 12709 received, 0.0707658% packet loss, time 13331ms
rtt min/avg/max/mdev = 0.394/0.897/35.319/0.773 ms
Ping loss higher than 0 seconds detected (9 seconds)
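
As a cross-check of the figures above, here is a minimal Python sketch that recomputes the loss from the ping summary line. The function name parse_ping_summary is an illustrative assumption and is not part of the test tooling; the input is the summary line quoted in this report.

```python
import re

# Matches the summary line printed by ping at the end of a run.
SUMMARY_RE = re.compile(
    r"(?P<tx>\d+) packets transmitted, (?P<rx>\d+) received, "
    r"(?P<loss>[\d.]+)% packet loss"
)


def parse_ping_summary(line):
    """Return (transmitted, received, lost, loss_percent) for a ping summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % line)
    tx = int(match.group("tx"))
    rx = int(match.group("rx"))
    lost = tx - rx
    return tx, rx, lost, 100.0 * lost / tx


summary = (
    "12718 packets transmitted, 12709 received, "
    "0.0707658% packet loss, time 13331ms"
)
tx, rx, lost, pct = parse_ping_summary(summary)
print("lost packets:", lost)
print("loss percent: %.7f" % pct)
```

The recomputed count is 12718 - 12709 = 9 lost packets, the same number as the "9 seconds" reported by the check.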


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy RHOS16 GA
2. Update the undercloud
3. Update the overcloud
4. Run the ping test after l3 stop

Actual results:

0.0707658% packet loss 
Expected results:
no packet loss

Additional info:

Comment 2 Sofer Athlan-Guyot 2020-09-14 11:55:31 UTC
Hi,

There is no special treatment for openvswitch in OSP 16.0, so it gets updated along with the other packages, which most likely creates the small cut we see here:

95205:2020-09-11 22:55:07 | TASK [Update all packages] *****************************************************                                                                                                                                                                                                                  
95207:2020-09-11 22:55:07 | changed: [controller-2] => {"changed": true, "msg": "", "rc": 0, "results": ["Installed: iwl2030-firmware-18.168.6.1-95.el8_1.1.noarch", "Installed: ansible-pacemaker-1.0.4-0.20200324121423.5847167.el8ost.noarch", "Installed: iwl3945-firmware-15.32.2.9-95.el8_1.1.noarch", "Installed: iwl51
50-firmware-8.24.2.2-95.el8_1.1.noarch", "Installed: gnutls-utils-3.6.8-9.el8_1.x86_64", "Installed: gnutls-dane-3.6.8-9.el8_1.x86_64", "Installed: bind-export-libs-32:9.11.4-26.P2.el8_1.3.x86_64", "Installed: python3-neutronclient-6.14.0-0.20200221162537.115f60f.el8ost.noarch", "Installed: gnutls-3.6.8-9.el8_1.x86_6
4", "Installed: python3-novaclient-1:15.1.0-0.20200225115308.cd396b8.el8ost.noarch", "Installed: cloud-init-18.5-7.el8_1.2.noarch", "Installed: libvirt-daemon-driver-storage-scsi-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Installed: libvirt-client-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Installed: system
d-udev-239-18.el8_1.7.x86_64", "Installed: rpm-build-4.14.2-26.el8_1.x86_64", "Installed: certmonger-0.79.7-6.el8_1.x86_64", "Installed: systemd-libs-239-18.el8_1.7.x86_64", "Installed: systemd-pam-239-18.el8_1.7.x86_64", "Installed: netcf-libs-0.2.8-12.module+el8.1.1+5309+6d656f05.x86_64", "Installed: systemd-contai
ner-239-18.el8_1.7.x86_64", "Installed: libvirt-daemon-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Installed: python3-openstackclient-4.0.0-0.20200221163859.aa64eb6.el8ost.noarch", "Installed: python3-openstacksdk-0.36.2-0.20200319144116.211bdc6.el8ost.noarch", "Installed: slirp4netns-0.4.2-3.git21fdece.module+el
8.1.1+5657+524a77d7.x86_64", "Installed: python3-os-client-config-1.33.0-0.20200225125413.d0eea17.el8ost.noarch", "Installed: systemd-239-18.el8_1.7.x86_64", "Installed: gdb-headless-8.2-6.el8_0.x86_64", "Installed: python3-os-service-types-1.7.0-0.20200225085439.0b2f473.el8ost.noarch", "Installed: microcode_ctl-4:20
190618-1.20200609.1.el8_1.x86_64", "Installed: libvirt-daemon-driver-interface-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Installed: libvirt-daemon-config-nwfilter-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Installed: libvirt-daemon-driver-storage-gluster-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Inst
alled: dib-utils-0.0.11-0.20200224215429.51661c3.el8ost.noarch", "Installed: python3-osc-lib-1.14.1-0.20200221153720.a0d9746.el8ost.noarch", "Installed: libvirt-daemon-driver-qemu-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Installed: python3-oslo-concurrency-3.30.0-0.20200225090342.610df38.el8ost.noarch", "Insta
lled: python3-oslo-config-2:6.11.2-0.20200221150642.22c286c.el8ost.noarch", "Installed: libiscsi-1.18.0-8.module+el8.1.1+5309+6d656f05.x86_64", "Installed: python3-oslo-context-2.23.0-0.20200225215428.07f068d.el8ost.noarch", "Installed: python3-oslo-i18n-3.24.0-0.20200221144708.91b39bb.el8ost.noarch", "Installed: pyt
hon3-oslo-log-3.44.1-0.20200225215430.3ff497d.el8ost.noarch", "Installed: python3-oslo-messaging-10.2.0-0.20200225104634.b7e9faf.el8ost.noarch", "Installed: python3-oslo-middleware-3.38.1-0.20200225095528.9bae80e.el8ost.noarch", "Installed: libvirt-bash-completion-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Insta
lled: python3-oslo-serialization-2.29.2-0.20200225092603.fa399b6.el8ost.noarch", "Installed: python3-oslo-service-1.40.2-0.20200225103516.a7621c8.el8ost.noarch", "Installed: libnghttp2-1.33.0-3.el8_1.1.x86_64", "Installed: python3-oslo-utils-3.41.5-0.20200310140157.85cd57d.el8ost.noarch", "Installed: grub2-tools-efi-
1:2.02-87.el8_1.x86_64", "Installed: libvirt-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64", "Installed: python3-osprofiler-2.8.2-0.20200221151756.d431c7a.el8ost.noarch", "Installed: grub2-efi-x64-1:2.02-87.el8_1.x86_64", "Installed: kernel-tools-4.18.0-147.24.2.el8_1.x86_64", "Installed: grub2-pc-1:2.02-87.el8_1.x86_
64", "Installed: grub2-pc-modules-1:2.02-87.el8_1.noarch", "Installed: grub2-tools-1:2.02-87.el8_1.x86_64", "Installed: openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch", "Installed: kernel-tools-libs-4.18.0-147.24.2.el8_1.x86_64", "Installed: python3-paunch-5.3.2-0.20200320172310.ebc49c4.el8ost.noarch", 


We can see that openvswitch is updated during the usual yum upgrade task.
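
To make that easier to spot in such a long task result, here is a small Python sketch that filters the "results" entries for a package name. The helper packages_matching is hypothetical; the sample list is a three-entry excerpt of the entries quoted in the log above, not the full output.

```python
def packages_matching(results, needle):
    """Return (action, package) pairs from result strings whose package mentions needle."""
    matches = []
    for entry in results:
        # Entries look like "Installed: <name-version-release.arch>".
        action, sep, package = entry.partition(": ")
        if sep and needle in package:
            matches.append((action, package))
    return matches


# Excerpt of the "results" list from the "Update all packages" task above.
results = [
    "Installed: gnutls-utils-3.6.8-9.el8_1.x86_64",
    "Installed: openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch",
    "Installed: python3-paunch-5.3.2-0.20200320172310.ebc49c4.el8ost.noarch",
]

for action, package in packages_matching(results, "openvswitch"):
    print(action, package)
```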

The versions are updated from:


openvswitch2.11-2.11.0-35.el8
openvswitch2.11-2.11.3-60.el8

and 

openvswitch-2.11-0.6.el8
openvswitch-2.11-0.5.el8

To be sure that this is what causes the issue, we could launch a test run with the backports from
https://bugzilla.redhat.com/show_bug.cgi?id=1858745 and https://bugzilla.redhat.com/show_bug.cgi?id=1863024
to verify that it's the same issue.

Comment 6 Sofer Athlan-Guyot 2020-09-14 16:55:33 UTC
Lowering the priority, as we're talking about a cut of less than 10 seconds across all update run tasks. We really have to ask ourselves whether that is good enough before working on avoiding the ovs reload.

Comment 7 Sofer Athlan-Guyot 2020-09-15 17:42:08 UTC
Hi, 

Lowering again.

I've run the test with all four patches and it didn't help. We got the special openvswitch handling but still see the small cut.

The good news is that, looking more closely at the ping test, there doesn't seem to be any issue during the openvswitch update.

This appears to be a *timing* issue between the start of the instance, the availability of its fip, and the start of the fip test:

See the start of that ping test:

PING 10.0.0.211 (10.0.0.211) 56(84) bytes of data.
[1600116667.740145] 64 bytes from 10.0.0.211: icmp_seq=9 ttl=63 time=2.91 ms

We start at icmp_seq 9, so we have lost the first 8 packets.
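
The same reading can be automated. Below is a minimal Python sketch that lists missing icmp_seq ranges in a ping log; find_seq_gaps is a hypothetical helper, and only the first sample line is taken from the log above (the two following lines are made up for illustration). It assumes sequence numbers start at 1 and does not handle icmp_seq wrap-around.

```python
import re

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def find_seq_gaps(lines, first_expected=1):
    """Return a list of (first_missing, last_missing) icmp_seq ranges."""
    gaps = []
    expected = first_expected
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue  # header, blank line or statistics
        seq = int(match.group(1))
        if seq > expected:
            gaps.append((expected, seq - 1))
        # max() keeps duplicates or late replies from rewinding the counter.
        expected = max(expected, seq + 1)
    return gaps


sample = [
    "PING 10.0.0.211 (10.0.0.211) 56(84) bytes of data.",
    "[1600116667.740145] 64 bytes from 10.0.0.211: icmp_seq=9 ttl=63 time=2.91 ms",
    # The next two lines are illustrative, not copied from the real log.
    "[1600116668.741000] 64 bytes from 10.0.0.211: icmp_seq=10 ttl=63 time=0.90 ms",
    "[1600116669.742000] 64 bytes from 10.0.0.211: icmp_seq=11 ttl=63 time=0.88 ms",
]

for first, last in find_seq_gaps(sample):
    print("missing icmp_seq %d-%d (%d packets)" % (first, last, last - first + 1))
```

Run against the full log file, a single gap covering icmp_seq 1 to 8 would correspond to a loss confined to the start of the test.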

There is no other cut in the file:

grep -v '64 bytes from' ping_results_202009142050.log
PING 10.0.0.211 (10.0.0.211) 56(84) bytes of data.

--- 10.0.0.211 ping statistics ---
9195 packets transmitted, 9187 received, 0.0870038% packet loss, time 9717ms
rtt min/avg/max/mdev = 0.371/0.864/36.029/0.708 ms

I need to verify all jobs, but given that they have a similar cut time, I expect it to be the same issue.

The solution will then be to wait for the fip to be available before starting the ping logging.
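
A rough Python sketch of that idea, for illustration only: poll the fip until it answers once, then start the logged ping. The names ping_once and wait_for_fip, the retry count and the delay are all assumptions, and this is not the actual change to the update tooling. The probe uses the standard iputils options -c (count) and -W (reply timeout in seconds).

```python
import subprocess
import time


def ping_once(address, timeout_s=1):
    """Send one echo request with the system ping; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_fip(address, probe=ping_once, retries=60, delay_s=1.0, sleep=time.sleep):
    """Poll until probe(address) succeeds; return the number of attempts used."""
    for attempt in range(1, retries + 1):
        if probe(address):
            return attempt
        sleep(delay_s)
    raise TimeoutError("%s not reachable after %d attempts" % (address, retries))


# Intended use (hypothetical), before launching the long-running ping log:
#   wait_for_fip("10.0.0.211")
#   ... start the logged ping here ...
```

The probe and sleep functions are parameters so the retry logic can be exercised without sending real traffic.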

So the fix will be in the tripleo-upgrade role. I'll confirm asap.