Description of problem:
On a TLS-Everywhere environment, after one or more controller nodes were rebooted, an SSH connection to a cirros instance created after the reboot was refused:

[stack@undercloud-0 ~]$ ssh cirros@10.0.0.213
sss_ssh_knownhostsproxy: connect to host 10.0.0.213 port 22: Connection refused
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535

The instance was hosted on compute-0. The ovn_metadata_agent container on this node was reported as unhealthy:

[root@compute-0 ~]# podman ps |grep ovn_metadata
d1a5e59b6515 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-metadata-agent-ovn:17.0_20220721.1 kolla_start 4 hours ago Up 4 hours ago (unhealthy) ovn_metadata_agent

[root@compute-1 ~]# less /var/log/containers/neutron/ovn-metadata-agent.log
…
2022-08-02 13:22:04.837 25475 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib64/python3.9/ssl.py", line 1170, in send
2022-08-02 13:22:04.837 25475 ERROR ovsdbapp.backend.ovs_idl.connection raise ValueError(
2022-08-02 13:22:04.837 25475 ERROR ovsdbapp.backend.ovs_idl.connection ValueError: non-zero flags not allowed in calls to send() on <class 'eventlet.green.ssl.GreenSSLSocket'>

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Deploy a TLS-e HA overcloud.
2. Reboot the controller that holds the overcloud main VIP (it can be found in the output of the 'pcs status' command on a controller node).
3. Boot a VM.
4. Try to SSH to the VM.

Actual results:
The connection to the VM is refused. The ovn_metadata_agent container is in an unhealthy state.

Expected results:
The VM is reachable via SSH. All containers are healthy.

Additional info:
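The "All containers are healthy" check from the expected results can be scripted. This is a minimal sketch, assuming the default `podman ps` column layout in which NAMES is the last column; `unhealthy_containers()` is a hypothetical helper, not part of podman or TripleO.

```python
def unhealthy_containers(ps_output: str) -> list[str]:
    """Return the names of containers whose STATUS contains "(unhealthy)".

    Assumes the default `podman ps` layout, where NAMES is the last column.
    """
    names = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "CONTAINER ID ... NAMES" header row.
        if not fields or fields[0] == "CONTAINER":
            continue
        # "(healthy)" is not a substring of "(unhealthy)", so healthy rows are ignored.
        if "(unhealthy)" in line:
            names.append(fields[-1])
    return names
```

The text can come from `podman ps` run on each compute node (for example captured via `subprocess.run(["podman", "ps"], capture_output=True, text=True).stdout`); an empty result means no container reports an unhealthy status.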
It looks like the root cause of this issue is that OVS switched from pyOpenSSL to the Python standard library ssl module [1]. Python's SSLSocket.send() [2] does not allow non-zero flags on an SSL socket, whereas pyOpenSSL's send function ignored them [3].

[1] https://github.com/openvswitch/ovs/commit/68543dd523bd00f53fa7b91777b962ccb22ce679
[2] https://github.com/python/cpython/blob/main/Lib/ssl.py#L1141-L1156
[3] https://github.com/pyca/pyopenssl/blob/38f9b4e524ac6479d57021bba2270df84d85b672/src/OpenSSL/SSL.py#L1844
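The behaviour difference can be reproduced in isolation. This is a minimal sketch, assuming Linux or macOS (for `socket.socketpair()` and `socket.MSG_DONTWAIT`). A connected local socket pair stands in for the OVSDB connection, and no TLS handshake is performed, because wrapping a connected socket is already enough for `SSLSocket.send()` to enforce its flags check. `effective_send_flags()` is a hypothetical helper showing one way a caller could keep the old pyOpenSSL semantics; it is not the upstream OVS patch.

```python
import socket
import ssl


def effective_send_flags(sock: socket.socket, flags: int) -> int:
    """Hypothetical helper: drop send() flags for TLS sockets.

    pyOpenSSL's Connection.send() accepted a flags argument and ignored it,
    while ssl.SSLSocket.send() raises ValueError for non-zero flags once the
    socket has an SSL object. Plain sockets keep their flags unchanged.
    """
    if isinstance(sock, ssl.SSLSocket):
        return 0
    return flags


def demo() -> tuple[str, int]:
    """Show the ValueError on an SSL socket and the same flag working on a plain one."""
    raw, peer = socket.socketpair()
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # The socket is already connected, so the SSL object is created here,
    # but do_handshake_on_connect=False means no bytes are exchanged.
    tls = ctx.wrap_socket(raw, server_hostname="localhost",
                          do_handshake_on_connect=False)
    try:
        try:
            tls.send(b"\0", socket.MSG_DONTWAIT)
            error = ""
        except ValueError as exc:
            # Same check that fails in the ovn-metadata-agent traceback.
            error = str(exc)
        # A plain (non-TLS) socket accepts the same flag and sends 1 byte.
        sent = peer.send(b"\0", socket.MSG_DONTWAIT)
    finally:
        tls.close()
        peer.close()
    return error, sent
```

Calling `demo()` returns the standard library error text ("non-zero flags not allowed in calls to send() on <class 'ssl.SSLSocket'>") together with the byte count from the plain socket, which mirrors the traceback above, where eventlet's GreenSSLSocket reaches the same check in ssl.py.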
A patch has been posted upstream for review: https://github.com/ovsrobot/ovs/commit/f09a55946cc83583c2e93be632e50f51ea830322
The trac team deemed this a GA blocker, but not a blocker for beta.
Verified:

[stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-17.0-RHEL-9-20220816.n.2

[root@controller-0 ~]# rpm -qa|grep openvsw
openvswitch2.17-2.17.0-32.1.el9fdp.x86_64

After a hard reboot (echo b > /proc/sysrq-trigger) of controller-2, the ovn-metadata-agents are healthy on both compute nodes:

[heat-admin@compute-0 ~]$ sudo -i
[root@compute-0 ~]# podman ps|grep meta
00534cbdb30e undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-metadata-agent-ovn:17.0_20220816.1 kolla_start 23 hours ago Up 23 hours ago (healthy) ovn_metadata_agent

[root@compute-1 ~]# podman ps|grep metadata
1a553fa027e7 undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp17-openstack-neutron-metadata-agent-ovn:17.0_20220816.1 kolla_start 23 hours ago Up 23 hours ago (healthy) ovn_metadata_agent

The SSH connection to the newly created instance succeeded:

[Tue Aug 23 12:05:49 AM UTC 2022] Trying to ssh to 10.0.0.161 cirros
Instance instance_d1f5085f0e is reachable via 10.0.0.161

Verified by automated tests as well:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2017.0/view/PidOne/job/DFG-pidone-sanity-17.0_director-rhel-virthost-3cont_2comp_1ipa-ipv4-geneve-ansible-sts-sanity-tls-everywhere/75/artifact/.sh/ansible_sts-ha-tests.log
*** Bug 2114617 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543