Bug 1823178
| Field | Value |
| --- | --- |
| Summary | [OSP16.1] Openvswitch segfaults multiple times causing API failures |
| Product | Red Hat OpenStack |
| Component | openvswitch |
| Version | 16.1 (Train) |
| Status | CLOSED DUPLICATE |
| Severity | high |
| Priority | medium |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Roman Safronov <rsafrono> |
| Assignee | RHOS Maint <rhos-maint> |
| QA Contact | nlevinki <nlevinki> |
| CC | apevec, chrisw, jjoyce, jschluet, lmiccini, michele, rhos-maint, slinaber, tvignaud |
| Keywords | Triaged |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2020-04-16 12:16:22 UTC |
Description
Roman Safronov
2020-04-12 11:41:40 UTC
```
[root@controller-0 mysql]# podman exec -it galera-bundle-podman-0 rpm -qa | grep maria
mariadb-common-10.3.17-1.module+el8.1.0+3974+90eded84.x86_64
mariadb-connector-c-3.0.7-1.el8.x86_64
mariadb-errmsg-10.3.17-1.module+el8.1.0+3974+90eded84.x86_64
mariadb-server-utils-10.3.17-1.module+el8.1.0+3974+90eded84.x86_64
mariadb-server-galera-10.3.17-1.module+el8.1.0+3974+90eded84.x86_64
mariadb-connector-c-config-3.0.7-1.el8.noarch
mariadb-10.3.17-1.module+el8.1.0+3974+90eded84.x86_64
mariadb-server-10.3.17-1.module+el8.1.0+3974+90eded84.x86_64
mariadb-backup-10.3.17-1.module+el8.1.0+3974+90eded84.x86_64
```

Hi Roman, FYI the logs in the sosreports do not cover the timestamps in the description, so I am not 100% sure I am looking at the right stuff. Here is what happens on controller-0:

```
Apr 9 18:45:21 controller-0 corosync[36483]: [KNET ] link: host: 3 link: 0 is down
Apr 9 18:45:21 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:45:21 controller-0 corosync[36483]: [KNET ] host: host: 3 has no active links
Apr 9 18:45:22 controller-0 corosync[36483]: [TOTEM ] Token has not been received in 844 ms
Apr 9 18:45:23 controller-0 corosync[36483]: [KNET ] link: host: 2 link: 0 is down
Apr 9 18:45:23 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:45:23 controller-0 corosync[36483]: [KNET ] host: host: 2 has no active links
Apr 9 18:45:23 controller-0 corosync[36483]: [TOTEM ] A processor failed, forming new configuration.
Apr 9 18:45:25 controller-0 corosync[36483]: [TOTEM ] A new membership (1.d) was formed. Members left: 2 3
Apr 9 18:45:25 controller-0 corosync[36483]: [TOTEM ] Failed to receive the leave message. failed: 2 3
Apr 9 18:45:26 controller-0 corosync[36483]: [KNET ] rx: host: 3 link: 0 is up
Apr 9 18:45:26 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:45:26 controller-0 corosync[36483]: [KNET ] rx: host: 2 link: 0 is up
Apr 9 18:45:26 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:45:28 controller-0 corosync[36483]: [KNET ] link: host: 3 link: 0 is down
Apr 9 18:45:28 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:45:28 controller-0 corosync[36483]: [KNET ] host: host: 3 has no active links
Apr 9 18:45:30 controller-0 corosync[36483]: [TOTEM ] Token has not been received in 3274 ms
Apr 9 18:45:31 controller-0 corosync[36483]: [KNET ] link: host: 2 link: 0 is down
Apr 9 18:45:31 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:45:31 controller-0 corosync[36483]: [KNET ] host: host: 2 has no active links
Apr 9 18:45:32 controller-0 corosync[36483]: [TOTEM ] A new membership (1.19) was formed. Members
Apr 9 18:45:32 controller-0 corosync[36483]: [KNET ] rx: host: 3 link: 0 is up
Apr 9 18:45:32 controller-0 corosync[36483]: [KNET ] rx: host: 2 link: 0 is up
Apr 9 18:45:32 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:45:32 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:45:33 controller-0 corosync[36483]: [TOTEM ] A new membership (1.1d) was formed. Members joined: 2
Apr 9 18:45:33 controller-0 corosync[36483]: [TOTEM ] A new membership (1.21) was formed. Members joined: 3
```
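The repeated "Token has not been received" messages mean the totem token is being lost between the controllers before the timeout expires. To rule out an overly tight token timeout (as opposed to genuine network or load trouble), the effective value can be checked on any controller; a minimal sketch using standard corosync tooling, where the grep pattern is just illustrative:

```bash
# Dump the totem token timeout corosync is actually running with;
# runtime.config.* keys reflect the effective values (milliseconds)
sudo corosync-cmapctl | grep -i 'totem.token'

# Cross-check against the on-disk configuration
sudo cat /etc/corosync/corosync.conf
```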
The same pattern repeats a few minutes later:

```
Apr 9 18:52:45 controller-0 corosync[36483]: [KNET ] link: host: 2 link: 0 is down
Apr 9 18:52:45 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:52:45 controller-0 corosync[36483]: [KNET ] host: host: 2 has no active links
Apr 9 18:52:46 controller-0 corosync[36483]: [KNET ] link: host: 3 link: 0 is down
Apr 9 18:52:46 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:52:46 controller-0 corosync[36483]: [KNET ] host: host: 3 has no active links
Apr 9 18:52:46 controller-0 corosync[36483]: [TOTEM ] Token has not been received in 1237 ms
Apr 9 18:52:47 controller-0 corosync[36483]: [TOTEM ] A processor failed, forming new configuration.
Apr 9 18:52:49 controller-0 corosync[36483]: [TOTEM ] A new membership (1.25) was formed. Members left: 2 3
Apr 9 18:52:49 controller-0 corosync[36483]: [TOTEM ] Failed to receive the leave message. failed: 2 3
Apr 9 18:52:49 controller-0 corosync[36483]: [KNET ] rx: host: 3 link: 0 is up
Apr 9 18:52:49 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:52:52 controller-0 corosync[36483]: [KNET ] rx: host: 2 link: 0 is up
Apr 9 18:52:52 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:52:52 controller-0 corosync[36483]: [TOTEM ] Token has not been received in 2186 ms
Apr 9 18:52:52 controller-0 corosync[36483]: [TOTEM ] A new membership (1.29) was formed. Members joined: 2 3
Apr 9 18:55:29 controller-0 corosync[36483]: [KNET ] link: host: 3 link: 0 is down
Apr 9 18:55:29 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:55:29 controller-0 corosync[36483]: [KNET ] host: host: 3 has no active links
Apr 9 18:55:30 controller-0 corosync[36483]: [TOTEM ] Token has not been received in 451 ms
Apr 9 18:55:30 controller-0 corosync[36483]: [KNET ] link: host: 2 link: 0 is down
Apr 9 18:55:30 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:55:30 controller-0 corosync[36483]: [KNET ] host: host: 2 has no active links
Apr 9 18:55:30 controller-0 corosync[36483]: [TOTEM ] A processor failed, forming new configuration.
Apr 9 18:55:33 controller-0 corosync[36483]: [KNET ] rx: host: 3 link: 0 is up
Apr 9 18:55:33 controller-0 corosync[36483]: [KNET ] rx: host: 2 link: 0 is up
Apr 9 18:55:33 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:55:33 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:55:34 controller-0 corosync[36483]: [TOTEM ] A new membership (1.35) was formed. Members left: 2 3
Apr 9 18:55:34 controller-0 corosync[36483]: [TOTEM ] Failed to receive the leave message. failed: 2 3
Apr 9 18:55:34 controller-0 corosync[36483]: [TOTEM ] A new membership (1.39) was formed. Members joined: 2 3
```
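If this reproduces, it may help to watch the knet link and quorum state live while a flap happens; a minimal sketch with the standard corosync CLI tools, run on one of the controllers:

```bash
# corosync-cfgtool -s prints per-node knet link status;
# corosync-quorumtool -s prints current membership and quorum state.
# Refresh every second so a link flap is visible as it happens.
watch -n1 'corosync-cfgtool -s; echo; corosync-quorumtool -s'
```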
And again shortly afterwards:

```
Apr 9 18:56:47 controller-0 corosync[36483]: [KNET ] link: host: 3 link: 0 is down
Apr 9 18:56:47 controller-0 corosync[36483]: [KNET ] link: host: 2 link: 0 is down
Apr 9 18:56:47 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:56:47 controller-0 corosync[36483]: [KNET ] host: host: 3 has no active links
Apr 9 18:56:47 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:56:47 controller-0 corosync[36483]: [KNET ] host: host: 2 has no active links
Apr 9 18:56:47 controller-0 corosync[36483]: [TOTEM ] Token has not been received in 829 ms
Apr 9 18:56:47 controller-0 corosync[36483]: [TOTEM ] A processor failed, forming new configuration.
Apr 9 18:56:49 controller-0 corosync[36483]: [TOTEM ] A new membership (1.3d) was formed. Members left: 2 3
Apr 9 18:56:49 controller-0 corosync[36483]: [TOTEM ] Failed to receive the leave message. failed: 2 3
Apr 9 18:56:51 controller-0 corosync[36483]: [KNET ] rx: host: 3 link: 0 is up
Apr 9 18:56:51 controller-0 corosync[36483]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 9 18:56:52 controller-0 corosync[36483]: [KNET ] rx: host: 2 link: 0 is up
Apr 9 18:56:52 controller-0 corosync[36483]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 9 18:56:52 controller-0 corosync[36483]: [TOTEM ] A new membership (1.41) was formed. Members joined: 2
Apr 9 18:56:53 controller-0 corosync[36483]: [TOTEM ] A new membership (1.45) was formed. Members joined: 3
```

Same stuff on the other nodes. Controllers are randomly getting isolated, and since you don't have fencing configured/enabled, pacemaker can't recover promptly (if at all). It seems to me that the root cause could be the load on the environment, but since we don't have enough data in the sosreports I can't be 100% sure. If you happen to reproduce, maybe give us a ping and we'll look at the live env.

Openvswitch version is openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64:

```
[heat-admin@controller-0 ~]$ sudo podman exec -it ovn_controller rpm -qa | grep openvswitch
rhosp-openvswitch-2.13-7.el8ost.noarch
rhosp-openvswitch-ovn-host-2.13-7.el8ost.noarch
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch
python3-openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
python3-rhosp-openvswitch-2.13-7.el8ost.noarch
network-scripts-openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
```

*** This bug has been marked as a duplicate of bug 1821185 ***
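For anyone landing here with similar symptoms, the two findings above (fencing not enabled, openvswitch segfaults) can be confirmed quickly on a live controller; a minimal sketch with standard pcs/coredumpctl commands (exact pcs subcommand spelling may vary between pcs versions):

```bash
# Check whether stonith/fencing is enabled cluster-wide
sudo pcs property show stonith-enabled

# List any configured fence devices and their state
sudo pcs stonith status

# Look for the ovs-vswitchd core dumps / segfaults referenced in the summary
sudo coredumpctl list | grep -i ovs
sudo grep -i segfault /var/log/messages | grep -i ovs
```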