Bug 2163559
| Summary: | ovn-controller coredumped on all nodes (controllers+compute) and many FIP flows were affected | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | David Hill <dhill> |
| Component: | ovn-2021 | Assignee: | Ales Musil <amusil> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | FDP 20.F | CC: | amusil, ctrautma, dceara, gurpsing, jiji, ltamagno, mmichels |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovn-2021-21.12.0-134.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-07-06 20:05:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
David Hill
2023-01-23 21:58:14 UTC
It appears the corresponding customer case has been closed. We also suspect that this core dump might be fixed by backporting commit 2e4f393650ccf298f26787583c13a88197ba8348 from OVN main (https://github.com/ovn-org/ovn/commit/2e4f393650ccf298f26787583c13a88197ba8348). Once we backport this fix, we will close this issue.

We didn't have the core dump and it didn't happen again ...

ovn-2021 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2213611

Hi Ales, this is the log I found in ovn-2021.spec:

* Thu May 18 2023 Ales Musil <amusil> - 21.12.0-132
- [branch-21.12] ovs: Bump submodule to v2.17.6 (#2163559)
[Upstream: 52ef956bb4e9de2d418805dd43b337184f1aa560]

Which specific patch fixes the issue? Is there a reproducer for the issue? Thanks

Hi, the commit https://github.com/ovn-org/ovn/commit/2e4f393650ccf298f26787583c13a88197ba8348 has a test which was used to reproduce the original issue. Basing the reproducer on that is the best chance. Thanks, Ales

Tested with the following script:
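enable_coredump()
{
    # Lift the core file size and stack size limits for this shell and its children.
    ulimit -c unlimited
    ulimit -s unlimited
    # Allow core dumps from processes that changed credentials (readable by root only).
    sysctl -w fs.suid_dumpable=2
    # Route core dumps through systemd-coredump if that is not already configured.
    if ! sysctl kernel.core_pattern | grep systemd-coredump
    then
        sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c "
    fi
    # Clear old core dumps and journal data so that only new crashes are listed.
    rm -rf /var/lib/systemd/coredump/*
    rm -rf /run/log/journal/*
    rm -rf /var/log/journal/*
    systemctl restart systemd-journald
}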
enable_coredump
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_1
systemctl restart ovn-controller
ovs-vsctl add-port br-int p1 -- set interface p1 type=internal
ovs-vsctl set interface p1 external-ids:iface-id=sw0-port1
ovn-nbctl --wait=hv sync
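# Pause ovn-controller, then create and delete the logical port while it is paused.
ovn-appctl debug/pause
sleep 2
ovn-appctl -t ovn-controller debug/status
ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1
ovn-nbctl lsp-del sw0-port1
ovn-nbctl --wait=sb sync
# Resume ovn-controller so that it processes the accumulated changes.
ovn-appctl debug/resume
ovn-nbctl --wait=hv sync
# Delete the logical switch and wait for the hypervisor to catch up.
ovn-nbctl ls-del sw0
ovn-nbctl --wait=hv sync
# List any core dumps recorded by systemd-coredump.
coredumpctl list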
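The script ends with a manual look at `coredumpctl list`. For unattended runs, the check could be turned into a count, roughly as sketched here. The function name `count_ovn_controller_cores` is hypothetical and not part of the original script, and the sketch assumes the executable path appears as one whitespace-separated field per row, as in the output shown in this report.

```shell
# Hypothetical helper, not part of the original reproducer.
# Reads `coredumpctl list` output on stdin and prints how many rows
# contain a field ending in "/ovn-controller" (the EXE column).
count_ovn_controller_cores() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /\/ovn-controller$/) { n++; break }
        }
    }
    END { print n + 0 }'
}

# Possible usage on a test host (assumes coredumpctl is installed):
#   coredumpctl list --no-pager 2>/dev/null | count_ovn_controller_cores
```

A result of 0 corresponds to the "No coredumps found." case, and any other value means at least one ovn-controller crash was recorded.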
Reproduced on ovn-2021-21.12.0-130.el8:
+ ovn-nbctl --wait=hv sync
+ coredumpctl list
TIME PID UID GID SIG COREFILE EXE
Mon 2023-07-03 03:07:01 EDT 34521 993 990 11 none /usr/bin/ovn-controller
<=== coredump
[root@sweetpig-8 bz2163559]# rpm -qa | grep -E "openvswitch2.17|ovn-2021"
ovn-2021-21.12.0-130.el8fdp.x86_64
ovn-2021-host-21.12.0-130.el8fdp.x86_64
ovn-2021-central-21.12.0-130.el8fdp.x86_64
openvswitch2.17-2.17.0-106.el8fdp.x86_64
Verified on ovn-2021-21.12.0-134.el8:
+ ovn-nbctl --wait=hv sync
+ coredumpctl list
No coredumps found.
<=== no coredump
[root@sweetpig-8 bz2163559]# rpm -qa | grep -E "openvswitch2.17|ovn-2021"
ovn-2021-21.12.0-134.el8fdp.x86_64
ovn-2021-host-21.12.0-134.el8fdp.x86_64
openvswitch2.17-2.17.0-106.el8fdp.x86_64
ovn-2021-central-21.12.0-134.el8fdp.x86_64
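To check a host against the "Fixed In Version" build (release 134), the package names printed by `rpm -qa` can be compared with a small helper. This is a minimal sketch: `release_at_least` is a hypothetical name, and it assumes package strings shaped like the ones above (`name-version-release.dist.arch`) with the same upstream version, 21.12.0. It is not a general RPM version comparison.

```shell
# Hypothetical helper, not part of the original report.
# Succeeds (exit 0) if the numeric release of the package string is at
# least min_release, fails with 1 if it is lower, and returns 2 if no
# numeric release could be extracted.
release_at_least() {
    pkg=$1
    min_release=$2
    rest=${pkg##*-}       # e.g. "134.el8fdp.x86_64"
    release=${rest%%.*}   # e.g. "134"
    case $release in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$release" -ge "$min_release" ]
}

# Possible usage on a test host:
#   release_at_least "$(rpm -q ovn-2021)" 134 && echo "fixed build installed"
```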
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn-2021 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:3995