Bug 2132964
| Summary: | Potential crash in ovn-controller when handling deleted port bindings | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | OvS team <ovs-bugzilla> |
| Component: | openvswitch2.17 | Assignee: | xsimonar |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | FDP 22.H | CC: | ctrautma, eelahi, jhsiao, ralongi, tredaelli, xsimonar |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openvswitch2.17-2.17.0-46.el9fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-11-21 18:19:11 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
OvS team
2022-10-07 11:26:36 UTC
* Fri Oct 07 2022 Open vSwitch CI <ovs-ci> - 2.17.0-46
- Merging upstream branch-2.17 [RH git: b2b4334db0]
Commit list:
09e22fec45 daemon-unix: Fix file descriptor leak when monitor restarts child.
53df50db26 vconn: Allow ECONNREFUSED in refuse connection test.
26a11ca610 dpdk: Use DPDK 21.11.2 release.
edf699ec64 m4: Test avx512 for x86 only.
1989caf9ea ovsdb-idl: Preserve references for rows deleted in same IDL run as their insertion. (#2126450)
db6a612cd7 python: idl: Fix idl.Row.__str__ method.
73d7bf64a7 bond: Avoid deadlock while updating post recirculation rules.
70a63391cb ofproto-dpif-upcall: Add debug commands to pause/resume revalidators.
cf0e12f8ae test-list: Fix false-positive build failure with GCC 12.
Tried with the following script:
enable_coredump()
{
    ulimit -c unlimited
    ulimit -s unlimited
    sysctl -w fs.suid_dumpable=2
    if ! sysctl kernel.core_pattern | grep systemd-coredump
    then
        sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
    fi
    rm -rf /var/lib/systemd/coredump/*
    rm -rf /run/log/journal/*
    rm -rf /var/log/journal/*
    systemctl restart systemd-journald
}
check_coredump()
{
    coredumpctl list
}
enable_coredump
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.50.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.50.25
systemctl restart ovn-controller
ps aux | grep ovn-controller
ovn-appctl vlog/set dbg
ovs-vsctl add-port br-int p1 -- set interface p1 type=internal external-ids:iface-id=sw0-port1
ovn-nbctl --wait=hv sync
ovn-appctl debug/pause
ovn-appctl -t ovn-controller debug/status
ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
ovn-nbctl lsp-del sw0-port1
ovn-appctl debug/resume
ovn-nbctl --wait=hv sync
ps aux | grep ovn-controller
ovn-nbctl ls-del sw0
ovn-nbctl --wait=hv sync
ps aux | grep ovn-controller
check_coredump
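The script relies on eyeballing `ps aux` output to notice that ovn-controller was restarted after a crash (in the run that follows, the PID goes from 40817 to 40951). A minimal sketch of how that comparison could be automated is shown here. The helper names `read_pid` and `pid_changed` are hypothetical and not part of the original reproducer; the pidfile path is the one visible in the `--pidfile` argument of the `ps` output.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original reproducer.

# Print the PID stored in a pidfile, or nothing if the file is unreadable.
read_pid() {
    if [ -r "$1" ]; then
        tr -d '[:space:]' < "$1"
    fi
}

# Compare two PID samples and describe what happened in between.
pid_changed() {
    before=$1
    after=$2
    if [ -z "$after" ]; then
        echo "not running"
    elif [ "$before" != "$after" ]; then
        echo "restarted ($before -> $after)"
    else
        echo "unchanged ($after)"
    fi
}

# Example use inside the reproducer (assumes the pidfile path from the ps output):
#   before=$(read_pid /run/ovn/ovn-controller.pid)
#   ovn-appctl debug/resume
#   ovn-nbctl --wait=hv sync
#   pid_changed "$before" "$(read_pid /run/ovn/ovn-controller.pid)"
```

A changed PID only shows that the daemon was restarted by its monitor process; `coredumpctl list` is still needed to confirm the signal.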
ovn-controller still crashes with openvswitch2.17-2.17.0-50.el9fdp:
[root@dell-per740-33 bz2132964]# rpm -qa | grep -E "openvswitch2.17|ovn22.06"
ovn22.06-22.06.0-64.el9fdp.x86_64
ovn22.06-central-22.06.0-64.el9fdp.x86_64
ovn22.06-host-22.06.0-64.el9fdp.x86_64
openvswitch2.17-2.17.0-50.el9fdp.x86_64
+ systemctl restart ovn-controller
+ ps aux
+ grep ovn-controller
openvsw+ 40817 0.0 0.0 238236 6640 ? S<sl 21:32 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 40822 0.0 0.0 6412 2264 pts/0 S+ 21:32 0:00 grep ovn-controller
+ ovn-appctl vlog/set dbg
+ ovs-vsctl add-port br-int p1 -- set interface p1 type=internal external-ids:iface-id=sw0-port1
+ ovn-nbctl --wait=hv sync
+ ovn-appctl debug/pause
+ ovn-appctl -t ovn-controller debug/status
paused
+ ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- lsp-set-addresses sw0-port1 '50:54:00:00:00:01 192.168.0.2'
+ ovn-nbctl lsp-del sw0-port1
+ ovn-appctl debug/resume
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+ 40951 0.0 0.0 238264 7100 ? S<sl 21:32 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 40956 0.0 0.0 6412 2264 pts/0 S+ 21:32 0:00 grep ovn-controller
+ ovn-nbctl ls-del sw0
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+ 40951 0.0 0.0 238264 7100 ? S<sl 21:32 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 40960 0.0 0.0 6412 2176 pts/0 S+ 21:32 0:00 grep ovn-controller
+ check_coredump
+ coredumpctl list
TIME PID UID GID SIG COREFILE EXE SIZE
Sun 2022-10-16 21:32:23 EDT 40817 986 986 SIGSEGV none /usr/bin/ovn-controller n/a
I don't know why no coredump file was generated.
Xavier, could you please help check? Thanks.
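When `coredumpctl list` shows `none` in the COREFILE column, one thing that may be worth checking is the core size limit of the running daemon, which a later comment in this bug also points at. The sketch below assumes the usual `/proc/<pid>/limits` layout, in which the soft limit is the fifth whitespace-separated field of the "Max core file size" row. `core_soft_limit` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the soft "Max core file size" limit
# from a file in /proc/<pid>/limits format.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Example use (assumes the pidfile path seen in the ps output above):
#   pid=$(cat /run/ovn/ovn-controller.pid)
#   core_soft_limit "/proc/$pid/limits"
# A value of 0 would mean the process is not allowed to write a core file.
```

Note that the `ulimit -c unlimited` call in `enable_coredump` applies to the shell running the script, while ovn-controller is started by systemd and gets its limits from there.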
Hi Jianlin
The reproducer I initially posted does not always reproduce the issue.
Can you try adding "ovn-nbctl --wait=sb sync" right before "ovn-appctl debug/resume"?
Thanks
Xavier

(In reply to xsimonar from comment #5)
> Hi Jianlin
>
> The reproducer I initially posted does not always reproduce the issue.
> Can you try adding "ovn-nbctl --wait=sb sync" right before "ovn-appctl
> debug/resume"?
>
> Thanks
> Xavier
The same result:
+ systemctl restart ovn-controller
+ ps aux
+ grep ovn-controller
openvsw+ 88564 0.0 0.0 238232 6748 ? S<sl 20:45 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 88569 0.0 0.0 6412 2264 pts/0 S+ 20:45 0:00 grep ovn-controller
+ ovn-appctl vlog/set dbg
+ ovs-vsctl add-port br-int p1 -- set interface p1 type=internal external-ids:iface-id=sw0-port1
+ ovn-nbctl --wait=hv sync
+ ovn-appctl debug/pause
+ ovn-appctl -t ovn-controller debug/status
paused
+ ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- lsp-set-addresses sw0-port1 '50:54:00:00:00:01 192.168.0.2'
+ ovn-nbctl lsp-del sw0-port1
+ ovn-nbctl --wait=sb sync
+ ovn-appctl debug/resume
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+ 88698 0.0 0.0 238264 7008 ? S<sl 20:45 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 88703 0.0 0.0 6412 2316 pts/0 S+ 20:45 0:00 grep ovn-controller
+ ovn-nbctl ls-del sw0
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+ 88698 0.0 0.0 238264 7008 ? S<sl 20:45 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 88707 0.0 0.0 6412 2300 pts/0 S+ 20:45 0:00 grep ovn-controller
+ check_coredump
+ coredumpctl list
TIME PID UID GID SIG COREFILE EXE SIZE
Mon 2022-10-17 20:45:27 EDT 88564 986 986 SIGSEGV none /usr/bin/ovn-controller n/a
[root@dell-per740-33 bz2132964]# rpm -qa | grep -E "openvswitch2.17|ovn22.06"
ovn22.06-22.06.0-64.el9fdp.x86_64
ovn22.06-central-22.06.0-64.el9fdp.x86_64
ovn22.06-host-22.06.0-64.el9fdp.x86_64
openvswitch2.17-2.17.0-50.el9fdp.x86_64

Hi
Sorry, I was confused by the message "there is no coredump file generated".
The issue is fixed in the OVS code branch, but it requires OVN to use the proper OVS submodule.
The submodule change is not yet backported to ovn-22.06.
As a side note, to generate the coredump, I had to do/check the following:
"cat /proc/<ovn-controller-pid>/limits"
I suspect the "Max core file size" is 0.
Set DefaultLimitCORE=infinity in /etc/systemd/system.conf.
Thanks

(In reply to xsimonar from comment #7)
> Hi
>
> Sorry, I was confused by the message "there is no coredump file generated".
>
> The issue is fixed in the OVS code branch, but it requires OVN to use the proper
> OVS submodule.
> The submodule change is not yet backported to ovn-22.06.
>
> As a side note, to generate the coredump, I had to do/check the following:
> "cat /proc/<ovn-controller-pid>/limits" I suspect the "Max core file size"
> is 0.
> Set DefaultLimitCORE=infinity in /etc/systemd/system.conf.
>
> Thanks
Then how could I verify the bug? Is the submodule change backported to ovn-22.09?

No, there is no downstream yet.
There is another (unrelated) issue that has so far prevented me from making the submodule change.
I think we should move the state back to ASSIGNED.

(In reply to xsimonar from comment #9)
> No, there is no downstream yet.
> There is another (unrelated) issue that has so far prevented me from making
> the submodule change.
> I think we should move the state back to ASSIGNED.
If it can't be fixed in 22.J, then we need to ask Mark to help remove it from the errata.

No crash when testing with ovn22.06-22.06.0-75:
+ grep ovn-controller
openvsw+ 37236 0.0 0.0 238256 6980 ? S<sl 02:18 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 37241 0.0 0.0 6412 2244 pts/0 S+ 02:18 0:00 grep ovn-controller
+ ovn-appctl vlog/set dbg
+ ovs-vsctl add-port br-int p1 -- set interface p1 type=internal external-ids:iface-id=sw0-port1
+ ovn-nbctl --wait=hv sync
+ ovn-appctl debug/pause
+ ovn-appctl -t ovn-controller debug/status
paused
+ ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- lsp-set-addresses sw0-port1 '50:54:00:00:00:01 192.168.0.2'
+ ovn-nbctl lsp-del sw0-port1
+ ovn-appctl debug/resume
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+ 37236 2.0 0.0 238264 7316 ? S<sl 02:18 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 37305 0.0 0.0 6412 2332 pts/0 S+ 02:18 0:00 grep ovn-controller
+ ovn-nbctl ls-del sw0
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+ 37236 2.0 0.0 238264 7316 ? S<sl 02:18 0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root 37309 0.0 0.0 6412 2236 pts/0 S+ 02:18 0:00 grep ovn-controller
+ check_coredump
+ coredumpctl list
No coredumps found.
[root@dell-per730-20 bz2132964]# rpm -qa | grep -E "openvswitch2.17|ovn22.06"
openvswitch2.17-2.17.0-50.el9fdp.x86_64
ovn22.06-22.06.0-75.el9fdp.x86_64
ovn22.06-central-22.06.0-75.el9fdp.x86_64
ovn22.06-host-22.06.0-75.el9fdp.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (openvswitch2.17 bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:8567

The fix does not seem to be backported on ovn-2021. |
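Because the crash depends on which OVS submodule the OVN build uses, the comments above end up checking the installed ovn22.06 release by hand. A rough sketch of that check follows. The threshold of 75 is an assumption taken only from the two data points in this bug (release 64 crashed, release 75 did not); the first fixed build is not stated. `release_at_least` is a hypothetical helper, and it relies on `sort -V` from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeed if release $1 is at least $2 in version order.
# Relies on GNU sort's -V (version sort) option.
release_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (package name taken from the rpm output in this bug;
# 75 is an assumed threshold, not a confirmed first-fixed release):
#   rel=$(rpm -q --qf '%{RELEASE}' ovn22.06-host)
#   if release_at_least "$rel" 75; then
#       echo "release $rel: no crash was seen at release 75"
#   else
#       echo "release $rel: release 64 was seen crashing"
#   fi
```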