Bug 2065504
Summary: | Neutron DVR breaks with kernel 4.18.0-365.el8.x86_64 | |
---|---|---|---
Product: | [Community] RDO | Reporter: | Jonathan Mills <jonathan.b.mills>
Component: | openstack-neutron | Assignee: | Daniel Alvarez Sanchez <dalvarez>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ofer Blaut <oblaut>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | unspecified | CC: | chrisw, fwestpha, ralonsoh, robert.m.budden, srevivo, ykarel
Target Milestone: | --- | Keywords: | Triaged
Target Release: | trunk | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-08-30 04:27:34 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 2062870 | |
Bug Blocks: | | |
Attachments: | | |
Description
Jonathan Mills
2022-03-18 01:25:05 UTC
I should have mentioned: we can fix the problem 100% of the time simply by reverting our kernels from 4.18.0-365.el8.x86_64 to 4.18.0-348.el8.x86_64. We have reverted the kernels on our production cloud hypervisors, but clearly this isn't ideal, as we also have a duty to patch.

To add some additional information: we are able to live-swap tenant virtual routers to centralized mode to restore North/South connectivity. This has obvious performance impacts, but may prove useful in narrowing down or debugging the issue. On the surface, one might expect the North/South pieces of DVR and centralized routing to be similar if not the same, but admittedly I have not yet dug into the code.

Created attachment 1866639: pcap of the working example
Created attachment 1866640: pcap of the broken example
Created attachment 1866641: Additional details on working example as text log
Created attachment 1866642: Additional details on broken example as text log
I checked, and the issue reproduces even with the latest C8 kernel, 4.18.0-394.el8, and the 'dvr_no_external' L3 agent mode. The issue does not reproduce on CentOS 9-Stream.

On checking further with @ralonsoh, we found that it is caused by the fix for https://bugzilla.redhat.com/show_bug.cgi?id=2006167. That fix was reverted in RHEL 9 as part of https://bugzilla.redhat.com/show_bug.cgi?id=2061850, which is why we don't see the issue on CentOS 9-Stream, but the issue is not yet fixed in the RHEL 8 kernel: https://bugzilla.redhat.com/show_bug.cgi?id=2051413

(In reply to Yatin Karel from comment #10)
It is fixed in RHEL8, in 4.18.0-397.el8. The bug you are referencing is filed vs. Fedora. The RHEL8 bug is https://bugzilla.redhat.com/show_bug.cgi?id=2062870.

(In reply to Florian Westphal from comment #11)
Thanks, Florian, for the link; I updated the bz reference. I see that kernel-4.18.0-408.el8, which includes the revert, was built yesterday for C8-Stream, so it should soon be available in the C8-Stream repos.

(In reply to Yatin Karel from comment #12)
To update: kernel-4.18.0-408.el8 is now available in the C8-Stream repos and is working fine, tested with both wallaby and train [1][2].

Closing the bug based on this; feel free to reopen if you still see the issue with the latest kernel.

[1] https://review.rdoproject.org/zuul/build/e63e400e87324bd88e877fc326e310a8
[2] https://review.rdoproject.org/zuul/build/bcdef368fcb74cc7aced9fabe7d8b9e6
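
For anyone auditing hypervisors against the kernel builds named in this thread, a rough check might look like the sketch below. It is not part of the original report. The function name `classify_el8_kernel` and the three constants are hypothetical, and the boundaries are assumptions drawn only from the comments here: 4.18.0-348.el8 worked, 4.18.0-365.el8 and 4.18.0-394.el8 were broken, and the fix was reported in 4.18.0-397.el8. Builds between 348 and 365, z-stream builds, and non-el8 kernels are deliberately reported as "unknown", since the thread says nothing about them.

```python
import platform
import re

# Build numbers taken from this bug thread; treating them as range
# boundaries is an assumption, not something the thread states.
LAST_KNOWN_GOOD = 348  # 4.18.0-348.el8 worked for the reporter
FIRST_KNOWN_BAD = 365  # 4.18.0-365.el8 broke DVR North/South traffic
FIRST_FIXED = 397      # fix reported in 4.18.0-397.el8

# Matches only plain el8 builds such as "4.18.0-365.el8.x86_64".
# Z-stream releases (e.g. "4.18.0-372.9.1.el8_6") do not match on purpose.
_RELEASE_RE = re.compile(r"^4\.18\.0-(\d+)\.el8(?:\.|$)")


def classify_el8_kernel(release: str) -> str:
    """Return 'fixed', 'affected', 'not affected', or 'unknown'."""
    match = _RELEASE_RE.match(release)
    if match is None:
        return "unknown"
    build = int(match.group(1))
    if build >= FIRST_FIXED:
        return "fixed"
    if build >= FIRST_KNOWN_BAD:
        return "affected"
    if build <= LAST_KNOWN_GOOD:
        return "not affected"
    # Between the last known-good and first known-bad build: no data.
    return "unknown"


if __name__ == "__main__":
    running = platform.release()
    print(f"{running}: {classify_el8_kernel(running)}")
```

Run directly on a hypervisor, the script prints the running kernel release and its classification. A result of "unknown" means the thread gives no evidence either way, not that the kernel is safe.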