Bug 1569244

Summary: ovs-subnet to ovs-networkpolicy migration does not work as documented [docs]

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Matthew Robson <mrobson> |
| Component: | Networking | Assignee: | Brandi Munilla <bmcelvee> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | aos-bugs, bbennett, misalunk, scuppett, tmanor, zzhao |
| Version: | 3.7.0 | Keywords: | Reopened |
| Target Milestone: | --- | Target Release: | 3.10.z |
| Hardware: | All | OS: | Linux |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-11-20 15:46:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Doc Text:

Cause: The documentation for ovs-subnet to ovs-networkpolicy migration was not complete.
Consequence: Migration would not succeed without a reboot.
Fix: The documentation was corrected.
Result: Migration can be done without a reboot.
Description
Matthew Robson
2018-04-18 21:16:04 UTC
Weibin: Can you try this and see what needs to be restarted to make it work? Then we can update the docs.

I have reproduced both issues mentioned in the bug description in my setup, which runs v3.7.44:

1. oc commands are VERY slow.
2. NetworkPolicy does not function after migration until services are restarted.

For the second issue, I checked the OVS rules when it happened and found that the rules never come up after the network policy is applied; see the inspection sketch below. An OVS rules log covering both a working and a non-working setup is attached.

Created attachment 1431537 [details]
ovs rules log
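For anyone reproducing this, the OVS rules mentioned above can be inspected on a node with ovs-ofctl; a minimal sketch, assuming the default openshift-sdn bridge br0 (the policy file name is hypothetical):

```bash
# Dump the OpenFlow rules on a node's SDN bridge. openshift-sdn
# programs br0 with OpenFlow 1.3, so the version must be passed.
ovs-ofctl -O OpenFlow13 dump-flows br0

# Snapshot the flows before and after applying a policy, then diff
# to see whether the NetworkPolicy rules were actually installed.
ovs-ofctl -O OpenFlow13 dump-flows br0 > /tmp/flows-before.txt
oc apply -f default-deny.yaml   # hypothetical policy manifest
ovs-ofctl -O OpenFlow13 dump-flows br0 > /tmp/flows-after.txt
diff /tmp/flows-before.txt /tmp/flows-after.txt
```

In the non-working case described above, the diff would come back empty because no new rules appear after the policy is applied.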
After updating networkPluginName: redhat/openshift-ovs-networkpolicy in the master-config and node-config files, please restart services in the following order.

On each master:

    systemctl restart iptables
    systemctl restart openvswitch
    systemctl restart docker
    systemctl restart atomic-openshift-master-api.service
    systemctl restart atomic-openshift-master-controllers.service
    systemctl restart atomic-openshift-node.service

On each node:

    systemctl restart iptables
    systemctl restart openvswitch
    systemctl restart docker
    systemctl restart atomic-openshift-node.service

Following the steps above, without rebooting the systems, all oc commands, container veth interfaces, and the deny-by-default NetworkPolicy work fine in both v3.7 and v3.9.

Weibin, I've been testing these added steps in the CEE QuickLabs with mixed results. At times it works as expected; other times, while the svc is restricted (verified via curl), the endpoint is not. Looking at pod placement, it appears that when the target pod is on the same node as the router pod, the svc is restricted but the endpoint is not, as seen when running curl against both the svc and the endpoint (see the sketch below). When the target pod is on a different node from the router pod, both the svc and the endpoint appear to be properly restricted. I too am using OCP 3.7.44/RHEL 7.4 in the QuickLabs. I've attached two additional files recording the steps for each lab. If you want to inspect the labs, you are more than welcome.

Working lab details:
https://operations.cee.redhat.com/quicklab/cluster/tmanornetwork
Attached file: working.pods.diff.nodes.txt

Not working lab details:
https://operations.cee.redhat.com/quicklab/cluster/tmanor1networkpolicy
Attached file: notworking.pods.same.node.txt

Instructions for creating the key for QuickLabs can be found here: https://gitlab.cee.redhat.com/cee_ops/quicklab/wikis/access
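The service-versus-endpoint check described above might look roughly like this sketch; the service name (web), label selector, and port are placeholders, and the curls would be run from a pod in a different project:

```bash
# Resolve the service (svc) IP and the backing pod (endpoint) IP.
# "web" and port 8080 are placeholders for the actual target.
SVC_IP=$(oc get svc web -o jsonpath='{.spec.clusterIP}')
POD_IP=$(oc get pod -l app=web -o jsonpath='{.items[0].status.podIP}')

# With a deny-by-default policy in the target project, BOTH curls
# should time out. In the failing case, only the service path was
# blocked while the endpoint path still answered.
curl -s -m 5 "http://${SVC_IP}:8080/" || echo "service blocked (expected)"
curl -s -m 5 "http://${POD_IP}:8080/" || echo "endpoint blocked (expected)"
```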
Created attachment 1433504 [details]
recording of working policy

Created attachment 1433505 [details]
recording of nonworking policy
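For reference, a deny-by-default policy of the kind exercised in these recordings can be created as in the sketch below. This assumes the upstream v1 NetworkPolicy API; on 3.7, where the feature was newer, the API group may differ:

```bash
# Create a deny-by-default policy in the current project. An empty
# podSelector matches every pod in the namespace, and specifying no
# ingress rules blocks all inbound traffic to those pods.
oc create -f - <<EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: default-deny
spec:
  podSelector: {}
EOF
```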
Weibin, something else worth noting: in my testing, I created my target project AFTER migrating the environment from ovs-subnet to ovs-networkpolicy, and reached the two different results noted above. I think we also need to consider the test case where EXISTING projects running under ovs-subnet need to be able to have NetworkPolicy objects defined once the migration to ovs-networkpolicy has occurred. In some preliminary testing, I have not been able to get this case to work either, regardless of whether the target pod is hosted on the same node as the router pod or on a different one. Would it make sense to cover these use cases/test cases in a separate BZ? I'm going to continue testing this additional use case as well. Thanks!

@Tom, I reproduced the issue you mentioned in Comment 5. After trying the same testing steps on a new OpenShift cluster that had openshift-ovs-networkpolicy installed from the beginning, I saw the same issue. So the problem you are seeing is a NetworkPolicy issue, not a migration issue. Please open another bug to report it. For this bug, we need a documentation PR describing the correct order for restarting services after migrating from openshift-ovs-subnet to openshift-ovs-networkpolicy.

@Weibin, new BZ created for the NetworkPolicy issue: BZ 1576857.

Weibin, do we need to do all of those steps? For the master, is it sufficient to do:

    systemctl restart openvswitch
    systemctl restart atomic-openshift-master-api.service
    systemctl restart atomic-openshift-master-controllers.service
    systemctl restart atomic-openshift-node.service

And for the node:

    systemctl restart openvswitch
    systemctl restart atomic-openshift-node.service

(In reply to Ben Bennett from comment #11)

Ben, after retesting, I think the whole procedure should be as follows.

Migrating from ovs-subnet to ovs-networkpolicy:

1) Update networkPluginName in master-config.yaml on all masters.
2) Update networkPluginName in node-config.yaml on all masters and nodes.
3) Restart the atomic-openshift-master-api and atomic-openshift-master-controllers services on all masters, one by one.
4) Restart the atomic-openshift-node service on all masters and nodes, one by one.
5) Restart openvswitch on all masters and nodes, one by one.
6) Restart the atomic-openshift-master-api and atomic-openshift-master-controllers services on all masters, one by one.
7) Restart the atomic-openshift-node service on all masters, one by one.

(In reply to Weibin Liang from comment #12) Correction — step 7 should be: Restart the atomic-openshift-node service on all masters and nodes, one by one. A scripted sketch of the full sequence follows.
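Putting comment #12 and the correction together, the per-host sequence might be scripted roughly as follows. This is only a sketch of the order described above, not an official script: the host lists, SSH access, and config paths are assumptions.

```bash
#!/bin/bash
# Hedged sketch of the restart order from comments #12/#13.
# MASTERS and NODES are placeholder host lists.
MASTERS="master1 master2 master3"
NODES="node1 node2"

# 1) + 2) Before running the restarts below, set
#    networkPluginName: redhat/openshift-ovs-networkpolicy
#    in /etc/origin/master/master-config.yaml on all masters and in
#    /etc/origin/node/node-config.yaml on all masters and nodes.

# 3) Restart the master API and controllers, one master at a time.
for h in $MASTERS; do
  ssh "$h" 'systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers'
done

# 4) Restart the node service on all masters and nodes, one by one.
for h in $MASTERS $NODES; do
  ssh "$h" 'systemctl restart atomic-openshift-node'
done

# 5) Restart openvswitch everywhere to flush the old ovs-subnet rules.
for h in $MASTERS $NODES; do
  ssh "$h" 'systemctl restart openvswitch'
done

# 6) + 7) Repeat the API/controller and node restarts so the node
#    processes repopulate OVS with the networkpolicy rules.
for h in $MASTERS; do
  ssh "$h" 'systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers'
done
for h in $MASTERS $NODES; do
  ssh "$h" 'systemctl restart atomic-openshift-node'
done
```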
Fixed by docs PR https://github.com/openshift/openshift-docs/pull/9638

Commit pushed to master at https://github.com/openshift/openshift-docs
https://github.com/openshift/openshift-docs/commit/8eeb262175dca99706acd54f529ec8985c036913

Fix the OpenShift SDN migration steps

We need to restart openvswitch to clean out the rules before we restart the node processes.

Fixes bug 1569244 (https://bugzilla.redhat.com/show_bug.cgi?id=1569244)

The docs changes LGTM; verified this bug.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

The *Migrating Between SDN Plug-ins* section in the 3.9 [1] and 3.10 [2] docs was changed last May per https://bugzilla.redhat.com/show_bug.cgi?id=1569244#c13. The 3.7 [3] docs were not changed, though the issue was originally raised against 3.7. @misalunk - What version was your customer using? Please confirm whether the 3.7 docs need this change.

[1] https://docs.openshift.com/container-platform/3.9/install_config/configuring_sdn.html#migrating-between-sdn-plugins
[2] https://docs.openshift.com/container-platform/3.10/install_config/configuring_sdn.html#migrating-between-sdn-plugins
[3] https://docs.openshift.com/container-platform/3.7/install_config/configuring_sdn.html#migrating-between-sdn-plugins

OCP 3.7-3.10 has reached the end of full support [1]. Closing this BZ as WONTFIX. If there is a customer case to be attached with a valid support exception and we still need a fix here, please post those details and reopen.

[1] https://access.redhat.com/support/policy/updates/openshift

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.