Bug 1427789
| Summary: | [3.5] Pod may get the duplicate IP if it is created after the node service restarted on containerized env |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Installer |
| Version: | 3.5.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Reporter: | Meng Bo <bmeng> |
| Assignee: | Dan Williams <dcbw> |
| QA Contact: | Meng Bo <bmeng> |
| Docs Contact: | |
| CC: | aos-bugs, bbennett, eparis, gscrivan, jokerman, mmccomas, sdodson, wmeng |
| Target Milestone: | --- |
| Target Release: | --- |
| Keywords: | NeedsTestCase |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | In containerized environments the CNI data directory located at /var/lib/cni was not properly configured to persist on the node host. The installer has been updated to ensure that pod IP allocation data is persisted when restarting containerized nodes. |
| Story Points: | --- |
| Clone Of: | |
| Clones: | 1429029, 1429030 |
| Environment: | |
| Last Closed: | 2017-04-12 19:02:51 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | |
| Bug Blocks: | 1429029, 1429030 |
**Description** (Meng Bo, 2017-03-01 08:47:13 UTC)
**Comment 1**

The data directory must be persistent, so we'll need to map that directory into the container so it sticks around across node process restarts. Similar to how docker handles IPAM, CNI also keeps an on-disk database of IPAM allocations.

**Comment 2** (Eric Paris)

What directory? I'm assuming this will be as easy as another `-v` volume option in the definitions of how we launch the node container?

**Comment 3**

(In reply to Eric Paris from comment #2)
> What directory? I'm assuming this will be as easy as another -v volume
> option in the definitions of how we launch the node container?

/var/lib/cni/ should probably get persisted. Does that only need to happen in ansible, like in roles/openshift_node/templates/openshift.docker.node.service? E.g., is this just an ansible issue, or does something in origin itself need updating?

**Comment 4**

sdodson can either tell us everywhere we need the `-v`, or he can tell us who knows...

**Comment 5**

openshift-ansible's roles/openshift_node/templates/openshift.docker.node.service is what's actually used by the installer. Origin's /contrib/systemd/containerized/ has some reference systemd units too.

Giuseppe can explain how this needs to be done for system containers, which we'll be switching to in the future.

**Comment 6** (Giuseppe Scrivano)

So the issue is that /var/lib/cni is recreated each time the node container restarts?

I think this should work with system containers, as there is already a binding from the host `/var/lib/cni` so that it is persisted across restarts of the node container:

https://github.com/openshift/origin/blob/master/images/node/system-container/config.json.template#L238

With system containers we enforce the image to be read only, which helps to ensure no state is left in the container itself.

**Comment 7** (Dan Williams)

(In reply to Giuseppe Scrivano from comment #6)
> I think this should work with system containers, as there is already a
> binding from the host `/var/lib/cni` so that it is persisted across restarts
> of the node container.

Ok, so that looks like it would do the right thing. But are our customers using that right now when they set up a "containerized env", or is that happening via the ansible installer, or somehow else?

**Comment 8**

(In reply to Dan Williams from comment #7)
> Ok, so that looks like it would do the right thing. But are our customers
> using that right now when they set up a "containerized env" or is that
> happening via the ansible installer, or somehow else?

No, that's future state. I just wanted to make sure we accounted for it so we don't regress as soon as we move to system containers.

**Comment 9**

Origin: https://github.com/openshift/origin/pull/13231
Ansible: https://github.com/openshift/openshift-ansible/pull/3556

**Comment 10**

Tested with containerized node in a docker-in-docker instance. /var/lib/cni is preserved across "docker restart origin/node" invocations. Since the ansible code merged for 3.5/master, marking as MODIFIED.

**Comment 11**

Tested with OCP build 3.5.0.39 and openshift-ansible-3.5.23-1; the issue has been fixed.
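The failure mode behind this bug follows from how CNI's host-local IPAM works: each allocated IP is recorded as a reservation file under the state directory (/var/lib/cni), so if that directory is recreated empty when the node container restarts, already-assigned addresses get handed out again. A toy simulation of that behavior (illustrative only, not the real plugin; the IP range and file layout here are made up):

```shell
# Toy model of file-based IPAM: every allocated IP is recorded as a
# reservation file in a state directory, like CNI host-local does under
# /var/lib/cni. This is a sketch, not the actual plugin.
statedir=$(mktemp -d)

allocate() {
  # Hand out the first IP in the range that has no reservation file yet.
  for ip in 10.1.0.2 10.1.0.3 10.1.0.4; do
    if [ ! -e "$statedir/$ip" ]; then
      touch "$statedir/$ip"
      echo "$ip"
      return
    fi
  done
}

a=$(allocate)    # first pod: 10.1.0.2
b=$(allocate)    # second pod: 10.1.0.3

# Simulate a node container restart WITHOUT the bind mount: state is lost.
rm -rf "$statedir" && mkdir "$statedir"

c=$(allocate)    # 10.1.0.2 again, duplicating the first pod's address
echo "$a $b $c"
```

With the host directory bind-mounted into the container, the `rm -rf` step never happens, and the third allocation would correctly return 10.1.0.4.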
The /var/lib/cni directory is persistent when the node container restarts.

*** Bug 1429029 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903
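For reference, the fix discussed in this thread amounts to bind-mounting /var/lib/cni from the host into the node container when it is launched. A sketch of what the added volume option looks like in a unit file such as openshift.docker.node.service (every flag besides the `-v /var/lib/cni` mount is a placeholder assumption; see the linked openshift-ansible PR for the actual change):

```ini
# Sketch only: illustrative excerpt of a containerized node systemd unit.
# Flags other than the /var/lib/cni bind mount are placeholders.
[Service]
ExecStart=/usr/bin/docker run --name origin-node \
  --rm --privileged --net=host \
  -v /var/lib/cni:/var/lib/cni \
  openshift/node
```

Because the host path is mounted over the container path, CNI's IPAM reservation files survive `docker restart` of the node container, which is exactly what the QA verification above confirms.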