Bug 1525642
| Summary: | immortal namespace are not immortal (as we claim them to be) | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Eric Rich <erich> |
| Component: | Master | Assignee: | David Eads <deads> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Wang Haoran <haowang> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.6.0 | CC: | aos-bugs, deads, jokerman, mmccomas, smunilla, stwalter |
| Target Milestone: | --- | ||
| Target Release: | 3.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | oc v3.9.0-0.20.0 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-06-18 18:18:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Eric Rich
2017-12-13 19:20:07 UTC
Note that you should be able to work around or recover from the delete by following these steps:
1: Stop all instances of the atomic-openshift-master or atomic-openshift-master-api processes:
# systemctl stop atomic-openshift-master OR # systemctl stop atomic-openshift-master-api
# systemctl stop atomic-openshift-master* ### This should also work
- Note: if you have 3 masters, run this on all 3 masters! This will cause an outage to cluster operations!
2: Find and confirm that the kube-system namespace is accessible directly from etcd (this and the next step are the INVASIVE part)
- Note: We will use [-] as the foundation for explaining how to do this.
- Note: You need to fill in ${etcd_endpoint}, ${cert_file}, ${key_file} and ${ca_file} in the commands below with files/values that match your cluster; [-] shows where and how to look up these values.
# export ETCDCTL_API=2; etcdctl --endpoints ${etcd_endpoint} --cert-file ${cert_file} --key-file ${key_file} --ca-file ${ca_file} ls /kubernetes.io/namespaces
OR
# export ETCDCTL_API=3; etcdctl --endpoints=${etcd_endpoint} --cert ${cert_file} --key ${key_file} --cacert ${ca_file} get /kubernetes.io/namespaces --prefix --keys-only
3: Delete the kube-system namespace from etcd directly
# export ETCDCTL_API=2; etcdctl --endpoints ${etcd_endpoint} --cert-file ${cert_file} --key-file ${key_file} --ca-file ${ca_file} rm /kubernetes.io/namespaces/kube-system
OR
# export ETCDCTL_API=3; etcdctl --endpoints=${etcd_endpoint} --cert ${cert_file} --key ${key_file} --cacert ${ca_file} del /kubernetes.io/namespaces/kube-system
4: Restart all instances of the atomic-openshift-master or atomic-openshift-master-api processes:
# systemctl restart atomic-openshift-master OR # systemctl restart atomic-openshift-master-api
# systemctl restart atomic-openshift-master* ### This is known _NOT_ to work (so unlike before, do _NOT_ try this).
- Note: if you have 3 masters, run this on all 3 masters! This will cause an outage to cluster operations!
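Steps 2 and 3 could be wrapped in a small script for the etcd v3 API, roughly as sketched below. This is only a sketch under stated assumptions: the endpoint and certificate paths are placeholders rather than real defaults, the `etcd3` helper and the `DRY_RUN` variable are names invented here for illustration, and by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of steps 2 and 3 (etcd v3 API only). Run it only after the masters
# have been stopped as described in step 1.
set -eu

# Placeholder values (assumptions): replace them with the values for your
# cluster, looked up as described in [-].
etcd_endpoint="${etcd_endpoint:-https://etcd.example.com:2379}"
cert_file="${cert_file:-/path/to/client.crt}"
key_file="${key_file:-/path/to/client.key}"
ca_file="${ca_file:-/path/to/ca.crt}"

# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: run (or print) one etcdctl v3 command with the TLS flags.
etcd3() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "ETCDCTL_API=3 etcdctl --endpoints=${etcd_endpoint} --cert ${cert_file} --key ${key_file} --cacert ${ca_file} $*"
  else
    ETCDCTL_API=3 etcdctl --endpoints="${etcd_endpoint}" \
      --cert "${cert_file}" --key "${key_file}" --cacert "${ca_file}" "$@"
  fi
}

# Step 2: list the namespace keys and confirm kube-system is among them.
etcd3 get /kubernetes.io/namespaces --prefix --keys-only

# Step 3: delete the kube-system namespace key directly from etcd.
etcd3 del /kubernetes.io/namespaces/kube-system
```

Running it once with the default `DRY_RUN=1` lets you review the exact commands; setting `DRY_RUN=0` would execute them against etcd.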
Once completed, the kube-system namespace should get re-created by the api process and your cluster should begin functioning again. To test this, run the following:
# oc get ns ### confirm that the kube-system namespace is in fact created!
# oc get all,sa,secrets -n kube-system ### confirm that the kube-system namespace is in fact populated with secrets and service accounts!
# oc rollout latest dc/ruby-ex -n test ### confirm that this deploys the latest instance of your application.
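The first of these checks could also be scripted, for example as in the minimal sketch below. The `has_namespace` helper is a name invented here, and the script assumes `oc get ns -o name` prints one `namespace/<name>` (or `namespaces/<name>`) entry per line; it does nothing if `oc` is not on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the namespace named in $1 appears in
# `oc get ns -o name` style output read from stdin.
has_namespace() {
  # Match the name at the end of a line, either bare or after a "/".
  grep -Eq "(^|/)$1\$"
}

# Only query the cluster when the oc client is available.
if command -v oc >/dev/null 2>&1; then
  if oc get ns -o name | has_namespace kube-system; then
    echo "kube-system namespace exists"
  else
    echo "kube-system namespace is missing" >&2
  fi
fi
```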
[-] https://access.redhat.com/articles/2542841
Fixed in 3.6-3.8 and 3.9.
Verified with:
oc v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://172.16.120.125:8443
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657