Bug 1525642
Summary: | immortal namespaces are not immortal (as we claim them to be) | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Eric Rich <erich>
Component: | Master | Assignee: | David Eads <deads>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Wang Haoran <haowang>
Severity: | urgent | Docs Contact: |
Priority: | unspecified | |
Version: | 3.6.0 | CC: | aos-bugs, deads, jokerman, mmccomas, smunilla, stwalter
Target Milestone: | --- | |
Target Release: | 3.9.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | oc v3.9.0-0.20.0 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-06-18 18:18:44 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Eric Rich
2017-12-13 19:20:07 UTC
It should be noted that you should be able to work around/recover from the delete by following these steps:

1: Stop all instances of the atomic-openshift-master or atomic-openshift-master-api processes:

# systemctl stop atomic-openshift-master
OR
# systemctl stop atomic-openshift-master-api
# systemctl stop atomic-openshift-master*   ### This should also work

- Note: if you have 3 masters, run this on all 3 masters (see the ssh loop sketch at the end of this comment)! This will cause an outage to cluster operations!

2: Find/confirm that the kube-system namespace is accessible directly from etcd (this step and the next are the invasive part).

- Note: We will use [-] as the foundation for explaining how to do this.
- Note: You need to fill in ${etcd_endpoint}, ${cert_file}, ${key_file} and ${ca_file} in the commands below with files/values that match your cluster; [-] shows where/how to look up these values (a lookup sketch is also included at the end of this comment).

# export ETCDCTL_API=2; etcdctl --endpoints ${etcd_endpoint} --cert-file ${cert_file} --key-file ${key_file} --ca-file ${ca_file} ls /kubernetes.io/namespaces
OR
# export ETCDCTL_API=3; etcdctl --endpoints=${etcd_endpoint} --cert ${cert_file} --key ${key_file} --cacert ${ca_file} get /kubernetes.io/namespaces --prefix --keys-only

3: Delete the kube-system namespace from etcd directly:

# export ETCDCTL_API=2; etcdctl --endpoints ${etcd_endpoint} --cert-file ${cert_file} --key-file ${key_file} --ca-file ${ca_file} rm /kubernetes.io/namespaces/kube-system
OR
# export ETCDCTL_API=3; etcdctl --endpoints=${etcd_endpoint} --cert ${cert_file} --key ${key_file} --cacert ${ca_file} del /kubernetes.io/namespaces/kube-system

4: Restart all instances of the atomic-openshift-master or atomic-openshift-master-api processes:

# systemctl restart atomic-openshift-master
OR
# systemctl restart atomic-openshift-master-api
# systemctl restart atomic-openshift-master*   ### This is known _NOT_ to work (so, unlike before, do _NOT_ try this).

- Note: if you have 3 masters, run this on all 3 masters! This will cause an outage to cluster operations!

Once completed, the kube-system namespace should get re-created by the API process and your cluster should begin functioning again. To test this, run the following:

# oc get ns                              ### confirm that the kube-system namespace is in fact created!
# oc get all,sa,secrets -n kube-system   ### confirm that the kube-system namespace is in fact populated with secrets and service accounts!
# oc rollout latest dc/ruby-ex -n test   ### confirm that this deploys the latest instance of your application.

[-] https://access.redhat.com/articles/2542841

Fixed in 3.6-3.8 and 3.9.

Verified with:
oc v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.16.120.125:8443
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
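Regarding the "run this on all 3 masters" notes in steps 1 and 4: a hypothetical convenience loop, not part of the original workaround, assuming root ssh access to the masters; master1, master2 and master3 are placeholder hostnames, and the same pattern applies with "restart" in step 4:

# for m in master1 master2 master3; do ssh "$m" 'systemctl stop atomic-openshift-master-api'; done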
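Regarding the placeholders in steps 2 and 3: a minimal lookup sketch, assuming the default OpenShift 3.x file layout (the KCS article at [-] is the authoritative reference for your cluster). The etcdClientInfo stanza of the master configuration lists the etcd URLs and client certificate files; the paths and port below are assumed defaults, so verify them against your own master-config.yaml before use:

# grep -A5 etcdClientInfo /etc/origin/master/master-config.yaml   ### prints ca, certFile, keyFile and urls
# export etcd_endpoint=https://$(hostname):2379                   ### or use the urls: value printed above
# export cert_file=/etc/origin/master/master.etcd-client.crt
# export key_file=/etc/origin/master/master.etcd-client.key
# export ca_file=/etc/origin/master/master.etcd-ca.crt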