Bug 1710012
| Summary: | Provide user's information on making sure registry cluster object exists before applying patch. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Eric Rich <erich> | ||||
| Component: | Master | Assignee: | Michal Fojtik <mfojtik> | ||||
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Xingxing Xia <xxia> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 4.1.0 | CC: | adahiya, aos-bugs, jokerman, mmccomas | ||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.1.z | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2019-05-22 20:24:12 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
Description
Eric Rich
2019-05-14 17:53:05 UTC
(In reply to Eric Rich from comment #0)
> Description of problem:
> If you follow:
> https://docs.openshift.com/container-platform/4.1/installing/installing_bare_metal/installing-bare-metal.html#installation-installing-bare-metal_installing-bare-metal
> you can't apply registry storage. You are hit with the following error:
>
> $ oc --config test_cluster/auth/kubeconfig patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
> Error from server (NotFound): configs.imageregistry.operator.openshift.io "cluster" not found

You cannot patch the object if the API or the object doesn't exist. If users need that hand-holding, the documentation can be updated so that users confirm the object exists and then patch it. Also, to be fair, this command appears in the mentioned docs as an example, "not for production" way to configure the registry.

> Version-Release number of selected component (if applicable): beta4
>
> How reproducible: Rare (only seen this once)
>
> Steps to Reproduce:
> 1. Follow the Docs
>
> Actual results: See above
>
> Expected results: I have seen this patch command work in the past.
>
> Additional info:
>
> $ oc --config test_cluster/auth/kubeconfig get clusterversion
> NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.1.0-rc.0   False       True          45m     Unable to apply 4.1.0-rc.0: an unknown error has occurred
>
> $ oc --config test_cluster/auth/kubeconfig get clusteroperators
> NAME                                 VERSION      AVAILABLE   PROGRESSING   FAILING   SINCE
> cloud-credential                     4.1.0-rc.0   True        False         False     43m
> cluster-autoscaler                   4.1.0-rc.0   True        False         False     43m
> dns                                  4.1.0-rc.0   False       False         False     40m
> kube-apiserver                       4.1.0-rc.0   True        True                    42m
> kube-controller-manager              4.1.0-rc.0   True        False                   40m
> kube-scheduler                       4.1.0-rc.0   True        False                   40m
> machine-api                          4.1.0-rc.0   True        False         False     43m
> machine-config                       4.1.0-rc.0   False       False         True      30m
> network                              4.1.0-rc.0   True        True                    44m
> openshift-apiserver                  4.1.0-rc.0   False       False                   40m
> openshift-controller-manager         4.1.0-rc.0   False       False                   33m
> operator-lifecycle-manager           4.1.0-rc.0   True        False         False     42m
> operator-lifecycle-manager-catalog   4.1.0-rc.0   True        False         False     42m
> service-ca                           4.1.0-rc.0   True        True          False     43m
>
> $ oc --config test_cluster/auth/kubeconfig -n openshift-cluster-version logs cluster-version-operator-864544f74f-9hfw4
> Error from server: Get https://master-0:10250/containerLogs/openshift-cluster-version/cluster-version-operator-864544f74f-9hfw4/cluster-version-operator: dial tcp 192.168.100.10:10250: connect: connection refused
>
> However, if you are on master-0 (where the pod gets deployed), you can
> connect to this port:
>
> $ oc --config test_cluster/auth/kubeconfig -n openshift-cluster-version get pod cluster-version-operator-864544f74f-9hfw4 -o jsonpath='{.spec.nodeName}{"\n"}'
>
> [core@master-0 ~]$ echo >/dev/tcp/master-0.thoran.dwarf.mine/10250
> [core@master-0 ~]$ echo $?
> 0
> [core@master-0 ~]$ echo >/dev/tcp/master-0/10250
> [core@master-0 ~]$ echo $?
> 0
> [core@master-0 ~]$ echo >/dev/tcp/192.168.100.10/10250
> -bash: connect: Connection refused
> -bash: /dev/tcp/192.168.100.10/10250: Connection refused
>
> However, this goes away (eventually) and then produces:
>
> $ oc --config test_cluster/auth/kubeconfig -n openshift-cluster-version logs cluster-version-operator-864544f74f-9hfw4
> Error from server: Get https://master-0:10250/containerLogs/openshift-cluster-version/cluster-version-operator-864544f74f-9hfw4/cluster-version-operator: remote error: tls: internal error
>
> $ curl https://master-0.thoran.dwarf.mine:10250/containerLogs/openshift-cluster-version/cluster-version-operator-864544f74f-9hfw4/cluster-version-operator?follow=true:
> curl: (35) error:14094438:SSL routines:ssl3_read_bytes:tlsv1 alert internal error

Created attachment 1568635 [details]
Install Logs
I think the question (we need to be focused on) is why the install never completes, or why that resource is not available at that state in the docs for a user to run said command.
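The check-then-patch approach suggested in the reply above could look roughly like the following. This is a minimal sketch, not taken from the OpenShift docs: the helper names `wait_for_registry_config` and `patch_registry_storage`, the retry count, and the delay are all assumptions for illustration. The resource name and patch body are the ones from the failing command in this report, and `oc` is assumed to find the cluster through `KUBECONFIG`.

```shell
#!/usr/bin/env bash
# Sketch only: wait for the image registry config object, then patch it.
# wait_for_registry_config and patch_registry_storage are hypothetical helpers.

# Resource and object name as they appear in the NotFound error in this report.
REGISTRY_RESOURCE="configs.imageregistry.operator.openshift.io"
REGISTRY_OBJECT="cluster"

# Poll until the object exists, or give up.
# Usage: wait_for_registry_config [attempts] [delay_seconds]
wait_for_registry_config() {
  local attempts="${1:-30}" delay="${2:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    # "oc get" exits non-zero while the API or the object is missing.
    if oc get "$REGISTRY_RESOURCE" "$REGISTRY_OBJECT" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  echo "$REGISTRY_RESOURCE/$REGISTRY_OBJECT not found after $attempts attempts" >&2
  return 1
}

# Apply the emptyDir storage patch only once the object is present.
# Usage: patch_registry_storage [attempts] [delay_seconds]
patch_registry_storage() {
  wait_for_registry_config "$@" || return 1
  oc patch "$REGISTRY_RESOURCE" "$REGISTRY_OBJECT" --type merge \
    --patch '{"spec":{"storage":{"emptyDir":{}}}}'
}
```

After sourcing the file, something like `export KUBECONFIG=test_cluster/auth/kubeconfig; patch_registry_storage 30 10` would retry for about five minutes before giving up. If the object never appears, as in this report, the script fails with a clear message instead of the bare NotFound error, but it does not explain why the install stalled.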
I've filed [1] to get more specifics in the CVO error instead of the above 'Unable to apply 4.1.0-rc.0: an unknown error has occurred'.

[1]: https://github.com/openshift/cluster-version-operator/pull/185

This may be connected to https://bugzilla.redhat.com/show_bug.cgi?id=1684049

This cluster's failed install seems to be caused by the kubelet certificate having expired:
$ sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            08:89:6e:e5:3c:da:dd:28:b4:16:37:ad:75:bb:e0:9a:70:f6:5f:56
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: OU = openshift, CN = kubelet-signer
        Validity
            Not Before: May 14 16:34:00 2019 GMT
            Not After : May 14 16:55:30 2019 GMT
        Subject: O = system:nodes, CN = system:node:master-0
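The Not After time above (16:55:30 GMT) is earlier than the time this bug was opened (17:53:05 UTC the same day), which is what marks the certificate as expired. The same check can be scripted instead of read off the text dump. The following is a rough sketch using `openssl x509 -checkend`. The helper name `cert_valid_for` is made up for illustration, and the path in the usage note is the one from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a PEM certificate is still valid.
# cert_valid_for is a hypothetical helper name.
#
# Usage: cert_valid_for <pem-file> [seconds]
# Returns 0 if the certificate is still valid <seconds> from now (default 0,
# i.e. "not expired right now"), 1 if it has expired or will expire within
# that window, and 2 if the file cannot be read.
cert_valid_for() {
  local cert="$1" window="${2:-0}"
  if [ ! -r "$cert" ]; then
    echo "cannot read $cert" >&2
    return 2
  fi
  # Print the expiry date (notAfter=...) for the log.
  openssl x509 -in "$cert" -noout -enddate
  # -checkend exits non-zero if the certificate expires within the window.
  openssl x509 -in "$cert" -noout -checkend "$window" >/dev/null
}
```

On the node, `sudo bash -c 'source ./cert-check.sh; cert_valid_for /var/lib/kubelet/pki/kubelet-client-current.pem'` (with `cert-check.sh` being whatever file holds the function) would return a non-zero status for the certificate shown here.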