Bug 1614904
Summary: | Validation of static pod fails due to inconsistent names
---|---
Product: | OpenShift Container Platform
Component: | Installer
Version: | 3.10.0
Status: | CLOSED ERRATA
Severity: | urgent
Priority: | urgent
Target Release: | 3.11.z
Hardware: | Unspecified
OS: | Unspecified
Type: | Bug
Reporter: | Steven Walter <stwalter>
Assignee: | Michael Gugino <mgugino>
QA Contact: | Johnny Liu <jialiu>
CC: | aleks, aos-bugs, arghosh, brian.millett, byount, dhwanil.raval, fshaikh, jcrumple, jkaur, jokerman, jolee, mark.vinkx, maupadhy, mmccomas, msomasun, openshift-bugs-escalate, rbost, rhowe, rkant, rkshirsa, schoudha, scuppett, sdodson, sgarciam, sheldyakov, shlao, torben, wmeng
Last Closed: | 2018-11-20 03:10:43 UTC
: | 1638525
Bug Blocks: | 1638525
Description
Steven Walter
2018-08-10 16:50:43 UTC
Steven,

We need to get logs from the static pods and the complete journal from the node service on all masters:

`journalctl --no-pager > node.log`
`master-logs etcd etcd &> etcd.log`
`master-logs api api &> api.log`
`master-logs controllers controllers &> controllers.log`

The static pods for the API should come up before CNI and SDN are initialized and the node is marked ready. There should be no need to install atomic-openshift-sdn-ovs in 3.10; this is all handled via a daemonset that's provisioned after the API bootstraps.

Hi,

We previously asked the customer to check whether the pods were running, for example:

`oc get pods -n kube-system`

However, the customer could not get any output because the master was not responding. Would we expect that command to respond if we run it against a different namespace? If not, how should we collect these logs? Sorry, the "services running as pods" model is still a bit new to me.

Is "master-logs" a command, or shorthand for getting journalctl output? I don't see it as an option in my 3.10 cluster, so I assume the latter.

Never mind, I see "master-logs" in /usr/local/bin. I'll have the customer grab those.

*** Bug 1615754 has been marked as a duplicate of this bug. ***

*** Bug 1613348 has been marked as a duplicate of this bug. ***

QE also hit a similar issue; refer to scenario #1 in https://bugzilla.redhat.com/show_bug.cgi?id=1629726#c2.

Seeing similar behavior on a bare-metal environment. Any updates?

This should be addressed via https://github.com/openshift/openshift-ansible/pull/10356 on release-3.11.

According to dev's proposed verification path (https://gist.github.com/michaelgugino/c961476d8be7d160a5e53fe9a9734051), a 3.11 fresh install also needs, for testing scenario #4, a backport similar to the one done for 3.10: https://github.com/openshift/openshift-ansible/pull/10409

PR created for 3.11: https://github.com/openshift/openshift-ansible/pull/10447

3.11 merged.

Verified this bug with openshift-ansible-3.11.38-1.git.0.d146f83.el7.noarch: PASS.

Scenario #1: Try to install a new 3.11 cluster with openshift_kubelet_name_override set. The install should fail.

############ ANSIBLE RUN: playbooks/prerequisites.yml ############
PLAY [Fail openshift_kubelet_name_override for new hosts] **********************
TASK [Gathering Facts] *********************************************************
Monday 05 November 2018 14:33:22 +0800 (0:00:00.111) 0:00:00.111 *******
ok: [qe-jialiu312-master-etcd-1.1105-0gs.qe.rhcloud.com]
ok: [qe-jialiu312-node-1.1105-0gs.qe.rhcloud.com]
ok: [qe-jialiu312-node-registry-router-1.1105-0gs.qe.rhcloud.com]
TASK [Fail when openshift_kubelet_name_override is defined] ********************
Monday 05 November 2018 14:33:23 +0800 (0:00:01.097) 0:00:01.209 *******
fatal: [qe-jialiu312-master-etcd-1.1105-0gs.qe.rhcloud.com]: FAILED! => {"changed": false, "msg": "openshift_kubelet_name_override Cannot be defined for new hosts"}
fatal: [qe-jialiu312-node-registry-router-1.1105-0gs.qe.rhcloud.com]: FAILED! => {"changed": false, "msg": "openshift_kubelet_name_override Cannot be defined for new hosts"}
fatal: [qe-jialiu312-node-1.1105-0gs.qe.rhcloud.com]: FAILED! => {"changed": false, "msg": "openshift_kubelet_name_override Cannot be defined for new hosts"}
to retry, use: --limit @/home/slave3/workspace/Launch Environment Flexy Wrapper/private-openshift-ansible/playbooks/prerequisites.retry

Scenario #2: cluster install on OSP (snvl2) without cloudprovider enabled + short hostname, PASS.

Scenario #3: cluster install on OSP (snvl2) with cloudprovider enabled + short hostname, PASS.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537
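
For readers who want to reason about the Scenario #1 guard outside of Ansible, the check can be approximated as follows. This is only a minimal Python sketch, not the openshift-ansible task itself: the function name `find_override_failures` and the shape of the `hostvars` mapping (hostname to a dict of inventory variables) are assumptions made for illustration. Only the variable name and the failure message are taken from the output above.

```python
# Hypothetical sketch of the guard exercised in Scenario #1.
# It is NOT the openshift-ansible implementation; names other than the
# variable and the message text are assumptions for illustration.

OVERRIDE_VAR = "openshift_kubelet_name_override"
FAIL_MSG = "openshift_kubelet_name_override Cannot be defined for new hosts"


def find_override_failures(hostvars):
    """Return {hostname: message} for every new host that defines the override.

    `hostvars` maps a hostname to a dict of its inventory variables
    (an assumed shape, loosely modelled on an Ansible inventory).
    """
    failures = {}
    for host, variables in hostvars.items():
        # The check is on definition, not on value: any defined override fails.
        if OVERRIDE_VAR in variables:
            failures[host] = FAIL_MSG
    return failures


if __name__ == "__main__":
    # Example inventory with made-up hostnames.
    inventory = {
        "master-1.example.com": {OVERRIDE_VAR: "master-1"},
        "node-1.example.com": {},
    }
    for host, msg in find_override_failures(inventory).items():
        print(f"{host}: {msg}")
```

In this sketch a host fails as soon as the variable is defined at all, regardless of its value, which mirrors the task name "Fail when openshift_kubelet_name_override is defined" in the run above.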