Created attachment 1677346 [details] oc get co

Description of problem:
On OCP 4.4.0-rc.6, the console shows messages like the following:
Machine test-pr-op3-ms6f9-master-0 does not have valid node reference
Machine test-pr-op3-ms6f9-master-1 does not have valid node reference
Machine test-pr-op3-ms6f9-master-2 does not have valid node reference

Version-Release number of selected component (if applicable):

How reproducible:
Every 4.4.0-rc.6 install. Not seen on OCP 4.3 installs.

Steps to Reproduce:
1. Install OCP
2.
3.

Actual results:

Expected results:

Additional info:
Machine test-pr-op3-ms6f9-master-0 does not have valid node reference
Created attachment 1677348 [details] oc get po oc get po
Created attachment 1677349 [details] part 2aa of must-gather part 2aa of must-gather
Created attachment 1677350 [details] must-gather 2ab must-gather 2ab
Created attachment 1677351 [details] must-gather 2ac must-gather 2ac
Created attachment 1677352 [details] must-gather 2ad must-gather 2ad
To put the parts back together:
cat must-gather.tar.gz2a* > must-gather.tar.gz
When submitting bug reports, please fill out the steps to reproduce completely, to ensure the assignee can quickly reproduce & fix your bug, thanks!

Steps to Reproduce:
1. Install OCP
2.
3.
Setting fix version 4.5 for now while awaiting feedback. Checking a current dev cluster, I don't immediately see anything problematic.
Setup
1. VMware setup with 3 masters and 3 workers.
2. ESXi servers are set up with vSphere 6.7.
3. SAN with 7 attached LUNs, SDRS turned off.

I assume much, if not all, of this information is already apparent from the must-gather.
Created attachment 1677839 [details] Console output of issue Image of Console displaying issue
Created attachment 1677840 [details] Version info Version info
Created attachment 1677843 [details] Console error on 4.4.0-rc.7 Same error occurring with 4.4.0-rc.7
The extra info may be in the must-gather, but we can triage bugs much faster if the basics of all bugs are consistently reported. Appreciate it!
Created attachment 1678803 [details] Same failure rc8 on vmware Waiting on fix
Created attachment 1678856 [details] Failure on 4.5.0-0.nightly-2020-04-14-184903 Same failure occurring on 4.5.0-0.nightly-2020-04-14-184903
Created attachment 1678857 [details] Same failure on 4.5.0-0.nightly-2020-04-14-184903 Failures on VMware, OCP console Overview, 4.5.0-0.nightly-2020-04-14-184903
So what is the status of this issue?
The console seems to be simply reporting the data it is receiving, which says there is a problem with the machines. Passing this along to machine config.
That comes from MAO based on https://github.com/openshift/machine-api-operator/pull/406/commits/706ecf9cc21fe901fab84a7c0a49a726970560f2
Does this issue appear on newer builds?
Created attachment 1686329 [details] Latest-4.5 nightly Yes the latest-4.5 nightly is still showing the issue.
In 4.4, vSphere has no support for IPI or automated machine management, so the machine API operator will no-op. The screenshot shared for 4.5 in https://bugzilla.redhat.com/show_bug.cgi?id=1822345#c20 shows your worker machines stuck in provisioning. Can you reproduce that against the latest nightly? If so, can you share must-gather logs?
Created attachment 1688131 [details] From 4.5.0-0.nightly-2020-05-13-130344 Nightly console overview 4.5.0-0.nightly-2020-05-13-130344
Now gathering must-gather logs.
Created attachment 1688138 [details] Part 1 of 4 must gather Part 1 of 4 must gather
Created attachment 1688139 [details] Part 2 of 4 must gather Part 2 of 4 must gather
Created attachment 1688141 [details] Part 3 of 4 must gather Part 3 of 4 must gather
Created attachment 1688144 [details] Part 4 of 4 must gather Part 4 of 4 must gather
To put the must-gather parts back together:
cat must-gather.tar.gz.parta* > must-gather.tar.gz.joined
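The reassembly step above can be sanity-checked before anyone digs into the archive. The following is a minimal sketch in a scratch directory, not the reporter's actual procedure: the thread does not say how the parts were produced, so the use of `split`, the 100-byte chunk size, and the sample file names are all assumptions made for illustration.

```shell
set -eu

# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny stand-in archive (hypothetical contents).
mkdir must-gather.local
echo "sample log line" > must-gather.local/example.log
tar -czf must-gather.tar.gz must-gather.local

# Split into small pieces; split appends suffixes aa, ab, ... to the prefix.
split -b 100 must-gather.tar.gz must-gather.tar.gz.part

# Rejoin: the shell expands the glob in sorted order, which preserves part order.
cat must-gather.tar.gz.parta* > must-gather.tar.gz.joined

# Verify the joined file is byte-identical and is a valid gzip stream.
cmp must-gather.tar.gz must-gather.tar.gz.joined
gzip -t must-gather.tar.gz.joined

# List the contents to confirm the archive is readable.
tar -tzf must-gather.tar.gz.joined
```

When only the parts are available, the `gzip -t` and `tar -tzf` checks still apply to the joined file and will fail if a part is missing or out of order.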
It seems like the controller can't find the template for creating a VM. Can you verify that "walt-45latest1-5vwzr-rhcos" exists?
I don't understand the comment. Obviously the VMs have been created successfully from the rhcos template: the cluster is up, and we would not have gotten to the point where I can log on to the OCP console without that having happened.

I can tell you how I find out which rhcos template to use. I go to the release.txt file and pull the machine-os information. In this case that would have been https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/4.5.0-0.nightly-2020-05-13-130344/release.txt which says I need the 45.81.202005131029-0 version of rhcos. Then I go here to pull the OVA for that rhcos: https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.5/45.81.202005130448-0/x86_64/rhcos-45.81.202005130448-0-vmware.x86_64.ova

Once I have pulled that OVA, I upload it into our vCenter using the "Deploy ovf template" function, so that it can be referenced by the VMware Terraform create function. As I said, the VMs are getting created successfully, ignition files are being passed around, and the VMs are up, using this rhcos, and are creating the cluster.

Now, if there is a function within the install process that somehow renames the rhcos template I originally used to create the VMs to walt-45latest1-5vwzr-rhcos, I don't know anything about that. I usually name the rhcos OVAs I upload to our vCenter after the OVA file, in this case rhcos-45.81.202005130448-0-vmware.x86_64. So this does not match the "walt-45latest1-5vwzr-rhcos" you are asking about.

I would not know where to look for walt-45latest1-5vwzr-rhcos, so to answer your question about whether it exists I need very specific information on where to look. Do you want me to look on the VMs (masters/workers) themselves, or somewhere in the vCenter? If so, where?
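The lookup described above (read the machine-os version from release.txt, then fetch the matching OVA) can be scripted so the two always agree. This is a rough sketch only: the sample text stands in for release.txt, and its line layout (a `machine-os` line with the version as the second field) is an assumption; the URL pattern is copied from the OVA link quoted above. Nothing is downloaded.

```shell
set -eu

# Stand-in for the contents of release.txt (layout assumed for illustration).
release_txt=$(cat <<'EOF'
Component Versions:
  machine-os 45.81.202005131029-0 Red Hat Enterprise Linux CoreOS
EOF
)

# Take the second field of the line whose first field is "machine-os".
rhcos_version=$(printf '%s\n' "$release_txt" | awk '$1 == "machine-os" { print $2 }')

# Rebuild the OVA URL from that version, following the pattern of the quoted link.
ova_url="https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.5/${rhcos_version}/x86_64/rhcos-${rhcos_version}-vmware.x86_64.ova"

echo "machine-os version: $rhcos_version"
echo "OVA URL: $ova_url"
```

Deriving the URL from the version at run time, rather than copying both by hand, makes it harder for the OVA and the release payload to drift apart.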
I just noticed that the rhcos version referenced in release.txt was updated from 45.81.202005130448 to 45.81.20200513102909 between the time I pulled it and when I looked at it again today. I am not sure why your process does this, or how anyone can keep up to date if it does. I doubt this has anything to do with why these errors are coming up, since I've been seeing them for months with 4.5 nightly builds. Anyway, my questions still stand. Please tell me (with very specific information) where I should be looking for the walt-45latest1-5vwzr-rhcos file.
I disagree with this being moved to 4.6. The problem originally happened on 4.4, and was not fixed and then moved to 4.5. It should be fixed in 4.5.
> I disagree with this being moved to 4.6. The problem originally happened on 4.4, and was not fixed and then moved to 4.5. It should be fixed in 4.5.

Hey krapohl, the initial bug as reported is a dup of https://bugzilla.redhat.com/show_bug.cgi?id=1834966. We target any non-release-blocker bug against the current release under feature development, i.e. 4.6, and then evaluate backports to the version where it was first found. I moved this one too fast, though. I'm moving it back to 4.5 to reevaluate severity, and renaming the bz according to https://bugzilla.redhat.com/show_bug.cgi?id=1822345#c21
Hey krapohl, to get the logs from https://bugzilla.redhat.com/show_bug.cgi?id=1822345#c23, can you clarify which steps you ran? Did you just run the IPI installer, or did you run the UPI steps?
I don't understand your first question about logs. All I did was run must-gather; there were no separate logs, just whatever must-gather puts in the folder it creates. As for the second question: this is a VMware install, as indicated in the previous information, so it must be UPI.
> For second question .... this is a VMware install, as is indicated in previous information, so it must be UPI.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=1822345#c22 this is 4.5, so this might as well be IPI. So this is a 4.5 UPI vSphere install which resulted in a running cluster by following the documented steps. During the UPI install steps the installer instantiated a machineSet object with a bad input for the template? Moving this to installer, to either prevent this object from being instantiated or ensure it uses the right input. Relates to https://bugzilla.redhat.com/show_bug.cgi?id=1834966.
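To make the suspected mismatch concrete, here is a small sketch that compares the template name a MachineSet asks for with the name the reporter says was uploaded to vCenter. The YAML fragment is a hand-written stand-in, not taken from the must-gather: the MachineSet name is hypothetical, and the field path `spec.template.spec.providerSpec.value.template` is assumed to be where the vSphere provider spec holds the template name. On a live cluster the real object would come from something like `oc get machineset -n openshift-machine-api -o yaml`.

```shell
set -eu

# Hand-written stand-in for a vSphere MachineSet (name is hypothetical).
machineset_yaml=$(cat <<'EOF'
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: walt-45latest1-5vwzr-worker
spec:
  template:
    spec:
      providerSpec:
        value:
          template: walt-45latest1-5vwzr-rhcos
EOF
)

# The "template:" key that carries a value is the provider spec's VM template;
# the bare "template:" key (no value on the same line) is the machine template.
expected_template=$(printf '%s\n' "$machineset_yaml" | awk '$1 == "template:" && NF == 2 { print $2 }')

# Name the reporter gave to the uploaded OVA in vCenter (from the comments above).
uploaded_template="rhcos-45.81.202005130448-0-vmware.x86_64"

if [ "$expected_template" = "$uploaded_template" ]; then
  result="match"
else
  result="mismatch"
fi

echo "MachineSet expects: $expected_template"
echo "vCenter has:        $uploaded_template"
echo "Result:             $result"
```

A mismatch here would be consistent with worker machines staying in provisioning, since the controller would look for a template name that was never created in vCenter.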
*** This bug has been marked as a duplicate of bug 1834966 ***