Description of problem:
On the master, run "oadm diagnostics" and get an error:

<--snip-->
ERROR: [DClu3002 from diagnostic MasterNode@openshift/origin/pkg/diagnostics/cluster/master_node.go:99]
       Client error while retrieving node records. Client retrieved records during discovery,
       so this is likely to be a transient error. Try running diagnostics again. If this message
       persists, there may be a permissions problem with getting node records. The error was:
       (*errors.StatusError) found '<', expected: !, identifier, or 'end of string'
<--snip-->

ose-3.1 does not have this issue.

Version-Release number of selected component (if applicable):
atomic-openshift-3.2.0.6-1.git.0.19d1bde.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I've seen this too but not had a chance to look into it. Devan, do you have any idea what's going on here?
This is reproducing in latest origin:

[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)

ERROR: [DClu3002 from diagnostic MasterNode@openshift/origin/pkg/diagnostics/cluster/master_node.go:99]
       Client error while retrieving node records. Client retrieved records during discovery,
       so this is likely to be a transient error. Try running diagnostics again. If this message
       persists, there may be a permissions problem with getting node records. The error was:
       (*errors.StatusError) found '<', expected: !, identifier, or 'end of string'
I don't see this error if I run it with the master config file:

    oadm diagnostics --master-config=./openshift.local.config/master/master-config.yaml

And it shows:

[Note] Skipping diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)
       Because: Network plugin does not require master to also run node

I am testing this in a one-master, one-node setup.
It seems to me that this error happens when the master is running only as a master, not as a master-node combination for Open vSwitch. By default (without passing --master-config), this oadm diagnostic appears to assume that the master is also running as a node, which is not the case here, hence the error.
Since this bug does not have enough information about the setup (the OpenShift cluster) and the steps to reproduce, I am assuming that the cause of the error is the same as what I am seeing.
From the comment in CanRun() in pkg/diagnostics/cluster/master_node.go:

    // If there is a master config file available, we'll perform an additional
    // check to see if an OVS network plugin is in use. If no master config,
    // we assume this is the case for now and let the check run anyhow.

it seems pretty clear that this diagnostic assumes the master is always running as a node too when no master config file is provided, which should be true in real deployments. In my dev setup, however, I am running the master only as a master and not passing any master config to "oadm diagnostics", which is what causes this error. That seems harmful, and it would be better for "oadm diagnostics" to figure this out by itself, but it does not look like a blocker to me at the moment unless Johnny Liu (the reporter) confirms otherwise.
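For illustration only, here is a minimal, self-contained sketch of the decision that comment describes; MasterConfig and the plugin names below are simplified stand-ins rather than the actual types used in master_node.go:

    package main

    import "fmt"

    // MasterConfig is a simplified stand-in for the real Origin master config type.
    type MasterConfig struct {
        NetworkPluginName string
    }

    // canRun mirrors the logic described in the comment: with no master config,
    // assume an OVS plugin and run the check; with a config, run only when an
    // OVS plugin is actually configured.
    func canRun(cfg *MasterConfig) bool {
        if cfg == nil {
            return true // no master config file: assume OVS is in use
        }
        switch cfg.NetworkPluginName {
        case "redhat/openshift-ovs-subnet", "redhat/openshift-ovs-multitenant":
            return true
        default:
            return false
        }
    }

    func main() {
        fmt.Println(canRun(nil))                                                              // true: no config, check runs anyway
        fmt.Println(canRun(&MasterConfig{NetworkPluginName: "redhat/openshift-ovs-subnet"}))  // true
        fmt.Println(canRun(&MasterConfig{NetworkPluginName: ""}))                             // false: check is skipped
    }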
I meant the error seems harmless, not harmful.
The following line in Check() in pkg/diagnostics/cluster/master_node.go seems strange to me:

    nodes, err := d.KubeClient.Nodes().List(kapi.ListOptions{LabelSelector: labels.Nothing()})

Why is it trying to find "nodes without any label selector", or does it mean "nodes with any label selector"? If the latter, shouldn't it be:

    nodes, err := d.KubeClient.Nodes().List(kapi.ListOptions{})
I had a discussion with dgoodwin on IRC and sent a PR to fix an issue where oadm diagnostics does not seem to find any nodes on the same machine as the master: https://github.com/openshift/origin/pull/8249

However, here are more thoughts based on the different cases for oadm diagnostics (specifically the master-node check for the Open vSwitch SDN); a rough sketch of the comparison this check boils down to follows the list:

1. oadm diagnostics is run with --master-config.
   Behavior: the diagnostic should figure out whether an OVS SDN plugin is in use.
   1a) If OVS is in use and a node exists, the diagnostic should pass; otherwise (if it fails) something is wrong.
   1b) If OVS is in use and a node does not exist, the diagnostic fails.

2. oadm diagnostics is run without --master-config.
   Behavior: the diagnostic cannot figure out whether an OVS SDN plugin is in use, but continues with the check anyway.
   2a) If a node exists, it passes; otherwise (if it fails) something is wrong.
   2b) If a node does not exist, it fails.

Note: currently the diagnostic does not seem to differentiate between a node that merely exists (unschedulable) for the Open vSwitch SDN and a real node existing on the same machine (whether schedulable or unschedulable); maybe that does not matter, but I am pointing it out. Also, the diagnostic does not seem to take the node's status (Ready, NotReady) into account (again, not sure if it matters). Perhaps more discussion is needed to figure out what is expected from this diagnostic to make it more useful.
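As a rough, illustrative-only sketch (not the actual master_node.go implementation; Node and the example node name below are simplified stand-ins for the real kapi types and data), the comparison the check boils down to is: does any node record report an address that matches the master's IP?

    package main

    import "fmt"

    // Node is a simplified stand-in for the real kapi.Node type.
    type Node struct {
        Name      string
        Addresses []string
    }

    // findMasterNode reports which node, if any, shares an address with the master.
    func findMasterNode(masterIPs []string, nodes []Node) (string, bool) {
        for _, node := range nodes {
            for _, addr := range node.Addresses {
                for _, ip := range masterIPs {
                    if addr == ip {
                        return node.Name, true
                    }
                }
            }
        }
        return "", false
    }

    func main() {
        nodes := []Node{{Name: "node1.example.com", Addresses: []string{"10.66.78.46"}}}
        if name, ok := findMasterNode([]string{"10.66.78.46"}, nodes); ok {
            fmt.Printf("Found a node with same IP as master: %s\n", name)
        } else {
            fmt.Println("Master does not appear to also be running a node")
        }
    }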
Johnny, thanks for the information. I sent this PR to Origin, which should fix this issue: https://github.com/openshift/origin/pull/8249
(In reply to Avesh Agarwal from comment #8)
> nodes, err := d.KubeClient.Nodes().List(kapi.ListOptions{LabelSelector: labels.Nothing()})
>
> in Check() in pkg/diagnostics/cluster/master_node.go seems strange, as why
> it is trying to find "nodes without any label selector", or does it mean
> "nodes with any label selector"?

It means nodes that match an empty label selector, which is all nodes, of course. There is no way to match only nodes that *don't* have any labels. I agree it's a bit confusing...

> If the latter, shouldn't it be:
> nodes, err := d.KubeClient.Nodes().List(kapi.ListOptions{})

In the past, yes, but at some point that apparently became a malformed request, i.e. the LabelSelector element became mandatory, even if the selector itself is empty.

Thanks for the PR, it seems to fix the issue.
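For reference, a minimal sketch of what an empty ("everything") label selector does, using the current k8s.io/apimachinery import path (the vendored path in the Origin 3.2 tree differs); this is illustrative only, not the exact change made in PR 8249:

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/labels"
    )

    func main() {
        // The empty selector serializes to "" and matches every object,
        // which is the behavior a "list all nodes" call needs.
        sel := labels.Everything()
        fmt.Printf("serialized selector: %q\n", sel.String())
        fmt.Println(sel.Matches(labels.Set{"kubernetes.io/hostname": "master.example.com"}))
    }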
Fix merged in Origin.
Should be in atomic-openshift-3.2.0.9-1.git.0.b99af7d.el7, which is now built and ready for testing.
Verified this bug with atomic-openshift-3.2.0.9-1.git.0.b99af7d.el7.x86_64, and it PASSES.

<--output-->
[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)

Info:  Found a node with same IP as master: 10.66.78.46
<--output-->
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1064