Bug 1478841 - sosreport runs invalid kubernetes commands
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sos
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Pavel Moravec
QA Contact: BaseOS QE - Apps
Docs Contact:
Depends On:
Blocks:
 
Reported: 2017-08-07 05:36 EDT by Marko Myllynen
Modified: 2018-04-06 02:33 EDT
CC List: 10 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
mimicking sos calling command (2.57 KB, text/plain)
2017-08-15 11:31 EDT, Pavel Moravec

Description Marko Myllynen 2017-08-07 05:36:17 EDT
Description of problem:
On the latest RHEL 7.4 with the latest OCP 3.5 we see the following messages from sosreport -v:

[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--username resourcequota'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--username services'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=-v, events'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=-v, limitrange'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=-v, pods'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=-v, pvc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=-v, rc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=-v, resourcequota'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=-v, services'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--version events'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--version limitrange'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--version pods'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--version pvc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--version rc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--version resourcequota'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--version services'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--vmodule events'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--vmodule limitrange'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--vmodule pods'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--vmodule pvc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--vmodule rc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--vmodule resourcequota'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=--vmodule services'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Error: events'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Error: limitrange'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Error: pods'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Error: pvc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Error: rc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Error: resourcequota'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Error: services'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes events'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes limitrange'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes pods'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes pvc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes rc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes resourcequota'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes services'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Usage: events'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Usage: limitrange'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Usage: pods'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Usage: pvc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Usage: rc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Usage: resourcequota'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Usage: services'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes events'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes limitrange'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes pods'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes pvc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes rc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes resourcequota'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Kubernetes services'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Available events'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Available limitrange'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Available pods'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Available pvc'
[plugin:kubernetes] added cmd output 'kubectl --config=/etc/origin/master/admin.kubeconfig get -o json  --namespace=Available rc'

Something is obviously wrong here; namespaces named "Usage:" or "Error:" are not sane.

Version-Release number of selected component (if applicable):
sos-3.4-6.el7
Comment 2 Bryn M. Reeves 2017-08-07 09:20:56 EDT
Apparently the command used to obtain the list of namespaces is now returning an error and emitting a usage message.

The command is constructed on the fly, depending on the presence of a local configuration file, and is either:

  kubectl get namespaces

Or:

  kubectl --config=/etc/origin/master/admin.kubeconfig get namespaces

Is this the expected syntax for current versions? The code to collect namespaces was added in June last year, and this is the first report of errors:

commit 047a207b8f3d90ac8e4223cec0b6de44c1a36611
Author: Jake Hunsaker <jhunsake@redhat.com>
Date:   Tue Jun 28 16:02:20 2016 -0400

    [kubernetes] new data, namespace support, and options

If the 'kubectl' command has changed syntax we will need a mechanism to either detect this or to check the package version.
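
For reference, a simplified sketch of how that on-the-fly construction looks (variable and function names here are illustrative, not the verbatim sos plugin source):

import os

KUBE_CONFIG = "/etc/origin/master/admin.kubeconfig"
RESOURCES = ["events", "limitrange", "pods", "pvc", "rc", "resourcequota", "services"]

def build_kube_cmd():
    # Prefer the OpenShift admin kubeconfig when it exists, otherwise rely
    # on kubectl's default configuration.
    cmd = "kubectl"
    if os.path.exists(KUBE_CONFIG):
        cmd += " --config=%s" % KUBE_CONFIG
    return cmd

def parse_namespaces(output):
    # Take the first column of 'kubectl get namespaces' output, skipping the
    # header line. If the command failed and printed usage/error text instead,
    # the first token of every line ends up here as a bogus "namespace".
    return [line.split()[0] for line in output.splitlines()[1:] if line.split()]

def namespace_commands(namespaces):
    # One 'get -o json' call per (namespace, resource) pair, matching the
    # command strings in the verbose output above.
    base = build_kube_cmd()
    return ["%s get -o json --namespace=%s %s" % (base, ns, res)
            for ns in namespaces for res in RESOURCES]

A failing namespace listing would therefore explain the bogus --namespace values seen in the description.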
Comment 3 Marko Myllynen 2017-08-07 09:36:12 EDT
(In reply to Bryn M. Reeves from comment #2)
> 
> The command is constructed on the fly, depending on the presence of a local
> configuration file but is either:
> 
>   kubectl get namespaces
> 
> Or:
> 
>   kubectl --config=/etc/origin/master/admin.kubeconfig get namespaces
> 
> Is this the expected syntax for current versions?

Both of these commands work fine on the same host where the sosreport was generated. Thanks.
Comment 4 Bryn M. Reeves 2017-08-07 12:02:13 EDT
In that case, could you please attach either the complete tool output, with -vvv, or the log file embedded in the report at sos_logs/sos.log (also w/-vvv)?

Alternatively, the whole tarball is fine.
Comment 6 Bryn M. Reeves 2017-08-08 08:47:00 EDT
Something quite odd is going on here. Unfortunately, we do not currently capture the whole command strings for commands that are run using Plugin.get_command_output(), but it should always be one of the two forms shown in comment #2.

I'll push a fix upstream to add these command strings to the verbose log so that we have them captured in future.

Trying to work backwards from the emitted namespace values for the 'events' resource, we get:

$ grep -- 'unpacked.*--namespace=.*events' sos.log | sed -e 's/.*--namespace=//' -e "s/',.*//" -e "s/ events//"
Usage
--allow-verification-with-non-compliant-keys
--alsologtostderr
--application-metrics-count-limit
--as
--azure-container-registry-config
--boot-id-file
--certificate-authority
--client-certificate
--client-key
--cluster
--container-hints
--context
--docker
--docker-env-metadata-whitelist
--docker-only
--docker-root
--enable-load-reader
--event-storage-age-limit
--event-storage-event-limit
--global-housekeeping-interval
--google-json-key
-h,
--housekeeping-interval
--insecure-skip-tls-verify
--ir-data-source
--ir-dbname
--ir-hawkular
--ir-influxdb-host
--ir-namespace-only
--ir-password
--ir-percentile
--ir-user
--kubeconfig
--log-backtrace-at
--log-cadvisor-usage
--log-dir
--log-flush-frequency
--logtostderr
--machine-id-file
--match-server-version
-n,
--password
--request-timeout
-s,
--stderrthreshold
--storage-driver-buffer-duration
--storage-driver-db
--storage-driver-host
--storage-driver-password
--storage-driver-secure
--storage-driver-table
--storage-driver-user
--token
--user
--username
-v,
--version
--vmodule
Error:
Kubernetes
Usage:
Kubernetes
Available
--allow-verification-with-non-compliant-keys
--alsologtostderr
--application-metrics-count-limit
--as
--azure-container-registry-config
--boot-id-file
--certificate-authority
--client-certificate
--client-key
--cluster
--container-hints
--context
--docker
--docker-env-metadata-whitelist
--docker-only
--docker-root
--enable-load-reader
--event-storage-age-limit
--event-storage-event-limit
--global-housekeeping-interval
--google-json-key
-h,
--housekeeping-interval
--insecure-skip-tls-verify
--ir-data-source
--ir-dbname
--ir-hawkular
--ir-influxdb-host
--ir-namespace-only
--ir-password
--ir-percentile
--ir-user
--kubeconfig
--log-backtrace-at
--log-cadvisor-usage
--log-dir
--log-flush-frequency
--logtostderr
--machine-id-file
--match-server-version
-n,
--password
--request-timeout
-s,
--stderrthreshold
--storage-driver-buffer-duration
--storage-driver-db
--storage-driver-host
--storage-driver-password
--storage-driver-secure
--storage-driver-table
--storage-driver-user
--token
--user
--username
-v,
--version
--vmodule
unknown

Which is certainly a kubectl usage message.

Would the customer be willing to install a test version of sos to try to get more details?
Comment 11 Pavel Moravec 2017-08-15 11:31 EDT
Created attachment 1313762 [details]
mimicking sos calling command

Couldn't the cause be similar(*) to https://bugzilla.redhat.com/show_bug.cgi?id=1214209#c6 ?

(*) similar in the sense that python calls timeout, which calls some program, and that fails.


Could you please try:

1) run the attached script on the system with the argument:

python sos_call_cmd.py "kubectl --config=/etc/origin/master/admin.kubeconfig get namespaces"

(it mimics what and how sosreport invokes the command: first it prints the Popen args, then the stdout and return value from the executed command)


2) run:

python -c "python -c "import os; os.system('timeout 300s kubectl --config=/etc/origin/master/admin.kubeconfig get namespaces')"

That should be a one-line simplification of the above (where you can try to identify which variant, e.g. without python or without timeout, succeeds or fails).
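
(The attachment itself is not reproduced here; the following is a minimal sketch of what such a wrapper might look like, assuming sos prefixes the command with 'timeout' and runs it via subprocess.Popen. The names and the exact Popen options are assumptions for illustration.)

#!/usr/bin/env python
# sos_call_cmd.py - illustrative sketch only, not the actual attachment
import shlex
import subprocess
import sys

def run(command, timeout=300):
    # Mimic sosreport: wrap the command in 'timeout', capture stdout+stderr,
    # then print the Popen args, the output and the return code.
    args = shlex.split("timeout %ds %s" % (timeout, command))
    print("Popen args: %r" % (args,))
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = p.communicate()
    print(out.decode("utf-8", "replace"))
    print("return code: %d" % p.returncode)

if __name__ == "__main__":
    run(sys.argv[1])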
Comment 12 Bryn M. Reeves 2017-08-15 11:49:19 EDT
It seems unlikely: in bug 1214209 the child process enters the STOPPED state (i.e. the result of SIGSTOP). It occurs because hpasmcli attempts to interact with the terminal; normally this is forbidden when running under timeout(1).

An alternative fix for that bug (which I have not tested) would be to allow plugins to selectively enable --foreground mode for their called processes - this is not something that we want on by default, but I believe it would address the hpasmcli case and related problems.

In this case, the process runs to completion, reports usage information, and exits as expected without reaching the timeout. The problem appears to lie in the formation or parsing of the arguments: kubectl fails to parse the '--config' option:

  unknown flag: --config
  Usage of kubectl:

Interestingly, the --config option is not mentioned in the usage text that is reported.

Is it possible that this host has multiple versions of the 'kubectl' command in different locations? The sos policy defines a fixed PATH setting per distribution: if the correct kubectl command is not installed in one of the locations in that list, then sos will not be able to call it correctly.

Just to rule this out, can we get the output of "which kubectl", "kubectl --help" and "echo $PATH" in the environment where "kubectl get namespaces" works correctly?
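
One quick way to see which binary a restricted PATH resolves to (the PATH value below is hypothetical; sos derives the real one from its per-distribution policy):

  which -a kubectl
  PATH=/usr/sbin:/usr/bin:/sbin:/bin which -a kubectl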
Comment 13 Marko Myllynen 2017-08-15 14:24:37 EDT
(In reply to Bryn M. Reeves from comment #12)
> 
> Is it possible that this host has multiple versions of the 'kubernetes'
> command in different locations? The sos policy defines a fixed PATH setting
> per distribution: if the correct kubernetes command is not installed in the
> locations in this list then sos will not be able to call it correctly.

That's it!

[root@master01 ~]# updatedb
[root@master01 ~]# locate bin/kubectl
/usr/bin/kubectl
/usr/local/bin/kubectl
[root@master01 ~]# which kubectl
/usr/local/bin/kubectl
[root@master01 ~]# rpm -qf /usr/bin/kubectl /usr/local/bin/kubectl
kubernetes-client-1.5.2-0.7.git269f928.el7.x86_64
file /usr/local/bin/kubectl is not owned by any package
[root@master01 ~]# /usr/local/bin/kubectl --config=/etc/origin/master/admin.kubeconfig get namespaces > /dev/null 2>&1 ; echo $?
0
[root@master01 ~]# /usr/bin/kubectl --config=/etc/origin/master/admin.kubeconfig get namespaces > /dev/null 2>&1 ; echo $?
1
[root@master01 ~]# rpm -e --test kubernetes-client
error: Failed dependencies:
        /usr/bin/kubectl is needed by (installed) cockpit-kubernetes-147-1.el7.x86_64

This is a containerized installation, so /usr/local/bin/kubectl is extracted from the OSE images; kubernetes-client comes from the RHEL Extras channel.

Thanks.
Comment 14 Pavel Moravec 2017-09-01 10:54:16 EDT
I am not sure what the proper resolution should be here:

- adding /usr/local/bin to sosreport's PATH ?
- calling /usr/local/bin/kubectl instead of kubectl from sosreport?
- something else?
Comment 15 Pavel Moravec 2018-03-03 11:55:10 EST
deferring to 7.6
Comment 16 Pavel Moravec 2018-04-02 09:38:15 EDT
(In reply to Pavel Moravec from comment #14)
> I am not sure what the proper resolution should be here:
> 
> - adding /usr/local/bin to sosreport's PATH ?
> - calling /usr/local/bin/kubectl instead of kubectl from sosreport?
> - something else?

Marko,
am I right that kubectl should be called from:
- /usr/bin/kubectl on non-containerized installation
- /usr/local/bin/kubectl on containerized installation

Is the above correct every time, such that sosreport should use the proper path depending on whether it is run in a container?
Comment 17 Marko Myllynen 2018-04-04 06:21:36 EDT
(In reply to Pavel Moravec from comment #16)
> (In reply to Pavel Moravec from comment #14)
> > I am not sure what the proper resolution should be here:
> > 
> > - adding /usr/local/bin to sosreport's PATH ?
> > - calling /usr/local/bin/kubectl instead of kubectl from sosreport?
> > - something else?
> 
> am I right that kubectl should be called from:
> - /usr/bin/kubectl on non-containerized installation
> - /usr/local/bin/kubectl on containerized installation
> 
> Is the above correct every time, such that sosreport should use the proper
> path depending on whether it is run in a container?

For the time being it would probably be best to use /usr/local/bin/kubectl (if available), otherwise /usr/bin/kubectl.

However, OCP 3.10 will introduce some fundamental changes on this front. I'm CC'ing Eric Rich, who could probably share more insight into those upcoming changes and their possible impact on sosreport.

Eric, when time permits, would you please share your thoughts here?

Thanks.
Comment 18 Eric Rich 2018-04-04 07:39:19 EDT
This seems reasonable; however, we should ask jhunsake@redhat.com for his thoughts, as he wrote the origin plug-in.
Comment 19 Jake Hunsaker 2018-04-04 09:28:04 EDT
It's easy enough to do that, and I don't see any issues with it. 

However, my question is the usefulness of the kube plugin for OCP at this point. Since we have an origin plugin that, at a glance, seems to capture all this from oc instead of kubectl, do we instead want to just remove the OCP stuff from the kube plugin? The last time I spoke to the sbr-shift guys, they all said they never look at the stuff from the kubernetes plugin.
Comment 20 Eric Rich 2018-04-04 09:37:48 EDT
(In reply to Jake Hunsaker from comment #19)
> It's easy enough to do that, and I don't see any issues with it. 
> 
> However, my question is the usefulness of the kube plugin for OCP at this
> point. Since we have an origin plugin that, at a glance, seems to capture
> all this from oc instead of kubectl, do we instead want to just remove the
> OCP stuff from the kube plugin? The last time I spoke to the sbr-shift guys,
> they all said they never look at the stuff from the kubernetes plugin.

I was under the impression that we explicitly did not collect stuff in the origin plugin because it was collected in the kube plugin, so ... if that statement is true, the sbr-shift people either don't need the data for the issues they are working on, or simply don't know that it's available to help them. 

@pep can you help confirm that the kube plugin does things the OCP plugin does not?
Comment 21 Josep 'Pep' Turro Mauri 2018-04-04 11:32:26 EDT
(In reply to Eric Rich from comment #20)
> I was under the impression that we explicitly did not collect stuff in the
> origin plugin because it was collected in the kube plugin, so ... if that
> statement is true, the sbr-shift people either don't need the data for the
> issues they are working, or simply don't know that its available to help
> them. 
> 
> @pep can you help confirm that the kube plugin does things the OCP plugin
> does not?

That's right: the 'origin' (OCP) plugin tries to avoid duplicating work and therefore does not collect things that are already collected by the kubernetes plugin, and instead it focuses on openshift-specific things.

(In reply to Jake Hunsaker from comment #19)
> However, my question is the usefulness of the kube plugin for OCP at this
> point. Since we have an origin plugin that, at a glance, seems to capture
> all this from oc instead of kubectl, do we instead want to just remove the
> OCP stuff from the kube plugin? The last time I spoke to the sbr-shift guys,
> they all said they never look at the stuff from the kubernetes plugin.

I'm not sure what "the OCP stuff from the kube plugin" would be: there is no OCP-specific information in the kube plugin, right? If there is, then it should indeed be moved to the origin plugin.

As OCP is built on kube, IMHO there's plenty of information collected by the kube plugin that is relevant for OCP troubleshooting: details about persistent volumes, nodes, events... I'm surprised to hear that this stuff is not looked at.
Comment 22 Josep 'Pep' Turro Mauri 2018-04-04 12:00:47 EDT
About the bug itself: I believe that the fix would be to use --kubeconfig instead of --config

The additional path check might not hurt, but current versions of kubectl do not understand --config, so this seems bound to break.
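
For example, the namespace query above would become (assuming the same admin kubeconfig path):

  kubectl --kubeconfig=/etc/origin/master/admin.kubeconfig get namespaces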
Comment 23 Jake Hunsaker 2018-04-04 22:20:10 EDT
(In reply to Josep 'Pep' Turro Mauri from comment #21)

> I'm not sure what would be "the OCP stuff from the kube plugin": there is no
> OCP specific information in the kube plugin, right? If there is, then it
> should be indeed moved to the origin plugin.
> 

I meant the logic to enable the kubernetes plugin on OCP deployments, i.e. whether to only run the origin plugin. But see below.

> As OCP is built on kube, IMHO there's plenty information collected by the
> kube plugin that is relevant for OCP troubleshooting: details about
> persistent volumes, nodes, events... I'm surprised to hear that this stuff
> is not looked at.

I was surprised too, but if there's information overlap, that's an opportunity with the SBR, not necessarily something we should change in the plugin.


> but current versions of kubectl do not understand --config

That'll do it. This part of the plugin was written ~2 years ago, so I wouldn't be surprised if the syntax changed.


Sounds like we have two adjustments to make:

1) binary location
2) --config to --kubeconfig
Comment 25 Pavel Moravec 2018-04-05 04:27:59 EDT
OK, so it has been clarified that we should replace --config with --kubeconfig, since --config is no longer supported.

About the binary location: how do we identify when to use which one, please?

(not sure against whom I should raise needinfo / who can best answer this)
Comment 26 Marko Myllynen 2018-04-05 06:15:56 EDT
Sorry, but I don't have any additional insights about the location. Thanks.
Comment 27 Pavel Moravec 2018-04-06 02:33:40 EDT
The best feedback I got about the location of kubectl is:

- test if /usr/local/bin/kubectl exists and is executable
- if so, use it
- if not, fall back to /usr/bin/kubectl

If there are no objections to this, I will propose an upstream PR in the next few weeks.
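
A minimal sketch of that detection logic, combined with the --kubeconfig change discussed above (illustrative only, not the actual upstream patch):

import os

def kubectl_path():
    # Prefer the binary extracted from the OSE images on containerized
    # installations, otherwise fall back to the kubernetes-client package.
    local = "/usr/local/bin/kubectl"
    if os.path.isfile(local) and os.access(local, os.X_OK):
        return local
    return "/usr/bin/kubectl"

def build_kube_cmd(kubeconfig="/etc/origin/master/admin.kubeconfig"):
    # Use --kubeconfig rather than the no-longer-supported --config option.
    cmd = kubectl_path()
    if os.path.exists(kubeconfig):
        cmd += " --kubeconfig=%s" % kubeconfig
    return cmd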
