Bug 1394527

Summary: sosreport OpenShift plugin should collect "oc status --all-namespaces"
Product: Red Hat Enterprise Linux 7 Reporter: Kenjiro Nakayama <knakayam>
Component: sos Assignee: Josep 'Pep' Turro Mauri <pep>
Status: CLOSED WONTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: low Docs Contact:
Priority: unspecified    
Version: 7.2 CC: agk, bmr, gavin, knakayam, mhradile, pep, plambri, pmoravec, sbradley
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-12-19 07:30:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Kenjiro Nakayama 2016-11-13 02:13:04 UTC
Description of problem:
===
- sosreport OpenShift plugin should collect "oc status --all-namespaces"


Version-Release number of selected component (if applicable):
===
- sos-3.3-4.el7.noarch


How reproducible:
===
  Steps to Reproduce:
  1. Run sosreport


Actual results:
===
- sosreport doesn't collect the result of "oc status".


Expected results:
===
- sosreport should collect "oc status --all-namespaces" or "oc status -v --all-namespaces".


Additional info:
===
- Since "oc status -v --all-namespaces" gives an overview of the system, it would be great if sosreport included this information.

Comment 2 Pavel Moravec 2016-11-13 21:27:02 UTC
Doesn't the --all-namespaces parameter require some prior authentication (cf. [1])?

Should the command be run on brokers, nodes, or both? (i.e. where should it be added in the plugin [2]?)

Couldn't the output contain sensitive data that should be obfuscated?

Couldn't the command (especially with -v or --all-namespaces) take too long to run, so that it should only be triggered via a plugin option?


[1] https://www.mankier.com/1/oc-status
[2] https://github.com/sosreport/sos/blob/master/sos/plugins/openshift.py

Comment 3 Kenjiro Nakayama 2016-11-14 00:08:22 UTC
> Doesn't the --all-namespaces parameter require some prior authentication (cf. [1])?

Yes, it is necessary. But as other commands do, '--config=/etc/origin/master/admin.kubeconfig' should work, as in "oc --config=/etc/origin/master/admin.kubeconfig status".

> Should the command be run on brokers, nodes, or both? (i.e. where should it be added in the plugin [2]?)

It is OK to run it on the OpenShift Master only.

> Couldn't the output contain sensitive data that should be obfuscated?

Not at all. 'oc status' is a kind of health-check command.

> Couldn't the command (especially with -v or --all-namespaces) take too long to run, so that it should only be triggered via a plugin option?

No. It doesn't take a long time.

Comment 4 Bryn M. Reeves 2016-11-14 09:17:05 UTC
Adding Pep to the CC as he's the main maintainer of the OpenShift plugin upstream.

> But as other commands do, '--config=/etc/origin/master/admin.kubeconfig'
> should work, as in "oc --config=/etc/origin/master/admin.kubeconfig status".

We don't run any commands in this way in the current plugin.

Comment 5 Kenjiro Nakayama 2016-11-14 09:54:50 UTC
> We don't run any commands in this way in the current plugin.

OK... After checking the plugin code [1] and the output in sos_commands [2], I thought the current plugin runs commands like "oc --config=/etc/origin/master/admin.kubeconfig".

I'm sorry to bother you, but could you please tell me how the current plugin runs the command?

[1] https://github.com/sosreport/sos/blob/dc37f9f720863cc981447e491fb470260e6636e0/sos/plugins/origin.py#L53-L55
~~~
    master_base_dir = "/etc/origin/master"
    ... <snip> ...
    admin_cfg = os.path.join(master_base_dir, "admin.kubeconfig")
    oc_cmd = "oc --config=%s" % admin_cfg
    oadm_cmd = "oadm --config=%s" % admin_cfg
~~~
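
For illustration only, here is a minimal sketch of how the requested command line could be composed from the same prefix. The `status_cmd` helper and its parameters are hypothetical and not part of the plugin; only the flags `-v` and `--all-namespaces` and the kubeconfig path come from this report.

```python
import os

# Path and command prefix as in the plugin snippet quoted above.
master_base_dir = "/etc/origin/master"
admin_cfg = os.path.join(master_base_dir, "admin.kubeconfig")
oc_cmd = "oc --config=%s" % admin_cfg


def status_cmd(verbose=False, all_namespaces=True):
    """Build the 'oc status' command line (hypothetical helper)."""
    parts = [oc_cmd, "status"]
    if verbose:
        parts.append("-v")
    if all_namespaces:
        parts.append("--all-namespaces")
    return " ".join(parts)


print(status_cmd())
print(status_cmd(verbose=True))
```

Keeping the flags as parameters would make it easy to hide the verbose or all-namespaces variant behind a plugin option if run time or output size turned out to be a concern.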

[2] sosreport-knakayam-ose33-smaster-20161113014808/sos_commands/origin/
~~~
# ls | grep oc
oc_--config_.etc.origin.master.admin.kubeconfig_adm_diagnostics_-l_0_--config_.etc.origin.master.admin.kubeconfig
oc_--config_.etc.origin.master.admin.kubeconfig_config_view_--config_.etc.origin.node.system_node_knakayam-ose33-smaster.kubeconfig
oc_--config_.etc.origin.master.admin.kubeconfig_describe_projects
oc_--config_.etc.origin.master.admin.kubeconfig_get_-o_json_clusternetwork
oc_--config_.etc.origin.master.admin.kubeconfig_get_-o_json_dc_-n_default
oc_--config_.etc.origin.master.admin.kubeconfig_get_-o_json_hostsubnet
oc_--config_.etc.origin.master.admin.kubeconfig_get_-o_json_netnamespaces
oc_--config_.etc.origin.master.admin.kubeconfig_logs_-n_default_pod.docker-registry-1-jm8d0
oc_--config_.etc.origin.master.admin.kubeconfig_logs_-n_default_pod.router-1-7im69
~~~
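
The file names in the listing above appear to be derived from the command lines themselves. The sketch below reproduces that naming pattern as inferred from the listing alone. It is an approximation, not necessarily what sos does internally, and `mangle_command` is a hypothetical name.

```python
import re


def mangle_command(cmd):
    """Approximate the file naming seen in the listing (inferred, not sos code)."""
    # Collapse each run of characters other than word characters,
    # "-", "." and "/" into a single underscore.
    name = re.sub(r"[^\w\-./]+", "_", cmd)
    # Path separators become dots.
    return name.replace("/", ".")


cmd = "oc --config=/etc/origin/master/admin.kubeconfig describe projects"
print(mangle_command(cmd))
# prints: oc_--config_.etc.origin.master.admin.kubeconfig_describe_projects
```

Under this reading, every file beginning with "oc_--config_.etc.origin.master.admin.kubeconfig" corresponds to a command run with the admin kubeconfig.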

Comment 6 Josep 'Pep' Turro Mauri 2016-11-14 10:16:38 UTC
Yes, the plugin uses admin.kubeconfig for authentication:

https://github.com/sosreport/sos/blob/3.3/sos/plugins/origin.py#L53-L55

About the request itself (to collect status from all namespaces), this was already considered as a potential thing to collect:

https://github.com/sosreport/sos/blob/3.3/sos/plugins/origin.py#L106-L113

I left that out because I'm not sure how useful it is in general. Personally, so far I have not needed it... and in environments with projects on the order of hundreds and pods/svcs on the order of thousands, it might be a non-trivial amount of [potentially] superfluous information to collect.

Nakayama-san, if you feel that this is important to collect in general can you please explain the need a bit more? I'm still undecided, but I'll be happy to be convinced for or against it.

Comment 7 Bryn M. Reeves 2016-11-14 11:05:12 UTC
> https://github.com/sosreport/sos/blob/3.3/sos/plugins/origin.py#L53-L55

Right; I forget we build those up at the top of the class.

I'll leave any decision on this for upstream with you, as you're best placed to understand what is and isn't needed.

Comment 8 Kenjiro Nakayama 2016-11-14 12:43:01 UTC
(In reply to Josep 'Pep' Turro Mauri from comment #6)

I _personally_ think that "oc status" has a diagnostic function similar to "oadm diagnostics", and that the two divide up as follows:

  "oadm diagnostics" ... diagnostics of infrastructure
  "oc status"        ... diagnostics of deployments

That may be just my understanding. However, especially since the 3.3 release, "oc status" provides rich analysis and friendly output, as listed in https://github.com/openshift/origin/blob/v1.3.1/pkg/cmd/cli/describe/projectstatus.go#L370-L387

For example, if we run "oc new-app nginx" and it ends up in "CrashLoopBackOff" due to "Permission denied", "oc status -v" reports diagnostics like those below [1].

Yes, I agree that it might contain superfluous output, but it would help cut down on some back-and-forth exchanges.

[1]
~~~
# oc status -v
  ... (snip) ...
Errors:
  * pod/nginx-1-n3tgp is crash-looping

    The container is starting and exiting repeatedly. This usually means the container is unable
    to start, misconfigured, or limited by security restrictions. Check the container logs with

      oc logs nginx-1-n3tgp -c nginx

    Current security policy prevents your containers from being run as the root user. Some images
    may fail expecting to be able to change ownership or permissions on directories. Your admin
    can grant you access to run containers that need to run as the root user with this command:

      oadm policy add-scc-to-user anyuid -n foo -z default
~~~

Comment 9 Pavel Moravec 2016-12-14 09:31:02 UTC
Hi Pep,
I understand the initiative is with you now. Please let me know if I can help.

Is there any progress here, or a plan to have a PR ready "soon enough" to make it into RHEL 7.4? I'm asking purely for 7.4 planning purposes.

Comment 10 Kenjiro Nakayama 2016-12-19 04:57:10 UTC
Actually, this request has not been asked for by the majority of support members, so I think we can close this ticket without implementation for now. If we get the same request again, or a more specific use case, we can reopen it or open a new request.

Comment 11 Pavel Moravec 2016-12-19 07:30:14 UTC
OK, no problem. BZ can be reopened when needed.