Bug 1726197 - must-gather does not account for command usage in liveness probe
Summary: must-gather does not account for command usage in liveness probe
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.3.0
Assignee: Luis Sanchez
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-02 10:19 UTC by Fan Jia
Modified: 2019-11-07 09:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-07 09:49:20 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift must-gather pull 120 0 'None' closed Bug 1726197: must-gather does not account for command usage in liveness probe 2020-03-12 09:00:49 UTC

Description Fan Jia 2019-07-02 10:19:29 UTC
Description of problem: 

The must-gather tool cannot gather endpoint data (metrics, healthz, version, ...) for the marketplace operator.

Version-Release number of selected component (if applicable): 
cv:4.2.0-0.nightly-2019-06-30-221852

How reproducible: 
100% of the time. 


Steps to Reproduce:
1. Run the must-gather collection tool:
./openshift-must-gather inspect clusteroperator/marketplace --kubeconfig=/home

Actual results: 
1) The collection log shows errors:
./openshift-must-gather inspect clusteroperator/marketplace --kubeconfig=/home/jfan/work/env3/kubeconfig

E0701 16:44:11.092956   31034 portforward.go:331] an error occurred forwarding 37587 -> 60000: error forwarding port 60000 to pod d84bba70e1924bef41c63e10e1b42e44c4125596754ea46659f0b38f73f9b9e3, uid : exit status 1: 2019/07/01 08:44:10 socat[38586] E connect(5, AF=2 127.0.0.1:60000, 16): Connection refused
E0701 16:44:12.777536   31034 portforward.go:331] an error occurred forwarding 37587 -> 60000: error forwarding port 60000 to pod d84bba70e1924bef41c63e10e1b42e44c4125596754ea46659f0b38f73f9b9e3, uid : exit status 1: 2019/07/01 08:44:12 socat[38693] E connect(5, AF=2 127.0.0.1:60000, 16): Connection refused
E0701 16:44:14.425401   31034 portforward.go:331] an error occurred forwarding 37587 -> 60000: error forwarding port 60000 to pod d84bba70e1924bef41c63e10e1b42e44c4125596754ea46659f0b38f73f9b9e3, uid : exit status 1: 2019/07/01 08:44:14 socat[38711] E connect(5, AF=2 127.0.0.1:60000, 16): Connection refused
2019/07/01 16:44:14         Gathering data for pod "redhat-operators-7886454bd6-rfvbh"
2019/07/01 16:44:14         Unable to gather previous container logs: previous terminated container "redhat-operators" in pod "redhat-operators-7886454bd6-rfvbh" not found
Error: one or more errors ocurred while gathering pod-specific data for namespace: openshift-marketplace

    [one or more errors ocurred while gathering container data for pod certified-operators-575db8f595-vwflb:

    [unable to gather container /healthz: Get https://localhost:37587/: tls: first record does not look like a TLS handshake, unable to gather container /version: Get https://localhost:37587/: tls: first record does not look like a TLS handshake, unable to gather container /metrics: Get https://localhost:37587/metrics: tls: first record does not look like a TLS handshake], one or more errors ocurred while gathering container data for pod community-operators-cf58468c6-mbtjc:

    [unable to gather container /healthz: Get https://localhost:37587/: tls: first record does not look like a TLS handshake, unable to gather container /version: Get https://localhost:37587/: tls: first record does not look like a TLS handshake, unable to gather container /metrics: Get https://localhost:37587/metrics: tls: first record does not look like a TLS handshake], one or more errors ocurred while gathering container data for pod marketplace-operator-774bc7f648-ktghr:

    [unable to gather container /healthz: Get https://localhost:37587/: EOF, unable to gather container /version: Get https://localhost:37587/: EOF, unable to gather container /metrics: Get https://localhost:37587/metrics: EOF], one or more errors ocurred while gathering container data for pod redhat-operators-7886454bd6-rfvbh:

    [unable to gather container /healthz: Get https://localhost:37587/: tls: first record does not look like a TLS handshake, unable to gather container /version: Get https://localhost:37587/: tls: first record does not look like a TLS handshake, unable to gather container /metrics: Get https://localhost:37587/metrics: tls: first record does not look like a TLS handshake]]
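To make the error summary above easier to read, the following minimal Python sketch groups the failed endpoints by pod and by failure reason. It is not part of must-gather; the function name `summarize_gather_errors` and both regular expressions are assumptions derived only from the message layout shown in this report. A "first record does not look like a TLS handshake" reason generally means the port answered with something other than TLS (for example plain HTTP), while "EOF" means the connection was closed before a response arrived.

```python
import re
from collections import defaultdict

# Patterns are assumptions based on the message layout quoted in this report.
POD_RE = re.compile(r"gathering container data for pod ([\w.-]+):")
ERR_RE = re.compile(
    r"unable to gather container (/\w+): Get (\S+): (.+?)"
    r"(?=, unable to gather container|\])"
)


def summarize_gather_errors(log_text):
    """Return {pod: {reason: [endpoint, ...]}} for a must-gather error summary."""
    # re.split with one capture group yields [prefix, pod1, segment1, pod2, segment2, ...]
    parts = POD_RE.split(log_text)
    summary = {}
    for pod, segment in zip(parts[1::2], parts[2::2]):
        by_reason = defaultdict(list)
        for endpoint, _url, reason in ERR_RE.findall(segment):
            by_reason[reason].append(endpoint)
        summary[pod] = dict(by_reason)
    return summary
```

Passing the error text above to `summarize_gather_errors` would return one entry per pod, which makes it quick to see that only marketplace-operator fails with EOF while the three catalog pods fail with the TLS handshake error.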

2) There is no metrics, healthz, or version information in the pods directory:
├── pods
│   ├── certified-operators-575db8f595-vwflb
│   │   ├── certified-operators
│   │   │   └── certified-operators
│   │   │       ├── healthz
│   │   │       └── logs
│   │   │           ├── current.log
│   │   │           └── previous.log
│   │   └── certified-operators-575db8f595-vwflb.yaml

Expected results: 
1) The collection should finish without errors:
"2019/07/02 15:52:22 Finished successfully with no errors."

2) The pods directory contains metrics, healthz, and version information, as it does for the apiserver:
├── apiserver-9tt5f
│   ├── apiserver-9tt5f.yaml
│   ├── fix-audit-permissions
│   │   └── fix-audit-permissions
│   │       └── logs
│   │           ├── current.log
│   │           └── previous.log
│   └── openshift-apiserver
│       └── openshift-apiserver
│           ├── healthz
│           │   ├── index
│           │   ├── log
│           │   ├── ping
│           │   ├── poststarthook
│           │   │   ├── apiservice-openapi-controller
│           │   │   ├── authorization.openshift.io-bootstrapclusterroles
│           │   │   ├── authorization.openshift.io-ensureopenshift-infra
│           │   │   ├── clientCA-reload
│           │   │   ├── generic-apiserver-start-informers
│           │   │   ├── image.openshift.io-apiserver-caches
│           │   │   ├── openshift.io-restmapperupdater
│           │   │   ├── openshift.io-startinformers
│           │   │   ├── project.openshift.io-projectauthorizationcache
│           │   │   ├── project.openshift.io-projectcache
│           │   │   ├── quota.openshift.io-clusterquotamapping
│           │   │   ├── requestheader-reload
│           │   │   └── security.openshift.io-bootstrapscc
│           │   └── ready
│           ├── logs
│           │   ├── current.log
│           │   └── previous.log
│           ├── metrics.json
│           └── version.json
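The difference between the two directory trees can be checked mechanically. The sketch below is a hypothetical helper, not part of must-gather: it assumes that a container directory in the gathered output should hold a non-empty `healthz` directory plus `metrics.json` and `version.json`, as in the apiserver tree above, and it reports whichever of those are missing or empty.

```python
from pathlib import Path

# Entry names taken from the apiserver tree in this report.
EXPECTED = ("healthz", "metrics.json", "version.json")


def missing_endpoint_data(container_dir):
    """List the expected entries that are absent or empty in a container directory."""
    container_dir = Path(container_dir)
    missing = []
    for name in EXPECTED:
        entry = container_dir / name
        if entry.is_dir():
            # A directory counts as present only if it holds at least one entry.
            present = any(entry.iterdir())
        else:
            # A file counts as present only if it exists and is non-empty.
            present = entry.is_file() and entry.stat().st_size > 0
        if not present:
            missing.append(name)
    return missing
```

For a directory that holds only an empty `healthz` directory and `logs`, which is how the certified-operators container directory appears in the tree listing, this would report all three entries as missing.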


https://github.com/openshift/must-gather/blob/master/pkg/cmd/inspect/resource.go#L148-L164

Additional info:

Comment 2 Eric Rich 2019-07-05 12:40:35 UTC
How does this relate to https://bugzilla.redhat.com/show_bug.cgi?id=1717638

Comment 3 Aravindh Puthiyaparambil 2019-07-09 14:55:03 UTC
(In reply to Eric Rich from comment #2)
> How does this relate to https://bugzilla.redhat.com/show_bug.cgi?id=1717638

That bug is against OLM and not the marketplace operator

Comment 6 Luis Sanchez 2019-08-09 03:48:59 UTC
must-gather is not trying to capture the liveness/readiness probes per se, but rather the following endpoints, which the marketplace operator and operand pods do not implement:

  /healthz -- see https://github.com/kubernetes/apiserver/tree/master/pkg/server/healthz
  /version -- see https://github.com/kubernetes/apiserver/blob/master/pkg/server/routes/version.go
  /metrics -- see https://github.com/kubernetes/apiserver/blob/master/pkg/server/routes/metrics.go

I've adjusted must-gather to better detect which pods do not support these endpoints.
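As an illustration of what detecting unsupported endpoints can look like, here is a minimal Python sketch. It is not the must-gather implementation (which is written in Go and is not shown in this bug); the names `fetch_status` and `detect_endpoints`, the "HTTP 200 means supported" rule, and the disabled certificate verification for a port-forwarded localhost address are all assumptions made for the example.

```python
import ssl
import urllib.error
import urllib.request

ENDPOINTS = ("/healthz", "/version", "/metrics")


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url; transport failures raise OSError subclasses."""
    ctx = ssl.create_default_context()
    # Assumption: the target is a port-forwarded localhost address, so the
    # serving certificate cannot be verified against the hostname.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        return err.code


def detect_endpoints(base_url, fetch=fetch_status):
    """Classify each well-known endpoint as supported, unsupported, or unreachable."""
    result = {}
    for path in ENDPOINTS:
        try:
            status = fetch(base_url + path)
        except OSError as err:
            # URLError, ssl.SSLError, and connection errors all derive from OSError.
            result[path] = f"unreachable: {err}"
            continue
        # Assumption for this sketch: only HTTP 200 counts as supported.
        result[path] = "supported" if status == 200 else f"unsupported (HTTP {status})"
    return result
```

The `fetch` parameter is injectable so the classification logic can be exercised without a live cluster. A gatherer built this way could skip pods whose endpoints come back unsupported or unreachable instead of reporting each failed request as a collection error.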

Comment 7 Eric Rich 2019-08-09 16:15:39 UTC
(In reply to Luis Sanchez from comment #6)
> must-gather is not trying to capture the liveness/readiness probes
> per se, but rather the following endpoints, which the marketplace
> operator and operand pods do not implement:
> 
>   /healthz -- see
> https://github.com/kubernetes/apiserver/tree/master/pkg/server/healthz
>   /version -- see
> https://github.com/kubernetes/apiserver/blob/master/pkg/server/routes/
> version.go
>   /metrics -- see
> https://github.com/kubernetes/apiserver/blob/master/pkg/server/routes/
> metrics.go
> 
> I've adjusted must-gather to better detect which pods do not support these
> endpoints.

Would it not be better to open new bugs against the pods that lack these endpoints, so that they add support for them?

Comment 8 Maciej Szulik 2019-08-29 10:59:07 UTC
It doesn't look like it's going to make 4.2, moving to 4.3.

