Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1587889

Summary: The master-api restarts continuously during logging redeploy testing
Product: OpenShift Container Platform
Reporter: Anping Li <anli>
Component: Master
Assignee: Jordan Liggitt <jliggitt>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Wang Haoran <haowang>
Severity: medium
Priority: medium
Version: 3.10.0
CC: anli, aos-bugs, jokerman, mfojtik, mmccomas
Target Milestone: ---
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-06-19 07:35:42 UTC
Type: Bug
Attachments:
Attached the master-api logs

Description Anping Li 2018-06-06 09:11:05 UTC
Created attachment 1448192 [details]
Attached the master-api logs

Description of problem:
The master-api restarts continuously during logging redeploy testing. Once the testing is stopped, the master-api comes back.


Version-Release number of selected component (if applicable):
v3.10.0-0.60.0

How reproducible:
Reproduced once so far.

Steps to Reproduce:
1) Deploy OCP v3.10.0 on Atomic hosts
2) Deploy logging v3.9
3) Run the logging redeploy playbooks to upgrade EFK to v3.10, and run steps 4) and 5) simultaneously

4) loop to create generator applications

#!/bin/bash
IMAGE=docker.io/mffiedler/ocp-logtest:latest  # kept from the original script; referenced by the template

curl -O https://raw.githubusercontent.com/openshift/svt/master/openshift_scalability/content/logtest/logtest-syslog-rc.json
for i in $(seq 1 100); do
    PROJECT="prj$i"
    oc new-project "$PROJECT"
    oc adm policy add-scc-to-user privileged -z default -n "$PROJECT"
    oc new-app logtest-syslog-rc.json -n "$PROJECT"
    sleep 10
done

Wait a while; once there are about 30 projects, delete them:

for i in $(seq 1 30); do
    oc delete project "prj$i"
done

5) loop to check the Elasticsearch status

# $es_pod must be set to the name of an Elasticsearch pod first, e.g.
# (namespace and label selector may differ per deployment):
#   es_pod=$(oc get pods -n logging -l component=es -o jsonpath='{.items[0].metadata.name}')
while true; do
    oc exec -c elasticsearch $es_pod -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cluster/health?pretty
    oc exec -c elasticsearch $es_pod -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cat/shards
    oc exec -c elasticsearch $es_pod -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/_cluster/settings?pretty
    sleep 1
done

Actual results:
The master-api was restarted continuously during the testing.

 I0606 08:20:39.489556       1 handler.go:149] kube-apiserver: GET "/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system:controller:namespace-controller" satisfied by gorestful with webservice /apis/rbac.authorization.k8s.io/v1
 I0606 08:20:39.491762       1 wrap.go:42] GET /healthz: (202.482596ms) 500
 goroutine 7252 [running]:
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).recordStatus(0xc427f4f180, 0x1f4)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:207 +0xdd
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader(0xc427f4f180, 0x1f4)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:186 +0x35
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*baseTimeoutWriter).WriteHeader(0xc4330c4e00, 0x1f4)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:188 +0xac
 net/http.Error(0x7f6d173e0d58, 0xc437741920, 0xc420dcd200, 0x46c, 0x1f4)
        /usr/lib/golang/src/net/http/server.go:1930 +0xda
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/healthz.handleRootHealthz.func1(0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/healthz/healthz.go:130 +0x508
 net/http.HandlerFunc.ServeHTTP(0xc433052380, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /usr/lib/golang/src/net/http/server.go:1918 +0x44
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP(0xc436e7ea40, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:241 +0x55a
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP(0xc428786e00, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:234 +0xa1
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x5a4d0f6, 0xf, 0xc4260747e0, 0xc428786e00, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:160 +0x6ad
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*director).ServeHTTP(0xc42c31ae60, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        <autogenerated>:1 +0x75
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authorization.go:52 +0x37d
 net/http.HandlerFunc.ServeHTTP(0xc42791a2d0, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /usr/lib/golang/src/net/http/server.go:1918 +0x44
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1(0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/maxinflight.go:165 +0x42a
 net/http.HandlerFunc.ServeHTTP(0xc430652540, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /usr/lib/golang/src/net/http/server.go:1918 +0x44
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go:49 +0x203a
 net/http.HandlerFunc.ServeHTTP(0xc42791a320, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /usr/lib/golang/src/net/http/server.go:1918 +0x44
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthentication.func1(0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authentication.go:78 +0x2b1
 net/http.HandlerFunc.ServeHTTP(0xc42791a370, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /usr/lib/golang/src/net/http/server.go:1918 +0x44
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request.WithRequestContext.func1(0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request/requestcontext.go:110 +0xcb
 net/http.HandlerFunc.ServeHTTP(0xc42c31af40, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /usr/lib/golang/src/net/http/server.go:1918 +0x44
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithCORS.func1(0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/cors.go:75 +0x17b
 net/http.HandlerFunc.ServeHTTP(0xc43051cc60, 0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
        /usr/lib/golang/src/net/http/server.go:1918 +0x44
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc42c31b420, 0xaca4fe0, 0xc437741920, 0xc422814b00, 0xc432ef85a0)
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:93 +0x8d
 created by github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
        /builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:92 +0x1ab

 logging error output: "[+]ping ok\n[+]etcd ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-apiextensions-informers ok\n[+]poststarthook/start-apiextensions-controllers ok\n[+]poststarthook/project.openshift.io-projectcache ok\n[+]poststarthook/project.openshift.io-projectauthorizationcache ok\n[+]poststarthook/security.openshift.io-bootstrapscc ok\n[+]poststarthook/bootstrap-controller ok\n[+]poststarthook/ca-registration ok\n[+]poststarthook/start-kube-aggregator-informers ok\n[+]poststarthook/apiservice-registration-controller ok\n[+]poststarthook/apiservice-status-available-controller ok\n[+]poststarthook/apiservice-openapi-controller ok\n[+]poststarthook/kube-apiserver-autoregistration ok\n[+]autoregister-completion ok\n[-]poststarthook/authorization.openshift.io-bootstrapclusterroles failed: reason withheld\n[+]poststarthook/authorization.openshift.io-ensureopenshift-infra ok\n[+]poststarthook/quota.openshift.io-clusterquotamapping ok\n[+]poststarthook/openshift.io-AdmissionInit ok\n[+]poststarthook/openshift.io-StartInformers ok\n[+]poststarthook/oauth.openshift.io-StartOAuthClientsBootstrapping ok\nhealthz check failed\n"
  [[kube-probe/1.10+] 10.240.0.23:42094]
 I0606 08:20:39.494269       1 wrap.go:42] GET /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system:controller:namespace-controller: (5.17134ms) 200 [[openshift/v1.10.0+b81c8f8 (linux/amd64) kubernetes/b81c8f8] 127.0.0.1:60950]
 I0606 08:20:39.495633       1 handler.go:159] kube-aggregator: GET "/api/v1/namespaces/openshift-web-console/configmaps/webconsole-config" satisfied by nonGoRestful
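
The healthz error body above marks each check with a "[+]" (ok) or "[-]" (failed) prefix; here only authorization.openshift.io-bootstrapclusterroles is failing. As an illustration (not part of the product), a small helper can pull the failing check names out of such a body:

```python
def failing_checks(healthz_body: str) -> list:
    """Return the names of checks marked '[-]' in a /healthz error body."""
    failed = []
    for line in healthz_body.splitlines():
        if line.startswith("[-]"):
            # Lines look like: "[-]<check-name> failed: reason withheld"
            name = line[len("[-]"):].split(" failed")[0]
            failed.append(name)
    return failed

body = (
    "[+]ping ok\n"
    "[+]etcd ok\n"
    "[-]poststarthook/authorization.openshift.io-bootstrapclusterroles failed: reason withheld\n"
    "healthz check failed\n"
)
print(failing_checks(body))
```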


Expected results:


Additional info:

Comment 1 Michal Fojtik 2018-06-06 12:15:38 UTC
From the logs it seems the healthz failure is not what is causing the restarts, as the check eventually succeeds:

 I0606 08:20:39.491762       1 wrap.go:42] GET /healthz: (202.482596ms) 500
 github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/healthz.handleRootHealthz.func1(0x7f6d173e0d58, 0xc437741920, 0xc422814b00)
 	/builddir/build/BUILD/atomic-openshift-git-0.0e11ff5/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/healthz/healthz.go:130 +0x508
 logging error output: "[+]ping ok\n[+]etcd ok\n[+]poststarthook/generic-apiserver-start-informers ok\n[+]poststarthook/start-apiextensions-informers ok\n[+]poststarthook/start-apiextensions-controllers ok\n[+]poststarthook/project.openshift.io-projectcache ok\n[+]poststarthook/project.openshift.io-projectauthorizationcache ok\n[+]poststarthook/security.openshift.io-bootstrapscc ok\n[+]poststarthook/bootstrap-controller ok\n[+]poststarthook/ca-registration ok\n[+]poststarthook/start-kube-aggregator-informers ok\n[+]poststarthook/apiservice-registration-controller ok\n[+]poststarthook/apiservice-status-available-controller ok\n[+]poststarthook/apiservice-openapi-controller ok\n[+]poststarthook/kube-apiserver-autoregistration ok\n[+]autoregister-completion ok\n[-]poststarthook/authorization.openshift.io-bootstrapclusterroles failed: reason withheld\n[+]poststarthook/authorization.openshift.io-ensureopenshift-infra ok\n[+]poststarthook/quota.openshift.io-clusterquotamapping ok\n[+]poststarthook/openshift.io-AdmissionInit ok\n[+]poststarthook/openshift.io-StartInformers ok\n[+]poststarthook/oauth.openshift.io-StartOAuthClientsBootstrapping ok\nhealthz check failed\n"
 I0606 08:20:45.627769       1 handler.go:159] kube-aggregator: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:20:45.637807       1 pathrecorder.go:253] kube-aggregator: "/healthz/ready" satisfied by NotFoundHandler
 I0606 08:20:45.638027       1 handler.go:159] kube-apiserver: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:20:45.638196       1 pathrecorder.go:253] kube-apiserver: "/healthz/ready" satisfied by NotFoundHandler
 I0606 08:20:45.638320       1 handler.go:159] openshift-apiserver: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:20:45.638427       1 pathrecorder.go:240] openshift-apiserver: "/healthz/ready" satisfied by exact match
 I0606 08:20:45.638617       1 wrap.go:42] GET /healthz/ready: (11.083987ms) 200 [[kube-probe/1.10+] 10.240.0.23:41800]
 I0606 08:20:55.586263       1 handler.go:159] kube-aggregator: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:20:55.586580       1 pathrecorder.go:253] kube-aggregator: "/healthz/ready" satisfied by NotFoundHandler
 I0606 08:20:55.586703       1 handler.go:159] kube-apiserver: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:20:55.586844       1 pathrecorder.go:253] kube-apiserver: "/healthz/ready" satisfied by NotFoundHandler
 I0606 08:20:55.586963       1 handler.go:159] openshift-apiserver: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:20:55.587082       1 pathrecorder.go:240] openshift-apiserver: "/healthz/ready" satisfied by exact match
 I0606 08:20:55.587225       1 wrap.go:42] GET /healthz/ready: (1.214201ms) 200 [[kube-probe/1.10+] 10.240.0.23:42206]
 I0606 08:21:05.583527       1 handler.go:159] kube-aggregator: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:21:05.583691       1 pathrecorder.go:253] kube-aggregator: "/healthz/ready" satisfied by NotFoundHandler
 I0606 08:21:05.583822       1 handler.go:159] kube-apiserver: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:21:05.583936       1 pathrecorder.go:253] kube-apiserver: "/healthz/ready" satisfied by NotFoundHandler
 I0606 08:21:05.584048       1 handler.go:159] openshift-apiserver: GET "/healthz/ready" satisfied by nonGoRestful
 I0606 08:21:05.584169       1 pathrecorder.go:240] openshift-apiserver: "/healthz/ready" satisfied by exact match
 I0606 08:21:05.584337       1 wrap.go:42] GET /healthz/ready: (1.369246ms) 200 [[kube-probe/1.10+] 10.240.0.23:42268]


Can you get logs from the api server container that died? (docker logs).
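
A minimal sketch of how those logs could be collected on the master, assuming the master-api runs as a docker container whose name contains "api" (the name pattern is an assumption about this host):

```shell
# Filter `docker ps -a` style lines down to exited containers whose
# name matches a pattern, reading the listing from stdin.
filter_exited() {
    pattern="$1"
    grep 'Exited' | grep "$pattern"
}

# Intended usage on the affected master:
#   docker ps -a --format '{{.ID}} {{.Status}} {{.Names}}' \
#     | filter_exited api \
#     | while read -r id _; do docker logs "$id" > "logs-$id.txt" 2>&1; done
```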

Comment 2 Jordan Liggitt 2018-06-06 13:19:39 UTC
healthz is not panicking, the server just logs a stack trace on any 5xx error

what does `oc get --raw /poststarthook/authorization.openshift.io-bootstrapclusterroles` return?

Comment 3 Anping Li 2018-06-07 01:34:53 UTC
The environment has been deleted. The log is attached.

Comment 7 Jordan Liggitt 2018-06-19 07:35:42 UTC
Closing since it cannot be reproduced and we don't have any other info to go on.