Bug 1744560 - RHHI cluster lost API connectivity for some time after graceful shut down of master node
Status: CLOSED DUPLICATE of bug 1751978
Alias: None
Product: Kubernetes-native Infrastructure
Classification: Red Hat
Component: Management
Version: 1.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Ohad Levy
QA Contact: Arik Chernetsky
Blocks: 1741265
 
Reported: 2019-08-22 12:32 UTC by Artem Hrechanychenko
Modified: 2020-04-06 13:15 UTC
CC: 9 users

Last Closed: 2019-12-13 11:18:54 UTC




Links:
Github openshift baremetal-runtimecfg pull 18 (closed): "Do not consider unresolvable backends", last updated 2021-01-14 17:46:22 UTC

Description Artem Hrechanychenko 2019-08-22 12:32:09 UTC
Description of problem:

Test of the bare metal host shutdown operation. It consists of two steps:
1) Start node maintenance from the UI/CLI and wait until the NodeMaintenance reaches the "Succeeded" phase
2) Gracefully shut down the host from the UI
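
Step 1 can also be driven from the CLI by creating a NodeMaintenance custom resource. The following is a sketch assuming the node-maintenance-operator's CR shape; the API group/version and the node name are illustrative and may differ between releases:

```yaml
# Sketch of a NodeMaintenance CR (shape assumed from the node-maintenance-operator;
# apiVersion and nodeName here are illustrative, not taken from this report).
apiVersion: nodemaintenance.kubevirt.io/v1beta1
kind: NodeMaintenance
metadata:
  name: maintenance-master-0
spec:
  nodeName: master-0              # the master being taken down
  reason: "Graceful shutdown test"
```

Applying this with `oc apply -f` should cordon and drain the node; the CR's `.status.phase` can then be watched until it reports "Succeeded", matching step 1 above.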

Some time after shutting down the node, the cluster is not operable from the UI or via the `oc` CLI:

[cloud-user@rhhi-node-worker-0 dev-scripts]$ oc status
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get routes.route.openshift.io)
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get imagestreams.image.openshift.io)
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get buildconfigs.build.openshift.io)
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get deploymentconfigs.apps.openshift.io)
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get builds.build.openshift.io)

Meanwhile, I am able to log in to another master:

oc status
In project default on server https://api.rhhi-ahrechan-tlv.qe.lab.redhat.com:6443

svc/openshift - kubernetes.default.svc.cluster.local
svc/kubernetes - 172.30.0.1:443 -> 6443

View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.


After some time, API connectivity returns:

[cloud-user@rhhi-node-worker-0 dev-scripts]$ oc status
In project default on server https://api.rhhi-ahrechan-tlv.qe.lab.redhat.com:6443

svc/openshift - kubernetes.default.svc.cluster.local
svc/kubernetes - 172.30.0.1:443 -> 6443



Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-22-043819

How reproducible:
Always

Steps to Reproduce:
1. Deploy RHHI
2. Start maintenance for the bare metal host
3. Shut down the bare metal host
4. Check API connectivity
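
Step 4 can be automated with a small polling helper that retries a probe command until it succeeds or a timeout expires. A minimal POSIX sh sketch; `wait_for` is a hypothetical helper, and the 600s/10s values used against `oc status` are arbitrary choices, not from this report:

```shell
#!/bin/sh
# wait_for TIMEOUT INTERVAL COMMAND...
# Retries COMMAND every INTERVAL seconds; succeeds as soon as COMMAND does,
# fails once TIMEOUT seconds have elapsed without a success.
wait_for() {
  timeout=$1; interval=$2; shift 2
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s"
  return 1
}

# For this report's scenario one would probe the API server, e.g.:
#   wait_for 600 10 oc status
# Self-contained demo with a command that always succeeds:
wait_for 3 1 true   # prints "ready after 0s"
```

The helper deliberately swallows the probe's output so that only the ready/timeout line is reported; dropping the redirection shows the `ServiceUnavailable` errors while the API is down.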

Actual results:
After shutdown: Error from server (ServiceUnavailable): the server is currently unable to handle the request

Expected results:
The cluster remains available the entire time while 1 of the 3 masters is shut down.

Additional info:

Comment 1 Doug Hellmann 2019-08-23 23:00:57 UTC
This seems like an OpenShift issue, rather than a RHHI issue. Does OpenShift support removing one of the masters from a 3-node cluster?

Comment 2 Steven Hardy 2019-09-23 13:24:51 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1751978 looks related, possibly this is the same issue?

Comment 3 Bob Fournier 2019-12-06 16:25:07 UTC
Is this still occurring? If not, can we close this?

Comment 4 Steven Hardy 2019-12-13 11:18:54 UTC
Let's close this; it's either a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1751978 or we didn't get sufficient information to determine the root cause.

*** This bug has been marked as a duplicate of bug 1751978 ***

