Bug 1913411 - When openshift-sdn segfaults, pod creation fails with 'error reserving pod name ...: name is reserved'
Summary: When openshift-sdn segfaults, pod creation fails with 'error reserving pod name ...'
Keywords:
Status: CLOSED DUPLICATE of bug 1924741
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Surya Seetharaman
QA Contact: zhaozhanqi
URL:
Whiteboard: aos-scalability-43
Depends On:
Blocks: 1887744
 
Reported: 2021-01-06 17:35 UTC by Peter Hunt
Modified: 2024-10-01 17:16 UTC
CC List: 45 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1785399
Environment:
Last Closed: 2021-04-27 19:54:28 UTC
Target Upstream Version:
Embargoed:
lmartinh: needinfo-




Links:
Red Hat Knowledge Base (Solution) 5116561 (last updated 2021-03-09 03:07:03 UTC)

Comment 12 Abu Kashem 2021-01-21 16:55:26 UTC
> - However, along with the 'netplugin failed with no error message: signal: killed' and 'name is reserved' messages, they observed the apiserver reporting panic.
> - As a workaround, they restart the kube-apiserver pods and everything works as expected for a while.

I would like the following to be tracked:

- Do a grep on the kube-apiserver logs and give us a count of the total number of panics seen, along with the time range (a sketch of such a grep is shown after this list).
- Can you have the customer run the following Prometheus query in the web console (time range: starting from the time the master was rebooted and going back 48 hours) and share the screenshot with us?
> sum(apiserver_flowcontrol_current_executing_requests) by (flowSchema,priorityLevel)
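
A minimal sketch of such a grep, assuming the logs are collected in a must-gather with the same layout as the path used in comment 15 (adjust the path to wherever the kube-apiserver logs actually live):
> grep -rni "panic" namespaces/openshift-kube-apiserver/* | wc -l
> grep -rni "panic" namespaces/openshift-kube-apiserver/* | head
The first command gives a total count; the head of the matches lets you read the time range off the log timestamps.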

Comment 15 Abu Kashem 2021-01-25 16:59:28 UTC
apjagtap, requests for new data


Grep the current kube-apiserver logs (all instances):
> grep -rni -E "timeout.go:(132|134)" namespaces/openshift-kube-apiserver/*

Please run the following Prometheus queries and share full screenshots with me.
> topk(25, sum(apiserver_flowcontrol_current_executing_requests) by (priorityLevel,instance))
> topk(25, sum(apiserver_flowcontrol_request_concurrency_limit) by (priorityLevel,instance))
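
Not something asked for above, but one way to read those two series together (a sketch using the same metric names) is to divide current executing requests by the concurrency limit, which gives per-priority-level saturation; values close to 1 mean that priority level is running at its concurrency limit:
> sum(apiserver_flowcontrol_current_executing_requests) by (priorityLevel,instance) / sum(apiserver_flowcontrol_request_concurrency_limit) by (priorityLevel,instance)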

Thanks!

Comment 21 Abu Kashem 2021-02-02 16:48:26 UTC
apjagtap,

> Should I open another bug and share it over 
Sounds good to me, and please follow the instructions for data capture in https://bugzilla.redhat.com/show_bug.cgi?id=1908383#c19

Comment 35 Surya Seetharaman 2021-04-27 19:54:28 UTC

*** This bug has been marked as a duplicate of bug 1924741 ***

