Bug 1913411 - When openshift-sdn segfaults, pod creation fails with 'error reserving pod name ...: name is reserved'
Summary: When openshift-sdn segfaults, pod creation fails with 'error reserving pod name ...'
Keywords:
Status: CLOSED DUPLICATE of bug 1924741
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.8.0
Assignee: Surya Seetharaman
QA Contact: zhaozhanqi
URL:
Whiteboard: aos-scalability-43
Depends On:
Blocks: 1887744
 
Reported: 2021-01-06 17:35 UTC by Peter Hunt
Modified: 2024-10-01 17:16 UTC
CC List: 45 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1785399
Environment:
Last Closed: 2021-04-27 19:54:28 UTC
Target Upstream Version:
Embargoed:
lmartinh: needinfo-




Links:
Red Hat Knowledge Base (Solution) 5116561 (last updated 2021-03-09 03:07:03 UTC)

Comment 12 Abu Kashem 2021-01-21 16:55:26 UTC
> - However, along with the 'netplugin failed with no error message: signal: killed' and 'name is reserved' messages, they observed the apiserver reporting panic.
> - As a workaround, they restart the kube-apiserver pods and everything works as expected for a while.

I would like the following to be tracked:

- Do a grep on the kube-apiserver logs and give us a count of the total number of panics seen, along with the time range (a sketch of such a grep is shown after this list).
- Can you have the customer run the following Prometheus query in the web console (time range: starting from the time the master was rebooted and going back 48 hours) and share the screenshot with us?
> sum(apiserver_flowcontrol_current_executing_requests) by (flowSchema,priorityLevel)
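
A minimal sketch of such a grep, assuming the logs are collected in a must-gather with the same layout as the path used in comment 15 (adjust the path to wherever the kube-apiserver logs actually live):
> grep -rni "panic" namespaces/openshift-kube-apiserver/* | wc -l
> grep -rni "panic" namespaces/openshift-kube-apiserver/* | head
The first command gives a total count; the head of the matches lets you read the time range off the log timestamps.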

Comment 15 Abu Kashem 2021-01-25 16:59:28 UTC
apjagtap, requests for new data


Grep the current kube-apiserver logs (all instances):
> grep -rni -E "timeout.go:(132|134)" namespaces/openshift-kube-apiserver/*

Please run the following Prometheus queries and share full screenshots with me.
> topk(25, sum(apiserver_flowcontrol_current_executing_requests) by (priorityLevel,instance))
> topk(25, sum(apiserver_flowcontrol_request_concurrency_limit) by (priorityLevel,instance))
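
Not something asked for above, but one way to read those two series together (a sketch using the same metric names) is to divide current executing requests by the concurrency limit, which gives per-priority-level saturation; values close to 1 mean that priority level is running at its concurrency limit:
> sum(apiserver_flowcontrol_current_executing_requests) by (priorityLevel,instance) / sum(apiserver_flowcontrol_request_concurrency_limit) by (priorityLevel,instance)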

Thanks!

Comment 21 Abu Kashem 2021-02-02 16:48:26 UTC
apjagtap,

> Should I open another bug and share it over 
Sounds good to me, and please follow the instructions for data capture in https://bugzilla.redhat.com/show_bug.cgi?id=1908383#c19

Comment 35 Surya Seetharaman 2021-04-27 19:54:28 UTC

*** This bug has been marked as a duplicate of bug 1924741 ***

