Bug 1836198 - [Tracking OCP BZ #1839933] Pods with PVCs attached take a long time to start
Summary: [Tracking OCP BZ #1839933] Pods with PVCs attached take a long time to start
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: unclassified
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Humble Chirammal
QA Contact: Ramakrishnan Periyasamy
URL:
Whiteboard:
Duplicates: 1861438 1862196
Depends On: 1839933 1854311 1854312
Blocks: 1870193
 
Reported: 2020-05-15 12:02 UTC by Elvir Kuric
Modified: 2023-10-06 20:04 UTC
CC List: 21 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1839933 1870193
Environment:
Last Closed: 2020-09-15 16:32:00 UTC
Embargoed:




Links:
Red Hat Bugzilla 1839933 (high, CLOSED): Pods with PVCs attached takes long time to start (last updated 2021-02-22 00:41:40 UTC)

Comment 7 Raz Tamir 2020-05-18 16:23:31 UTC
@Karthick, do we see the same in our tests?

Comment 8 Michael Adam 2020-05-18 17:39:34 UTC
@Elvir, is this also showing on OCP 4.4?

Comment 10 Raz Tamir 2020-05-19 08:53:31 UTC
Thanks Elvir,

Based on this info, marking as regression

Comment 11 Humble Chirammal 2020-05-19 10:18:07 UTC
Elvir,  let me understand some more here:

Does the table below reflect your testing results?

OCS 4.3 +  OCP 4.3 -> works
OCS 4.3 +  OCP 4.4 -> fail
OCS 4.4 +  OCP 4.4 -> fail

Also, what is the result of:

OCS 4.4 +  OCP 4.3 ??

Were all the tests performed on the same kind of setup, i.e., the same cloud provider, number of nodes, instance spec, etc.?


When you say the pods are stuck in the `ContainerCreating` stage, are the PVCs also being created as each pod comes up, or are the PVCs created upfront and the pods just attach those existing volumes?

Comment 12 Elvir Kuric 2020-05-19 10:30:38 UTC
(In reply to Humble Chirammal from comment #11)
> Elvir,  let me understand some more here:
> 
> Is the below table reflect your testing result? :
> 
> OCS 4.3 +  OCP 4.3 -> works
> OCS 4.3 +  OCP 4.4 -> fail
> OCS 4.4 +  OCP 4.4 -> fail
So far I have tested:

OCP v4.5 + OCS v4.4 - fail
OCP v4.4 + OCS v4.4 - fail 
OCP v4.3 + OCS v4.3 - works 

I have not yet tested "OCS 4.3 + OCP 4.4".
 
> 
> Also, what is the result of:
> 
> OCS 4.4 +  OCP 4.3 ??
Waiting on quay.io to be operational in order to test. I expect to have this information later today.

> 
> Are all the tests performed in same kind of setup? say: same cloud provider,
> number of nodes, spec ..etc?

Yes: AWS, m5.4xlarge, 6 OCS nodes, 12 app nodes, same environment.
> 
> 
> When you say, pods are stuck at `Container creating` stage, is the PVCs are
> also getting created when pod comes up ? of PVCs are created upfront and pod
> just use those to attach volumes?

PVCs are created. I can see the desired number of PVs/PVCs in "oc get pv/pvc", and that number matches what I get on the Ceph side with

# rbd ls example-storagecluster-cephblockpool | wc -l

i.e. if I request 2000 PVCs, then "# rbd ls example-storagecluster-cephblockpool | wc -l" shows 2000 on the Ceph side.
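
(For reference, a minimal sketch of this cross-check, using the same commands as above; the pool name is the one from this cluster and the rbd command is assumed to run inside the Ceph toolbox pod:)

# Kubernetes side: count PVs and PVCs across all namespaces
oc get pv --no-headers | wc -l
oc get pvc --all-namespaces --no-headers | wc -l

# Ceph side: count RBD images in the block pool (run inside the toolbox pod)
rbd ls example-storagecluster-cephblockpool | wc -l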

The problem is that with the above configurations the pods stay in the "ContainerCreating" state for a very long time (the pod images are already present on the machines, so that is not the issue), and the second batch of 1000 pods starts only after 90 minutes or more.

From a single pod's events we see:

 Normal   Scheduled         <unknown>            default-scheduler                                       Successfully assigned pvcscalea0/pvc-pod-qj9kb to ip-10-0-142-115.eu-central-1.compute.internal
  Warning  FailedMount       23m                  kubelet, ip-10-0-142-115.eu-central-1.compute.internal  Unable to attach or mount volumes: unmounted volumes=[persistentvolume], unattached volumes=[default-token-wh87w persistentvolume]: timed out waiting for the condition
  Warning  FailedMount       4m59s (x9 over 25m)  kubelet, ip-10-0-142-115.eu-central-1.compute.internal  Unable to attach or mount volumes: unmounted volumes=[persistentvolume], unattached volumes=[persistentvolume default-token-wh87w]: timed out waiting for the condition
  Normal   Pulled            4m45s                kubelet, ip-10-0-142-115.eu-central-1.compute.internal  Container image "quay.io/openshift-scale/scale-ci-workload" already present on machine
  Normal   Created           4m44s                kubelet, ip-10-0-142-115.eu-central-1.compute.internal  Created container fio-pod
  Normal   Started           4m44s                kubelet, ip-10-0-142-115.eu-central-1.compute.internal  Started container fio-pod

So here it waits roughly 19 minutes between those two "timed out waiting for the condition" messages.
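
(A minimal sketch of how such a per-pod timeline can be pulled, using the namespace and pod name from the test above:)

# Event timeline for one stuck pod
oc -n pvcscalea0 describe pod pvc-pod-qj9kb

# All FailedMount warnings in the namespace, oldest first
oc -n pvcscalea0 get events --field-selector reason=FailedMount --sort-by=.lastTimestamp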

I will update this BZ later with the test result for OCP v4.3 + OCS v4.4.

Comment 13 Humble Chirammal 2020-05-19 12:41:30 UTC
(In reply to Elvir Kuric from comment #12)
> (In reply to Humble Chirammal from comment #11)
> > Elvir,  let me understand some more here:
> > 
> > Is the below table reflect your testing result? :
> > 
> > OCS 4.3 +  OCP 4.3 -> works
> > OCS 4.3 +  OCP 4.4 -> fail
> > OCS 4.4 +  OCP 4.4 -> fail
> I tested so far 
> 
> OCP v4.5 + OCS v4.4 - fail
> OCP v4.4 + OCS v4.4 - fail 
> OCP v4.3 + OCS v4.3 - works 
> 
> I did not tested yet "OCS 4.3 +  OCP 4.4" 


Sure, I was trying to isolate the component here.

>  
> > 
> > Also, what is the result of:
> > 
> > OCS 4.4 +  OCP 4.3 ??
> waiting on quay.io to be operational in order to test. I expect to have this
> information today later. 

Sure, that should help us!

> 
> > 
> > Are all the tests performed in same kind of setup? say: same cloud provider,
> > number of nodes, spec ..etc?
> 
> yes: aws - m5.4xlarge - 6 OCS nodes, 12 app nodes - same environment. 
> > 
> > 
> > When you say, pods are stuck at `Container creating` stage, is the PVCs are
> > also getting created when pod comes up ? of PVCs are created upfront and pod
> > just use those to attach volumes?
> 
> PVCs are created - I can see desired number of PV/PVCs in "oc get pv/pvc",
> also I can see that this number is equal to number I get on ceph side with 
> 
> # rbd ls example-storagecluster-cephblockpool | wc -l    
> 
> ie. If I want 2000 PVCs, then "# rbd ls example-storagecluster-cephblockpool
> | wc -l   " will show 2000 on ceph side. 
> 
> The problem here is that pods for above configurations stay very long in
> "ContainerCreating" state  ( images for pods are present on machines so no
> issue from that side ) and these ( second batch of 1000 pods ) starts after
> 90mins, or longer time.
> 
> From an single pod logs we see 
> 
>  Normal   Scheduled         <unknown>            default-scheduler          
> Successfully assigned pvcscalea0/pvc-pod-qj9kb to
> ip-10-0-142-115.eu-central-1.compute.internal
>   Warning  FailedMount       23m                  kubelet,
> ip-10-0-142-115.eu-central-1.compute.internal  Unable to attach or mount
> volumes: unmounted volumes=[persistentvolume], unattached
> volumes=[default-token-wh87w persistentvolume]: timed out waiting for the
> condition
>   Warning  FailedMount       4m59s (x9 over 25m)  kubelet,
> ip-10-0-142-115.eu-central-1.compute.internal  Unable to attach or mount
> volumes: unmounted volumes=[persistentvolume], unattached
> volumes=[persistentvolume default-token-wh87w]: timed out waiting for the
> condition
>   Normal   Pulled            4m45s                kubelet,
> ip-10-0-142-115.eu-central-1.compute.internal  Container image
> "quay.io/openshift-scale/scale-ci-workload" already present on machine
>   Normal   Created           4m44s                kubelet,
> ip-10-0-142-115.eu-central-1.compute.internal  Created container fio-pod
>   Normal   Started           4m44s                kubelet,
> ip-10-0-142-115.eu-central-1.compute.internal  Started container fio-pod
> 
> so here it waits cca 19mins between these two error messages  "timed out
> waiting for the condition". 

Thanks, that is why I was requesting that information; I wanted to check whether the delay is in
provisioning or in attaching. From your reply it looks like it is on the attaching side, not the
provisioning side.

> 
> Will update later this bz with test result for OCP v4.3 + OCS v4.4

Sure.

Comment 14 Humble Chirammal 2020-05-19 12:55:35 UTC
[Side note]

I was going through the must-gather and the other log directory mentioned in c#4; unfortunately, I could not find the relevant logs. There are many logs, but I don't see any belonging to CSI. Can you re-check whether it contains the CSI pod logs for this test?

Comment 15 Humble Chirammal 2020-05-19 13:19:09 UTC
(In reply to Raz Tamir from comment #10)
> Thanks Elvir,
> 
> Based on this info, marking as regression

I don't think this qualifies as a regression. 

A few things to summarize:

1) IIUC, 1500 pods per 3 nodes is the supported/qualified limit (this may be documented too). But whether that limit scales with each additional 3-node increment is still a question for me: with 12 nodes, are we saying we could support 6000 pods? With 24 nodes, 12000?

2) Also, this is not about pods failing to come up; rather, IIUC, it is about a **delay** in pods getting spawned. If that were the concern, in previous releases we would have had to state that pods will be up and running within some 'X' amount of time, and drawing such a conclusion is really difficult because it depends on many other factors.

Comment 16 Yaniv Kaul 2020-05-19 14:33:22 UTC
The matrix Elvir gave above makes it clear that it's a regression. As for whether it's a reason to block the release: surely it isn't, for two reasons:
1. We believe it's an edge case - in the high end of the scale.
2. It is, as you've commented, just a delay, not a clear failure.

(I'm still very interested to understand where it's coming from).

Michael, thoughts?

Comment 18 Yaniv Kaul 2020-05-19 16:51:44 UTC
Please describe your VMware setup.
Are you sure the hypervisors, NICs, or storage are not overloaded?

Comment 20 Humble Chirammal 2020-05-20 03:29:14 UTC
(In reply to Yaniv Kaul from comment #16)
> The matrix Elvir gave above makes it clear it's a regression. If it's a
> reason to block the release - surely it isn't, for two reasons:
> 1. We believe it's an edge case - in the high end of the scale.
> 2. It is, as you've commented, just a delay, not a clear failure.
> 
> (I'm still very interested to understand where it's coming from).
> 


I am almost sure this is coming either from the OCP sidecars or from the kubelet/controller shipped with OCP 4.4.

The reason for this assumption is that there is NO code change between
OCS 4.3 and OCS 4.4 with respect to the Ceph CSI builds.

Comment 21 Humble Chirammal 2020-05-20 03:31:43 UTC
Elvir, while testing the different combinations, does the base/node software remain the same, e.g. the kernel and OS release?

Comment 29 Humble Chirammal 2020-05-20 09:25:12 UTC
[Action Plan]

1) We have to capture the logs and analyze them at least once.

Unfortunately, neither must-gather (c#22 and c#3) has the logs in it. Not sure why they are missing at the moment.

2) Cross-check with the OCP team whether there are any changes that could cause this.

Comment 32 Michael Adam 2020-05-20 10:08:13 UTC
Also, to narrow it down, what is slower than before?

- pvc creation?
- pod creation?
- pvc attaching / mounting?

Can you do a 1000-PVC creation run without pods/mounting?
Etc...
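
(A minimal sketch of such a PVC-only run, for reference; the namespace and storage class names below are assumptions and would need to be adjusted to the cluster under test:)

# Create N PVCs with no pods, to time provisioning in isolation
N=1000
oc new-project pvc-scale-test   # assumption: test namespace does not exist yet
for i in $(seq 1 "$N"); do
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scale-pvc-$i
  namespace: pvc-scale-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: example-storagecluster-ceph-rbd   # assumption: RBD storage class name
  resources:
    requests:
      storage: 1Gi
EOF
done

# Watch how long it takes for all PVCs to reach Bound
oc -n pvc-scale-test get pvc --no-headers | awk '{print $2}' | sort | uniq -c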

Comment 35 Jan Safranek 2020-05-21 05:18:31 UTC
Does OCS use external-attacher? You may be hitting this: https://github.com/kubernetes/kubernetes/issues/84169.
Check this comment for workarounds: https://github.com/kubernetes/kubernetes/issues/84169#issuecomment-545692734 - does this help?
There is also a PR in progress: https://github.com/kubernetes/kubernetes/pull/91307

Comment 36 Humble Chirammal 2020-05-21 06:13:43 UTC
(In reply to Jan Safranek from comment #35)
> Does OCS use external-attacher? You may be hitting this:
> https://github.com/kubernetes/kubernetes/issues/84169.
> Check this comment for workarounds:
> https://github.com/kubernetes/kubernetes/issues/84169#issuecomment-545692734
> - does this help?
> There is also PR in progress:
> https://github.com/kubernetes/kubernetes/pull/91307

Thanks, Jan! It looks like we are hitting the same issue.

I think the workaround is setting the kube-controller-manager flag disable-attach-detach-reconcile-sync to true.


Elvir, can we start the controller-manager with the above flag set to true and see the result?
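
(For reference, --disable-attach-detach-reconcile-sync is an upstream kube-controller-manager flag that defaults to false. On OpenShift the controller manager is operator-managed, so the flag has to go through its configuration, as comment 41 below describes; a quick way to check whether the rendered config already carries it:)

# Look for the flag in the kube-controller-manager config map edited in comment 41
oc -n openshift-kube-controller-manager get cm config -o yaml \
  | grep -- 'disable-attach-detach-reconcile-sync'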

Comment 37 Humble Chirammal 2020-05-21 06:46:14 UTC
// Additional Note:

Adding --worker-threads and --reconcileSync is one of the things we could try:
https://github.com/ceph/ceph-csi/blob/master/deploy/cephfs/kubernetes/csi-cephfsplugin-provisioner.yaml#L70

Also just noticed:

https://github.com/rook/rook/issues/5366, where it is claimed that setting `disable-attach-detach-reconcile-sync` to true fixed the issue for them.

One remaining question is why this issue is not visible with the OCP 4.3 combination.

AFAICT, the controller-manager code in this area is unchanged even in OCP 4.3.
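
(A sketch of how one might inspect the current sidecar arguments before tuning them; the deployment name below is an assumption based on the default rook/OCS naming, and the exact flag spellings mentioned above should be verified against each sidecar's --help:)

# List each sidecar container of the RBD provisioner with its arguments
oc -n openshift-storage get deployment csi-rbdplugin-provisioner \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.args}{"\n"}{end}'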

Comment 38 Elvir Kuric 2020-05-21 08:20:37 UTC
I have to figure out how to set disable-attach-detach-reconcile-sync to true. I will update the BZ once I test with this.

Comment 40 Humble Chirammal 2020-05-21 09:27:13 UTC
Elvir, I have applied this configuration change on your system; can you please retest?

Comment 41 Elvir Kuric 2020-05-21 13:28:45 UTC
I did some more tests.

- With 1000 pods in the "ContainerCreating" state and none of them Running, the following was done:

# oc get cm config -n openshift-kube-controller-manager -o yaml  > controllermanager.yml

Add

disable-attach-detach-reconcile-sync:true

to the configuration and apply the change:

# oc apply -f controllermanager.yml

After this change, once the controller-manager pods restart, the pods start moving to the Running state.

So pods are starting, but some things are still problematic:

- If the user requests 1000 pods, none of the pods will be started and marked "Ready" with their PVC attached before all the pods are created.
After all pods are created, they gradually start moving to the Ready state. Previously, pods moved to "Ready" while other pods were still being created.

- Pods start more slowly than with OCP v4.3 / OCS v4.3, and the start time for pods with a PVC attached is poor.


Also, how do I make this change permanent so that I can run more extensive tests?
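
(One possible way to persist this, assuming the kube-controller-manager operator honors extendedArguments via its unsupportedConfigOverrides field; this path is explicitly unsupported, the field names here are assumptions to be verified on the cluster, and a direct edit of the config map may be reverted by the operator:)

# Persist the flag through the operator CR instead of editing the config map directly
oc patch kubecontrollermanager cluster --type=merge -p \
  '{"spec":{"unsupportedConfigOverrides":{"extendedArguments":{"disable-attach-detach-reconcile-sync":["true"]}}}}'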

Comment 43 Humble Chirammal 2020-05-22 14:34:47 UTC
Thanks, Elvir, for the detailed testing and for sharing the result (c#42). As this is not specific to OCS and is mostly on the controller side, we can work with the OCP team; I will clone this BZ and proceed.

PS: I am also experimenting with some of the sidecar options mentioned in c#37. I will update if I get any positive results with those.

Comment 45 Humble Chirammal 2020-05-26 04:07:03 UTC
Elad, Raz, considering the debugging and isolation done so far, can we remove the regression/blocker flags, etc.?

Comment 48 Michael Adam 2020-05-26 15:25:21 UTC
(In reply to Humble Chirammal from comment #45)
> Elad,Raz, considering the debugging so far or the isolation, can we remove
> the regression/blocker flag ..etc?

This cannot even be a question any more.

* it is NOT an OCS regression.
* it *IS* an OCP regression.
* upstream issue: https://github.com/kubernetes/kubernetes/issues/84169

==> we need to:
* remove the blocker flag here
  ==> DONE and updated the Subject

* find or create a bug in OCP to track this and have this one depend on the OCP bug
  ==> Can someone please take this up?

Comment 50 Michael Adam 2020-05-26 15:27:21 UTC
resetting assignees

Comment 51 Michael Adam 2020-05-26 15:29:04 UTC
OK, the bot is re-adding the blocker flag because of the regression keyword; removing the regression tag to be able to remove the blocker...

Comment 52 Humble Chirammal 2020-05-27 04:49:12 UTC
(In reply to Michael Adam from comment #48)
> (In reply to Humble Chirammal from comment #45)

> 
> * find or create a bug in OCP to track this and have this one depend on the
> OCP bug
>   ==> Can someone please take this up?

There is already an OCP bug which I created to track it; it is available here:
https://bugzilla.redhat.com/show_bug.cgi?id=1839933

Comment 53 Humble Chirammal 2020-05-27 04:56:26 UTC
Thinking about a solution, or about avoiding the possibility of getting into this situation at all:

This is what I could come up with. The Ceph CSI driver does not make use of the CONTROLLER PUBLISH and UNPUBLISH calls; these capabilities are not exposed by the driver. However, over the history of CSI and Ceph CSI plugin development we ***were*** making use of the external-attacher sidecar until now. Upstream now has a feature to `skip attach` as part of the CSIDriver object. Considering this went GA with Kubernetes 1.18, my proposal (https://github.com/ceph/ceph-csi/issues/1106) is to get rid of the external-attacher completely in the Ceph CSI driver implementation by making use of this field. However, this needs extensive testing in the driver, etc., which we will get to with release v3.0.0 of upstream.
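
(For context, the `skip attach` feature referred to here is the CSIDriver object's attachRequired field; a minimal sketch of such an object, where the driver name is an assumption for the OCS RBD driver. As comment 59 below notes, removing the attacher was later discouraged, so this is illustrative only:)

# A CSIDriver with attachRequired: false tells Kubernetes to skip VolumeAttachment
# handling for this driver, so no external-attacher is needed (storage.k8s.io/v1 is GA in 1.18)
cat <<EOF | oc apply -f -
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: openshift-storage.rbd.csi.ceph.com   # assumption: OCS RBD driver name
spec:
  attachRequired: false
  podInfoOnMount: false
EOF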

Comment 54 Humble Chirammal 2020-06-04 08:29:28 UTC
Considering this is not a regression on the OCS end and it is being tracked for OCP, I would like to consider this for OCS 4.5. This also only shows up when scaling to a very large set, and it is purely a performance issue.

Comment 55 Elad 2020-06-15 11:40:49 UTC
Removing the needinfo as it was already decided that it's not a blocker for 4.4. Sorry for the late reply.

Comment 57 Mudit Agarwal 2020-06-25 03:36:34 UTC
From the OCS side this still falls under ceph-csi, hence changing the component field.

Looking at the discussions in https://github.com/ceph/ceph-csi/issues/1106, I don't think we are going to disable the external-attacher soon.
It doesn't look like a 4.5 candidate to me.


@Humble any comments?

Comment 58 Michael Adam 2020-06-26 12:52:00 UTC
fix direction of dependency

Comment 59 Michael Adam 2020-06-26 12:56:52 UTC
(In reply to Humble Chirammal from comment #53)
> Thinking about the solution or avoiding the possibilities of getting into
> the situation:
> 
> This is what I could come up with. That said, the Ceph CSI driver does not
> make use of 
> CONTROLLER PUBLISH and UNPUBLISH calls or these capabilities are not exposed
> from the driver.
> However with the history of the development of CSI and Ceph CSI plugin we
> ***were*** making use of external-attacher sidecar till now. But, the
> upstream has a feature or solution to `skip attach` being a part of the
> CSIDriver object. Considering this is GA with "1.18", my proposal is
> (https://github.com/ceph/ceph-csi/issues/1106) to get rid of
> 'external-attacher" completely in Ceph CSI driver implementation by making
> use of this field. However this needs extensive testing in the driver..etc
> which we will get on with release v3.0.0 of upstream.

Based on https://github.com/ceph/ceph-csi/issues/1106#issuecomment-642252773 , removing the attacher is discouraged.

Let's use this bug as a tracker for the OCP bug in the meantime.

The OCP BZ is targeted for 4.6. ==> moving this BZ out to OCS 4.6 too

Comment 67 Yaniv Kaul 2020-07-12 11:02:43 UTC
This one can move to ON_QA - the tracked BZ is already verified (for OCP 4.6).
The 4.5 one is still on POST - https://bugzilla.redhat.com/show_bug.cgi?id=1854311
The 4.4 one is still on ASSIGNED - https://bugzilla.redhat.com/show_bug.cgi?id=1854312

Comment 68 Humble Chirammal 2020-07-13 04:59:28 UTC
(In reply to Yaniv Kaul from comment #67)
> This one can move to ON_QA - the tracked BZ is already verified (for OCP
> 4.6).
> The 4.5 one is still on POST -
> https://bugzilla.redhat.com/show_bug.cgi?id=1854311
> The 4.4. one is still on ASSIGNED -
> https://bugzilla.redhat.com/show_bug.cgi?id=1854312

Hmm... ideally this should be moved to ON_QA only when the OCP 4.5 and OCP 4.4 bugs are ON_QA. The reason being, even though this is a tracker for OCS 4.6, the validation test bed can be OCP 4.5, OCP 4.4, etc., where this will fail unless the fixed build is available.

Comment 69 Yaniv Kaul 2020-07-13 05:33:10 UTC
(In reply to Humble Chirammal from comment #68)
> (In reply to Yaniv Kaul from comment #67)
> > This one can move to ON_QA - the tracked BZ is already verified (for OCP
> > 4.6).
> > The 4.5 one is still on POST -
> > https://bugzilla.redhat.com/show_bug.cgi?id=1854311
> > The 4.4. one is still on ASSIGNED -
> > https://bugzilla.redhat.com/show_bug.cgi?id=1854312
> 
> hmm... Ideally this should be moved to ON_QA when OCP 4.5 and OCP 4.4 are
> ON_QA. The reason being, even though this is a tracker for OCS  4.6, the
> validation test bed can be OCP 4.5, OCP 4.4..etc, where this will be failed
> unless the fix build is not available.

If that's a tracker for ALL the above bugs, we can postpone this move. If we want QE to test the functionality, it makes more sense to split.

Comment 70 Ramakrishnan Periyasamy 2020-07-23 06:45:58 UTC
The respective OCP BZ https://bugzilla.redhat.com/show_bug.cgi?id=1854311 is already ON_QA.

Update from QE: we were able to create more than 1200 pods without any issues; this was tested in an independent-mode cluster. Note: earlier we observed the PVC attach problem with just 300+ pods.

OCP Version: 4.5.0-0.nightly-2020-07-22-074214
OCS Version: ocs-operator.v4.5.0-494.ci

@Elvir, have you tried testing on OCP 4.5 with the above build or the latest? Any updates?

Comment 71 Elvir Kuric 2020-07-23 07:14:10 UTC
(In reply to Ramakrishnan Periyasamy from comment #70)
> Respective OCP bz https://bugzilla.redhat.com/show_bug.cgi?id=1854311 is
> already in ON_QA 
> 
> Update from QE: we were able to create mode than 1200+ pods without any
> issues, tested this in independent mode cluster. Note: Earlier we observed
> PVC attach problem with just 300+ pods.
> 
> OCP Version: 4.5.0-0.nightly-2020-07-22-074214
> OCS Version: ocs-operator.v4.5.0-494.ci
> 
> @Elvir have you tried testing in OCP 4.5 with above build or the latest ?
> any updates ?

Yes, with OCP v4.6 it worked fine: https://bugzilla.redhat.com/show_bug.cgi?id=1836198#c64

Comment 77 emahoney 2020-07-30 18:09:13 UTC
*** Bug 1862196 has been marked as a duplicate of this bug. ***

Comment 78 Yaniv Kaul 2020-08-02 11:06:22 UTC
*** Bug 1861438 has been marked as a duplicate of this bug. ***

Comment 80 Mudit Agarwal 2020-08-03 11:42:47 UTC
This is already ON_QA, but we don't have qa ack?

Comment 81 Humble Chirammal 2020-08-03 12:17:19 UTC
(In reply to Mudit Agarwal from comment #80)
> This is already ON_QA, but we don't have qa ack?

The QE ack was there, but it looks like it was dropped during some changes.
Raz, can you bring back the QE ack here?

Comment 82 Elad 2020-08-03 12:22:08 UTC
As this was fixed in OCP 4.6 and backported to 4.5.z and 4.4.z, should we also create a tracker for OCS 4.4.z and keep this one for 4.5?

Comment 83 Mudit Agarwal 2020-08-03 14:30:49 UTC
(In reply to Elad from comment #82)
> As this was fixed in OCP 4.6 and backported to 4.5.z and 4.4.z, should we
> also create tracker for OCS 4.4.z and keep this for 4.5?

If I am not wrong, this is just an OCP tracker and there is no change from the OCS side. So it is up to QE whether you want a 4.4.z BZ to track the testing.
From the dev side I don't see a need to open a 4.4.z BZ.

Also, qa_ack is still pending. Can you please provide it? I don't have the permissions.

