Description of problem:
- Customer is testing a power cycling scenario with OCP + HPE CSI + multipath.
- After a power cycle event on an OCP worker node with a multipath device attached (provided by HPE CSI) and used by PODs, the POD cannot start when the node comes back until a SCSI rescan operation takes place on the node, followed by a multipath service restart.

Version-Release number of selected component (if applicable):
- OpenShift 4.8.45
- RHCOS 48.84.202206172122-0
- HPE CSI plugin version 1.3

How reproducible:
- Randomly, when abruptly power cycling OCP nodes

Steps to Reproduce:
1. Have a workload (POD) using PVs provided by an external CSI driver (HPE in this case), with multipath configured on the nodes. The node also boots from SAN with multipath enabled.
2. Abruptly power cycle the node where the POD is running (and has the volume attached).
3. The node comes back, but the POD cannot mount the volume until a SCSI rescan happens (which clears devices that are unused and no longer present on the storage array side), followed by a multipath service restart.

Actual results:
- When the node comes back, the POD cannot mount the volume.

Expected results:
- The POD should be able to mount the PV associated with it once the node is back.

Additional info:
I'm providing the information we were able to collect and the location of the log files to inspect / analyze.
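For illustration only, the workaround order described in the steps above (SCSI rescan first, then multipath service restart) can be sketched as a small dry-run script. The report does not say which commands were actually run on the node, so the use of `rescan-scsi-bus.sh --remove` (from sg3_utils) and the `multipathd.service` unit name are assumptions, and the helper names `build_recovery_plan` and `run_plan` are hypothetical. The rescan tool may not be present on every RHCOS image.

```python
import subprocess
from typing import List


def build_recovery_plan() -> List[List[str]]:
    """Return the workaround commands in the order the report describes."""
    return [
        # Assumption: rescan via sg3_utils; --remove drops devices that
        # are no longer present on the storage array side.
        ["rescan-scsi-bus.sh", "--remove"],
        # Assumption: the multipath daemon runs as multipathd.service.
        ["systemctl", "restart", "multipathd.service"],
        # Show the resulting multipath topology for inspection.
        ["multipath", "-ll"],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute each command; return the command lines."""
    handled = []
    for cmd in plan:
        line = " ".join(cmd)
        if dry_run:
            print("[dry-run] " + line)
        else:
            # Stop at the first failure so a failed rescan is not
            # followed by a multipath restart.
            subprocess.run(cmd, check=True)
        handled.append(line)
    return handled


if __name__ == "__main__":
    run_plan(build_recovery_plan(), dry_run=True)
```

The default is a dry run that only prints the commands; executing them for real would require root on the affected node.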
@bzvonar ---> Case is being tracked on the GChat group: https://mail.google.com/chat/u/0/#chat/space/AAAAFrtdLuE

Bill - Yes, I hope to close this case soon --> let's keep this on your tracker for a little bit longer. I am now trying to conclude where we are with this one and to notify Nokia NOM that they need to contact HPE to sort out which CSI driver they should use.

Colum Gaynor - Senior Partner Success Manager, Nokia Global Account

- - - - - - - - - - - - - - - - - -

@jcoscia --> Javier, is it possible for you to write a summary and public update on the support case now, based on Hemant's work, and set the case status to "Waiting For Customer"? Clearly recommend to Nokia NOM that they need to clarify with HPE the correct version of the CSI driver that they should be using. State clearly that Red Hat recommends the case is closed. Basically, we cannot reproduce the original condition, if I understand Hemant correctly? But we have also discovered evidence that they are on an old driver version?

---- From the GChat space -----

Hemant Kumar, Yesterday 5:18 PM
The way I see it:
1. If the customer is running an older version of the CSI driver, they should try with the latest version and report.
2. If this was the DNS error, then again the error is between the customer and HPE.
We do not support this driver. We don't know anything about what version the customer should deploy. I am going to close this bug and ask you guys to work with HPE and NOM. If the bug turns up somewhere in the kube stack, then please loop us in.

Ivan Bodunov, Yesterday 5:23 PM
Ok. So we don't need any call with HPE any more, the response was good enough for us?

Hemant Kumar, Yesterday 5:24 PM
Are we going to support the HPE driver? Why is Nokia not talking to HPE directly? It sounds like we are trying very hard to support a driver we know nothing about.

Ivan Bodunov, Yesterday 5:25 PM
I don't think we should support the HPE driver..

Mihai Idu, Yesterday 5:26 PM
My opinion is that we should try to know both sides (HPE and RH) and build a joint framework on this topic. We never know when we will need the HPE side.

Ivan Bodunov, Yesterday 5:26 PM
But understanding the problem would be nice..

Hemant Kumar, Yesterday 5:26 PM
In this case, the first line of contact should have been HPE.

Hemant Kumar, Yesterday 5:28 PM
The fact that HPE is pointing us to tools they built for dealing with non-graceful shutdown of nodes etc. is a clear indicator that HPE knows best how to deploy their driver. We should figure out why it took so long to talk to HPE.

Hemant Kumar, Yesterday 6:04 PM
@Colum Gaynor so tl;dr: I don't think we need to set up a call with HPE right now. I hear Nokia is setting up a new environment; if the bug surfaces again in that environment, IMO we should set up a call with HPE and Red Hat engineering.

Hemant Kumar, Yesterday 6:10 PM
@jcoscia

---- From the GChat space -----