Bug 1511110

Summary: Logging deployments are not maintaining the association of DC to PVC claim for Elasticsearch causing deployment failures
Product: OpenShift Container Platform
Reporter: Peter Portante <pportant>
Component: Logging
Assignee: ewolinet
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: high
Version: 3.7.0
CC: aos-bugs, ewolinet, rmeggins
Target Milestone: ---
Keywords: OpsBlocker
Target Release: 3.7.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
This fixes a regression introduced in 3.7 where the PVC previously specified within a DC was not reused.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-18 13:23:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Peter Portante 2017-11-08 17:28:25 UTC
On starter-ca-central-1, an upgrade of logging from an earlier version of 3.7 (v3.7.0-0.143.7) to v3.7.0-0.178.2 left the ES pods unable to start due to "unable to attach volume" errors:

  Multi-Attach error for volume "pvc-61e094bc-b375-11e7-b95b-02d8407159d1"
  Volume is already exclusively attached to one node and can't be attached
  to another.

  kubelet, ip-172-31-19-199.ca-central-1.compute.internal   Unable to mount
  volumes for pod "logging-es-data-master-m5d1xc5i-11-8h4v7_logging
  (dd07bb1b-c493-11e7-84d1-02d8407159d1)": timeout expired waiting for
  volumes to attach/mount for pod "logging"/
  "logging-es-data-master-m5d1xc5i-11-8h4v7". list of unattached/unmounted
  volumes=[elasticsearch-storage]


Looking through the existing pods that are still running, we see the following association of DC to PVC claim name:

# oc describe pod -l component=es | grep -E "(ClaimName:|^Name:)"
Name:		logging-es-data-master-m5d1xc5i-4-gdm6q
    ClaimName:	logging-es-2
Name:		logging-es-data-master-wo54khfh-4-fxchw
    ClaimName:	logging-es-1
Name:		logging-es-data-master-yo5htett-4-qwg8d
    ClaimName:	logging-es-0

However, the newly updated DCs have changed the claim names associated with each DC:

# oc describe dc -l component=es | grep -E "(ClaimName:|^Name:)"
Name:		logging-es-data-master-m5d1xc5i
    ClaimName:	logging-es-0
Name:		logging-es-data-master-wo54khfh
    ClaimName:	logging-es-2
Name:		logging-es-data-master-yo5htett
    ClaimName:	logging-es-1

When we updated the DCs to restore the proper association of PVC to DC, logging started working just fine.
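
The comparison above can also be done mechanically. What follows is only a minimal sketch, not part of the fix: it parses the two `oc describe ... | grep` outputs quoted in this report and lists every DC whose claim differs from the claim its running pod holds. The function names (parse_claims, dc_name_of, find_mismatches) are made up for illustration, and it assumes pod names have the form <dc name>-<deployment number>-<random suffix>, as they do in the output above.

```python
import re

# Output copied from the two commands in this report (tabs replaced by spaces).
POD_OUTPUT = """\
Name:       logging-es-data-master-m5d1xc5i-4-gdm6q
    ClaimName:  logging-es-2
Name:       logging-es-data-master-wo54khfh-4-fxchw
    ClaimName:  logging-es-1
Name:       logging-es-data-master-yo5htett-4-qwg8d
    ClaimName:  logging-es-0
"""

DC_OUTPUT = """\
Name:       logging-es-data-master-m5d1xc5i
    ClaimName:  logging-es-0
Name:       logging-es-data-master-wo54khfh
    ClaimName:  logging-es-2
Name:       logging-es-data-master-yo5htett
    ClaimName:  logging-es-1
"""


def parse_claims(describe_output):
    """Map each 'Name:' entry to the 'ClaimName:' that follows it."""
    claims = {}
    current = None
    for line in describe_output.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Name":
            current = value
        elif key == "ClaimName" and current is not None:
            claims[current] = value
    return claims


def dc_name_of(pod_name):
    """Strip the assumed '-<deployment number>-<random suffix>' tail of a pod name."""
    return re.sub(r"-\d+-[a-z0-9]+$", "", pod_name)


def find_mismatches(pod_output, dc_output):
    """Return {dc name: (claim held by running pod, claim now in the DC)} where they differ."""
    pod_claims = {dc_name_of(p): c for p, c in parse_claims(pod_output).items()}
    dc_claims = parse_claims(dc_output)
    return {
        dc: (pod_claims[dc], dc_claims[dc])
        for dc in sorted(pod_claims)
        if dc in dc_claims and pod_claims[dc] != dc_claims[dc]
    }


for dc, (pod_claim, dc_claim) in find_mismatches(POD_OUTPUT, DC_OUTPUT).items():
    print(f"{dc}: running pod uses {pod_claim}, DC now references {dc_claim}")
```

With the output from this cluster, all three DCs are reported as mismatched, which matches the manual reading above.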

Comment 1 Rich Megginson 2017-11-08 17:37:03 UTC
Eric, is this related to what you are working on?

Comment 2 ewolinet 2017-11-08 17:48:33 UTC
The logic to maintain the same PVC claim for a DC should be in openshift-ansible 3.7.0-0.192.0

Comment 3 Anping Li 2017-11-09 11:00:29 UTC
The claims were preserved during the upgrade with openshift-ansible-3.7.0-0.198.1.

1. Claim names on v3.6:

# oc describe pod -l component=es | grep -E "(ClaimName:|^Name:)"
Name:			logging-es-data-master-aglrm65m-1-qlq5z
    ClaimName:	logging-es-2
Name:			logging-es-data-master-ftcbo50f-1-ft3zp
    ClaimName:	logging-es-0
Name:			logging-es-data-master-u1j0vpoe-1-zmm01
    ClaimName:	logging-es-1

2. Claim names after the upgrade to v3.7:

# oc describe pod -l component=es | grep -E "(ClaimName:|^Name:)"
Name:			logging-es-data-master-aglrm65m-2-2kxmw
    ClaimName:	logging-es-2
Name:			logging-es-data-master-ftcbo50f-2-nzltr
    ClaimName:	logging-es-0
Name:			logging-es-data-master-u1j0vpoe-2-rpksq
    ClaimName:	logging-es-1
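
The same before/after check could be scripted against structured output instead of grep. This is a rough sketch under assumptions, not the procedure used for this verification: it assumes the DC list was saved with `oc get dc -l component=es -o json` before and after the upgrade, and it reads claim names from spec.template.spec.volumes[].persistentVolumeClaim.claimName. The names claims_by_dc, changed_claims and sample are hypothetical, and sample() only fabricates a minimal JSON document of that shape for the demonstration.

```python
import json


def claims_by_dc(dc_list_json):
    """Return {dc name: sorted claim names} from a saved `oc get dc -o json` list."""
    result = {}
    for dc in json.loads(dc_list_json)["items"]:
        volumes = dc["spec"]["template"]["spec"].get("volumes", [])
        result[dc["metadata"]["name"]] = sorted(
            v["persistentVolumeClaim"]["claimName"]
            for v in volumes
            if "persistentVolumeClaim" in v
        )
    return result


def changed_claims(before, after):
    """Return {dc name: (claims before, claims after)} for DCs whose claims changed."""
    return {
        name: (before[name], after.get(name))
        for name in before
        if before[name] != after.get(name)
    }


def sample(claims):
    """Hypothetical helper: build a minimal DC list with one PVC volume per DC."""
    return json.dumps({"items": [
        {"metadata": {"name": name},
         "spec": {"template": {"spec": {"volumes": [
             {"name": "elasticsearch-storage",
              "persistentVolumeClaim": {"claimName": claim}}]}}}}
        for name, claim in claims.items()
    ]})


# DC names and claims as seen in the v3.6 and v3.7 pod listings above.
observed = {
    "logging-es-data-master-aglrm65m": "logging-es-2",
    "logging-es-data-master-ftcbo50f": "logging-es-0",
    "logging-es-data-master-u1j0vpoe": "logging-es-1",
}
before = claims_by_dc(sample(observed))
after = claims_by_dc(sample(observed))

changed = changed_claims(before, after)
print("claims preserved" if not changed else f"claims changed: {changed}")
```

Keying on the DC name rather than the pod name avoids having to account for the deployment number and pod suffix, both of which change on every rollout.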

Comment 6 errata-xmlrpc 2017-12-18 13:23:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3464