Bug 1511110 - Logging deployments are not maintaining the association of DC to PVC claim for Elasticsearch causing deployment failures
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.7.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-08 17:28 UTC by Peter Portante
Modified: 2017-12-18 13:23 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
This fixes a regression introduced in 3.7 in which a DC did not reuse the PVC that was previously specified within it.
Clone Of:
Environment:
Last Closed: 2017-12-18 13:23:26 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2017:3464
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.7 bug fix and enhancement update
Last Updated: 2017-12-18 18:22:05 UTC

Description Peter Portante 2017-11-08 17:28:25 UTC
On starter-ca-central-1, an upgrade of logging from an earlier version of 3.7 (v3.7.0-0.143.7) to v3.7.0-0.178.2 left the ES pods unable to start due to "unable to attach volume" errors:

  Multi-Attach error for volume "pvc-61e094bc-b375-11e7-b95b-02d8407159d1"
  Volume is already exclusively attached to one node and can't be attached
  to another.

  kubelet, ip-172-31-19-199.ca-central-1.compute.internal   Unable to mount
  volumes for pod "logging-es-data-master-m5d1xc5i-11-8h4v7_logging
  (dd07bb1b-c493-11e7-84d1-02d8407159d1)": timeout expired waiting for
  volumes to attach/mount for pod "logging"/
  "logging-es-data-master-m5d1xc5i-11-8h4v7". list of unattached/unmounted
  volumes=[elasticsearch-storage]


Looking through the existing pods that are still running, we see the following association of DC to PVC claim name:

# oc describe pod -l component=es | grep -E "(ClaimName:|^Name:)"
Name:		logging-es-data-master-m5d1xc5i-4-gdm6q
    ClaimName:	logging-es-2
Name:		logging-es-data-master-wo54khfh-4-fxchw
    ClaimName:	logging-es-1
Name:		logging-es-data-master-yo5htett-4-qwg8d
    ClaimName:	logging-es-0

However, the newly updated DCs have changed the claim names associated with each DC:

# oc describe dc -l component=es | grep -E "(ClaimName:|^Name:)"
Name:		logging-es-data-master-m5d1xc5i
    ClaimName:	logging-es-0
Name:		logging-es-data-master-wo54khfh
    ClaimName:	logging-es-2
Name:		logging-es-data-master-yo5htett
    ClaimName:	logging-es-1

When we updated the DCs to restore the proper association of PVC to DC, logging started working just fine.
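
The comparison above can also be done mechanically. The following is only a minimal sketch, assuming the output of the two `oc describe ... | grep` commands has been captured as text. The helper names `parse_claims`, `dc_name_of`, and `find_mismatches` are hypothetical, and the sketch assumes a pod name is its DC name followed by `-<deployment number>-<random suffix>`, as in the listings above.

```python
import re

# Sample data copied from the listings in this report.
POD_OUTPUT = """\
Name:       logging-es-data-master-m5d1xc5i-4-gdm6q
    ClaimName:  logging-es-2
Name:       logging-es-data-master-wo54khfh-4-fxchw
    ClaimName:  logging-es-1
Name:       logging-es-data-master-yo5htett-4-qwg8d
    ClaimName:  logging-es-0
"""

DC_OUTPUT = """\
Name:       logging-es-data-master-m5d1xc5i
    ClaimName:  logging-es-0
Name:       logging-es-data-master-wo54khfh
    ClaimName:  logging-es-2
Name:       logging-es-data-master-yo5htett
    ClaimName:  logging-es-1
"""


def parse_claims(text):
    """Map each 'Name:' value to the 'ClaimName:' value that follows it."""
    claims = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("ClaimName:") and current is not None:
            claims[current] = line.split(":", 1)[1].strip()
    return claims


def dc_name_of(pod_name):
    """Strip the assumed '-<deployment number>-<suffix>' tail from a pod name."""
    match = re.match(r"^(.*)-\d+-[a-z0-9]+$", pod_name)
    return match.group(1) if match else pod_name


def find_mismatches(pod_text, dc_text):
    """Return {dc: (claim used by running pod, claim now in DC)} where they differ."""
    pod_claims = {dc_name_of(p): c for p, c in parse_claims(pod_text).items()}
    dc_claims = parse_claims(dc_text)
    return {
        dc: (pod_claims[dc], dc_claims[dc])
        for dc in sorted(dc_claims)
        if dc in pod_claims and pod_claims[dc] != dc_claims[dc]
    }


mismatches = find_mismatches(POD_OUTPUT, DC_OUTPUT)
for dc, (pod_claim, dc_claim) in mismatches.items():
    print(f"{dc}: running pod uses {pod_claim}, DC now references {dc_claim}")
```

With the data from this report, all three DCs are flagged, which matches the observation that every DC had been pointed at a different claim than the one its running pod holds.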

Comment 1 Rich Megginson 2017-11-08 17:37:03 UTC
Eric, is this related to what you are working on?

Comment 2 ewolinet 2017-11-08 17:48:33 UTC
The logic to maintain the same PVC claim for a DC should be in openshift-ansible 3.7.0-0.192.0.

Comment 3 Anping Li 2017-11-09 11:00:29 UTC
The claims were preserved during the upgrade with openshift-ansible-3.7.0-0.198.1:

1. Claim names on v3.6:

# oc describe pod -l component=es | grep -E "(ClaimName:|^Name:)"
Name:			logging-es-data-master-aglrm65m-1-qlq5z
    ClaimName:	logging-es-2
Name:			logging-es-data-master-ftcbo50f-1-ft3zp
    ClaimName:	logging-es-0
Name:			logging-es-data-master-u1j0vpoe-1-zmm01
    ClaimName:	logging-es-1

2. Claim names after the upgrade to v3.7:

# oc describe pod -l component=es | grep -E "(ClaimName:|^Name:)"
Name:			logging-es-data-master-aglrm65m-2-2kxmw
    ClaimName:	logging-es-2
Name:			logging-es-data-master-ftcbo50f-2-nzltr
    ClaimName:	logging-es-0
Name:			logging-es-data-master-u1j0vpoe-2-rpksq
    ClaimName:	logging-es-1

Comment 6 errata-xmlrpc 2017-12-18 13:23:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3464

