Bug 1420256
| Summary: | [intservice_public_324] ES is in pending status if deploy logging with dynamic PV enabled |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Logging |
| Version: | 3.5.0 |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Reporter: | Xia Zhao <xiazhao> |
| Assignee: | Jeff Cantrill <jcantril> |
| QA Contact: | Xia Zhao <xiazhao> |
| CC: | aos-bugs, jokerman, juzhao, mmccomas, wabouham, xiazhao |
| Target Release: | 3.5.z |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Doc Type: | No Doc Update |
| Type: | Bug |
| Last Closed: | 2017-10-25 13:00:48 UTC |
Description
Xia Zhao
2017-02-08 09:56:45 UTC
Created attachment 1248574 [details]
full ansible execution logs with dynamic PV enabled
Is your cluster set-up for dynamic provisioning? Here are the 3.2 docs, but same applies to the 3.5 clusters as far as I know: https://docs.openshift.com/enterprise/3.2/install_config/persistent_storage/dynamically_provisioning_pvs.html#enabling-provisioner-plugins
(In reply to Jeff Cantrill from comment #2)
> Is your cluster set-up for dynamic provisioning? Here are the 3.2 docs, but
> same applies to the 3.5 clusters as far as I know:
> https://docs.openshift.com/enterprise/3.2/install_config/persistent_storage/dynamically_provisioning_pvs.html#enabling-provisioner-plugins
Yes. I enabled cloud-provider in master-config when launching the OCP env.
Please attach the JSON output for the various resources (pod, pvc, pv). Assuming we are producing the object the same as a comparable 3.4 deployment, this would not be a logging bug.
Attached the JSON output of es, es-ops pods, and the pvcs. No pv was actually created.
# oc get po
NAME READY STATUS RESTARTS AGE
logging-curator-1-lm9x9 1/1 Running 1 4m
logging-curator-ops-1-5427s 1/1 Running 1 4m
logging-es-b6ozqvp5-1-kcxjl 0/1 Pending 0 4m
logging-es-ops-g7s5xgaz-1-k1wdj 0/1 Pending 0 4m
logging-fluentd-c159n 1/1 Running 0 4m
logging-fluentd-k4z36 1/1 Running 0 4m
logging-kibana-1-b8pxh 2/2 Running 0 4m
logging-kibana-ops-1-522z7 2/2 Running 0 4m
# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
foo--0 Pending 5m
foo-ops--0 Pending 5m
# oc get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM REASON AGE
regpv-volume 17G RWX Retain Bound default/regpv-claim 35m
Created attachment 1249779 [details]
pvc_foo-ops--0
Created attachment 1249780 [details]
pvc_foo--0
Created attachment 1249781 [details]
pod_es_ops
Created attachment 1249782 [details]
pod_es
From the es pods' JSON output, I found that the es-ops pod is bound to the wrong claimName, foo--0; it should be foo-ops--0:
# oc get po logging-es-ops-g7s5xgaz-1-k1wdj -o json | grep foo
"claimName": "foo--0"
# oc get po logging-es-b6ozqvp5-1-kcxjl -o json | grep foo
"claimName": "foo--0"
openshift_logging_es_pvc_prefix=foo-
openshift_logging_es_ops_pvc_prefix=foo-ops-
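The mismatch described in this comment can be checked mechanically. The following is a minimal sketch, not part of openshift-ansible: it assumes the claim name follows the pattern visible in this report (the configured prefix followed by `-<index>`), and the helper names `expected_claim_name` and `claim_names` are hypothetical. The pod object is a hand-trimmed stand-in for the `oc get po ... -o json` output; only the `spec.volumes[].persistentVolumeClaim.claimName` path is taken from the real pod schema.

```python
from typing import Dict, List


def expected_claim_name(prefix: str, index: int = 0) -> str:
    """Build the claim name pattern seen in this report: '<prefix>-<index>'."""
    return f"{prefix}-{index}"


def claim_names(pod: Dict) -> List[str]:
    """Return every PVC claimName referenced by a pod object (parsed JSON)."""
    names = []
    for volume in pod.get("spec", {}).get("volumes", []):
        pvc = volume.get("persistentVolumeClaim")
        if pvc:
            names.append(pvc["claimName"])
    return names


# Trimmed-down es-ops pod object matching the grep output in this comment.
es_ops_pod = {
    "spec": {
        "volumes": [
            {
                "name": "elasticsearch-storage",
                "persistentVolumeClaim": {"claimName": "foo--0"},
            }
        ]
    }
}

wanted = expected_claim_name("foo-ops-")
actual = claim_names(es_ops_pod)
print(f"expected {wanted}, pod references {actual}")
```

With the values from this report, the expected name is `foo-ops--0` while the pod references only `foo--0`, which is the symptom being reported.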
Possibly related to: https://bugzilla.redhat.com/show_bug.cgi?id=1399523
I am unable to recreate this, but we have recently made some changes in this area of openshift-ansible that may have resolved it. I am unable to point to specific merges or commits. I recently tested with:
openshift-ansible: HEAD - bdbb8d2ec6e81ec0eb8b5b5c512583392af2004d
Partial Inventory:
openshift_logging_es_ops_pvc_prefix=foo-ops-
openshift_logging_es_ops_pvc_size=1G
Verified on GCE with the latest openshift-ansible; same error as xiazhao reported before.
# oc get pv
No resources found.
# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
foo--0 Pending 13m
foo-ops--0 Pending 13m
# oc get po
NAME READY STATUS RESTARTS AGE
logging-curator-1-kzbmp 1/1 Running 3 12m
logging-curator-ops-1-1l720 1/1 Running 3 12m
logging-es-1bmm4loa-1-3g2f7 0/1 Pending 0 12m
logging-es-ops-xgdflp96-1-4mtsj 0/1 Pending 0 12m
logging-fluentd-x9b33 1/1 Running 0 13m
logging-kibana-1-tdrrk 2/2 Running 0 12m
logging-kibana-ops-1-fhtv8 2/2 Running 0 12m
# oc get po logging-es-1bmm4loa-1-3g2f7 -o json| grep foo -A 2 -B 2
"name": "elasticsearch-storage",
"persistentVolumeClaim": {
"claimName": "foo--0"
}
},
# oc get po logging-es-ops-xgdflp96-1-4mtsj -o json| grep foo -A 2 -B 2
"name": "elasticsearch-storage",
"persistentVolumeClaim": {
"claimName": "foo--0"
}
},
Re-opening this defect; attached the full ansible run log.
Created attachment 1255624 [details]
ansible log - 20170220
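The tabular `oc get pvc` output leaves the VOLUME and CAPACITY columns empty for unbound claims, which makes it awkward to post-process. A sketch of the same check against JSON output follows. It assumes the input is what `oc get pvc -o json` returns, that is, a list object whose `items` each carry `metadata.name` and `status.phase`; the function name `pending_claims` is hypothetical and the sample document is hand-built from the claim names in this report.

```python
import json
from typing import List


def pending_claims(pvc_list_json: str) -> List[str]:
    """Return the names of all claims whose status.phase is 'Pending'."""
    doc = json.loads(pvc_list_json)
    return [
        item["metadata"]["name"]
        for item in doc.get("items", [])
        if item.get("status", {}).get("phase") == "Pending"
    ]


# Hand-built sample mirroring the two claims listed above.
sample = json.dumps(
    {
        "kind": "List",
        "items": [
            {"metadata": {"name": "foo--0"}, "status": {"phase": "Pending"}},
            {"metadata": {"name": "foo-ops--0"}, "status": {"phase": "Pending"}},
        ],
    }
)

print(pending_claims(sample))
```

An empty result would mean every claim has been bound; here both claims are still reported.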
Retested on AWS with openshift-ansible-3.5.15-1.git.0.8d2a456.el7.noarch; the dynamic PV was still not created. Attached the inventory file I used and the ansible execution log. I also consider it a regression that no PVC is created even though these parameters are specified:
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_prefix=foo-
openshift_logging_es_ops_pvc_dynamic=true
openshift_logging_es_ops_pvc_prefix=foo-ops-
[root@ip-172-18-0-24 ~]# oc get pv
No resources found.
[root@ip-172-18-0-24 ~]# oc get pvc -n logging
No resources found.
Created attachment 1258615 [details]
inventory_March_1st_2017
Created attachment 1258619 [details]
Ansible log when used inventory_March_1st_2017 on a cloudprovider enabled env
Can you also provide the following:
<bc> get a dump of their PVC and of the storageclass
<bc> they need a storage class defined that they can provision against
I think we already have the PVC but need an understanding of the storageclass. To my knowledge, we have parity in PVC generation between the 3.5 ansible and the 3.4 deployer, so I would expect logging to still deploy using dynamic PVC allocation.
I noticed the PVCs you have do not have the proper annotation. But during investigation I found an issue with creating PVCs in general that is fixed by: https://github.com/openshift/openshift-ansible/pull/3548
This resolves the issue, assuming the logging namespace does not already have PVCs.
Commits pushed to master at https://github.com/openshift/openshift-ansible
https://github.com/openshift/openshift-ansible/commit/bd7f9386fa2cbe201509486b9ba9ac74b23e8f8a
bug 1420256. Initialize openshift_logging pvc_facts to empty
https://github.com/openshift/openshift-ansible/commit/4630305622b6f8d6957f93da72b15fe4bda1fd02
Merge pull request #3548 from jcantrill/bz_1420256_again_reset_pvc_facts bug 1420256. Initialize openshift_logging pvc_facts to empty
(In reply to Jeff Cantrill from comment #19)
> Can you also provide the following:
>
> <bc> get a dump of their PVC and of the storageclass
> <bc> they need a storage class defined that they can provision against
>
> I think we already have the PVC but need understanding of the storageclass.
> To my knowledge, we have parity with PVC generation between 3.5 ansible and
> 3.4 deployer so I would expect logging to still deploy using dynamic PVC
> allocation.
Hi Jeff,
Tested with today's latest code from the openshift-ansible repo, master branch, HEAD revision 4630305622b6f8d6957f93da72b15fe4bda1fd02. The es pod can now start with a dynamic PV provisioned, but the es-ops pod still fails. Could you please take a further look? Thanks!
1. Deployed with these parameters in the inventory file:
openshift_logging_es_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_prefix=foo-
openshift_logging_es_pvc_size=1G
openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_ops_pvc_prefix=foo-ops-
openshift_logging_es_ops_pvc_size=1G
2. The es pod is bound to a dynamic pv but the es-ops pod is Pending:
# oc get po
logging-es-lecuuah9-1-1cq38 1/1 Running 0 9m
logging-es-ops-j9b8tf1m-1-vmllt 0/1 Pending 0 9m
3. # oc get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
foo--0 Bound pvc-589d4995-fff0-11e6-8651-0e370a4933aa 1Gi RWO 9m
foo-ops--0 Pending 9m
# oc describe po logging-es-lecuuah9-1-1cq38
...
elasticsearch-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: foo--0
ReadOnly: false
...
# oc get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM REASON AGE
pvc-589d4995-fff0-11e6-8651-0e370a4933aa 1Gi RWO Delete Bound logging/foo--0 9m
4. Please find the JSON output of pvc foo-ops--0 in the attachment.
Created attachment 1259461 [details]
The es-ops pvc (JSON output)
Created attachment 1259462 [details]
The es pvc (JSON output) which worked fine
You don't have the ops dynamic variable set correctly:
openshift_logging_es_ops_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_ops_pvc_prefix=foo-ops-
openshift_logging_es_ops_pvc_size=1G
It should be:
openshift_logging_es_ops_pvc_dynamic=true
@Jeff
Used the correct parameter and tested on GCE (dynamicProvisioningEnabled is true):
# oc get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM REASON AGE
pvc-c4238016-0234-11e7-91c0-42010af00019 1Gi RWO Delete Bound logging/foo--0 39m
pvc-c775476e-0234-11e7-91c0-42010af00019 1Gi RWO Delete Bound logging/foo-ops--0 39m
# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
foo--0 Bound pvc-c4238016-0234-11e7-91c0-42010af00019 1Gi RWO 39m
foo-ops--0 Bound pvc-c775476e-0234-11e7-91c0-42010af00019 1Gi RWO 39m
# oc get po
NAME READY STATUS RESTARTS AGE
logging-curator-1-8zsbk 1/1 Running 0 37m
logging-curator-ops-1-ml9jg 1/1 Running 7 37m
logging-es-0x646pi9-1-nx5p3 1/1 Running 0 37m
logging-es-ops-xrrpzwmo-1-c4vpt 0/1 Pending 0 37m
logging-fluentd-8h5z9 1/1 Running 0 39m
logging-kibana-1-m448l 2/2 Running 0 37m
logging-kibana-ops-1-2m2md 2/2 Running 0 37m
# oc get po logging-es-0x646pi9-1-nx5p3 -o json| grep foo -A 2 -B 2
"name": "elasticsearch-storage",
"persistentVolumeClaim": {
"claimName": "foo--0"
}
},
# oc get po logging-es-ops-xrrpzwmo-1-c4vpt -o json| grep foo -A 2 -B 2
"name": "elasticsearch-storage",
"persistentVolumeClaim": {
"claimName": "foo-ops--0"
}
},
The ES OPS pod is in Pending status because of a NoVolumeZoneConflict error. This is a known issue on GCE: https://bugzilla.redhat.com/show_bug.cgi?id=1397672
Created attachment 1260285 [details]
es ops pod log, shows NoVolumeZoneConflict error
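For readers unfamiliar with NoVolumeZoneConflict: the scheduler refuses to place a pod on a node whose zone differs from the zone of a volume the pod needs. The sketch below illustrates that rule only and is not the scheduler's implementation. It assumes the zone is carried in the `failure-domain.beta.kubernetes.io/zone` label on both the PV and the nodes; the function name, node names and zone values are hypothetical and were not taken from this cluster.

```python
from typing import Dict, List

# Zone label assumed for Kubernetes releases of this era (assumption of this sketch).
ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"


def nodes_matching_volume_zone(
    pv_labels: Dict[str, str], nodes: Dict[str, Dict[str, str]]
) -> List[str]:
    """Return the nodes on which a pod using this PV could be scheduled.

    A PV without a zone label places no zone restriction on the pod.
    """
    zone = pv_labels.get(ZONE_LABEL)
    if zone is None:
        return sorted(nodes)
    return sorted(
        name for name, labels in nodes.items() if labels.get(ZONE_LABEL) == zone
    )


# Hypothetical layout: the volume was provisioned in a zone with no nodes.
pv_labels = {ZONE_LABEL: "us-central1-a"}
nodes = {
    "node-1": {ZONE_LABEL: "us-central1-b"},
    "node-2": {ZONE_LABEL: "us-central1-b"},
}

candidates = nodes_matching_volume_zone(pv_labels, nodes)
print(candidates or "no node in the volume's zone; the pod stays Pending")
```

In this hypothetical layout no node qualifies, which is consistent with a pod whose claim is Bound yet which remains Pending.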
(In reply to Junqi Zhao from comment #27)
> Created attachment 1260285 [details]
> es ops pod log, shows NoVolumeZoneConflict error
The ES and ES OPS pods are bound to the correct pvc now.
Tested on dynamic-PV-enabled EC2: the ES and ES OPS pods are running well and are bound to the correct pvc now. Since we specified the wrong parameter, closing this defect as 'Not A Bug'.
# oc get po
NAME READY STATUS RESTARTS AGE
logging-curator-1-x135f 1/1 Running 0 11m
logging-curator-ops-1-8ntj1 1/1 Running 0 11m
logging-es-h1zzrt4p-1-v1h85 1/1 Running 0 11m
logging-es-ops-17t409u3-1-t7t8q 1/1 Running 0 11m
logging-fluentd-vhtvk 1/1 Running 0 13m
logging-kibana-1-rsgvv 2/2 Running 0 11m
logging-kibana-ops-1-bvzt6 2/2 Running 0 11m
# oc get po logging-es-h1zzrt4p-1-v1h85 -o json| grep foo -A 2 -B 2
"name": "elasticsearch-storage",
"persistentVolumeClaim": {
"claimName": "foo--0"
}
},
# oc get po logging-es-ops-17t409u3-1-t7t8q -o json| grep foo -A 2 -B 2
"name": "elasticsearch-storage",
"persistentVolumeClaim": {
"claimName": "foo-ops--0"
}
},
# oc get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM REASON AGE
pvc-9e3bec68-0240-11e7-9cf8-0e4d82dcdc76 1Gi RWO Delete Bound logging/foo--0 15m
pvc-a185e5d5-0240-11e7-9cf8-0e4d82dcdc76 1Gi RWO Delete Bound logging/foo-ops--0 15m
# oc get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
foo--0 Bound pvc-9e3bec68-0240-11e7-9cf8-0e4d82dcdc76 1Gi RWO 15m
foo-ops--0 Bound pvc-a185e5d5-0240-11e7-9cf8-0e4d82dcdc76 1Gi RWO 15m
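The wrong-parameter problem discussed in this thread (setting `openshift_logging_es_pvc_dynamic` twice instead of also setting `openshift_logging_es_ops_pvc_dynamic`) is easy to miss by eye. The following sketch shows one way to lint an inventory for it. It is an illustration under simplifying assumptions: it only understands plain `key=value` lines such as those under `[OSEv3:vars]`, the function names are hypothetical, and the rule it applies (ops enabled plus an ops PVC prefix implies the ops dynamic flag should be present) is inferred from this report rather than from openshift-ansible itself.

```python
from typing import Dict, List


def parse_inventory_vars(text: str) -> Dict[str, List[str]]:
    """Collect key=value lines, keeping every value so duplicates stay visible."""
    found: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        found.setdefault(key.strip(), []).append(value.strip())
    return found


def ops_pvc_problems(text: str) -> List[str]:
    """Report duplicated keys and a missing ops dynamic-PVC flag."""
    variables = parse_inventory_vars(text)
    problems = [
        f"{key} is set {len(values)} times"
        for key, values in variables.items()
        if len(values) > 1
    ]
    uses_ops = variables.get("openshift_logging_use_ops", ["false"])[-1].lower() == "true"
    has_ops_prefix = "openshift_logging_es_ops_pvc_prefix" in variables
    if uses_ops and has_ops_prefix and "openshift_logging_es_ops_pvc_dynamic" not in variables:
        problems.append("openshift_logging_es_ops_pvc_dynamic is not set")
    return problems


# The inventory fragment that left the es-ops pod Pending in this report.
inventory = """\
openshift_logging_es_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_prefix=foo-
openshift_logging_es_pvc_size=1G
openshift_logging_use_ops=true
openshift_logging_es_ops_cluster_size=1
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_ops_pvc_prefix=foo-ops-
openshift_logging_es_ops_pvc_size=1G
"""

for problem in ops_pvc_problems(inventory):
    print(problem)
```

Run against that fragment, the check flags both the duplicated `openshift_logging_es_pvc_dynamic` and the absent `openshift_logging_es_ops_pvc_dynamic`.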
@juzhao The status should be set to "Verified" since the original issue had been fixed.
Set to verified according to comment #29
(In reply to Xia Zhao from comment #30)
> @juzhao The status should be set to "Verified" since the original issue had
> been fixed.
It should be 'Not a Bug': there is no code change. We wrongly used openshift_logging_es_pvc_dynamic for es_ops and should have used openshift_logging_es_ops_pvc_dynamic. See comment 25.
(In reply to Junqi Zhao from comment #32)
> (In reply to Xia Zhao from comment #30)
> > @juzhao The status should be set to "Verified" since the original issue had
> > been fixed.
>
> It should be 'Not a Bug', there is no code change, we wrongly used
> openshift_logging_es_pvc_dynamic for es_ops, should use
> openshift_logging_es_ops_pvc_dynamic. see Comment 25
If you read the full history here, you'll find that the ORIGINAL issue was addressed and fixed as mentioned in comment #20, by the PR: https://github.com/openshift/openshift-ansible/pull/3548
Please be careful next time when you resolve a bug as Not A Bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:3049