Bug 1426511
| Summary: | Failed to fit on node if nodeSelector specified when upgrading logging stacks via ansible | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> |
| Component: | Logging | Assignee: | ewolinet |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.5.0 | CC: | aos-bugs, jcantril, juzhao, rmeggins, wmeng |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-04-26 05:36:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | Attachments: | |
**Description** (Junqi Zhao, 2017-02-24 05:41:09 UTC)
Created attachment 1257147 [details]: fully ansible running log

Created attachment 1257150 [details]: ansible inventory file
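For context on the attachment above, the nodeSelector-related settings discussed in the comments below live in the BYO ansible inventory. A minimal sketch only; the [OSEv3:vars] placement and the exact contents of the attached file are assumptions, and the variable values are taken from the comments later in this bug:

```ini
# Sketch of the logging nodeSelector variables discussed in this bug;
# the actual attached inventory may differ.
[OSEv3:vars]
openshift_logging_install_logging=false
openshift_logging_upgrade_logging=true
openshift_logging_es_nodeselector={'logging-infra-elasticsearch':'true'}
openshift_logging_kibana_nodeselector={'logging-infra-fluentd':'true'}
openshift_logging_curator_nodeselector={'logging-infra-fluentd':'true'}
openshift_logging_fluentd_nodeselector={'logging-infra-fluentd':'true'}
```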
Can you attach the dc definitions after the upgrade as well as the node details, so we may confirm we have matching selectors?

Tested again. The nodeSelector logging-infra-fluentd: "true" is present in both the 3.3.1 and the 3.5.0 dc, but the pod is still in Pending status after upgrading to 3.5.0 and fails to fit on nodes. See the attached logging_331_dc_info.txt and logging_350_dc_info.txt.
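A minimal command-line sketch for confirming this kind of mismatch; the logging project name and the pod name are assumptions, not taken from the attachments:

```sh
# Show the Pending pod and the scheduler events behind "failed to fit on nodes"
oc get pods -n logging
oc describe pod logging-es-example -n logging    # hypothetical pod name

# Compare each dc's nodeSelector with the labels actually present on the nodes
oc get dc -n logging \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.nodeSelector}{"\n"}{end}'
oc get nodes --show-labels
```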
Created attachment 1258273 [details]: logging_331_dc_info.txt

Created attachment 1258274 [details]: logging_350_dc_info.txt
Can you attach the node info: 'oc get nodes -o yaml'?

@Junqi I'm asking for the node yaml because I want to confirm the node is labeled with the selector that matches the pod specs.

Found the issue: during the upgrade we stop the cluster, which unlabels the node. We then try to start ES for the upgrade, but it can't be placed because it has the same nodeSelector that is used to deploy Fluentd.

@Junqi, can you use a different node label (or omit it) for Elasticsearch? As part of the upgrade entry point, only Elasticsearch is running for the index migration. However, you are using the same node selector for ES as for Fluentd, and with the node unlabelled so that Fluentd is not deployed, the ES pod can never be scheduled on that node, so the role will fail waiting for the pod to be available.

If you'd still like to verify that we are adding the node selector to the ES DC, I would recommend pre-labelling the node with that label prior to running the playbook with the logging entry point.

Lowering the severity, as this is not a blocker and can be resolved by using a nodeSelector that is different from Fluentd's.

(In reply to ewolinet from comment #10)
> @Junqi,
>
> Can you use a different node label (or omit it) for Elasticsearch?
> As part of the upgrade entry point, for the index migration we have only
> Elasticsearch running. However you are using the same node selector for ES
> as you are for Fluentd, and with the node not being labelled for Fluentd to
> be deployed, this means that the ES pod will never be able to be
> scheduled on that node so the role will fail waiting for the pod to be
> available.
>
> If you'd like to still verify that we are adding the node selector to the ES
> DC, I would recommend pre-labelling the node with that label prior to
> running the playbook with the logging entry point.

I have one question:

1. Logging 3.3.1 was deployed successfully using the same nodeSelector for Curator, ES and Kibana:

curator-nodeselector=logging-infra-fluentd=true
es-nodeselector=logging-infra-fluentd=true
kibana-nodeselector=logging-infra-fluentd=true

With es_nodeselector and fluentd_nodeselector set to different values, the same error happens when upgrading to 3.5.0:

openshift_logging_es_nodeselector={'logging-infra-elasticsearch':'true'}
openshift_logging_kibana_nodeselector={'logging-infra-fluentd':'true'}
openshift_logging_curator_nodeselector={'logging-infra-fluentd':'true'}
openshift_logging_fluentd_nodeselector={'logging-infra-fluentd':'true'}

If we want to use a nodeSelector for ES, Kibana and Curator, should they each have a different value from one another and from fluentd_nodeselector, and should they be deployed on different nodes?

They need to be different than Fluentd's, since this is the mechanism by which Fluentd gets deployed/undeployed. In practice, we should use these selectors and affinity/anti-affinity to spread the components across the infra nodes.

@Junqi To add to what Jeff said above: this is in part due to the way the 3.3 deployer unscheduled the Fluentd pods as part of the upgrade versus how the 3.5 ansible role does it. With the deployer, we did not grant it access to label and unlabel nodes, so we simply deleted the logging-fluentd daemonset object and later recreated it from its template to unschedule and reschedule the pods. With the new role, we do have access to label and unlabel nodes, so we use that approach rather than deleting and recreating the daemonset object.
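For illustration, a sketch of the two mechanisms described above, under the assumption that the logging components run in a project named logging and using a hypothetical node name:

```sh
# 3.3 deployer approach: unschedule Fluentd by deleting the daemonset object,
# later recreating it from its template (namespace name is an assumption)
oc delete daemonset logging-fluentd -n logging

# 3.5 ansible role approach: toggle the node label instead (node name hypothetical)
oc label node node1.example.com logging-infra-fluentd=true --overwrite   # schedule Fluentd
oc label node node1.example.com logging-infra-fluentd-                   # trailing "-" removes the label

# Workaround suggested above: pre-label the node before running the logging playbook
# so an ES dc that shares the logging-infra-fluentd selector can still be scheduled
oc label node node1.example.com logging-infra-fluentd=true
```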
Using a nodeSelector different from Fluentd's, this issue does not exist, but another error happened: https://bugzilla.redhat.com/show_bug.cgi?id=1428711. Setting this defect as VERIFIED.

How did you do the upgrade from 3.3 to 3.5? I'm following the official documentation for 3.4:
https://docs.openshift.com/container-platform/3.4/install_config/upgrading/automated_upgrades.html#preparing-for-an-automated-upgrade

except that I'm using

ansible-playbook -vvv -i /root/ansible-inventory playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.yml

And I get this error message:

MSG:

openshift_release is 3.3 which is not a valid release for a 3.5 upgrade

(In reply to Rich Megginson from comment #16)
> How did you do the upgrade from 3.3 to 3.5? I'm following the official
> documentation for 3.4:
> https://docs.openshift.com/container-platform/3.4/install_config/upgrading/
> automated_upgrades.html#preparing-for-an-automated-upgrade
>
> except that I'm using
>
> ansible-playbook -vvv -i /root/ansible-inventory
> playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.yml
>
> And I get this error message:
>
> MSG:
>
> openshift_release is 3.3 which is not a valid release for a 3.5 upgrade

We specified the following ansible parameters to upgrade from 3.3.1 to 3.5.0:

openshift_logging_install_logging=false
openshift_logging_upgrade_logging=true

Since the upgrade has a lot of errors, we have not upgraded successfully so far. This defect is about the nodeSelector; with a nodeSelector different from Fluentd's, the issue does not exist. We will continue the upgrade testing and will let you know if the upgrade process succeeds.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1129