Bug 1622336
Summary: | etcd migrate playbook fails if controllerLeaseTTL is 0s in master-config.yaml | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Kenjiro Nakayama <knakayam> |
Component: | Cluster Version Operator | Assignee: | Scott Dodson <sdodson> |
Status: | CLOSED ERRATA | QA Contact: | Gaoyun Pei <gpei> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 3.7.0 | CC: | aos-bugs, deads, jokerman, mmccomas, sdodson |
Target Milestone: | --- | ||
Target Release: | 3.7.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The etcd v2 to v3 migration playbooks improperly attempted to assign a TTL of 0 seconds to certain migrated keys when the environment was previously configured for a 0 second TTL.
Consequence: The v2 to v3 migration would fail.
Fix: The migration playbooks now assign a 1 second TTL to migrated keys when those keys had a 0 second TTL configured.
Result: This ensures that TTLs are migrated and that those keys expire after 1s. This effectively provides a 0 second TTL, because the migration runs while the API is offline and those keys expire before the API comes back online.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2018-11-21 11:56:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Description
Kenjiro Nakayama
2018-08-26 12:15:42 UTC
Although we skipped the failed task and completed the playbook manually, the customer is eager to know whether we should set the TTL of "/openshift.io/leases/controllers" to 30s (the playbook default) or leave it as is. Could you please advise whether we should run "oadm migrate etcd-ttl --lease-duration 30" for "/openshift.io/leases/controllers" (like [1]), or whether this is unnecessary when controllerLeaseTTL: 0 was set in master-config.yaml?

[1] https://github.com/openshift/openshift-ansible/blob/release-3.6/roles/etcd_migrate/tasks/add_ttls.yml#L11-L33

Proposed fix: https://github.com/openshift/openshift-ansible/pull/9768

David, can you comment on the validity of setting the controller lease TTL to 0s?

David, Jordan: if we had "controllerLeaseTTL: 0" in master-config.yaml for etcd v2, what is the best TTL for "/openshift.io/leases/controllers" during the migration (etcd v2 -> v3)? Should some default value (30s) be set, or nothing at all?

The controllerLeaseTTL is about leader election for a controller (https://github.com/openshift/origin/blob/release-3.7/pkg/cmd/server/api/v1/types.go#L216-L223). `oc adm migrate etcd-ttl --lease-duration 0` is about migrating from etcd2 to etcd3. I don't think they're related. Why does the controllerLeaseTTL affect the `oc adm migrate` call?

David, we were given a list of keys for which we should re-attach TTLs, and told to attach TTLs in line with the master configuration values that would have affected their original TTLs. To me, the question is what we do when we face a TTL configured as 0s. Do we skip migrating TTLs for that key? Do we set a TTL of 1s because we need to ensure that some TTL exists to clean up the key?

Discussed with David: if we find a configured value of 0s, we should migrate with a 1s TTL so that we can be sure the key expires. Without a TTL being re-attached, there's a chance the key may persist forever when it otherwise should have been removed. Kenjiro, can you make that so?

Thank you. Sure, I can. But please let me confirm one more thing.
Currently controllerLeaseTTL can be set to "0", omitted, or set to "-1" [1]. "controllerLeaseTTL: -1" should also be migrated to "1", correct?

[1] https://github.com/openshift/origin/blob/release-3.7/pkg/cmd/server/api/v1/types.go#L216-L223
"This value defaults off (0, or omitted) and controller election can be disabled with -1."

Scott, David, is there any update on the fix?

I'm sorry again, but can we get any update on this?

(In reply to Kenjiro Nakayama from comment #9)
> Thank you. Sure, I can. But please let me confirm one more thing.
>
> Current controllerLeaseTTL could be set "0", omitted or "-1"[1].
> "controllerLeaseTTL: -1" should also be migrated to "1", correct?
>
> [1]
> https://github.com/openshift/origin/blob/release-3.7/pkg/cmd/server/api/v1/types.go#L216-L223
> "This value defaults off (0, or omitted) and controller election can be
> disabled with -1."

Yes: if controllerLeaseTTL > 0, migrate with the TTL set to controllerLeaseTTL; otherwise, migrate with the TTL set to 1s.

Okay, then is there any reason why https://github.com/openshift/openshift-ansible/pull/9768 has not been merged?

No, merged it.
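The rule agreed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code merged in PR 9768: the function names `lease_duration_for_controllers` and `build_migrate_argv` are hypothetical, controllerLeaseTTL is assumed to be in seconds, and `None` stands in for an omitted setting. The command-line flags and certificate paths are copied from the QE logs in this bug, and the one-second check mirrors the `--lease-duration must be at least one second` error that `oadm migrate etcd-ttl` reports there. The etcd address in the usage block is a placeholder.

```python
from typing import List, Optional


# Hypothetical helper illustrating the rule agreed in this bug:
# controllerLeaseTTL > 0 -> reuse it; 0, omitted (None) or -1 -> 1s.
def lease_duration_for_controllers(controller_lease_ttl: Optional[int]) -> str:
    """Return a --lease-duration value for /openshift.io/leases/controllers.

    Assumes controllerLeaseTTL is expressed in seconds.
    """
    if controller_lease_ttl is not None and controller_lease_ttl > 0:
        return f"{controller_lease_ttl}s"
    # Election off (0/omitted) or disabled (-1): still attach a 1s lease
    # so the migrated key is guaranteed to expire.
    return "1s"


# Hypothetical helper: assembles (but does not run) the command seen in the QE logs.
# Assumes lease_duration is in the plain "<N>s" form used in this bug.
def build_migrate_argv(keys_prefix: str, lease_duration: str, etcd_address: str) -> List[str]:
    seconds = int(lease_duration.rstrip("s"))
    if seconds < 1:
        # Same constraint the CLI enforces, caught before invoking it.
        raise ValueError("--lease-duration must be at least one second")
    return [
        "oadm", "migrate", "etcd-ttl",
        "--cert", "/etc/origin/master/master.etcd-client.crt",
        "--key", "/etc/origin/master/master.etcd-client.key",
        "--cacert", "/etc/origin/master/master.etcd-ca.crt",
        "--etcd-address", etcd_address,
        "--ttl-keys-prefix", keys_prefix,
        "--lease-duration", lease_duration,
    ]


if __name__ == "__main__":
    for ttl in (30, 0, None, -1):
        print(ttl, "->", lease_duration_for_controllers(ttl))
    # Placeholder etcd address; substitute the real endpoint.
    print(build_migrate_argv("/openshift.io/leases/controllers",
                             lease_duration_for_controllers(0),
                             "https://etcd.example.com:2379"))
```

With this sketch, "controllerLeaseTTL: 0", an omitted value, and -1 all map to `1s`, so a lease is always attached and the key can expire. The key prefix is passed as an explicit string argument; note that the logged commands in this bug show `--ttl-keys-prefix` receiving `<built-in method keys of dict object at ...>` rather than the key path, and the verification run reports "Attaching lease to 0 entries".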
QE could reproduce this issue with openshift-ansible-3.7.61-1.git.0.36791ef.el7.noarch.rpm. When the master has "controllerLeaseTTL: 0" configured in master-config.yaml, running playbooks/byo/openshift-etcd/migrate.yml to migrate etcd v2 data fails as below:

failed: [ec2-34-229-247-224.compute-1.amazonaws.com] (item={u'keys': u'/openshift.io/leases/controllers', u'ttl': u'0s'}) => {"changed": true, "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/origin/master/master.etcd-client.crt", "--key", "/etc/origin/master/master.etcd-client.key", "--cacert", "/etc/origin/master/master.etcd-ca.crt", "--etcd-address", "https://172.18.8.214:2379", "--ttl-keys-prefix", "<built-in", "method", "keys", "of", "dict", "object", "at", "0x7ffaafe545c8>", "--lease-duration", "0s"], "delta": "0:00:00.549652", "end": "2018-10-09 03:41:07.641047", "failed": true, "item": {"keys": "/openshift.io/leases/controllers", "ttl": "0s"}, "msg": "non-zero return code", "rc": 1, "start": "2018-10-09 03:41:07.091395", "stderr": "error: --lease-duration must be at least one second", "stderr_lines": ["error: --lease-duration must be at least one second"], "stdout": "", "stdout_lines": []}

Verified this bug with openshift-ansible-3.7.65-1.git.0.de90d64.el7.noarch.rpm: the migration playbook runs without this error, and it now uses a TTL of 1s for /openshift.io/leases/controllers instead:
changed: [ec2-54-157-46-140.compute-1.amazonaws.com] => (item={u'keys': u'/openshift.io/leases/controllers', u'ttl': u'1s'}) => {"changed": true, "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/origin/master/master.etcd-client.crt", "--key", "/etc/origin/master/master.etcd-client.key", "--cacert", "/etc/origin/master/master.etcd-ca.crt", "--etcd-address", "https://172.18.0.129:2379", "--ttl-keys-prefix", "<built-in", "method", "keys", "of", "dict", "object", "at", "0x7f141881fb40>", "--lease-duration", "1s"], "delta": "0:00:00.502297", "end": "2018-10-09 04:16:05.808228", "failed": false, "item": {"keys": "/openshift.io/leases/controllers", "ttl": "1s"}, "rc": 0, "start": "2018-10-09 04:16:05.305931", "stderr": "", "stderr_lines": [], "stdout": "info: Lease #8195819424583871782 with TTL 4 created\ninfo: Attaching lease to 0 entries", "stdout_lines": ["info: Lease #8195819424583871782 with TTL 4 created", "info: Attaching lease to 0 entries"]}

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2906

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.