Description of problem:
- When controllerLeaseTTL is set to 0 in master-config.yaml, "TASK [etcd : Re-introduce leases (as a replacement for key TTLs)]" fails because oc adm migrate etcd-ttl --lease-duration 0 is invalid.

Version-Release number of the following components:
# rpm -qa |grep ansible
openshift-ansible-playbooks-3.7.61-1.git.0.36791ef.el7.noarch
ansible-2.4.2.0-2.el7.noarch
openshift-ansible-3.7.61-1.git.0.36791ef.el7.noarch
openshift-ansible-filter-plugins-3.7.61-1.git.0.36791ef.el7.noarch
openshift-ansible-callback-plugins-3.7.61-1.git.0.36791ef.el7.noarch
openshift-ansible-roles-3.7.61-1.git.0.36791ef.el7.noarch
openshift-ansible-docs-3.7.61-1.git.0.36791ef.el7.noarch
openshift-ansible-lookup-plugins-3.7.61-1.git.0.36791ef.el7.noarch

How reproducible: 100%

Steps to Reproduce:
1. Configure controllerLeaseTTL: 0 in /etc/origin/master/master-config.yaml:
```
controllerLeaseTTL: 0
```
2. Run /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml

Actual results:
Failed with the following error:
failed: [xxx.example.com] (item={u'keys': u'/openshift.io/leases/controllers', u'ttl': u'0s'}) => {"changed": true, "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/origin/master/mast$ r.etcd-client.crt", "--key", "/etc/origin/master/master.etcd-client.key", "--cacert", "/etc/origin/master/master.etcd-ca.crt", "--etcd-address", "https://<etcdip>:2379", "--ttl-keys-prefix", "<built-in$ , "method", "keys", "of", "dict", "object", "at", "0x7faff2b984b0>", "--lease-duration", "0s"], "delta": "0:00:00.230322", "end": "2018-08-26 14:43:21.794366", "item": {"keys": "/openshift.io/leases/control$ ers", "ttl": "0s"}, "msg": "non-zero return code", "rc": 1, "start": "2018-08-26 14:43:21.564044", "stderr": "error: --lease-duration must be at least one second", "stderr_lines": ["error: --lease-duration $ ust be at least one second"], "stdout": "", "stdout_lines": []}

Expected results:
The Ansible playbook completes.
Additional info: - Full logs attached in private
Although we skipped the failed task and completed the playbook manually, the customer is eager to know whether we should set the TTL of "/openshift.io/leases/controllers" to 30s (the default in the playbook) or leave it unset. Could you please advise whether we should run "oadm migrate etcd-ttl --lease-duration 30" for "/openshift.io/leases/controllers" (like [1]), or whether that is unnecessary when controllerLeaseTTL: 0 was set in master-config.yaml?
[1] https://github.com/openshift/openshift-ansible/blob/release-3.6/roles/etcd_migrate/tasks/add_ttls.yml#L11-L33
Proposed fix: https://github.com/openshift/openshift-ansible/pull/9768
David, Can you comment on the validity of setting the controller lease TTL to 0s?
David, Jordan: if we had "controllerLeaseTTL: 0" in master-config.yaml for etcd v2, what is the best TTL for "/openshift.io/leases/controllers" during the migration (etcd v2 -> v3)? Should we set some default value (30s), or should nothing be set?
The controllerLeaseTTL is about leader election for a controller (https://github.com/openshift/origin/blob/release-3.7/pkg/cmd/server/api/v1/types.go#L216-L223). `oc adm migrate etcd-ttl --lease-duration 0` is about migrating from etcd2 to etcd3. I don't think they're related. Why does the controllerLeaseTTL affect the `oc adm migrate` call?
David, we were given a list of keys for which we should re-attach TTLs and told to attach TTLs that were inline with master configuration values that would've affected their original TTLs. To me, the question is when we face a TTL configured for 0s what do we do? Do we skip migrating TTLs for that key? Do we set a TTL of 1s because we need to ensure that some TTL exists to clean up the key?
Discussed with David. If we find a configured value of 0s, we should migrate with a 1s TTL so that we can be sure the key expires. Without a TTL being re-attached, there's a chance the key may persist forever when it otherwise should've been removed. Kenjiro, can you make that so?
Thank you. Sure, I can. But please let me confirm one more thing. Currently, controllerLeaseTTL can be set to "0", omitted, or set to "-1" [1]. "controllerLeaseTTL: -1" should also be migrated to "1", correct?
[1] https://github.com/openshift/origin/blob/release-3.7/pkg/cmd/server/api/v1/types.go#L216-L223 "This value defaults off (0, or omitted) and controller election can be disabled with -1."
Scott, David, is there any update for the fix?
I'm sorry again, but can we get any update for this?
(In reply to Kenjiro Nakayama from comment #9)
> Thank you. Sure, I can. But please let me confirm one more thing.
>
> Current controllerLeaseTTL could be set "0", omitted or "-1"[1].
> "controllerLeaseTTL: -1" should also be migrated to "1", correct?
>
> [1] https://github.com/openshift/origin/blob/release-3.7/pkg/cmd/server/api/v1/types.go#L216-L223
> "This value defaults off (0, or omitted) and controller election can be
> disabled with -1."

Yes, if controllerLeaseTTL > 0 migrate with ttl set to controllerLeaseTTL, else migrate with ttl set to 1s.
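For reference, the rule agreed on here can be sketched in Python. This is only an illustration of the decision logic, not the change in the proposed PR; the function name lease_duration_for_migration is hypothetical, and treating an omitted setting as None is an assumption.

```python
def lease_duration_for_migration(controller_lease_ttl):
    """Return a --lease-duration value such as "30s" for a configured TTL.

    Hypothetical helper (not taken from openshift-ansible): a TTL above
    zero is passed through unchanged; 0, -1, or an omitted setting
    (assumed to arrive as None) falls back to the 1s minimum that
    "oadm migrate etcd-ttl" accepts.
    """
    ttl = int(controller_lease_ttl or 0)
    seconds = ttl if ttl > 0 else 1
    return "{}s".format(seconds)


if __name__ == "__main__":
    for configured in (30, 0, -1, None):
        print(configured, "->", lease_duration_for_migration(configured))
```

With this rule, a configured value of 0 or -1 never reaches the command as "0s", which is the value the error message rejects.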
Okay, then is there any reason why https://github.com/openshift/openshift-ansible/pull/9768 is not merged?
No, merged it.
QE could reproduce this issue with openshift-ansible-3.7.61-1.git.0.36791ef.el7.noarch.rpm. When the master has "controllerLeaseTTL: 0" configured in master-config.yaml, running playbooks/byo/openshift-etcd/migrate.yml to migrate etcd v2 data fails as below:
failed: [ec2-34-229-247-224.compute-1.amazonaws.com] (item={u'keys': u'/openshift.io/leases/controllers', u'ttl': u'0s'}) => {"changed": true, "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/origin/master/master.etcd-client.crt", "--key", "/etc/origin/master/master.etcd-client.key", "--cacert", "/etc/origin/master/master.etcd-ca.crt", "--etcd-address", "https://172.18.8.214:2379", "--ttl-keys-prefix", "<built-in", "method", "keys", "of", "dict", "object", "at", "0x7ffaafe545c8>", "--lease-duration", "0s"], "delta": "0:00:00.549652", "end": "2018-10-09 03:41:07.641047", "failed": true, "item": {"keys": "/openshift.io/leases/controllers", "ttl": "0s"}, "msg": "non-zero return code", "rc": 1, "start": "2018-10-09 03:41:07.091395", "stderr": "error: --lease-duration must be at least one second", "stderr_lines": ["error: --lease-duration must be at least one second"], "stdout": "", "stdout_lines": []}
Verified this bug with openshift-ansible-3.7.65-1.git.0.de90d64.el7.noarch.rpm: the migration playbook runs without this error and uses a TTL of "1s" in place of the configured controllerLeaseTTL of 0.
changed: [ec2-54-157-46-140.compute-1.amazonaws.com] => (item={u'keys': u'/openshift.io/leases/controllers', u'ttl': u'1s'}) => {"changed": true, "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/origin/master/master.etcd-client.crt", "--key", "/etc/origin/master/master.etcd-client.key", "--cacert", "/etc/origin/master/master.etcd-ca.crt", "--etcd-address", "https://172.18.0.129:2379", "--ttl-keys-prefix", "<built-in", "method", "keys", "of", "dict", "object", "at", "0x7f141881fb40>", "--lease-duration", "1s"], "delta": "0:00:00.502297", "end": "2018-10-09 04:16:05.808228", "failed": false, "item": {"keys": "/openshift.io/leases/controllers", "ttl": "1s"}, "rc": 0, "start": "2018-10-09 04:16:05.305931", "stderr": "", "stderr_lines": [], "stdout": "info: Lease #8195819424583871782 with TTL 4 created\ninfo: Attaching lease to 0 entries", "stdout_lines": ["info: Lease #8195819424583871782 with TTL 4 created", "info: Attaching lease to 0 entries"]}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2906
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days