| Summary: | [etcd3]All defined namespaces and contents disappear after migrating from etcd2 to etcd3 storage | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle> |
| Component: | Master | Assignee: | Mo <mkhan> |
| Status: | CLOSED ERRATA | QA Contact: | Mike Fiedler <mifiedle> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.4.0 | CC: | anli, aos-bugs, dma, eparis, jokerman, mfojtik, mifiedle, mmccomas, sdodson, tdawson, xtian |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: The code used to build the root etcd prefix was not the same between etcdv2 and etcdv3.
Consequence: After migrating from etcdv2 to etcdv3, the cluster was unable to find any data if a root etcd prefix was used that did not start with a "/" (which is the default case for OpenShift).
Fix: Use the same code to build the root etcd prefix for both etcdv2 and etcdv3.
Result: After a migration, the cluster is able to find migrated data as expected.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-04-12 19:16:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | |||
|
Description
Mike Fiedler
2016-11-10 09:03:02 UTC
Dumped keys after migration - content seems to be there. Lots of list failures in the master logs:

Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.662287 74760 reflector.go:199] pkg/controller/informers/factory.go:89: Failed to list *api.ServiceAccount: User "system:openshift-master" cannot list all serviceaccounts in the cluster
Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.664074 74760 reflector.go:199] pkg/controller/informers/factory.go:89: Failed to list *api.LimitRange: User "system:openshift-master" cannot list all limitranges in the cluster
Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.664192 74760 reflector.go:199] pkg/controller/informers/factory.go:89: Failed to list *api.Namespace: User "system:openshift-master" cannot list all namespaces in the cluster
Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.664393 74760 reflector.go:199] github.com/openshift/origin/pkg/controller/shared/shared_informer.go:89: Failed to list *api.SecurityContextConstraints: User "system:openshift-master" cannot list all securitycontextconstraints in the cluster
Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.664406 74760 reflector.go:199] github.com/openshift/origin/pkg/controller/shared/shared_informer.go:89: Failed to list *api.ImageStream: User "system:openshift-master" cannot list all imagestreams in the cluster
Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.762346 74760 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list *api.ServiceAccount: User "system:openshift-master" cannot list all serviceaccounts in the cluster
Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.763258 74760 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list *api.ResourceQuota: User "system:openshift-master" cannot list all resourcequotas in the cluster
Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.778946 74760 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list *api.Secret: User "system:openshift-master" cannot list all secrets in the cluster
Feb 14 15:32:51 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:51.779168 74760 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/storageclass/default/admission.go:75: Failed to list *storage.StorageClass: User "system:openshift-master" cannot list all storage.k8s.io.storageclasses in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.020282 74760 reflector.go:199] github.com/openshift/origin/pkg/controller/shared/shared_informer.go:101: Failed to list *api.ClusterResourceQuota: User "system:openshift-master" cannot list all clusterresourcequotas in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.020383 74760 reflector.go:188] github.com/openshift/origin/pkg/project/cache/cache.go:107: Failed to list *api.Namespace: User "system:openshift-master" cannot list all namespaces in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.668403 74760 reflector.go:199] github.com/openshift/origin/pkg/controller/shared/shared_informer.go:89: Failed to list *api.SecurityContextConstraints: User "system:openshift-master" cannot list all securitycontextconstraints in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.668493 74760 reflector.go:199] github.com/openshift/origin/pkg/controller/shared/shared_informer.go:89: Failed to list *api.ImageStream: User "system:openshift-master" cannot list all imagestreams in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.668574 74760 reflector.go:199] pkg/controller/informers/factory.go:89: Failed to list *api.ServiceAccount: User "system:openshift-master" cannot list all serviceaccounts in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.668635 74760 reflector.go:199] pkg/controller/informers/factory.go:89: Failed to list *api.LimitRange: User "system:openshift-master" cannot list all limitranges in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.668691 74760 reflector.go:199] pkg/controller/informers/factory.go:89: Failed to list *api.Namespace: User "system:openshift-master" cannot list all namespaces in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.763775 74760 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:103: Failed to list *api.ServiceAccount: User "system:openshift-master" cannot list all serviceaccounts in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.765069 74760 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/resourcequota/resource_access.go:83: Failed to list *api.ResourceQuota: User "system:openshift-master" cannot list all resourcequotas in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.802041 74760 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/storageclass/default/admission.go:75: Failed to list *storage.StorageClass: User "system:openshift-master" cannot list all storage.k8s.io.storageclasses in the cluster
Feb 14 15:32:52 ip-172-31-2-44 atomic-openshift-master: E0214 15:32:52.802262 74760 reflector.go:199] github.com/openshift/origin/vendor/k8s.io/kubernetes/plugin/pkg/admission/serviceaccount/admission.go:119: Failed to list *api.Secret: User "system:openshift-master" cannot list all secrets in the cluster

Created attachment 1250335 [details]
Before dump (keys only)
Created attachment 1250336 [details]
After dump
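
The comparison of the two keys-only dumps can be sketched roughly as follows. This is only an illustration: it assumes each dump is plain text with one key per line, and the function names and sample keys are hypothetical, not taken from the attachments.

```python
def load_keys(lines):
    """Return the set of non-empty keys from an iterable of text lines."""
    return {line.strip() for line in lines if line.strip()}


def compare_key_dumps(before_lines, after_lines):
    """Return (missing, added) as sorted lists.

    missing: keys present before the migration but absent afterwards
    added:   keys present only after the migration
    """
    before = load_keys(before_lines)
    after = load_keys(after_lines)
    return sorted(before - after), sorted(after - before)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the attached dumps.
    before_dump = [
        "/kubernetes.io/namespaces/mff0\n",
        "/kubernetes.io/namespaces/mff1\n",
    ]
    after_dump = [
        "/kubernetes.io/namespaces/mff0\n",
        "/kubernetes.io/namespaces/mff1\n",
        "\n",
    ]
    missing, added = compare_key_dumps(before_dump, after_dump)
    print("missing after migration:", missing)
    print("new after migration:", added)
```

With real dumps, the line lists would come from reading the two files. An empty "missing" list would match the observation that the content seems to be there after migration.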
It's blocked on the admission controller post conversion.

Some number of denials are normal at server start as the authz cache fills. Do the list errors continue indefinitely in the log?

Will re-test today and give it more time. Will keep the env around as well if anyone wants to take a look at it.

OCP 3.5.0.20 and etcd 3.1.0
- created 2 projects with deployments, builds, secrets, routes, services etc. Everything working.
- shutdown etcd, shutdown master (single master)
- ETCDCTL_API=3 etcdctl migrate --data-dir=/var/lib/etcd --no-ttl
- Updated master-config.yaml to use storage-backend "etcd3"
- started etcd, started master, restarted all nodes

There were some initial list failures in the master log, but as indicated in comment 11 they eventually went away. They did not continue indefinitely. The logs were quiet after the nodes re-registered.

However oc get on projects, pods, dc, services, builds, etc did not return any of the resources created pre-migration. No errors in the log (attached). The default projects are there but no imagestreams, templates, etc that are part of the install. New resources can be created and displayed, but pre-migration items are gone even though they seem to exist in etcdctl get "" --from-key

Created attachment 1250628 [details]
Log for initial master startup after etcd data migration
whatever the migration is doing, it is making etcd appear completely empty to the master at startup:

ensure.go:222] No cluster policy found. Creating bootstrap policy based on: /etc/origin/master/policy.json

can you dump the contents of etcd after the master has started up after migration? want to find out where the new content is getting stored

Created attachment 1250636 [details]
Dump after master restart and creating 1 project (keys only)
Created attachment 1250637 [details]
Dump after master restart and creating 1 project (keys/values)
Projects mff0 and mff1 created before migration. Project mff (sorry for similarity) created after migration and master restart.

There are two sets of keys, one with leading slashes and one without. Looks like the etcd3 client does not include leading '/' when accessing the data, which makes all existing data seem to disappear. New data is created without leading slashes

Part of the fix is https://github.com/kubernetes/kubernetes/pull/42506 I will open another PR to handle some decoder issues.

Upon further research the only change we need is https://github.com/kubernetes/kubernetes/pull/42506 (it may be a bit before we have this change in origin). Decoder issues only occur in unsupported configurations (going from etcdv2+protobuf to etcdv3+protobuf instead of the supported etcdv2+json to etcdv3+protobuf).

Origin PR open at https://github.com/openshift/origin/pull/13298

The fix is merged into master and the 1.5 release branch. No pick is required for OSE.
https://github.com/openshift/origin/pull/13298
https://github.com/openshift/origin/pull/13299

This has been merged into ocp and is in OCP v3.5.0.52 or newer.

Verified on 3.5.0.52.
1. Run cluster-loader to create 10 projects with builds, bcs, svc, routes, dcs, rcs, pods and secrets.
2. shutdown master and etcd
3. ETCDCTL_API=3 ./etcdctl migrate --data-dir=${data_dir} --no-ttl
4. restart etcd
5. configure master-api for etcd3 storage
6. restart master
Verify all expected resources exist, all pods are running, users can log in, etc.
Create new resources and verify they work as expected.
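
The symptom verified above (pre-migration keys stored with a leading slash, while the etcd3 client looked them up without one) can be illustrated with a small sketch. It is not the code from the referenced pull requests: the helper names and sample keys are hypothetical, and prepending "/" to the root prefix is an assumption about what "using the same code for both backends" amounts to.

```python
import posixpath


def normalize_prefix(prefix):
    """Return the root prefix with a leading '/' (assumed normalization)."""
    return posixpath.join("/", prefix)


def visible_keys(keys, prefix):
    """Return the keys that a prefix scan under `prefix` would match."""
    return [k for k in keys if k.startswith(prefix + "/")]


# Hypothetical key dump after migration and master restart:
# two projects written before migration (leading slash),
# one project written afterwards (no leading slash).
keys = [
    "/kubernetes.io/namespaces/mff0",
    "/kubernetes.io/namespaces/mff1",
    "kubernetes.io/namespaces/mff",
]

configured_prefix = "kubernetes.io"  # prefix without a leading "/"

# Without normalization, only the post-migration key is found.
seen_without_fix = visible_keys(keys, configured_prefix)

# With normalization, the migrated keys are found again.
seen_with_fix = visible_keys(keys, normalize_prefix(configured_prefix))

print("without normalization:", seen_without_fix)
print("with normalization:", seen_with_fix)
```

The same check can be pointed at a real keys-only dump to confirm that no keys without a leading slash remain after upgrading to a fixed build.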
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:0884 |