Bug 1389773 - Unexpected object conversion after migration to etcd3 storage
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Timothy St. Clair
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-28 14:41 UTC by Mike Fiedler
Modified: 2017-07-24 14:11 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-31 21:08:00 UTC
Target Upstream Version:


Attachments
master log - search for "About to convert" and "failed to handle" (216.26 KB, application/x-gzip)
2016-10-28 14:41 UTC, Mike Fiedler

Description Mike Fiedler 2016-10-28 14:41:36 UTC
Created attachment 1215015
master log - search for "About to convert" and "failed to handle"

Description of problem:

After migrating cluster storage from etcd v2 to etcd v3, the logs contain a large number of object conversion and conversion failure messages when the masters are restarted.

Opening this BZ for confirmation that conversion is expected when the only change is to etcd storage - i.e. no format/schema/version change in the underlying OCP data.

Example (see attached log)

Oct 27 20:23:02 192 atomic-openshift-node: I1027 20:23:02.339957    2922 conversion.go:133] failed to handle multiple devices for container. Skipping Filesystem stats
Oct 27 20:23:02 192 atomic-openshift-node: I1027 20:23:02.340009    2922 conversion.go:133] failed to handle multiple devices for container. Skipping Filesystem stats
Oct 27 20:23:02 192 atomic-openshift-master-api: I1027 20:23:02.485103  120864 trace.go:61] Trace "Update /api/v1/nodes/192.1.18.20/status" (started 2016-10-27 20:22:55.868264451 -0400 EDT):
Oct 27 20:23:02 192 atomic-openshift-master-api: [57.021µs] [57.021µs] About to convert to expected version
Oct 27 20:23:02 192 atomic-openshift-master-api: [174.003µs] [116.982µs] Conversion done
Oct 27 20:23:02 192 atomic-openshift-master-api: [180.27µs] [6.267µs] About to store object in database
Oct 27 20:23:02 192 atomic-openshift-master-api: [6.616439956s] [6.616259686s] Object stored in database
Oct 27 20:23:02 192 atomic-openshift-master-api: [6.616456216s] [16.26µs] Self-link added
Oct 27 20:23:02 192 atomic-openshift-master-api: [6.616754191s] [297.975µs] END
Oct 27 20:23:02 192 atomic-openshift-master-api: I1027 20:23:02.485155  120864 trace.go:61] Trace "Update /api/v1/nodes/192.1.18.218/status" (started 2016-10-27 20:22:55.813897248 -0400 EDT):
Oct 27 20:23:02 192 atomic-openshift-master-api: [113.948µs] [113.948µs] About to convert to expected version
Oct 27 20:23:02 192 atomic-openshift-master-api: [306.242µs] [192.294µs] Conversion done
Oct 27 20:23:02 192 atomic-openshift-master-api: [315.259µs] [9.017µs] About to store object in database
Oct 27 20:23:02 192 atomic-openshift-master-api: [6.670909865s] [6.670594606s] Object stored in database
Oct 27 20:23:02 192 atomic-openshift-master-api: [6.670922331s] [12.466µs] Self-link added
Oct 27 20:23:02 192 atomic-openshift-master-api: [6.671170995s] [248.664µs] END
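
To see which step dominates each trace, the step lines can be parsed and ranked by their per-step duration (the second bracketed value). The following is only a minimal sketch: the function names are hypothetical, and it assumes every step line has the "[total] [delta] description" shape shown above, with durations expressed in µs, ms, or s.

```python
import re

# Matches trace step lines such as:
#   "... atomic-openshift-master-api: [6.616439956s] [6.616259686s] Object stored in database"
# Assumption: durations use only the units listed here (microseconds may be
# written with the micro sign, the Greek mu, or plain "us").
UNIT = r"(µs|μs|us|ms|s)"
STEP_RE = re.compile(r"\[([\d.]+)" + UNIT + r"\] \[([\d.]+)" + UNIT + r"\] (.+)$")

UNIT_SECONDS = {"µs": 1e-6, "μs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_step(line):
    """Return (step_name, step_seconds) for a trace step line, else None."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    _total, _total_unit, delta, delta_unit, name = m.groups()
    return name.strip(), float(delta) * UNIT_SECONDS[delta_unit]


def slowest_step(lines):
    """Return the (step_name, step_seconds) pair with the largest duration."""
    steps = [s for s in map(parse_step, lines) if s is not None]
    return max(steps, key=lambda s: s[1]) if steps else None
```

Applied to the two traces above, this reports "Object stored in database" as the slowest step (about 6.6 seconds), while "Conversion done" accounts for well under a millisecond.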

Version-Release number of selected component (if applicable): 3.4.0.16 and etcd 3.0.12-3


How reproducible: always on first restart after etcd data migration to v3


Steps to Reproduce:
1.  Install an HA cluster (3 masters, 3 etcd) with OCP 3.4.0.16 + etcd 2.3.7
2.  Create projects with running deployments
3.  Shut down masters and etcd.  Leave OpenShift nodes running.
4.  On each etcd:  yum swap etcd3 etcd to install etcd3 3.0.12-3. 
5.  On each etcd:  etcdctl migrate --data-dir /var/lib/etcd
6.  Start etcd on each 
7.  Start OpenShift masters

Actual results:

OpenShift master logs have many messages (see above) for object conversion and conversion failures.

Expected results:

Unknown. It seems unexpected that conversions take place when the objects are unchanged in version and schema. Looking for confirmation of the correct behavior.

Comment 3 Timothy St. Clair 2016-10-31 20:11:34 UTC
Q1: Pre 'etcdctl migrate' were you seeing large # of conversion traces? 

C1: the "failed to handle multiple devices for container. Skipping Filesystem stats" is orthogonal to the api-conversion data.  

C2: There are storage errors on leases, which were expected because we used the default migrator.

Comment 4 Timothy St. Clair 2016-10-31 21:08:00 UTC
So, in chatting with Clayton, there is indeed an implicit 'v2mode:json -> v3mode:protobuf' conversion.

We will simply need to document that the cluster should be brought back online slowly to allow for the data conversion.

We will also need to be certain that TTL keys are wiped on the migration. xref (https://github.com/coreos/etcd/issues/6767)
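
One way to check how far the json -> protobuf conversion has progressed is to classify the raw values read back from etcd. This is a rough sketch with a hypothetical helper name; it assumes the stored values follow the upstream Kubernetes convention of prefixing protobuf-encoded objects with the 4-byte magic "k8s\x00", and it does not read from etcd itself.

```python
import json

# Assumption: protobuf-encoded objects carry the upstream Kubernetes
# 4-byte magic prefix; anything else is tried as JSON.
K8S_PROTOBUF_MAGIC = b"k8s\x00"


def storage_encoding(value: bytes) -> str:
    """Classify a raw stored value as 'protobuf', 'json', or 'unknown'."""
    if value.startswith(K8S_PROTOBUF_MAGIC):
        return "protobuf"
    try:
        json.loads(value.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return "unknown"
    return "json"
```

Counting the results per encoding across all keys would give a rough picture of how many objects are still stored as JSON after the masters come back up.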

