Bug 1500362
| Summary: | nova-manage db archive_deleted_rows fails with Duplicate entry for key 'PRIMARY' | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Martin Schuppert <mschuppe> |
| Component: | openstack-nova | Assignee: | Eoghan Glynn <eglynn> |
| Status: | CLOSED DUPLICATE | QA Contact: | Archit Modi <amodi> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 9.0 (Mitaka) | CC: | awaugama, berrange, dasmith, ebarrera, eglynn, hfukumot, kchamart, lyarwood, mbooth, mschuppe, sbauza, sferdjao, sgordon, srevivo, stephenfin, vromanso |
| Target Milestone: | Upstream M2 | Keywords: | Triaged |
| Target Release: | 14.0 (Rocky) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-09 14:52:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
I was able to reproduce the duplicate primary key situation by starting the archive job on two nodes at the same time,
but that alone does not explain the situation where rows get moved to the shadow tables and the primary key then starts again from the beginning.
# su - nova -s /bin/bash -c "nova-manage --verbose db archive_deleted_rows --max_rows 100 --until-complete"
Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
An error has occurred:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1606, in main
ret = fn(*fn_args, **fn_kwargs)
File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 681, in archive_deleted_rows
run = db.archive_deleted_rows(max_rows)
File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 2040, in archive_deleted_rows
return IMPL.archive_deleted_rows(max_rows=max_rows)
File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 6564, in archive_deleted_rows
tablename, max_rows=max_rows - total_rows_archived)
File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 6513, in _archive_deleted_rows_for_table
conn.execute(insert)
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
return meth(self, multiparams, params)
File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
context)
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception
util.raise_from_cause(newraise, exc_info)
File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
context)
File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
cursor.execute(statement, parameters)
File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
result = self._query(query)
File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
conn.query(q)
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result
result.read()
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read
first_packet = self.connection._read_packet()
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet
packet.check_error()
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
err.raise_mysql_exception(self._data)
File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
raise errorclass(errno, errval)
DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '4553' for key 'PRIMARY'") [SQL: u'INSERT INTO shadow_instance_actions_events (created_at, updated_at, deleted_at, deleted, id, event, action_id, start_time, finish_time, result, traceback, host, details) SELECT instance_actions_events.created_at, instance_actions_events.updated_at, instance_actions_events.deleted_at, instance_actions_events.deleted, instance_actions_events.id, instance_actions_events.event, instance_actions_events.action_id, instance_actions_events.start_time, instance_actions_events.finish_time, instance_actions_events.result, instance_actions_events.traceback, instance_actions_events.host, instance_actions_events.details \nFROM instance_actions_events \nWHERE instance_actions_events.deleted != %(deleted_1)s ORDER BY instance_actions_events.id \n LIMIT %(param_1)s'] [parameters: {u'param_1': 100, u'deleted_1': 0}]
This issue was also reported as seen in the nova-rowsflush.log, but that is likely related to the fact that we
start the archive jobs on all controllers at the same time. There is no delay, unlike the keystone
token_flush cron, which is created with keystone::cron::token_flush::maxdelay:
if $maxdelay == 0 {
$sleep = ''
} else {
$sleep = "sleep `expr \${RANDOM} \\% ${maxdelay}`; "
}
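The same random-delay trick could be applied to the nova archive cron. The following is only a hypothetical sketch mirroring the keystone::cron::token_flush::maxdelay logic above; the maxdelay value of 600 is an assumed example, and the wrapper only prints what it would run:

```shell
# Hypothetical wrapper mirroring keystone::cron::token_flush::maxdelay:
# each controller sleeps a random 0..maxdelay-1 seconds so the archive
# jobs do not all start at the same instant. maxdelay=600 is assumed.
maxdelay=600
delay=0
if [ "$maxdelay" -gt 0 ]; then
    delay=$(( RANDOM % maxdelay ))
fi
# Print instead of executing, for illustration only:
echo "sleep ${delay}; nova-manage db archive_deleted_rows --max_rows 100"
```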
Was able to reproduce with the following sequence:
1) create x instances
2) delete x instances
3) archive the instances
4) restart mariadb
5) create new instance
6) delete instance
7) archive the instance
=>
# su - nova -s /bin/bash -c "nova-manage db archive_deleted_rows --max_rows 100"
Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".
error: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '1' for key 'PRIMARY'") [SQL: u'INSERT INTO shadow_instance_actions_events (created_at, updated_at, deleted_at, deleted, id, event, action_id, start_time, finish_time, result, traceback, host, details) SELECT instance_actions_events.created_at, instance_actions_events.updated_at, instance_actions_events.deleted_at, instance_actions_events.deleted, instance_actions_events.id, instance_actions_events.event, instance_actions_events.action_id, instance_actions_events.start_time, instance_actions_events.finish_time, instance_actions_events.result, instance_actions_events.traceback, instance_actions_events.host, instance_actions_events.details \nFROM instance_actions_events \nWHERE instance_actions_events.deleted != %s ORDER BY instance_actions_events.id \n LIMIT %s'] [parameters: (0, 100)]
From discussing this with Melanie Witt:
> While I was looking into this, I learned that the InnoDB engine stores its
> AUTO_INCREMENT counter only in memory, and after a restart it selects the
> maximum value and increments it by one to initialize the counter again.
Martin, is the customer okay with just dumping the shadow tables before each archive run? Because that's the easy workaround here. The alternative is ... not easy.

(In reply to Dan Smith from comment #3)
> Martin, is the customer okay with just dumping the shadow tables before each
> archive run? Because that's the easy workaround here. The alternative is ...
> not easy.

Yes, I think that should be fine. We might want to back up the tables first, in case any of the information is needed at some later point, like:

DATE=$(date -I); mkdir -p /srv/$DATE
for i in $(mysql -u root nova -e "show tables like 'shadow%';"); do
    echo "backup and truncate table $i"
    mysqldump -u root nova $i > /srv/$DATE/$i && mysql -u root nova -e "truncate table $i;"
done

I'm closing this as a duplicate of bz1154875, which covers the DB purging functionality merged in master and to be released in OSP14.

*** This bug has been marked as a duplicate of bug 1154875 ***

(In reply to Martin Schuppert from comment #2)
> Was able to reproduce with the following sequence:
>
> 1) create x instances
> 2) delete x instances
> 3) archive the instances
> 4) restart mariadb

In case it's not clear to others (because it took me a while to figure out), this is the problematic step...

> 5) create new instance
> 6) delete instance
> 7) archive the instance
> =>
> # su - nova -s /bin/bash -c "nova-manage db archive_deleted_rows --max_rows 100"
> Option "verbose" from group "DEFAULT" is deprecated for removal. Its value
> may be silently ignored in the future.
> Option "notification_topics" from group "DEFAULT" is deprecated. Use option
> "topics" from group "oslo_messaging_notifications".
> error: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '1' for key 'PRIMARY'")
>
> From discussing this with Melanie Witt:
> > While I was looking into this, I learned that the InnoDB engine stores its
> > AUTO_INCREMENT counter only in memory, and after a restart it selects the
> > maximum value and increments it by one to initialize the counter again.

...because of this ^

As noted, InnoDB stores the AUTO_INCREMENT counter only in memory, and after a restart it selects the maximum value in the table and increments it by one to re-initialize the counter. By archiving instances, we have copied lots of rows from the prod tables into the shadow tables and then deleted those rows from the prod tables. By restarting mariadb, the AUTO_INCREMENT counter gets reset and may be set to a lower value than previously, since some of the rows with the highest IDs may have been archived. This isn't an issue for the prod tables, since there will be no entries with IDs greater than the AUTO_INCREMENT counter (they were deleted), but it will be an issue for the archive tables.

There are a few solutions: we could manually set the AUTO_INCREMENT counter start values for the prod tables based on the max value in the corresponding archive table, possibly by way of an 'init_file' [1]; we could use a different engine that doesn't have this in-memory AUTO_INCREMENT counter behavior (I'm not aware of any); or we could suggest people either never archive or always archive and purge. None of those are particularly attractive resolutions, but the third one is the least invasive and therefore the one we've opted for here.

[1] https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_init_file
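The first option above (re-aligning AUTO_INCREMENT with the archived data) could be sketched roughly as follows. This is only an illustration, not a vetted fix: the table list is limited to two tables from this report, the shadow_max value is a placeholder (a real run would query `SELECT COALESCE(MAX(id), 0) FROM shadow_<table>` with mysql), and the ALTER statements are printed rather than executed:

```shell
# Sketch: bump each prod table's AUTO_INCREMENT past the highest id already
# archived in its shadow twin, so a post-restart counter cannot collide.
gen_bump() {
    # $1 = table name, $2 = max id seen in shadow_$1
    echo "ALTER TABLE $1 AUTO_INCREMENT = $(( $2 + 1 ));"
}

for t in reservations instance_actions_events; do
    # Real run would do:
    #   shadow_max=$(mysql -N -u root nova -e "SELECT COALESCE(MAX(id), 0) FROM shadow_${t};")
    shadow_max=17907   # placeholder value taken from this report, for illustration
    gen_bump "$t" "$shadow_max"
done
```

The generated statements could then be piped to mysql, or written into the file named by the init_file server variable so they run on every restart.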
Description of problem:

For an as-yet-unknown reason we can end up with duplicate primary keys across the prod and shadow nova tables, which makes the cleanup cron job fail:

~~~
1 */12 * * * nova-manage db archive_deleted_rows --max_rows 100 >>/dev/null 2>&1
~~~

This was reported on OSP9 and OSP11. In a broken situation we have:

- 1062 deleted instances in the non-shadow tables:

# mysql -u root nova -e "select count(*) from instances where deleted_at is not null\G;"
*************************** 1. row ***************************
count(*): 1062

- 1250 deleted instances previously migrated from the prod tables to the shadow tables:

# mysql -u root nova -e "select count(*) from shadow_instances\G;"
*************************** 1. row ***************************
count(*): 1250

- running the job manually fails with duplicate primary key '3' in the shadow_reservations table:

# su - nova -s /bin/bash -c "nova-manage db archive_deleted_rows --max_rows 100"
Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".
error: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '3' for key 'PRIMARY'") [SQL: u'INSERT INTO shadow_reservations (created_at, updated_at, deleted_at, deleted, id, uuid, usage_id, project_id, user_id, resource, delta, expire) SELECT reservations.created_at, reservations.updated_at, reservations.deleted_at, reservations.deleted, reservations.id, reservations.uuid, reservations.usage_id, reservations.project_id, reservations.user_id, reservations.resource, reservations.delta, reservations.expire \nFROM reservations \nWHERE reservations.deleted != %s ORDER BY reservations.id \n LIMIT %s'] [parameters: (0, 100)]

Checking id='3' in the shadow_reservations and reservations tables:

# mysql -u root nova -e "select * from shadow_reservations where id='3'\G;"
*************************** 1. row ***************************
created_at: 2017-05-31 18:44:58
updated_at: NULL
deleted_at: 2017-05-31 18:44:58
        id: 3
      uuid: 693727f4-a06e-4b0f-be68-9696e6367e1b
  usage_id: 3
project_id: 1a03f74536e0494681379940c80cf994
  resource: instances
     delta: 1
    expire: 2017-06-01 18:44:58
   deleted: 3
   user_id: 2f33b551cc1d4d3ea85880d1f5e07428

# mysql -u root nova -e "select * from reservations where id='3'\G;"
*************************** 1. row ***************************
created_at: 2017-07-05 18:38:46
updated_at: NULL
deleted_at: 2017-07-05 18:38:49
        id: 3
      uuid: e9d7983b-9413-4b6b-894b-23828a7a3e93
  usage_id: 180
project_id: 29417c4074cf40699337f392c20b35ee
  resource: instances
     delta: -1
    expire: 2017-07-06 18:38:46
   deleted: 3
   user_id: 45aadbf32f8da0be1ff49749f2e80e96a117b125e68ebc65b2f422665011325e

The primary key seems to have started again from the beginning at some point. Checking the last entry in the shadow_reservations table, the last successful run was a day before the new duplicate entry was created in the reservations table:

*************************** 6350. row ***************************
created_at: 2017-07-04 18:33:53
updated_at: NULL
deleted_at: 2017-07-04 18:33:53
        id: 17907
      uuid: c468f4cc-7e84-43cf-8a33-0b54f69bcab1
  usage_id: 186
project_id: 29417c4074cf40699337f392c20b35ee
  resource: cores
     delta: 1
    expire: 2017-07-05 18:33:53
   deleted: 17907
   user_id: 45aadbf32f8da0be1ff49749f2e80e96a117b125e68ebc65b2f422665011325e

Version-Release number of selected component (if applicable):
openstack-nova-13.1.4-7

How reproducible:
unknown at the moment

Actual results:

Expected results:

Additional info:
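For diagnosing a broken environment like the one above, a quick check is counting the deleted prod rows whose primary key already exists in the shadow table, since those are exactly the rows the archive INSERT..SELECT will choke on. This is a hypothetical helper (the table list is limited to two tables from this report, and the queries are printed rather than run against MySQL):

```shell
# Diagnostic sketch: generate a per-table query counting soft-deleted prod
# rows whose id already exists in the corresponding shadow table.
gen_overlap() {
    # $1 = table name
    echo "SELECT COUNT(*) FROM $1 JOIN shadow_$1 USING (id) WHERE $1.deleted != 0;"
}

for t in reservations instance_actions_events; do
    # Real run would pipe this to: mysql -u root nova
    gen_overlap "$t"
done
```

A non-zero count for any table would confirm the primary key collision before the next archive cron run fails.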