Bug 1500362
| Summary: | nova-manage db archive_deleted_rows fails with Duplicate entry for key 'PRIMARY' | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Martin Schuppert <mschuppe> |
| Component: | openstack-nova | Assignee: | Eoghan Glynn <eglynn> |
| Status: | CLOSED DUPLICATE | QA Contact: | Archit Modi <amodi> |
| Severity: | medium | Priority: | medium |
| Version: | 9.0 (Mitaka) | CC: | awaugama, berrange, dasmith, ebarrera, eglynn, hfukumot, kchamart, lyarwood, mbooth, mschuppe, sbauza, sferdjao, sgordon, srevivo, stephenfin, vromanso |
| Target Milestone: | Upstream M2 | Keywords: | Triaged |
| Target Release: | 14.0 (Rocky) | Type: | Bug |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2018-03-09 14:52:05 UTC | | |
Description
Martin Schuppert
2017-10-10 12:59:04 UTC
I was able to reproduce the duplicate primary key situation by starting the archive job on two nodes at the same time, but that does not result in the situation where information gets moved to the shadow tables and the primary key starts fresh.

# su - nova -s /bin/bash -c "nova-manage --verbose db archive_deleted_rows --max_rows 100 --until-complete"
Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
An error has occurred:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1606, in main
    ret = fn(*fn_args, **fn_kwargs)
  File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 681, in archive_deleted_rows
    run = db.archive_deleted_rows(max_rows)
  File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 2040, in archive_deleted_rows
    return IMPL.archive_deleted_rows(max_rows=max_rows)
  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 6564, in archive_deleted_rows
    tablename, max_rows=max_rows - total_rows_archived)
  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 6513, in _archive_deleted_rows_for_table
    conn.execute(insert)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception
    util.raise_from_cause(newraise, exc_info)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
    result = self._query(query)
  File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result
    result.read()
  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read
    first_packet = self.connection._read_packet()
  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet
    packet.check_error()
  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
    raise errorclass(errno, errval)
DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '4553' for key 'PRIMARY'")
[SQL: u'INSERT INTO shadow_instance_actions_events (created_at, updated_at, deleted_at, deleted, id, event, action_id, start_time, finish_time, result, traceback, host, details)
SELECT instance_actions_events.created_at, instance_actions_events.updated_at, instance_actions_events.deleted_at, instance_actions_events.deleted, instance_actions_events.id, instance_actions_events.event, instance_actions_events.action_id, instance_actions_events.start_time, instance_actions_events.finish_time, instance_actions_events.result, instance_actions_events.traceback, instance_actions_events.host, instance_actions_events.details
FROM instance_actions_events
WHERE instance_actions_events.deleted != %(deleted_1)s ORDER BY instance_actions_events.id
LIMIT %(param_1)s'] [parameters: {u'param_1': 100, u'deleted_1': 0}]

This issue was also reported as seen in the nova-rowsflush.log, but that is likely because we start all the archive jobs on all controllers at the same time. There is no delay, unlike the keystone token_flush job, where we create one with keystone::cron::token_flush::maxdelay:

if $maxdelay == 0 {
  $sleep = ''
} else {
  $sleep = "sleep `expr \${RANDOM} \\% ${maxdelay}`; "
}

Was able to reproduce with the following sequence:
1) create x instances
2) delete x instances
3) archive the instances
4) restart mariadb
5) create new instance
6) delete instance
7) archive the instance
=>
# su - nova -s /bin/bash -c "nova-manage db archive_deleted_rows --max_rows 100"
Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".
error: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '1' for key 'PRIMARY'")
[SQL: u'INSERT INTO shadow_instance_actions_events (created_at, updated_at, deleted_at, deleted, id, event, action_id, start_time, finish_time, result, traceback, host, details)
SELECT instance_actions_events.created_at, instance_actions_events.updated_at, instance_actions_events.deleted_at, instance_actions_events.deleted, instance_actions_events.id, instance_actions_events.event, instance_actions_events.action_id, instance_actions_events.start_time, instance_actions_events.finish_time, instance_actions_events.result, instance_actions_events.traceback, instance_actions_events.host, instance_actions_events.details
FROM instance_actions_events
WHERE instance_actions_events.deleted != %s ORDER BY instance_actions_events.id
LIMIT %s'] [parameters: (0, 100)]
From discussing this with Melanie Witt:
> While I was looking into this, I learned that the InnoDB engine stores its
> AUTO_INCREMENT counter only in memory, and after a restart it selects the
> maximum value and increments it by one to initialize the counter again.
Martin, is the customer okay with just dumping the shadow tables before each archive run? Because that's the easy workaround here. The alternative is ... not easy.

(In reply to Dan Smith from comment #3)
> Martin, is the customer okay with just dumping the shadow tables before each
> archive run? Because that's the easy workaround here. The alternative is ...
> not easy.

Yes, I think that should be fine. We might want to back up the tables first, in case any information is needed at some later point, like:

DATE=$(date -I)
mkdir -p /srv/$DATE
for i in $(mysql -u root nova -e "show tables like 'shadow%';"); do
    echo "backup and truncate table $i"
    mysqldump -u root nova $i > /srv/$DATE/$i && mysql -u root nova -e "truncate table $i;"
done

I'm closing this as a duplicate of bz1154875, which is the DB purging functionality merged in master and to be released in OSP14.

*** This bug has been marked as a duplicate of bug 1154875 ***

(In reply to Martin Schuppert from comment #2)
> Was able to reproduce with the following sequence:
>
> 1) create x instances
> 2) delete x instances
> 3) archive the instances
> 4) restart mariadb

In case it's not clear to others (because it took me a while to figure out), this is the problematic step...

> 5) create new instance
> 6) delete instance
> 7) archive the instance
> =>
> # su - nova -s /bin/bash -c "nova-manage db archive_deleted_rows --max_rows 100"
> Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
> Option "notification_topics" from group "DEFAULT" is deprecated. Use option "topics" from group "oslo_messaging_notifications".
> error: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '1' for key 'PRIMARY'")
> [SQL: u'INSERT INTO shadow_instance_actions_events (created_at, updated_at, deleted_at, deleted, id, event, action_id, start_time, finish_time, result, traceback, host, details)
> SELECT instance_actions_events.created_at, instance_actions_events.updated_at, instance_actions_events.deleted_at, instance_actions_events.deleted, instance_actions_events.id, instance_actions_events.event, instance_actions_events.action_id, instance_actions_events.start_time, instance_actions_events.finish_time, instance_actions_events.result, instance_actions_events.traceback, instance_actions_events.host, instance_actions_events.details
> FROM instance_actions_events
> WHERE instance_actions_events.deleted != %s ORDER BY instance_actions_events.id
> LIMIT %s'] [parameters: (0, 100)]
>
> From discussing this with Melanie Witt:
> > While I was looking into this, I learned that the InnoDB engine stores its
> > AUTO_INCREMENT counter only in memory, and after a restart it selects the
> > maximum value and increments it by one to initialize the counter again.

...because of this ^

As noted, InnoDB stores the AUTO_INCREMENT counter only in memory, and after a restart it initializes the counter again by selecting the maximum existing value and incrementing it by one. By archiving instances, we copied lots of rows from the prod tables into the shadow tables and then deleted those rows from the prod tables. By restarting mariadb, the AUTO_INCREMENT counter gets reset and may be set to a lower value than previously, since some of the rows with the highest IDs may have been archived. This isn't an issue for the prod tables, since they contain no entries with IDs greater than the AUTO_INCREMENT counter (those were deleted), but it will be an issue for the archive tables.
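The mechanism can be simulated without MySQL: SQLite happens to derive the next rowid the same way a restarted InnoDB does (MAX(id) + 1, not persisted), so a minimal sketch of the archive/restart/collision cycle might look like the following. The table names are illustrative, not nova's real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# "prod" and "shadow" tables; INTEGER PRIMARY KEY without AUTOINCREMENT
# gives SQLite the same max(id)+1 behavior a restarted InnoDB shows.
cur.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, detail TEXT)")
cur.execute("CREATE TABLE shadow_events (id INTEGER PRIMARY KEY, detail TEXT)")

# steps 1-3: create rows, then archive them (copy to shadow, delete from prod)
cur.executemany("INSERT INTO events (detail) VALUES (?)",
                [("a",), ("b",), ("c",)])
cur.execute("INSERT INTO shadow_events SELECT * FROM events")
cur.execute("DELETE FROM events")

# steps 4-6: after the "restart", the next id is derived from the now-empty
# prod table, so the new row reuses id 1
cur.execute("INSERT INTO events (detail) VALUES (?)", ("d",))
new_id = cur.execute("SELECT id FROM events").fetchone()[0]
print(new_id)  # 1

# step 7: archiving again collides with the shadow table's primary key
try:
    cur.execute("INSERT INTO shadow_events SELECT * FROM events")
except sqlite3.IntegrityError as exc:
    print("duplicate entry:", exc)
```

This reproduces the same shape of failure as the pymysql IntegrityError above, just with SQLite's "UNIQUE constraint failed" wording.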
There are a few possible solutions:

1) Manually set the AUTO_INCREMENT start values for the prod tables based on the max value in the corresponding archive table, possibly by way of an 'init_file' [1].
2) Use a different storage engine that doesn't have this in-memory AUTO_INCREMENT counter behavior (I'm not aware of any).
3) Suggest people either never archive, or always archive and purge.

None of those are particularly attractive resolutions, but the third one is the least invasive and therefore the one we've opted for here.

[1] https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_init_file
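As a sketch of what the first option could look like: for each prod/shadow table pair, read the highest archived id and emit an ALTER TABLE that pushes the prod table's counter past it. The helper below only builds the statements; the function name is hypothetical and the input dict stands in for the result of running "SELECT MAX(id) FROM shadow_<table>" against each shadow table:

```python
# Hypothetical helper, not part of nova: build ALTER TABLE statements that
# realign each prod table's AUTO_INCREMENT with its shadow table after a
# mariadb restart. max_shadow_ids maps prod table name -> MAX(id) found in
# the corresponding shadow table.
def realign_statements(max_shadow_ids):
    return [
        "ALTER TABLE %s AUTO_INCREMENT = %d;" % (table, max_id + 1)
        for table, max_id in sorted(max_shadow_ids.items())
    ]

stmts = realign_statements({"instance_actions_events": 4553})
print(stmts[0])  # ALTER TABLE instance_actions_events AUTO_INCREMENT = 4554;
```

Such statements could be collected into the init_file referenced above so they run on every server start, though keeping that file in sync with the shadow tables is exactly the kind of maintenance burden that made this option unattractive.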