Bug 1955661 - [DB] Neutron quota request implementation can end in a lock status
Summary: [DB] Neutron quota request implementation can end in a lock status
Keywords:
Status: POST
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Rodolfo Alonso
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-04-30 15:46 UTC by Rodolfo Alonso
Modified: 2021-05-12 04:33 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1926787 0 None None None 2021-04-30 15:50:20 UTC
OpenStack gerrit 790060 0 None NEW New Quota driver ``DbQuotaNoLockDriver`` 2021-05-12 15:26:04 UTC

Description Rodolfo Alonso 2021-04-30 15:46:19 UTC
Neutron quota request implementation can end in a DB lock status. The quota is assigned per resource (port, network, security group, etc.) and per project. When a request is done, a DB lock is set for this (resource, project) tuple. This lock in the DB engine to lock this tuple in all workers of all API servers.

That implies there is a a bottleneck when a high number of requests arrive to the API at the same time. If the number of requests exceeds the number of resources processes, the DB locked transactions will increase indefinitely. This can be seen in the DB executing:

  $ mysql -e "show processlist;" | egrep "reservations|quotausages"


The query used by Neutron to lock this (resource, project) tuple is:

    UPDATE quotausages SET dirty=1 WHERE quotausages.project_id = <project_id> \
      AND quotausages.resource = <resource_type>


An improved quota system should be implemented that allow parallel resource request and avoids this DB lock status.

NOTE: please check [2][3]. "Neutron does not enforce quotas in such a way that a quota violation like this could never occur". That means even with this restrictive DB locking method, resource overcommit is possible.


[1]https://github.com/openstack/neutron/blob/b4812af4ee3cd651b0b03d5f90e71e8201ccfed7/neutron/objects/quota.py#L150
[2]https://bugzilla.redhat.com/show_bug.cgi?id=1884455#c2
[3]https://bugs.launchpad.net/neutron/+bug/1862050/comments/5

Comment 1 Michael Bayer 2021-05-03 13:41:28 UTC
My recommendation remains on this that Neutron should identify those transactions which impact the "quotausages" and potentially the "reservations" tables and ensure that READ COMMITTED isolation level is set for the scope of these transactions, rather than using the default of REPEATABLE READ.   "detected deadlock" status in the logs [4] as well as our observation of these locks in that they involve a small, tightly packed series of rows with lots of replacements, indicate the behavior is due to what mysql/mariadb refer to as a gap lock [1].  A gap lock is how the Innodb engine locks database rows in order to achieve MVCC, and roughly the idea is that instead of locking the row in question only, it locks additional (empty) gaps in the indexes for the table that prevent INSERT statements from occurring for those spaces as well as the row preceding and succeeding the row in question.  For the kinds of transactions it seems neutron is doing with tables like "quotausages" and possibly "reservations" , these tables are prime candidates for a large number of gap locks - a relatively small number of rows (quotausages has like 3000 or so) with lots and lots of UPDATEs, DELETES and INSERTs so there's lots of gaps in the indexes being created constantly, lots of gaps being locked, lots of transactions piling up.    In [2] we can see that the canonical solution to this issue is to use READ COMMITTED transaction isolation, which is also described in [1].    Any system in Neutron that involves lots of quick insert/update/delete at high speed, should be trying to use this isolation level rather than REPEATABLE READ.

The reason implementing this strategy may be difficult for Neutron is that the "quotausages" operations may be intrinsic to a large number of existing neutron functions, and downgrading the isolation level increases the risk that some parts of neutron may be relying upon the behavior of REPEATABLE READ to do its job of isolating operations from each other such that the value of a database row can't change from the outside mid-transaction.   I would not advise an across-the-board downgrade of the isolation level for this reason, though it is also something that might be worth testing if not for any other reason than figuring out where neutron may be relying upon REPEATABLE READ.    If the "quotausages" operations are in fact embedded across lots of other operations that do their own work in the same db transaction, I would suggest it might be worth it to move the "quotauages" table manipulation into a separate transaction that is by itself using READ COMMITTED; the code example below suggests one pattern using enginefacade that would allow this.

The strategy also requires that transaction be altered at the start of the operation - you can't change the isolation level after work as begun in the transaction (behind the scenes, SQLAlchemy emits "SET SESSION TRANSACTION ISOLATION LEVEL" and then a COMMIT, where then pymysql starts the new transaction upon first SQL operation, in order to implement this).    Neutron uses a heavy "nesting" pattern where the outermost method is the one that actually starts the transaction, and it may not be always clear where this happens in neutron's source code.   SQLAlchemy's means of setting the isolation level for the 1.3 series is documented at [3] (I spent time revising this series of docs specifically in response to conversations I've had with Neutron devs at openstack conferences in the past where they were considering using these methods, but I don't know the status of this).    When these methods are called, if the transaction is not at the outermost level, a warning will be emitted, so a developer testing this feature will be able to see if the transaction is being impacted at the topmost level or not.    Otherwise, another strategy as mentioned above is that the "quotausages" operation is converted into its own transaction, that would also allow the per-transaction isolation level to be set at a clear boundary.   I'm not sure to what degree neutron is using the newer oslo.db enginefacade however if so you can get an "independent" transaction by using a pattern such as:

     with enginefacade.writer.independent.using(context) as second_session:
         second_session.connection(execution_options={'isolation_level': 'READ COMMITTED'})
         # ... work with quotausages here ...


[1] https://mariadb.com/kb/en/innodb-lock-modes/#gap-locks
[2] https://www.percona.com/blog/2012/03/27/innodbs-gap-locks/
[3] https://docs.sqlalchemy.org/en/13/orm/session_transaction.html#setting-isolation-for-individual-transactions

[4]  paste shared w/ me on IRC of mysql log.  it's not fully clear to me yet why more than one table is involved but this might have to do with foreign keys between quotausages and other tables, but we are seeing transactions timing out with gap lock issues.

    LATEST DETECTED DEADLOCK
    ------------------------
    2021-04-30 12:51:53 0x7fd592423700
    *** (1) TRANSACTION:
    TRANSACTION 67406136842, ACTIVE 1 sec starting index read
    mysql tables in use 1, locked 1
    LOCK WAIT 9 lock struct(s), heap size 1128, 5 row lock(s), undo log entries 2
    MySQL thread id 206114, OS thread handle 140555916941056, query id 183334694 172.16.21.72 neutron Updating
    UPDATE quotausages SET dirty=1 WHERE quotausages.project_id = '16d7ab88b78a43c48bb05b772bfb59c7' AND quotausages.resource = 'port'
    *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 320 page no 34 n bits 232 index PRIMARY of table `ovs_neutron`.`quotausages` trx id 67406136842 lock_mode X locks rec but not gap waiting
    Record lock, heap no 11 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
     0: len 30; hex 313664376162383862373861343363343862623035623737326266623539; asc 16d7ab88b78a43c48bb05b772bfb59; (total 32 bytes);
     1: len 4; hex 706f7274; asc port;;
     2: len 6; hex 000fb1b7e637; asc      7;;
     3: len 7; hex 1f0000801a2062; asc       b;;
     4: len 1; hex 80; asc  ;;
     5: len 4; hex 800004c1; asc     ;;
     6: len 4; hex 80000000; asc     ;;
     
    *** (2) TRANSACTION:
    TRANSACTION 67406129110, ACTIVE 51 sec inserting
    mysql tables in use 1, locked 1
    5 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 2
    MySQL thread id 208150, OS thread handle 140555258574592, query id 183339728 172.16.21.72 neutron Update
    INSERT INTO resourcedeltas (resource, reservation_id, amount) VALUES ('port', '7aab84b9-96e2-4108-bac3-c6e27517f711', 1)
    *** (2) HOLDS THE LOCK(S):
    RECORD LOCKS space id 320 page no 34 n bits 232 index PRIMARY of table `ovs_neutron`.`quotausages` trx id 67406129110 lock_mode X locks rec but not gap
    Record lock, heap no 11 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
     0: len 30; hex 313664376162383862373861343363343862623035623737326266623539; asc 16d7ab88b78a43c48bb05b772bfb59; (total 32 bytes);
     1: len 4; hex 706f7274; asc port;;
     2: len 6; hex 000fb1b7e637; asc      7;;
     3: len 7; hex 1f0000801a2062; asc       b;;
     4: len 1; hex 80; asc  ;;
     5: len 4; hex 800004c1; asc     ;;
     6: len 4; hex 80000000; asc     ;;
     
    *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 340 page no 4 n bits 336 index reservation_id of table `ovs_neutron`.`resourcedeltas` trx id 67406129110 lock_mode X locks gap before rec insert intention waiting
    Record lock, heap no 146 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
     0: len 30; hex 37643138393736632d653734652d346438632d626136372d643137646336; asc 7d18976c-e74e-4d8c-ba67-d17dc6; (total 36 bytes);
     1: len 4; hex 706f7274; asc port;;
     
    *** WE ROLL BACK TRANSACTION (2)

Comment 2 Rodolfo Alonso 2021-05-04 18:10:28 UTC
Hello Michael:

Thanks a lot for the detailed description.

My suggestion goes in other direction. I would prefer to keep the transaction isolation in the highest possible level to avoid having other side effects.

The problem I see in the quota architecture is that, as in other Neutron code sections, we use the python code to ensure the multiple workers (across multiple hosts) isolation by using external locks. This is what is causing the problem here. And these locks are unique for resources with a high number of transactions, what could lead to a blockage situation as described.

Instead my suggestion is to use the DB engine transaction isolation. Note: it may look like we are currently using the DB to ensure multiple workers transactions isolation but this is not the case. The amount of resources used do not came from a DB transaction but from a calculation done inside the Neutron code and updated in a DB register. This number does not represent the realtime resource usage.

When a request come to the server, a hook [1] creates a resource reservation (we can ask for several resources from several projects). Inside a short lived transaction, we should:
1) Retrieve the (resource, project_id) count of objects.
2) Retrieve the number of current (resource, project_id) reservations.
3) Retrieve the (resource, project_id) quota
4) If available, create a reservation.

In SQL terms, (2) is just a matter of counting the number of "reservations" existing for (resource, project_id). This number will depend on the number of available workers but will be limited. The main problem could be (1), but it is not. Each resource (port, network, SG, etc) has a "project_id" column that is an index [2]. A search in a table, retrieving only "id" column and filtering by an indexed column will take milliseconds, tested in a table with 210,000 registers [3]. That means we can exactly calculate the sum of the number of reservations plus the number of used resources each time with a minimal time consumption.

In [4], the reserved resources will be freed. In case the HTTP request does not reach this point, we can have a periodic task reading the stale reservations to freed those "forgotten" reservation registers.

This architecture is much simpler than the current one, easier to understand and (without any benchmark or proof) I will guest that faster because we skip all the intermediate resync operations, the current quota calculations, the resource "clean up" (that means remove the "dirty" flag) and the (resource, project_id) lock.

Regards.

[1]https://github.com/openstack/neutron/blob/d6d68552347a70f4affb486758cdaf9ce8cf5950/neutron/pecan_wsgi/hooks/quota_enforcement.py#L34
[2]https://github.com/openstack/neutron-lib/blob/ba36d60717b85d715e08556ffca23446431402eb/neutron_lib/db/model_base.py#L27
[3]http://pastebin.test.redhat.com/961133
[4]https://github.com/openstack/neutron/blob/d6d68552347a70f4affb486758cdaf9ce8cf5950/neutron/pecan_wsgi/hooks/quota_enforcement.py#L65

Comment 3 Michael Bayer 2021-05-04 18:26:56 UTC
Hi Rodolfo -

absolutely, if you can rearchitect Neutron to not use the database in this manner and instead use much fewer queries with less concurrency by design, then all the better.    I was merely indicating the most minimal changes to resolve the issue as that's what I'm usually called upon to advise.

Comment 5 Bogdan Dobrelya 2021-05-10 07:26:53 UTC
@Michael, isn't gap locks should only be used for the SERIALIZABLE isolation level and not RR? Since RR only guarantees to read the commited data unchanged each time and grants nothing on deleting or adding of unrelated data rows?

Comment 6 Rodolfo Alonso 2021-05-10 08:32:39 UTC
Hello:

Please check https://review.opendev.org/c/openstack/neutron/+/790060. As commented before, instead of trying to tweak the DB to make it working with the Quota driver, I've proposed what I think is a better design, using the DB transactionality isolation in the proper way: instead of creating a global (resource, project_id) lock, this new driver uses the DB engine transaction isolation to retrieve (1) the used resources and (2) the current resource reservations. Within the same transaction, if we have enough quota, a new reservation is done.

This new driver will ensure that any other worker (from the same server or not) will request resources based on a valid resource count, guaranteed by the DB engine transactionality. There is a small change to over commit the quota if several requests ask for resources in the limit of the quota at the same time. But this is already happening with the previous design: "Neutron does not enforce quotas in such a way that a quota violation like this could never occur. The extent of the issue will vary greatly by deployment architecture, specifically the number of neutron workers that are deployed. If more workers are deployed, this becomes more probable." [1]

Because the transaction that counts the used resources and creates the reservation is much faster than the current driver "make_reservation" transaction, the chances to hit this issue will be lower.

In any case, the goal of this new driver is to get rid of the global (resource, project_id) locks that block the Neutron API.

Regards.

[1]https://bugs.launchpad.net/neutron/+bug/1862050/comments/5

Comment 7 Michael Bayer 2021-05-10 12:38:18 UTC
(In reply to Bogdan Dobrelya from comment #5)
> @Michael, isn't gap locks should only be used for the SERIALIZABLE isolation
> level and not RR? Since RR only guarantees to read the commited data
> unchanged each time and grants nothing on deleting or adding of unrelated
> data rows?

per https://mariadb.com/kb/en/innodb-lock-modes/#gap-locks (we are on a pre 10.4 mariadb) it's REPEATABLE READ.   Mysql's isolation levels are not in line with the usual descriptions of isolation levels, https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html has background.    from blog posts like https://www.percona.com/blog/2012/03/27/innodbs-gap-locks/ it looks like MySQL RR is trying to do phantom reads as well.


Note You need to log in before you can comment on or make changes to this bug.