Bug 2227109

Summary: `podman login` against the container registry returns 500 intermittently
Product: Red Hat Satellite
Reporter: matt jia <mjia>
Component: Container Management - Content
Assignee: Ian Ballou <iballou>
Status: CLOSED MIGRATED
QA Contact: Vladimír Sedmík <vsedmik>
Severity: high
Priority: high
Version: 6.13.0
CC: ahumbe, alsouza, iballou, kmarutha, rlavi, saydas, vsedmik
Target Milestone: stream
Keywords: MigratedToJIRA, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Fixed In Version: smart_proxy_container_gateway-3.0.0.gem
Doc Type: If docs needed, set a value
Last Closed: 2024-06-06 16:25:30 UTC
Type: Bug

Description matt jia 2023-07-28 01:28:56 UTC
Description of problem:

As described in the summary, the following command intermittently fails:

podman login example.capsule.com -u admin

Error: error authenticating creds for "example.capsule.com": Requesting bear token: invalid status code from registry 500 (Internal Server Error)


Version-Release number of selected component (if applicable):

6.13

How reproducible:

Easy

Steps to Reproduce:
1. Run `podman login` on multiple hosts at the same time

Actual results:

Sometimes the login fails with:

Error: error authenticating creds for "example.satellite.com": Requesting bear token: invalid status code from registry 500 (Internal Server Error)


Expected results:

Login should succeed


Additional info:

Traceback on the smart proxy:

2023-07-27T10:52:59 1aff9aac [I] Started GET /container_gateway/v2/token account=admin&service=example.capsule.com
2023-07-27T10:53:17 1aff9aac [W] Error processing request '1aff9aac-4d4b-4582-a23f-ec310fc95b8e: <Sequel::DatabaseError>: SQLite3::BusyException: database is locked
/usr/share/gems/gems/sqlite3-1.4.2/lib/sqlite3/resultset.rb:108:in `step'
/usr/share/gems/gems/sqlite3-1.4.2/lib/sqlite3/resultset.rb:108:in `next'
/usr/share/gems/gems/sqlite3-1.4.2/lib/sqlite3/resultset.rb:133:in `each'
/usr/share/gems/gems/sqlite3-1.4.2/lib/sqlite3/database.rb:202:in `to_a'
/usr/share/gems/gems/sqlite3-1.4.2/lib/sqlite3/database.rb:202:in `block in execute'
/usr/share/gems/gems/sqlite3-1.4.2/lib/sqlite3/database.rb:151:in `prepare'
/usr/share/gems/gems/sqlite3-1.4.2/lib/sqlite3/database.rb:193:in `execute'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/adapters/sqlite.rb:218:in `block (2 levels) in _execute'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/database/logging.rb:38:in `log_connection_yield'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/adapters/sqlite.rb:218:in `block in _execute'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/database/connecting.rb:293:in `synchronize'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/adapters/sqlite.rb:209:in `_execute'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/adapters/sqlite.rb:176:in `execute_insert'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/dataset/actions.rb:1178:in `execute_insert'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/dataset/actions.rb:409:in `insert'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/model/associations.rb:2252:in `block in def_many_to_many'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/model/associations.rb:2599:in `add_associated_object'
/usr/share/gems/gems/sequel-5.62.0/lib/sequel/model/associations.rb:2069:in `block in def_association_instance_methods'
/usr/share/gems/gems/smart_proxy_container_gateway-1.0.7/lib/smart_proxy_container_gateway/container_gateway_main.rb:126:in `block in update_user_repositories'
/usr/share/gems/gems/smart_proxy_container_gateway-1.0.7/lib/smart_proxy_container_gateway/container_gateway_main.rb:121:in `each'
/usr/share/gems/gems/smart_proxy_container_gateway-1.0.7/lib/smart_proxy_container_gateway/container_gateway_main.rb:121:in `update_user_repositories'
/usr/share/gems/gems/smart_proxy_container_gateway-1.0.7/lib/smart_proxy_container_gateway/container_gateway_api.rb:112:in `block in <class:Api>'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1686:in `call'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1686:in `block in compile!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1023:in `block (3 levels) in route!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1042:in `route_eval'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1023:in `block (2 levels) in route!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1071:in `block in process_route'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1069:in `catch'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1069:in `process_route'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1021:in `block in route!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1018:in `each'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1018:in `route!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1140:in `block in dispatch!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1112:in `block in invoke'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1112:in `catch'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1112:in `invoke'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1135:in `dispatch!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:949:in `block in call!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1112:in `block in invoke'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1112:in `catch'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1112:in `invoke'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:949:in `call!'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:938:in `call'
/usr/share/foreman-proxy/lib/proxy/log.rb:101:in `call'
/usr/share/foreman-proxy/lib/proxy/request_id_middleware.rb:11:in `call'
/usr/share/gems/gems/rack-protection-2.2.2/lib/rack/protection/xss_header.rb:18:in `call'
/usr/share/gems/gems/rack-protection-2.2.2/lib/rack/protection/path_traversal.rb:16:in `call'
/usr/share/gems/gems/rack-protection-2.2.2/lib/rack/protection/json_csrf.rb:26:in `call'
/usr/share/gems/gems/rack-protection-2.2.2/lib/rack/protection/base.rb:50:in `call'
/usr/share/gems/gems/rack-protection-2.2.2/lib/rack/protection/base.rb:50:in `call'
/usr/share/gems/gems/rack-protection-2.2.2/lib/rack/protection/frame_options.rb:31:in `call'
/usr/share/gems/gems/rack-2.2.4/lib/rack/null_logger.rb:11:in `call'
/usr/share/gems/gems/rack-2.2.4/lib/rack/head.rb:12:in `call'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/show_exceptions.rb:22:in `call'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:218:in `call'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1993:in `call'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1553:in `block in call'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1769:in `synchronize'
/usr/share/gems/gems/sinatra-2.2.2/lib/sinatra/base.rb:1553:in `call'
/usr/share/gems/gems/rack-2.2.4/lib/rack/urlmap.rb:74:in `block in call'
/usr/share/gems/gems/rack-2.2.4/lib/rack/urlmap.rb:58:in `each'
/usr/share/gems/gems/rack-2.2.4/lib/rack/urlmap.rb:58:in `call'
/usr/share/gems/gems/rack-2.2.4/lib/rack/builder.rb:244:in `call'
/usr/share/gems/gems/rack-2.2.4/lib/rack/handler/webrick.rb:95:in `service'
/usr/share/gems/gems/webrick-1.7.0/lib/webrick/httpserver.rb:140:in `service'
/usr/share/gems/gems/webrick-1.7.0/lib/webrick/httpserver.rb:96:in `run'
/usr/share/gems/gems/webrick-1.7.0/lib/webrick/server.rb:310:in `block in start_thread'
/usr/share/gems/gems/logging-2.3.1/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
2023-07-27T10:53:17 1aff9aac [I] Finished GET /container_gateway/v2/token with 500 (18361.06 ms)

Comment 1 matt jia 2023-07-28 01:31:27 UTC
The strange part is why update_user_repositories is called at all when doing `podman login`. It clearly seems like a bug to me.

Comment 2 Ian Ballou 2023-08-02 19:27:50 UTC
When a user authenticates to the smart proxy container gateway, we need to update what repositories they have access to in case anything changed on the Foreman. That's why update_user_repositories is run.

The bug here is that sometimes SQLite locks up when multiple requests happen at once. This popped up before: https://bugzilla.redhat.com/show_bug.cgi?id=2105349
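For illustration (not the Container Gateway's actual code), here is a minimal Ruby sketch of the failure mode: several writers sharing one SQLite file behind Sequel's threaded connection pool, combined with a very short busy timeout, can surface `SQLite3::BusyException: database is locked`. The database path and table name below are made up for the demo.

```
require 'sequel'

# A 1 ms busy timeout makes the contention easy to hit; the real default was 5000 ms.
DB = Sequel.sqlite('/tmp/busy_demo.sqlite3', timeout: 1)
DB.create_table?(:repositories_users) do
  Integer :user_id
  Integer :repository_id
end

threads = Array.new(4) do |n|
  Thread.new do
    begin
      50.times { |i| DB[:repositories_users].insert(user_id: n, repository_id: i) }
    rescue Sequel::DatabaseError => e
      # Under contention this is typically "SQLite3::BusyException: database is locked".
      warn "writer #{n}: #{e.message}"
    end
  end
end
threads.each(&:join)
```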

Comment 3 matt jia 2023-08-02 21:53:42 UTC
(In reply to Ian Ballou from comment #2)
> When a user authenticates to the smart proxy container gateway, we need to
> update what repositories they have access to in case anything changed on the
> Foreman. That's why update_user_repositories is run.
> The bug here is that sometimes SQLite locks up when multiple requests happen
> at once. This popped up before:
> https://bugzilla.redhat.com/show_bug.cgi?id=2105349

Right. Do we have any workaround here, as this is significantly impacting the customer's OpenStack deployment?

Comment 6 Ashish Humbe 2023-09-18 05:18:36 UTC
(In reply to matt jia from comment #3)
> (In reply to Ian Ballou from comment #2)
> > When a user authenticates to the smart proxy container gateway, we need to
> > update what repositories they have access to in case anything changed on the
> > Foreman. That's why update_user_repositories is run.
> > The bug here is that sometimes SQLite locks up when multiple requests happen
> > at once. This popped up before:
> > https://bugzilla.redhat.com/show_bug.cgi?id=2105349
> 
> Right. Do we have any workaround here as this is significantly impacting the
> customer's Openstack deployment.

Tagging Ian to check whether any workaround is available.

Comment 7 Ian Ballou 2023-09-18 13:39:03 UTC
A likely workaround would be to trigger the container registry user data sync between the Satellite and the Capsule manually:

```
# foreman-rake console

::SmartProxy.find(<capsule id>).sync_container_gateway
```

This avoids the overhead of a capsule sync.

Run the workaround above when the capsule is not receiving registry requests from clients.
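If the Capsule's smart proxy ID is not known, it can be listed from the same console; a hedged example, assuming the standard Foreman SmartProxy model:

```
# foreman-rake console

::SmartProxy.pluck(:id, :name)
```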

Comment 8 Bryan Kearney 2023-09-26 16:02:51 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/36771 has been resolved.

Comment 9 Ian Ballou 2023-09-26 20:15:15 UTC
The smart proxy container gateway code is fixed in smart_proxy_container_gateway-1.0.9.gem.

Packaging PR upstream: https://github.com/theforeman/foreman-packaging/pull/9836

An installer change accompanies this so that the added SQLite busy timeout can be configured more easily: https://github.com/theforeman/puppet-foreman_proxy/pull/813

The bug is likely to be fixed out of the box by smart_proxy_container_gateway-1.0.9, because I also increased the default SQLite busy timeout from 5 seconds to 30 seconds.

If the busy error still occurs, the busy timeout can be increased via the new sqlite_timeout setting introduced there.
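For reference, a minimal sketch of what that setting controls, assuming it is handed to Sequel's SQLite adapter as the busy timeout in milliseconds; the database path and variable name below are illustrative, not the gem's actual code.

```
require 'sequel'

sqlite_timeout_ms = 30_000 # assumed new default; the previous default was 5_000 (5 seconds)
DB = Sequel.sqlite('/tmp/container_gateway_demo.sqlite3',
                   timeout: sqlite_timeout_ms)
```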

Comment 11 Ian Ballou 2023-10-26 18:27:40 UTC
*** Bug 2105349 has been marked as a duplicate of this bug. ***

Comment 12 Brad Buckingham 2023-10-30 11:29:29 UTC
Bulk setting Target Milestone = 6.15.0 where sat-6.15.0+ is set.

Comment 14 Ian Ballou 2024-01-15 22:04:15 UTC
For some reason, during testing on Satellite the busy timeout didn't help with concurrent `podman login` requests. My first step will be to find out why that was happening. However, I think the solution could be further improved.

SQLite has a different journaling mode, write-ahead logging (WAL), which offers better concurrency: https://www.sqlite.org/wal.html

We could try using this instead.

On top of that, the container gateway should probably handle the busy exception and retry. With WAL mode, however, the busy exception should become much rarer.
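A hedged Ruby sketch of the two ideas above (not the shipped fix): switch the database to WAL mode through Sequel and retry a bounded number of times if a busy error still slips through. The helper name and database path are made up for the example.

```
require 'sequel'

DB = Sequel.sqlite('/tmp/container_gateway_demo.sqlite3', timeout: 30_000)
# WAL lets readers proceed while a single writer holds the write lock.
DB.fetch('PRAGMA journal_mode=WAL').all

def with_busy_retry(attempts: 3)
  tries = 0
  begin
    yield
  rescue Sequel::DatabaseError => e
    raise unless e.message.include?('database is locked')
    tries += 1
    raise if tries >= attempts
    sleep 0.2 * tries # brief backoff before retrying
    retry
  end
end

with_busy_retry do
  DB.transaction do
    # update_user_repositories-style writes would go here
    DB.run('SELECT 1')
  end
end
```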

Comment 16 Bryan Kearney 2024-01-19 20:02:29 UTC
Upstream bug assigned to iballou

Comment 20 Eric Helms 2024-05-16 16:17:54 UTC
Upstream bug assigned to ekohlvan

Comment 21 Ian Ballou 2024-05-20 16:54:04 UTC
Moving to POST since all upstream PRs are now merged.

Comment 22 Eric Helms 2024-06-06 16:25:30 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "SAT-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.