Bug 2158614
Summary: | deleting of products after a content export sometimes ends up in a candlepin error | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Griffin Sullivan <gsulliva> | ||||
Component: | Repositories | Assignee: | Evgeni Golov <egolov> | ||||
Status: | CLOSED ERRATA | QA Contact: | Griffin Sullivan <gsulliva> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.12.1 | CC: | crog, egolov, jalbrech, nmoumoul, osousa, paji, zhunting | ||||
Target Milestone: | 6.13.0 | Keywords: | Triaged | ||||
Target Release: | Unused | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | tfm-rubygem-katello-4.7.0.8-1 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2023-05-03 13:24:04 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Griffin Sullivan
2023-01-05 20:36:46 UTC
Is this a regression from prior release? Does the same scenario fail when using either the hammer cli or api directly? Thanks! I believe it was working on 6.12 before GA. I haven't tested the steps manually, but I can today. Well, at least I can reproduce this. Also, the following bash script seems also to trigger that issue: hammer organization create --name lulz hammer product create --name lulz --organization lulz hammer repository create --name lulz --content-type yum --product lulz --organization lulz hammer content-export complete library --organization lulz hammer content-export incremental library --organization lulz hammer repository delete --name lulz --product lulz --organization lulz hammer product delete --name lulz --organization lulz hammer organization delete --name lulz Forgot to post the output! [root@sat-6-12-qa-rhel8 ~]# bash test.sh Organization created. Product created. Repository created. [..............................................................................................................................................................................................................................] [100%] Generated /var/lib/pulp/exports/lulz/Export-Library/1.0/2023-01-10T10-28-58-00-00/metadata.json [..............................................................................................................................................................................................................................] [100%] Generated /var/lib/pulp/exports/lulz/Export-Library/2.0/2023-01-10T10-29-04-00-00/metadata.json Repository deleted. Product destroyed. Could not delete the organization: Required lock is already taken by other running tasks. Please inspect their state, fix their errors and resume them. Conflicts with tasks: - https://sat-6-12-qa-rhel8.tanso.example.com/foreman_tasks/tasks/1f77431b-2cc8-404a-bd94-9b8541c0d4fa And the linked task is a Actions::Katello::Product::Destroy that errored out on the Actions::Candlepin::Product::ContentRemove step with 'Product with ID "858025897830" could not be found.' This is *slightly* different to what Griffin reports in the original post, but I believe it's related. production log reads: 2023-01-10T10:29:10 [I|app|660cc209] Started DELETE "/katello/api/products/1" for 192.168.122.161 at 2023-01-10 10:29:10 +0000 2023-01-10T10:29:10 [I|app|660cc209] Processing by Katello::Api::V2::ProductsController#destroy as JSON 2023-01-10T10:29:10 [I|app|660cc209] Parameters: {"api_version"=>"v2", "id"=>"1", "product"=>{}} 2023-01-10T10:29:10 [I|app|660cc209] Authorized user admin(Admin User) 2023-01-10T10:29:10 [I|bac|660cc209] Task {label: , execution_plan_id: eab1259d-c6ef-4559-9bfc-2625dab38e15} state changed: pending 2023-01-10T10:29:10 [I|bac|660cc209] Task {label: Actions::Katello::Product::Destroy, id: 1f77431b-2cc8-404a-bd94-9b8541c0d4fa, execution_plan_id: eab1259d-c6ef-4559-9bfc-2625dab38e15} state changed: planning 2023-01-10T10:29:10 [I|bac|660cc209] Task {label: Actions::Katello::Product::Destroy, id: 1f77431b-2cc8-404a-bd94-9b8541c0d4fa, execution_plan_id: eab1259d-c6ef-4559-9bfc-2625dab38e15} state changed: planned 2023-01-10T10:29:10 [I|app|660cc209] Rendering /usr/share/gems/gems/katello-4.5.0.22/app/views/katello/api/v2/common/async.json.rabl within katello/api/v2/layouts/resource 2023-01-10T10:29:10 [I|app|660cc209] Rendered /usr/share/gems/gems/katello-4.5.0.22/app/views/katello/api/v2/common/async.json.rabl within katello/api/v2/layouts/resource (Duration: 6.2ms | Allocations: 7926) 2023-01-10T10:29:10 [I|app|660cc209] Completed 202 Accepted in 179ms (Views: 7.7ms | ActiveRecord: 12.9ms | Allocations: 70120) 2023-01-10T10:29:10 [I|bac|660cc209] Task {label: Actions::Katello::Product::Destroy, id: 1f77431b-2cc8-404a-bd94-9b8541c0d4fa, execution_plan_id: eab1259d-c6ef-4559-9bfc-2625dab38e15} state changed: running 2023-01-10T10:29:11 [I|kat|59d5408d] candlepin event handled success=true subject=pool.deleted entity_id=4028fa21859b2fa201859b39c3cf0009 duration=1.01 2023-01-10T10:29:11 [E|bac|660cc209] Product with ID "858025897830" could not be found. (Katello::Errors::CandlepinError) I am writing it's "different", because here we see a "product not found" error, while Griffin reports a deadlock, but I believe both are the "same" race condition somewhere when deleting stuff. Downgrading Candlepin to 4.1.18 (from 4.1.19) seems to "fix" the issue, so I guess it's time to re-assign this to Candlepin? Note: the candlepin 4.1.19 was brought in as part of bug 2150116 (clone of bug 1931027). Evgeni, Is there a stack trace in the candlepin log that can be attached? Griffin, Can you confirm if downgrading candlepin resolves he issue in your environment? Is the environment accessible for Nikos (on candlepin team) to take a look? Thanks! in my environment, I have not seen any stacktraces, no in Griffins, I've seen the following: 2023-01-05 12:58:54,087 [thread=https-jsse-nio-127.0.0.1-23443-exec-1] [req=2343de75-a5c3-426e-9ded-610e3293b954, org=, csid=c800b04a-cc47-4d22-a4cd-23eef399a8ca] WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 40P01 2023-01-05 12:58:54,089 [thread=https-jsse-nio-127.0.0.1-23443-exec-1] [req=2343de75-a5c3-426e-9ded-610e3293b954, org=, csid=c800b04a-cc47-4d22-a4cd-23eef399a8ca] ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper - ERROR: dead lock detected Detail: Process 56971 waits for ShareLock on transaction 286166; blocked by process 56585. Process 56585 waits for ShareLock on transaction 286165; blocked by process 56971. Hint: See server log for query details. Where: while locking tuple (0,2) in relation "cp_system_locks" 2023-01-05 12:58:54,104 [thread=https-jsse-nio-127.0.0.1-23443-exec-1] [req=2343de75-a5c3-426e-9ded-610e3293b954, org=, csid=c800b04a-cc47-4d22-a4cd-23eef399a8ca] ERROR org.candlepin.exceptions.mappers.CandlepinExceptionMapper - Ru ntime Error could not extract ResultSet at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse:2,675 org.hibernate.exception.LockAcquisitionException: could not extract ResultSet at org.hibernate.dialect.PostgreSQL81Dialect$4.convert(PostgreSQL81Dialect.java:452) at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42) at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113) at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99) at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:67) at org.hibernate.loader.Loader.getResultSet(Loader.java:2292) at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2050) at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:2012) at org.hibernate.loader.Loader.doQuery(Loader.java:953) at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:354) at org.hibernate.loader.Loader.doList(Loader.java:2815) at org.hibernate.loader.Loader.doList(Loader.java:2797) at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2629) at org.hibernate.loader.Loader.list(Loader.java:2624) at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:506) at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:396) at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:219) at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1396) at org.hibernate.query.internal.AbstractProducedQuery.doList(AbstractProducedQuery.java:1558) at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1526) at org.hibernate.query.internal.AbstractProducedQuery.getSingleResult(AbstractProducedQuery.java:1574) at org.candlepin.model.AbstractHibernateCurator.getSystemLock(AbstractHibernateCurator.java:1298) at org.candlepin.controller.ProductManager.updateProduct(ProductManager.java:334) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:56) at org.candlepin.resource.OwnerResource.removeBatchContent(OwnerResource.java:2097) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:56) at org.candlepin.resource.OwnerResource.removeContent(OwnerResource.java:2124) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:64) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:151) at org.jboss.resteasy.core.MethodInjectorImpl.lambda$invoke$3(MethodInjectorImpl.java:122) at java.base/java.util.concurrent.CompletableFuture.uniApplyNow(CompletableFuture.java:680) at java.base/java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:658) at java.base/java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:2094) at java.base/java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:143) at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:122) at org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(ResourceMethodInvoker.java:594) at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(ResourceMethodInvoker.java:468) at org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$2(ResourceMethodInvoker.java:421) at org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:363) at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:423) at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:391) at org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invoke$1(ResourceMethodInvoker.java:365) at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106) at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106) at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235) at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:143) at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:365) at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:477) at org.jboss.resteasy.core.SynchronousDispatcher.lambda$invoke$4(SynchronousDispatcher.java:252) at org.jboss.resteasy.core.SynchronousDispatcher.lambda$preprocess$0(SynchronousDispatcher.java:153) at org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:363) at org.jboss.resteasy.core.SynchronousDispatcher.preprocess(SynchronousDispatcher.java:156) at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:238) at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:249) at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:60) at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:55) at javax.servlet.http.HttpServlet.service(HttpServlet.java:764) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:290) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:280) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:184) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:89) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at org.candlepin.servlet.filter.EventFilter.doFilter(EventFilter.java:65) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at org.candlepin.servlet.filter.logging.LoggingFilter.doFilter(LoggingFilter.java:123) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at org.candlepin.servlet.filter.CandlepinPersistFilter.doFilter(CandlepinPersistFilter.java:48) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at org.candlepin.servlet.filter.CandlepinScopeFilter.doFilter(CandlepinScopeFilter.java:68) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at org.candlepin.servlet.filter.CandlepinContentTypeFilter.doFilter(CandlepinContentTypeFilter.java:67) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:121) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:133) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:190) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:163) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:542) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:143) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:357) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:382) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1724) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected Detail: Process 56971 waits for ShareLock on transaction 286166; blocked by process 56585. Process 56585 waits for ShareLock on transaction 286165; blocked by process 56971. Hint: See server log for query details. Where: while locking tuple (0,2) in relation "cp_system_locks" at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355) at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490) at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408) at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166) at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118) at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:379) at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:57) ... 90 common frames omitted I got a new 6.12.1 and downgraded candlepin from 4.1.19 to 4.1.18. I ran the test 4 times and it passed every time. Only thing I see in the logs are warnings about SELinux blocking access to the leftover export artifacts which I am going to guess is not an issue here. Example: setroubleshoot[40253]: SELinux is preventing /usr/bin/python3.9 from read access on the file export-9c3946fe-0785-4c0b-97cd-981e10dc3439-20230110_1530.tar.gz (In reply to Evgeni Golov from comment #3) > hammer content-export complete library --organization lulz > hammer content-export incremental library --organization lulz Hi, what exactly are these operations doing? (In reply to Nikos Moumoulidis from comment #11) > (In reply to Evgeni Golov from comment #3) > > hammer content-export complete library --organization lulz > > hammer content-export incremental library --organization lulz > > Hi, what exactly are these operations doing? "they create a content export", the triggered action is this task here: https://github.com/Katello/katello/blob/master/app/lib/actions/pulp3/orchestration/content_view_version/export_library.rb it *should* be rather light on Candlepin interaction, but @paji can probably explain that part better than I do. I'll attach a debug log from the Katello side in a moment. The interesting part is this, I guess: 2023-01-11T09:41:33 [D|kat|f48abe47] Resource DELETE request: /candlepin/owners/lol/products/426446953603 2023-01-11T09:41:33 [D|kat|f48abe47] Headers: {"accept":"application/json","accept-language":"en","content-type":"application/json","X-Correlation-ID":"f48abe47-e9af-4505-b2f4-ba114b3d4e00","cp-user":"foreman_admin"} 2023-01-11T09:41:33 [D|kat|f48abe47] Body: null 2023-01-11T09:41:33 [D|kat|f48abe47] Resource DELETE request: /candlepin/owners/lol/products/426446953603/content/1673430076445 2023-01-11T09:41:33 [D|kat|f48abe47] Headers: {"accept":"application/json","accept-language":"en","content-type":"application/json","X-Correlation-ID":"f48abe47-e9af-4505-b2f4-ba114b3d4e00","cp-user":"foreman_admin"} 2023-01-11T09:41:33 [D|kat|f48abe47] Body: null 2023-01-11T09:41:33 [D|kat|f48abe47] Candlepin request c22d91f2-445e-4da5-ae8e-478a4a8c73a5 returned with code 204 2023-01-11T09:41:33 [D|kat|f48abe47] Processing response: 204 2023-01-11T09:41:33 [D|kat|f48abe47] 2023-01-11T09:41:33 [E|bac|f48abe47] Product with ID "426446953603" could not be found. (Katello::Errors::CandlepinError) Created attachment 1937272 [details]
production.log of the issue
(In reply to Evgeni Golov from comment #13) > Created attachment 1937272 [details] > production.log of the issue Hi Evgeni, Thank you for attaching the production logs! Taking a look at the events, it looks like a request to delete the product is made first (DELETE request: /candlepin/owners/lol/products/426446953603) and then a request to delete the product's content (DELETE request: /candlepin/owners/lol/products/426446953603/content/1673430076445) is made before a response for the first request. Based on this order, I expect that error message to occur because there is a check to make sure the product exists before attempting to delete content in Candlepin master branch and 4.1.19-1. It looks like that check has existed in master for 5 months and was introduced in version 4.1.19-1 and does not exist in version 4.1.18-1. Do we know if these tests have ever passed for version 4.1.19-1 or master version of Candlepin? Should the order of events be delete the product's content and then delete to product? This should work fine. Candlepin line where the error is thrown: https://github.com/candlepin/candlepin/blob/master/src/main/java/org/candlepin/resource/OwnerProductResource.java#L261-L263 (In reply to Josh Albrecht from comment #14) > (In reply to Evgeni Golov from comment #13) > > Created attachment 1937272 [details] > > production.log of the issue > > Hi Evgeni, > > Thank you for attaching the production logs! Taking a look at the events, it > looks like a request to delete the product is made first (DELETE request: > /candlepin/owners/lol/products/426446953603) and then a request to delete > the product's content (DELETE request: > /candlepin/owners/lol/products/426446953603/content/1673430076445) is made > before a response for the first request. Based on this order, I expect that > error message to occur because there is a check to make sure the product > exists before attempting to delete content in Candlepin master branch and > 4.1.19-1. It looks like that check has existed in master for 5 months and > was introduced in version 4.1.19-1 and does not exist in version 4.1.18-1. Thanks for looking into this! This sounds very plausible. > Do we know if these tests have ever passed for version 4.1.19-1 or master > version of Candlepin? We do not run any tests against master. We do have 4.2.x, but there we're stuck at 4.2.3 for unrelated issues -- is that change also in 4.2.3? As for 4.1.19, from my tests, there is a small chance for this to pass (so there seems to be a race condition somewhere), but it's like 10% or so. > Should the order of events be delete the product's content and then delete > to product? This should work fine. I guess that would be a possible fix, yeah! > Candlepin line where the error is thrown: > https://github.com/candlepin/candlepin/blob/master/src/main/java/org/ > candlepin/resource/OwnerProductResource.java#L261-L263 Okay, with that explanation I went and looked again what katello does when it deletes a product. https://github.com/Katello/katello/blob/master/app/lib/actions/katello/product/destroy.rb#L42-L50 concurrence do remaining_product_content.each do |pc| plan_action(Candlepin::Product::ContentRemove, owner: product.organization.label, product_id: product.cp_id, content_id: pc.content.cp_content_id) end plan_action(Candlepin::Product::Destroy, cp_id: product.cp_id, :owner => product.organization.label) end "remaining_product_content" is content that is outside of a normal repository -- exactly like the one we use for exports! now the removal of that and the one of the actual product happens concurrently and thus sometimes in the wrong order. Created redmine issue https://projects.theforeman.org/issues/35929 from this bug https://github.com/Katello/katello/pull/10410 has a patch that seems to fix the issue for me. Upstream bug assigned to egolov Upstream bug assigned to egolov Hi Evgeni, I believe that the deadlock error is a separate issue and one that we should not lose sight of. Is this error still occurring for you or was it a one time issue? If it is still occurring can we get a separate Candlepin BZ opened with information around the issue? I would also include what version of Candlepin were you using when you saw that deadlock error? Thanks, Josh Something that may be important to note, is that the DELETE /candlepin/owners/{owner_key}/products/{product_id}/content/{content_id} endpoint, does *not* delete content -- it unlinks a given product from the content. While it's valid to delete all the product-content links before deleting a product, it is not necessary to do so (as has been observed in the behavior leading to this BZ). Product-content linkage is a property of the product, so deleting the product itself will clean these links up. As such, if the primary goal of the operation in question is to delete the product, deletion of product-content within CP is strictly optional. (In reply to Josh Albrecht from comment #21) > Hi Evgeni, > > I believe that the deadlock error is a separate issue and one that we should > not lose sight of. Is this error still occurring for you or was it a one > time issue? If it is still occurring can we get a separate Candlepin BZ > opened with information around the issue? I'll bounce that question to @gsulliva because I could not reproduce that part in my environment yet. > I would also include what version of Candlepin were you using when you saw that deadlock error? same 4.1.19 (In reply to Chris "Ceiu" Rog from comment #22) > Something that may be important to note, is that the DELETE > /candlepin/owners/{owner_key}/products/{product_id}/content/{content_id} > endpoint, does *not* delete content -- it unlinks a given product from the > content. While it's valid to delete all the product-content links before > deleting a product, it is not necessary to do so (as has been observed in > the behavior leading to this BZ). Product-content linkage is a property of > the product, so deleting the product itself will clean these links up. As > such, if the primary goal of the operation in question is to delete the > product, deletion of product-content within CP is strictly optional. So, you're saying, we could optimize that DELETE away, and the resulting DB would look "the same" after we deleted the Product? Would the same apply to Subscriptions and Pools that we also delete in https://github.com/Katello/katello/blob/master/app/lib/actions/katello/product/destroy.rb ? @jalbrech I'm pretty sure I only hit the deadlock when I filed this BZ. I will keep a look out for it while we continue to test. Every other time it's been the "Product ID can't be found" error. Moving this over to the "Repositories" component, as it's not Ansible related. (In reply to Evgeni Golov from comment #24) > > So, you're saying, we could optimize that DELETE away, and the resulting DB > would look "the same" after we deleted the Product? Assuming we're strictly talking about the deletion of product-content links, yes. Deleting the product itself will also delete those links. Note that the content in this case will still remain, as content objects are first-class in CP at the time of writing. Also note that this *shouldn't* work from the content side: attempts to delete a content that is linked to a product should fail while that link exists. > > Would the same apply to Subscriptions and Pools that we also delete in > https://github.com/Katello/katello/blob/master/app/lib/actions/katello/ > product/destroy.rb ? I would want to confirm the actual CP endpoints being hit and walk through those to make sure, but as of CP2, Subscriptions and Pools are the same thing internally for the most part. There is a fair bit of nuance there due to some legacy code that may not have been fully removed yet, but my gut reaction is that deleting a pool and deleting a subscription are resulting in the same underlying behavior. Just wanted to add in here that I've tested this on 6.12.0, 6.12.1, and 6.13.0 and hit it every time after running multiple times. It takes an arbitrary number of times to hit it, since it is most likely a timing issue, but in my testing I usually hit it within 5 runs. The time I downgraded candlepin and didn't hit the error was most likely luck since I ran it about 4 times. Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/35929 has been resolved. Verified on 6.13 snap 9 Deleting a product after content export is no longer causing a candlepin error. Steps to reproduce: 1) Install / setup foreman-ansible-modules repo 2) Configure server.yml to run on a Satellite 3) # make livetest_content_export_library Result: Test runs successfully. Notes: I ran step 3 at least 10+ times on the same Satellite with 0 issues. Before I would hit the error within the first few times of running, so I think it's safe to say the issue is resolved. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.13 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:2097 |