Bug 1781386 - processing db env (corrupted?) file gets stuck with spinning CPU
Summary: processing db env (corrupted?) file gets stuck with spinning CPU
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libdb
Version: 7.8
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Matej Mužila
QA Contact: RHEL CS Apps Subsystem QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-09 21:51 UTC by Chris Roberts
Modified: 2024-12-20 18:57 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-15 05:58:57 UTC
Target Upstream Version:
Embargoed:


Attachments


Links:
Foreman Issue Tracker 28462 (Normal, Closed): "Cloning a Satellite with out pulp data fails to start qpid during restore" (last updated 2021-02-03 19:36:59 UTC)
Red Hat Knowledge Base (Solution) 4645231 (last updated 2019-12-09 21:57:09 UTC)

Description Chris Roberts 2019-12-09 21:51:00 UTC
Description of problem:
When cloning a Red Hat Satellite 6.6 without pulp data, the installer fails during the restore because the qpid journals are corrupted (see the corrected wording in comment 4).

Version-Release number of selected component (if applicable):

Red Hat Satellite 6.6
qpid-tools-1.36.0-28.el7amq.noarch
tfm-rubygem-qpid_messaging-1.36.0-9.el7sat.x86_64
qpid-cpp-server-1.36.0-28.el7amq.x86_64
python-qpid-qmf-1.36.0-28.el7amq.x86_64
qpid-cpp-client-1.36.0-28.el7amq.x86_64
qpid-dispatch-router-1.5.0-4.el7.x86_64
python-qpid-1.35.0-5.el7.noarch
qpid-cpp-client-devel-1.36.0-28.el7amq.x86_64
qpid-proton-c-0.28.0-1.el7.x86_64
qpid-cpp-server-linearstore-1.36.0-28.el7amq.x86_64
qpid-qmf-1.36.0-28.el7amq.x86_64
python-gofer-qpid-2.12.5-5.el7sat.noarch
python-qpid-proton-0.28.0-1.el7.x86_64

How reproducible:


Steps to Reproduce:
1. Install Red Hat Satellite 6.6, sync some content, and create a backup without pulp content
2. Use Satellite Clone to clone and restore; the installer fails while qpid is trying to start up (a command sketch follows below)
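
For illustration only, a hedged sketch of these steps as commands; the backup tool invocation, the --skip-pulp-content flag, and the paths are assumptions based on typical Satellite 6.6 tooling, not taken verbatim from this report:

# on the source Satellite 6.6: back up without pulp content
# (tool name, strategy and flag are assumptions)
foreman-maintain backup offline --skip-pulp-content /var/backup

# on the clone target: install satellite-clone, point it at the backup
# (config path such as /etc/satellite-clone/satellite-clone-vars.yml is assumed),
# then run the restore; the installer step later times out while qpidd starts
satellite-clone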

Actual results:
qpid hangs during startup, exceeds the installer timeout, and causes the puppet installer to fail, leaving the box in an error state.

Expected results:
The clone and restore finish correctly.

Additional info:

From talking to Pavel, it looks like qpid is starting correctly but not listening, so it hangs.

qpidd loads its plugins in order: first linearstore, then ACL, then SSL. The loading of linearstore, which processes the durable queues, must have failed in some way, so the remaining plugins were never loaded; the journals must therefore be damaged.

Workaround: rm -f /var/lib/qpidd/.qpidd/qls/dat2/*
The info about queues etc. shall not be touched.
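
For completeness, a hedged sketch of applying that workaround on the clone target; the systemd service name and the need to stop/start the broker around the removal are assumptions:

systemctl stop qpidd
# remove the store's DB environment under qls/dat2 (the workaround above)
rm -f /var/lib/qpidd/.qpidd/qls/dat2/*
systemctl start qpidd
# then re-run the failed installer/clone step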

Comment 4 Chris Roberts 2019-12-09 21:54:26 UTC
Since I cannot edit comments, I will correct this part:

Description of problem:
When cloning a Red Hat Satellite 6.6 without pulp data, the installer fails during the restore because the qpid journals are corrupted. This works correctly in 6.5, so I am marking it as a regression. Pavel is going to create a KCS article and attach it to the BZ as well, for customers who hit this.

Comment 6 Pavel Moravec 2019-12-09 21:59:00 UTC
Hi Kim,
as the qpid-cpp-linearstore author with some bdb knowledge, could you please review this qpidd getting stuck during linearstore init? Reproducer in #c5.

Comment 7 Pavel Moravec 2019-12-10 07:58:20 UTC
(In reply to Pavel Moravec from comment #6)
> Hi Kim,
> as the qpid-cpp-linearstore author with some bdb knowledge, could you please
> review this qpidd getting stuck during linearstore init? Reproducer in #c5.

Forgot to add some relevant debugging info: libdb gets stuck forever in a busy loop with this backtrace:

(gdb) bt
#0  __env_size_insert (head=<optimized out>, elp=0x7f827dfb91a8) at ../../src/env/env_alloc.c:598
#1  0x00007f827793d30c in __env_detach (env=env@entry=0x1a0b350, destroy=destroy@entry=1) at ../../src/env/env_region.c:823
#2  0x00007f827793d852 in __env_remove_env (env=env@entry=0x1a0b350) at ../../src/env/env_region.c:934
#3  0x00007f8277937124 in __env_open (dbenv=0x1a0bdd0, db_home=<optimized out>, flags=10023, mode=<optimized out>) at ../../src/env/env_open.c:201
#4  0x00007f8277836d2a in DbEnv::open (this=this@entry=0x1a0b630, db_home=0x1a0b9d8 "/var/lib/qpidd/.qpidd/qls/dat2/", flags=flags@entry=10023, mode=mode@entry=0)
    at ../../lang/cxx/cxx_env.cpp:658
#5  0x00007f827805ee99 in qpid::linearstore::MessageStoreImpl::init (this=this@entry=0x1a0bb20, truncateFlag=<optimized out>)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/linearstore/MessageStoreImpl.cpp:233
#6  0x00007f82780605b1 in qpid::linearstore::MessageStoreImpl::init (this=this@entry=0x1a0bb20, storeDir_="/var/lib/qpidd/.qpidd", efpPartition_=efpPartition_@entry=1, 
    efpFileSize_kib_=efpFileSize_kib_@entry=128, truncateFlag_=<optimized out>, wCachePageSizeKib_=wCachePageSizeKib_@entry=4, wCacheNumPages_=wCacheNumPages_@entry=16, 
    tplWCachePageSizeKib_=tplWCachePageSizeKib_@entry=4, tplWCacheNumPages_=tplWCacheNumPages_@entry=16, overwriteBeforeReturnFlag_=false)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/linearstore/MessageStoreImpl.cpp:200
#7  0x00007f8278061358 in qpid::linearstore::MessageStoreImpl::init (this=0x1a0bb20, options_=options_@entry=0x7f827829eb28 <qpid::broker::instance+8>)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/linearstore/MessageStoreImpl.cpp:172
#8  0x00007f827804b640 in qpid::broker::StorePlugin::earlyInitialize (this=0x7f827829eb20 <qpid::broker::instance>, target=...)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/linearstore/StorePlugin.cpp:72
#9  0x00007f827d5b543f in operator() (a1=..., p=<optimized out>, this=<synthetic pointer>) at /usr/include/boost/bind/mem_fn_template.hpp:165
#10 operator()<boost::_mfi::mf1<void, qpid::Plugin, qpid::Plugin::Target&>, boost::_bi::list1<qpid::Plugin* const&> > (a=<synthetic pointer>, f=<synthetic pointer>, 
    this=<synthetic pointer>) at /usr/include/boost/bind/bind.hpp:313
#11 operator()<qpid::Plugin*> (a1=@0x19d1908: 0x7f827829eb20 <qpid::broker::instance>, this=<synthetic pointer>) at /usr/include/boost/bind/bind_template.hpp:47
#12 for_each<__gnu_cxx::__normal_iterator<qpid::Plugin* const*, std::vector<qpid::Plugin*> >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::Plugin, qpid::Plugin::Target&>, boost::_bi::list2<boost::arg<1>, boost::reference_wrapper<qpid::Plugin::Target> > > > (__f=..., __last=..., __first=<qpid::broker::instance>) at /usr/include/c++/4.8.2/bits/stl_algo.h:4417
#13 qpid::(anonymous namespace)::each_plugin<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::Plugin, qpid::Plugin::Target&>, boost::_bi::list2<boost::arg<1>, boost::reference_wrapper<qpid::Plugin::Target> > > > (f=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/Plugin.cpp:73
#14 0x00007f827d5b54e2 in qpid::Plugin::earlyInitAll (t=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/Plugin.cpp:87
#15 0x00007f827daefa76 in qpid::broker::Broker::Broker (this=0x19e4930, conf=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Broker.cpp:310
#16 0x0000000000405aa2 in qpid::broker::QpiddBroker::execute (this=this@entry=0x7ffd69bf058e, options=0x19d96b0) at /usr/src/debug/qpid-cpp-1.36.0/src/posix/QpiddBroker.cpp:229
#17 0x0000000000409914 in qpid::broker::run_broker (argc=3, argv=0x7ffd69bf0928, hidden=<optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpidd.cpp:108
#18 0x00007f827c635545 in __libc_start_main (main=0x404b60 <main(int, char**)>, argc=3, argv=0x7ffd69bf0928, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffd69bf0918) at ../csu/libc-start.c:266
#19 0x0000000000404dd1 in _start ()
(gdb) 

This happens on the corrupted(?) file /var/lib/qpidd/.qpidd/qls/dat2/__db.001.
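
To capture the same backtrace from a stuck broker, something like the following should work (gdb and the matching debuginfo packages are assumed to be installed):

gdb -p $(pidof qpidd) -batch -ex 'thread apply all bt'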

Comment 8 Pavel Moravec 2019-12-10 18:13:36 UTC
Workaround (which is really tricky): delete just the /var/lib/qpidd/.qpidd/qls/dat2/__db.00* file(s). They don't contain info about queues/exchanges/bindings, so there is no data loss.
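
A hedged sketch of that narrower variant; per the comment above, only the __db.00* environment files are removed, so the queue/exchange/binding info stays in place (stopping the broker first is an assumption):

systemctl stop qpidd
rm -f /var/lib/qpidd/.qpidd/qls/dat2/__db.00*   # region files only; no data loss per this comment
systemctl start qpidd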

Comment 9 Pavel Moravec 2019-12-11 12:08:48 UTC
Reassigning to the libdb component in RHEL 7, as I have a dedicated reproducer without Satellite or qpidd: even db_recover hangs the same way, with this backtrace:


(gdb) bt
#0  0x00007f4cf904f8d8 in __env_size_insert (head=<optimized out>, elp=0x7f4cf93d41a8) at ../../src/env/env_alloc.c:598
#1  0x00007f4cf905a13c in __env_detach (env=env@entry=0x13bc840, destroy=destroy@entry=1) at ../../src/env/env_region.c:823
#2  0x00007f4cf905a682 in __env_remove_env (env=env@entry=0x13bc840) at ../../src/env/env_region.c:934
#3  0x00007f4cf9053f54 in __env_open (dbenv=0x13bc010, db_home=<optimized out>, flags=75271, mode=<optimized out>) at ../../src/env/env_open.c:201
#4  0x0000000000400eed in main (argc=<optimized out>, argv=<optimized out>) at ../../util/db_recover.c:148
(gdb)
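
For reference, a hedged sketch of driving that standalone reproducer; copying the suspect __db.00* region files into a scratch environment directory is an assumption about what is sufficient to trigger the hang:

mkdir -p /tmp/dbenv
cp /var/lib/qpidd/.qpidd/qls/dat2/__db.00* /tmp/dbenv/
db_recover -v -h /tmp/dbenv            # spins on CPU instead of returning
# from another shell, confirm it is looping in __env_size_insert:
gdb -p $(pidof db_recover) -batch -ex bt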

Comment 13 Pavel Moravec 2019-12-11 16:13:35 UTC
Cancelling needinfo, as we have a reproducer even outside qpidd.

Comment 18 Honza Horak 2020-07-15 05:58:57 UTC
(In reply to Chris Roberts from comment #0)
> workaround: rm -f /var/lib/qpidd/.qpidd/qls/dat2/*
> info about queues etc shall not be touched

Given that a workaround exists, that it is extremely hard to dig deeper to find the cause, and considering the RHEL 7 life-cycle phase, I see the chances that we will ever be able to fix this as close to zero. Thus, I am closing this bug. Feel free to re-open and re-assign if it is valuable for the Satellite team, but I don't see any reason to keep tracking this for libdb. I'm sorry we couldn't help.

