Bug 1781386
| Summary: | processing db env (corrupted?) file gets stuck with spinning CPU | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Chris Roberts <chrobert> |
| Component: | libdb | Assignee: | Matej Mužila <mmuzila> |
| Status: | CLOSED CANTFIX | QA Contact: | RHEL CS Apps Subsystem QE <rhel-cs-apps-subsystem-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.8 | CC: | databases-maint, hhorak, kim.vdriet, lmiksik, mmuzila, pkubat, pmoravec |
| Target Milestone: | rc | Keywords: | Reproducer |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-15 05:58:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
Chris Roberts
2019-12-09 21:51:00 UTC
Since I cannot edit comments, I will correct this part:

Description of problem: When cloning a Red Hat Satellite 6.6 without pulpdata, the installer fails during the restore because the qpid journals are corrupted. This works correctly in 6.5, so I am marking it as a regression. Pavel is going to create a KCS and attach it to the BZ as well, for customers who hit this.

Hi Kim,
as qpid-cpp-linearstore author with some bdb knowledge, could you please review this qpidd stuck during linearstore init? Reproducer in #c5 .

(In reply to Pavel Moravec from comment #6)
> Hi Kim,
> as qpid-cpp-linearstore author with some bdb knowledge, could you please
> review this qpidd stuck during linearstore init? Reproducer in #c5 .

Forgot to add some relevant debugging info: libdb gets stuck forever in a busy loop with this backtrace:

(gdb) bt
#0 __env_size_insert (head=<optimized out>, elp=0x7f827dfb91a8) at ../../src/env/env_alloc.c:598
#1 0x00007f827793d30c in __env_detach (env=env@entry=0x1a0b350, destroy=destroy@entry=1) at ../../src/env/env_region.c:823
#2 0x00007f827793d852 in __env_remove_env (env=env@entry=0x1a0b350) at ../../src/env/env_region.c:934
#3 0x00007f8277937124 in __env_open (dbenv=0x1a0bdd0, db_home=<optimized out>, flags=10023, mode=<optimized out>) at ../../src/env/env_open.c:201
#4 0x00007f8277836d2a in DbEnv::open (this=this@entry=0x1a0b630, db_home=0x1a0b9d8 "/var/lib/qpidd/.qpidd/qls/dat2/", flags=flags@entry=10023, mode=mode@entry=0) at ../../lang/cxx/cxx_env.cpp:658
#5 0x00007f827805ee99 in qpid::linearstore::MessageStoreImpl::init (this=this@entry=0x1a0bb20, truncateFlag=<optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/linearstore/MessageStoreImpl.cpp:233
#6 0x00007f82780605b1 in qpid::linearstore::MessageStoreImpl::init (this=this@entry=0x1a0bb20, storeDir_="/var/lib/qpidd/.qpidd", efpPartition_=efpPartition_@entry=1, efpFileSize_kib_=efpFileSize_kib_@entry=128, truncateFlag_=<optimized out>, wCachePageSizeKib_=wCachePageSizeKib_@entry=4, wCacheNumPages_=wCacheNumPages_@entry=16, tplWCachePageSizeKib_=tplWCachePageSizeKib_@entry=4, tplWCacheNumPages_=tplWCacheNumPages_@entry=16, overwriteBeforeReturnFlag_=false) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/linearstore/MessageStoreImpl.cpp:200
#7 0x00007f8278061358 in qpid::linearstore::MessageStoreImpl::init (this=0x1a0bb20, options_=options_@entry=0x7f827829eb28 <qpid::broker::instance+8>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/linearstore/MessageStoreImpl.cpp:172
#8 0x00007f827804b640 in qpid::broker::StorePlugin::earlyInitialize (this=0x7f827829eb20 <qpid::broker::instance>, target=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/linearstore/StorePlugin.cpp:72
#9 0x00007f827d5b543f in operator() (a1=..., p=<optimized out>, this=<synthetic pointer>) at /usr/include/boost/bind/mem_fn_template.hpp:165
#10 operator()<boost::_mfi::mf1<void, qpid::Plugin, qpid::Plugin::Target&>, boost::_bi::list1<qpid::Plugin* const&> > (a=<synthetic pointer>, f=<synthetic pointer>, this=<synthetic pointer>) at /usr/include/boost/bind/bind.hpp:313
#11 operator()<qpid::Plugin*> (a1=@0x19d1908: 0x7f827829eb20 <qpid::broker::instance>, this=<synthetic pointer>) at /usr/include/boost/bind/bind_template.hpp:47
#12 for_each<__gnu_cxx::__normal_iterator<qpid::Plugin* const*, std::vector<qpid::Plugin*> >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::Plugin, qpid::Plugin::Target&>, boost::_bi::list2<boost::arg<1>, boost::reference_wrapper<qpid::Plugin::Target> > > > (__f=..., __last=..., __first=<qpid::broker::instance>) at /usr/include/c++/4.8.2/bits/stl_algo.h:4417
#13 qpid::(anonymous namespace)::each_plugin<boost::_bi::bind_t<void, boost::_mfi::mf1<void, qpid::Plugin, qpid::Plugin::Target&>, boost::_bi::list2<boost::arg<1>, boost::reference_wrapper<qpid::Plugin::Target> > > > (f=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/Plugin.cpp:73
#14 0x00007f827d5b54e2 in qpid::Plugin::earlyInitAll (t=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/Plugin.cpp:87
#15 0x00007f827daefa76 in qpid::broker::Broker::Broker (this=0x19e4930, conf=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Broker.cpp:310
#16 0x0000000000405aa2 in qpid::broker::QpiddBroker::execute (this=this@entry=0x7ffd69bf058e, options=0x19d96b0) at /usr/src/debug/qpid-cpp-1.36.0/src/posix/QpiddBroker.cpp:229
#17 0x0000000000409914 in qpid::broker::run_broker (argc=3, argv=0x7ffd69bf0928, hidden=<optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpidd.cpp:108
#18 0x00007f827c635545 in __libc_start_main (main=0x404b60 <main(int, char**)>, argc=3, argv=0x7ffd69bf0928, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd69bf0918) at ../csu/libc-start.c:266
#19 0x0000000000404dd1 in _start ()
(gdb)

on the corrupted(?) file /var/lib/qpidd/.qpidd/qls/dat2/__db.001 .

Workaround (that is really tricky): delete just the /var/lib/qpidd/.qpidd/qls/dat2/__db.00* file(s). They don't contain info about queues/exchanges/bindings, so no data loss.
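
The workaround above can be sketched in Python as a small helper that removes only the files matching `__db.00*` and leaves everything else in the directory alone. This is a minimal sketch and was not part of the original report: the function name `remove_region_files` and its `dry_run` parameter are hypothetical, and it assumes qpidd is stopped while the files are removed.

```python
from pathlib import Path


def remove_region_files(env_dir, dry_run=True):
    """Return the __db.00* files found in env_dir, deleting them unless dry_run.

    Hypothetical helper: it mirrors the glob from the workaround and
    touches nothing else in the directory.
    """
    matched = sorted(p for p in Path(env_dir).glob("__db.00*") if p.is_file())
    if not dry_run:
        for path in matched:
            path.unlink()
    return matched
```

Calling `remove_region_files("/var/lib/qpidd/.qpidd/qls/dat2")` with the default `dry_run=True` only lists what would be deleted; passing `dry_run=False` performs the removal.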
Reassigning to the libdb component in RHEL7, as I have a dedicated reproducer without Satellite or qpidd - even db_recover hangs the same way, with this backtrace:

(gdb) bt
#0 0x00007f4cf904f8d8 in __env_size_insert (head=<optimized out>, elp=0x7f4cf93d41a8) at ../../src/env/env_alloc.c:598
#1 0x00007f4cf905a13c in __env_detach (env=env@entry=0x13bc840, destroy=destroy@entry=1) at ../../src/env/env_region.c:823
#2 0x00007f4cf905a682 in __env_remove_env (env=env@entry=0x13bc840) at ../../src/env/env_region.c:934
#3 0x00007f4cf9053f54 in __env_open (dbenv=0x13bc010, db_home=<optimized out>, flags=75271, mode=<optimized out>) at ../../src/env/env_open.c:201
#4 0x0000000000400eed in main (argc=<optimized out>, argv=<optimized out>) at ../../util/db_recover.c:148
(gdb)

Cancelling needinfo, as we have a reproducer even outside qpidd.

(In reply to Chris Roberts from comment #0)
> workaround: rm -f /var/lib/qpidd/.qpidd/qls/dat2/*
> info about queues etc shall not be touched

Given that the workaround exists, that it is extra hard to dig deeper to find the cause, and considering the RHEL-7 phase, I see the chances that we will ever be able to fix this as close to zero. Thus, closing this bug. Feel free to re-open and re-assign if it is valuable for the Satellite team, but I don't see any reason to track this for libdb any more. I'm sorry we couldn't help.