Bug 1443470
| Summary: | Orphaned qpid journal files when unregistering a content host | |||
|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | |
| Component: | Qpid | Assignee: | Pavel Moravec <pmoravec> | |
| Status: | CLOSED ERRATA | QA Contact: | Sanket Jagtap <sjagtap> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 6.2.8 | CC: | andrew.schofield, bbuckingham, bkearney, jcallaha, kim.vdriet, mcressma, sjagtap, zhunting | |
| Target Milestone: | Unspecified | Keywords: | Triaged | |
| Target Release: | Unused | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | qpid-cpp-0.34-25 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | 1067429 | |||
| : | 1463819 (view as bug list) | Environment: | ||
| Last Closed: | 2017-08-10 17:02:29 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
|
Description
Pavel Moravec
2017-04-19 10:00:18 UTC
Kim, as journal files in /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use (or similar) directory are left orphaned, is it a safe workaround to: - identify what such files dont have a symlink from /var/lib/qpidd/.qpidd/qls/jrnl2/*/* - delete such files Is this procedure safe? Does it require qpidd being stopped? FYI this seems not to be _current_ problem in Sat6.2 that uses qpid-cpp 0.30. That qpid-cpp version isnt affected. Only 0.34 version is affected, where we plan to upgrade to, due to bz 1367735. (In reply to Pavel Moravec from comment #1) > Kim, > as journal files in > > /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use > > (or similar) directory are left orphaned, is it a safe workaround to: > - identify what such files dont have a symlink from > > /var/lib/qpidd/.qpidd/qls/jrnl2/*/* > > - delete such files > > Is this procedure safe? Does it require qpidd being stopped? Yes, this should be safe. No, qpidd does not need to be stopped, as it has lost all references and interest in the orphaned files. I suggest trying this procedure out on a running broker to be sure. If it works, a simple clean-up script could be run periodically from cron to do this. This is only a workaround, this behavior is a bug and should be fixed. As an alternative to deleting these files, they can be moved into the emtpy file pool for later re-use. Their size should determine which EFP directory to place them in. IIRC, if there has been a stoppage or sudden crash, it sometimes can happen that orphaned files are not of the correct size (as they were interrupted during creation). These files should be deleted. > As an alternative to deleting these files, they can be moved into the emtpy
> file pool for later re-use.
To clarify, these files can be moved up one directory out of NNNNk/in_use into the NNNNk directory (by default 2048k) so that they can be re-used. However, if this is done while the broker is running, the moved files will not be seen until a restart as the store initializes its internal list of unused files only on startup.
Thanks Kim. I was wrong earlier about the version affected. 0.34 is affected but 0.30 isnt. So an upgrade to 0.34 would have this BZ as a regression. Kim,
can you please review my patch (against upstream)?
I did some testing (like delete (non)empty queue with 1 or more files), but nothing more serious.
Reasons behind the code changes:
- LinearFileController::purgeEmptyFilesToEfp (when called from jcntl::delete_jrnl_files) needs to return _really_ all files, including the latest one
- but it can do so at the right time, since the sequence must be:
- first close currentJournalFilePtr_ (otherwise segfault in atomic counters for AIO)
- purge files
- finalize the L.F.C.
- remove the directory (in jcntl::delete_jrnl_files)
The above explains both split of LinearFileController::finalize and argument for LinearFileController::purgeEmptyFilesToEfp.
Patch itself:
diff -rup journal-orig/jcntl.cpp journal/jcntl.cpp
--- journal-orig/jcntl.cpp 2017-05-13 20:37:09.193101767 +0200
+++ journal/jcntl.cpp 2017-05-16 14:05:11.780167885 +0200
@@ -145,7 +145,8 @@ void
jcntl::delete_jrnl_files()
{
stop(true); // wait for AIO to complete
- _linearFileController.purgeEmptyFilesToEfp();
+ _linearFileController.closeCurrentJournal();
+ _linearFileController.purgeEmptyFilesToEfp(true);
_jdir.delete_dir();
}
diff -rup journal-orig/LinearFileController.cpp journal/LinearFileController.cpp
--- journal-orig/LinearFileController.cpp 2017-05-13 20:37:09.192101773 +0200
+++ journal/LinearFileController.cpp 2017-05-16 14:17:11.589132413 +0200
@@ -49,16 +49,20 @@ void LinearFileController::initialize(co
}
void LinearFileController::finalize() {
- if (currentJournalFilePtr_) {
- currentJournalFilePtr_->close();
- currentJournalFilePtr_ = 0;
- }
+ closeCurrentJournal();
while (!journalFileList_.empty()) {
delete journalFileList_.front();
journalFileList_.pop_front();
}
}
+void LinearFileController::closeCurrentJournal() {
+ if (currentJournalFilePtr_) {
+ currentJournalFilePtr_->close();
+ currentJournalFilePtr_ = 0;
+ }
+}
+
void LinearFileController::addJournalFile(JournalFile* journalFilePtr,
const uint32_t completedDblkCount,
const bool makeCurrentFlag) {
@@ -105,9 +109,10 @@ void LinearFileController::restoreEmptyF
addJournalFile(fileName, emptyFilePoolPtr_->getIdentity(), getNextFileSeqNum(), 0);
}
-void LinearFileController::purgeEmptyFilesToEfp() {
+void LinearFileController::purgeEmptyFilesToEfp(bool force_all) {
slock l(journalFileListMutex_);
- while (journalFileList_.front()->isNoEnqueuedRecordsRemaining() && journalFileList_.size() > 1) { // Can't purge last file, even if it has no enqueued records
+ while ((force_all && (journalFileList_.size() > 0)) || // when deleting the queue, remove really all journal files, otherwise ..
+ ((journalFileList_.size() > 1) && (journalFileList_.front()->isNoEnqueuedRecordsRemaining()))) { // .. dont purge last file, even if it has no enqueued records
emptyFilePoolPtr_->returnEmptyFileSymlink(journalFileList_.front()->getFqFileName());
delete journalFileList_.front();
journalFileList_.pop_front();
diff -rup journal-orig/LinearFileController.h journal/LinearFileController.h
--- journal-orig/LinearFileController.h 2017-05-13 20:37:09.192101773 +0200
+++ journal/LinearFileController.h 2017-05-16 14:04:52.817274436 +0200
@@ -60,6 +60,7 @@ public:
EmptyFilePool* emptyFilePoolPtr,
uint64_t initialFileNumberVal);
void finalize();
+ void closeCurrentJournal();
void addJournalFile(JournalFile* journalFilePtr,
const uint32_t completedDblkCount,
@@ -71,7 +72,7 @@ public:
uint64_t getNextRecordId();
void removeFileToEfp(const std::string& fileName);
void restoreEmptyFile(const std::string& fileName);
- void purgeEmptyFilesToEfp();
+ void purgeEmptyFilesToEfp(bool force_all=false);
// Functions for manipulating counts of non-current JournalFile instances in journalFileList_
uint32_t getEnqueuedRecordCount(const uint64_t fileSeqNumber);
Pavel, The patch looks good to me. Coincidentally, I submitted a similar patch (though not quite as elegant) on BZ1067429. Thanks for feedback. I run some stress tests (concurrent creating+enqueueing+dequeueing+deleting of queues) and: - no stability issue noticed - all files returned properly - no jrnl file leak elsewhere so commited my patch to upstream. Build: Satellite 6.2.11 snap1
On Satellite
for i in {1..1000};do ls /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use | wc; sleep 5;done
12 12 504
11 11 462
12 12 504
12 12 504
12 12 504
11 11 462
12 12 504
12 12 504
12 12 504
11 11 462
12 12 504
12 12 504
12 12 504
11 11 462
12 12 504
12 12 504
12 12 504
12 12 504
12 12 504
12 12 504
12 12 504
On client system approx for 30 times
for i in {1..100}; do bash apple.sh echo $i;done
This system is already registered. Use --force to override
System has been unregistered.
The system has been registered with ID: 5a658ee2--2fa893e060fd
Installed Product Current Status:
Product Name: Red Hat Enterprise Linux Server
Status: Subscribed
Redirecting to /bin/systemctl restart goferd.service
The contents of the folder did not increase and were stable.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2466 |