Bug 1034832

Summary: RHEL7 ipa-replica-manage hang waiting on CLEANALLRUV tasks
Product: Red Hat Enterprise Linux 7
Reporter: Dmitri Pal <dpal>
Component: 389-ds-base
Assignee: mreynolds
Status: CLOSED NOTABUG
QA Contact: Sankar Ramalingam <sramling>
Severity: unspecified
Priority: unspecified
Version: 7.0
CC: dpal, jgalipea, martinez, mreynolds, nkinder, rcritten, spoore
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Story Points: ---
Clone Of: 1031852
Last Closed: 2014-01-17 22:03:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Blocks: 1031852

Comment 3 Nathan Kinder 2014-01-06 15:48:49 UTC
It was found that the test procedure was incorrect.  Fixing the test procedure causes the tests to pass, so I'm removing the TestBlocker keyword.  I'll leave this open so some more investigation can be performed, but this is no longer a high priority.

Comment 4 mreynolds 2014-01-17 17:19:38 UTC
As for the CLEANALLRUV task hanging: it is just waiting for replication to catch up (which never happens, because the changelog was overwritten).  The task can always be aborted.  There is no bug on the task side.  Closing bug.

Comment 5 Scott Poore 2014-01-17 18:25:48 UTC
Moving back to Assigned to get a couple questions answered before we completely close this one:

Was the changelog overwritten by the re-initialize I removed from the test?

If so, is it normal behavior for re-initialize to remove the changelog?  Or something else unknown removed the changelog?

How can the cleanallruv task be aborted and this be cleaned up?  I thought the servers were in a state that was irreversible?

Thanks,
Scott

Comment 6 mreynolds 2014-01-17 18:42:56 UTC
(In reply to Scott Poore from comment #5)
> Moving back to Assigned to get a couple questions answered before we
> completely close this one:
> 
> Was the changelog overwritten by the re-initialize I removed from the test?

Yes.  Your issue was kind of a timing issue, where changes were not sent before you initialized.

> 
> If so, is it normal behavior for re-initialize to remove the changelog?  Or
> something else unknown removed the changelog?

When you reinitialize you are basically creating a new database, and the previous changelog contents are no longer valid.

> 
> How can the cleanallruv task be aborted and this be cleaned up? 

There is a CLEANALLRUV abort task that can be issued.  Once the initial clean task gets hung (waiting for replication to complete), the task should be aborted, and finally you would need to rerun the task using the "force" mode.  This "force mode" does not check the CSNs (or the replication state), and just goes ahead and cleans everything.  It's kind of a last-ditch effort to clean, but it is the only option at this stage.

More on this here: http://port389.org/wiki/Howto:CLEANRUV
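
As a rough illustration of the abort-then-force sequence described above, the sketch below builds the LDIF for the two task entries involved. It is only a sketch: the entry DNs and the attribute names (replica-base-dn, replica-id, replica-force-cleaning) are written from memory of the CLEANALLRUV how-to linked above and should be checked against that page for your 389-ds-base version. The replica ID and suffix are made-up example values, and the helper function names are hypothetical.

```python
def cleanallruv_task_ldif(replica_id, base_dn, force=False):
    """Return LDIF text for a CLEANALLRUV task entry.

    Attribute names are assumptions taken from the 389 DS how-to;
    verify them before use.
    """
    name = f"clean {replica_id}"
    lines = [
        f"dn: cn={name},cn=cleanallruv,cn=tasks,cn=config",
        "objectclass: extensibleObject",
        f"cn: {name}",
        f"replica-base-dn: {base_dn}",
        f"replica-id: {replica_id}",
        # "yes" skips the CSN / replication-state checks (force mode).
        f"replica-force-cleaning: {'yes' if force else 'no'}",
    ]
    return "\n".join(lines) + "\n"


def abort_cleanallruv_task_ldif(replica_id, base_dn):
    """Return LDIF text for the task that aborts a hung CLEANALLRUV task."""
    name = f"abort {replica_id}"
    lines = [
        f"dn: cn={name},cn=abort cleanallruv,cn=tasks,cn=config",
        "objectclass: extensibleObject",
        f"cn: {name}",
        f"replica-base-dn: {base_dn}",
        f"replica-id: {replica_id}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only: replica ID 3 on suffix dc=example,dc=com.
    # Step 1: abort the task that is stuck waiting for replication.
    print(abort_cleanallruv_task_ldif(3, "dc=example,dc=com"))
    # Step 2: rerun the clean task in force mode.
    print(cleanallruv_task_ldif(3, "dc=example,dc=com", force=True))
```

Each generated entry would then be added to the server as Directory Manager with a tool such as ldapmodify, the abort entry first and the forced clean entry second.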

> I thought the servers were in a state that was irreversible?

Replication was broken, that can not be fixed with reinitializing once again.  Then CLEANALLRUV would need to be run in the force mode as mentioned above.

> 
> Thanks,
> Scott

Comment 7 Scott Poore 2014-01-17 18:52:18 UTC
(In reply to mreynolds from comment #6)
> (In reply to Scott Poore from comment #5)
...
> > 
> > If so, is it normal behavior for re-initialize to remove the changelog?  Or
> > something else unknown removed the changelog?
> 
> When you reinitialize you are basically creating a new database, and the
> previous changelog contents are no longer valid.
> 

Is this a scenario where ipa-replica-manage should maybe check the changelog before actually doing the reinitialize?  Is that something that is even possible?

> > 
> > How can the cleanallruv task be aborted and this be cleaned up? 
> 
> There is a CLEANALLRUV abort task that can be issued.  Once the initial
> clean task gets hung (waiting for replication to complete), the task
> should be aborted, and finally you would need to rerun the task using the
> "force" mode.  This "force mode" does not check the CSNs (or the replication
> state), and just goes ahead and cleans everything.  It's kind of a last-ditch
> effort to clean, but it is the only option at this stage.
> 
> More on this here: http://port389.org/wiki/Howto:CLEANRUV

Ok, Thanks for the info.  I'll keep this in mind if I hit a similar issue in the future.

Comment 8 mreynolds 2014-01-17 19:52:57 UTC
(In reply to Scott Poore from comment #7)
> (In reply to mreynolds from comment #6)
> > (In reply to Scott Poore from comment #5)
> ...
> > > 
> > > If so, is it normal behavior for re-initialize to remove the changelog?  Or
> > > something else unknown removed the changelog?
> > 
> > When you reinitialize you are basically creating a new database, and the
> > previous changelog contents are no longer valid.
> > 
> 
> Is this a scenario where ipa-replica-manage should maybe check the changelog
> before actually doing the reinitialize?  Is that something that is even
> possible?

It's not that you need to check the changelog, but you need to wait for replication to complete or be idle (e.g., by putting all the replicas in read-only mode and checking the replication status of each agreement).
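
A minimal sketch of that per-agreement status check, assuming the agreement entries have already been read from cn=config into plain dictionaries (the LDAP search itself is left out). The attribute names nsds5replicaUpdateInProgress and nsds5replicaLastUpdateStatus are the standard agreement status attributes, but the exact format of the status string differs between 389-ds-base versions, so the success test below is an assumption. The function names are hypothetical.

```python
def agreement_is_idle(attrs):
    """Return True if one replication agreement looks idle.

    attrs maps attribute names to single string values, as read from a
    replication agreement entry.
    """
    in_progress = attrs.get("nsds5replicaUpdateInProgress", "").upper()
    status = attrs.get("nsds5replicaLastUpdateStatus", "")
    # Assumption: a successful last update reports code 0, written either
    # as "0 ..." or as "Error (0) ..." depending on the server version.
    succeeded = status.startswith("0 ") or status.startswith("Error (0)")
    return in_progress == "FALSE" and succeeded


def replication_is_idle(agreements):
    """Return True only if every agreement in the list looks idle."""
    return all(agreement_is_idle(a) for a in agreements)


if __name__ == "__main__":
    # Made-up sample data for two agreements on one replica.
    sample = [
        {"nsds5replicaUpdateInProgress": "FALSE",
         "nsds5replicaLastUpdateStatus":
             "0 Replica acquired successfully: Incremental update succeeded"},
        {"nsds5replicaUpdateInProgress": "TRUE",
         "nsds5replicaLastUpdateStatus":
             "0 Replica acquired successfully: Incremental update started"},
    ]
    print(replication_is_idle(sample))
```

This is only a heuristic. It should be run against every replica while they are read-only, and a clean status on its own does not prove that every change has been sent.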

However, that being said, you shouldn't be reinitializing a replica over and over.  Each replica should only be initialized once.  It's only when you run into a replication problem that you need to reinitialize.

> 
> > > 
> > > How can the cleanallruv task be aborted and this be cleaned up? 
> > 
> > There is a CLEANALLRUV abort task that can be issued.  Once the initial
> > clean task gets hung (waiting for replication to complete), the task
> > should be aborted, and finally you would need to rerun the task using the
> > "force" mode.  This "force mode" does not check the CSNs (or the replication
> > state), and just goes ahead and cleans everything.  It's kind of a last-ditch
> > effort to clean, but it is the only option at this stage.
> > 
> > More on this here: http://port389.org/wiki/Howto:CLEANRUV
> 
> Ok, Thanks for the info.  I'll keep this in mind if I hit a similar issue in
> the future.

Comment 9 Scott Poore 2014-01-17 22:03:34 UTC
So, sounds like there really isn't much that could be done to prevent this on the directory server side.  And I've already changed my tests so this shouldn't be happening anymore for me.

Closing this one again and moving the question over to IPA bug #1031852 for review there.

Thanks for the explanation.
Scott