Bug 676655

Summary: winsync stops working after server restart
Product: [Retired] 389 Reporter: Rich Megginson <rmeggins>
Component: Sync Service Assignee: Rich Megginson <rmeggins>
Status: CLOSED CURRENTRELEASE QA Contact: Viktor Ashirov <vashirov>
Severity: high Docs Contact:
Priority: high    
Version: 1.2.8 CC: juancar, sramling
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 677575 (view as bug list) Environment:
Last Closed: 2015-12-07 17:00:33 UTC Type: ---
Bug Depends On:    
Bug Blocks: 639035, 656390    
Attachments:
Description: logs on a variety of platforms/versions
Flags: none
Description: 0001-Bug-676655-winsync-stops-working-after-server-restar.patch
Flags: nhosoi: review+

Description Rich Megginson 2011-02-10 15:47:29 UTC
After a restart of the directory service, the errors log shows "Replica has no update vector, it has never been initialized."

Use service dirsrv restart to restart the server.

Comment 1 Rich Megginson 2011-02-10 22:41:23 UTC
Cannot reproduce. Steps:

1) Set up Windows sync between current 1.2.8 and Windows 2008
2) Added users to DS and confirmed they are synced to Windows
3) Added users to AD and confirmed they are synced to DS

service dirsrv restart

Added a user to DS and confirmed it synced to AD
Added a user to AD and confirmed it synced to DS

Note: this may be broken due to Bug 676689 (crash while adding a new user to be synced to Windows)

Comment 2 Juan Carlos Camargo 2011-02-15 07:02:55 UTC
I can reproduce it.

1) Set up winsync between either 1.2.7 or 1.2.8 and Windows 2008/2003
2) Added users to either side: sync occurs.

service dirsrv restart (sometimes hangs). Whether it hung or not, sync has stopped, and the errors log shows the same message as in the first post.

Can't tell if this was broken due to Bug 676689. I was able to add users on the 389ds side flawlessly.

Comment 3 Rich Megginson 2011-02-15 15:05:43 UTC
*** Bug 677575 has been marked as a duplicate of this bug. ***

Comment 4 Rich Megginson 2011-02-15 15:25:37 UTC
(In reply to comment #2)
> I can reproduce it.
> 
> 1) setup winsync between either 1.2.7 or 1.2.8 and windows 2008/2003
> 2) added users to either side: sync occurs.
> 
> service dirsrv restart (sometimes hangs).

When it hangs, can you capture the error log?  For how long does it hang?  What
do you do when it hangs?

> But whether it hung or not, sync has
> been stopped, the error logs shows the same message of first post.
> 
> Can't tell if this was broken due to Bug 676689. I was able to add users on the
> 389ds side flawlessly.

Please, can you run the server with the replication log level, reproduce the
problem, then attach your errors log to this bug?
http://directory.fedoraproject.org/wiki/FAQ#Troubleshooting

Comment 5 Juan Carlos Camargo 2011-02-17 09:01:53 UTC
Sorry about the delay, but I've been testing some platforms and versions in order to help you find the cause. I've uploaded a sort of report/log.
Regards.

Comment 6 Juan Carlos Camargo 2011-02-17 09:06:32 UTC
Created attachment 479285
logs on a variety of platforms/versions

Comment 7 Rich Megginson 2011-02-18 19:12:33 UTC
So it doesn't work on your 32-bit systems, but it does work on your 64-bit systems?

On your 32-bit systems, does it always hang when you try to shut it down?  And you always have to kill it?  If so, doing the kill could be causing the winsync data to be corrupted in the configuration, such that it thinks the server needs to be reinitialized.

Comment 8 Juan Carlos Camargo 2011-02-21 06:34:59 UTC
That's it. It works on 64-bit systems. In fact, it's been running the whole weekend and now I've been able to restart it gracefully and winsync keeps working.

No. On 32-bit systems, sometimes it hangs, sometimes it does not. And no matter whether it hung or not, winsync won't work afterwards.

Comment 9 Rich Megginson 2011-02-28 22:58:30 UTC
Created attachment 481466
0001-Bug-676655-winsync-stops-working-after-server-restar.patch

Comment 10 Rich Megginson 2011-02-28 23:35:51 UTC
To ssh://git.fedorahosted.org/git/389/ds.git
   e05a918..11f265b  master -> master
commit 11f265bdd03832c6255cbf5ba3b79510346a3be0
Author: Rich Megginson <rmeggins>
Date:   Mon Feb 28 15:15:27 2011 -0700
    Reviewed by: nhosoi (Thanks!)
    Branch: master
    Fix Description: The problem really is that the server does not shutdown.
    You have to kill -9 the server.  When it restarts, the sync agreement is no
    longer working.  The problem was only observed on 32-bit platforms.  The
    size of the mmr Repl_Connection structure is different than the size of
    the winsync Repl_Connection structure.  The code was calling conn_delete()
    on the winsync Repl_Connection*.  We have to use windows_conn_delete instead.
    The repl protocol code did not have a place to do that, so I added a new
    delete_conn function pointer so that prot_free can call the right
    conn_delete function.
    Platforms tested: RHEL5 i386, RHEL6 x86_64
    Flag Day: no
    Doc impact: no
To ssh://git.fedorahosted.org/git/389/ds.git
   1e5fbb5..9eb6f1b  389-ds-base-1.2.8 -> 389-ds-base-1.2.8
commit 9eb6f1b913555948f0251d4b7a494c989fb3ebf8
Author: Rich Megginson <rmeggins>
Date:   Mon Feb 28 15:15:27 2011 -0700
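
Illustration (not the actual patch): the fix description above says the protocol code gained a delete_conn function pointer so that the free routine calls the destructor matching the connection type. A minimal C sketch of that pattern follows. All names here (mmr_conn, winsync_conn, protocol, protocol_new, protocol_free, and the struct members) are hypothetical stand-ins, not the 389 DS source, and the extra member in winsync_conn is an assumption made only so that the two sizes differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the two connection types. The extra member in
 * winsync_conn is an assumption, present only so the layouts differ. */
typedef struct mmr_conn {
    char *hostname;
    int port;
} mmr_conn;

typedef struct winsync_conn {
    char *hostname;
    int port;
    char *windows_domain;
} winsync_conn;

/* Counters so a caller can observe which destructor ran. */
static int mmr_deleted = 0;
static int winsync_deleted = 0;

/* Portable string copy (avoids relying on the non-ISO strdup). */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

static void *mmr_conn_new(const char *host, int port)
{
    mmr_conn *c = malloc(sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->hostname = dup_string(host);
    c->port = port;
    return c;
}

static void *winsync_conn_new(const char *host, int port, const char *domain)
{
    winsync_conn *c = malloc(sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->hostname = dup_string(host);
    c->port = port;
    c->windows_domain = dup_string(domain);
    return c;
}

/* Each destructor knows the real layout of the object it is handed. */
static void mmr_conn_delete(void *conn)
{
    mmr_conn *c = conn;
    if (c == NULL) {
        return;
    }
    free(c->hostname);
    free(c);
    mmr_deleted++;
}

static void winsync_conn_delete(void *conn)
{
    winsync_conn *c = conn;
    if (c == NULL) {
        return;
    }
    free(c->hostname);
    free(c->windows_domain);
    free(c);
    winsync_deleted++;
}

/* The protocol object stores the destructor next to the opaque pointer,
 * so generic code never has to guess which connection type it holds. */
typedef struct protocol {
    void *conn;
    void (*delete_conn)(void *conn);
} protocol;

static protocol *protocol_new(void *conn, void (*delete_conn)(void *conn))
{
    protocol *p = malloc(sizeof(*p));
    if (p == NULL) {
        return NULL;
    }
    p->conn = conn;
    p->delete_conn = delete_conn;
    return p;
}

static void protocol_free(protocol *p)
{
    if (p == NULL) {
        return;
    }
    assert(p->delete_conn != NULL); /* every protocol must carry a destructor */
    p->delete_conn(p->conn);
    free(p);
}
```

In this sketch a caller would pair winsync_conn_new() with winsync_conn_delete when calling protocol_new(), so protocol_free() can never run the replication destructor on a winsync connection.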

Comment 11 Sankar Ramalingam 2011-08-25 10:04:21 UTC
Users and Groups synced from both AD and DS after restart. So marking the bug as verified.