Bug 186657
| Summary: | PassSync crashed after running four hours under load | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Directory Server | Reporter: | To Ngan <tngan> | ||||||||||
| Component: | Sync Service | Assignee: | Nathan Kinder <nkinder> | ||||||||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | Orla Hegarty <ohegarty> | ||||||||||
| Severity: | medium | Docs Contact: | |||||||||||
| Priority: | medium | ||||||||||||
| Version: | 7.1 | CC: | nhosoi, rmeggins | ||||||||||
| Target Milestone: | DS8.0 | ||||||||||||
| Target Release: | --- | ||||||||||||
| Hardware: | All | ||||||||||||
| OS: | Linux | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | 2006-05-26 20:34:58 UTC | Type: | --- | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Bug Depends On: | |||||||||||||
| Bug Blocks: | 152373, 182367, 240316 | ||||||||||||
| Attachments: |
|
||||||||||||
|
Description
To Ngan
2006-03-25 02:25:37 UTC
Created attachment 126687 [details]
PassSync crash screen shot
After last crash, I restarted passsync and continued the test. PassSync.exe crashed again, this time happened after 12 hours. I clicked Abort on the crash dialog. When I tried to restart passsync.exe again, it failed to start. lsass.exe is taking up lots of CPU cycle, memory footprint is only 88520k, but VM size is 1.32G. The whole system became pretty much non-responsive. Again, about 2 minutes before this 2nd crash, passsync.log shows repeated
attempts to process what appears to be a bad entry (Memory corruption? Bad data
due to file contention?)
03/25/06 02:55:02: Attempting to sync password for testuser0061
03/25/06 02:55:02: Searching for (ntuserdomainid=testuser0061)
03/25/06 02:55:02: Ldap error in ModifyPassword
19: Constraint violation
03/25/06 02:55:02: Modify password failed for remote entry:
uid=tuser0061,ou=people,dc=sfbay,dc=redhat,dc=com,dc=local
03/25/06 02:55:02: Deferring password change for testuser0061
03/25/06 02:55:02: Attempting to sync password for ÃÃÃÃÃÃÃÃÃÃÃÃÃýýýýÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃ
03/25/06 02:55:02: Searching for
(ntuserdomainid=ÃÃÃÃÃÃÃÃÃÃÃÃÃýýýýÃÃÃÃÃÃÃÃÃÃÃÃÃÃà )
03/25/06 02:55:02: There are no entries that match: ÃÃÃÃÃÃÃÃÃÃÃÃÃýýýýÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃ
03/25/06 02:55:02: Deferring password change for ÃÃÃÃÃÃÃÃÃÃÃÃÃýýýýÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃ
03/25/06 02:55:02: Backing off for 2000ms
03/25/06 02:55:02: Backoff time expired. Attempting sync
03/25/06 02:55:02: 1 new entries loaded from data file
03/25/06 02:55:02: Cleared contents of data file
03/25/06 02:55:02: Password list has 3 entries
03/25/06 02:55:02: Attempting to sync password for testuser0061
03/25/06 02:55:02: Searching for (ntuserdomainid=testuser0061)
03/25/06 02:55:02: Ldap error in ModifyPassword
19: Constraint violation
03/25/06 02:55:02: Modify password failed for remote entry:
uid=tuser0061,ou=people,dc=sfbay,dc=redhat,dc=com,dc=local
03/25/06 02:55:02: Deferring password change for testuser0061
03/25/06 02:55:02: Attempting to sync password for ÃÃÃÃÃÃÃÃÃÃÃÃÃýýýýÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃ
03/25/06 02:55:02: Searching for
(ntuserdomainid=ÃÃÃÃÃÃÃÃÃÃÃÃÃýýýýÃÃÃÃÃÃÃÃÃÃÃÃÃÃà )
03/25/06 02:55:02: There are no entries that match: ÃÃÃÃÃÃÃÃÃÃÃÃÃýýýýÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃ
03/25/06 02:55:02: Deferring password change for ÃÃÃÃÃÃÃÃÃÃÃÃÃýýýýÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃ
Created attachment 126998 [details]
CVS DIffs
These changes add a mutex around accessing the passhook.dat file. The logic is
that you must aquire the mutex before reading the changes out of the file
(loadSet), and you release the mutex after saving the file back out (saveSet.
Both passhook.dll and passsync.exe always do a loadSet then a saveSet. The
work done between loadSet and saveSet is quite small, so the mutex is not being
held longer than necessary.
A few changes were required by the addition of the mutex. Since passsync.exe
actually creates the mutex object, the mutex will not exist when the passsync
service is not running. Passhook.dll need to assume that it's safe to continue
accessing the file if the mutex does not exist. The passsync service also
needed to be modified because of this. Prior to these changes, passsync would
try to read the contents of passhook.dat immediately after startup. Since
passhook.dll may be accessing the file without a mutex at that time, we need to
be more cautious. I changed the behavior of passsync.exe to either wait for an
event from passhook.dll, or to wakeup every 5 minutes to check the passhook.dat
file for changes. It will no longer check the data file at startup. This
allows any running passhook.dll function to finish it's job (which is very
quick since there's only ever one password change to deal with per function
call). Every call to passhook.dll after passsync.exe is running will see that
the mutex object exists and wait to aquire it before accessing the data file.
If passhook.dll can't aquire the mutex, it will continue after 30 seconds. We
don't want passhook.dll to wait forever since it will block the lsass.exe
process.
There will also a few small cleaup items in this fix. One was a memory leak
where we needed to un-freeze a strstream object to allow the destructor to free
the memory used.
Created attachment 127069 [details]
Revised Diffs
In response to Rich's review, the fix has been changed. There was a concern
that having a timeout on passhook and passsync could result in either lost
password changes, blocking lsass, or ignoring the mutex and corrupting the data
file.
This new approach deals with the all of the above issues. Basically, we are
still using a mutex around access to the passhook data file, but we will have
no timeouts while waiting to acquire the mutex on either the passhook or
passsync side. In order to not block the lsass.exe process, we now have
passhook.dll create a thread for each password change. This thread will aquire
the mutex, add the change to the data file, and release the mutex. If the
thread has to wait a long time to acquire the mutex (in the case that passsync
is reading in a huge data file), it will not block lsass.exe from going about
it's normal business.
In order to avoid the overhead of constantly creating a handle to the mutex
object in passhook.dll for every change, the handle is created when the dll is
initialized at startup now. We also read the logLevel settign from the
registry at this time instead of doing it for every password change as we did
before. Since the threads will be writing to the passhook log file now, the
mutex must protect that file too. All access to the passhook log file is done
with the mutex acquired.
Created attachment 127081 [details]
Revised Diffs
In response to reviews, there are a few minor modifications. These changes
mainly concern logging error conditions and error checking.
Checked into winsync (HEAD). Reviewed by Rich, Noriko, and Pete. Checking in passwordsync/passhand.cpp; /cvs/dirsec/winsync/passwordsync/passhand.cpp,v <-- passhand.cpp new revision: 1.10; previous revision: 1.9 done Checking in passwordsync/passhand.h; /cvs/dirsec/winsync/passwordsync/passhand.h,v <-- passhand.h new revision: 1.8; previous revision: 1.7 done Checking in passwordsync/passhook/passhook.cpp; /cvs/dirsec/winsync/passwordsync/passhook/passhook.cpp,v <-- passhook.cpp new revision: 1.9; previous revision: 1.8 done Checking in passwordsync/passsync/syncserv.cpp; /cvs/dirsec/winsync/passwordsync/passsync/syncserv.cpp,v <-- syncserv.cpp new revision: 1.14; previous revision: 1.13 done Checking in passwordsync/passsync/syncserv.h; /cvs/dirsec/winsync/passwordsync/passsync/syncserv.h,v <-- syncserv.h new revision: 1.8; previous revision: 1.7 done Verified against build 20060330.1 Same mod passwd script ran for days without passsync.exe crash. Also bad passhook.dat file won't cause lsass.exe to hog memory and lock up the whole system. It'll simply print out errors in the passhook.log and passsync.log. *DOCS* Password sync tool under sustained heavy password modification load on the Active Directory side could potentially crash passsync.exe. This has been fixed. DS7.2 is not a valid milestone anymore. Anything thats set to DS7.2 should be set to DS8.0. Will make further changes per bug council on 07/24/2007, after this. Bug already CLOSED. setting screened+ flag |