Bug 1321606

Summary: accPolicy stress tests crashing the server
Product: Red Hat Enterprise Linux 7 Reporter: Sankar Ramalingam <sramling>
Component: 389-ds-base Assignee: Noriko Hosoi <nhosoi>
Status: CLOSED WORKSFORME QA Contact: Viktor Ashirov <vashirov>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.2 CC: nkinder, rmeggins, sramling
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-04-11 16:06:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
Email received from abrt crash report for accPolicy stress tests none

Description Sankar Ramalingam 2016-03-28 15:21:27 UTC
Created attachment 1140893
Email received from abrt crash report for accPolicy stress tests

Description of problem: Account policy stress tests cause the server to hang and slapd to crash.


Version-Release number of selected component (if applicable): 389-ds-base-1.3.4.0-29


How reproducible: Consistently on RHEL7.2


Steps to Reproduce:
1. Run the accPolicy stress tests by cloning a Beaker job.
E.g.: https://beaker.engineering.redhat.com/jobs/1274745
2. Server crashes.
3. If you don't observe a crash, try running it manually on the same machine.
    Modify the engage.cfg file to remove the cleanup tests and the Uninstall test suite.
4. I have yet to figure out which test is crashing the server. I will update the bugzilla with more information.

Actual results: Server hangs and slapd crashes.


Expected results: No server crash.


Additional info: It fails on RHEL7.x (observed on RHEL7.2). It is not reproducible on RHEL6.x.
Also, there was a crash from "/usr/lib/systemd/systemd-logind".

Attaching the crash e-mail.

Comment 7 Sankar Ramalingam 2016-03-29 18:17:14 UTC
The system isn't accessible to me either. I suspect the server crashes might have been the reason. I rebooted the server from the Beaker UI.

Comment 8 Sankar Ramalingam 2016-03-29 18:29:23 UTC
The machine is now accessible. Also, I would like to add more information about the test case which is causing these failures...

Test case accPolicy_21 does the following, and I guess that is what causes the server to hang/crash.
1. Set accountInactivityLimit: 31536600 # about one year, in seconds
2. Set the system date to 1 year ahead.
3. Check if the account is inactivated.
4. Then change the date back to the original using the ntpd service.
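
The check that accPolicy_21 exercises can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Account Policy plugin's code: the function name is_inactive, the strict "greater than" comparison, and the last-login timestamp are all assumptions made for the example. Only the limit value 31536600 comes from the test case.

```python
from datetime import datetime, timedelta, timezone

# Value taken from the test case; everything else below is hypothetical.
ACCOUNT_INACTIVITY_LIMIT = 31536600  # seconds

SECONDS_PER_365_DAYS = 365 * 86400


def is_inactive(last_login, now, limit_seconds=ACCOUNT_INACTIVITY_LIMIT):
    """Return True when more than limit_seconds have elapsed since last_login.

    Assumption: a strict comparison; the real plugin may compare differently.
    """
    return (now - last_login).total_seconds() > limit_seconds


# Hypothetical last login time, used only to drive the example.
last_login = datetime(2016, 3, 28, 15, 0, tzinfo=timezone.utc)

# Step 2 of the test: the clock is moved one year ahead.
clock_ahead = last_login + timedelta(days=365)

# The configured limit is slightly longer than 365 days.
print(ACCOUNT_INACTIVITY_LIMIT - SECONDS_PER_365_DAYS)  # 600

print(is_inactive(last_login, clock_ahead))
print(is_inactive(last_login, clock_ahead + timedelta(seconds=601)))
```

Under these assumptions the limit is 600 seconds longer than 365 days, so whether step 3 sees the account as inactive depends on how long before the date change the last login happened.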

Patch - https://code.engineering.redhat.com/gerrit/70858

With patch set 1, which excludes the accPolicy_21 test case, the execution completed without any problems.
Bkr job - https://beaker.engineering.redhat.com/jobs/1281583

With patch set 2, which adds accPolicy_21 back to the execution, the run hangs.
Bkr job - https://beaker.engineering.redhat.com/jobs/1282568

Comment 9 Noriko Hosoi 2016-03-29 19:19:45 UTC
Thanks, Sankar.

I was able to log in to ibm-x3650m4-02-vm-04.lab.eng.bos.redhat.com.

Did you observe a crash or a hang on this host/test env?

If so, could you please tell me where I can find it?

The error log /var/log/dirsrv/slapd-deftestinst/errors looks clean and I don't see any ns-slapd related logs in /var/log/messages.

No core files are found in /var/log/dirsrv/slapd-deftestinst.

I also checked /var/*/abrt, but I don't see anything there...

Would it be possible to leave a core file on the system?
http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-crashes
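
Before rerunning the test, it may help to confirm that the environment can produce a core file at all. The following is a minimal Python sketch, assuming a Linux host. The function name core_dump_settings is hypothetical, and the limits it reports are those of the Python process itself, not of a running ns-slapd, so treat it as a rough sanity check only.

```python
import resource
from pathlib import Path


def core_dump_settings():
    """Report the core file size limits of this process and the kernel core pattern."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        # World-readable on typical Linux systems; may be absent elsewhere.
        pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError:
        pattern = None
    return {
        "soft_limit": soft,       # 0 means no core file is written
        "hard_limit": hard,
        "core_pattern": pattern,  # a leading "|" means cores are piped to a handler
        "enabled": soft != 0,
    }


if __name__ == "__main__":
    for key, value in core_dump_settings().items():
        print(f"{key}: {value}")
```

A soft limit of 0 would explain why no core file shows up even if the process did crash.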

If it is a hang, how could you tell?

Thanks.

Comment 10 Sankar Ramalingam 2016-04-11 08:59:35 UTC
Hi Noriko, sorry for the late response. I made a few more attempts over the last two weeks, but couldn't reproduce the crash. I guess it's specific to the test environment/machine. So, this can be closed as not reproducible.

Comment 11 Noriko Hosoi 2016-04-11 16:06:48 UTC
(In reply to Sankar Ramalingam from comment #10)
> Hi Noriko, sorry for the late response. I made a few more attempts over the
> last two weeks, but couldn't reproduce the crash. I guess it's specific to
> the test environment/machine. So, this can be closed as not reproducible.

Thank you for retesting this case, Sankar!

Closing this bug...