Bug 982621

Summary:	Disk monitoring not shutting down with nsslapd-disk-monitoring-logging-critical set to off
Product:	Red Hat Enterprise Linux 6	Reporter:	Ján Rusnačko <jrusnack>
Component:	389-ds-base	Assignee:	mreynolds
Status:	CLOSED DUPLICATE	QA Contact:	Sankar Ramalingam <sramling>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	6.4	CC:	jgalipea, jrusnack, nkinder
Target Milestone:	rc
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-07-12 21:25:55 UTC	Type:	Bug

Description Ján Rusnačko 2013-07-09 12:24:02 UTC

Description of problem:
The disk monitoring plugin is not triggered when nsslapd-disk-monitoring-logging-critical is set to off. With nsslapd-disk-monitoring-logging-critical set to on, it works properly.

Version-Release number of selected component (if applicable):
389-ds-base-1.2.11.15-16.el6_4.x86_64

How reproducible:
always

Steps to Reproduce:
1. Enable the disk monitoring plugin and set nsslapd-disk-monitoring-logging-critical to off:

[jrusnack@dstet dstet]$ ldapsearch -D "cn=directory manager" -w Secret123 -b "cn=config" -s base | grep "nsslapd-disk-monitoring"
nsslapd-disk-monitoring: on
nsslapd-disk-monitoring-threshold: 30000000
nsslapd-disk-monitoring-grace-period: 1
nsslapd-disk-monitoring-logging-critical: off

2. Restart DS
3. Fill up space to go below the threshold:
[jrusnack@dstet dstet]$ dd if=/dev/zero of=/var/log/dirsrv/slapd-dstet/foo bs=1M count=20
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 0.462266 s, 45.4 MB/s
[jrusnack@dstet dstet]$ df -h /var/log/dirsrv/slapd-dstet
Filesystem            Size  Used Avail Use% Mounted on
/home/jrusnack/tmpfs   39M   25M   13M  67% /var/log/dirsrv/slapd-dstet

4. Check whether DS is in shutdown mode:

[jrusnack@dstet dstet]$ tail /var/log/dirsrv/slapd-dstet/errors
[09/Jul/2013:05:26:59 +0200] - slapd stopped.
[09/Jul/2013:05:27:48 +0200] - 389-Directory/1.2.11.15 B2013.182.2043 starting up
[09/Jul/2013:05:27:48 +0200] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[09/Jul/2013:05:28:10 +0200] - slapd shutting down - signaling operation threads
[09/Jul/2013:05:28:10 +0200] - slapd shutting down - closing down internal subsystems and plugins
[09/Jul/2013:05:28:11 +0200] - Waiting for 4 database threads to stop
[09/Jul/2013:05:28:11 +0200] - All database threads now stopped
[09/Jul/2013:05:28:12 +0200] - slapd stopped.
[09/Jul/2013:05:28:14 +0200] - 389-Directory/1.2.11.15 B2013.182.2043 starting up
[09/Jul/2013:05:28:14 +0200] - slapd started.  Listening on All Interfaces port 389 for LDAP requests

Actual results:
If the filesystem is filled so that free space drops below the threshold, the disk monitoring plugin is not triggered: it does not disable verbose logging, remove logs, or disable logging.

If the filesystem is filled so that free space drops below 1/2 of the threshold, DS does not enter the shutdown period and no error message appears in the error log.

The default behavior, in which the disk monitoring plugin is invoked every 10 seconds, was taken into account.
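
To make the two levels mentioned above concrete, here is a minimal Python sketch that classifies available space against the threshold and against half of the threshold. The function names are hypothetical, and the use of f_bavail * f_frsize as "available space" is an assumption for illustration; it is not taken from the 389-ds source.

```python
import os


def classify_free_space(available, threshold):
    """Return which zone the available space falls in, relative to the threshold.

    Zone names are made up for this sketch.
    """
    if available >= threshold:
        return "ok"
    if available >= threshold / 2:
        return "below-threshold"
    return "below-half-threshold"


def available_bytes(path):
    """Bytes available to unprivileged users on the filesystem holding path.

    Assumption: the server may measure free space differently.
    """
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


# Values from this report: nsslapd-disk-monitoring-threshold is 30000000 bytes
# and df -h shows 13M available (df -h rounds, so this is approximate).
print(classify_free_space(13 * 1024 * 1024, 30000000))
```

With the numbers from the reproduction steps, the available space (about 13.6 million bytes) is below half of the threshold (15000000 bytes), which is the case where shutdown was expected.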

Expected results:
The disk monitoring plugin should be triggered.

Additional info:
Already automated in disk_monitoring testsuite.

Comment 4 mreynolds 2013-07-09 19:59:54 UTC

The server is not going into shut-down mode because you are not continuing to lose disk space.  See the design doc:

http://directory.fedoraproject.org/wiki/Disk_Monitoring

The feature only wants to shut-down the server as a last resort.  If things remain stable, it doesn't do anything.  I do think there is room for improvement in this area though.

It looks like you just consumed a chunk of disk space in one shot.  If you continue to consume disk space, does it go into shutdown mode?

Comment 5 Ján Rusnačko 2013-07-09 20:07:09 UTC

(In reply to mreynolds from comment #4)
> The server is not going into shut-down mode because you are not continuing
> to lose disk space.  See the design doc:
> 
> http://directory.fedoraproject.org/wiki/Disk_Monitoring
> 
> The feature only wants to shut-down the server as a last resort.  If things
> remain stable, it doesn't do anything.  I do think there is room for
> improvement in this area though.
> 
> It looks like you just consumed a chunk of disk space at one shot.  If you
> continue to consume disk space does it go into shutdown mode?

According to the design doc: 

"Once the available disk space on any of the disks gets below the threshold we start taking action."

The space was consumed in one shot. However, since the threshold was passed, verbose logging should have been disabled as the first step, right? It was not, so I assumed the disk monitoring plugin was not even triggered when the space dropped below the threshold.

I will investigate what will happen if available space continues to drop and report back.

Comment 6 mreynolds 2013-07-09 20:15:41 UTC

(In reply to Ján Rusnačko from comment #5)
> According to the design doc: 
> 
> "Once the available disk space on any of the disks gets below the threshold
> we start taking action."
> 
> The space was consumed in one shot, however, since the threshold was passed,
> as the first thing verbose logging should get disabled, right ? It was not,
> therefore I assumed disk monitoring plugin was not even triggered when the
> space dropped below the threshold.

This is the area that can use some improvement.  It expects that the threshold will be hit first, but not past the halfway mark.  So it makes a complete pass, then on the next pass if space continues to drop it will go into shutdown mode.  But in your case, even though you are past the threshold halfway mark, it will not enter the shutdown code because it happened in one shot and did not continue to lose disk space.  

So yes, there is a bug to fix, but I just wanted you to verify this behavior.

Thanks,
Mark
> 
> I will investigate what will happen if available space continues to drop and
> report back.
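
The pass-by-pass behavior explained in this comment can be illustrated with a toy model. This is only a sketch of the explanation as worded here, not the 389-ds implementation: the function name, the action labels, and the exact conditions (in particular, requiring both a continued drop and free space below half of the threshold before shutdown mode) are assumptions.

```python
def simulate_passes(free_samples, threshold):
    """Toy model of the monitoring passes described in the comment above.

    free_samples: free space (bytes) seen on each successive pass.
    Returns one hypothetical action label per pass.
    """
    actions = []
    previous = None
    for free in free_samples:
        if free >= threshold:
            actions.append("ok")
        elif previous is None or previous >= threshold:
            # First pass that sees space below the threshold.
            actions.append("first-pass")
        elif free < previous and free < threshold / 2:
            # Space kept dropping and is past the halfway mark.
            actions.append("shutdown-mode")
        else:
            # Below the threshold but stable: nothing further happens.
            actions.append("no-change")
        previous = free
    return actions


# One-shot drop past the halfway mark, then stable: never reaches shutdown-mode.
print(simulate_passes([40000000, 13000000, 13000000], 30000000))
# Gradual drop across two passes: reaches shutdown-mode on the second pass.
print(simulate_passes([40000000, 25000000, 14000000], 30000000))
```

Under these assumptions, a single large drop is seen only as the first pass below the threshold, and later passes see no further loss, which matches the "it happened in one shot" explanation.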

Comment 7 Ján Rusnačko 2013-07-09 20:19:09 UTC

(In reply to mreynolds from comment #6)
> This is the area that can use some improvement.  It expects that the
> threshold will be hit first, but not past the halfway mark.  So it makes a
> complete pass, then on the next pass if space continues to drop it will go
> into shutdown mode.  But in your case, even though you are past the
> threshold halfway mark, it will not enter the shutdown code because it
> happened in one shot and did not continue to lose disk space.  
> 
> So yes, there is a bug to fix, but I just wanted you to verify this behavior.
Ok, thanks !
> 
> Thanks,
> Mark
> > 
> > I will investigate what will happen if available space continues to drop and
> > report back.
I gradually decreased the amount of available space by 2 MB every 2 seconds; the disk monitoring plugin was not triggered. I was able to reach 0 free space without DS immediately shutting down.

Comment 8 mreynolds 2013-07-09 20:24:13 UTC

(In reply to Ján Rusnačko from comment #7)
> (In reply to mreynolds from comment #6)
> > (In reply to Ján Rusnačko from comment #5)
> > > (In reply to mreynolds from comment #4)
> > > > The server is not going into shut-down mode because you are not continuing
> > > > to lose disk space.  See the design doc:
> > > > 
> > > > http://directory.fedoraproject.org/wiki/Disk_Monitoring
> > > > 
> > > > The feature only wants to shut-down the server as a last resort.  If things
> > > > remain stable, it doesn't do anything.  I do think there is room for
> > > > improvement in this area though.
> > > > 
> > > > It looks like you just consumed a chunk of disk space at one shot.  If you
> > > > continue to consume disk space does it go into shutdown mode?
> > > 
> > > According to the design doc: 
> > > 
> > > "Once the available disk space on any of the disks gets below the threshold
> > > we start taking action."
> > > 
> > > The space was consumed in one shot, however, since the threshold was passed,
> > > as the first thing verbose logging should get disabled, right ? It was not,
> > > therefore I assumed disk monitoring plugin was not even triggered when the
> > > space dropped below the threshold.
> > 
> > This is the area that can use some improvement.  It expects that the
> > threshold will be hit first, but not past the halfway mark.  So it makes a
> > complete pass, then on the next pass if space continues to drop it will go
> > into shutdown mode.  But in your case, even though you are past the
> > threshold halfway mark, it will not enter the shutdown code because it
> > happened in one shot and did not continue to lose disk space.  
> > 
> > So yes, there is a bug to fix, but I just wanted you to verify this behavior.
> Ok, thanks !
> > 
> > Thanks,
> > Mark
> > > 
> > > I will investigate what will happen if available space continues to drop and
> > > report back.
> I have gradually decreased amount of available space by 2MB every 2 seconds
> - disk monitoring plugin was not triggered. I was able to hit 0 free space
> without DS immediately shutting down.

Yeah that's bad, and surprising.  I'll start working on this right away.

Comment 9 Ján Rusnačko 2013-07-09 20:28:45 UTC

(In reply to mreynolds from comment #8)
> Yeah that's bad, and surprising.  I'll start working on this right away.
To be precise and avoid misunderstanding: I first depleted space below half of the threshold in one shot, and only THEN decreased it in 2 MB steps until 0 free space remained.
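
The sequence described above (one large drop, then small steps at a fixed interval) could be scripted roughly as follows. This is a hedged sketch, not the script used for the test: the function name, file names, and step count are hypothetical, and the sizes in the usage comment are taken from the dd command in the reproduction steps and from comment 7.

```python
import os
import time


def consume_space(directory, first_chunk_bytes, step_bytes, steps, delay_seconds):
    """Write one large file, then up to `steps` smaller files with a pause
    before each. Stops early if the filesystem reports an error (for example
    when it is full). Returns the paths created so they can be removed later.
    """
    created = []

    def write_zeros(name, size):
        path = os.path.join(directory, name)
        created.append(path)
        with open(path, "wb") as f:
            f.write(b"\0" * size)
            f.flush()
            os.fsync(f.fileno())  # make sure the space is really allocated

    try:
        write_zeros("fill-initial", first_chunk_bytes)
        for i in range(steps):
            time.sleep(delay_seconds)
            write_zeros("fill-step-%d" % i, step_bytes)
    except OSError:
        # Most likely "No space left on device"; stop consuming space.
        pass
    return created


# Example matching the report (not executed here): a 20 MB drop, then
# 2 MB every 2 seconds. The step count of 10 is an arbitrary choice.
# consume_space("/var/log/dirsrv/slapd-dstet", 20 * 1024 * 1024,
#               2 * 1024 * 1024, 10, 2)
```

Returning the list of created files makes cleanup straightforward once the errors log has been checked.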

Comment 10 mreynolds 2013-07-10 20:04:37 UTC

Jan, I have fixed all the reported issues.  Should I do a respin of 1.2.11, or are you still running other tests?

Thanks,
Mark

Comment 11 Ján Rusnačko 2013-07-11 07:41:18 UTC

(In reply to mreynolds from comment #10)
> Jan, I have fixed all the reported issues.  Should I do a respin of 1.2.11,
> or are you still running other tests?
> 
> Thanks,
> Mark
Hi Mark,

All the tests are automated and I have reported all the issues I found. Please go ahead with the respin.

Thank you,
Jan

Comment 12 Nathan Kinder 2013-07-12 21:25:55 UTC

Closing this as a duplicate of bug 972930.  The fix for that bug was not correct, and caused the issue described in this report.  We will continue to use 972930 for this issue, as it is already acked for the planned 6.4.z update.

*** This bug has been marked as a duplicate of bug 972930 ***