Bug 1310529

Summary: Is there a way to disable the creation of check_writable.nodename.xxxxx hidden files ?
Product: Red Hat Enterprise Linux 6
Reporter: nikhil kshirsagar <nkshirsa>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED WONTFIX
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Priority: unspecified
Version: 6.6
CC: agk, cfeist, cluster-maint, cww, fdinitto, jkortus, jpokorny, mnovacek, nkshirsa, oalbrigt, rmccabe
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-06-20 15:10:44 UTC
Type: Bug
Bug Depends On: 1414139
Bug Blocks: 1269194

Description nikhil kshirsagar 2016-02-22 07:11:20 UTC
Description of problem:
Red Hat Linux HA Cluster is creating hidden .check_writable.nodename.xxxxx files and interfering with the customer's application. These .check_writable files are created on a Samba mount and are picked up by the Autosys client as soon as they appear. Is it possible to disable this feature, or at least exclude specific directories underneath the mount point?

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1023099 has the patch for this, but it appears that the files are not being deleted.

Comment 2 Fabio Massimo Di Nitto 2016-02-22 08:20:27 UTC
What version of resource-agents is the customer using?

Comment 4 Fabio Massimo Di Nitto 2016-02-22 08:29:10 UTC
(In reply to nikhil from comment #0)
> Description of problem:
> Red Hat Linux HA Cluster is creating hidden .check_writable.nodename.xxxxx
> and interfering with customer application. These .check_writable files are
> on a Samba mount and picked up by Autosys client when created. Is it
> possible to disable this feature or at least exclude specific directories
> underneath the mount point.

It's not clear what you mean by: "or at least exclude specific directories from underneath the mount point."

The check is very specific: it only uses the top-level mount point, and the files are removed immediately:

        if [ $rw -eq $YES ]; then
                file=$(mktemp "$mount_point/.check_writable.$(hostname).XXXXXX")
                if [ ! -e "$file" ]; then
                        ocf_log err "${OCF_RESOURCE_INSTANCE}: is_alive: failed write test on [$mount_point]. Return code: $errcode"
                        return $NO
                fi
                rm -f $file > /dev/null 2> /dev/null
        fi
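
For illustration only, here is a minimal standalone sketch of the same write test, which makes the short lifetime of the file easy to see (the mount point below is just a placeholder):

  #!/bin/sh
  # Minimal sketch of the agent's write test shown above (illustrative only;
  # the mount point path is an example).
  mount_point=/mnt/fstest

  # Create a short-lived hidden file to prove the filesystem is writable...
  file=$(mktemp "$mount_point/.check_writable.$(hostname).XXXXXX")
  if [ ! -e "$file" ]; then
          echo "write test failed on [$mount_point]" >&2
          exit 1
  fi

  # ...and remove it again immediately.
  rm -f "$file" > /dev/null 2> /dev/null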


> Additional info:
> https://bugzilla.redhat.com/show_bug.cgi?id=1023099 has the patch for this,
> but it appears that the files are not being deleted.

The files are deleted immediately afterwards; see above.

Comment 5 nikhil kshirsagar 2016-02-22 08:34:44 UTC
I think the customer has a situation where these files are picked up by the Autosys client as soon as they are created, so it seems they want to disable the creation of these files.

I have asked for the version of resource-agents.

Comment 6 Oyvind Albrigtsen 2016-02-22 08:51:59 UTC
This write check is only run when OCF_CHECK_LEVEL is 20 or higher; for lower check levels, is_alive returns early:

        [ $OCF_CHECK_LEVEL -lt 20 ] && return $YES
...
        if [ $rw -eq $YES ]; then
                file=$(mktemp "$mount_point/.check_writable.$(hostname).XXXXXX")

Comment 7 Oyvind Albrigtsen 2016-02-22 13:15:43 UTC
Can you get the configuration from the customer as well?

I'm having issues reproducing it.

Comment 8 nikhil kshirsagar 2016-03-09 05:35:23 UTC
I think the customer has a situation where these files are picked up by the Autosys client as soon as they are created. Is there any particular information you need? It may be that, since the files are short-lived, their script ran at the exact time a file was created and managed to copy it over before it was deleted. I am confirming with them whether this was indeed the situation.

Comment 20 Oyvind Albrigtsen 2016-11-24 12:44:40 UTC
https://github.com/ClusterLabs/resource-agents/pull/889

Comment 21 Oyvind Albrigtsen 2016-11-24 13:25:15 UTC
Step-by-step reproducer

Before:
Add the fs resource to the service in /etc/cluster/cluster.conf:
<fs device="/dev/sdb" fstype="ext4" mountpoint="/mnt/fstest" name="filesystem"/>
# mount -o ro,remount /mnt/fstest/
# grep is_alive: /var/log/cluster/rgmanager.log
Nov 24 14:16:53 rgmanager [fs] fs:filesystem: is_alive: failed write test on [/mnt/fstest]. Return code: 0

The filesystem is remounted rw automatically:
# mount
...
/dev/sdb on /mnt/fstest type ext4 (rw)


After:
Add "write_check" part to service /etc/cluster/cluster.conf:
<fs device="/dev/sdb" fstype="ext4" mountpoint="/mnt/fstest" name="filesystem" write_check="off"/>
# mount -o ro,remount /mnt/fstest/
# grep is_alive: /var/log/cluster/rgmanager.log

No new is_alive errors are reported, and the partition isn't remounted rw:
# mount
...
/dev/sdb on /mnt/fstest type ext4 (ro)
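
One way to double-check on a live mount that the transient files are gone after setting write_check="off" (inotifywait from inotify-tools is an assumption here; any file-watching approach would do):

  # Watch the mount point for newly created files and filter for the
  # write-check pattern; nothing should match with write_check="off".
  inotifywait -m -e create --format '%f' /mnt/fstest | grep check_writable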

Comment 22 Jan Pokorný [poki] 2016-11-25 15:58:29 UTC
Hold on a sec!

Isn't it enough to explicitly set:

<action name="status" depth="20" timeout="0" interval="0"/>
<action name="monitor" depth="20" timeout="0" interval="0"/>

for any resource that implicitly defines the same actions with a nonzero
interval, and which you want to override so that the depth-20 status
(monitor) check is never triggered?

Comment 23 Jan Pokorný [poki] 2016-11-25 15:59:46 UTC
Set = configure in cluster.conf.

Comment 25 Jan Pokorný [poki] 2017-01-16 19:19:54 UTC
Just rechecked that the approach in [comment 22] does indeed work.

A few pointers to the other (luci) bug I was recently dealing with, which
addressed actions in general:
- "monitor" action disregarded:  [bug 1173942 comment 17], point 1b.
- where should actions be defined: [bug 1173942 comment 17], point 2.

So, in order to prevent the recurring depth-20 "status" invocations for a
particular resource, just override the "status" action at depth=20, which
by default uses an interval of 1 or 10 minutes (fs and {cluster,net}fs,
respectively), with a custom action at that depth and the interval set to 0:

  # ccs --addaction <RESOURCE> status depth=20 interval=0 --activate

Note that, IIRC, --activate is important here, as it effectively triggers
the "push configuration to live cluster" logic.  Upon that event,
rgmanager will trigger an internal reload and will notice that the
depth=20 action should no longer recur.

This can be tested by starting cman separately, as usual, and then
running "rgmanager -f &", which will log some additional messages
as rgmanager progresses through its event handling.
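
Roughly, that test sequence looks like this (the log path is the one used in the reproducer above; exact commands may vary per setup):

  # Start the membership layer on its own, then run rgmanager in the
  # foreground so it emits the extra event-handling messages:
  service cman start
  rgmanager -f &

  # Watch for new write-test activity; after the depth=20 override no new
  # is_alive entries should appear:
  tail -f /var/log/cluster/rgmanager.log | grep is_alive: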

Hence I suppose this is just a matter of documentation and the bug
should be closed as NOTABUG.

Comment 27 Jan Pokorný [poki] 2017-01-17 17:02:09 UTC
Upon further investigation, I think there's something wrong with
rgmanager, as it behaves differently between self-configuration at
the initial start and a configuration reload triggered by the
cluster-wide "propagate/reread configuration" signal.

Using "rgmanager -f &" rather than "service rgmanager start":

- at the beginning:
  Replacing action 'status' depth 20: interval: 60->2
- upon the reread:
  Replacing action 'status' depth 20: interval: 60->0

Sounds like a bug in rgmanager.

Comment 28 Jan Pokorný [poki] 2017-01-17 22:24:19 UTC
re [comment 27]:

> Sounds like a bug in rgmanager.

Indeed: [bug 1414139]

Comment 29 Jan Pokorný [poki] 2017-01-23 16:09:20 UTC
Supposing that the rgmanager fix [comment 28] unblocks a very
straightforward way to prevent it from running the implicit status
operation for a particular resource (and at particular depths),
as mentioned in [comment 25], I consider this bug eligible for a
CLOSED WORKSFORME (or similar) resolution.

Comment 30 Oyvind Albrigtsen 2017-02-13 12:19:38 UTC
I'll close it when the fix is available in RHEL6.

Comment 33 Chris Williams 2017-06-20 15:10:44 UTC
Red Hat Enterprise Linux 6 transitioned to the Production 3 Phase on May 10, 2017.  During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.
 
The official life cycle policy can be reviewed here:
 
http://redhat.com/rhel/lifecycle
 
This issue does not appear to meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification.  Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:
 
https://access.redhat.com