Bug 1317578

Summary: Oracle12c resource status check fails if username is longer than 8 characters in pacemaker cluster
Product: Red Hat Enterprise Linux 7 Reporter: Josef Zimek <pzimek>
Component: resource-agents Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.1 CC: agk, cluster-maint, fdinitto, mnovacek
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: resource-agents-3.9.5-69.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1337671 Environment:
Last Closed: 2016-11-04 00:02:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1337671    

Description Josef Zimek 2016-03-14 15:25:01 UTC
Description of problem:

The Oracle resource agent for Oracle12c in a RHEL 7 cluster contains a status check that greps the output of `ps` for the Oracle user, based on the environment variable $ORACLE_OWNER. However, if the $ORACLE_OWNER value is longer than 8 characters, `ps` truncates the username in its output, so the grep does not match and the status check fails.


From sources of resource-agents (RHEL 7.1), ClusterLabs-resource-agents-5434e96/heartbeat/oralsnr:


269 show_procs() {
270     ps -e -o pid,user,args |
271         grep '[t]nslsnr' | grep -w "$listener" | grep -w "$ORACLE_OWNER"


EXAMPLE:
========
If $ORACLE_OWNER is "oracle123", `ps -e -o pid,user,args` displays it as "oracle1+", so the grep on line #271 above does not match and the status check fails.
========


`ps` offers the -U <username> option to list only the processes belonging to <username>, so we could use it instead of grepping for the user name. `ps -U $ORACLE_OWNER` should also work when the username is longer than 8 characters.

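To illustrate the truncation and the suggested alternative, here is a minimal shell sketch. The helper names abbreviate_user, old_check and show_procs_by_user are hypothetical and not part of the oralsnr agent. The "first 7 characters plus +" rule is inferred from the "oracle1+" output quoted above, and the last function only sketches the `ps -U` idea; it is not necessarily the change that was merged upstream.

```shell
#!/bin/sh
# Mimic how ps abbreviates a user name that does not fit the 8-column USER
# field: keep the first 7 characters and append "+" (assumption based on the
# "oracle1+" example above).
abbreviate_user() {
    name=$1
    if [ "${#name}" -gt 8 ]; then
        printf '%s+\n' "$(printf '%s' "$name" | cut -c1-7)"
    else
        printf '%s\n' "$name"
    fi
}

# Old-style check: filter ps-like lines on stdin for tnslsnr, then for the
# listener name ($2) and the owner ($1) as whole words.
old_check() {
    grep '[t]nslsnr' | grep -w "$2" | grep -w "$1"
}

# Sketch of the proposed alternative: let ps select the processes by user
# instead of grepping the (possibly abbreviated) user column. -e is dropped
# because ps selection options are additive, so -e would select everything.
show_procs_by_user() {
    ps -U "$ORACLE_OWNER" -o pid,user,args |
        grep '[t]nslsnr' | grep -w "$listener"
}

# Sample line as ps would print it for a listener owned by "oracle123".
sample="17837 $(abbreviate_user oracle123) /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit"
if printf '%s\n' "$sample" | old_check oracle123 LISTENER >/dev/null; then
    echo "old check: match"
else
    echo "old check: no match for oracle123"
fi
```

Note that `ps -U` selects by real user ID, while `ps -u` selects by effective user ID; which one fits best depends on how the listener is started.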

Version:
RHEL 7.1, resource-agents-3.9.5

How reproducible:
Always

Comment 4 Oyvind Albrigtsen 2016-03-16 12:02:13 UTC
Tested and working as expected:
https://github.com/ClusterLabs/resource-agents/pull/781

Comment 5 Mike McCune 2016-03-28 23:14:23 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.

Comment 7 michal novacek 2016-08-19 10:28:12 UTC
I have verified that with resource-agents-3.9.5-79.el7.x86_64 the oralsnr
agent correctly handles an oracle listener process running under a user
whose name is longer than eight characters.


Common setup:
=============

-- create new user with name longer than eight characters

  * configure oracle to run in the cluster (2), (3)
  * disable oracle listener that you verified will run in the cluster (1)
  * check that there are no processes owned by user oracle
  * create new user with name longer than eight characters having same groups
      as user oracle
  * exchange its uid with oracle user

    [root@kiff-03 ~]# useradd oracle1234 -G oinstall,dba,asmdba,asmoper,oper,backupdba,dgdba,kmdba
    [root@kiff-03 ~]# id oracle1234
    uid=54323(oracle1234) gid=54330(oracle1234) groups=54330(oracle1234),54321(oinstall),54322(dba),54323(asmdba),54324(asmoper),54326(oper),54327(backupdba),54328(dgdba),54329(kmdba)

    [root@kiff-03 ~]# id oracle
    uid=54321(oracle) gid=54321(oinstall) groups=54321(oinstall),54322(dba),54323(asmdba),54324(asmoper),54326(oper),54327(backupdba),54328(dgdba),54329(kmdba)

    [root@kiff-03 ~]# usermod --uid 54321 oracle1234 --non-unique
    [root@kiff-03 ~]# usermod --uid 54323 oracle

    [root@kiff-03 ~]# id oracle
    uid=54323(oracle) gid=54321(oinstall) groups=54321(oinstall),54322(dba),54323(asmdba),54324(asmoper),54326(oper),54327(backupdba),54328(dgdba),54329(kmdba)

    [root@kiff-03 ~]# id oracle1234
    uid=54321(oracle1234) gid=54330(oracle1234) groups=54330(oracle1234),54321(oinstall),54322(dba),54323(asmdba),54324(asmoper),54326(oper),54327(backupdba),54328(dgdba),54329(kmdba)
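
The uid exchange above can be sketched as a small script. This is only an illustration: run and swap_uids are hypothetical helpers, the user names and uids are the ones from this transcript, and the script prints the usermod commands instead of executing them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default so that nothing is changed by accident.
DRY_RUN=${DRY_RUN:-1}

# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# swap_uids OLD_USER OLD_UID NEW_USER NEW_UID
# Gives NEW_USER the uid of OLD_USER and OLD_USER the former uid of NEW_USER,
# in the same order as the transcript above.
swap_uids() {
    old_user=$1
    old_uid=$2
    new_user=$3
    new_uid=$4
    # Both accounts share OLD_UID until the second command, hence --non-unique.
    run usermod --uid "$old_uid" --non-unique "$new_user"
    run usermod --uid "$new_uid" "$old_user"
}

swap_uids oracle 54321 oracle1234 54323
```

As in the setup steps, this should only be done while no processes are owned by either user, because running processes keep the old numeric uid.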


patched version: resource-agents-3.9.5-79.el7.x86_64
====================================================

oracle listener starts, monitor action succeeds

[root@kiff-03 ~]# pcs resource enable oralsnr
[root@kiff-03 ~]# pcs resource | grep oracle
     oracle     (ocf::heartbeat:oracle):        Started kiff-03.cluster-qe.lab.eng.brq.redhat.com

[root@kiff-03 ~]# pcs resource debug-monitor oralsnr
Operation monitor for oralsnr (ocf:heartbeat:oralsnr) returned 0

[root@kiff-03 ~]# ps axfu | grep LISTENER
root     14735  0.0  0.0 112648   968 pts/0    S+   12:25   0:00  |       \_ grep --color=auto LISTENER
oracle1+ 17837  0.1  0.0 171964 12992 ?        Ssl  12:21   0:00 /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit

BEFORE the patch: resource-agents-3.9.5-52.el7.x86_64
=====================================================

[root@kiff-03 ~]# pcs resource enable oralsnr
[root@kiff-03 ~]# pcs resource | grep oracle
     oracle     (ocf::heartbeat:oracle):        (FAILED) Started kiff-03.cluster-qe.lab.eng.brq.redhat.com
[root@kiff-03 ~]# pcs resource debug-monitor oralsnr
Error performing operation: Argument list too long
Operation monitor for oralsnr (ocf:heartbeat:oralsnr) returned 7
[root@kiff-03 ~]# ps axfu | grep LISTENER
root      5354  0.0  0.0 112648   968 pts/0    S+   12:24   0:00  |       \_ grep --color=auto LISTENER
oracle1+ 17837  0.1  0.0 171964 12992 ?        Ssl  12:21   0:00 /u01/app/oracle/product/12.1.0/dbhome_1/bin/tnslsnr LISTENER -inherit
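
For reading the two debug-monitor results above: the returned numbers are standard OCF exit codes, so 0 means the monitor found the listener and 7 means the agent considers it not running, even though the tnslsnr process is visible in the ps output. A small lookup helper could look like the sketch below; ocf_rc_name is a hypothetical name and not part of pcs or resource-agents.

```shell
#!/bin/sh
# Map an OCF resource agent exit code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        *) echo "unknown ($1)" ;;
    esac
}

# The two results seen in this comment: patched (0) and unpatched (7).
for rc in 0 7; do
    echo "monitor returned $rc: $(ocf_rc_name "$rc")"
done
```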








-----

(1)
[root@kiff-03 ~]# pcs status
Cluster name: STSRHTS2268
Stack: corosync
Current DC: kiff-03.cluster-qe.lab.eng.brq.redhat.com (version 1.1.15-10.el7-e174ec8) - partition with quorum
Last updated: Fri Aug 19 11:58:08 2016          Last change: Fri Aug 19 11:54:14 2016 by root via crm_resource on kiff-03.cluster-qe.lab.eng.brq.redhat.com

2 nodes and 7 resources configured: 3 resources DISABLED and 0 BLOCKED from being started due to failures

Online: [ kiff-01.cluster-qe.lab.eng.brq.redhat.com kiff-03.cluster-qe.lab.eng.brq.redhat.com ]

Full list of resources:

 fence-kiff-01  (stonith:fence_ipmilan):        Started kiff-03.cluster-qe.lab.eng.brq.redhat.com
 fence-kiff-03  (stonith:fence_ipmilan):        Started kiff-01.cluster-qe.lab.eng.brq.redhat.com
 Resource Group: ora-group
     vip        (ocf::heartbeat:IPaddr2):       Started kiff-03.cluster-qe.lab.eng.brq.redhat.com
     halvm      (ocf::heartbeat:LVM):   Started kiff-03.cluster-qe.lab.eng.brq.redhat.com
     fs (ocf::heartbeat:Filesystem):    Started kiff-03.cluster-qe.lab.eng.brq.redhat.com
     oracle     (ocf::heartbeat:oracle):        Started kiff-03.cluster-qe.lab.eng.brq.redhat.com
 oralsnr        (ocf::heartbeat:oralsnr):       Stopped (disabled)

Failed Actions:
* oralsnr_start_0 on kiff-01.cluster-qe.lab.eng.brq.redhat.com 'not installed' (5): call=90, status=complete, exitreason='none',
    last-rc-change='Fri Aug 19 11:38:11 2016', queued=0ms, exec=49ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

(2)
$ cat /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
SID_LIST_LISTENER =
 (SID_LIST=
  (SID_DESC =
   (GLOBAL_DBNAME = oradb)
   (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
   (SID_NAME = oradb)
  )
 )

SID_LIST_LISTENER_CDB2 =
 (SID_LIST=
  (SID_DESC =
   (GLOBAL_DBNAME = cdb2)
   (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
   (SID_NAME = cdb2)
  )
 )

LISTENER =
 (DESCRIPTION_LIST =
  (DESCRIPTION =
   (ADDRESS = (PROTOCOL = TCP)(HOST = 10.34.71.249)(PORT = 1521))
  )
  (DESCRIPTION =
   (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
  )
 )

LISTENER_CDB2 =
 (DESCRIPTION_LIST =
  (DESCRIPTION =
   (ADDRESS = (PROTOCOL = TCP)(HOST = 10.34.71.249)(PORT = 1522))
  )
  (DESCRIPTION =
   (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1522))
  )
 )

(3)
[root@kiff-03 ~]# pcs resource --full
 Group: ora-group
  Resource: vip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=10.34.71.249 cidr_netmask=23
   Operations: start interval=0s timeout=20s (vip-start-interval-0s)
               stop interval=0s timeout=20s (vip-stop-interval-0s)
               monitor interval=30s (vip-monitor-interval-30s)
  Resource: halvm (class=ocf provider=heartbeat type=LVM)
   Attributes: exclusive=true partial_activation=false volgrpname=shared
   Operations: start interval=0s timeout=30 (halvm-start-interval-0s)
               stop interval=0s timeout=30 (halvm-stop-interval-0s)
               monitor interval=10 timeout=30 (halvm-monitor-interval-10)
  Resource: fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/shared/shared0 directory=/u01 fstype=ext4 options=
   Operations: start interval=0s timeout=60 (fs-start-interval-0s)
               stop interval=0s timeout=60 (fs-stop-interval-0s)
               monitor interval=30s (fs-monitor-interval-30s)
  Resource: oracle (class=ocf provider=heartbeat type=oracle)
   Attributes: sid=oradb
   Operations: start interval=0s timeout=120 (oracle-start-interval-0s)
               stop interval=0s timeout=120 (oracle-stop-interval-0s)
               monitor interval=30s (oracle-monitor-interval-30s)
 Resource: oralsnr (class=ocf provider=heartbeat type=oralsnr)
  Attributes: listener=LISTENER sid=oradb
  Meta Attrs: target-role=Stopped 
  Operations: start interval=0s timeout=120 (oralsnr-start-interval-0s)
              stop interval=0s timeout=120 (oralsnr-stop-interval-0s)
              monitor interval=10 timeout=30 (oralsnr-monitor-interval-10)

Comment 9 errata-xmlrpc 2016-11-04 00:02:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2174.html