Bug 1750596
| Summary: | fence_scsi_check_hardreboot: use readonly parameter to avoid "Couldnt find device" error | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Yuki Okada <yuokada> |
| Component: | fence-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> |
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.0 | CC: | agk, cluster-maint, fdinitto, mjuricek, nwahl, oalbrigt |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.2 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | fence-agents-4.2.1-47.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-04 02:28:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Sounds like you need to set retry to 1 or higher in the config file (a sketch of such a configuration appears at the end of these comments).

# fence_scsi -o metadata
...
When used as a watchdog device you can define e.g. retry=1, retry-sleep=2 and verbose=yes parameters in /etc/sysconfig/stonith if you have issues with it failing.</longdesc>

From my understanding, the retry parameter of fence_scsi takes effect when the sg_persist command in the watchdog script fails. However, in our case sg_persist runs successfully while the Filesystem resource fails to start. It might be good if a similar retry mechanism were also implemented in the Filesystem resource agent.

It also took me a while to understand what was going on, but Yuki demonstrated that the partition device file (e.g., /dev/sda1) is re-created every time sg_persist -inkd is run against the disk device file (e.g., `sg_persist -inkd /dev/sda`). This was quite surprising to me. The watchdog script for fence_scsi runs this sg_persist command frequently. If the partition device file (/dev/sda1) is missing while the Filesystem resource tries to start, the start operation fails because the device file can't be found.

I'm assuming the reason we've never seen this before is that we always tell customers to use HA-LVM logical volumes as the underlying devices for Filesystem resources. This issue arose when a customer used /dev/sda1 as the underlying device for a Filesystem resource, IIRC. This is technically supported AFAIK, even if it's a bad idea, so the start operation shouldn't fail due to the /dev/sda1 file being missing.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (fence-agents bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4622
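As a concrete illustration of the retry suggestion in the first comment, here is a minimal sketch of /etc/sysconfig/stonith. The parameter names and example values are taken from the fence_scsi metadata quoted above; the one-parameter-per-line layout is an assumption, and, as the reply notes, retry only takes effect when the sg_persist call itself fails, so it does not address the device-file race described in this bug.

/etc/sysconfig/stonith (sketch)
----------
# Watchdog-mode tuning for fence_scsi. Names and values are the examples
# from the fence_scsi metadata quoted above; the name=value-per-line layout
# is assumed, not confirmed by this bug report.
retry=1
retry-sleep=2
verbose=yes
----------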
Description of problem:

## Environment
- Two node cluster
- fence_scsi is used as the fence agent
- fence_scsi_check_hardreboot is used as the watchdog script
- A Filesystem resource is used
- No HA-LVM (the filesystem is created directly on the block device)

The Filesystem resource occasionally fails to start, and the following errors appear in pacemaker.log.

notice: <resource name>:<process id>:stderr [ blockdev: cannot open /dev/sdX1: No such file or directory ]
notice: <resource name>:<process id>:stderr [ mount: <mount directory>: special device /dev/sdX1 does not exist. ]
notice: <resource name>:<process id>:stderr [ ocf-exit-reason:Couldn't mount device [/dev/sdX1] as <mount directory> ]

Although the "device /dev/sdX1 does not exist" message appears, the device does actually exist.

In the fence_scsi_check_hardreboot script, the sg_persist command is periodically run against a target device. When the sg_persist command is run, the device file appears to be recreated, because the timestamps (atime, mtime, and ctime) are all updated to the same value and the inode number of the partition changes. Here is a test result from my environment.

# stat /dev/sda
  File: /dev/sda
  Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 6h/6d           Inode: 3866872     Links: 1     Device type: 8,0
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Context: system_u:object_r:fixed_disk_device_t:s0
Access: 2019-08-30 10:35:30.816000000 +0900
Modify: 2019-08-30 10:35:30.806000000 +0900
Change: 2019-08-30 10:35:30.806000000 +0900
 Birth: -

# stat /dev/sda1
  File: /dev/sda1
  Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 6h/6d           Inode: 5805607     Links: 1     Device type: 8,1
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Context: system_u:object_r:fixed_disk_device_t:s0
Access: 2019-08-30 10:35:30.817000000 +0900
Modify: 2019-08-30 10:35:30.817000000 +0900
Change: 2019-08-30 10:35:30.817000000 +0900
 Birth: -

# sg_persist -n -i -k -d /dev/sda
PR in (Read keys): command not supported
sg_persist failed: Illegal request, Invalid opcode

# stat /dev/sda
  File: /dev/sda
  Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 6h/6d           Inode: 3866872     Links: 1     Device type: 8,0
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Context: system_u:object_r:fixed_disk_device_t:s0
Access: 2019-08-30 10:37:26.971000000 +0900
Modify: 2019-08-30 10:37:26.962000000 +0900
Change: 2019-08-30 10:37:26.962000000 +0900
 Birth: -

# stat /dev/sda1
  File: /dev/sda1
  Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 6h/6d           Inode: 5806211     Links: 1     Device type: 8,1
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Context: system_u:object_r:fixed_disk_device_t:s0
Access: 2019-08-30 10:37:26.972000000 +0900
Modify: 2019-08-30 10:37:26.972000000 +0900
Change: 2019-08-30 10:37:26.972000000 +0900
 Birth: -

I suspect this issue occurs because (1) the mount operation and (2) the recreation of the device file by the sg_persist command run simultaneously.

Version-Release number of selected component (if applicable):
pacemaker-2.0.1-4.el8_0.3.x86_64
pcs-0.10.1-4.el8_0.3.x86_64
resource-agents-4.1.1-17.el8_0.3.x86_64
fence-agents-scsi-4.2.1-17.el8_0.3.noarch

How reproducible:
Occasionally, when starting the Filesystem resource

Steps to Reproduce:
1. Create a block device and a partition
2. Create a filesystem on the partition
3. Create the /mnt directory
4. Run the following two scripts simultaneously (the first script runs mount/umount repeatedly, and the other runs sg_persist repeatedly)
mount.sh
----------
#!/bin/sh
for i in {1..1000}; do
    echo "### ${i} ###"
    lsblk | grep sda
    mount /dev/sda1 /mnt
    if [ $? -ne 0 ]; then
        break
    fi
    sleep 1
    lsblk | grep sda
    umount /dev/sda1
    sleep 1
done;
----------

sg_persist.sh
----------
#!/bin/sh
for i in {1..10000}; do
    echo "### ${i} ###"
    sg_persist -n -i -k -d /dev/sda
done;
----------

The mount.sh script sometimes fails with the following error, which is the same error as in the original issue.

# ./mount.sh
### 1 ###
sda      8:0    0    1G  0 disk
└─sda1   8:1    0 1023M  0 part
sda      8:0    0    1G  0 disk
└─sda1   8:1    0 1023M  0 part /mnt
[...]
### 5 ###
sda      8:0    0    1G  0 disk
└─sda1   8:1    0 1023M  0 part
mount: /mnt: special device /dev/sda1 does not exist.

Actual results:
Filesystem resource occasionally fails to start

Expected results:
Filesystem resource always starts successfully

Additional info:
- HA-LVM would generally be recommended, but I understand it is not mandatory.
- If the behavior of sg_persist is expected, there might be room for improvement on the cluster side (in the Filesystem resource agent or the fence_scsi_check_hardreboot script); see the sketch below.
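To tie the reproducer to the fix named in the bug summary (the readonly parameter), here is a hypothetical variant of sg_persist.sh for comparison. It assumes the installed sg_persist (sg3_utils) supports the -y/--readonly option for opening the device read-only; the script name and the added stat call are illustrative only. If the read-only open indeed avoids re-creating the partition device file, the inode number printed for /dev/sda1 would be expected to stay constant across iterations, and mount.sh would no longer hit the "special device /dev/sda1 does not exist" error when the two scripts run side by side.

sg_persist_readonly.sh (hypothetical)
----------
#!/bin/bash
# Same loop as sg_persist.sh above (bash, for the {1..10000} brace
# expansion), but with -y (--readonly) added so that sg_persist opens
# /dev/sda read-only, assuming the installed sg3_utils supports that
# option. The stat call prints the partition's inode number on each
# iteration; if the device file is no longer re-created, it stays constant.
for i in {1..10000}; do
    echo "### ${i} ###"
    sg_persist -n -y -i -k -d /dev/sda
    stat -c 'sda1 inode=%i' /dev/sda1
done
----------

Running it alongside mount.sh, the same way as the original reproducer, allows a direct before/after comparison.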