Bug 1572774

Summary: ironic-rootwrap hangs on fuser to iscsi device during imaging
Product: Red Hat OpenStack Reporter: John Fulton <johfulto>
Component: openstack-ironic Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG QA Contact: mlammon
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 13.0 (Queens) CC: bfournie, mburns, srevivo
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-01 22:11:42 UTC Type: Bug
Attachments:
Description Flags
sosreport from after vmcrash and during redeploy to reproduce none

Description John Fulton 2018-04-27 19:46:51 UTC
During TripleO Deployment on three Dell m630s and three r730XDs [1] the nodes PXE boot and enter Ironic ProvisioningState=deploying [2] and Nova Status=BUILD [3]. They remain hung in these states even after Heat times out and fails [4].

It seems that ironic-rootwrap hangs on fuser to the iscsi device, because 'ps axu | grep fuser' [5] outputs the following for each iscsi device:

root     19221  0.0  0.0 216452  4036 ?        S    Apr26   0:00 sudo ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.31:3260-iscsi-iqn.2008-10.org.openstack:c57c8ede-78d8-473d-8b7e-ebf9ab5b4d12-lun-1

root     19222  0.0  0.1 244120 21768 ?        S    Apr26   0:00 /usr/bin/python2 /usr/bin/ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.31:3260-iscsi-iqn.2008-10.org.openstack:c57c8ede-78d8-473d-8b7e-ebf9ab5b4d12-lun-1

root     19235  0.0  0.0 107964   852 ?        S    Apr26   0:00 /sbin/fuser /dev/disk/by-path/ip-192.168.1.31:3260-iscsi-iqn.2008-10.org.openstack:c57c8ede-78d8-473d-8b7e-ebf9ab5b4d12-lun-1

Commands like 'fuser /dev/vda', which normally work on this machine, hang at this point. 'lsblk' shows [6] what should be the root disk of each node, attached so that the node can receive its disk image via dd over iscsi, but the hang fails the deployment.
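To tell the hung processes apart from the ones that are merely sleeping, it can help to filter 'ps axu' output for the uninterruptible sleep state (STAT starting with "D"). The following is only a minimal sketch; the function name find_uninterruptible is made up for illustration, and it assumes the standard eleven-column 'ps axu' layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).

```python
def find_uninterruptible(ps_output):
    """Return (pid, command) for every process whose STAT starts with 'D'.

    Assumes `ps axu` style lines with eleven whitespace-separated columns,
    the last of which (COMMAND) may itself contain spaces.
    """
    hung = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep COMMAND as one field
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something that is not a process row
        if fields[7].startswith("D"):
            hung.append((int(fields[1]), fields[10]))
    return hung


if __name__ == "__main__":
    # Two rows copied from footnote [5]; the iscsi path is shortened here.
    sample = (
        "root     18351  1.0  0.0 107964   824 ?        D    15:21   0:00 fuser /dev/vda\n"
        "root     19235  0.0  0.0 107964   852 ?        S    Apr26   0:00 /sbin/fuser /dev/disk/by-path/example\n"
    )
    for pid, command in find_uninterruptible(sample):
        print(pid, command)
```

In the output in [5], only the manually started 'fuser /dev/vda' is in state D; the fuser processes started by ironic-rootwrap are in state S.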



Footnotes

[1] Exact same hardware, switches and network configuration as used in https://access.redhat.com/documentation/en-us/reference_architectures/2017/html/hyper-converged_red_hat_openstack_platform_10_and_red_hat_ceph_storage_2/environment-details#overcloud_compute_ceph_osd

[2]
(undercloud) [stack@hci-director ~]$ openstack baremetal node list
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name        | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
| c57c8ede-78d8-473d-8b7e-ebf9ab5b4d12 | m630_slot13 | dc6a6e7a-532c-46df-94b9-af95d2d9040b | power on    | deploying          | False       |
| 393ae197-602c-491e-a52c-75495fb482e8 | m630_slot14 | 01c95295-33d9-4bc7-8e5a-639d4ef00945 | power on    | deploying          | False       |
| b698cb71-284d-4233-8d85-3a50e9b89fc0 | m630_slot15 | 55aef33c-c782-4981-8ee6-dcfe7b6f51f0 | power on    | deploying          | False       |
| e333de6a-e4f8-4e4d-9ed3-093f57162d9f | r730xd_u29  | 376229b9-0643-4478-9566-1cf37677b3c8 | power on    | deploying          | False       |
| d9a712ef-2b3d-4163-a85d-f6edab404af5 | r730xd_u31  | 9dd79403-d524-428e-a2ae-66d8c793c431 | power on    | deploying          | False       |
| 5739f488-767d-488b-8b65-d40861d57f01 | r730xd_u33  | d7f667ab-5c1f-4c5b-b2e8-1196aa0a8fec | power on    | deploying          | False       |
+--------------------------------------+-------------+--------------------------------------+-------------+--------------------+-------------+
(undercloud) [stack@hci-director ~]$ 

[3]
(undercloud) [stack@hci-director ~]$ openstack server list
+--------------------------------------+-------------------------+--------+-----------------------+----------------+-----------+
| ID                                   | Name                    | Status | Networks              | Image          | Flavor    |
+--------------------------------------+-------------------------+--------+-----------------------+----------------+-----------+
| 01c95295-33d9-4bc7-8e5a-639d4ef00945 | overcloud-controller-0  | BUILD  | ctlplane=192.168.1.28 | overcloud-full | baremetal |
| 376229b9-0643-4478-9566-1cf37677b3c8 | overcloud-compute-0     | BUILD  | ctlplane=192.168.1.25 | overcloud-full | baremetal |
| 55aef33c-c782-4981-8ee6-dcfe7b6f51f0 | overcloud-controller-1  | BUILD  | ctlplane=192.168.1.27 | overcloud-full | baremetal |
| 9dd79403-d524-428e-a2ae-66d8c793c431 | overcloud-cephstorage-0 | BUILD  | ctlplane=192.168.1.23 | overcloud-full | baremetal |
| d7f667ab-5c1f-4c5b-b2e8-1196aa0a8fec | overcloud-cephstorage-1 | BUILD  | ctlplane=192.168.1.24 | overcloud-full | baremetal |
| dc6a6e7a-532c-46df-94b9-af95d2d9040b | overcloud-controller-2  | BUILD  | ctlplane=192.168.1.31 | overcloud-full | baremetal |
+--------------------------------------+-------------------------+--------+-----------------------+----------------+-----------+
(undercloud) [stack@hci-director ~]$ 

[4] 
  status: CREATE_FAILED
  status_reason: |
    resources[0]: Stack CREATE cancelled
overcloud.CephStorage.1:
  resource_type: OS::TripleO::CephStorage
  physical_resource_id: 89a450d6-aa60-4820-b71d-65b6fb67fc92
  status: CREATE_FAILED
  status_reason: |
    resources[1]: Stack CREATE cancelled
Heat Stack create failed.
Heat Stack create failed.

real    245m25.105s
user    0m29.112s
sys     0m1.112s
[stack@hci-director ~]$ 


[5]
[root@hci-director ~]# ps axu | grep fuser
root     18351  1.0  0.0 107964   824 ?        D    15:21   0:00 fuser /dev/vda
root     18418  0.0  0.0 112708   972 pts/5    S+   15:22   0:00 grep --color=auto fuser
root     19221  0.0  0.0 216452  4036 ?        S    Apr26   0:00 sudo ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.31:3260-iscsi-iqn.2008-10.org.openstack:c57c8ede-78d8-473d-8b7e-ebf9ab5b4d12-lun-1
root     19222  0.0  0.1 244120 21768 ?        S    Apr26   0:00 /usr/bin/python2 /usr/bin/ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.31:3260-iscsi-iqn.2008-10.org.openstack:c57c8ede-78d8-473d-8b7e-ebf9ab5b4d12-lun-1
root     19235  0.0  0.0 107964   852 ?        S    Apr26   0:00 /sbin/fuser /dev/disk/by-path/ip-192.168.1.31:3260-iscsi-iqn.2008-10.org.openstack:c57c8ede-78d8-473d-8b7e-ebf9ab5b4d12-lun-1
root     19264  0.0  0.0 216452  4036 ?        S    Apr26   0:00 sudo ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.27:3260-iscsi-iqn.2008-10.org.openstack:b698cb71-284d-4233-8d85-3a50e9b89fc0-lun-1
root     19265  0.0  0.1 244120 21760 ?        S    Apr26   0:00 /usr/bin/python2 /usr/bin/ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.27:3260-iscsi-iqn.2008-10.org.openstack:b698cb71-284d-4233-8d85-3a50e9b89fc0-lun-1
root     19269  0.0  0.0 107964   848 ?        S    Apr26   0:00 /sbin/fuser /dev/disk/by-path/ip-192.168.1.27:3260-iscsi-iqn.2008-10.org.openstack:b698cb71-284d-4233-8d85-3a50e9b89fc0-lun-1
root     19403  0.0  0.0 216452  4040 ?        S    Apr26   0:00 sudo ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.24:3260-iscsi-iqn.2008-10.org.openstack:5739f488-767d-488b-8b65-d40861d57f01-lun-1
root     19404  0.0  0.1 244120 21760 ?        S    Apr26   0:00 /usr/bin/python2 /usr/bin/ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.24:3260-iscsi-iqn.2008-10.org.openstack:5739f488-767d-488b-8b65-d40861d57f01-lun-1
root     19408  0.0  0.0 107964   848 ?        S    Apr26   0:00 /sbin/fuser /dev/disk/by-path/ip-192.168.1.24:3260-iscsi-iqn.2008-10.org.openstack:5739f488-767d-488b-8b65-d40861d57f01-lun-1
root     19541  0.0  0.0 216452  4036 ?        S    Apr26   0:00 sudo ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.25:3260-iscsi-iqn.2008-10.org.openstack:e333de6a-e4f8-4e4d-9ed3-093f57162d9f-lun-1
root     19542  0.0  0.1 244120 21764 ?        S    Apr26   0:00 /usr/bin/python2 /usr/bin/ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.25:3260-iscsi-iqn.2008-10.org.openstack:e333de6a-e4f8-4e4d-9ed3-093f57162d9f-lun-1
root     19545  0.0  0.0 107964   852 ?        S    Apr26   0:00 /sbin/fuser /dev/disk/by-path/ip-192.168.1.25:3260-iscsi-iqn.2008-10.org.openstack:e333de6a-e4f8-4e4d-9ed3-093f57162d9f-lun-1
root     19548  0.0  0.0 216452  4044 ?        S    Apr26   0:00 sudo ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.23:3260-iscsi-iqn.2008-10.org.openstack:d9a712ef-2b3d-4163-a85d-f6edab404af5-lun-1
root     19549  0.0  0.1 244120 21768 ?        S    Apr26   0:00 /usr/bin/python2 /usr/bin/ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.23:3260-iscsi-iqn.2008-10.org.openstack:d9a712ef-2b3d-4163-a85d-f6edab404af5-lun-1
root     19553  0.0  0.0 107964   852 ?        S    Apr26   0:00 /sbin/fuser /dev/disk/by-path/ip-192.168.1.23:3260-iscsi-iqn.2008-10.org.openstack:d9a712ef-2b3d-4163-a85d-f6edab404af5-lun-1
root     19616  0.0  0.0 216452  4040 ?        S    Apr26   0:00 sudo ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.28:3260-iscsi-iqn.2008-10.org.openstack:393ae197-602c-491e-a52c-75495fb482e8-lun-1
root     19618  0.0  0.1 244120 21764 ?        S    Apr26   0:00 /usr/bin/python2 /usr/bin/ironic-rootwrap /etc/ironic/rootwrap.conf fuser /dev/disk/by-path/ip-192.168.1.28:3260-iscsi-iqn.2008-10.org.openstack:393ae197-602c-491e-a52c-75495fb482e8-lun-1
root     19621  0.0  0.0 107964   852 ?        S    Apr26   0:00 /sbin/fuser /dev/disk/by-path/ip-192.168.1.28:3260-iscsi-iqn.2008-10.org.openstack:393ae197-602c-491e-a52c-75495fb482e8-lun-1
[root@hci-director ~]#
[root@hci-director ~]# strace -p 19621
strace: Process 19621 attached
^C^C

[6] 
(undercloud) [stack@hci-director ~]$ sudo lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 558.4G  0 disk 
sdb      8:16   0 558.4G  0 disk 
sdc      8:32   0 278.9G  0 disk 
sdd      8:48   0 278.9G  0 disk 
sde      8:64   0 278.9G  0 disk 
sdf      8:80   0 558.4G  0 disk 
vda    253:0    0    40G  0 disk 
└─vda1 253:1    0    40G  0 part /
(undercloud) [stack@hci-director ~]$

Comment 2 John Fulton 2018-04-27 20:31:03 UTC
The following from: 

 journalctl -k | curl -F 'f:1=<-' ix.io    ======>   http://ix.io/18Tj

seems to show fuser (PID 18351) blocked in a stat call, waiting in autofs4_expire_wait:

Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: scsi 7:0:0:1: alua: Attached
Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: Attached scsi generic sg5 type 0
Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] 1170997248 512-byte logical blocks: (599 GB/558 GiB)
Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] Write Protect is off
Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] Mode Sense: 43 00 00 08
Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] Attached SCSI disk
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: INFO: task fuser:18351 blocked for more than 120 seconds.
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: fuser           D ffff880428b3a600     0 18351  18331 0x00000084
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  ffff8801a97e7a30 0000000000000082 ffff880193e70fb0 ffff8801a97e7fd8
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  ffff8801a97e7fd8 ffff8801a97e7fd8 ffff880193e70fb0 ffff880429384018
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  ffff880429384020 7fffffffffffffff ffff880193e70fb0 ffff880428b3a600
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: Call Trace:
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8168b579>] schedule+0x29/0x70
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff810b13c6>] ? finish_wait+0x56/0x70
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81689722>] ? mutex_lock+0x12/0x2f
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81289762>] ? autofs4_wait+0x3f2/0x900
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8168b956>] wait_for_completion+0x116/0x170
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8128a8db>] autofs4_expire_wait+0x6b/0x110
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff812b180d>] ? selinux_inode_setsecurity+0x6d/0x140
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81287992>] do_expire_wait+0x172/0x190
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81287b8f>] autofs4_d_manage+0x6f/0x170
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81208a15>] follow_managed+0xb5/0x300
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120937b>] lookup_fast+0x19b/0x2e0
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120bc65>] path_lookupat+0x165/0x7a0
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff811ddf75>] ? kmem_cache_alloc+0x35/0x1e0
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120ebbf>] ? getname_flags+0x4f/0x1a0
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120c2cb>] filename_lookup+0x2b/0xc0
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120fce7>] user_path_at_empty+0x67/0xc0
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81114512>] ? from_kgid_munged+0x12/0x20
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff812036cf>] ? cp_new_stat+0x14f/0x180
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120fd51>] user_path_at+0x11/0x20
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff812031c3>] vfs_fstatat+0x63/0xc0
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120372e>] SYSC_newstat+0x2e/0x60
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8111ede6>] ? __audit_syscall_exit+0x1e6/0x280
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81203a0e>] SyS_newstat+0xe/0x10
Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: INFO: task fuser:18351 blocked for more than 120 seconds.
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: fuser           D ffff880428b3a600     0 18351  18331 0x00000084
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  ffff8801a97e7a30 0000000000000082 ffff880193e70fb0 ffff8801a97e7fd8
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  ffff8801a97e7fd8 ffff8801a97e7fd8 ffff880193e70fb0 ffff880429384018
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  ffff880429384020 7fffffffffffffff ffff880193e70fb0 ffff880428b3a600
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: Call Trace:
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8168b579>] schedule+0x29/0x70
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff810b13c6>] ? finish_wait+0x56/0x70
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81689722>] ? mutex_lock+0x12/0x2f
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81289762>] ? autofs4_wait+0x3f2/0x900
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8168b956>] wait_for_completion+0x116/0x170
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8128a8db>] autofs4_expire_wait+0x6b/0x110
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff812b180d>] ? selinux_inode_setsecurity+0x6d/0x140
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81287992>] do_expire_wait+0x172/0x190
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81287b8f>] autofs4_d_manage+0x6f/0x170
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81208a15>] follow_managed+0xb5/0x300
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120937b>] lookup_fast+0x19b/0x2e0
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120bc65>] path_lookupat+0x165/0x7a0
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff811ddf75>] ? kmem_cache_alloc+0x35/0x1e0
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120ebbf>] ? getname_flags+0x4f/0x1a0
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120c2cb>] filename_lookup+0x2b/0xc0
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120fce7>] user_path_at_empty+0x67/0xc0
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81114512>] ? from_kgid_munged+0x12/0x20
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff812036cf>] ? cp_new_stat+0x14f/0x180
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120fd51>] user_path_at+0x11/0x20
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff812031c3>] vfs_fstatat+0x63/0xc0
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8120372e>] SYSC_newstat+0x2e/0x60
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff8111ede6>] ? __audit_syscall_exit+0x1e6/0x280
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff81203a0e>] SyS_newstat+0xe/0x10
Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel:  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
Apr 27 15:28:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: INFO: task fuser:18351 blocked for more than 120 seconds.
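
The hung-task warning repeats every two minutes in the log above. A rough way to summarize which tasks are being reported, and how often, is to match the "blocked for more than N seconds" lines. This is a sketch only; the function name hung_tasks and the regular expression are assumptions based on the message format shown in this log, not on any tool used in this bug.

```python
import re

# Matches the kernel hung-task warning as it appears in the log above.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text):
    """Count hung-task warnings per (task name, pid) in kernel log text."""
    counts = {}
    for match in HUNG_RE.finditer(log_text):
        key = (match.group("name"), int(match.group("pid")))
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Hostname shortened; the message text is copied from the log above.
    sample = (
        "Apr 27 15:24:13 hci-director kernel: INFO: task fuser:18351 blocked for more than 120 seconds.\n"
        "Apr 27 15:26:13 hci-director kernel: INFO: task fuser:18351 blocked for more than 120 seconds.\n"
    )
    print(hung_tasks(sample))
```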

Comment 5 John Fulton 2018-04-27 21:51:17 UTC
Created attachment 1427861 [details]
sosreport from after vmcrash and during redeploy to reproduce

Comment 7 John Fulton 2018-05-01 22:11:42 UTC
My undercloud was not fully running RHEL75.

When I hit this issue I had a RHEL74 install running kernel-3.10.0-514.el7, which I yum upgraded to RHEL75, but I didn't reboot into the new RHEL75 kernel (kernel-3.10.0-862.el7). I used this partially upgraded system to install the undercloud and then deploy the overcloud, which is when I hit this bug. I had a script that deploys the undercloud VM from an image and yum upgrades it to 75 without a reboot, so I observed this behaviour repeatedly, which is why I filed this bug.

I modified my procedure to reboot after the RHEL75 upgrade and before the undercloud install, and then I was able to deploy without hitting the described bug.
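
A deployment script could guard against this by checking whether the running kernel is older than the newest installed one before starting the undercloud install. The following is a minimal sketch, not what was actually used here. The names kernel_key and needs_reboot are hypothetical, and the comparison is a simplification that looks only at the numeric version and release fields (it is not RPM's full version comparison). In practice the running release would come from 'uname -r' and the installed list from 'rpm -q kernel'.

```python
import re

# Captures the numeric version ("3.10.0") and numeric release prefix ("862").
_KERNEL_RE = re.compile(r"^(?:kernel-)?(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)")


def kernel_key(release):
    """Turn e.g. 'kernel-3.10.0-862.el7.x86_64' into ((3, 10, 0), (862,))."""
    match = _KERNEL_RE.match(release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    version, build = match.groups()
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in build.split(".")),
    )


def needs_reboot(running, installed):
    """True if some installed kernel is newer than the running one."""
    newest = max(installed, key=kernel_key)
    return kernel_key(running) < kernel_key(newest)


if __name__ == "__main__":
    # Kernel versions taken from this comment.
    installed = ["kernel-3.10.0-514.el7", "kernel-3.10.0-862.el7"]
    print(needs_reboot("3.10.0-514.el7", installed))
```

With the two kernels mentioned above, the check reports that a reboot is needed while 3.10.0-514.el7 is still the running kernel.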