Bug 1572774
Summary: | ironic-rootwrap hangs on fuser to iscsi device during imaging | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | John Fulton <johfulto> | ||||
Component: | openstack-ironic | Assignee: | RHOS Maint <rhos-maint> | ||||
Status: | CLOSED NOTABUG | QA Contact: | mlammon | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 13.0 (Queens) | CC: | bfournie, mburns, srevivo | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2018-05-01 22:11:42 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
John Fulton
2018-04-27 19:46:51 UTC
The following from: journalctl -k | curl -F 'f:1=<-' ix.io ======> http://ix.io/18Tj seems to show it waiting on stat: Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: scsi 7:0:0:1: alua: Attached Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: Attached scsi generic sg5 type 0 Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] 1170997248 512-byte logical blocks: (599 GB/558 GiB) Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] Write Protect is off Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] Mode Sense: 43 00 00 08 Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA Apr 26 18:47:12 hci-director.cloud.lab.eng.bos.redhat.com kernel: sd 7:0:0:1: [sdf] Attached SCSI disk Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: INFO: task fuser:18351 blocked for more than 120 seconds. Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: fuser D ffff880428b3a600 0 18351 18331 0x00000084 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: ffff8801a97e7a30 0000000000000082 ffff880193e70fb0 ffff8801a97e7fd8 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: ffff8801a97e7fd8 ffff8801a97e7fd8 ffff880193e70fb0 ffff880429384018 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: ffff880429384020 7fffffffffffffff ffff880193e70fb0 ffff880428b3a600 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: Call Trace: Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8168b579>] schedule+0x29/0x70 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff810b13c6>] ? finish_wait+0x56/0x70 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81689722>] ? mutex_lock+0x12/0x2f Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81289762>] ? autofs4_wait+0x3f2/0x900 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8168b956>] wait_for_completion+0x116/0x170 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8128a8db>] autofs4_expire_wait+0x6b/0x110 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff812b180d>] ? selinux_inode_setsecurity+0x6d/0x140 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81287992>] do_expire_wait+0x172/0x190 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81287b8f>] autofs4_d_manage+0x6f/0x170 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81208a15>] follow_managed+0xb5/0x300 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120937b>] lookup_fast+0x19b/0x2e0 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120bc65>] path_lookupat+0x165/0x7a0 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff811ddf75>] ? kmem_cache_alloc+0x35/0x1e0 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120ebbf>] ? getname_flags+0x4f/0x1a0 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120c2cb>] filename_lookup+0x2b/0xc0 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120fce7>] user_path_at_empty+0x67/0xc0 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81114512>] ? from_kgid_munged+0x12/0x20 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff812036cf>] ? cp_new_stat+0x14f/0x180 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120fd51>] user_path_at+0x11/0x20 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff812031c3>] vfs_fstatat+0x63/0xc0 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120372e>] SYSC_newstat+0x2e/0x60 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8111ede6>] ? __audit_syscall_exit+0x1e6/0x280 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81203a0e>] SyS_newstat+0xe/0x10 Apr 27 15:24:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: INFO: task fuser:18351 blocked for more than 120 seconds. Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: fuser D ffff880428b3a600 0 18351 18331 0x00000084 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: ffff8801a97e7a30 0000000000000082 ffff880193e70fb0 ffff8801a97e7fd8 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: ffff8801a97e7fd8 ffff8801a97e7fd8 ffff880193e70fb0 ffff880429384018 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: ffff880429384020 7fffffffffffffff ffff880193e70fb0 ffff880428b3a600 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: Call Trace: Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8168b579>] schedule+0x29/0x70 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff810b13c6>] ? finish_wait+0x56/0x70 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81689722>] ? mutex_lock+0x12/0x2f Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81289762>] ? autofs4_wait+0x3f2/0x900 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8168b956>] wait_for_completion+0x116/0x170 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8128a8db>] autofs4_expire_wait+0x6b/0x110 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff812b180d>] ? selinux_inode_setsecurity+0x6d/0x140 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81287992>] do_expire_wait+0x172/0x190 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81287b8f>] autofs4_d_manage+0x6f/0x170 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81208a15>] follow_managed+0xb5/0x300 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120937b>] lookup_fast+0x19b/0x2e0 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120bc65>] path_lookupat+0x165/0x7a0 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff811ddf75>] ? kmem_cache_alloc+0x35/0x1e0 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120ebbf>] ? getname_flags+0x4f/0x1a0 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120c2cb>] filename_lookup+0x2b/0xc0 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120fce7>] user_path_at_empty+0x67/0xc0 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81114512>] ? from_kgid_munged+0x12/0x20 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff812036cf>] ? cp_new_stat+0x14f/0x180 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120fd51>] user_path_at+0x11/0x20 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff812031c3>] vfs_fstatat+0x63/0xc0 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8120372e>] SYSC_newstat+0x2e/0x60 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff8111ede6>] ? __audit_syscall_exit+0x1e6/0x280 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff81203a0e>] SyS_newstat+0xe/0x10 Apr 27 15:26:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b Apr 27 15:28:13 hci-director.cloud.lab.eng.bos.redhat.com kernel: INFO: task fuser:18351 blocked for more than 120 seconds. Created attachment 1427861 [details]
sosreport from after vmcrash and during redeploy to reproduce
My undercloud was not fully running RHEL75. When I hit this issue I had a RHEL74 install running kernel-3.10.0-514.el7 which I yum upgraded to RHEL75 but I didn't reboot to the new RHEL75 kernel (kernel-3.10.0-862.el7). I had used this partial 75 system when I installed the undercloud and then deployed the overcloud to hit this bug. I had a script to deploy the undercloud VM from an image and yum upgrade it to 75, without reboot, and so I was observed this behaviour repeatedly which is why I filed this bug. I modified my procedure to reboot after the RHEL75 upgrade before the undercloud install and then I was able to deploy without hitting the described bug. |