Bug 1477455

Summary: app pod with block vol as pvc fails to restart as iSCSI login fails
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: krishnaram Karthick <kramdoss>
Component: gluster-block Assignee: Prasanna Kumar Kalever <prasanna.kalever>
Status: CLOSED ERRATA QA Contact: krishnaram Karthick <kramdoss>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: agrover, akhakhar, amukherj, annair, hchiramm, jarrpa, kramdoss, madam, mchristi, mliyazud, mzywusko, pkarampu, pprakash, prasanna.kalever, rcyriac, rhs-bugs, rreddy, rtalur, sanandpa, sankarshan, storage-qa-internal, vbellur
Target Milestone: ---   
Target Release: RHGS 3.3.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tcmu-runner-1.2.0-14.el7rhgs Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1476285 Environment:
Last Closed: 2017-09-21 04:20:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1417151, 1476285    
Attachments:
iscsi messages from problematic node. (no flags)
kernel logs; naming kernel_logs_$host_$devname (no flags)

Comment 3 Humble Chirammal 2017-08-02 08:00:16 UTC
We (Prasanna and I) had a look at this setup; here is what we observed.

*) Login/discovery do not work on the targets. To isolate the issue, we tried to log in from one of the other nodes (the OCP master, i.e. a fresh initiator), and discovery did not work from there either.

*) Looking at the target side, we could see the `tcmu-runner` service is hung and not responding at all. There are errors from the glfs handlers as well.

The interesting bit is that these targets could previously be discovered, logged into, attached to a pod, and written to. I don't expect this to be a network/iptables issue, because the nodes were NOT rebooted, and discovery and login had worked with the same iptables rules in place.

A couple of things to note here: there was a block volume 'delete' before Karthick noticed this issue; not sure whether it is related.

There are also soft lockups and panic strings on these servers, though the timestamps are yet to be matched.

~~~

[1925749.244007] INFO: task iscsi_np:82714 blocked for more than 120 seconds.
[1925749.258478] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1925749.262048] iscsi_np        D 0000000000000001     0 82714      2 0x00000084
[1925749.262055]  ffff8802acf43bc0 0000000000000046 ffff8802927ccf10 ffff8802acf43fd8
[1925749.262058]  ffff8802acf43fd8 ffff8802acf43fd8 ffff8802927ccf10 ffff8802702f31a8
[1925749.262061]  7fffffffffffffff ffff8802702f31a0 ffff8802927ccf10 0000000000000001
[1925749.262065] Call Trace:
[1925749.262077]  [<ffffffff816a94c9>] schedule+0x29/0x70
[1925749.262083]  [<ffffffff816a6fd9>] schedule_timeout+0x239/0x2c0
[1925749.262090]  [<ffffffff8109c3be>] ? __send_signal+0x18e/0x490
[1925749.262093]  [<ffffffff816a987d>] wait_for_completion+0xfd/0x140
[1925749.262098]  [<ffffffff810c4850>] ? wake_up_state+0x20/0x20
[1925749.262115]  [<ffffffffc069e0fe>] iscsit_cause_connection_reinstatement+0x9e/0x100 [iscsi_target_mod]
[1925749.262126]  [<ffffffffc06b15c1>] iscsit_stop_session+0x121/0x1e0 [iscsi_target_mod]
[1925749.262134]  [<ffffffffc06a1ef8>] iscsi_check_for_session_reinstatement+0x208/0x280 [iscsi_target_mod]
[1925749.262142]  [<ffffffffc06a4c95>] iscsi_target_check_for_existing_instances+0x35/0x40 [iscsi_target_mod]
[1925749.262150]  [<ffffffffc06a4de1>] iscsi_target_do_login+0x141/0x670 [iscsi_target_mod]
[1925749.262157]  [<ffffffffc06a603c>] iscsi_target_start_negotiation+0x1c/0xb0 [iscsi_target_mod]
[1925749.262165]  [<ffffffffc06a3c6f>] iscsi_target_login_thread+0xb0f/0x1060 [iscsi_target_mod]
[1925749.262168]  [<ffffffff816a8f00>] ? __schedule+0x310/0x8b0
[1925749.262179]  [<ffffffffc06a3160>] ? iscsi_target_login_sess_out+0x250/0x250 [iscsi_target_mod]
[1925749.262183]  [<ffffffff810b09cf>] kthread+0xcf/0xe0
[1925749.262186]  [<ffffffff810b0900>] ? insert_kthread_work+0x40/0x40
[1925749.262190]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[1925749.262193]  [<ffffffff810b0900>] ? insert_kthread_work+0x40/0x40

~~~

[91397.324059] CPU: 24 PID: 47197 Comm: iscsi_trx Tainted: G    B          ------------   3.10.0-693.el7.x86_64 #1
[91397.324059] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[91397.324061] task: ffff880a07f89fa0 ti: ffff88086d3ec000 task.ti: ffff88086d3ec000
[91397.324072] RIP: 0010:[<ffffffffc0035650>]  [<ffffffffc0035650>] target_free_sgl+0x20/0x60 [target_core_mod]
[91397.324073] RSP: 0018:ffff88086d3efda8  EFLAGS: 00010202
[91397.324074] RAX: 0000000000040109 RBX: 0000000000000000 RCX: dead000000000200
[91397.324074] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 0000000000000000
[91397.324075] RBP: ffff88086d3efdc8 R08: ffff88086d3efd58 R09: 00000001056dff1a
[91397.324076] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[91397.324076] R13: 0000000000000002 R14: 0000000000000000 R15: ffff88085d2eec00
[91397.324077] FS:  0000000000000000(0000) GS:ffff880c0dc00000(0000) knlGS:0000000000000000
[91397.324078] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[91397.324079] CR2: 0000000000000000 CR3: 0000000bc82b8000 CR4: 00000000003407e0
[91397.324127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[91397.324128] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 67
[91397.324153] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 117
[91397.324168] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[91397.324169] Stack:
[91397.324171]  ffff88085ae9eb50 0000000000000001 ffff88085ae9eb50 0000000000000001
[91397.324173]  ffff88086d3efdf0 ffffffffc0036ade ffff88085ae9e940 0000000000000001
[91397.324174]  ffff88085ae9eb50 ffff88086d3efe20 ffffffffc0702e93 ffff880a5a3fe800
[91397.324175] Call Trace:
[91397.324192]  [<ffffffffc0036ade>] transport_generic_free_cmd+0x14e/0x260 [target_core_mod]
[91397.324192] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 43
[91397.324203]  [<ffffffffc0702e93>] iscsit_free_cmd+0x73/0x140 [iscsi_target_mod]
[91397.324204] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 49
[91397.324211]  [<ffffffffc070aa93>] iscsit_close_connection+0x413/0x820 [iscsi_target_mod]
[91397.324216]  [<ffffffffc06f8083>] iscsit_take_action_for_connection_exit+0x83/0x110 [iscsi_target_mod]
[91397.324221]  [<ffffffffc070a25f>] iscsi_target_rx_thread+0x9f/0xb0 [iscsi_target_mod]
[91397.324225]  [<ffffffffc070a1c0>] ? iscsi_target_tx_thread+0x210/0x210 [iscsi_target_mod]
[91397.324230]  [<ffffffff810b098f>] kthread+0xcf/0xe0
[91397.324232]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[91397.324235]  [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[91397.324236]  [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[91397.324248] Code: c0 e8 45 51 31 c1 e9 55 ff ff ff 0f 1f 44 00 00 55 48 89 e5 41 56 49 89 fe 41 55 41 89 f5 41 54 49 89 fc 53 31 db 85 f6 7e 23 90 <49> 8b 3c 24 31 f6 83 c3 01 48 83 e7 fc e8 3e 61 15 c1 4c 89 e7
[91397.324259] RIP  [<ffffffffc0035650>] target_free_sgl+0x20/0x60 [target_core_mod]
[91397.324259]  RSP <ffff88086d3efda8>
[91397.324260] CR2: 0000000000000000

~~~

Kernel:

Linux dhcp47-72.lab.eng.blr.redhat.com 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
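
For matching the timestamps of those hung-task/soft-lockup traces against the block volume 'delete', the blocked-task stacks can be re-dumped with wall-clock timestamps. A rough sketch (assumes sysrq is enabled on the node):

# dmesg -T | grep -E 'blocked for more than|soft lockup'   # wall-clock timestamps instead of seconds-since-boot
# echo w > /proc/sysrq-trigger                             # dump stacks of all blocked (D-state) tasks to the kernel log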

Comment 4 krishnaram Karthick 2017-08-02 08:06:42 UTC
The originally reported steps to reproduce were missing a few steps. Although this has already been discussed among me, Humble, and Prasanna, I am updating the bz for future reference.

Steps to Reproduce:
1. Create 10 app pods with gluster-block as the PVC
2. Run IOs
3. Create an 11th app pod with a block size exceeding the block-hosting volume (i.e., on a 500Gb block-hosting volume, try to create a 600Gb block device). This failed as expected (a PVC sketch for this step follows the list).
4. Deleted the 11th pod and PVC
5. Create a 12th app pod with a block size equal to that of the block-hosting volume (i.e., on a 500Gb block-hosting volume, try to create a 500Gb block device). This worked as expected
6. Deleted the 12th app pod and PVC
7. Restart an app pod (I had deleted the app pod and the dc had respun a new pod)
8. Check if the app pods are running
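
For reference, the block devices in steps 3 and 5 were requested through PVCs; a minimal sketch of the oversized request from step 3 (the claim name and StorageClass below are hypothetical, the actual setup used dynamic provisioning backed by gluster-block):

# oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc-11              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: glusterblock  # hypothetical StorageClass using the gluster-block provisioner
  resources:
    requests:
      storage: 600Gi              # exceeds the 500Gb block-hosting volume, so provisioning is expected to fail
EOF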

Comment 7 Humble Chirammal 2017-08-03 05:14:17 UTC
As an additional comment: regardless of the discovery part, tcmu-runner status, restart, etc. were hung on this setup. Do we know why this service was hung?
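
For the next occurrence, a sketch of how the hang could be captured (assumes gdb/gstack is installed on the node):

# systemctl status tcmu-runner --no-pager
# pidof tcmu-runner
# cat /proc/$(pidof tcmu-runner)/stack    # kernel-side stack if the main thread is blocked in the kernel
# gstack $(pidof tcmu-runner)             # userspace stacks of all threads (gstack ships with gdb)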

Comment 8 krishnaram Karthick 2017-08-04 18:37:09 UTC
This issue was hit on the test setup used for testing the EFK stack too.

One of the Elasticsearch pods using a gluster-block device has crashed with the same error seen in this issue.

Events:
  FirstSeen     LastSeen        Count   From                                            SubObjectPath   Type            Reason          Message
  ---------     --------        -----   ----                                            -------------   --------        ------          -------
  10m           19s             6       kubelet, dhcp46-100.lab.eng.blr.redhat.com                      Warning         FailedSync      Error syncing pod
  8m            19s             5       kubelet, dhcp46-100.lab.eng.blr.redhat.com                      Warning         FailedMount     Unable to mount volumes for pod "logging-es-data-master-p1h7hxxi-1-399sq_logging(0adba18a-77fe-11e7-9fc8-005056a51dd3)": timeout expired waiting for volumes to attach/mount for pod "logging"/"logging-es-data-master-p1h7hxxi-1-399sq". list of unattached/unmounted volumes=[elasticsearch-storage]

Only 6 gluster-block devices were created, and IOs were run on them continuously. Other than that, no other specific failure was induced.

Comment 9 krishnaram Karthick 2017-08-04 18:39:43 UTC
oc get pods
NAME                                          READY     STATUS    RESTARTS   AGE
logging-curator-1-m4r99                       1/1       Running   0          1d
logging-curator-ops-1-qfvdz                   1/1       Running   1          1d
logging-es-data-master-dj9ey1jh-1-6835j       1/1       Running   0          1d
logging-es-data-master-p1h7hxxi-1-399sq       0/1       Error     0          1d
logging-es-data-master-uaaqo2e8-1-phcg1       0/1       Error     0          1d
logging-es-ops-data-master-7u5z9m9j-1-n95j1   1/1       Running   0          1d
logging-es-ops-data-master-lt229hum-1-z0z16   0/1       Error     0          1d
logging-es-ops-data-master-vw10nnol-1-bz5rx   1/1       Running   1          1d
logging-fluentd-pg3g7                         1/1       Running   1          1d
logging-fluentd-r6w3t                         1/1       Running   0          1d
logging-fluentd-sk6hx                         1/1       Running   0          1d
logging-fluentd-tk335                         1/1       Running   1          1d
logging-kibana-3-6jft3                        2/2       Running   0          1d
logging-kibana-ops-2-07x4d                    2/2       Running   17         1d

Once this issue is hit, app pods using gluster-block devices seem to fail one by one.

Comment 10 krishnaram Karthick 2017-08-04 18:45:59 UTC
However, discovery now seems to fail with a different error:

[root@dhcp46-100 ~]# iscsiadm -m discovery -t st -p 10.70.46.100
iscsiadm: cannot make connection to 10.70.46.100: Connection refused
iscsiadm: cannot make connection to 10.70.46.100: Connection refused
iscsiadm: cannot make connection to 10.70.46.100: Connection refused
iscsiadm: cannot make connection to 10.70.46.100: Connection refused
iscsiadm: cannot make connection to 10.70.46.100: Connection refused
iscsiadm: cannot make connection to 10.70.46.100: Connection refused
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: Could not perform SendTargets discovery: encountered connection failure
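
"Connection refused" suggests nothing is listening on the portal at all, as opposed to a hung login. A quick way to confirm that on the target node (a sketch; assumes the default iSCSI port 3260 and that targetcli is available):

# ss -tln | grep 3260                      # is anything listening on the iSCSI portal port?
# systemctl status gluster-blockd tcmu-runner --no-pager
# targetcli ls /iscsi                      # are the gluster-block targets still configured in LIO?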

Comment 33 krishnaram Karthick 2017-09-11 14:00:15 UTC
Ramky and I had a look at the setup, and it looks like multipathing was not configured. At the moment we are not sure whether that could be the reason, but Ramky will give it another try after configuring multipath.

===============================================================================
The following steps need to be performed to configure multipathing on all the initiator nodes.

# mpathconf --enable
# cat >> /etc/multipath.conf
# LIO iSCSI
devices {
        device {
                vendor "LIO-ORG"
                user_friendly_names "yes" # names like mpatha
                path_grouping_policy "failover" # one path per group
                path_selector "round-robin 0"
                failback immediate
                path_checker "tur"
                prio "const"
                no_path_retry 120
                rr_weight "uniform"
        }
}
Ctrl^C
# systemctl restart multipathd
================================================================================

These steps are
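
Once multipathd has been restarted with that configuration, it can be verified on each initiator (a sketch):

# multipath -ll          # each LIO-ORG LUN should show up as an mpath device with 'failover' path groups
# multipathd show paths  # per-path state of the iSCSI devices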

Comment 38 Humble Chirammal 2017-09-12 14:23:34 UTC
For the above pod, see this:

[root@dhcp43-165 ~]# cd /var/lib/origin/openshift.local.volumes/pods/428bc478-9484-11e7-86b3-5254001958c3/volumes/kubernetes.io~iscsi/pvc-413e172b-9484-11e7-86b3-5254001958c3
[root@dhcp43-165 pvc-413e172b-9484-11e7-86b3-5254001958c3]# ls
ls: cannot open directory .: Input/output error
[root@dhcp43-165 pvc-413e172b-9484-11e7-86b3-5254001958c3]#

Comment 39 Humble Chirammal 2017-09-12 14:46:47 UTC
(In reply to Humble Chirammal from comment #38)
> For above pod see this 
> 
> [root@dhcp43-165 ~]# cd
> /var/lib/origin/openshift.local.volumes/pods/428bc478-9484-11e7-86b3-
> 5254001958c3/volumes/kubernetes.io~iscsi/pvc-413e172b-9484-11e7-86b3-
> 5254001958c3
> [root@dhcp43-165 pvc-413e172b-9484-11e7-86b3-5254001958c3]# ls
> ls: cannot open directory .: Input/output error
> [root@dhcp43-165 pvc-413e172b-9484-11e7-86b3-5254001958c3]#

For this particular device, 

The I/O error count is 0x13 and eh_timeout is 10 seconds.

Ideally this node would have been configured for multipath, but at present it is operating on a single path and multipath is not configured.
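
For reference, those values were presumably read from sysfs; the same check can be repeated on the node (a sketch, using sdg as the device in question):

# cat /sys/block/sdg/device/ioerr_cnt      # 0x13 at the time of this comment
# cat /sys/block/sdg/device/eh_timeout     # 10 (seconds)
# multipath -ll | grep -B2 sdg             # no output confirms sdg is not part of any multipath map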

Comment 40 Humble Chirammal 2017-09-12 14:55:59 UTC
Created attachment 1324933 [details]
iscsi messages from problematic node.

ISCSI messages from "dhcp43-165"

Comment 41 Humble Chirammal 2017-09-12 14:58:54 UTC
Apart from the iSCSI errors attached, I/O errors are reported on this device.

--snip--

Sep 12 11:28:41 dhcp43-165 kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 11:28:41 dhcp43-165 kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 11:28:41 dhcp43-165 kernel: blk_update_request: I/O error, dev sdg, sector 1065162
Sep 12 11:28:41 dhcp43-165 kernel: sd 8:0:0:0: rejecting I/O to offline device
Sep 12 11:28:41 dhcp43-165 kernel: blk_update_request: I/O error, dev sdg, sector 1065169
Sep 12 11:28:41 dhcp43-165 kernel: blk_update_request: I/O error, dev sdg, sector 1065177
Sep 12 11:28:41 dhcp43-165 kernel: blk_update_request: I/O error, dev sdg, sector 1065181
Sep 12 11:28:41 dhcp43-165 kernel: blk_update_request: I/O error, dev sdg, sector 1065185
Sep 12 11:28:41 dhcp43-165 kernel: blk_update_request: I/O error, dev sdg, sector 1065189
Sep 12 11:28:41 dhcp43-165 kernel: blk_update_request: I/O error, dev sdg, sector 1065157
Sep 12 11:28:41 dhcp43-165 kernel: blk_update_request: I/O error, dev sdg, sector 1065157
Sep 12 11:28:41 dhcp43-165 kernel: XFS (sdg): metadata I/O error: block 0x1040ca ("xlog_iodone") error 5 numblks 64
Sep 12 11:28:41 dhcp43-165 kernel: XFS (sdg): Log I/O Error Detected.  Shutting down filesystem
Sep 12 11:28:41 dhcp43-165 kernel: XFS (sdg): metadata I/O error: block 0x1040d1 ("xlog_iodone") error 5 numblks 64
Sep 12 11:28:41 dhcp43-165 kernel: XFS (sdg): metadata I/O error: block 0x1040d9 ("xlog_iodone") error 5 numblks 64
Sep 12 11:28:41 dhcp43-165 kernel: XFS (sdg): metadata I/O error: block 0x1040dd ("xlog_iodone") error 5 numblks 64
Sep 12 11:28:41 dhcp43-165 kernel: XFS (sdg): metadata I/O error: block 0x1040e1 ("xlog_iodone") error 5 numblks 64
Sep 12 11:28:41 dhcp43-165 kernel: XFS (sdg): metadata I/O error: block 0x1040e5 ("xlog_iodone") error 5 numblks 64
Sep 12 11:28:41 dhcp43-165 kernel: XFS (sdg): metadata I/O error: block 0x1040c5 ("xlog_iodone") error 5 numblks 64


--snip--

So it is rejecting I/O, and it seems to me we have hit the same issue we encountered in previous iterations.
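
"rejecting I/O to offline device" means the SCSI device itself has been marked offline; that can be cross-checked from the initiator (a sketch):

# cat /sys/block/sdg/device/state          # 'offline' matches the messages above; 'running' is healthy
# iscsiadm -m session -P 3 | grep -E 'Target:|Session State:|Attached scsi disk'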

Comment 48 Humble Chirammal 2017-09-12 17:06:35 UTC
Comment >=39 refers to "sdg" that too got same errors.

Comment 49 Humble Chirammal 2017-09-12 17:11:01 UTC
(In reply to Humble Chirammal from comment #48)
> Comment >=39 refers to "sdg" that too got same errors.

The i/o errors and issue pattern look same to me against the last iteration of this issue and the dependent (1480192) bug analysis.

Comment 50 Pranith Kumar K 2017-09-12 17:34:47 UTC
(In reply to Humble Chirammal from comment #49)
> (In reply to Humble Chirammal from comment #48)
> > Comment >=39 refers to "sdg" that too got same errors.
> 
> The i/o errors and issue pattern look same to me against the last iteration
> of this issue and the dependent (1480192) bug analysis.

As per https://bugzilla.redhat.com/show_bug.cgi?id=1480192#c19, the issue is seen without gluster in picture, where as this particular iteration I/O errors happened because of ping-timeouts in gluster rpc layer. So they are not same even though end result is I/O errors in dmesg.

Comment 52 Humble Chirammal 2017-09-12 17:50:58 UTC
(In reply to Pranith Kumar K from comment #50)
> (In reply to Humble Chirammal from comment #49)
> > (In reply to Humble Chirammal from comment #48)
> > > Comment >=39 refers to "sdg" that too got same errors.
> > 
> > The i/o errors and issue pattern look same to me against the last iteration
> > of this issue and the dependent (1480192) bug analysis.
> 
> As per https://bugzilla.redhat.com/show_bug.cgi?id=1480192#c19, the issue is
> seen without gluster in picture, 

Well, that's why the entire 'software iSCSI' stack (target and initiator) was in doubt from the start, and we were/are trying to isolate where the culprit hides, or what the triggering point is.

> where as this particular iteration I/O
> errors happened because of ping-timeouts in gluster rpc layer. So they are
> not same even though end result is I/O errors in dmesg.

My assumption is this:

The "unmount" of the share fails in initiators due to the I/O error which caused  "file exists" in next container respawn. 

Whether all the mounted shares/targets reject I/O at unmount time, or started to reject I/O at some point (maybe after serving I/O for some time, or after a change on the target side), needs to be analysed.
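
One way to confirm the stale-mount theory on an affected initiator (a sketch; the pod UID and PVC name are placeholders):

# mount | grep kubernetes.io~iscsi         # a leftover entry after the pod is gone means the unmount did not complete
# stat /var/lib/origin/openshift.local.volumes/pods/<pod-uid>/volumes/kubernetes.io~iscsi/<pvc-name>   # 'Input/output error' here means the backing device is offline/gone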

Comment 53 Pranith Kumar K 2017-09-12 18:00:31 UTC
(In reply to Humble Chirammal from comment #52)
> (In reply to Pranith Kumar K from comment #50)
> > (In reply to Humble Chirammal from comment #49)
> > > (In reply to Humble Chirammal from comment #48)
> > > > Comment >=39 refers to "sdg" that too got same errors.
> > > 
> > > The i/o errors and issue pattern look same to me against the last iteration
> > > of this issue and the dependent (1480192) bug analysis.
> > 
> > As per https://bugzilla.redhat.com/show_bug.cgi?id=1480192#c19, the issue is
> > seen without gluster in picture, 
> 
> Well, that's why the entire 'software iSCSI' stack (target and initiator)
> was in doubt from the start, and we were/are trying to isolate where the
> culprit hides, or what the triggering point is.

I think we are going in circles. This iteration of the reproducer is not what happened when krk raised this bz. I think that part of the analysis is conclusive.

> 
> > where as this particular iteration I/O
> > errors happened because of ping-timeouts in gluster rpc layer. So they are
> > not same even though end result is I/O errors in dmesg.
> 
> My assumption is this:
> 
> The "unmount" of the share fails in initiators due to the I/O error which
> caused  "file exists" in next container respawn. 

Could you help to confirm this? That would give a good data point.

> 
> Whether all the mounted shares/targets reject I/O at unmount time, or started to
> reject I/O at some point (maybe after serving I/O for some time, or after a
> change on the target side), needs to be analysed.

Comment 54 Humble Chirammal 2017-09-12 18:02:45 UTC
[Side note] 

As mentioned in the previous comment, we should have configured multipath on these systems; it is missing in this setup (at least on the server the latest comments about the sdg and sda devices refer to) compared to previous iterations.

Comment 60 Mike Christie 2017-09-12 21:21:03 UTC
It looks like commands do get stuck, but handler_glfs gets a reply and sends it to lio.

It looks like the problem is just that something is deleting the target from lio:

We see these errors in lio:

Sep 12 11:26:58 dhcp42-226 kernel: Unable to locate Target IQN: iqn.2016-12.org.gluster-block:df906719-ba77-45ad-814e-01a2589360ae in Storage Node
Sep 12 11:26:58 dhcp42-226 kernel: iSCSI Login negotiation failed.

There is no runner or lio code that would do this that I know of.


However, it does not make sense that we would see those errors while a login from another node still works, unless you are creating/deleting targets dynamically.
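
To tell whether the target was actually removed from LIO (rather than the login path being broken), the configured targets can be listed on that node (a sketch, using the IQN from the log above; assumes targetcli is installed):

# targetcli ls /iscsi | grep df906719-ba77-45ad-814e-01a2589360ae
# ls /sys/kernel/config/target/iscsi/      # configfs view of the targets currently defined in LIO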

Comment 72 Prasanna Kumar Kalever 2017-09-13 19:17:25 UTC
Created attachment 1325590 [details]
kernel logs; naming kernel_logs_$host_$devname

Comment 82 krishnaram Karthick 2017-09-18 11:30:11 UTC
Ran the same test mentioned in comment 80; no app pod crash was seen.

Moving the bug to verified.

Comment 84 errata-xmlrpc 2017-09-21 04:20:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2773