Bug 815827

Summary: Connection to root iSCSI disk is disrupted during boot
Product: [Fedora] Fedora
Reporter: Tim Flink <tflink>
Component: iscsi-initiator-utils
Assignee: Mike Christie <mchristi>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 17
CC: agrover, collura, hdegoede, jpopelka, mchristi, robatino, vgoyal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AcceptedBlocker
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2012-05-07 18:14:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 752650    
Attachments:
 boot log from updated system with iscsi target as root partition (flags: none)

Description Tim Flink 2012-04-24 15:39:12 UTC
Created attachment 579895
boot log from updated system with iscsi target as root partition

I have an F17 install with the following disk setup:
 /dev/sda (local sata disk)
  sda1 - 1MB BIOS boot
  sda2 - 500MB /boot
  sda3 - 4096MB swap

 /dev/sdb (iSCSI target)
  sdb1 - 20GB /

This started as a DVD install of F17 Beta, but I used anaconda rescue mode to update the system, so it is current as of the filing of this bug.

When I boot the system, everything starts off fine until the iscsi disk is suddenly lost. At this point, the boot process pauses before a never-ending stream of IO errors starts showing up on the console.

I have attached the boot log from that system, booted with rd.debug and systemd.log_level=debug.

Comment 1 Tim Flink 2012-04-24 15:40:30 UTC
Proposing as a blocker for F17 final due to violation of the following F17 final release criterion [1]:

The installer must be able to complete an installation using any network-attached storage devices (e.g. iSCSI, FCoE, Fibre Channel)

[1] http://fedoraproject.org/wiki/Fedora_17_Final_Release_Criteria

Comment 2 Mike Christie 2012-04-24 19:50:05 UTC
Is this the same issue we are seeing in this bz:
https://bugzilla.redhat.com/show_bug.cgi?id=815355

Comment 3 Mike Christie 2012-04-24 19:56:40 UTC
Tim,

When you boot and you see those error messages, can you tell if the network is up on the initiator/host box? From some other box, can you ping it?

The iSCSI initiator can deal with disruptions. With the default settings, the iSCSI layer waits 120 secs before failing IO. This is what this msg:

[  176.864039]  session1: session recovery timed out after 120 secs

indicates.

From the logs, it looks like we detected the connection issue:
[   56.336068]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294713618, last ping 4294718620, now 4294723632
We then tried to reconnect for 120 secs, but could not. We then failed IO.
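
As a rough cross-check of that timeline, the two log lines quoted above can be compared directly. This is only a sketch: it assumes the kernel tick rate is HZ=1000 (the log does not state it), and the helper names are made up for illustration.

```python
import re

# Kernel log lines quoted in this comment.
PING_LINE = ("[   56.336068]  connection1:0: ping timeout of 5 secs expired, "
             "recv timeout 5, last rx 4294713618, last ping 4294718620, "
             "now 4294723632")
RECOVERY_LINE = "[  176.864039]  session1: session recovery timed out after 120 secs"

HZ = 1000  # ASSUMPTION: jiffies per second; not stated anywhere in the log


def timestamp(line):
    """Return the seconds-since-boot timestamp at the start of a kernel log line."""
    return float(re.match(r"\[\s*(\d+\.\d+)\]", line).group(1))


def jiffies(line):
    """Return the 'last rx', 'last ping' and 'now' jiffies counters from the line."""
    m = re.search(r"last rx (\d+), last ping (\d+), now (\d+)", line)
    return tuple(int(g) for g in m.groups())


last_rx, last_ping, now = jiffies(PING_LINE)

# Time with nothing received before the initiator sent its ping.
idle_before_ping = (last_ping - last_rx) / HZ
# Time spent waiting for the ping reply before giving up.
wait_for_reply = (now - last_ping) / HZ
# Time between detecting the dead connection and failing IO.
recovery_window = timestamp(RECOVERY_LINE) - timestamp(PING_LINE)

print(f"idle before ping: {idle_before_ping:.3f} s")
print(f"waited for ping reply: {wait_for_reply:.3f} s")
print(f"ping timeout to recovery timeout: {recovery_window:.3f} s")
```

Under the HZ=1000 assumption this gives about 5.0 s of silence before the ping, about 5.0 s waiting for the reply, and about 120.5 s between the ping timeout and the recovery timeout, which lines up with the 5 sec and 120 sec values in the messages.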


I will try to replicate here and debug some more.

Comment 4 Tim Flink 2012-04-24 20:23:27 UTC
(In reply to comment #3)
> When you boot and you see those error messages, can you tell if the network is
> up on the initiator/host box? From some other box, can you ping it?

I tried pinging the box repeatedly while it was booting. It started responding to pings when dracut brought em1 up, but the pings started failing shortly before this error:

> [   56.336068]  connection1:0: ping timeout of 5 secs expired, recv timeout 5,
> last rx 4294713618, last ping 4294718620, now 4294723632

Comment 5 Tim Flink 2012-04-26 19:19:11 UTC
I'm +1 blocker on this due to the F17 final criterion listed in c#1

Comment 6 Tim Flink 2012-05-01 21:42:08 UTC
Accepted as a Fedora 17 final blocker as it violates the following final release criterion [1]:

The installer must be able to complete an installation using any network-attached storage devices (e.g. iSCSI, FCoE, Fibre Channel)

[1] https://fedoraproject.org/wiki/Fedora_17_Final_Release_Criteria

Comment 7 Jiri Popelka 2012-05-04 08:22:59 UTC
Yes, this seems to be the same problem as in bug #815355.
dhcp-4.2.4-0.4.rc1.fc17 (heading to stable atm) should fix it.

Comment 8 Tim Flink 2012-05-04 16:32:59 UTC
I did another install using TC2 + updates-testing (which pulls in dhcp-4.2.4-0.4.rc1.fc17) and I am no longer seeing the same issue.

Comment 9 Tim Flink 2012-05-04 16:41:14 UTC
(In reply to comment #8)
> I did another install using TC2 + updates-testing (which pulls in
> dhcp-4.2.4-0.4.rc1.fc17) and I am no longer seeing the same issue.

I suppose that I could have been a little more specific - the system successfully boots post-install and this looks to be fixed.

Will test again after the dhcp build has been pushed to stable but moving to ON_QA for now.

Comment 10 Tim Flink 2012-05-07 18:14:47 UTC
Verified fix with Fedora 17 Final TC3 - closing bug

Comment 11 Vivek Goyal 2012-05-08 17:51:03 UTC
*** Bug 818656 has been marked as a duplicate of this bug. ***