485353 – After an install to an iSCSI disk finishes and the system begins to reboot, we hang in the kernel with a "detected conn error" message



Keywords:
Status:	CLOSED DUPLICATE of bug 479824
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	NetworkManager
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Dan Williams
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-02-13 00:25 UTC by Mike Christie
Modified:	2009-02-13 15:40 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-02-13 15:40:58 UTC
Type:	---
Embargoed:
Dependent Products:


Description Mike Christie 2009-02-13 00:25:00 UTC

Description of problem:

When installing to an iSCSI disk, the install succeeds and the system begins to shut down so it can reboot into the new install, but then you see:

rebooting system
md: stopping all md devices.
sd A:B:C:D [sdX] Synchronizing SCSI cache
connection: ping timeout of 5 secs expired, last rx A, last ping Y, now Z
iscsi: can not broadcast skb (-3)
connection: detected conn error (1011)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

The system hangs. The last message is the conn error one.

Expected results:

The system should reboot smoothly.

Additional info:

The problem is that there are still iSCSI sessions running and iSCSI disks attached when the network is brought down.

We should either have anaconda log out of the iSCSI sessions when it is done, so the SCSI layer can clean up the disks; or not stop the network, so that when the kernel shuts down the SCSI layer can clean up the disks there; or add code to the kernel so that it fails more gracefully instead of hanging (this is probably a bad fix, since we want the cache sync to be sent and completed).

This will not happen on every target/install/setup. It only occurs when the disk is using a write-back cache. In that case the "SCSI cleanup" requires the SCSI layer to send a SYNCHRONIZE CACHE command to the disk to make sure the cached data is written to the disk, and that command cannot complete once the network is gone.

You can tell whether you are using a write-back cache by checking /var/log/messages. There you would see the write cache reported as enabled:

sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
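A minimal sketch of that check, assuming the sd driver's message always has the shape of the example line above ("sd H:C:T:L: [disk] Write cache: enabled|disabled, ..."). The function name write_cache_status is hypothetical; it only scans log text and does not query the disk itself.

```python
import re
from typing import Dict, Iterable

# Matches the sd driver's cache report, e.g.
# "sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, ..."
# The format is assumed from the example line in this report.
WRITE_CACHE_RE = re.compile(r"sd (\S+): \[(\w+)\] Write cache: (enabled|disabled)")


def write_cache_status(lines: Iterable[str]) -> Dict[str, bool]:
    """Map each disk name (e.g. "sdb") to True if its write cache is enabled.

    Later messages for the same disk overwrite earlier ones, so the most
    recent report in the log wins.
    """
    status: Dict[str, bool] = {}
    for line in lines:
        match = WRITE_CACHE_RE.search(line)
        if match:
            _hctl, disk, state = match.groups()
            status[disk] = state == "enabled"
    return status
```

For example, passing an open handle on /var/log/messages to write_cache_status returns a dict in which every disk mapped to True is one that needs a successful cache sync at shutdown.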

Comment 1 Hans de Goede 2009-02-13 15:19:35 UTC

This is not an anaconda problem but an NM problem. Just as we need a way to tell NM never to bring down interfaces during configuration when they are in use for a network-based /, we also need to be able to tell NM not to bring down the interfaces when it exits. Currently, when booting off iSCSI, a hack in anaconda comes into play that writes NM_CONTROLLED=no to the ifcfg file, disabling NM completely for the relevant interface. This means we do not suffer the same hang when rebooting the installed system, but once we stop using this hack and start interacting with NM here, NM needs to stop bringing down all interfaces on exit.
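To make the effect of that hack concrete, here is a rough sketch that reads an ifcfg file's KEY=value lines and reports whether NM would still manage the interface. Assumptions: a missing NM_CONTROLLED key means NM manages the interface, and the set of "false" spellings (no, n, false, f, 0) is a guess at what initscripts accepts, not a confirmed list. Both function names are hypothetical, and the parser is deliberately simple rather than a full shell parser.

```python
from typing import Dict

# Spellings treated as "false" here; assumed, not taken from initscripts.
FALSE_VALUES = ("no", "n", "false", "f", "0")


def parse_ifcfg(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines from ifcfg text.

    Blank lines and '#' comments are skipped; a single pair of matching
    single or double quotes around a value is stripped.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def nm_controls_interface(text: str) -> bool:
    """Return False when the ifcfg text opts the interface out of NM."""
    value = parse_ifcfg(text).get("NM_CONTROLLED", "yes")  # assumed default
    return value.lower() not in FALSE_VALUES
```

With the line anaconda writes (NM_CONTROLLED=no), nm_controls_interface returns False, which is why NM never touches that interface and the installed system does not hang on reboot.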

Since we are actually using NM for the interface during the installation, we get this hang.

Most likely we can use the same mechanism for telling NM not to down the interface during configuration, as for telling it not to down the interface on exit.

Changing component to NM and re-assigning.

Comment 2 Dan Williams 2009-02-13 15:40:58 UTC


*** This bug has been marked as a duplicate of bug 479824 ***
