Red Hat Bugzilla – Bug 485353
After an install to an iscsi disk finishes and the system begins to reboot, we hang in the kernel with a "detected conn error" message
Last modified: 2009-02-13 10:40:58 EST
Description of problem:
When installing to an iscsi disk, the install is successful and the system begins to shut down so it can reboot into the new install, but you will see:
md: stopping all md devices.
sd A:B:C:D [sdX] Synchronizing SCSI cache
connection: ping timeout of 5 secs expired, last rx A, last ping Y, now Z
iscsi: can not broadcast skb (-3)
connection: detected conn error (1011)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Actual results:
The system hangs. The last message printed is the "detected conn error" one.
Expected results:
The system should reboot smoothly.
The problem is that there are still iscsi sessions running and iscsi disks attached.
We should do one of the following: have anaconda log out of the iscsi sessions when it is done, so the scsi layer can clean up the disks; not stop the network, so that the scsi layer can clean up the disks when the kernel is stopped; or add code to the kernel so that we fail more gracefully by not hanging (this is probably a bad fix, since we want the cache sync to be sent and completed).
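As a rough illustration of the first option, the installer could log out of all iscsi sessions just before the network goes down. The sketch below is not anaconda code: the function names are hypothetical, and it assumes that the `iscsiadm` tool from open-iscsi is on the path and that all filesystems on the iscsi disks have already been unmounted.

```python
import subprocess


def build_logout_command(iscsiadm="iscsiadm"):
    """Return the argv that asks open-iscsi to log out of every node session."""
    # "-m node --logoutall=all" logs out of all sessions known to iscsiadm.
    return [iscsiadm, "-m", "node", "--logoutall=all"]


def logout_all_sessions(run=subprocess.call):
    """Run the logout command and return its exit status.

    `run` is injectable so the function can be exercised without iscsiadm.
    Assumption: the caller has already unmounted the iscsi-backed filesystems,
    so the scsi layer can sync the cache and remove the disks cleanly.
    """
    return run(build_logout_command())
```

With the sessions gone before the interfaces are stopped, the sync cache command would be sent while the connection is still usable.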
This hang will not happen on every target/install/setup. It will only occur when the disk is using a write back cache. In this case the "scsi cleanup" requires the scsi layer to send a sync cache command to the disk to make sure the data is written to the disk.
You can tell if you are using a write back cache by checking /var/log/messages. In there you would see the write cache reported as enabled:
sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
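The check described above can be scripted. The following is a minimal sketch that scans log lines of the form shown above and reports which disks have the write cache enabled; the function name and regular expression are illustrative, and the message format is assumed to match the sample line.

```python
import re

# Matches lines such as:
#   sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, ...
WRITE_CACHE_RE = re.compile(
    r"\[(?P<disk>sd[a-z]+)\] Write cache: (?P<state>enabled|disabled)"
)


def disks_with_write_cache(log_lines):
    """Return a sorted list of disks whose latest log entry says the write cache is enabled."""
    enabled = set()
    for line in log_lines:
        match = WRITE_CACHE_RE.search(line)
        if match is None:
            continue
        if match.group("state") == "enabled":
            enabled.add(match.group("disk"))
        else:
            # A later "disabled" entry overrides an earlier "enabled" one.
            enabled.discard(match.group("disk"))
    return sorted(enabled)
```

It could be fed the contents of /var/log/messages, for example `disks_with_write_cache(open("/var/log/messages"))`.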
This is not an anaconda problem but an NM problem. Just as we need a way to tell NM that it must never bring down interfaces during configuration when they are in use for a network based /, we also need to be able to tell NM not to bring down the interfaces when it exits. Currently, when booting off iscsi, a hack in anaconda comes into play which writes NM_CONTROLLED=no to the ifcfg file, disabling NM completely for the relevant interface. This means we do not suffer the same hang when rebooting the installed system, but once we stop doing this hack and start interacting with NM here, NM needs to stop bringing down all interfaces on exit.
Since we are actually using NM for the interface during the installation, we get this hang.
Most likely we can use the same mechanism for telling NM not to down the interface during configuration, as for telling it not to down the interface on exit.
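For reference, the NM_CONTROLLED=no hack mentioned above amounts to rewriting one key in the interface's ifcfg file. The sketch below is not the actual anaconda code: the function name is hypothetical, and it works on the file contents as a string so that it makes no assumption about where the file lives.

```python
def set_nm_controlled(ifcfg_text, value="no"):
    """Return ifcfg contents with exactly one NM_CONTROLLED=<value> line."""
    # Drop any existing NM_CONTROLLED setting so the key is not duplicated.
    lines = [
        line
        for line in ifcfg_text.splitlines()
        if not line.strip().startswith("NM_CONTROLLED=")
    ]
    lines.append("NM_CONTROLLED=%s" % value)
    return "\n".join(lines) + "\n"
```

Because this disables NM for the interface entirely, it only sidesteps the hang on the installed system and does nothing for the installer environment, where NM is managing the interface.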
Changing component to NM and re-assigning.
*** This bug has been marked as a duplicate of bug 479824 ***