Bug 485353

Summary: After an install to an iscsi disk finishes and the system begins to reboot, we hang in the kernel with a "detected conn error" message
Product: [Fedora] Fedora
Reporter: Mike Christie <mchristi>
Component: NetworkManager
Assignee: Dan Williams <dcbw>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: rawhide
CC: anaconda-maint-list, dcbw, hdegoede, jlaska
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-02-13 10:40:58 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:

Description Mike Christie 2009-02-12 19:25:00 EST
Description of problem:

When installing to an iscsi disk, the install is successful and the system begins to shut down so it can reboot into the new install, but you will see:


rebooting system
md: stopping all md devices.
sd A:B:C:D [sdX] Synchronizing SCSI cache
 connection: ping timeout of 5 secs expired, last rx A, last ping Y, now Z
iscsi: can not broadcast skb (-3)
 connection: detected conn error (1011)




Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:

The system hangs. The last message is the conn error one.


Expected results:

The system should reboot smoothly.


Additional info:


The problem is that there are still iscsi sessions running and iscsi disks attached.

We should do one of the following: have anaconda log out of the iscsi sessions when it is done so the scsi layer can clean up the disks; leave the network up so that the scsi layer can clean up the disks when the kernel is stopped; or add code to the kernel so that we fail more gracefully instead of hanging (this is probably a bad fix, since we want the cache sync to be sent and completed).
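
For illustration only, the first option (logging out of the iscsi sessions before the network goes away) might look roughly like the sketch below. It assumes the open-iscsi `iscsiadm` tool is available and that the filesystems on the iscsi disks have already been unmounted; the function names are hypothetical and this is not how anaconda actually implements anything.

```python
import subprocess


def logout_all_command():
    """Build the iscsiadm command that logs out of all node sessions."""
    return ["iscsiadm", "-m", "node", "--logoutall=all"]


def logout_all_sessions(runner=subprocess.run):
    """Run the logout command and report whether it exited with status 0.

    `runner` is injectable so the logic can be exercised without
    touching real iscsi sessions (hypothetical design choice).
    """
    result = runner(logout_all_command(), check=False)
    return result.returncode == 0
```

Logging out while the network is still up gives the scsi layer a chance to send the sync cache command before the connection disappears.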

This will not happen on every target/install/setup. It will only occur when the disk is using a write back cache. In this case the "scsi cleanup" requires the scsi layer to send a sync cache command to the disk to make sure the data is written to the disk.

You can tell if you are using a write back cache by checking /var/log/messages. In there you would see that the write cache is enabled:

sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
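
A rough way to automate that check is sketched below. It assumes the sd driver message has exactly the format shown in the line above; the helper names are hypothetical, and the log path is the one mentioned in this report.

```python
import re

# Matches sd driver lines such as:
#   sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, ...
WRITE_CACHE_RE = re.compile(
    r"sd \d+:\d+:\d+:\d+: \[(?P<dev>sd[a-z]+)\] "
    r"Write cache: (?P<state>enabled|disabled)"
)


def write_cache_states(log_text):
    """Return {device: bool} where True means the write cache is enabled.

    If a device is reported more than once, the last report wins.
    """
    states = {}
    for line in log_text.splitlines():
        match = WRITE_CACHE_RE.search(line)
        if match:
            states[match.group("dev")] = match.group("state") == "enabled"
    return states


def read_log(path="/var/log/messages"):
    """Read the whole log file as text, tolerating undecodable bytes."""
    with open(path, errors="replace") as log_file:
        return log_file.read()
```

Calling `write_cache_states(read_log())` on the installed system would list the disks whose write cache is enabled, which are the ones that can trigger the hang.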
Comment 1 Hans de Goede 2009-02-13 10:19:35 EST
This is not an anaconda problem but an NM problem. Just as we need a way to tell NM that it must never bring down interfaces during configuration when they are in use for a network-based /, we also need to be able to tell NM not to bring down the interfaces when it exits. Currently, when booting off iscsi, a hack in anaconda comes into play which writes NM_CONTROLLED=no to the ifcfg file, disabling NM completely for the relevant interface. This means we do not suffer the same hang when rebooting the installed system, but once we stop doing this hack and start interacting with NM here, NM needs to stop bringing down all interfaces on exit.
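
As a minimal sketch of what that hack amounts to (not anaconda's actual code; the function name is hypothetical), the effect on an ifcfg file's contents is roughly:

```python
def set_nm_controlled(ifcfg_text, value="no"):
    """Return ifcfg_text with NM_CONTROLLED set to `value`.

    Any existing NM_CONTROLLED line is dropped and a single new
    setting is appended at the end.
    """
    kept = [
        line
        for line in ifcfg_text.splitlines()
        if not line.strip().startswith("NM_CONTROLLED=")
    ]
    kept.append("NM_CONTROLLED=" + value)
    return "\n".join(kept) + "\n"
```

With NM_CONTROLLED=no in place, NM ignores the interface entirely, so it never brings it down on exit.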

Since we are actually using NM for the interface during the installation, we get this hang.

Most likely we can use the same mechanism for telling NM not to down the interface during configuration, as for telling it not to down the interface on exit.

Changing component to NM and re-assigning.
Comment 2 Dan Williams 2009-02-13 10:40:58 EST

*** This bug has been marked as a duplicate of bug 479824 ***