Created attachment 955920 [details]
node logs

Description of problem:
Every time I configure a node with the latest vdsm packages (either installed on bare CentOS or using the oVirt Node image), all my storage devices go offline after the node is added to the engine: all multipath devices fail, or disks become corrupted after a vdsm service restart. I have attached the dmesg output from one such case.

Version-Release number of selected component (if applicable):
oVirt Engine 3.5
http://resources.ovirt.org/pub/ovirt-3.5-pre/iso/ovirt-node-iso-3.5.0.ovirt35.20140912.el6.iso

How reproducible:
Always.

Steps to Reproduce:
1. Install the engine.
2. Install the node from ovirt-node-iso-3.5.0.ovirt35.20140912.el6.iso, or install the vdsm packages on CentOS 6.6.
3. Register the node from the node TUI or via the engine.

Actual results:
During node registration or a vdsm service restart, all devices are marked as failed (no I/O possible) and the node's logging volume becomes read-only. After a subsequent node reboot and vdsm service start, all devices become read-only.

Expected results:
Storage devices remain online after node registration and vdsm service restarts.

Additional info:
With the node ISO ovirt-node-iso-3.5.0.ovirt35.20140707.el6.iso I do not see this issue. Logs from the failed node are attached.
This could be related to iptables settings being changed when you add / register the host. Can you verify whether the iptables configuration changed between before and after registration?
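For reference, here is a minimal sketch for capturing the iptables rules before and after registration and diffing them. It is illustrative only (the snapshot paths and helper names are arbitrary, not anything vdsm provides); it just wraps the standard iptables-save command:

    import difflib
    import subprocess
    import sys

    def snapshot(path):
        # Dump the current iptables rules to a file.
        rules = subprocess.check_output(["iptables-save"]).decode()
        with open(path, "w") as f:
            f.write(rules)

    def show_diff(before, after):
        # Print a unified diff of two rule snapshots.
        with open(before) as b, open(after) as a:
            for line in difflib.unified_diff(
                    b.readlines(), a.readlines(),
                    fromfile=before, tofile=after):
                sys.stdout.write(line)

    # Usage: run snapshot("/tmp/iptables.before") before adding the host,
    # snapshot("/tmp/iptables.after") afterwards, then
    # show_diff("/tmp/iptables.before", "/tmp/iptables.after").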
It could be related to bug 1149655. That bug is about registering a 3.4 host with a 3.5 engine, but it may apply here because the Node image used is quite old and might not have the relevant jsonrpc patches.
The fact that the host loses all connectivity makes me think that you are experiencing bug 1144639, not the one suggested by Fabian. Your attached logs state that you are running vdsm 4.16.4-0.el6, which predates the ovirt-3.5.0 release and the resolution of that bug. Could you retry the installation using a post-3.5.0 release (vdsm >= 4.16.7)?
I have attached logs from the latest install on CentOS 6.6, using vdsm-4.16.7-1.gitdb83943.el6.src.rpm.
Created attachment 956884 [details]
node logs
The symptoms are exactly the same when I issue echo "1" > /sys/class/fc_host/host/issue_lip on the node host. Does vdsm rescan storage interconnects when starting?
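For what it's worth, a quick way to watch the FC link state while reproducing this is to read the port_state attribute each fc_host exposes in sysfs. A small sketch (diagnostic only, nothing vdsm-specific):

    import glob

    # Report the link state of every FC HBA on the host.
    for path in sorted(glob.glob("/sys/class/fc_host/host*/port_state")):
        with open(path) as f:
            print("%s: %s" % (path, f.read().strip()))

If the devices really go down after the LIP, these should flip from Online to Linkdown or a similar non-online state.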
Raul, supervdsm.log confirms your suggestion:

MainProcess|storageRefresh::DEBUG::2014-11-13 00:20:37,752::supervdsmServer::101::SuperVdsm.ServerCallback::(wrapper) call hbaRescan with () {}
MainProcess|storageRefresh::INFO::2014-11-13 00:20:37,752::hba::54::Storage.HBA::(rescan) Rescanning HBAs
MainProcess|storageRefresh::DEBUG::2014-11-13 00:20:37,753::hba::56::Storage.HBA::(rescan) Issuing lip /sys/class/fc_host/host0/issue_lip
MainProcess|storageRefresh::DEBUG::2014-11-13 00:20:38,061::hba::56::Storage.HBA::(rescan) Issuing lip /sys/class/fc_host/host1/issue_lip
MainProcess|storageRefresh::DEBUG::2014-11-13 00:20:38,408::supervdsmServer::108::SuperVdsm.ServerCallback::(wrapper) return hbaRescan with None

The LIP has been disabled by default in http://gerrit.ovirt.org/#/c/34215/, which will be part of ovirt-3.5.1. I'd appreciate it if you could verify that this is indeed your issue by applying that patch.
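For anyone reproducing this outside of vdsm: per the log lines above, the rescan amounts to writing "1" to issue_lip on every fc_host. A rough stand-alone sketch of the same (not the actual vdsm code from hba.py):

    import glob

    # Force a LIP (loop initialization) on every FC HBA, mirroring what
    # the hbaRescan log lines above show vdsm doing at storage refresh.
    # Requires root and, per this bug, can take attached devices offline.
    for path in glob.glob("/sys/class/fc_host/host*/issue_lip"):
        with open(path, "w") as f:
            f.write("1")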
I can verify that this patch solves the issue I reported.
*** This bug has been marked as a duplicate of bug 1152587 ***