Description of problem: If introspection hangs, there should be a way to troubleshoot the process. The ability to start a second VT on the host would be a start, but the ability to SSH into the discovery image would be nice.
Is there any way at the moment to get onto that discovery image?
This will be fixed with the move to IPA in OSP 8. The only thing we can do with our current image is to use the virtual console.
fair enough ... i'm having trouble getting to them ... what keys give which virtual consoles? i know it should be obvious but i'm having issues, perhaps with emulated macros ....
I've tried ctrl-alt-f keys but nothing seems to happen. should it?
If inspection fails during the ramdisk run, you can connect to the virtual console of the machine and get into a simple shell. If inspection fails after the ramdisk ran successfully, you will get some logs on the undercloud (e.g. sudo journalctl -u openstack-ironic-discoverd). Hope that helps.
i think it is during, but i can't get a virtual console. how do i get that?
It depends on your vendor. Usually you point your browser to the BMC host (ipmi_address/ilo_address/drac_address) and get to e.g. the iDRAC or iLO web UI. There you'll see an option to run a virtual console.
Dmitry, are you referring to the discovery image? I concur with August, I have connected to the hardware console of the machine, but I haven't found a way of getting a prompt.
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
So, this bug is something never-ending. We do have several debugging features in place for OSPd8, namely: 1. passing logs from the ramdisk in case of failures, 2. ability to pass SSH key to the ramdisk by modifying its kernel command line. I think we can call this fixed.
FailedQA:

Environment:
openstack-ironic-conductor-4.2.5-2.el7ost.noarch
openstack-ironic-common-4.2.5-2.el7ost.noarch
openstack-ironic-inspector-2.2.6-1.el7ost.noarch
openstack-ironic-api-4.2.5-2.el7ost.noarch

So followed comment #12:
Used http://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubleshooting-nodes.html as guidance:

1. passing logs from the ramdisk in case of failures.
The logs became available under /var/log/ironic-inspector/ramdisk after setting "always_store_ramdisk_logs = true" in /etc/ironic-inspector/inspector.conf and bouncing the openstack-ironic-inspector service.
PASS

2. ability to pass SSH key to the ramdisk by modifying its kernel command line. I think we can call this fixed.
I failed to connect to the node being introspected. I also tried to set root's password with rootpwd="<HASH>" as described in the doc. Also no luck.
FAILED.
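For anyone repeating the log-collection check, the config change can be scripted. This is only a sketch: set_ini_option is a hypothetical helper, not part of any OpenStack tooling, and the [processing] section name is an assumption that should be checked against your own inspector.conf.

```shell
#!/bin/sh
# set_ini_option FILE SECTION KEY VALUE  (hypothetical helper)
# Sets "KEY = VALUE" inside [SECTION], replacing an existing or commented-out
# entry, and creating the section at the end of the file if it is missing.
set_ini_option() {
    file=$1
    tmp=$(mktemp) || return 1
    awk -v section="[$2]" -v key="$3" -v value="$4" '
        function emit() { print key " = " value; written = 1 }
        /^\[/ {
            # Leaving the target section without seeing the key: add it first.
            if (in_section && !written) emit()
            in_section = ($0 == section)
        }
        # Existing entry (possibly commented out) in the target section.
        in_section && $0 ~ ("^#?[ \t]*" key "[ \t]*=") {
            if (!written) emit()
            next
        }
        { print }
        END {
            if (!written) {
                if (!in_section) print section
                emit()
            }
        }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage would look like `set_ini_option /etc/ironic-inspector/inspector.conf processing always_store_ramdisk_logs true`, followed by `systemctl restart openstack-ironic-inspector`. Writing back with `cat` keeps the ownership and mode of the existing file.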
*** Bug 1333026 has been marked as a duplicate of this bug. ***
(In reply to Alexander Chuzhoy from comment #17)
> FailedQA:
> Environment:
> openstack-ironic-conductor-4.2.5-2.el7ost.noarch
> openstack-ironic-common-4.2.5-2.el7ost.noarch
> openstack-ironic-inspector-2.2.6-1.el7ost.noarch
> openstack-ironic-api-4.2.5-2.el7ost.noarch
>
> So followed comment #12:
> Used http://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubleshooting-nodes.html as guidance:
>
> 1. passing logs from the ramdisk in case of failures.
> The logs became available under /var/log/ironic-inspector/ramdisk after
> setting "always_store_ramdisk_logs = true" in
> /etc/ironic-inspector/inspector.conf and bouncing the
> openstack-ironic-inspector service.
> PASS
>
> 2. ability to pass SSH key to the ramdisk by modifying its kernel command
> line. I think we can call this fixed.
>
> I failed to connect to the node being introspected. I also tried to set
> root's password with rootpwd="<HASH>" as described in the doc. Also no luck.
> FAILED.

I recently tested "rootpwd" and "sshkey" with the OSP9 ramdisk and it works [0]. How did you generate that hash? Here's how I did it:

1. Generate the password hash

$ openssl passwd -1
Password:
Verifying - Password:
$1$shnYk4hW$GmXa4mN.duC6WQYvuIyot0

2. Update the kernel cmdline to include it.

2.1 Update the iPXE file directly

$ vim /httpboot/pxelinux.cfg/<mac address>
:deploy
imgfree
kernel --timeout ... rootpwd="$1$shnYk4hW$GmXa4mN.duC6WQYvuIyot0" || goto deploy

2.2 Update it for all instances

$ vim /etc/ironic/ironic.conf
[pxe]
pxe_append_parameters = nofb nomodeset vga=normal rootpwd="$1$shnYk4hW$GmXa4mN.duC6WQYvuIyot0"

$ systemctl restart openstack-ironic-conductor
$ <now start inspection/deployment>

[0] here's the logs: http://paste.openstack.org/show/577447/

...

Can you give it another go please?
The introspection completes too quickly and shuts down the node.

Red Hat Enterprise Linux Server 7.2 (Maipo)
Kernel 3.10.0-327.28.3.el7.x86_64 on an x86_64

localhost login: root
Password:
[ 18.483805] IPMI System Interface driver.
[ 18.484834] ipmi_si: Unable to find any System Interface(s)
-- root: no shell: Permission denied

Red Hat Enterprise Linux Server 7.2 (Maipo)
Kernel 3.10.0-327.28.3.el7.x86_64 on an x86_64

localhost login:

The "no shell: Permission denied" message above makes me question whether a successful login was prevented by the system being shut down. Is there a way to pause the introspection, or to make it keep the node being introspected up longer?
You can try to artificially keep it from finishing by setting an unreachable ipa-inspection-callback-url in /httpboot/inspector.ipxe. It will then probably loop, retrying the callback.
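The callback trick above could be scripted roughly as follows. This is a sketch under assumptions: break_callback_url is a hypothetical helper name, the replacement URL is made up (192.0.2.1 is a reserved documentation address that should not answer), and the parameter is assumed to appear as an unquoted ipa-inspection-callback-url=<value> token on the kernel line.

```shell
#!/bin/sh
# break_callback_url FILE  (hypothetical helper)
# Points ipa-inspection-callback-url at an unreachable address so the ramdisk
# keeps retrying instead of finishing and powering the node off.
break_callback_url() {
    file=$1
    tmp=$(mktemp) || return 1
    # Keep one pristine backup so the real URL can be restored afterwards;
    # an existing backup is never overwritten.
    [ -e "$file.orig" ] || cp -p "$file" "$file.orig" || { rm -f "$tmp"; return 1; }
    # Replace the parameter value up to the next space.
    sed 's|ipa-inspection-callback-url=[^ ]*|ipa-inspection-callback-url=http://192.0.2.1:5050/v1/continue|' \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Run it as `break_callback_url /httpboot/inspector.ipxe` before starting introspection, and restore the original afterwards with `cat /httpboot/inspector.ipxe.orig > /httpboot/inspector.ipxe`. Introspection of that node will not complete while the URL is broken.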
Thanks Dmitry. So here's what happens:

1) specifying wrong password on purpose:

Red Hat Enterprise Linux Server 7.2 (Maipo)
Kernel 3.10.0-327.28.3.el7.x86_64 on an x86_64

localhost login: root
Password:
Login incorrect

2) specifying the right password:

localhost login: root
Password:
Last failed login: Fri Sep 16 12:45:26 EDT 2016 on ttyS0
There was 1 failed login attempt since the last successful login.
Last login: Fri Sep 16 12:44:17 on ttyS0
-- root: no shell: Permission denied
Just to clarify: was this "permission denied" fatal or were you able to log in?
I was not able to log in. Thanks.
Sigh, this is strange.. Maybe it only affects OSPd8? Have you tried other versions?
(In reply to Alexander Chuzhoy from comment #25)
> I was not able to login.
> Thanks.

Hi sasha,

I was looking into it and I just found out that the element that is supposed to allow you to log in to the image wasn't present in it. The patch https://code.engineering.redhat.com/gerrit/#/c/64749/ should fix it (linked in the external links).
FailedQA

Environment:
instack-undercloud-6.0.0-2.el7ost.noarch
openstack-ironic-api-7.0.1-0.20170301202959.91540cd.el7ost.noarch
python-ironic-inspector-client-1.11.0-0.20170208193115.481a92e.el7ost.noarch
python-ironic-lib-2.5.2-0.20170208212103.ace87b6.el7ost.noarch
python-ironicclient-1.11.0-0.20170208194603.f1f10cb.el7ost.noarch
puppet-ironic-10.3.0-1.el7ost.noarch
openstack-ironic-inspector-5.0.0-2.el7ost.noarch
openstack-ironic-conductor-7.0.1-0.20170301202959.91540cd.el7ost.noarch
openstack-ironic-common-7.0.1-0.20170301202959.91540cd.el7ost.noarch

When I tried to ssh into a node being introspected, I got:

[stack@undercloud-0 ~]$ ssh 192.168.24.104 -l root
root@192.168.24.104's password:
/bin/bash: Permission denied
Connection to 192.168.24.104 closed.
This is an SELinux issue. I actually managed to log in after adding "selinux=0" to the kernel line. Are we going to document it or modify the images?
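A rough sketch of the workaround above, for anyone who wants to apply it without hand-editing. add_kernel_arg is a hypothetical helper, and which boot file to edit is an assumption (/httpboot/inspector.ipxe is the one named earlier in this bug). It inserts the argument before any trailing "|| goto ..." on the kernel line and does nothing if the argument is already there.

```shell
#!/bin/sh
# add_kernel_arg FILE ARG  (hypothetical helper)
# Appends ARG (e.g. selinux=0) to each line whose first word is "kernel".
# Note: awk -v interprets backslash escapes in ARG, so keep ARG simple.
add_kernel_arg() {
    file=$1
    tmp=$(mktemp) || return 1
    awk -v arg="$2" '
        $1 == "kernel" && index($0, " " arg) == 0 {
            pos = index($0, " || ")
            if (pos > 0)
                # Keep the iPXE error handler at the end of the line.
                $0 = substr($0, 1, pos - 1) " " arg substr($0, pos)
            else
                $0 = $0 " " arg
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example: `add_kernel_arg /httpboot/inspector.ipxe selinux=0`. This disables SELinux for the whole ramdisk boot, so it is best treated as a debugging aid rather than a permanent setting.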
Lukas, per comment #30 Are we going to document it or modify the images?
(In reply to Alexander Chuzhoy from comment #31)
> Lukas,
> per comment #30
> Are we going to document it or modify the images?

Hi sasha,

Good finding btw. Yeah, the dynamic-login [0] element in DIB should probably configure SELinux to allow people to SSH into the node when it's specified in the list of elements used to create the image. That way we don't need people to pass "selinux=0" in the kernel cmdline (and disable SELinux as a whole).

[0] https://github.com/openstack/diskimage-builder/tree/master/elements/dynamic-login
The dynamic-login README [1] has a warning about this actually: "Some base operational systems might require selinux to be in permissive or disabled mode so that you can log in the image. This can be achieved by building the image with the selinux-permissive element for diskimage-builder or by passing selinux=0 in the kernel command line. RHEL/CentOS are examples of OSs which this is true." So it seems that this is by design and should probably be part of the documentation. [1] https://github.com/openstack/diskimage-builder/tree/master/diskimage_builder/elements/dynamic-login
The selinux note was added here: https://docs.openstack.org/developer/tripleo-docs/troubleshooting/troubleshooting-nodes.html#accessing-the-ramdisk

Verifying the bug based on the above + comment #29 + comment #30
In my opinion, disabling SELinux is a workaround at best. If a policy or labelling change is needed, we should make it rather than instruct users to disable SELinux. Would you like a separate BZ for this?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1250