Rebooting machines with Ansible

A few resources online explain how to reboot a machine using Ansible, but the ones I tried didn’t work for me. My task would always time out and I had no idea why. Finally I figured it out.

The tasks I was using looked roughly like these:

    ---
    - name: Restart machine
      shell: shutdown -r now "Maintenance restart"
      async: 0
      poll: 0

    - name: Wait for server to come back
      local_action:
        module: wait_for
        # inventory_hostname is the name used in the inventory,
        # which is not necessarily a resolvable address
        host: '{{ inventory_hostname }}'
        state: started
      become: false

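The problem here is the use of inventory_hostname. In my inventory, I was referring to my machines by the aliases they had in my .ssh/config. This works well when invoking Ansible, whose CLI integrates well with OpenSSH. However, it doesn’t work for modules, or at least it doesn’t for wait_for, which I use above: wait_for runs on the local machine and opens a plain TCP connection to the given host, so it never reads .ssh/config and cannot resolve those aliases.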

After trying some alternatives, I eventually settled on keeping all the network information in my inventory. That is, declaring ansible_host (and possibly ansible_port) for each entry, instead of relying on .ssh/config. Then I would use ansible_host in the wait_for task to indicate the host.
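
To make this concrete, here is a minimal sketch of what such an inventory might look like in Ansible's YAML inventory format. The file name, host names, addresses and ports are placeholders for illustration, not the ones from my setup:

```yaml
# inventory.yml (hypothetical file name; hosts and addresses are placeholders)
all:
  hosts:
    web1:
      ansible_host: 192.0.2.10   # address Ansible and wait_for connect to
      ansible_port: 22           # SSH port; optional when it is the default
    db1:
      ansible_host: 192.0.2.20
      ansible_port: 2222         # a non-standard SSH port must be declared
```

With entries like these, inventory_hostname is still the short name (web1, db1), while ansible_host holds an address that any module can reach without consulting .ssh/config.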

After some additional tweaking, I now have a reboot role whose main tasks file looks like this:

    ---
    - name: Restart machine
      shell: sleep 2 && shutdown -r now "Maintenance restart"
      async: 1
      poll: 0
      ignore_errors: true

    - pause:
        seconds: 5

    - name: Waiting for server to come back
      local_action:
        module: wait_for
        host: '{{ ansible_host }}'
        # fall back to the default SSH port if ansible_port is not declared
        port: '{{ ansible_port | default(22) }}'
        state: started
        delay: 10
        timeout: 60
      become: false # as this is a local operation

Why sleep 2, async: 1 and poll: 0? I have no idea. I tried a few combinations and this is the one that appears to work reliably for me. For now, I’m sticking with it, until I understand all this a bit better.
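
One plausible reading, not verified here, is that poll: 0 tells Ansible to start the command and move on instead of waiting for it, while sleep 2 gives the task time to return before the shutdown tears down the SSH connection. For comparison, Ansible 2.7 and later ship a built-in reboot module that handles both the restart and the wait. A minimal sketch, assuming that version is available and that the connection settings in the inventory are correct; the timeout value is only illustrative:

```yaml
---
- name: Restart machine and wait for it to come back
  reboot:
    msg: "Maintenance restart"   # message shown to logged-in users
    reboot_timeout: 120          # seconds to wait for the host (illustrative value)
```

This would replace all three tasks in the role above, but I haven't tested it against the setup described in this post.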