IT Automation with Ansible

  26 Nov 2020     Homelab, Linux, and IT Automation

Automation has always been the dream of every IT person. Today, the demand for fast deployment and rapid iteration is driving the development of IT automation tools even further. Among the popular tools, Ansible is one of the simplest while still providing a rich set of features. Here I would like to give a quick overview of this powerful tool.

Why Ansible?

To see why Ansible is worth using, we first need to understand what goes wrong without it. Without Ansible, we can still automate IT tasks with our own shell/Python scripts, and I did things this way myself for a long time before trying Ansible. The pain point of this approach is that you have to handle numerous details that are not directly related to the task. E.g., you need to take care of uploading, executing, and logging the scripts, and you need to write code to adapt to each specific environment. As the tasks become more complex, the automation scripts quickly grow in size, and without careful design they soon become hard to maintain or reuse. Ansible provides an abstraction that hides many of these details and lets us focus on the management tasks. E.g., Ansible greatly simplifies grouping an inventory of servers/VMs, connecting to a selection of machines, and executing commands/modules over SSH. It is designed to be simple, and even small personal projects or homelabs can benefit from it.

Syntax

Before diving further into Ansible itself, let me briefly cover some syntax that is not specific to Ansible.

Ansible uses YAML for almost all of its configuration. Many people in web development are already familiar with JSON; YAML is a similar format for describing data, but it is intended to be more human-readable. E.g., Python-style indentation is used to indicate nesting. Lists and dictionaries are still the key data structures. List members are denoted by a leading hyphen (-), and key/value pairs are separated by a colon followed by a space (the space is required). The full YAML specification can be found on the official YAML web site [1].
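
To make the comparison with JSON concrete, here is a minimal Python sketch. The YAML snippet in the comment and its host names are made up for illustration; the Python structure below it is the data that snippet describes, and json.dumps prints the JSON form of the same data.

```python
import json

# Hypothetical YAML document (all names are placeholders):
#
#   webservers:
#     hosts:
#       - web1.example.com
#       - web2.example.com
#     http_port: 80
#
# Indentation expresses nesting, "- " starts a list item, and
# "key: value" is a dictionary entry. The same data in Python:
data = {
    "webservers": {
        "hosts": ["web1.example.com", "web2.example.com"],
        "http_port": 80,
    }
}

# The JSON form carries the same information with braces and brackets.
as_json = json.dumps(data, indent=2)
print(as_json)
```

The YAML version needs no quotes, braces, or commas, which is most of what makes it easier to read and edit by hand.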

Jinja2 templating is another language used in Ansible, enabling dynamic expressions and some programming features. If you have used Flask before, you should already be familiar with this templating engine. The most commonly used features in Ansible include:

  • {{ … }} for expressions to print to output. E.g., a variable can be put inside double-curly braces, and later it will be replaced by its value.
  • Filters are used to modify variables and are separated from the variable by a pipe symbol (|). Filters are handy for text manipulation, pattern matching/search/replace, and much more.
  • Tests are used to test a variable against a common expression. They are useful in conditionals which will be introduced later.

Note that Ansible not only includes the standard filters/tests shipped with Jinja2, but also adds many Ansible-specific ones. Check both the Jinja2 docs [2] and the Ansible User Guide [3] when looking for a filter/test that suits your problem.
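
The three features can be tried outside Ansible with a short Python sketch. It assumes the jinja2 package is installed (it comes along with Ansible as a dependency) and uses only filters and tests built into Jinja2 itself (upper, default, defined), so the Ansible-specific ones are not available here. The variable names are hypothetical.

```python
from jinja2 import Environment

env = Environment()

# {{ ... }} prints the value of an expression.
greeting = env.from_string("Hello {{ name }}").render(name="web1")

# A filter follows the variable after a pipe symbol.
shouted = env.from_string("{{ name | upper }}").render(name="web1")

# The default filter supplies a fallback for an undefined variable.
port = env.from_string("{{ http_port | default(80) }}").render()

# A test is written with "is", typically inside a conditional.
status = env.from_string(
    "{% if http_port is defined %}custom{% else %}standard{% endif %}"
).render()

print(greeting, shouted, port, status)
```

Inside a playbook the same expressions appear inside YAML values, and Ansible evaluates them with the variables of the current host.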

Key Concepts

Now let’s come back to Ansible. There are tons of good resources introducing Ansible, so here I only outline some key concepts.

As I mentioned earlier, Ansible is designed to be simple and requires minimal dependencies. It only needs to be installed on the control node, and all remote hosts are managed over a remote connection (most commonly SSH). The inventory is a list of remote hosts to be managed, including their hostnames and other meta information. Ansible provides a way to group the hosts so that you can easily apply different management tasks to one group, or to the intersection/union of multiple groups.
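
Group selection is easiest to picture as set operations. The sketch below is only a conceptual model in Python, not Ansible code, and the groups and host names are invented. In Ansible itself the corresponding host patterns are written as webservers:dbservers (union) and webservers:&staging (intersection).

```python
# Hypothetical inventory: group name -> set of host names.
inventory = {
    "webservers": {"web1", "web2"},
    "dbservers": {"db1"},
    "staging": {"web2", "db1"},
}


def union(*groups):
    """Hosts that belong to at least one of the given groups."""
    return set().union(*(inventory[g] for g in groups))


def intersection(first, *others):
    """Hosts that belong to every one of the given groups."""
    result = set(inventory[first])
    for group in others:
        result &= inventory[group]
    return result


print(sorted(union("webservers", "dbservers")))
print(sorted(intersection("webservers", "staging")))
```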

Once you have the inventory ready, you can already run ad-hoc commands on a single host or a group of hosts. Besides simple shell commands, you can call Ansible modules. Ansible comes with lots of predefined modules for various tasks. E.g., the ping module is used to test the connection to remote hosts:

ansible webservers -m ping

In the latest Ansible version (2.10), modules are grouped into Collections [4].

In most real-life scenarios, we need to define a list of tasks for more complicated applications, and a playbook serves exactly this purpose. It is an ordered list of tasks that can be saved and run repeatedly. Moreover, playbooks can realize dynamic behavior by harnessing the power of variables, conditionals, and loops. One can further modularize the tasks with roles.

Though there is much more to learn, the concepts above are already enough to deal with most common IT automation tasks in Ansible and to get rid of some tedious shell/Python scripts. And don’t forget Ansible Galaxy, where lots of roles and collections shared by the community can be found.

References

  1. YAML 1.2 Specification
  2. Jinja Template Designer Documentation
  3. Ansible User Guide
  4. Ansible Collection Index

DNS Configuration and Using Domain Names for Applications in Local Network

  04 Jul 2020     Homelab, Linux, Networking, and DNS

Since I started my home networking project, various applications have been deployed in the local network. For a very long time, I relied on IP addresses and ports to access the different apps. However, as the number of applications increases, assigning domain names becomes a more attractive option. One way to resolve the domain names is to add entries to the hosts file of each machine, but it is time-consuming to edit every machine whenever a modification is required. A central Domain Name System (DNS) server addresses this issue. This post describes how to configure DNS in a local home network and how to use domain names for different applications.

DNS Server Setup

The table below lists the hosts/applications that will be used.

IP              Domain Name          Role
192.168.1.101   ns.homelab.net       DNS server
192.168.1.102   web.homelab.net      Web server hosting application
192.168.1.102   gitlab.homelab.net   Gitlab application (running in docker container, at port 7001)
192.168.1.102   wiki.homelab.net     MediaWiki application (running in docker container, at port 7002)

As you can see from the table, multiple applications may share the same physical server, which is particularly common in home networking projects. Nginx can help deal with this, but as a first step we only need to establish the correspondence between IP addresses and domain names. I followed the DigitalOcean tutorial [1] to set up the DNS server. BIND is used as the name server software; it can be easily installed with a package manager such as apt or dnf/yum. To set up BIND, create or modify the following files:

/etc/named.conf

options {
    # listen-on port 53 { 127.0.0.1; };
    # listen-on-v6 port 53 { ::1; };
    ...
    allow-query     { 192.168.1.0/24;};
    ...
};

zone "homelab.net" {
    type master;
    file "/etc/named/zones/db.homelab.net"; # zone file path
};

zone "1.168.192.in-addr.arpa" {
    type master;
    file "/etc/named/zones/db.192.168.1";  # 192.168.1.0/24 subnet
};

/etc/named/zones/db.homelab.net

$TTL    604800
@       IN      SOA     ns.homelab.net. admin.homelab.net. (
                  3     ; Serial
             604800     ; Refresh
              86400     ; Retry
            2419200     ; Expire
             604800 )   ; Negative Cache TTL
;
; name servers - NS records
     IN      NS      ns.homelab.net.

; name servers - A records
ns.homelab.net.          IN      A      192.168.1.101

; 192.168.1.0/24 - A records
web.homelab.net.         IN      A      192.168.1.102
gitlab.homelab.net.      IN      A      192.168.1.102
wiki.homelab.net.        IN      A      192.168.1.102

/etc/named/zones/db.192.168.1

$TTL    604800
@       IN      SOA     ns.homelab.net. admin.homelab.net. (
                              3         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
; name servers
      IN      NS      ns.homelab.net.

; PTR Records
101   IN      PTR     ns.homelab.net.      ; 192.168.1.101
102   IN      PTR     web.homelab.net.     ; 192.168.1.102
102   IN      PTR     gitlab.homelab.net.  ; 192.168.1.102
102   IN      PTR     wiki.homelab.net.    ; 192.168.1.102
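
The reverse zone name 1.168.192.in-addr.arpa and the PTR labels 101 and 102 are not arbitrary: they are the octets of each IP address in reverse order. A minimal Python sketch using the standard ipaddress module shows the derivation. The helper names are my own, and the split assumes a /24 subnet like the one used here.

```python
import ipaddress


def ptr_name(ip):
    """Full reverse-lookup name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def split_for_slash24(ip):
    """Split the reverse name into (record label, zone name) for a /24."""
    label, zone = ptr_name(ip).split(".", 1)
    return label, zone


for ip in ("192.168.1.101", "192.168.1.102"):
    label, zone = split_for_slash24(ip)
    print(f"{ip}: PTR label {label} in zone {zone}")
```

The zone part must match the zone declared in /etc/named.conf, and the label is what goes in the first column of each PTR record.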

The following commands can be used to check the syntax:

sudo named-checkconf
sudo named-checkzone homelab.net /etc/named/zones/db.homelab.net
sudo named-checkzone 1.168.192.in-addr.arpa /etc/named/zones/db.192.168.1

After making sure everything is correct, start or restart the BIND service:

sudo systemctl start named

Nginx Setup

In the last section, I mentioned that multiple applications can share the same IP address and are distinguished by their ports. Nginx can remove the need to type ports when accessing the applications. With the Nginx configuration below, the GitLab and MediaWiki applications are each served under their own domain name.

server {
    listen       80;
    server_name  gitlab.homelab.net;
    
    location / {
        proxy_pass http://127.0.0.1:7001;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; 
    }
}
server {
    listen       80;
    server_name  wiki.homelab.net;
    
    location / {
        proxy_pass http://127.0.0.1:7002;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; 
    }
}

If you run into any firewall or SELinux issues, you can check my earlier post for solutions.

Setting DNS in Clients

As a last step, we need to set up the clients, such as desktop PCs or mobile devices. Simply change the DNS server setting from automatic to manual and enter the DNS server address 192.168.1.101. I tried all the mainstream operating systems, including Windows, Linux, macOS, iOS, and Android, without any difficulties. Now test and enjoy your private web applications in your client browser!

References

  1. How To Configure BIND as a Private Network DNS Server on CentOS 7

NAS Guide: Soft RAID Setup on Linux

  18 May 2020     Homelab, Storage, and Linux

There are many different reasons to choose network attached storage (NAS) over a cloud-based solution. Some people are concerned about data security and privacy. Some decide after calculating the cost. And for content creators, the most important reason is speed! File sizes have been growing rapidly: e.g., a few minutes of 4K raw footage can easily occupy gigabytes of storage. The same holds for scientific researchers. Simulations play an increasingly important role, and in my field, for example, a finite element analysis of a large model can quickly generate huge amounts of data. With a cloud-based solution, the file transfer speed is bottlenecked by both the ISP (internet speed) and the cloud service provider; you have limited control over either, and upgrading them costs a fortune. In contrast, with a home NAS and the proper network hardware (such as in this Multi-Gigabit Home Network Setup Tutorial), you can easily get a much faster local network at a fair cost.
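
To put rough numbers on the speed argument, here is a back-of-the-envelope calculation in Python. All figures are assumptions chosen for illustration (a 100 GB data set, a 100 Mbit/s internet uplink, 1 and 10 Gbit/s local links), and the result is an ideal time that ignores protocol overhead and disk speed.

```python
def transfer_seconds(size_gb, link_mbit_per_s):
    """Ideal time in seconds to move size_gb gigabytes over the link."""
    size_bits = size_gb * 1e9 * 8  # decimal gigabytes to bits
    return size_bits / (link_mbit_per_s * 1e6)


# Assumed example figures, not measurements.
links = [
    ("100 Mbit/s uplink", 100),
    ("1 Gbit/s LAN", 1_000),
    ("10 Gbit/s LAN", 10_000),
]
for label, speed in links:
    minutes = transfer_seconds(100, speed) / 60
    print(f"{label}: {minutes:.1f} min")
```

With these assumed numbers, the internet uplink needs more than two hours while a 10 Gbit/s local link needs less than two minutes.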

However, the downside of a home NAS is that, compared with a professional cloud service provider, the data is more vulnerable to hard drive failures because of less reliable consumer-grade hardware and the lack of frequent routine backups. To protect valuable data, RAID is a very popular technology that provides data redundancy automatically, which means the data can remain intact even after some of the drives fail. RAID can be implemented in either hardware (a RAID controller) or software. This article introduces the setup of software RAID on Linux.

Purchase Hard Drives

First we need to prepare the hard drives for the RAID. Mechanical drives (HDDs) are typically used because of their large capacity at a low price. One thing worth noting is that there has recently been a lot of discussion about the performance differences between Shingled Magnetic Recording (SMR) and Perpendicular Magnetic Recording (PMR). In short, over the years most hard drive manufacturers have been adopting SMR to reduce cost. However, many reports show that SMR has significant performance issues under heavy read/write workloads because of the way it overwrites data. As a result, SMR is not ideal for NAS applications, and mixing SMR and PMR drives in a RAID setup is especially discouraged [1]. Before purchasing any hard drive, you should find out whether it uses SMR. At the time of writing, WD has published the full list of its drives that use SMR, and Seagate has declared that none of its IronWolf (NAS) series drives use SMR.

Prepare Disks

To partition the disks, we use the gdisk command here (with a GPT partition table). Run gdisk on a disk (e.g. /dev/sda) to enter an interactive partitioning environment.

gdisk /dev/sda

Using gdisk is very straightforward:

  1. Use o to create a new GPT partition table. Note that this will erase all data on the disk.
  2. Use n to create a new partition. Select the start/end sectors so that the partition covers the whole disk, and enter fd00 (Linux RAID) as the partition type code.
  3. Use w to write the changes.

After the above steps, an empty partition /dev/sda1 will be created. Similarly, prepare the other disks that will be included in RAID.

Create RAID

For most Linux distros, the RAID management tools should already be included, and you can easily install them if that is not the case. The mdadm command is used to create a software RAID device:

sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
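
As a quick sanity check on what --level=1 with two devices gives you: RAID 1 mirrors the data on every member, so the usable space is that of the smallest member, and the array survives as long as one member is still working. The Python sketch below encodes just that rule. The function name and the 4000 GB disk size are assumptions for illustration, and the small amount of space taken by RAID metadata is ignored.

```python
def raid1_summary(member_sizes_gb):
    """Usable capacity and fault tolerance of a RAID 1 mirror."""
    if len(member_sizes_gb) < 2:
        raise ValueError("RAID 1 needs at least two members")
    return {
        # Every member holds a full copy, so the smallest one sets the limit.
        "usable_gb": min(member_sizes_gb),
        # Data survives as long as one member is still working.
        "tolerated_failures": len(member_sizes_gb) - 1,
    }


# Two hypothetical 4000 GB disks, matching --raid-devices=2.
print(raid1_summary([4000, 4000]))
```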

The same mdadm command can also be used to check the RAID device information.

sudo mdadm /dev/md0

Adding the --detail option gives more detailed information.

sudo mdadm --detail /dev/md0

Or, if you want to examine whether a physical device belongs to a RAID, use the --examine option. (Note the difference from --detail.)

sudo mdadm --examine /dev/sda1

Now create a file system on the new RAID device. Here the XFS file system (the default on CentOS) is used. Of course, parameters such as the block size can be fine-tuned to your specific needs.

sudo mkfs.xfs /dev/md0

After mounting the RAID device to a folder, you can already access the space.

sudo mkdir /mnt/raid-storage
sudo mount /dev/md0 /mnt/raid-storage

To automatically mount the RAID device during startup, you can add the corresponding entry in /etc/fstab file.

UUID=[[RAID UUID]]    /mnt/raid-storage    xfs    defaults     0 0

In the first column, I used the Universally Unique Identifier (UUID) instead of /dev/md0 to make sure the mount is correct even if the device name changes. The UUID can be found with the following command.

ls -l /dev/disk/by-uuid
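
For reference, the six whitespace-separated columns of an /etc/fstab entry can be spelled out with a small Python sketch. The field names follow the fstab(5) manual page; the all-zero UUID is a placeholder, not a real value, and the parser is a simplification that assumes a single well-formed line.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    fs_spec: str     # what to mount (device, UUID=..., LABEL=...)
    fs_file: str     # mount point
    fs_vfstype: str  # file system type
    fs_mntops: str   # mount options
    fs_freq: int     # dump frequency (0 = never dump)
    fs_passno: int   # fsck order at boot (0 = skip check)


def parse_fstab_line(line):
    """Parse one fstab entry with exactly six whitespace-separated fields."""
    spec, mount_point, fstype, options, freq, passno = line.split()
    return FstabEntry(spec, mount_point, fstype, options, int(freq), int(passno))


# Placeholder UUID for illustration only.
entry = parse_fstab_line(
    "UUID=00000000-0000-0000-0000-000000000000  /mnt/raid-storage  xfs  defaults  0 0"
)
print(entry)
```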

File Permissions

Finally, for a multi-user environment, I prefer to create a new user group nas-storage and make the authorized users members of that group. Of course, the file permission configuration strongly depends on your own use case.

sudo groupadd -r nas-storage
sudo usermod -a -G nas-storage yeguang
sudo chgrp -R nas-storage /mnt/raid-storage
sudo chmod g+w /mnt/raid-storage
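
To see what chmod g+w actually changes, the following Python sketch sets the group-write bit on a throwaway temporary directory and prints the resulting mode. It assumes a Linux system and does not touch /mnt/raid-storage; the helper name is my own.

```python
import os
import stat
import tempfile


def add_group_write(path):
    """Equivalent of chmod g+w: set the group-write bit, keep the rest."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWGRP)
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o755)  # rwxr-xr-x: group members cannot write
    new_mode = add_group_write(demo_dir)
    print(oct(new_mode))       # group members can now create files
```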

In a future article, I will share some of my experiences with file server setup (e.g. Samba, NFS). Stay tuned.

References

  1. What are PMR and SMR hard disk drives?
  2. Using a RAID Disk Array Directly (直接使用 RAID 磁盘阵列, in Chinese)