I’ve been using Linode’s Object Storage to host my website for ~2 years now, and it’s time for a change. I’m not unhappy with Linode, but object storage isn’t for me anymore and I’d like to move up to a virtual private server. Object storage is a fantastic way to get started with a static website at low cost, but I want better backend analytics and more control of the whole process, so I’m setting up my own VPS running Caddy. At the same time, I’m setting up a privacy-respecting analytics system (Goatcounter) to read the web server log files and tell me how many viewers are enjoying my website. So, come along on this adventure!

Video

Caddyfile

The Caddyfile syntax is super simple, easy to understand and use, and the default options are very good for most people.

# The Caddyfile is an easy way to configure your Caddy web server.
www.apalrd.net {
        # Set this path to your site's directory.
        root * /var/apalrd-net/public

        # Enable the static file server.
        file_server

        # Compress responses with gzip or zstd
        encode gzip zstd

        # Automatic HTTPS; this email is used for the ACME account that obtains certificates
        tls "adventure@apalrd.net"

        #Log to /run/access, a RAM-backed directory created by the systemd override
        log {
                output file "/run/access/access-www-apalrd-net.log" {
                        roll_keep_for 1d
                        roll_size 10MiB
                }
                # "format transform" needs the transform-encoder plugin (not in stock Caddy)
                format transform `{request>remote_ip} - {request>user_id} [{ts}] "{request>method} {request>uri} {request>proto}" {status} {size} "{request>headers>Referer>[0]}" "{request>headers>User-Agent>[0]}"` {
                        time_format "02/Jan/2006:15:04:05 -0700"
                }
        }
}

#Global redirects
apalrd.net:80 {
        redir https://www.apalrd.net{uri} permanent
}
apalrd.net:443 {
        redir https://www.apalrd.net{uri} permanent
        tls "adventure@apalrd.net"
}


#Stats subdomain is reverse-proxied to Goatcounter
stats.apalrd.net {
        reverse_proxy localhost:8081
        tls "adventure@apalrd.net"
        encode gzip zstd
}
# Refer to the Caddy docs for more information:
# https://caddyserver.com/docs/caddyfile
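For reference, the format transform template above writes Apache-style “combined” lines, which is what Goatcounter’s -format=combined expects. Here’s a rough Python sketch of pulling one apart, assuming exactly the field order in my template and no quote characters inside the Referer or User-Agent; it’s an illustration, not how Goatcounter parses logs. The time_format value is Go’s reference-time layout, and 02/Jan/2006:15:04:05 -0700 lines up with strptime’s %d/%b/%Y:%H:%M:%S %z.

```python
import re
from datetime import datetime

# Matches the combined format emitted by the Caddyfile template:
# ip - user [time] "method uri proto" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) - (?P<user>\S*) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Go layout "02/Jan/2006:15:04:05 -0700" expressed as a strptime format.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = int(entry["size"])
    entry["ts"] = datetime.strptime(entry["ts"], TIME_FORMAT)
    return entry
```

parse_line returns None for anything that doesn’t match, so a malformed line can be skipped instead of crashing a loop over the file.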

Caddy Service Override

I created this drop-in with systemctl edit caddy, which is where the ### comment lines come from:

### Editing /etc/systemd/system/caddy.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file

[Service]
#Add a directory 'access' for runtime, systemd will add it to /run and deal with permissions
RuntimeDirectory=access
#An empty ExecStart=/ExecReload= clears the packaged command; then use /usr/local/bin/caddy
ExecStart=
ExecStart=/usr/local/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=
ExecReload=/usr/local/bin/caddy reload --config /etc/caddy/Caddyfile --force

### Lines below this comment will be discarded

I need to limit the lifetime of these log files so they don’t accumulate on the system, since IP addresses are considered personally identifying information by modern privacy standards, even though the US, where I’m located, doesn’t really have any sort of protections for personal information. The log files need to include the IP address so Goatcounter can feed it into a geo-IP database to guess the viewer’s country; after that, the logs are discarded. Once the data makes it into the Goatcounter database, it’s been anonymized enough that it’s no longer sensitive.
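To illustrate why the IP only has to live in the logs briefly, here’s a toy Python version of that lookup: the address is used as a key to find a country, and only the country survives. GEO_TABLE, lookup_country, and anonymize are hypothetical stand-ins I made up (a real geo-IP database has many thousands of prefixes, and this is not Goatcounter’s code); the prefixes are reserved documentation ranges.

```python
import ipaddress

# Hypothetical stand-in for a geo-IP database: network prefix -> country code.
# These are reserved documentation ranges, used purely for illustration.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "DE",
    ipaddress.ip_network("2001:db8::/32"): "NL",
}


def lookup_country(ip):
    """Return the country code for an IP string, or '' if it is unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        # Containment is simply False when the IP versions differ.
        if addr in network:
            return country
    return ""


def anonymize(entry):
    """Copy a parsed log entry, replacing the IP with a country code."""
    cleaned = {k: v for k, v in entry.items() if k != "ip"}
    cleaned["country"] = lookup_country(entry["ip"])
    return cleaned
```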

I chose to create a directory in /run for Caddy, which is surprisingly easy with systemd. I’m always amazed at how much of Linux relies on systemd and how good it is at, well, really everything it touches. Since /run is a RAM disk (tmpfs) on modern Linux systems, I don’t have to worry about encrypting the data at rest, and the backup system skips mount points by default, so the entire /run disk won’t end up in backups. Caddy rolls the log over once it reaches 10 MiB and deletes rolled files after a day, and Goatcounter should be picking up new lines within a few seconds.
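Goatcounter’s -follow handles log rotation for me, but the core problem is easy to show. When Caddy rolls the log at 10 MiB, it renames the old file and starts a new one at the same path, so a follower has to notice that the path now points at a different file and start over from the top. This Python class is my own illustrative sketch of that idea, not Goatcounter’s code:

```python
import os


class LogFollower:
    """Sketch of tailing a log file that gets rolled over (renamed and replaced).

    Remembers the file's inode and how far it has read. If the path now points
    at a different inode (rolled) or the file got shorter (truncated), it starts
    reading the new file from the beginning. Unlike a real follower, it does not
    drain unread lines left at the end of the old file.
    """

    def __init__(self, path):
        self.path = path
        self.inode = None
        self.offset = 0

    def read_new_lines(self):
        """Return complete lines appended since the last call."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return []  # not created yet, or caught mid-rotation
        if st.st_ino != self.inode or st.st_size < self.offset:
            self.inode = st.st_ino
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Only consume whole lines; a partial last line is re-read next time.
        complete, sep, _partial = data.rpartition(b"\n")
        if not sep:
            return []
        self.offset += len(complete) + len(sep)
        return [line.decode("utf-8", errors="replace") for line in complete.split(b"\n")]
```

Calling read_new_lines every few seconds gives roughly the polling behaviour described above.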

Goatcounter Service

Now I just need a systemd service for Goatcounter itself, plus the log import daemon, which I wrote as a template unit so adding another site will be easy when I get around to that link shortener someday. Let me tell you, systemd units are wildly powerful. I also took a moment to appreciate the Goatcounter developer’s shared hatred of Docker.

Goatcounter main service: /etc/systemd/system/goatcounter.service:

[Unit]
Description=Goatcounter statistics system
Wants=network-online.target
After=network-online.target

[Service]
User=www-admin
Group=www-data
ExecStart=/usr/local/bin/goatcounter serve -listen localhost:8081 -db sqlite3:///var/apalrd-stats/db/goatcounter.sqlite3 -tls none
Restart=always

[Install]
WantedBy=multi-user.target

Logging service: /etc/systemd/system/goatcounter-logs@.service:

[Unit]
Description=Goatcounter log import for %i
After=network-online.target caddy.service

[Service]
#Obviously put in your own API key, you get it from your user account in Goatcounter
#Combined log format is Apache style, Common Log Format + Referrer + User-Agent
Environment=GOATCOUNTER_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
User=caddy
Group=caddy
ExecStart=/usr/local/bin/goatcounter \
	import \
	-follow \
	-format=combined \
	-site=https://stats.apalrd.net \
	-exclude 'path:glob:/assets/*' \
	-exclude 'status:404' \
	-exclude redirect \
	-exclude 'path:glob:/img/*' \
	/run/access/access-%i.log

[Install]
WantedBy=multi-user.target
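The instance name is what ties a template unit to a log file. Starting goatcounter-logs@www-apalrd-net.service makes systemd substitute www-apalrd-net for %i, so the importer reads /run/access/access-www-apalrd-net.log, matching the Caddyfile. %i is the right specifier here: %I would unescape the name and turn each - into /. A small Python sketch of the naming convention (the dots-to-dashes rule is just how I named the log in the Caddyfile, and these helpers are hypothetical):

```python
def instance_for_site(hostname):
    """Naming rule used in my Caddyfile: www.apalrd.net -> www-apalrd-net."""
    return hostname.replace(".", "-")


def unit_for_site(hostname):
    """Name of the goatcounter-logs@ template instance for a site."""
    return f"goatcounter-logs@{instance_for_site(hostname)}.service"


def log_path_for_unit(unit):
    """Mimic how %i expands in ExecStart=... /run/access/access-%i.log."""
    prefix, suffix = "goatcounter-logs@", ".service"
    if not (unit.startswith(prefix) and unit.endswith(suffix)):
        raise ValueError(f"not a goatcounter-logs instance: {unit}")
    # %i is the raw instance string, still in its escaped form.
    instance = unit[len(prefix):-len(suffix)]
    return f"/run/access/access-{instance}.log"
```

So enabling the importer for the main site is systemctl enable --now goatcounter-logs@www-apalrd-net, and another site would be one more instance plus its own log block in the Caddyfile.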

I’m storing access totals for each page on the site, plus a breakdown by operating system, browser, and country for the site as a whole. It’s free to view at stats.apalrd.net, so enjoy!
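To make the -exclude flags a bit more concrete, here’s a rough Python model of which requests end up counted and how they roll up into per-page and per-country totals. It’s my approximation, not Goatcounter’s implementation: I’m assuming redirect means the usual redirect statuses (301, 302, 303, 307, 308), ignoring query strings, and using fnmatch for the globs, whose * also matches /. Entries are dicts with hypothetical uri, status, and country keys; browser and OS breakdowns would need User-Agent parsing, which I’ve left out.

```python
from collections import Counter
from fnmatch import fnmatch

# Approximation of the -exclude flags in goatcounter-logs@.service.
EXCLUDED_GLOBS = ["/assets/*", "/img/*"]
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_counted(entry):
    """True if a request would survive the exclude rules above."""
    if entry["status"] == 404 or entry["status"] in REDIRECT_STATUSES:
        return False
    path = entry["uri"].split("?", 1)[0]
    return not any(fnmatch(path, pattern) for pattern in EXCLUDED_GLOBS)


def totals(entries):
    """Per-page hit counts plus a site-wide breakdown by country."""
    counted = [e for e in entries if is_counted(e)]
    pages = Counter(e["uri"].split("?", 1)[0] for e in counted)
    countries = Counter(e.get("country", "") for e in counted)
    return pages, countries
```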

Web Rebuild Service and Timer

The last step is to make the website periodically check for updates and re-sync; I’ve chosen to have this happen daily. I wrote a systemd service, webrebuild.service, that regenerates the site when started, basically by doing a git pull followed by a hugo -v. To trigger a rebuild remotely over SSH, I just need to tell systemctl to start this service. How easy!

[Unit]
Description=Rebuild Website Tasks

[Service]
Type=oneshot
#Run as www-admin which has permissions to this
User=www-admin
Group=www-data
WorkingDirectory=/var/apalrd-net/
#Run git-pull here
ExecStart=git pull
#Next run Hugo
ExecStart=hugo -v

[Install]
WantedBy=default.target
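With Type=oneshot, systemd runs the ExecStart= lines in order, and if one exits non-zero the rest are skipped and the unit is marked failed, so a failed git pull never reaches hugo. A rough Python equivalent, handy for running the same steps by hand while debugging; run_steps is a hypothetical helper, not part of my setup:

```python
import subprocess


def run_steps(workdir, steps):
    """Run each command in order inside workdir, stopping at the first failure.

    Roughly mirrors a Type=oneshot unit with several ExecStart= lines.
    Returns the commands that completed; raises CalledProcessError on failure.
    """
    done = []
    for cmd in steps:
        # check=True raises on a non-zero exit, so later steps never run.
        subprocess.run(cmd, cwd=workdir, check=True)
        done.append(cmd)
    return done
```

For example, run_steps("/var/apalrd-net/", [["git", "pull"], ["hugo", "-v"]]) mirrors the unit, minus the User= and Group= switch.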

Running systemctl start webrebuild gives me this beautiful log in journalctl:

Jul 24 23:20:40 web-ash1 git[34589]: Already up to date.
Jul 24 23:20:40 web-ash1 hugo[34590]: Start building sites 
Jul 24 23:20:40 web-ash1 hugo[34590]: hugo v0.111.3+extended linux/amd64 BuildDate=2023-03-16T08:41:31Z VendorInfo=debian:0.111.3-1
Jul 24 23:20:40 web-ash1 hugo[34590]: INFO 2023/07/24 23:20:40 syncing static files to /
Jul 24 23:20:41 web-ash1 hugo[34590]:                    | EN
Jul 24 23:20:41 web-ash1 hugo[34590]: -------------------+------
Jul 24 23:20:41 web-ash1 hugo[34590]:   Pages            | 251
Jul 24 23:20:41 web-ash1 hugo[34590]:   Paginator pages  |  46
Jul 24 23:20:41 web-ash1 hugo[34590]:   Non-page files   | 254
Jul 24 23:20:41 web-ash1 hugo[34590]:   Static files     |  17
Jul 24 23:20:41 web-ash1 hugo[34590]:   Processed images |   0
Jul 24 23:20:41 web-ash1 hugo[34590]:   Aliases          |  62
Jul 24 23:20:41 web-ash1 hugo[34590]:   Sitemaps         |   1
Jul 24 23:20:41 web-ash1 hugo[34590]:   Cleaned          |   0
Jul 24 23:20:41 web-ash1 hugo[34590]: Total in 771 ms
Jul 24 23:20:41 web-ash1 systemd[1]: webrebuild.service: Deactivated successfully.

Now that the service works, I can add a timer unit to trigger it regularly.

[Unit]
Description=Rebuild website daily
RefuseManualStart=no
RefuseManualStop=no

[Timer]
#Run 180 seconds after boot for the first time
OnBootSec=180
#Run daily at 04:00 server local time (04:00 UTC is 11pm EST / midnight EDT)
OnCalendar=*-*-* 04:00:00
Unit=webrebuild.service

[Install]
WantedBy=timers.target
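OnCalendar= times use the server’s local time zone unless one is appended (for example OnCalendar=*-*-* 04:00:00 UTC). Assuming the server runs on UTC, here’s a quick Python check of when the next rebuild fires and what that is on the US East Coast; the fixed -5/-4 hour offsets stand in for EST/EDT so the sketch doesn’t depend on a time zone database:

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for US Eastern time (no DST rules applied).
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def next_daily_run(now, hour=4):
    """Next occurrence of HH:00:00 UTC strictly after `now` (an aware datetime)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = now_utc.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate
```

On the real machine, systemctl list-timers shows the next elapse time directly.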

End

I hope you enjoyed this overview of how my website is hosted behind the scenes. Sanitized versions of the configs are all posted on my blog if you’re looking to replicate this setup. Of course, feel free to reach out on my Discord if you’d like to hang out with like-minded people, and as always, I’ll see you on the next adventure!