Are you playing with Proxmox clustering, but want faster networking without paying for multi-gig switches? For small clusters, it can sometimes make sense to use fast point-to-point links between nodes. This could be a small 2- or 3-node cluster, where you can use dual-port 10 gig cards and direct attach cables without a switch. Maybe you've got a wacky 5-node cluster with quad-port gigabit cards in each node, and don't want to buy a 20-port switch and do link aggregation. Or maybe you want to be the crazy guy who runs Intel NUCs with Thunderbolt between them. Whatever your use case, this video will help you set up your fully routed cluster network properly.
This is accomplished by creating point-to-point links between nodes in any topology you can think of, and letting OSPFv3 exchange route information across all of the links. Once we have configured OSPF on all of the relevant interfaces, the cluster routing table will be generated automatically and updated whenever a link goes down, and the shortest path will be chosen based on the link costs (which you derive from link speeds).
Routed Cluster Subnet⌗
Our existing network has a subnet, something like 192.168.1.0/24 or 2001:db8::/64. To identify traffic on the cluster network, we are going to create a completely new subnet, in this example fd69:beef:cafe::/64. Only nodes in the cluster know about this second subnet, and traffic using this second subnet will go over our high-speed backbone.
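By the way, fd69:beef:cafe::/64 is a unique local address (fd00::/8) prefix I picked by hand. If you'd rather follow RFC 4193 and generate a properly random one, a quick sketch:

```shell
# Generate a random RFC 4193 ULA prefix: 'fd' + 40 random bits -> fdXX:XXXX:XXXX::/48
openssl rand -hex 5 | awk '{printf "fd%s:%s:%s::/48\n", substr($0,1,2), substr($0,3,4), substr($0,7,4)}'
```

You can then carve a /64 out of that /48 for the cluster network.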
Out of this subnet, we will assign an address to each node. In my case, I decided to use ::551, ::552, and ::553 for the nodes.
To set up the individual links, we just need to bring the interfaces up; a link-local address will be assigned automatically.
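On a Proxmox (Debian) node, one way to do this persistently is in /etc/network/interfaces; the names ens19 and ens20 below are just examples, substitute your own point-to-point ports:

```
# Bring the p2p links up with no static address;
# the kernel assigns an fe80::/64 link-local address automatically
auto ens19
iface ens19 inet manual

auto ens20
iface ens20 inet manual
```

After editing, apply with ifreload -a (or reboot).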
We are going to use a package called FRR (Free Range Routing) for this, which implements the OSPFv3 protocol. So, naturally, we need to install it.
```shell
apt install frr -y
```
Next we need to enable the OSPFv3 daemon in
/etc/frr/daemons. We could start it now, but we should probably configure it first.
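The daemons file toggles each routing daemon with a name=yes|no line, so enabling ospf6d is a one-line sed. A sketch, demonstrated here on a temporary copy so it's safe to run anywhere; on a real node, target /etc/frr/daemons directly:

```shell
# Abbreviated sample of the stock daemons file, written to a temp copy
f=$(mktemp)
printf 'zebra=yes\nospfd=no\nospf6d=no\n' > "$f"

# Flip the OSPFv3 daemon on
sed -i 's/^ospf6d=no/ospf6d=yes/' "$f"

grep '^ospf6d=' "$f"   # -> ospf6d=yes
rm -f "$f"
```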
To configure FRR, edit
/etc/frr/frr.conf. I’m going to show this on the first node, but you’ll need to do this on all of your nodes with the specific interfaces on that node.
Since Linux routes packets within the kernel (assuming forwarding is enabled), packets destined for our cluster subnet can arrive on any interface and Linux will accept them. So, we don't need to assign addresses on the point-to-point links themselves, as long as the packets go across the right link. To make sure we can still accept packets for our cluster address when any one link is down, we are going to add our node address to the loopback interface. So, starting off the config file, we will add an address to the loopback interface. We will also add it to the backbone area (area 0.0.0.0) and set it to passive, since there's no reason for OSPF to be active on this interface.
```
!
interface lo
 ipv6 address fd69:beef:cafe::551/128
 ipv6 ospf6 area 0.0.0.0
 ipv6 ospf6 passive
```
Next up, we can add the existing gigabit interface as a backup: if there is no fast path to the other node, we can fall back on the 'public' network. Set up vmbr0 as a broadcast network like this. When setting cost you can choose any integer you want; the convention is to divide some reference bandwidth by the link speed. The routing path actually used will be the one with the lowest total cost across all links. In my example I set the backup cost to 100 and the point-to-point cost to 10.
```
#Backup links via primary gigabit link (vmbr0)
#Cost for 1G assumptions (100 gig reference / 1 gig = 100 cost)
!
interface vmbr0
 ipv6 ospf6 area 0.0.0.0
 ipv6 ospf6 network broadcast
 ipv6 ospf6 cost 100
```
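For reference, every cost in this article comes from the same arithmetic: the 100 Gbit/s reference bandwidth (configured below as auto-cost reference-bandwidth 100000, a value in Mbit/s) divided by the link speed. The numbers work out like this:

```shell
ref=100000                           # reference bandwidth in Mbit/s (100 Gbit/s)
echo "1G  cost: $(( ref / 1000 ))"   # -> 100
echo "10G cost: $(( ref / 10000 ))"  # -> 10
echo "25G cost: $(( ref / 25000 ))"  # -> 4
```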
Now set up all of your point-to-point links. They can have different costs if your interfaces have different bandwidths. Use the interface names here (use ip a to find them).
```
!
interface ens19
 ipv6 ospf6 area 0.0.0.0
 ipv6 ospf6 network point-to-point
 ipv6 ospf6 cost 10
!
```
Finally, let's set up the router. Set the router ID to something unique on each node; it can be whatever you want, as long as it looks like a dotted-quad IPv4 address.
```
#OSPF router settings (unique router ID required for each router)
!
router ospf6
 ospf6 router-id 0.5.5.1
 redistribute connected
 auto-cost reference-bandwidth 100000
```
And finally, we can restart frr so it comes up with the new config:

```shell
systemctl restart frr
```
Within a few seconds of starting frr, routes should start coming in. Try viewing them with ip -6 route; you should see lines like fd69:... via fe80:... dev ens19 proto ospf.
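Each of those routes points at a neighbor's /128 loopback address via its link-local address on a specific interface. Picking apart one such line (the addresses here are made up for illustration):

```shell
# An illustrative route line as FRR would install it (addresses made up)
route='fd69:beef:cafe::552 via fe80::aa:bb dev ens19 proto ospf metric 20'

set -- $route   # split into whitespace-separated fields
echo "dest=$1 nexthop=$3 iface=$5 proto=$7"
# -> dest=fd69:beef:cafe::552 nexthop=fe80::aa:bb iface=ens19 proto=ospf
```

If no "proto ospf" routes show up, check that ospf6d was enabled in /etc/frr/daemons and that neighbors are up (vtysh -c 'show ipv6 ospf6 neighbor').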
Full Example Config⌗
```
# default to using syslog. /etc/rsyslog.d/45-frr.conf places the log in
# /var/log/frr/frr.log
#
# Note:
# FRR's configuration shell, vtysh, dynamically edits the live, in-memory
# configuration while FRR is running. When instructed, vtysh will persist the
# live configuration to this file, overwriting its contents. If you want to
# avoid this, you can edit this file manually before starting FRR, or instruct
# vtysh to write configuration to a different file.
log syslog informational

#Enable IPv6 forwarding since we are using IPv6
ipv6 forwarding

#Add our router's private address on lo (loopback)
#This address is a single address (/128) out of the subnet (/64)
#of our 'cluster' network, of which routes to individual /128s are
#distributed using OSPF
!
interface lo
 ipv6 address fd69:beef:cafe::551/128
 ipv6 ospf6 area 0.0.0.0
 ipv6 ospf6 passive

#Backup links via primary gigabit link (vmbr0)
#Cost for 1G assumptions (100 gig reference / 1 gig = 100 cost)
!
interface vmbr0
 ipv6 ospf6 area 0.0.0.0
 ipv6 ospf6 network broadcast
 ipv6 ospf6 cost 100

#Two p2p links ens19 and ens20
#Since we are using IPv6 we do not need to assign
#addresses on these links, relying on link-local addresses
#Cost for 10G assumptions (100 gig reference / 10 gig = 10 cost)
#Feel free to edit your cost as appropriate
#You can tweak these cost values to change the traffic flow
!
interface ens19
 ipv6 ospf6 area 0.0.0.0
 ipv6 ospf6 network point-to-point
 ipv6 ospf6 cost 10
!
interface ens20
 ipv6 ospf6 area 0.0.0.0
 ipv6 ospf6 network point-to-point
 ipv6 ospf6 cost 10

#OSPF router settings (unique router ID required for each router)
!
router ospf6
 ospf6 router-id 0.5.5.1
 redistribute connected
 auto-cost reference-bandwidth 100000
```