Flannel is a general-purpose container network interface (CNI) plugin for Kubernetes clusters but offers limited integration for mTLS and container security. Calico has more robust capabilities and direct support for the Istio service mesh, offering pod traffic controls, ingress traffic controls, and additional firewall rules. Calico is compatible with a wide range of Kubernetes versions and cloud environments, supports various Linux distributions, and integrates with different container runtimes. Additionally, Calico can work with both on-premises and cloud-native infrastructures, including AWS, Azure, and Google Cloud. The steps below were successfully tested on a cluster running:
  • Kubernetes v1.29.8
  • Flannel v0.22.0
  • Calico v3.28.1
NOTE: live migration will NOT work on a full Lilt deployment. Nodes are unable to reschedule due to LLM image file sizes.
Since live migration is not available, use the following steps to migrate manually. This requires the cluster to be DOWN for approximately 15 minutes. Please notify all users prior to proceeding.
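OPTIONAL: before taking the cluster down, it can help to snapshot the current state for comparison after the migration. A minimal sketch, assuming the default kube-flannel namespace and a host with kubectl access:
kubectl get nodes -o wide
kubectl get pods -n kube-flannel -o wide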

Delete Flannel

Switch to root user:
sudo su -
On the main (control-plane) node, the default Flannel installation is deployed via a DaemonSet YAML and must be deleted:
kubectl delete -f kube-flannel-v0.22.0.yaml
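If the original manifest file is no longer on the node, the same resources can usually be deleted by name instead (assuming the default kube-flannel-ds DaemonSet and kube-flannel-cfg ConfigMap in the kube-flannel namespace):
kubectl delete daemonset kube-flannel-ds -n kube-flannel
kubectl delete configmap kube-flannel-cfg -n kube-flannel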
On EVERY node in the cluster, ssh in and stop the kubelet and containerd services:
sudo systemctl stop kubelet.service
sudo systemctl stop containerd
On EVERY node in the cluster, ssh in and delete the following local files associated with the Flannel CNI:
sudo rm -rf /var/lib/cni
sudo rm -rf /run/flannel
sudo rm -rf /etc/cni
# remove the vxlan kernel module used by Flannel
rmmod vxlan
On EVERY node in the cluster, ssh in and delete the IP link interfaces associated with Flannel:
# Look for any CNI/Flannel related interfaces, and remove them
sudo ip link
# links are usually the following
sudo ip link delete cni0
sudo ip link delete flannel.1
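OPTIONAL: on larger clusters, the per-node cleanup above can be scripted. A minimal sketch, assuming passwordless SSH as a sudo-capable user and hypothetical hostnames node-1 through node-3:
for node in node-1 node-2 node-3; do
  ssh "$node" 'sudo systemctl stop kubelet.service containerd
    sudo rm -rf /var/lib/cni /run/flannel /etc/cni
    sudo rmmod vxlan || true
    sudo ip link delete cni0 || true
    sudo ip link delete flannel.1 || true'
done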
On the main node (or one of the control-plane nodes), restart kubelet and containerd services:
sudo systemctl restart containerd
sudo systemctl restart kubelet.service
Verify that the kube-flannel namespace does not exist:
kubectl get namespace
If the namespace kube-flannel is listed, delete it:
kubectl delete namespace kube-flannel
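As a final check, confirm that no Flannel pods remain anywhere in the cluster (the command should return nothing):
kubectl get pods -A | grep -i flannel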

Install Calico (tigera-operator)

There are two options for installing Calico: operator and manifest. The manifest is the easier option but only installs the basic CNI interface. Lilt requires the Calico operator for integration with Istio mesh services. If installing on a single-node cluster, remove the taint on the control-plane node so that the operator will schedule:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
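Verify the taint was removed (the Taints line should no longer list node-role.kubernetes.io/control-plane):
kubectl describe nodes | grep -i taints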
On the main (control-plane) node, create a values override file for the Helm install. Ensure that the CIDR range matches the previous Flannel installation:
  • default Flannel CIDR: 192.168.0.0/17
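To confirm the pod CIDR the cluster was initialized with (it must match the ipPools cidr in the values file below), one option is to check the kubeadm-config ConfigMap; a quick sketch:
kubectl -n kube-system get configmap kubeadm-config -o yaml | grep -i podSubnet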
sudo mkdir -p /tigera-operator
cat <<EOF | sudo tee /tigera-operator/on-prem-values.yaml
tigeraOperator:
  registry: us-central1-docker.pkg.dev/lilt-service-48916b30/third-party
  image: tigera/operator
  version: v1.34.3
calicoctl:
  image: us-central1-docker.pkg.dev/lilt-service-48916b30/third-party/calico/ctl
  tag: v3.28.1

# this creates the secret for you, DO NOT use if you already have existing secrets in the cluster
imagePullSecrets: {}

installation:
  enabled: true
  # custom registry for pulling additional images, similar to operator above
  # image names MUST be the defaults, e.g. calico/apiserver
  # must end with "/"
  registry: us-central1-docker.pkg.dev/lilt-service-48916b30/third-party/
  # set if using existing secrets already in the cluster
  imagePullSecrets:
    - name: third-party
    - name: gar-json-key
  # network settings
  calicoNetwork:
    ipPools:
      # MUST match kubeadm init and previous flannel CIDRs
      - cidr: 192.168.0.0/17

defaultFelixConfiguration:
  enabled: false
EOF
If you have external internet access, pull the Helm chart and install it with the override file:
helm repo add projectcalico https://projectcalico.docs.tigera.io/charts
kubectl create namespace tigera-operator
helm upgrade calico projectcalico/tigera-operator \
  --install \
  --namespace tigera-operator \
  --create-namespace \
  --version v3.28.1 \
  -f /tigera-operator/on-prem-values.yaml
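Verify that the operator pod itself is running before checking tigerastatus:
kubectl get pods -n tigera-operator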
Check the install status with the following command (it can take up to a minute to be fully ready):
kubectl get tigerastatus
Result when fully ready:
NAME        AVAILABLE   PROGRESSING   DEGRADED   SINCE
apiserver   True        False         False      3s
calico      True        False         False      13s
ippools     True        False         False      38s
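It can also help to confirm that the calico-node pods created by the operator are Running on every node (the operator creates the calico-system namespace):
kubectl get pods -n calico-system -o wide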
OPTIONAL: If the coredns pods are installed but not ready, or show CrashLoopBackOff errors, restart coredns:
kube-system   coredns-76f75df574-2xkkf   0/1     Running   0   27m
kube-system   coredns-76f75df574-q6z45   0/1     Running   0   27m
# or
kube-system   coredns-76f75df574-2xkkf   0/1     CrashLoopBackOff   7 (3m13s ago)   46m
kube-system   coredns-76f75df574-q6z45   0/1     CrashLoopBackOff   7 (3m13s ago)   46m
Restart coredns:
kubectl rollout restart -n kube-system deployment/coredns
Coredns will then recognize the Calico CNI and enter the ready state:
kube-system   coredns-6b48cb45df-9qvn7   1/1     Running   0          4s
kube-system   coredns-6b48cb45df-xgs62   1/1     Running   0          4s
OPTIONAL: If Calico will not schedule, you can also restart containerd again:
sudo systemctl restart containerd
Restart ALL updated/modified nodes; this is required for the new CNI settings to take effect:
sudo reboot
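After each node comes back, confirm it rejoins the cluster in Ready state and that Calico interfaces have replaced the old Flannel ones (interface names depend on the encapsulation mode, e.g. cali*, tunl0, or vxlan.calico):
kubectl get nodes -o wide
# on the rebooted node
ip link | grep -E 'cali|tunl0|vxlan.calico'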
After reboot, some pods might be in a Pending/Running state but not healthy/complete:
lilt-dataflow-ingest-wpa-minio-cronjob-28767000-7mk44  1/2   Running  4 192.168.119.203  node-1  21h
lilt-dataflow-ingest-wpa-minio-cronjob-28768440-4xxqt  1/2   Running  0 192.168.119.177  node-1  10m 
Delete the pods and they should return to a healthy state:
kubectl delete pod lilt-dataflow-ingest-wpa-minio-cronjob-28767000-7mk44
Result:
lilt-dataflow-ingest-wpa-minio-cronjob-28767000-ftmhj  0/2   Completed                     
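To find any remaining unhealthy pods across all namespaces, a rough filter such as the following can help (adjust as needed):
kubectl get pods -A -o wide | grep -vE 'Running|Completed'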
calicoctl is a command line tool that can be used to manage Calico network and security policies and other Calico configurations. It communicates directly with the Calico datastore (the Kubernetes API when installed via the operator, or etcd) to manipulate configuration. It provides a number of resource management commands and can be used to troubleshoot Calico network issues. Install calicoctl as a binary on a single host (usually the main control-plane node):
curl -L https://github.com/projectcalico/calico/releases/download/v3.28.1/calicoctl-linux-amd64  -o calicoctl
Set the file to executable:
chmod +x ./calicoctl
Move to local bin dir:
mv calicoctl /usr/local/bin/calicoctl
Verify functionality and version:
calicoctl version
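When Calico is installed via the operator, calicoctl uses the Kubernetes API datastore and needs to be told where to find it. A minimal sketch, assuming the default kubeadm admin kubeconfig path:
export DATASTORE_TYPE=kubernetes
export KUBECONFIG=/etc/kubernetes/admin.conf
# confirm the IP pool matches the CIDR configured in the values file
calicoctl get ippool -o wide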