Production Scaling and High Availability
This guide explains how to configure an existing Upbound Space deployment for production operation at scale.
Use this guide when you're ready to set up production scaling, high availability, and monitoring in your Space.
Prerequisites
Before you begin scaling your Spaces deployment, make sure you have:
- A working Space deployment
- Cluster administrator access
- An understanding of your organization's load patterns and expected growth
- Familiarity with node affinity, taints and tolerations, and Horizontal Pod Autoscaling (HPA)
Production scaling strategy
In this guide, you will:
- Create dedicated node pools for different component types
- Configure high availability to eliminate single points of failure
- Set up dynamic scaling for variable workloads
- Optimize your storage and component operations
- Monitor your deployment health and performance
Spaces architecture
The basic Spaces workflow follows the pattern below:

Node architecture
You can mitigate resource contention and improve reliability by separating system components into dedicated node pools.
etcd dedicated nodes
etcd performance directly impacts your entire Space, so isolate it for
consistent performance.
- Create a dedicated etcd node pool
  Requirements:
  - Minimum: 3 nodes for HA
  - Instance type: General purpose with high network throughput/low latency
  - Storage: High performance storage (etcd is I/O sensitive)
- Taint etcd nodes to reserve them
    kubectl taint nodes <etcd-node> target=etcd:NoSchedule
- Configure etcd storage
  etcd is sensitive to storage I/O performance. Review the etcd scaling documentation for specific storage guidance.
API server dedicated nodes
API servers handle all control plane requests and should run on dedicated infrastructure.
- Create dedicated API server nodes
  Requirements:
  - Minimum: 2 nodes for HA
  - Instance type: Compute-optimized, memory-optimized, or general-purpose
  - Scaling: Scale vertically based on API server load patterns
- Taint API server nodes
    kubectl taint nodes <api-server-node> target=apiserver:NoSchedule
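The node affinity rules later in this guide select nodes by a label with the key target. The kubectl taint command sets a taint only and doesn't add a label. The excerpt below is a minimal sketch of how a dedicated etcd node might look once both are in place. It assumes you also label each node, for example with kubectl label nodes <etcd-node> target=etcd. The source doesn't show that labeling step, so treat it as an assumption, and the node name is a placeholder.

```yaml
# Excerpt of a Node object, similar to what `kubectl get node <name> -o yaml` reports.
apiVersion: v1
kind: Node
metadata:
  name: etcd-node-1          # hypothetical node name
  labels:
    target: etcd             # assumption: label added separately so nodeAffinity can match
spec:
  taints:
    - key: target
      value: etcd
      effect: NoSchedule     # result of the kubectl taint command for etcd nodes
```

API server nodes would follow the same pattern with the value apiserver.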
Configure cluster autoscaling
Enable cluster autoscaling for all node pools.
For AWS EKS clusters, Upbound recommends using Karpenter for
improved bin-packing and instance type selection.
For GCP GKE clusters, follow the GKE autoscaling guide.
For Azure AKS clusters, follow the AKS autoscaling guide.
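If you use Karpenter on EKS, a node pool can carry the taint and a matching label declaratively, so new nodes join already reserved. The sketch below assumes the Karpenter v1 NodePool API. The pool name, the EC2NodeClass name, and the on-demand requirement are hypothetical choices, not values from Upbound's guidance.

```yaml
# Sketch of a Karpenter NodePool for dedicated etcd nodes (assumes karpenter.sh/v1 on EKS).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spaces-etcd                 # hypothetical name
spec:
  template:
    metadata:
      labels:
        target: etcd                # label that the etcd nodeAffinity rule selects on
    spec:
      taints:
        - key: target
          value: etcd
          effect: NoSchedule        # same taint as the kubectl taint command
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default               # hypothetical EC2NodeClass, defined separately
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]     # assumption: avoid spot interruptions for etcd
```

A second pool for API server nodes would differ only in its name and in using apiserver as the label and taint value.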
Configure high availability
Ensure control plane components can survive node and zone failures.
Enable high availability mode
- Configure control planes for high availability
    controlPlanes:
      ha:
        enabled: true
  This configures control plane pods to run with multiple replicas and associated pod disruption budgets.
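For readers unfamiliar with pod disruption budgets, the sketch below shows the general shape of one. It is illustrative only and isn't the object Spaces generates: the name and the maxUnavailable value are assumptions, and the selector reuses the app: vcluster-etcd label from the anti-affinity rules in this guide.

```yaml
# Illustrative PodDisruptionBudget; not the manifest that Spaces creates.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-etcd-pdb     # hypothetical name
spec:
  maxUnavailable: 1          # assumption: allow one replica down during voluntary disruptions
  selector:
    matchLabels:
      app: vcluster-etcd     # label taken from the etcd anti-affinity selector
```

With a budget like this, a node drain evicts at most one matching pod at a time, which is what lets a multi-replica control plane stay available during maintenance.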
Configure component distribution
- Set up API server pod distribution
    controlPlanes:
      vcluster:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: target
                      operator: In
                      values:
                        - apiserver
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - vcluster
                topologyKey: "kubernetes.io/hostname"
            preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - vcluster
                  topologyKey: topology.kubernetes.io/zone
                weight: 100
- Configure etcd pod distribution
    controlPlanes:
      etcd:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: target
                      operator: In
                      values:
                        - etcd
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - vcluster-etcd
                topologyKey: "kubernetes.io/hostname"
            preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - vcluster-etcd
                  topologyKey: topology.kubernetes.io/zone
                weight: 100
Configure tolerations
Allow control plane pods to schedule on the tainted dedicated nodes (available in Spaces v1.14+).
- Add tolerations for etcd pods
    controlPlanes:
      etcd:
        tolerations:
          - key: "target"
            operator: "Equal"
            value: "etcd"
            effect: "NoSchedule"
- Add tolerations for API server pods
    controlPlanes:
      vcluster:
        tolerations:
          - key: "target"
            operator: "Equal"
            value: "apiserver"
            effect: "NoSchedule"
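Each fragment in this guide starts from the same top-level controlPlanes key. Assuming they all belong to one configuration file, they would need to be merged under a single controlPlanes key, because YAML doesn't allow duplicate keys in one mapping. The sketch below shows that merged layout for the high availability and toleration settings; the affinity blocks are left as comments to keep it short.

```yaml
# Merged layout sketch: one controlPlanes key holding all settings from this guide.
controlPlanes:
  ha:
    enabled: true
  etcd:
    # affinity: the etcd block from "Configure component distribution" goes here
    tolerations:
      - key: "target"
        operator: "Equal"
        value: "etcd"
        effect: "NoSchedule"
  vcluster:
    # affinity: the API server block from "Configure component distribution" goes here
    tolerations:
      - key: "target"
        operator: "Equal"
        value: "apiserver"
        effect: "NoSchedule"
```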