exhausted IP addresses from unbalanced zone distribution #7311
Comments
Can you provide your Karpenter configuration? Karpenter should launch nodes into the subnet with the most available IPs, except for affinity and …
Sure @engedaam, please find the configuration below.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  annotations:
    karpenter.sh/nodepool-hash: "5078040335181941408"
    karpenter.sh/nodepool-hash-version: v2
  name: al2023
spec:
  disruption:
    budgets:
    - nodes: 20%
    - duration: 55m
      nodes: "0"
      schedule: '@hourly'
    consolidationPolicy: WhenUnderutilized
    expireAfter: 168h
  limits:
    cpu: "125"
    memory: 1000Gi
  template:
    spec:
      nodeClassRef:
        name: al2023
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
        - on-demand
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - c
        - m
        - r
        - t
      - key: karpenter.k8s.aws/instance-cpu
        operator: Gt
        values:
        - "3"
      - key: karpenter.k8s.aws/instance-cpu
        operator: Lt
        values:
        - "33"
      - key: karpenter.k8s.aws/instance-memory
        operator: Gt
        values:
        - "4000"
      - key: karpenter.k8s.aws/instance-memory
        operator: Lt
        values:
        - "66000"
      - key: karpenter.k8s.aws/instance-ebs-bandwidth
        operator: Gt
        values:
        - "2000"
      - key: karpenter.k8s.aws/instance-hypervisor
        operator: In
        values:
        - nitro
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "3"
      startupTaints:
      - effect: NoExecute
        key: node.cilium.io/agent-not-ready
        value: "true"
  weight: 90
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "11350300940085964065"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v2
  finalizers:
  - karpenter.k8s.aws/termination
  name: al2023
spec:
  amiFamily: AL2023
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      encrypted: true
      throughput: 125
      volumeSize: 200Gi
      volumeType: gp3
  instanceProfile: o11n-eks-xxx
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  securityGroupSelectorTerms:
  - id: sg-022e0610xxx
  subnetSelectorTerms:
  - id: subnet-0e0d9c1xx
  - id: subnet-0fff56bxx
  - id: subnet-0884d45xx
  tags:
    Name: kubernetes.io/cluster/o11n-eks-o11n-union
    System: o11n-eks-o11n-union
    jw:owner: eks
    jw:project: o11n/eks
    jw:stage: union
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="//"

    --//
    Content-Type: application/node.eks.aws

    apiVersion: node.eks.aws/v1alpha1
    kind: NodeConfig
    spec:
      featureGates:
        InstanceIdNodeName: false # https://github.com/awslabs/amazon-eks-ami/issues/1821
      kubelet:
        config:
          featureGates:
            DisableKubeletCloudCredentialProviders: true
          registryPullQPS: 100
          serializeImagePulls: false
          shutdownGracePeriod: 30s
    --//
We did not; however, as a dirty workaround we do so now, to force Karpenter into the other zones. That is not a solution.
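The thread does not show the exact workaround, but one common way to force Karpenter into specific zones is to split the shared NodePool into per-zone NodePools via a topology.kubernetes.io/zone requirement. A minimal sketch, reusing the al2023 EC2NodeClass above; the NodePool name, zone, and per-zone limit are hypothetical:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: al2023-zone-a        # hypothetical name; one NodePool per zone
spec:
  weight: 90
  limits:
    cpu: "40"                # hypothetical per-zone share of the overall limit
  template:
    spec:
      nodeClassRef:
        name: al2023
      requirements:
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-central-1a      # hypothetical zone; pins this pool to one AZ
      # ...plus the same instance requirements as in the shared NodePool above

Duplicating this per zone spreads capacity, but it multiplies NodePools and limits to maintain, which is exactly the kind of burden the issue is about.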
Description
Observed Behavior:
The IP range of one subnet is exhausted, causing "dead" nodes, while the other zones are left empty.
This is a follow-up of #1810 #1292, as that topology-spread solution does not scale on large clusters across independent teams, namespaces, and so on. Across dozens of deployments we cannot instruct every developer to apply https://karpenter.sh/v0.10.0/tasks/scheduling/#topology-spread and match it exactly across all teams.
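For illustration, this is roughly what every team would have to add to each Deployment to get zone spreading via pod topology spread constraints; the workload name and image are hypothetical:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                  # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule   # must be DoNotSchedule to actually force spread into new zones
        labelSelector:
          matchLabels:
            app: example-app
      containers:
      - name: app
        image: registry.example.com/app:latest   # placeholder image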
ClusterAutoscaler has this option for a reason https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler
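For comparison, in Cluster Autoscaler this is a single cluster-wide switch rather than per-workload configuration. An illustrative excerpt of its Deployment container args (image tag is hypothetical):

containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0   # illustrative tag
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --balance-similar-node-groups=true   # the option referenced in the FAQ above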
Expected Behavior:
Karpenter should take zone balance into account as a requirement for node scheduling.