feat(eks): enhancements and fixes for EKS #766

Open
zdrapela wants to merge 4 commits into redhat-developer:main from zdrapela:eks-enhancements

Conversation

@zdrapela
Member

@zdrapela zdrapela commented Apr 7, 2026

Summary

Self-managed node groups with spot support

  • Replace EKS managed node group with a self-managed Auto Scaling Group (ASG) for direct spot price control
  • Add spotPrice parameter to set maximum bid for spot instances
  • Use API authentication mode with access entries for self-managed node authentication

Cluster reliability fixes

  • Add service CIDR to nodeadm NodeConfig (required by AL2023 for proper pod networking without DescribeCluster API calls)
  • Deploy addons in phased dependency order: infrastructure addons (vpc-cni, kube-proxy, eks-pod-identity-agent) → coredns → remaining addons (aws-ebs-csi-driver)
  • Add WaitForCapacityTimeout, HealthCheckType, and HealthCheckGracePeriod to ASG so Pulumi waits for nodes to be InService before deploying addons
  • Add ResolveConflictsOnCreate and extended timeouts to all EKS addons
  • Make LB controller depend on coredns for DNS resolution
  • Remove unused NAT gateway (NatGatewayModeNone) since EKS uses only public subnets
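The phased addon ordering listed above can be sketched in plain Go. This is only an illustration under stated assumptions: the `addonPhases` layout, the `deployFunc` callback, and `deployInPhases` are hypothetical names, not code from this PR, and in the actual Pulumi program the ordering would presumably be expressed through resource dependencies rather than sequential calls.

```go
package main

import "fmt"

// addonPhases lists the EKS addon names in the dependency order described in
// the PR summary. The slice-of-slices layout is an assumption of this sketch.
var addonPhases = [][]string{
	{"vpc-cni", "kube-proxy", "eks-pod-identity-agent"}, // infrastructure addons
	{"coredns"},            // deployed once the infrastructure addons are in place
	{"aws-ebs-csi-driver"}, // remaining addons
}

// deployFunc is a hypothetical callback that installs a single addon.
type deployFunc func(name string) error

// deployInPhases installs every addon of one phase before starting the next
// phase and stops at the first error. It returns the addons installed so far.
func deployInPhases(phases [][]string, deploy deployFunc) ([]string, error) {
	var done []string
	for i, phase := range phases {
		for _, name := range phase {
			if err := deploy(name); err != nil {
				return done, fmt.Errorf("phase %d: addon %s: %w", i+1, name, err)
			}
			done = append(done, name)
		}
	}
	return done, nil
}

func main() {
	done, err := deployInPhases(addonPhases, func(name string) error {
		fmt.Println("deploying", name)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("deployed", len(done), "addons")
}
```

Keeping the phases as data makes the intended order easy to review and keeps coredns from starting before the networking addons it relies on.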

VPC endpoint extraction

  • Extract VPC endpoint creation from per-subnet code into a shared EndpointsRequest module
  • Endpoints are created once per VPC across all public subnets (required for multi-AZ EKS — AWS allows only one S3 gateway endpoint per VPC)
  • Integrates with the opt-in ServiceEndpoints pattern from feat(aws): Optional service endpoints #754
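The "once per VPC" idea can be illustrated with a small Go sketch. The `subnet` and `endpointPlan` types, the `planEndpoints` function, and the sample service names are all hypothetical and chosen for illustration; this is not the `EndpointsRequest` module from the PR, only a model of grouping subnets by VPC so that each service endpoint is planned a single time.

```go
package main

import (
	"fmt"
	"sort"
)

// subnet is a hypothetical, minimal view of a public subnet.
type subnet struct {
	ID    string
	VPCID string
}

// endpointPlan describes one endpoint to create for a VPC together with the
// subnets of that VPC it relates to.
type endpointPlan struct {
	VPCID     string
	Service   string
	SubnetIDs []string
}

// planEndpoints returns exactly one plan per (VPC, service) pair, regardless
// of how many subnets the VPC contains.
func planEndpoints(subnets []subnet, services []string) []endpointPlan {
	byVPC := map[string][]string{}
	for _, s := range subnets {
		byVPC[s.VPCID] = append(byVPC[s.VPCID], s.ID)
	}

	vpcs := make([]string, 0, len(byVPC))
	for id := range byVPC {
		vpcs = append(vpcs, id)
	}
	sort.Strings(vpcs) // keep the output order deterministic

	var plans []endpointPlan
	for _, vpc := range vpcs {
		for _, svc := range services {
			plans = append(plans, endpointPlan{VPCID: vpc, Service: svc, SubnetIDs: byVPC[vpc]})
		}
	}
	return plans
}

func main() {
	subnets := []subnet{
		{ID: "subnet-a", VPCID: "vpc-1"},
		{ID: "subnet-b", VPCID: "vpc-1"},
		{ID: "subnet-c", VPCID: "vpc-1"},
	}
	for _, p := range planEndpoints(subnets, []string{"s3"}) {
		fmt.Printf("%s/%s covers %d subnets\n", p.VPCID, p.Service, len(p.SubnetIDs))
	}
}
```

With per-subnet creation, the three subnets above would each have triggered their own endpoint request; grouping first yields a single plan per VPC.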

Other

  • Remove AWS CLI dependency from EKS cluster creation
  • Add resource tags to all EKS-specific AWS resources (cluster, IAM roles, OIDC provider, addons, ASG, etc.)
  • Extend EKS documentation

Resolves #499

@zdrapela zdrapela changed the title feat(eks): self-managed node groups with spot support and cluster reliability fixes feat(eks): self-managed node groups with spot support, cluster reliability fixes, and VPC endpoint extraction Apr 7, 2026
@zdrapela zdrapela changed the title feat(eks): self-managed node groups with spot support, cluster reliability fixes, and VPC endpoint extraction feat(eks): enhancements and fixes for EKS Apr 7, 2026
@zdrapela zdrapela marked this pull request as ready for review April 7, 2026 13:34
@zdrapela
Member Author

zdrapela commented Apr 7, 2026

@adrianriobo Hi, I tested this PR on creating an EKS cluster, but I haven't tested any other infra creation, which may be affected.
If this PR is too big, I can split it.
I created the Tekton task, but unfortunately I still don't have anywhere to test it.

@adrianriobo
Collaborator

Hey, nice contribution. I think most of the changes should not affect other targets, but I want to give it a try. In any case, can you clean up the commits a bit? Basically, you could group them into EKS improvements and networking improvements. WDYT?

@zdrapela zdrapela force-pushed the eks-enhancements branch 2 times, most recently from f24206c to dca05de Compare April 8, 2026 11:33
@zdrapela
Member Author

zdrapela commented Apr 8, 2026

Sure, I split it 👍

@adrianriobo
Collaborator

@anjannath would you find time to review this one?

@adrianriobo adrianriobo requested a review from anjannath April 16, 2026 11:29
Comment thread pkg/provider/aws/action/eks/eks.go
Comment thread pkg/provider/aws/action/eks/nodegroup.go Outdated
Since MapPublicIp is set to true, nodes get public IPs directly and
do not need a NAT gateway for outbound internet access.
…ging

- Replace managed node group with self-managed ASG for spot price control
- Add tekton task for EKS cluster management
- Add resource tags to all EKS-specific AWS resources
- Resolve EKS cluster creation failures
- Fix EKS creation without AWS CLI
- Extend EKS documentation
@zdrapela
Member Author

Thank you @anjannath, good points. I addressed your review comments.

@adrianriobo
Collaborator

Hey, we added a new release process for mapt that now offers the binary too. I will wait for this PR to go in and then release v0.13.0.

So if you have a moment and the changes look good, @anjannath, give it an LGTM and we can merge.

@anjannath
Collaborator

@zdrapela @adrianriobo I'm seeing an error when trying to deploy an arm64 cluster:

Diagnostics:
  pulumi:pulumi:Stack (anath-mapt-eks-stackCreateEKS-anath-mapt-eks):
    error: update failed

  aws:autoscaling:Group (main-aeks-asg):
    error:   sdk-v2/provider2.go:572: sdk.helper_schema: creating Auto Scaling Group (main-aeks-asg-d8e26a7): operation error Auto Scaling: CreateAutoScalingGroup, https response error StatusCode: 400, RequestID: 5c723822-46a0-4004-a933-e6225b9f7f66, api error ValidationError: You must use a valid fully-formed launch template. The architecture 'x86_64' of the specified instance type does not match the architecture 'arm64' of the specified AMI. Specify an instance type and an AMI that have matching architectures, and try again. You can use 'describe-instance-types' or 'describe-images' to discover the architecture of the instance type or AMI.: provider=aws@7.25.0
    error: 1 error occurred:
    	* creating Auto Scaling Group (main-aeks-asg-d8e26a7): operation error Auto Scaling: CreateAutoScalingGroup, https response error StatusCode: 400, RequestID: 5c723822-46a0-4004-a933-e6225b9f7f66, api error ValidationError: You must use a valid fully-formed launch template. The architecture 'x86_64' of the specified instance type does not match the architecture 'arm64' of the specified AMI. Specify an instance type and an AMI that have matching architectures, and try again. You can use 'describe-instance-types' or 'describe-images' to discover the architecture of the instance type or AMI.

@adrianriobo
Collaborator

@zdrapela how did you try it? Either you set --compute-size, in which case you need to ensure the types in the list are arm64, or you set --arch and let mapt pick the instance for you.
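The mismatch reported above could be caught before the Auto Scaling Group is created. The following Go sketch shows one possible pre-flight check; the `archOf` table and the `mismatched` function are hypothetical and not part of mapt. The table hard-codes a handful of instance types for illustration, whereas a real check would look the architecture up, for example via the `describe-instance-types` call that the error message mentions.

```go
package main

import "fmt"

// archOf maps a few EC2 instance types to their CPU architecture.
// Hard-coded for illustration only; a real check would query AWS instead.
var archOf = map[string]string{
	"m5.xlarge":  "x86_64",
	"c5.2xlarge": "x86_64",
	"m6g.xlarge": "arm64",
	"t4g.large":  "arm64",
}

// mismatched returns the instance types whose architecture differs from
// want. Types missing from the table are also reported, since they cannot
// be confirmed as compatible.
func mismatched(types []string, want string) []string {
	var bad []string
	for _, t := range types {
		if archOf[t] != want {
			bad = append(bad, t)
		}
	}
	return bad
}

func main() {
	requested := []string{"m6g.xlarge", "m5.xlarge"}
	if bad := mismatched(requested, "arm64"); len(bad) > 0 {
		fmt.Println("instance types not matching arm64:", bad)
		return
	}
	fmt.Println("all instance types match arm64")
}
```

Failing early with the offending type names would give a shorter error than the `ValidationError` returned by `CreateAutoScalingGroup`.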

Development

Successfully merging this pull request may close these issues.

EKS Enhancements

3 participants