diff --git a/docs/superpowers/specs/2026-06-19-iam-agent-role-split-design.md b/docs/superpowers/specs/2026-06-19-iam-agent-role-split-design.md
new file mode 100644
index 00000000..cbf36b53
--- /dev/null
+++ b/docs/superpowers/specs/2026-06-19-iam-agent-role-split-design.md
@@ -0,0 +1,123 @@
+# Design: split the `infrastructure/aws/iam/agent` module into an agent role + a permissions role
+
+Date: 2026-06-19
+
+## Context
+
+Today the `infrastructure/aws/iam/agent` module creates a single IRSA role
+(`nullplatform_agent_role`) trusted by the cluster's OIDC provider, with all
+policies attached directly:
+
+- `nullplatform_route53_policy`
+- `nullplatform_eks_policy`
+- `nullplatform_elb_policy`
+- `nullplatform_avp_policy`
+- `nullplatform_assume_role_policy` (conditional, only when `assume_role_arns` is non-empty)
+- `var.additional_policies`
+
+The service account's IRSA token therefore has direct access to all of
+those permissions.
+
+## Goal
+
+Apply privilege separation (*role chaining*):
+
+- The IRSA role (Role A) keeps only the ability to call `sts:AssumeRole`.
+- A new permissions role (Role B) holds the actual policies and trusts
+  Role A.
+
+This reduces the blast radius of the IRSA token: on its own, the token cannot touch
+Route53/EKS/ELB/AVP; it must first assume Role B.
+
+## Resulting architecture
+
+```
+Service Account (K8s)
+        │  OIDC / IRSA
+        ▼
+Role A: nullplatform-{cluster}-agent-role              ← agent role (IRSA)
+        │  single policy: sts:AssumeRole → [ Role B, *assume_role_arns ]
+        │  + additional_policies (unchanged)
+        ▼  (assume)
+Role B: nullplatform-{cluster}-agent-permissions-role  ← permissions role (NEW)
+        trust: AWS principal = ARN of Role A
+        attached policies: route53, eks, elb, avp
+```
+
+## Design decisions
+
+1. **Which policies move to Role B:** only the four managed ones (route53, eks,
+   elb, avp). `additional_policies` are still attached to Role A.
+2. **`assume_role_arns`:** the permission is granted on Role A. Role A can
+   assume both Role B and the external `assume_role_arns` (the current
+   behavior regarding which roles the agent targets is preserved).
+3. **Outputs:** `nullplatform_agent_role_arn` (Role A) is kept and
+   `nullplatform_agent_permissions_role_arn` (Role B) is added. Existing consumers are not broken.
+
+## Breaking the circular dependency
+
+Role A needs Role B's ARN (in its assume policy) and Role B needs
+Role A's ARN (in its trust policy). To avoid a cycle in the Terraform
+graph, both ARNs are built as `locals` from
+`data.aws_caller_identity.current` plus the role names (which are already
+deterministic `locals`, with `use_name_prefix = false`), without referencing the
+other role's resource.
+
+## Changes per file
+
+- **`data.tf`**: add `data.aws_caller_identity.current`.
+- **`main.tf`**:
+  - `locals`: add `permissions_role_name`
+    (default `nullplatform-{cluster}-agent-permissions-role`),
+    `permissions_role_arn` and `agent_role_arn` (computed from the caller identity).
+  - `nullplatform_agent_role` module (Role A): the `policies` map is reduced to
+    `nullplatform_assume_role_policy` + `var.additional_policies`. The
+    route53/eks/elb/avp entries are removed.
+  - `aws_iam_policy.nullplatform_assume_role_policy`: no longer conditional
+    (always created). `Resource = concat([local.permissions_role_arn], var.assume_role_arns)`.
+    The `count` is removed. A `moved` block is added to migrate
+    `nullplatform_assume_role_policy[0]` → `nullplatform_assume_role_policy`.
+  - New `aws_iam_role.nullplatform_agent_permissions` (Role B), with an
+    `assume_role_policy` whose AWS principal = `local.agent_role_arn`.
+  - Four new `aws_iam_role_policy_attachment` resources attach route53/eks/elb/avp to
+    Role B. The `aws_iam_policy` resources for those four stay the same (same content and
+    names).
+- **`variables.tf`**: add `permissions_role_name` (override, default `""`).
+- **`outputs.tf`**: add `nullplatform_agent_permissions_role_arn`.
+- **`tests/agent.tftest.hcl`**: the `assume_role_policy` now always exists (it is
+  no longer `[0]`); add the new Role B and its attachments to the mocks; update
+  `assume_role_policy_not_created_by_default` (it is now created by default,
+  pointing at the permissions role).
+- **`README.md`**: regenerate description/architecture/features/inputs/outputs
+  (the `BEGIN_TF_DOCS` and `BEGIN_AI_METADATA` blocks).
+
+## Extension: multiple permissions roles
+
+Besides the default permissions role (fixed, with the four policies), the module can
+create N additional permissions roles via `var.permissions_roles` (a map of
+`logical_name => { name?, policy_arns }`), resolved with `for_each`:
+
+- Each entry creates an `aws_iam_role.extra_permissions[key]` whose trust policy allows
+  only the agent role to assume it, and attaches the given `policy_arns` through
+  `aws_iam_role_policy_attachment.extra_permissions` (keyed `"role::arn"`).
+- The names and ARNs of the extra roles are computed in `locals`
+  (`extra_permissions_role_names` / `extra_permissions_role_arns`) from the name
+  and the account id, same as the default role, to keep the agent's assume
+  policy deterministic and free of circular dependencies.
+- The agent role's `sts:AssumeRole` policy concatenates:
+  `[permissions_role_arn] + extra_permissions_role_arns + var.assume_role_arns`.
+- New output `nullplatform_agent_extra_permissions_role_arns` (a map of
+  `logical_name => arn`).
+
+Roles that already exist outside the module are still covered by `var.assume_role_arns`
+(the module does not create them; it only allows assuming them).
+
+## Testing
+
+`tofu test` on the module with the provider mocked. It verifies:
+- Names of the four policies (unchanged).
+- Valid JSON for all policies.
+- The permissions role exists and has the four attachments.
+- The agent role's assume policy references the permissions role ARN by
+  default, and adds `assume_role_arns` when they are provided.
+- The trust policy of the permissions role references the agent role ARN.
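The cycle-breaking wiring described in the spec above (ARNs built as `locals` from `data.aws_caller_identity.current`, extra roles resolved with `for_each`, a `moved` block for the assume policy) could look roughly like the following. This is a minimal sketch, not the module's actual `main.tf`: it assumes the standard `aws` partition, the policy `name` is a hypothetical value, and the local `extra_permissions_attachments` is an illustrative helper that does not appear in the spec. The variable shapes follow the inputs documented in the README.

```hcl
variable "cluster_name" {
  type = string
}

variable "assume_role_arns" {
  type    = list(string)
  default = []
}

variable "permissions_roles" {
  type = map(object({
    name        = optional(string)
    policy_arns = optional(list(string), [])
  }))
  default = {}
}

data "aws_caller_identity" "current" {}

locals {
  account_id            = data.aws_caller_identity.current.account_id
  agent_role_name       = "nullplatform-${var.cluster_name}-agent-role"
  permissions_role_name = "nullplatform-${var.cluster_name}-agent-permissions-role"

  # ARNs are assembled from strings (assumes the "aws" partition), so neither
  # role has to reference the other role's resource.
  agent_role_arn       = "arn:aws:iam::${local.account_id}:role/${local.agent_role_name}"
  permissions_role_arn = "arn:aws:iam::${local.account_id}:role/${local.permissions_role_name}"

  extra_permissions_role_names = {
    for key, role in var.permissions_roles :
    key => coalesce(role.name, "nullplatform-${var.cluster_name}-${key}")
  }
  extra_permissions_role_arns = {
    for key, name in local.extra_permissions_role_names :
    key => "arn:aws:iam::${local.account_id}:role/${name}"
  }

  # Hypothetical helper: one entry per (role, policy) pair, keyed "role::arn".
  extra_permissions_attachments = {
    for pair in flatten([
      for key, role in var.permissions_roles : [
        for arn in role.policy_arns : { role = key, policy_arn = arn }
      ]
    ]) : "${pair.role}::${pair.policy_arn}" => pair
  }
}

# Always created now; the name below is a placeholder, not the module's real one.
resource "aws_iam_policy" "nullplatform_assume_role_policy" {
  name = "nullplatform_${var.cluster_name}_assume_role_policy"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRole"
      Resource = concat(
        [local.permissions_role_arn],
        values(local.extra_permissions_role_arns),
        var.assume_role_arns,
      )
    }]
  })
}

# Keeps existing state when the `count` is dropped from the policy.
moved {
  from = aws_iam_policy.nullplatform_assume_role_policy[0]
  to   = aws_iam_policy.nullplatform_assume_role_policy
}

resource "aws_iam_role" "extra_permissions" {
  for_each = var.permissions_roles
  name     = local.extra_permissions_role_names[each.key]

  # Only the agent role may assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = local.agent_role_arn }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "extra_permissions" {
  for_each   = local.extra_permissions_attachments
  role       = aws_iam_role.extra_permissions[each.value.role].name
  policy_arn = each.value.policy_arn
}
```

One point worth checking against the real module: IAM validates role principals in a trust policy when the policy is written, so the agent role has to exist before the roles that trust it are created. Because the agent side only uses string ARNs, an explicit `depends_on` from the permissions roles to the agent role would likely cover that ordering without reintroducing a cycle.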
diff --git a/infrastructure/aws/iam/agent/README.md b/infrastructure/aws/iam/agent/README.md
index 5b3ca8ef..965e61c8 100644
--- a/infrastructure/aws/iam/agent/README.md
+++ b/infrastructure/aws/iam/agent/README.md
@@ -2,21 +2,21 @@
 
 ## Description
 
-Creates an IRSA-enabled IAM role with scoped policies for the nullplatform agent Kubernetes service account on EKS
+Creates an IRSA-enabled IAM agent role for the nullplatform Kubernetes service account on EKS, using privilege separation: the agent role only carries an sts:AssumeRole policy and assumes a separate permissions role (provisioned outside this module) that holds the scoped workload policies
 
 ## Architecture
 
-The module uses the terraform-aws-modules/iam//modules/iam-role-for-service-accounts submodule to create an aws_iam_role with an OIDC trust policy bound to a specific Kubernetes namespace and service account. Four aws_iam_policy resources are created for Route53, ELB, EKS, and Amazon Verified Permissions, and conditionally a fifth for sts:AssumeRole when assume_role_arns is non-empty. All policies are attached to the IAM role via the submodule's policies map, and the resulting role ARN is exposed as an output.
+The module uses the terraform-aws-modules/iam//modules/iam-role-for-service-accounts submodule to create an aws_iam_role (the agent role) with an OIDC trust policy bound to a specific Kubernetes namespace and service account. The agent role only carries an sts:AssumeRole policy that allows it to assume a permissions role (and any additional assume_role_arns).
+
+The default permissions role and its workload policies (Route53, ELB, EKS, AVP) are **no longer created by this module**: they are provisioned per-cluster by the k8s scope's OpenTofu module (`k8s/scope/tofu/iam/modules` in the scopes repo). This module still authorizes assuming that role by its conventional ARN (`nullplatform-{cluster_name}-agent-permissions-role`), derived from the role name and the caller account id, and exposes that ARN as an output. The scope module must create the permissions role with that same conventional name so the wiring matches.
 
 ## Features
 
-- Creates an IRSA IAM role scoped to a specific Kubernetes namespace and service account via OIDC provider trust
-- Attaches a Route53 policy granting DNS record management permissions for hosted zones
-- Attaches an ELB policy granting describe permissions for load balancers and target groups
-- Attaches an EKS policy granting read access to clusters, node groups, and addons
-- Attaches an Amazon Verified Permissions (AVP) policy granting full verifiedpermissions access
-- Conditionally creates and attaches an sts:AssumeRole policy when assume_role_arns is provided
-- Supports attaching additional custom IAM policies via the additional_policies map
+- Creates an IRSA IAM agent role scoped to a specific Kubernetes namespace and service account via OIDC provider trust
+- Keeps the agent role minimal: it only carries an sts:AssumeRole policy targeting the (externally-created) permissions role and any additional assume_role_arns
+- Authorizes assuming the conventional permissions role ARN even though the role itself is created elsewhere (k8s scope tofu module)
+- Supports attaching additional custom IAM policies to the agent role via the additional_policies map
+- Supports creating additional permissions roles via the permissions_roles map, each trusting the agent role and assumable by it
 
 ## Basic Usage
 
@@ -30,6 +30,39 @@ module "agent" {
 }
 ```
 
+## Multiple permissions roles
+
+The agent is always allowed to assume the default permissions role by its
+conventional ARN (`nullplatform-{cluster_name}-agent-permissions-role`), which is
+created externally by the k8s scope tofu module. To have the agent assume
+additional, module-created roles with their own policies, use the
+`permissions_roles` map. Each entry creates a role that trusts the agent role and
+gets the given policy ARNs attached; the agent's assume policy is extended with
+all of them.
+
+```hcl
+module "agent" {
+  source = "git::https://github.com/nullplatform/tofu-modules.git//infrastructure/aws/iam/agent?ref=v4.5.0"
+
+  agent_namespace                     = "your-agent-namespace"
+  aws_iam_openid_connect_provider_arn = "your-aws-iam-openid-connect-provider-arn"
+  cluster_name                        = "your-cluster-name"
+
+  permissions_roles = {
+    data = {
+      policy_arns = ["arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"]
+    }
+    ops = {
+      name        = "custom-ops-role"
+      policy_arns = ["arn:aws:iam::123456789012:policy/ops-policy"]
+    }
+  }
+}
+```
+
+For roles that already exist elsewhere (not created by this module), use
+`assume_role_arns` instead; the agent will be allowed to assume them directly.
+
 ## Using Outputs
 
 ```hcl
@@ -59,10 +92,9 @@ resource "example_resource" "this" {
 | Name | Type |
 |------|------|
 | [aws_iam_policy.nullplatform_assume_role_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
-| [aws_iam_policy.nullplatform_avp_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
-| [aws_iam_policy.nullplatform_eks_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
-| [aws_iam_policy.nullplatform_elb_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
-| [aws_iam_policy.nullplatform_route53_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
+| [aws_iam_role.extra_permissions](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
+| [aws_iam_role_policy_attachment.extra_permissions](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
+| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
 
 ## Inputs
 
@@ -73,6 +105,8 @@ resource "example_resource" "this" {
 | [assume\_role\_arns](#input\_assume\_role\_arns) | List of IAM role ARNs the agent is allowed to assume via sts:AssumeRole | `list(string)` | `[]` | no |
 | [aws\_iam\_openid\_connect\_provider\_arn](#input\_aws\_iam\_openid\_connect\_provider\_arn) | ARN of the AWS IAM OIDC provider for EKS service account authentication | `string` | n/a | yes |
 | [cluster\_name](#input\_cluster\_name) | Name of the cluster where the policy runs | `string` | n/a | yes |
+| [permissions\_role\_name](#input\_permissions\_role\_name) | Override for the permissions IAM role name. Defaults to nullplatform-{cluster\_name}-agent-permissions-role | `string` | `""` | no |
+| [permissions\_roles](#input\_permissions\_roles) | Additional permissions roles created by this module and assumable by the agent role. Map key is a logical name; name overrides the role name (defaults to nullplatform-{cluster\_name}-{key}); policy\_arns are the policy ARNs attached to the role. | `map(object({ name = optional(string), policy_arns = optional(list(string), []) }))` | `{}` | no |
| [policies\_name\_prefix](#input\_policies\_name\_prefix) | Override for IAM policy name prefix. Defaults to nullplatform\_{cluster\_name} | `string` | `""` | no |
| [role\_name](#input\_role\_name) | Override for the IAM role name. Defaults to nullplatform-{cluster\_name}-agent-role | `string` | `""` | no |
| [service\_account\_name](#input\_service\_account\_name) | Kubernetes service account name trusted by the IRSA role | `string` | `"nullplatform-agent"` | no |
@@ -81,22 +115,26 @@ resource "example_resource" "this" {
| Name | Description |
|------|-------------|
+| [nullplatform\_agent\_extra\_permissions\_role\_arns](#output\_nullplatform\_agent\_extra\_permissions\_role\_arns) | Map of logical name to ARN for each additional permissions role created via permissions\_roles |
+| [nullplatform\_agent\_permissions\_role\_arn](#output\_nullplatform\_agent\_permissions\_role\_arn) | Conventional ARN of the permissions role the agent role is allowed to assume. The role itself is created externally (k8s scope tofu module), not by this module. |
| [nullplatform\_agent\_role\_arn](#output\_nullplatform\_agent\_role\_arn) | ARN of the agent role |