Reissued certificates occasionally have authority_id=None

## Symptom

Some certificates created via the reissue path end up persisted with `authority_id=None` despite the input schema requiring authority. Once a cert is in this state, all subsequent reissue attempts fail because the rotation pipeline can't determine which CA to talk to.

Affected certs were reissued in distinct date clusters rather than continuously, which suggests a transient condition during particular celery cycles. Predecessors in the rotation chain have the correct `authority_id`; the broken cert is created at one specific reissue, and the next-generation reissue then inherits the broken state.
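To make the "date clusters" observation checkable, here is a minimal sketch of how the affected certs could be grouped by creation date. The function name `cluster_by_date`, the `(cert_id, created)` row shape, and the one-day gap threshold are all assumptions for illustration, and the sample rows are made up, not production data.

```python
from datetime import date, timedelta


def cluster_by_date(rows, max_gap_days=1):
    """Group (cert_id, created) pairs into runs of nearby creation dates.

    A new cluster starts whenever the gap to the previous cert exceeds
    max_gap_days. Hypothetical helper, not part of Lemur.
    """
    clusters = []
    for cert_id, created in sorted(rows, key=lambda row: row[1]):
        last_date = clusters[-1][-1][1] if clusters else None
        if last_date is not None and created - last_date <= timedelta(days=max_gap_days):
            clusters[-1].append((cert_id, created))
        else:
            clusters.append([(cert_id, created)])
    return clusters


# Made-up rows standing in for "certs with authority_id IS NULL".
sample = [
    (101, date(2024, 11, 3)),
    (102, date(2024, 11, 3)),
    (103, date(2024, 11, 4)),
    (240, date(2025, 5, 19)),
    (241, date(2025, 5, 19)),
]
clusters = cluster_by_date(sample)
```

If the real rows collapse into a handful of clusters like this, that supports a transient, per-cycle cause over a steady per-cert one.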

## Why I think the bug is real

In commit `e874f83c` ("various fixups to disable autorotate task", 2025-05-22), the codebase added defensive None-handling for `cert.authority_id` in logging:

```python
log_data["authority_name"] = "unknown"
authority = authorities_get_by_id(cert.authority_id)
if authority:
    log_data["authority_name"] = authority.name
```

That patch keeps the logger from crashing when a cert has `authority_id=None`, but it doesn't address why a cert can reach that state in the first place. The change strongly suggests the maintainers have seen such certs in their own deployment.

## Expected vs actual

`CertificateInputSchema.authority` is `fields.Nested(AssociatedAuthoritySchema, required=True)` (`lemur/certificates/schemas.py:105`), and `validate_authority` at `lemur/certificates/schemas.py:154-163` raises `ValidationError` if authority is missing or inactive. So a cert created through `service.create()` with `validate_schema(certificate_input_schema, ...)` should never lack an authority.

The `reissue_certificate` path goes through `get_certificate_primitives()` (`lemur/certificates/service.py:942`), which round-trips `CertificateOutputSchema` → `CertificateInputSchema` and asserts `not ser.errors`. So the reissue path should also enforce authority.

Both paths run the `Certificate` constructor, which sets `self.authority_id = kwargs.get("authority_id")`. That value is None, because the schema produces `authority` as an Authority object rather than `authority_id` as an integer key. After the constructor, line 554 sets `cert.authority = kwargs["authority"]`, and that relationship assignment should populate the FK on flush.
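A minimal sketch of that mechanism, using toy models against in-memory SQLite with SQLAlchemy 1.4+ imports. The `Authority` and `Certificate` classes here are simplified stand-ins I wrote for illustration, not Lemur's actual models, and `fk_before_and_after_flush` is a hypothetical helper name.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Authority(Base):
    __tablename__ = "authorities"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Certificate(Base):
    __tablename__ = "certificates"
    id = Column(Integer, primary_key=True)
    authority_id = Column(Integer, ForeignKey("authorities.id"))
    authority = relationship("Authority")

    def __init__(self, **kwargs):
        # Same shape as the constructor described above: kwargs carries
        # "authority" (an object), not "authority_id", so this is None.
        self.authority_id = kwargs.get("authority_id")


def fk_before_and_after_flush():
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        authority = Authority(name="example-ca")
        session.add(authority)
        session.flush()  # assigns authority.id

        kwargs = {"authority": authority}
        cert = Certificate(**kwargs)
        cert.authority = kwargs["authority"]  # relationship assignment only
        before = cert.authority_id  # FK attribute not synced yet

        session.add(cert)
        session.flush()  # FK is copied from the related object here
        after = cert.authority_id
        return before, after, authority.id
```

In this toy setup the FK attribute stays None until the flush and matches the authority's primary key afterwards. One consequence is that any check on `authority_id` itself only makes sense at or after flush; before that, only the `authority` relationship is populated.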

Despite this, the bug fires occasionally.

## Hypotheses (none confirmed)

1. A SQLAlchemy session/flush issue: if `kwargs["authority"]` is fetched in one session and `database.commit()` happens in another (e.g. celery worker race), the relationship may not flush the FK column.
2. An older code path or migration that bypassed the schema and is still in the lineage of some certs.
3. A specific race between `cert.authority = ...` and the commit that occasionally drops the relationship.

## Reproduction

I haven't been able to reproduce this in a controlled setup. The symptom appears in production roughly every 6 months, often across multiple certs reissued in the same celery cycle.

## Asks

1. Has the team seen this internally? The `e874f83c` log-defensive change implies yes.
2. Any insight into why the reissue path can produce certs with `authority_id=None`?
3. Would a more defensive guard on `Certificate.authority` (e.g. asserting `authority_id is not None` at the end of `__init__` or after `cert.authority = X`) be welcome?
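For ask 3, a rough sketch of the kind of guard I have in mind. Everything here is hypothetical: `CertStub` is a plain stand-in for the real model, and `MissingAuthorityError` and `ensure_authority` are names I made up. It accepts a cert if either the relationship or the FK is set, because the FK column is normally only populated on flush.

```python
from dataclasses import dataclass
from typing import Optional


class MissingAuthorityError(ValueError):
    """Raised when a cert has neither an authority object nor an authority_id."""


@dataclass
class CertStub:
    # Stand-in for the real Certificate model, for illustration only.
    name: str
    authority: Optional[object] = None
    authority_id: Optional[int] = None


def ensure_authority(cert):
    """Return cert unchanged, or raise if no authority is attached in any form."""
    if cert.authority is None and cert.authority_id is None:
        raise MissingAuthorityError(
            f"certificate {cert.name!r} has no authority and no authority_id"
        )
    return cert
```

A check like this could run right after `cert.authority = ...` in the create path, or from a SQLAlchemy `before_flush` session listener so it also covers code paths that bypass the schema. Asserting on `authority_id` alone immediately after the constructor would likely reject valid certs whose FK has not been flushed yet.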

Happy to send a PR if there's a direction you'd prefer.
