This document describes how the Salesforce PDF Generation system implements idempotency to prevent duplicate document generation for identical requests.
Idempotency ensures that making the same request multiple times produces the same result without unwanted side effects (e.g., generating the same document twice). This is critical for:
- User error prevention: Protect against double-clicks or accidental retries
- Network reliability: Handle transient failures and retries safely
- Cost optimization: Avoid wasteful regeneration of identical documents
- Audit consistency: Maintain a single source of truth for each unique request
Apex is responsible for computing a deterministic hash that uniquely identifies a request. This hash is used as an External ID on the Generated_Document__c object.
RequestHash = sha256(templateId | outputFormat | sha256(canonical_json(data)))
Where:
- templateId: Salesforce ContentVersionId of the template (18 chars)
- outputFormat: PDF or DOCX
- data: The complete data envelope (Account, Opportunity, LineItems, etc.)
See DocgenEnvelopeService.computeHash():
public static String computeHash(String templateId, String outputFormat, String dataJson) {
    // Create deterministic input string
    String input = templateId + '|' + outputFormat + '|' + dataJson;
    // Generate SHA-256 hash
    Blob hash = Crypto.generateDigest('SHA-256', Blob.valueOf(input));
    // Convert to hex and prepend prefix
    return 'sha256:' + EncodingUtil.convertToHex(hash);
}

Example Output:
sha256:a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6q7r8s9t0u1v2w3x4y5z6a7b8c9d0e1f2
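For readers who want to reproduce a hash outside Salesforce (for example, when debugging from Node), the following is a minimal TypeScript sketch of the same construction. The function name `computeRequestHash` and the sample template ID are hypothetical; the sketch assumes `dataJson` has already been serialized deterministically and that both sides encode the input as UTF-8.

```typescript
import { createHash } from "node:crypto";

// Hypothetical mirror of DocgenEnvelopeService.computeHash(); not part of the codebase.
// Assumes dataJson was serialized deterministically by the caller.
function computeRequestHash(
  templateId: string,
  outputFormat: "PDF" | "DOCX",
  dataJson: string
): string {
  // Same pipe-delimited input string as the Apex method
  const input = `${templateId}|${outputFormat}|${dataJson}`;
  // SHA-256 digest, hex-encoded, with the "sha256:" prefix
  const hex = createHash("sha256").update(input, "utf8").digest("hex");
  return `sha256:${hex}`;
}

const sampleHash = computeRequestHash(
  "068xx0000000001AAA", // made-up ContentVersionId
  "PDF",
  '{"Account":{"Name":"Acme"}}'
);
console.log(sampleHash.length); // 71 = 7-character prefix + 64 hex characters
```

At 71 characters the value fits within a Text(80) field.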
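Before making an HTTP callout to the Node API, the Apex DocgenController checks if a document with the same RequestHash__c has already been successfully generated within the last 24 hours. Note that the query uses the SOQL date literal LAST_N_DAYS:1, which is evaluated against calendar days rather than a rolling 24-hour period, so the effective window is approximate.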
- Performance: Limits query scope to recent documents
- Flexibility: Allows re-generation if business data changes over time
- Balance: Prevents immediate duplicates while not caching forever
See DocgenController.checkExistingDocument():
private static Generated_Document__c checkExistingDocument(String requestHash) {
    List<Generated_Document__c> existing = [
        SELECT Id, OutputFileId__c, Status__c, CreatedDate, CorrelationId__c
        FROM Generated_Document__c
        WHERE RequestHash__c = :requestHash
        AND Status__c = 'SUCCEEDED'
        AND CreatedDate = LAST_N_DAYS:1 // 24-hour cache window
        ORDER BY CreatedDate DESC
        LIMIT 1
    ];
    return existing.isEmpty() ? null : existing[0];
}

- Cache Hit: Return existing ContentVersionId download URL immediately (no callout, no DML)
- Cache Miss: Proceed with document generation
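The hit/miss decision can be modeled outside Salesforce as a filter over records. The TypeScript sketch below is illustrative only: the `GeneratedDocument` interface and `findCachedDocument` function are hypothetical stand-ins for the SOQL query, and it assumes a rolling 24-hour window.

```typescript
// Hypothetical in-memory stand-in for Generated_Document__c rows.
interface GeneratedDocument {
  requestHash: string;
  status: "QUEUED" | "PROCESSING" | "SUCCEEDED" | "FAILED" | "CANCELED";
  outputFileId: string | null;
  createdDate: Date;
}

// Assumption: a rolling 24-hour window.
const CACHE_WINDOW_MS = 24 * 60 * 60 * 1000;

// Returns the newest SUCCEEDED document for the hash inside the window,
// or null on a cache miss.
function findCachedDocument(
  docs: GeneratedDocument[],
  requestHash: string,
  now: Date
): GeneratedDocument | null {
  const cutoff = now.getTime() - CACHE_WINDOW_MS;
  const matches = docs
    .filter(
      (d) =>
        d.requestHash === requestHash &&
        d.status === "SUCCEEDED" &&
        d.createdDate.getTime() >= cutoff
    )
    // Newest first, like ORDER BY CreatedDate DESC
    .sort((a, b) => b.createdDate.getTime() - a.createdDate.getTime());
  return matches.length > 0 ? matches[0] : null;
}
```

A non-null result corresponds to the cache-hit branch (return the existing file); `null` corresponds to the cache-miss branch (proceed with generation).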
Node does NOT implement its own idempotency check.
Per the T-12 design decisions:
- Apex handles idempotency via pre-callout check
- Node relies on Apex to prevent duplicate requests
- This simplifies Node implementation and avoids redundant queries
| Consideration | Reasoning |
|---|---|
| Separation of concerns | Apex owns the request orchestration and caching strategy |
| Performance | Avoids an extra Salesforce query from Node |
| Consistency | Single source of truth (Apex) for cache policy |
| Simplicity | Node focuses on generation, not request deduplication |
For batch generation (T-14), the Node poller will process Generated_Document__c records directly. Idempotency will still be enforced via:
- Unique External ID constraint on RequestHash__c (prevents duplicate inserts)
- Status-based locking (LockedUntil__c field prevents concurrent processing)
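As a rough illustration of status-based locking, the TypeScript sketch below shows how a poller might decide whether it may claim a record. The `QueuedDocument` shape, the `tryClaim` function, and the lease duration are all assumptions made for this example; the T-14 design may differ, and a real claim would also need to be applied atomically on the Salesforce side.

```typescript
// Hypothetical view of the lock-related fields on Generated_Document__c.
interface QueuedDocument {
  id: string;
  status: "QUEUED" | "PROCESSING" | "SUCCEEDED" | "FAILED" | "CANCELED";
  lockedUntil: Date | null; // LockedUntil__c
  attempts: number; // Attempts__c
}

// Claims the record only if it is QUEUED and no unexpired lock exists.
// Returns true when the caller now owns the record for leaseMs milliseconds.
function tryClaim(doc: QueuedDocument, now: Date, leaseMs: number): boolean {
  const lockActive =
    doc.lockedUntil !== null && doc.lockedUntil.getTime() > now.getTime();
  if (doc.status !== "QUEUED" || lockActive) {
    return false;
  }
  doc.status = "PROCESSING";
  doc.lockedUntil = new Date(now.getTime() + leaseMs);
  doc.attempts += 1;
  return true;
}
```

A second poller that calls `tryClaim` on the same record before the lease expires gets `false` and skips it.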
| Field | Type | Purpose |
|---|---|---|
| RequestHash__c | Text(80), External ID, Unique | Idempotency key |
| Status__c | Picklist | QUEUED, PROCESSING, SUCCEEDED, FAILED, CANCELED |
| OutputFileId__c | Text(18) | ContentVersionId of generated PDF |
| MergedDocxFileId__c | Text(18) | ContentVersionId of merged DOCX (if stored) |
| RequestJSON__c | Long Text Area | Full request envelope (for audit/replay) |
| Attempts__c | Number | Retry counter (for batch flows) |
| LockedUntil__c | Datetime | Pessimistic lock for poller (prevents double-processing) |
| CreatedDate | Datetime | Timestamp for cache window queries |
| CorrelationId__c | Text(36) | UUIDv4 for tracing |

sequenceDiagram
autonumber
participant U as User (Browser)
participant L as LWC Button
participant A as Apex Controller
participant N as Node API
participant SF as Salesforce (Files)
U->>L: Click "Generate PDF"
L->>A: @AuraEnabled generate(templateId, recordId, 'PDF')
A->>A: Build envelope + compute RequestHash
A->>A: Query for existing doc (RequestHash, Status=SUCCEEDED, last 24h)
alt Cache Hit (Idempotent Request)
A-->>L: Return existing ContentVersionId download URL
Note over A,L: NO HTTP callout, NO DML, instant response
else Cache Miss (New Request)
A->>A: Create Generated_Document__c (Status=PROCESSING)
A->>A: Set envelope.generatedDocumentId
A->>N: POST /generate with envelope (includes generatedDocumentId)
N->>SF: Download template
N->>N: Merge DOCX + Convert to PDF
N->>SF: Upload PDF (ContentVersion)
N->>SF: Create ContentDocumentLinks
N->>SF: Update Generated_Document__c (Status=SUCCEEDED, OutputFileId__c)
N-->>A: 200 {downloadUrl, contentVersionId, correlationId}
A-->>L: Return downloadUrl
L->>U: Open PDF in new tab
end
The hash must be deterministic across runs:
- JSON serialization must be canonical (consistent key ordering)
- Apex JSON.serialize() provides this guarantee
- Avoid including volatile fields (timestamps, user IDs) in the data envelope
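To illustrate what "canonical" means here, the following TypeScript sketch serializes a value with object keys sorted at every level, so two envelopes with the same content but different key order produce the same string. The `canonicalJson` helper is hypothetical and assumes plain JSON-compatible values (no Date or class instances); it is not the serializer the Apex code uses.

```typescript
// Recursively serializes a JSON-compatible value with object keys sorted.
// Hypothetical helper for illustration; array order is preserved as-is.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalJson(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .filter((key) => obj[key] !== undefined) // JSON has no "undefined"
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  // Strings, numbers, booleans, and null
  return JSON.stringify(value);
}

console.log(canonicalJson({ b: 1, a: { d: 2, c: 3 } }));
// {"a":{"c":3,"d":2},"b":1}
```

Because the output does not depend on key insertion order, hashing it yields the same digest for logically identical envelopes.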
The RequestHash__c field has:
- Unique constraint (enforced by Salesforce)
- External ID flag (indexed for fast lookups)
This prevents accidental duplicate inserts even if the Apex check is bypassed.
Scenario: Two users click "Generate" simultaneously for the same data.
Protection:
- Apex idempotency check runs in parallel for both requests
- Both might miss the cache (no existing document yet)
- Both create Generated_Document__c records
- Salesforce enforces unique constraint on RequestHash__c
- One insert succeeds, the other fails with DUPLICATE_VALUE error
- The failed request can retry and hit the cache
Result: Only one document is generated; both users eventually get the same file.
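The race can be simulated with an in-memory table whose key acts as the unique index. In this TypeScript sketch, `DocumentTable` and `insertOrReuse` are hypothetical names, and the retry is collapsed into an immediate lookup; in the real flow the losing request would retry the cache check once the winner reaches SUCCEEDED.

```typescript
interface DocRow {
  id: string;
  requestHash: string;
}

// Hypothetical in-memory table; the Map key plays the role of the unique
// index on RequestHash__c.
class DocumentTable {
  private rows = new Map<string, DocRow>();
  private nextId = 1;

  // Mimics an insert that the platform rejects with DUPLICATE_VALUE.
  insert(requestHash: string): DocRow {
    if (this.rows.has(requestHash)) {
      throw Object.assign(new Error(`DUPLICATE_VALUE: ${requestHash}`), {
        code: "DUPLICATE_VALUE",
      });
    }
    const row: DocRow = { id: `doc-${this.nextId++}`, requestHash };
    this.rows.set(requestHash, row);
    return row;
  }

  findByHash(requestHash: string): DocRow | null {
    return this.rows.get(requestHash) ?? null;
  }
}

// Called after the cache check has already missed. Returns the row to use
// and whether this caller is the one that should run generation.
function insertOrReuse(
  table: DocumentTable,
  requestHash: string
): { row: DocRow; created: boolean } {
  try {
    return { row: table.insert(requestHash), created: true };
  } catch (err) {
    if ((err as { code?: string }).code !== "DUPLICATE_VALUE") throw err;
    const existing = table.findByHash(requestHash);
    if (existing === null) throw err; // row disappeared before the lookup
    return { row: existing, created: false };
  }
}

const raceTable = new DocumentTable();
const firstRequest = insertOrReuse(raceTable, "sha256:abc");
const secondRequest = insertOrReuse(raceTable, "sha256:abc");
console.log(
  firstRequest.created,
  secondRequest.created,
  firstRequest.row.id === secondRequest.row.id
); // true false true
```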
AND CreatedDate = LAST_N_DAYS:1

Pros:
- Simple to implement and understand
- Prevents immediate duplicates (most common case)
- Allows re-generation if data changes overnight
Cons:
- Documents are regenerated even if data hasn't changed after 24 hours
- No awareness of actual data changes
Instead of time-based expiry, detect if the source data has changed:
// Sketch of a possible future enhancement (DataHash__c does not exist yet)
if (existingDoc != null) {
    if (currentDataHash == existingDoc.DataHash__c) {
        // Data unchanged - reuse document
        return existingDoc;
    }
    // Data changed - fall through and regenerate
}

This would require adding a DataHash__c field to track the content hash separately from the full request hash.
See DocgenControllerTest.testIdempotency():
@isTest
static void testIdempotency() {
    // Create template and account
    Docgen_Template__c template = createTestTemplate();
    Account acc = createTestAccount();
    // First request - generates document
    String firstUrl = DocgenController.generate(template.Id, acc.Id, 'PDF');
    // Verify Generated_Document__c was created
    List<Generated_Document__c> docs = [SELECT Id, RequestHash__c, Status__c FROM Generated_Document__c];
    System.assertEquals(1, docs.size());
    // Second request with SAME data - should hit cache
    String secondUrl = DocgenController.generate(template.Id, acc.Id, 'PDF');
    // Verify NO new document was created
    docs = [SELECT Id FROM Generated_Document__c];
    System.assertEquals(1, docs.size(), 'Idempotency check should prevent duplicate');
    // Verify same download URL returned
    System.assertEquals(firstUrl, secondUrl);
}

See test/sf.files.test.ts:
it('should upload PDF, create links, and update Generated_Document__c', async () => {
  const mockRequest: DocgenRequest = {
    // ... full request payload
    generatedDocumentId: 'a00xx000000gdocXXX', // Passed from Apex
  };
  // Mock the Salesforce PATCH that updates Generated_Document__c
  nock('https://test.salesforce.com')
    .patch(`/services/data/v59.0/sobjects/Generated_Document__c/${mockRequest.generatedDocumentId}`)
    .reply(204);
  // Call upload function
  const result = await uploadAndLinkFiles(pdfBuffer, null, mockRequest, api);
  // Verify the upload returned a PDF ContentVersionId
  expect(result.pdfContentVersionId).toBeTruthy();
});

| Metric | Description | Alert Threshold |
|---|---|---|
| idempotency_cache_hit_rate | % of requests served from cache | < 30% (investigate if low) |
| duplicate_hash_errors | Number of unique constraint violations | > 10/hour (race conditions) |
| cache_hit_latency_ms | Time to serve cached response | > 100ms (query performance issue) |

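The thresholds in the table can be expressed as a small check over aggregated counters. This TypeScript sketch is an assumption about how such a check might look: the `IdempotencyStats` shape and `evaluateAlerts` function are hypothetical, and how the counters are collected is out of scope.

```typescript
// Hypothetical aggregate for one monitoring window.
interface IdempotencyStats {
  totalRequests: number;
  cacheHits: number;
  duplicateHashErrorsPerHour: number;
  cacheHitLatencyMs: number;
}

// Returns the names of the metrics that breach the thresholds in the table.
function evaluateAlerts(stats: IdempotencyStats): string[] {
  const alerts: string[] = [];
  // Skip the hit-rate check when there is no traffic to avoid dividing by zero
  if (stats.totalRequests > 0) {
    const hitRatePercent = (stats.cacheHits / stats.totalRequests) * 100;
    if (hitRatePercent < 30) alerts.push("idempotency_cache_hit_rate");
  }
  if (stats.duplicateHashErrorsPerHour > 10) alerts.push("duplicate_hash_errors");
  if (stats.cacheHitLatencyMs > 100) alerts.push("cache_hit_latency_ms");
  return alerts;
}
```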
All idempotency events are logged with correlation IDs:
System.debug('Idempotency cache hit: RequestHash=' + requestHash + ', CorrelationId=' + correlationId);

logger.info(
  { requestHash, existingDocId, correlationId },
  'Idempotency check: cache hit'
);

Diagnosis: Check if the request is within the 24-hour cache window.
Solution:
- Manually delete the Generated_Document__c record to force regeneration
- Wait 24 hours for cache to expire naturally
- (Future) Implement data change detection
Diagnosis: Race condition where two simultaneous requests create documents with the same RequestHash__c.
Expected Behavior: This is normal and self-correcting. One request succeeds, the other retries and hits the cache.
Action: Monitor frequency. If > 10/hour, investigate:
- Is the cache check working? (Query might be missing index)
- Are requests truly simultaneous, or is there a bug causing rapid retries?
Diagnosis: Users are requesting unique documents (different data every time).
Expected Behavior: This is normal if data changes frequently.
Action:
- Review cache window (24 hours might be too short)
- Check if volatile fields (e.g., timestamps) are accidentally included in data envelope
SHA-256 provides 2^256 possible hashes, making collisions computationally infeasible. Even with billions of documents, the probability of a collision is negligible.
The RequestHash__c field is a one-way hash, not the raw data, so the source data cannot practically be recovered from it.
However, RequestJSON__c stores the full envelope (including all data). Ensure:
- Field-level security restricts access to admins only
- (Optional) Enable Salesforce Shield Platform Encryption for RequestJSON__c
The RequestJSON__c field may contain personal data (names, emails, addresses).
Compliance:
- Retention policy: Automatically delete Generated_Document__c records older than required retention period (e.g., 7 years for financial documents)
- Right to erasure: Provide mechanism to purge documents on user request
- Encryption: Use Salesforce Shield for encryption-at-rest
| Layer | Responsibility | Implementation |
|---|---|---|
| Apex | Compute RequestHash | DocgenEnvelopeService.computeHash() |
| Apex | Pre-callout cache check | DocgenController.checkExistingDocument() |
| Salesforce DB | Enforce uniqueness | RequestHash__c External ID constraint |
| Node | Generate & upload | No idempotency logic (trusts Apex) |
Key Takeaway: Idempotency is Apex-owned, with a 24-hour cache window for interactive flows. Node focuses on generation, not deduplication.
Hash algorithm: sha256(compositeDocId | outputFormat | JSON.serialize(recordIds) | computeDataHash(compositeData)).
Key Differences: Uses compositeDocId (not templateId), includes recordIds map, hashes all namespace data.
Hash Changes When: compositeDocId changes, outputFormat changes, recordIds map changes (values OR keys), data in any namespace changes.
Hash Unchanged When: Template sequence changes, junction IsActive changes, template DOCX content changes.
Cache Performance: Composites have ~40-60% hit rate vs ~70-80% for single templates due to multi-dimensional data volatility.
Best Practices: Use consistent recordIds key names, avoid volatile data (e.g., Datetime.now()), query only fields used in templates.
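The composite hash described above can be sketched in TypeScript as follows. The function names are hypothetical, `dataHash` stands in for the output of computeDataHash(compositeData) (whose implementation is not shown in this document), and sorting the recordIds keys before serialization is an assumption made so that insertion order does not matter. The sketch returns a bare hex digest; whether the real value carries a prefix is not stated here.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Serializes the recordIds map with keys sorted (assumption), so the same
// key/value pairs always produce the same string.
function serializeRecordIds(recordIds: Record<string, string>): string {
  const sorted: Record<string, string> = {};
  for (const key of Object.keys(recordIds).sort()) {
    sorted[key] = recordIds[key];
  }
  return JSON.stringify(sorted);
}

// Hypothetical composite hash: compositeDocId | outputFormat | recordIds | dataHash
function computeCompositeHash(
  compositeDocId: string,
  outputFormat: "PDF" | "DOCX",
  recordIds: Record<string, string>,
  dataHash: string
): string {
  const input = [
    compositeDocId,
    outputFormat,
    serializeRecordIds(recordIds),
    dataHash,
  ].join("|");
  return sha256Hex(input);
}
```

Renaming a recordIds key (for example, `account` to `acct`) changes the serialized map and therefore the hash, which is why consistent key names matter for cache hits.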
See: Architecture