Skip to content

Fix: Add stevebauman/purify for HTML sanitization#51

Merged
ardelato merged 12 commits into
hermesfrom
fix/xss-remediation
Apr 27, 2026
Merged

Fix: Add stevebauman/purify for HTML sanitization#51
ardelato merged 12 commits into
hermesfrom
fix/xss-remediation

Conversation

@ardelato
Copy link
Copy Markdown
Collaborator

Description

Addresses multiple stored XSS vulnerabilities where user-controlled content was rendered as raw HTML. A security report flagged group names as a vector, but our investigation found the more critical issue is in rich-text description fields (free_text) and flash messages that interpolate user data into HTML without escaping.

Changes

Fix 1: Sanitize rich-text on storage

  • Adds stevebauman/purify (HTMLPurifier wrapper) to sanitize group and event descriptions before saving to the database
  • Strips dangerous tags (<script>, event handlers like onerror) while preserving safe HTML (<b>, <a>, <p>, etc.)
  • Files: API/GroupController.php, API/EventController.php

Fix 2: Escape user data in flash messages and v-html

  • Wraps user-controlled values ($group->name, $event->venue, $user->name, $network->name) in e() before interpolation into translation strings rendered via {!! !!}
  • HTML-escapes group name in Vue before v-html rendering
  • Files: GroupController.php, AcceptUserInvites.php, networks/show.blade.php, GroupPage.vue

QA Notes

Deploy to the staging instance and validate script tags are stripped.

Adds HTMLPurifier via the stevebauman/purify Laravel wrapper to
sanitize user-supplied HTML content before storage, preventing
stored XSS attacks through rich-text fields.
Group and event descriptions (free_text) were stored as raw user
HTML and rendered unescaped, allowing stored XSS. Now sanitized
with HTMLPurifier before storage, stripping dangerous tags like
script and event handlers while preserving safe HTML.
User-controlled values (group names, event venues, user names,
network names) were interpolated into HTML translation strings
and rendered unescaped via {!! !!} or v-html. Now all user data
is escaped with e() (PHP) or DOM textContent (JS) before
interpolation into HTML contexts.
Publishes the default HTMLPurifier config from stevebauman/purify,
which controls which HTML tags and attributes are allowed through
sanitization.
mlahargou
mlahargou previously approved these changes Apr 22, 2026
Copy link
Copy Markdown
Member

@mlahargou mlahargou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CR 📱 Cool!

@ardelato
Copy link
Copy Markdown
Collaborator Author

ardelato commented Apr 22, 2026

dev_block ⚡ it seems this doesn't fully cover all edge cases.

Comment thread config/purify.php
Comment thread config/purify.php
Comment thread resources/js/components/GroupPage.vue Outdated
@sctice-ifixit
Copy link
Copy Markdown
Member

Sorry, I lost a comment: that review comprises lightly-edited Claude review notes. I also think it's worth considering whether we can add some regression tests.

Claude also mentions that Purify can have a 100–300ms cold start to set up its definitions, and that's paid by the first request. I think that's probably fine (I don't know what we'd do about it that's worth the trouble) unless caching doesn't work as we expect.

Adds sanitization mutators at the model layer so all write paths
(controllers, CSV imports, tinker, seeders) are covered. Plain-text
fields like names and venues use strip_tags() to remove HTML entirely.
Rich-text fields like descriptions and biographies use Purify::clean()
to strip dangerous tags while preserving safe formatting HTML.
Model mutators now handle sanitization of free_text fields, so the
explicit Purify::clean() calls in the API controllers are redundant
and can be removed.
Escapes group.name before passing to ConfirmModal's v-html rendered
delete/archive confirmation messages in GroupActions.vue. Also escapes
user name in the soft-delete flash message in UserController to prevent
stored XSS when an admin deletes a user with a malicious name.
Adds client-side HTML sanitization via DOMPurify to the ReadMore
component, which renders group and event descriptions via v-html.
Server-side Purify::clean() model mutators are the primary defense,
but this provides a second layer in case any write path bypasses
the model (raw DB queries, migrations, imports).
Updates HTMLPurifier config to use HTML 5 doctype for consistency
with Html5Definition, adds URI.AllowedSchemes to block javascript:
and data: URLs in sanitized HTML, and extracts the inline DOM
escaping pattern into a named escapeHtml() method for readability.
HTMLPurifier doesn't support HTML 5 as a doctype value — it throws at
runtime. The Html5Definition class from stevebauman/purify adds HTML5
element support on top of the HTML 4.01 base, so HTML 4.01 Transitional
+ Html5Definition is the intended pairing.
Replaces raw <?php echo ?> with Blade {{ }} for user name and
location in the user list template. Adds setLocationAttribute
mutator to User model to strip HTML tags from the location field
on storage, preventing stored XSS via the location/town field.
@ardelato
Copy link
Copy Markdown
Collaborator Author

ardelato commented Apr 23, 2026

un_dev_block ⚡

QA 👍 there were a lot more locations, but things look good now

Comment thread app/Models/Group.php

public function setFreeTextAttribute($value)
{
$this->attributes['free_text'] = $value === null ? null : Purify::clean((string) $value);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to sanitize input text is a mistake I think. This concept has been around for decades, and the solution is almost always: Treat all input as dangerous, and thus escape appropriately it when outputting.

Why are we trying to clean the human input here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a safety measure, we've audited the code and until we address every location, we are taking the extra step to sanitize the input as well.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main reason to clean human input going forward (where we want to end up) would be if it's allowed to contain HTML. I think that may be applicable here (edited via rich text editor).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main reason to clean human input going forward (where we want to end up) would be if it's allowed to contain HTML

That evidence sounds like an argument for not trying to sanitize inputs.

Even if that's the case (we accept html-containing input), I still think it's better to keep the original value and strip / sanitize on output.

Rules about what to strip / sanitize change, which makes it much harder to retroactively apply those changes if we are depending on DB content already being safe.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've convinced me it's better to sanitize on output.


@if(App\Helpers\Fixometer::hasRole($user, 'Administrator'))
<a href="/user/edit/<?php echo $u->id; ?>"><?php echo $u->name; ?></a>
<a href="/user/edit/{{ $u->id }}">{{ $u->name }}</a>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about this templating system, so I don't even know what to search for, but this {{ pattern seems to be the canonical way to safely echo variables.

I see lots of other places in the codebase that use echo and it's not clear why the dangerous way was chosen. Seems like we should fix all of those.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the scope has gotten too big right now. We also discovered other issues but will be addressing them in follow-up pulls.

{!! __('networks.general.count', [
'count' => $network->groups->count(),
'name' => $network->name,
'name' => e($network->name),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we need to escape html here indicates to me that the __() function is designed to produce or allow raw html and doesn't escape its output.

This leaves it dangerous by default and thus this likely will happily output html-like content in translated strings.

Do we control the __() function? If so, we might want to consider making it safe-by-default like our php variant. i.e. you have to explicitly mark a string as html(...).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't control __() its Laravel's built-in translation helper.

@danielbeardsley
Copy link
Copy Markdown
Member

dev_block 👍 on some more conversations.

Copy link
Copy Markdown
Member

@sctice-ifixit sctice-ifixit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes from Claude:

The overall shape of the fix is right: model mutators cover every write path (controllers, API, CSV imports, seeders, tinker), e() at the flash-message interpolation sites is the correct Blade pattern, and DOMPurify in ReadMore.vue gives a real second layer for the rich-text sink.

Main gap: two flash-message sites still interpolate user data into translated HTML rendered via {!! !!}.

  • app/Http/Controllers/GroupController.php:378'emails' => rtrim(implode(', ', $not_sent)) passed into __('groups.invite_success_apart_from', [...]), rendered via {!! \Session::get('warning') !!}
  • app/Http/Controllers/PartyController.php:706 — same shape with __('events.invite_invalid_emails', ['emails' => implode(', ', $not_sent)])

Both hold strings that failed email validation, so exploitation is narrow, but this PR is exactly about applying the e() pattern everywhere user data meets {!! !!}. Worth the consistency.

Smaller notes left inline below.

@@ -4,10 +4,10 @@
<p v-html="formattedString"></p>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This v-html renders formattedString (derived from this.text) with no sanitization. Every current call site uses :html="..." so the branch is unreachable today, but leaving an unsanitized v-html sink in the component invites a future caller to wire up the text prop and ship a vuln. Either delete the <span v-if="text"> block and the text prop, or route formattedString through DOMPurify for symmetry with the html branch.

Line 87 reinforces that this path isn't exercised — text.length > maxChars is missing this. and would throw if the branch ever ran.

Comment thread config/purify.php
'URI.AllowedSchemes' => [
'http' => true,
'https' => true,
'mailto' => true,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: add 'tel' => true here. Mobile click-to-call links are a reasonable thing to allow through sanitized descriptions, and tel: URIs don't carry XSS risk the way javascript: and data: do.

}
},
methods: {
escapeHtml(str) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This escapeHtml method is duplicated verbatim in resources/js/components/GroupPage.vue:163. Worth pulling into a small shared helper (resources/js/helpers.js or a mixin) — the pattern is going to spread as more v-html sites adopt this defense, and having one canonical implementation means the next dev doesn't have to decide whether to copy or refactor.

}
},
methods: {
escapeHtml(str) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicates the escapeHtml in GroupActions.vue:103. See the note there on extracting to a shared helper.

@sctice-ifixit
Copy link
Copy Markdown
Member

I think it would be great to convert uses of <? echo … ?> but that sounds out of scope for this pull. Generally speaking, I think we want a policy something like:

  • Clean (Purify) user text that is allowed to contain HTML on save
  • Avoid encoding as much as possible before the view layer
  • Use {{ … }} to encode user text (that cannot contain HTML) wherever we can in the view layer
  • Use e() in the controller layer when interpolating user text (that cannot contain HTML) into HTML that's then going to be passed as safe HTML into the view layer or output directly (but we'd like to minimize situations like this)

@danielbeardsley
Copy link
Copy Markdown
Member

but that sounds out of scope for this pull.

Correct me if I'm wrong... but isn't that the point of this pull? Isn't this the vulnerability? Maybe you meant "Fixing more instances of this vulnerability than the one that was noticed should happen in a follow up pull in the interest of expediency"?

Clean (Purify) user text that is allowed to contain HTML on save

I still see this as ignoring decades of learning all across the industry. If cleaning is expensive, cache it. Don't trust stored DB data to be safe unless the DB types define it as such (i.e. int columns can't contain html)

@sctice-ifixit
Copy link
Copy Markdown
Member

Maybe you meant "Fixing more instances of this vulnerability than the one that was noticed should happen in a follow up pull in the interest of expediency"?

Yes. We're already fixing up everything we've found by doing an initial audit. We can do successive, more exhaustive audits as follow-ups.

Clean (Purify) user text that is allowed to contain HTML on save

I still see this as ignoring decades of learning all across the industry. If cleaning is expensive, cache it. Don't trust stored DB data to be safe unless the DB types define it as such (i.e. int columns can't contain html)

Fair. This is also something that I think can be improved in follow-up work.

@ardelato
Copy link
Copy Markdown
Collaborator Author

un_dev_block ⚡ I am going to address all other things in a follow-up pull

@mlahargou
Copy link
Copy Markdown
Member

CR 📱

@ardelato ardelato merged commit 64b9602 into hermes Apr 27, 2026
@ardelato ardelato deleted the fix/xss-remediation branch April 27, 2026 17:45
@ardelato ardelato mentioned this pull request Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants