Fix tuple syntax when using key sequence in distinct() #804

Open
Kirkman wants to merge 1 commit into wireservice:master from PostDispatchInteractive:distinct_fix
@Kirkman Kirkman commented Apr 20, 2026

I have noticed in the past that Table.distinct() does not deduplicate when the key is a list or sequence of column names.

Code like this (wrongly) results in no rows being dropped ...

table = table.distinct(['PARCELID', 'SITEADDR'])

... whereas code like this does successfully drop duplicate rows:

table = table.distinct(lambda row: str(row['PARCELID']) + str(row['SITEADDR']))

I believe the cause is line 31 of agate/table/distinct.py, which constructs the row's key for de-duplicating:

k = (row[j] for j in key)

I think the parentheses here create a generator expression, not a tuple. Each generator object is distinct and compares by identity, so k never matches a previously seen key and no de-duplication ever occurs.

My suggested fix is to wrap it with tuple(...):

k = tuple(row[j] for j in key)
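
To illustrate the difference outside of agate, here is a minimal, self-contained sketch. The `dedupe` helper and the sample rows are hypothetical stand-ins I made up for this comment, not agate's actual implementation; the helper only mimics the general "build a key, skip the row if the key was already seen" pattern.

```python
def dedupe(rows, key, make_key):
    """Hypothetical stand-in for a distinct() loop: keep the first row per key."""
    seen = []
    kept = []
    for row in rows:
        k = make_key(row, key)
        # Generators compare by identity, tuples compare by value.
        if k not in seen:
            seen.append(k)
            kept.append(row)
    return kept


# Made-up sample data: the first two rows are duplicates on both columns.
rows = [
    {'PARCELID': 1, 'SITEADDR': '10 Main St'},
    {'PARCELID': 1, 'SITEADDR': '10 Main St'},
    {'PARCELID': 2, 'SITEADDR': '22 Oak Ave'},
]
key = ['PARCELID', 'SITEADDR']

# Current behavior: parentheses alone produce a generator object.
with_generator = dedupe(rows, key, lambda row, key: (row[j] for j in key))

# Proposed behavior: tuple(...) produces a value that compares by content.
with_tuple = dedupe(rows, key, lambda row, key: tuple(row[j] for j in key))

print(len(with_generator))  # 3, nothing was dropped
print(len(with_tuple))      # 2, the duplicate row was dropped
```

With the generator version every key is a fresh object, so the membership check never finds a match; with the tuple version the two identical rows produce equal keys and the second one is skipped.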
