Skip to content

Wrong input column for exploded blocking columns when expand_length not set #145

Description

@riley-harper

When working on #142, I noticed that in hlink/linking/matching/link_step_explode.py, if expand_length is not set for a blocking column, we run the following code:

explode_col_expr = explode(col(exploding_column_name))

However, the rest of the code treats exploding_column_name as the output column name and derived_from_column as the input column name. So I think there is a bug here. This should be

explode_col_expr = explode(col(derived_from_column))

instead unless I am misunderstanding something. This is probably a low-impact bug as you need to be blocking on an input column that is an array type to hit it. I believe that most exploded columns are integer columns with expand_length set.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    Fields

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions