
[SPARK-52382][PYTHON] Fix TypeError when collecting MapType with ArrayType keys#55413

Open
yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-52382-map-array-key

Conversation

@yadavay-amzn

What changes were proposed in this pull request?

Fix TypeError: unhashable type: 'list' when calling collect() on a DataFrame with MapType(ArrayType(...), ...) columns.

JVM side (EvaluatePython.scala): Added makeHashable(), which recursively converts java.util.ArrayList map keys to Array before pickling, because Pyrolite pickles Java arrays as Python tuples (hashable) rather than lists (unhashable).

Python side (types.py, conversion.py): Added _make_hashable() in MapType converters to convert list keys to tuples, covering both the classic pickle path and the Arrow/Spark Connect path.

Why are the changes needed?

MapType with ArrayType keys is valid per the PySpark documentation, but collect() fails because:

  1. JVM converts array keys to java.util.ArrayList
  2. Pyrolite pickles these as Python lists
  3. Python dicts require hashable keys, and lists are not hashable
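Step 3 can be reproduced with plain Python, no Spark required; this is exactly the error the pickle path hits when it builds the map on the Python side:

```python
# Attempting to use a list as a dict key raises TypeError immediately.
try:
    {["a", "b"]: "value"}
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # "unhashable type: 'list'"
```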

Does this PR introduce any user-facing change?

Yes. DataFrame.collect() now works correctly for MapType columns with ArrayType keys. Array keys are returned as tuples instead of raising TypeError.
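The shape of the new result can be illustrated with a stand-in dict (not actual collect() output, which requires a SparkSession):

```python
# After the fix, a collected MapType(ArrayType(StringType()), StringType())
# cell behaves like an ordinary dict whose keys are tuples of strings:
cell = {("a", "b"): "v1", ("c",): "v2"}
assert ("a", "b") in cell        # tuples are hashable, so lookups work
value = cell[("a", "b")]
```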

How was this patch tested?

Added test_collect_map_with_array_key in test_collection.py that creates a DataFrame with MapType(ArrayType(StringType()), StringType()), calls collect(), and verifies the result.

Was this patch authored or co-authored using generative AI tooling?

Yes.

…yType keys

When a DataFrame has a MapType column with ArrayType keys (e.g.,
MapType(ArrayType(StringType()), StringType())), calling collect()
raises TypeError: unhashable type: 'list'.

Root cause: The JVM-side EvaluatePython.toJava converts ArrayType data
to java.util.ArrayList, which Pyrolite pickles as Python lists. Since
Python lists are unhashable, they cannot be used as dict keys when the
map is deserialized on the Python side.

Fix:
- JVM side (EvaluatePython.scala): Add makeHashable() that recursively
  converts ArrayList to Array for map keys, so Pyrolite pickles them as
  Python tuples (which are hashable).
- Python side (types.py, conversion.py): Add _make_hashable() in the
  MapType converters to convert any list keys to tuples, handling both
  the classic pickle path and the Arrow/Spark Connect path.

Closes #XXXXX
yadavay-amzn force-pushed the fix/SPARK-52382-map-array-key branch from b8681aa to 67778a1 on April 21, 2026 01:38
