[SPARK-52382][PYTHON] Fix TypeError when collecting MapType with ArrayType keys by yadavay-amzn · Pull Request #55413 · apache/spark

yadavay-amzn · 2026-04-19T07:08:33Z

What changes were proposed in this pull request?

Fix TypeError: unhashable type: 'list' when calling collect() on a DataFrame with MapType(ArrayType(...), ...) columns.

JVM side (EvaluatePython.scala): Added makeHashable(), which recursively converts java.util.ArrayList map keys to Array before pickling. Pyrolite pickles Java arrays as Python tuples (hashable) rather than as lists (unhashable).

Python side (types.py, conversion.py): Added _make_hashable() to the MapType converters to convert list keys to tuples, covering both the classic pickle path and the Arrow/Spark Connect path.

Why are the changes needed?

MapType with ArrayType keys is valid per the PySpark documentation, but collect() fails because:

- The JVM side (EvaluatePython.toJava) converts array keys to java.util.ArrayList.
- Pyrolite pickles these as Python lists.
- Python dicts require hashable keys, and lists are not hashable, so building the map's dict on the Python side fails.
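The last step of this chain can be reproduced in plain Python without Spark. In this minimal sketch, `build_dict` is a hypothetical stand-in for the dict construction that happens when a map is deserialized; it is not PySpark code.

```python
def build_dict(pairs):
    # Hypothetical stand-in: insert each deserialized (key, value) pair into a dict.
    return {k: v for k, v in pairs}


# Keys as Pyrolite would deliver java.util.ArrayList: Python lists.
list_keyed = [(["a", "b"], "x")]
error_message = None
try:
    build_dict(list_keyed)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # message includes: unhashable type: 'list'

# Keys as Pyrolite would deliver a Java array: Python tuples, which are hashable.
tuple_keyed = [(("a", "b"), "x")]
print(build_dict(tuple_keyed))  # {('a', 'b'): 'x'}
```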

Does this PR introduce any user-facing change?

Yes. DataFrame.collect() now works correctly for MapType columns with ArrayType keys: array keys are returned as tuples instead of collect() raising a TypeError.

How was this patch tested?

Added test_collect_map_with_array_key in test_collection.py that creates a DataFrame with MapType(ArrayType(StringType()), StringType()), calls collect(), and verifies the result.

Was this patch authored or co-authored using generative AI tooling?

Yes.

…yType keys

When a DataFrame has a MapType column with ArrayType keys (e.g., MapType(ArrayType(StringType()), StringType())), calling collect() raises TypeError: unhashable type: 'list'.

Root cause: The JVM-side EvaluatePython.toJava converts ArrayType data to java.util.ArrayList, which Pyrolite pickles as Python lists. Since Python lists are unhashable, they cannot be used as dict keys when the map is deserialized on the Python side.

Fix:
- JVM side (EvaluatePython.scala): Add makeHashable() that recursively converts ArrayList to Array for map keys, so Pyrolite pickles them as Python tuples (which are hashable).
- Python side (types.py, conversion.py): Add _make_hashable() in the MapType converters to convert any list keys to tuples, handling both the classic pickle path and the Arrow/Spark Connect path.

Closes #XXXXX

yadavay-amzn force-pushed the fix/SPARK-52382-map-array-key branch from b8681aa to 67778a1 on April 21, 2026 01:38


yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-52382-map-array-key


