PySpark JSON functions are used to query or extract elements from a JSON string in a DataFrame column by path, convert it to a struct or map type, and so on. In this article, I will explain the most commonly used JSON SQL functions with Python examples.
1. PySpark JSON Functions
from_json() – Converts a JSON string into a StructType or MapType.
to_json() – Converts a MapType or StructType column into a JSON string.
json_tuple() – Extracts elements from a JSON string and creates them as new columns.
get_json_object() – Extracts a JSON element from a JSON string based on the specified JSON path.
schema_of_json() – Creates a schema string from a JSON string.
1.1. Create DataFrame with Column Containing JSON String
To explain these JSON functions, let's first create a DataFrame with a column containing a JSON string.
2. PySpark JSON Functions Examples
2.1. from_json()
The PySpark from_json() function is used to convert a JSON string into a StructType or MapType. The example below converts a JSON string column to a map of key-value pairs; I will leave it to you to convert it to a struct type.
2.2. to_json()
The to_json() function is used to convert a MapType or StructType DataFrame column into a JSON string. Here, I am using the df2 DataFrame created in the from_json() example above.
2.3. json_tuple()
The json_tuple() function is used to query or extract elements from a JSON column and return the results as new columns.
2.4. get_json_object()
The get_json_object() function is used to extract a single element from a JSON column based on a JSON path.
2.5. schema_of_json()
Use schema_of_json() to derive a schema string from a sample JSON string. Note that it requires a literal (foldable) string argument rather than an arbitrary column.