PySpark – date_format()

In PySpark, use the date_format() function to convert a DataFrame column from Date to String format. This tutorial walks through an example of converting Date to String with date_format(), first on a DataFrame and then in Spark SQL.

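date_format() formats a Date or Timestamp column as a String, using the pattern you pass as the second argument. In Spark 3.0 and later, the pattern letters follow Spark's datetime patterns, which are modeled on Java's DateTimeFormatter; earlier versions relied on SimpleDateFormat, so a few patterns behave differently across versions.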

The syntax of date_format() and a short example follow.

Syntax:  date_format(column,format)
Example: date_format(current_timestamp(),"yyyy MM dd").alias("date_format")

The code snippet below takes the current date from current_date() and the current timestamp from current_timestamp(), and converts them to String columns on a DataFrame.

from pyspark.sql.functions import current_date, current_timestamp, date_format

# One-row DataFrame; the functions below do not read its columns
df = spark.createDataFrame([["1"]], ["id"])

# Each alias repeats the pattern used, so the output header shows it
df.select(
    current_date().alias("current_date"),
    date_format(current_timestamp(), "yyyy MM dd").alias("yyyy MM dd"),
    date_format(current_timestamp(), "MM/dd/yyyy hh:mm").alias("MM/dd/yyyy hh:mm"),
    date_format(current_timestamp(), "yyyy MMM dd").alias("yyyy MMM dd"),
    date_format(current_timestamp(), "yyyy MMMM dd E").alias("yyyy MMMM dd E"),
).show()

Output:

+------------+----------+----------------+-----------+--------------------+
|current_date|yyyy MM dd|MM/dd/yyyy hh:mm|yyyy MMM dd|      yyyy MMMM dd E|
+------------+----------+----------------+-----------+--------------------+
|  2021-02-23|2021 02 23|02/23/2021 02:18|2021 Feb 23|2021 February 23 Tue|
+------------+----------+----------------+-----------+--------------------+

Alternatively, you can convert Date to String in SQL by using the same functions.

# SQL: each alias mirrors the pattern used for that column
spark.sql("select current_date() as current_date, "+
      "date_format(current_timestamp(),'yyyy MM dd') as yyyy_MM_dd, "+
      "date_format(current_timestamp(),'MM/dd/yyyy hh:mm') as MM_dd_yyyy_hh_mm, "+
      "date_format(current_timestamp(),'yyyy MMM dd') as yyyy_MMM_dd, "+
      "date_format(current_timestamp(),'yyyy MMMM dd E') as yyyy_MMMM_dd_E").show()

Complete Example

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, current_timestamp, date_format

# Create SparkSession
spark = SparkSession.builder \
               .appName('mytechmint') \
               .getOrCreate()

# One-row DataFrame; the functions below do not read its columns
df = spark.createDataFrame([["1"]], ["id"])

# DataFrame API: each alias repeats the pattern used
df.select(
    current_date().alias("current_date"),
    date_format(current_date(), "yyyy MM dd").alias("yyyy MM dd"),
    date_format(current_timestamp(), "MM/dd/yyyy hh:mm").alias("MM/dd/yyyy hh:mm"),
    date_format(current_timestamp(), "yyyy MMM dd").alias("yyyy MMM dd"),
    date_format(current_timestamp(), "yyyy MMMM dd E").alias("yyyy MMMM dd E"),
).show()

# SQL: the same conversions expressed as a query
spark.sql("select current_date() as current_date, "+
      "date_format(current_timestamp(),'yyyy MM dd') as yyyy_MM_dd, "+
      "date_format(current_timestamp(),'MM/dd/yyyy hh:mm') as MM_dd_yyyy_hh_mm, "+
      "date_format(current_timestamp(),'yyyy MMM dd') as yyyy_MMM_dd, "+
      "date_format(current_timestamp(),'yyyy MMMM dd E') as yyyy_MMMM_dd_E").show()

Conclusion

In this article, we have learned how to convert a Date column to String format with the date_format() function, both on a DataFrame and in Spark SQL.
