Home » Tutorials » PySpark Tutorial » PySpark – to_timestamp()

PySpark – to_timestamp()

Use to_timestamp() function to convert String to Timestamp (TimestampType) in PySpark. The converted time would be in a default format of MM-dd-yyyy HH:mm:ss.SSS, I will explain how to use this function with a few examples.

Syntax – to_timestamp()

Syntax: to_timestamp(timestampString:Column) 
Syntax: to_timestamp(timestampString:Column,format:String) 

This function has above two signatures that defined in PySpark SQL Date & Timestamp Functions, the first syntax takes just one argument and the argument should be in Timestamp format ‘MM-dd-yyyy HH:mm:ss.SSS‘, when the format is not in this format, it returns null.

The second signature takes an additional String argument to specify the format of the input Timestamp; this support formats specified in SimeDateFormat. Using this additional argument, you can cast String from any format to Timestamp type in PySpark.

Convert String to PySpark Timestamp Type

In the below example, we convert the string pattern which is in PySpark default format to Timestamp type, since the input DataFrame column is in default Timestamp format, we use the first signature for conversion. And the second example uses the cast function to do the same.

from pyspark.sql.functions import *

df=spark.createDataFrame(
        data = [ ("1","2019-06-24 12:01:19.000")],
        schema=["id","input_timestamp"])
df.printSchema()

#Timestamp String to DateType
df.withColumn("timestamp",to_timestamp("input_timestamp")) \
  .show(truncate=False)

# Using Cast to convert TimestampType to DateType
df.withColumn('timestamp_string', \
         to_timestamp('timestamp').cast('string')) \
  .show(truncate=False)

In this snippet, we just add a new column timestamp by converting the input column from string to Timestamp type.

#Output
root
 |-- id: string (nullable = true)
 |-- timestamp: string (nullable = true)

+---+-----------------------+-------------------+
|id |input_timestamp        |timestamp          |
+---+-----------------------+-------------------+
|1  |2019-06-24 12:01:19.000|2019-06-24 12:01:19|
+---+-----------------------+-------------------+

Custom String Format to Timestamp Type

This example converts input timestamp string from custom format to PySpark Timestamp type, to do this, we use the second syntax where it takes an additional argument to specify user-defined patterns for date-time formatting,

#When dates are not in Spark TimestampType format 'yyyy-MM-dd  HH:mm:ss.SSS'.
#Note that when dates are not in Spark Tiemstamp format, all Spark functions returns null
#Hence, first convert the input dates to Spark DateType using to_timestamp function
df.select(to_timestamp(lit('06-24-2019 12:01:19.000'),'MM-dd-yyyy HH:mm:ss.SSSS')) \
  .show()
#Output
+-------------------------------------------------------------------+
|to_timestamp('06-24-2019 12:01:19.000', 'MM-dd-yyyy HH:mm:ss.SSSS')|
+-------------------------------------------------------------------+
|                                                2019-06-24 12:01:19|
+-------------------------------------------------------------------+

In case if you want to convert string to date format use to_date() the function. And here is another example to convert Timestamp to custom string pattern format.

Related:  PySpark - Count Distinct

SQL Example

#SQL string to TimestampType
spark.sql("select to_timestamp('2019-06-24 12:01:19.000') as timestamp")
#SQL CAST timestamp string to TimestampType
spark.sql("select timestamp('2019-06-24 12:01:19.000') as timestamp")
#SQL Custom string to TimestampType
spark.sql("select to_timestamp('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSSS') as timestamp")

Complete Example

from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder \
          .appName('mytechmint') \
          .getOrCreate()

from pyspark.sql.functions import *

df=spark.createDataFrame(
        data = [ ("1","2019-06-24 12:01:19.000")],
        schema=["id","input_timestamp"])
df.printSchema()

#Timestamp String to DateType
df.withColumn("timestamp",to_timestamp("input_timestamp")) \
  .show(truncate=False)
  
# Using Cast to convert TimestampType to DateType
df.withColumn('timestamp', \
         to_timestamp('input_timestamp').cast('string')) \
  .show(truncate=False)
  

df.select(to_timestamp(lit('06-24-2019 12:01:19.000'),'MM-dd-yyyy HH:mm:ss.SSSS')) \
  .show(truncate=False)
  
#SQL string to TimestampType
spark.sql("select to_timestamp('2019-06-24 12:01:19.000') as timestamp")
#SQL CAST timestamp string to TimestampType
spark.sql("select timestamp('2019-06-24 12:01:19.000') as timestamp")
#SQL Custom string to TimestampType
spark.sql("select to_timestamp('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSSS') as timestamp")

Leave a Comment