Using the PySpark SQL functions datediff() and months_between(), you can calculate the difference between two dates in days, months, and years. Let’s see this with a DataFrame example. You can also use these functions to calculate age.
datediff() Function
First, let’s get the difference between two dates in days using the PySpark datediff() function. datediff(end, start) returns the number of days from start to end.
months_between() Function
Now, let’s see how to get the month and year differences between two dates using the months_between() function.
Note that months_between() returns only months. To get the years between two dates, divide its result by lit(12) and apply the round() function on top.
Let’s see another example, where the dates are not in the default PySpark DateType format yyyy-MM-dd. When dates are not in this format, the date functions cannot parse them and typically return null (or, with ANSI mode enabled, may raise an error). Hence, you first need to convert the input to a Spark DateType using the to_date() function.
SQL Example
Let’s see how to calculate the difference between two dates in years using a PySpark SQL query. Similarly, you can calculate the days and months between two dates.
Complete Example
Conclusion
In this tutorial, we have learned how to calculate the days, months, and years between two dates using the PySpark date and time functions datediff() and months_between().