PySpark – alias

Introduction to PySpark Alias

In PySpark, alias is a method that gives a column or a table (DataFrame) a new name that is usually shorter and more readable. An alias is a derived name for a column or table in a PySpark DataFrame/Dataset, and the aliased column or table can then be referred to by that name.

Aliases are particularly useful in joins, for example in a self-join or when several tables share column names, because each side of the join needs its own name so that its columns can be referenced unambiguously. The alias gives the column or table a new name through which it can be referenced.

Let us look at PySpark alias in more detail.

Syntax of PySpark Alias

The syntax for the PySpark alias function is:

from pyspark.sql.functions import col
b = b.select(col("ID").alias("New_ID"))
b.show()

b: The PySpark DataFrame to be used.

alias("new_name"): Renames the selected column to the new column name in the result.

select(col("column_name")): Selects the column to be aliased.

Working of Alias in PySpark

The PySpark alias function gives a column or data frame a new name that can be used as a reference in later operations. It can be used to rename a column. Once an alias has been assigned to a table or data frame, the alias can be used to access its columns. In a join, aliases let the join condition refer to the columns of each table by a qualified name.

An alias acts as a substitute for the column or table name and can be used to access everything the original name can. It is just a temporary name, which makes a long column or table name easier to work with. An alias is also called a correlation name for the table or column in a PySpark data frame.

Let’s check the creation and usage with some coding examples.

Examples of PySpark Alias

Let us see some examples of how the PySpark alias operation works. Let’s start by creating some simple data in PySpark.

data1  = [{'Name':'Jhon','ID':21.528,'Add':'USA'},{'Name':'Joe','ID':3.69,'Add':'USA'},{'Name':'Tina','ID':2.48,'Add':'IND'},{'Name':'Jhon','ID':22.22, 'Add':'USA'},{'Name':'Joe','ID':5.33,'Add':'INA'}]

Sample data is created with Name, ID, and Add as the fields.

a = sc.parallelize(data1)

An RDD is created using sc.parallelize.

b = spark.createDataFrame(a)
b.show()

The data frame is created using spark.createDataFrame.

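Let’s apply the alias function to the data frame.

Here alias is used to change the name of the column ID to the new name New_ID. The result is displayed directly rather than assigned back to b, because show() returns None and the assignment would discard the data frame.

b.select(col("ID").alias("New_ID")).show()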

A data frame itself can also be aliased, which returns the same data under a new name.

b.alias("New_Name")

The alias function can also be used to rename one column while selecting others from the existing data frame.

In the data frame above, the ID column is renamed to New_ID with alias, and the result contains the renamed column alongside Add and Name.

b.select("Add", col("ID").alias("New_ID"), "Name").show()

Aliases can also be used in PySpark SQL. In a join or a select, the table is given an alias, and its columns are then accessed with the dot (.) operator.

Just as table_name.column_name accesses a particular column of a table, alias.column_name (for example A.column_name) can be used for the same purpose in a PySpark SQL query.

In SQL, an alias is created either by writing the alias name directly after the element being aliased or by using the AS keyword followed by the alias name.

spark.sql("SELECT * FROM Demo d WHERE d.id = '123'")

The example gives the table Demo the alias d, which can access every column of Demo, so the WHERE condition can be written as d.id. This refers to the same column that Demo.id would in a query without the alias.

Note:

  1. PySpark alias is used to rename a column in a PySpark data frame.
  2. PySpark alias can be used in join operations.
  3. PySpark alias makes a column or table name shorter and more readable.
  4. PySpark alias is a temporary name given to a data frame, column, or table in PySpark.
  5. An alias refers to the same underlying element, so all of that element's properties remain accessible through the alias.
