Transform data with Spark in Azure Synapse Analytics

Data engineers commonly need to transform large volumes of data. Apache Spark pools in Azure Synapse Analytics provide a distributed processing platform for performing these transformations.

Data Engineer

Synapse Analytics

Module Objectives

In this module, you will learn how to:

Use Apache Spark to modify and save dataframes
Partition data files for improved performance and scalability
Transform data with SQL

Units

Introduction
Modify and save dataframes
Partition data files
Transform data with SQL
Exercise: Transform data with Spark in Azure Synapse Analytics
Knowledge check
Summary

Prerequisites

Before taking this module, you should be familiar with Apache Spark pools in Azure Synapse Analytics. Consider completing the "Analyze data with Apache Spark in Azure Synapse Analytics" module first.