IL - Apache Spark Programming with Azure Databricks

Course Overview

In this course, you will explore Spark internals and the architecture of Azure Databricks. The course begins with a brief introduction to Scala. Using the Scala programming language, you will then explore the core functionalities and use cases of Azure Databricks, including Spark SQL, Spark Streaming, MLlib, and GraphFrames.

Course Details
  • Duration: 3 Days
  • Level: 300

Who this course is designed for
  • Data Engineers
  • Data Administrators
  • Data Scientists
  • Data Architects
  • Software Developers

What You Will Learn

  • Understand the Azure Databricks architecture
  • Understand Apache Spark internals
  • Manipulate data using the Spark APIs
  • Work with large data sets and query data with Spark SQL
  • Build structured streaming jobs
  • Implement machine learning pipelines with the MLlib API
  • Process data using the GraphFrames API

Prerequisites

  • Familiarity with cloud computing concepts
  • Familiarity with Azure
  • Familiarity with SQL
  • Background in programming

Introduction to Azure Databricks

In this module, you will get an overview of Azure Databricks and Spark and learn where Azure Databricks fits in the Azure big data landscape. Key features of Azure Databricks, such as Workspaces and Notebooks, will be covered. You will also learn the basic architecture of Spark and core Spark internals, including the core APIs, job scheduling, and execution. This module prepares developers and administrators for more advanced work in Azure Databricks, such as Python or Scala development.


Introduction to Scala

This session introduces students to the Scala programming language. We will look at basic Scala syntax, including variables, types, control flow, functions, scoping, type inference, imports, and object-oriented programming.
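The flavor of this session can be sketched in a few lines of plain Scala (a minimal illustration, not course material; the names and values are invented for the example):

```scala
// Basic Scala syntax: immutable values, type inference, expressions,
// functions, and higher-order collection operations.
object ScalaBasics {
  // A function with an explicit parameter and return type
  def square(x: Int): Int = x * x

  // In Scala, if/else is an expression: it evaluates to a value
  def describe(n: Int): String = if (n % 2 == 0) "even" else "odd"

  // Collections with a higher-order function: map over a range, then sum
  def sumOfSquares(n: Int): Int = (1 to n).map(square).sum

  def main(args: Array[String]): Unit = {
    val name = "Spark" // immutable value, type String inferred
    var count = 0      // mutable variable
    for (i <- 1 to 3) count += square(i)
    println(s"$name: $count is ${describe(count)}") // prints: Spark: 14 is even
  }
}
```

Note how type inference keeps declarations short (`val name = "Spark"`) while the compiler still enforces static types throughout.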

Introduction to Spark Programming

In this session, students will learn the basics of Spark and Spark programming. We will cover the DataFrames and Datasets APIs, processing data with Spark SQL, and working with the functions API. Students will also look at basic Spark programming concepts and techniques such as aggregation, column operations, joins and broadcasting, user-defined functions, caching, and performance analysis.
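A minimal sketch of the DataFrame and Spark SQL operations this session covers, assuming a Databricks notebook or spark-shell session where `spark` (a SparkSession) is already in scope; the column names and data are invented for the example:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes `spark` is provided by the notebook/shell

// Build a small DataFrame from an in-memory sequence
val sales = Seq(
  ("2023-01-01", "books", 12.50),
  ("2023-01-01", "games", 30.00),
  ("2023-01-02", "books", 7.25)
).toDF("date", "category", "amount")

// Column operations and aggregation via the DataFrame API
val totals = sales
  .withColumn("amount_cents", (col("amount") * 100).cast("long"))
  .groupBy("category")
  .agg(sum("amount").as("total"), count("*").as("orders"))

// The same aggregation expressed with Spark SQL over a temp view
sales.createOrReplaceTempView("sales")
val viaSql = spark.sql(
  "SELECT category, SUM(amount) AS total FROM sales GROUP BY category")

totals.show()
```

Both forms compile to the same logical plan, so choosing between the DataFrame API and SQL is largely a matter of style.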

Spark Architecture and Internals

In this session, students will learn about Spark internals. We will look at the Spark cluster architecture, covering topics such as job and task execution and scheduling, shuffling, and the Catalyst optimizer.
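One hands-on way into these internals is asking Spark to print its query plans. A brief sketch, assuming a notebook or spark-shell session with `spark` in scope:

```scala
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes `spark` is provided by the notebook/shell

val df = (1 to 1000).toDF("n")

// A wide transformation (groupBy) forces a shuffle between stages
val counted = df.groupBy((col("n") % 10).as("bucket")).count()

// Print the Catalyst plans: parsed, analyzed, and optimized logical plans,
// plus the final physical plan. The Exchange operator in the physical plan
// marks the shuffle boundary between stages.
counted.explain(extended = true)
```

Reading the optimized plan against the physical plan makes concepts like stage boundaries and shuffle cost concrete rather than abstract.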

Building Structured Streaming Jobs in Spark

In this session, students will be introduced to Spark Structured Streaming. Students will learn about data sources and data sinks and working with the Structured Streaming APIs. Students will look at stream processing techniques such as windowing and aggregation functions, checkpointing and watermarking, and their use in stream processing jobs. Finally, students will investigate fault tolerance in stream processing jobs.
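A condensed sketch tying these pieces together — source, windowed aggregation with a watermark, and a checkpointed sink — assuming a session with `spark` in scope; the socket source, port, and checkpoint path are demo assumptions:

```scala
import org.apache.spark.sql.functions._

// Source: read lines from a local socket (a demo-only data source)
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Windowed aggregation; the watermark bounds state kept for late data
val counts = lines
  .withColumn("ts", current_timestamp())
  .withWatermark("ts", "10 minutes")
  .groupBy(window(col("ts"), "5 minutes"), col("value"))
  .count()

// Sink: write to the console; the checkpoint location is what lets the
// query recover exactly where it left off after a failure
val query = counts.writeStream
  .outputMode("update")
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/wordcounts") // hypothetical path
  .start()
```

The checkpoint directory is also the heart of the fault-tolerance story covered at the end of the session: offsets and state are persisted there so a restarted query resumes without reprocessing or losing data.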

Implementing Machine Learning Pipelines in Spark using MLlib

In this session, students will learn how to use the Spark machine learning library, MLlib, to build machine learning pipelines. Students will be introduced to machine learning pipeline concepts such as Transformers and Estimators. They will learn to perform feature processing and how to evaluate and apply machine learning models.
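The pipeline concepts above can be sketched as follows, assuming `training` and `test` are DataFrames with hypothetical columns "color", "size", and "label":

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Feature processing stages: StringIndexer is an Estimator that learns a
// category-to-index mapping; VectorAssembler is a Transformer that packs
// columns into a single feature vector.
val indexer = new StringIndexer().setInputCol("color").setOutputCol("colorIdx")
val assembler = new VectorAssembler()
  .setInputCols(Array("colorIdx", "size"))
  .setOutputCol("features")

// LogisticRegression is an Estimator: fit() produces a model (a Transformer)
val lr = new LogisticRegression()
  .setFeaturesCol("features")
  .setLabelCol("label")

// A Pipeline chains the stages; fitting it fits each Estimator in order
val pipeline = new Pipeline().setStages(Array(indexer, assembler, lr))
val model = pipeline.fit(training)

// Apply the fitted pipeline to held-out data and evaluate the predictions
val evaluator = new BinaryClassificationEvaluator().setLabelCol("label")
val auc = evaluator.evaluate(model.transform(test))
```

The key idea is that the fitted `PipelineModel` bundles the feature processing and the model together, so the same transformations are applied consistently at training and scoring time.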

Graph Processing with the GraphFrames API

In this session, attendees will learn to leverage the GraphFrames API for graph processing. Topics include transforming DataFrames into a graph and performing graph analysis, including PageRank, shortest paths, connected components, and label propagation.
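A short sketch of that workflow, assuming the graphframes package is attached to the cluster (e.g., as a library on Databricks) and `spark` is in scope; the vertices, edges, and paths are invented for the example:

```scala
import org.graphframes.GraphFrame

// GraphFrames builds a graph from two DataFrames:
// vertices need an "id" column; edges need "src" and "dst" columns
val vertices = spark.createDataFrame(Seq(
  ("a", "Alice"), ("b", "Bob"), ("c", "Carol")
)).toDF("id", "name")

val edges = spark.createDataFrame(Seq(
  ("a", "b", "follows"), ("b", "c", "follows"), ("c", "a", "follows")
)).toDF("src", "dst", "relationship")

val g = GraphFrame(vertices, edges)

// PageRank: iterate until the convergence tolerance is reached
val ranks = g.pageRank.resetProbability(0.15).tol(0.01).run()

// Shortest paths from every vertex to a set of landmark vertices
val paths = g.shortestPaths.landmarks(Seq("a")).run()

// Connected components (requires a checkpoint directory)
spark.sparkContext.setCheckpointDir("/tmp/gf-checkpoints") // hypothetical path
val components = g.connectedComponents.run()

// Label propagation for community detection
val communities = g.labelPropagation.maxIter(5).run()
```

Each algorithm returns a DataFrame, so the graph results feed straight back into the Spark SQL and DataFrame techniques from earlier sessions.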

Contact the experts at Opsgility to schedule this class at your location or to discuss a more comprehensive readiness solution for your organization. Contact us to enroll or book a class.

Contact Us
Looking for on-demand training?
Try SkillMeUp.com