Big Data has evolved over the last few years and has become mainstream for many large organisations. This entry-level course is a gateway to the Data Scientist/Analytics career pathways, roles that are in great demand as Singapore is experiencing a shortage of these professionals. It covers the fundamentals of Big Data using PySpark, the Python interface to Spark. Spark is a fast cluster computing framework for Big Data processing that can run programmes and operations up to 100 times faster in memory than disk-based frameworks such as Hadoop MapReduce.
You will be introduced to various PySpark libraries for data processing and machine learning, and will work with a range of datasets through guided hands-on training. By the end of this course, you will have gained a working understanding of PySpark and its application to general Big Data analysis. The workshop is conducted using Databricks, a platform for running big data workloads on Spark.
Who should attend: current students, graduating students and graduates of Master’s, Degree or Diploma programmes, as well as anyone interested in coding.
Pre-requisites:
- Minimum age: 16 years old
- An understanding of the basics of programming
- Basic PC skills
- Basic proficiency in reading, writing and understanding English
Duration: 8 hours – 1 day
Consists of 7 modules:
Module 1: Introduction to Big Data and Databricks
The participants will be introduced to the basics of Big Data as well as the various concepts and different frameworks for processing it.
Module 2: Introduction to Big Data Analysis using PySpark
The participants will learn the basics of Spark with Python, experimenting with data and functional programming.
Module 3: Programming in PySpark
The participants will study the backbone of Spark, the Resilient Distributed Dataset (RDD). They will learn how RDDs are created and executed, and how to apply various transformations and actions (map, reduce and collect, among others) to them.
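The difference between transformations and actions can be sketched without a cluster. The example below is a toy, single-machine illustration in plain Python and not the PySpark API: the `ToyRDD` class and its internals are hypothetical names invented here. It assumes only the general idea that transformations (such as map and filter) are recorded lazily, while actions (such as collect and reduce) trigger the computation.

```python
from functools import reduce


class ToyRDD:
    """Toy, single-machine stand-in for an RDD (hypothetical; not the PySpark API)."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: returns a new ToyRDD and does no work yet.
        return ToyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy.
        return ToyRDD(self._data, self._steps + (("filter", pred),))

    def _run(self):
        # Chain the recorded steps over the data as lazy iterators.
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    def collect(self):
        # Action: runs the recorded steps and returns the results as a list.
        return list(self._run())

    def reduce(self, fn):
        # Action: runs the recorded steps and combines the results into one value.
        return reduce(fn, self._run())


numbers = ToyRDD(range(1, 6))
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(squares_of_evens.collect())                   # [4, 16]
print(squares_of_evens.reduce(lambda a, b: a + b))  # 20
```

Nothing is computed when `filter` and `map` are called; the work happens only inside `collect` and `reduce`. A real RDD additionally splits the data into partitions across a cluster, which this sketch does not attempt to model.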
Module 4: PySpark SQL & Dataframes
Structured data processing is important when profiling and understanding data, and Spark SQL provides an elegant way to do it. The participants will be introduced to dataframes, the distributed SQL query engine, various operations using Spark SQL, and data visualisation using PySpark.
Module 5: Machine Learning with PySpark
Participants will study various machine learning methods and algorithms, and will work with different datasets to perform regression, clustering and classification, among other operations. They will also learn to go through the process of model training and evaluation.
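The train-then-evaluate workflow mentioned above can be illustrated with a minimal sketch in plain Python rather than PySpark's machine learning library. The data, the split and the helper names `fit_line` and `rmse` are all assumptions made for illustration: a one-feature linear regression is fitted on a training portion and scored on held-out points.

```python
import math

# Hypothetical toy data: y is roughly 3x + 2 with a small alternating offset.
xs = [float(x) for x in range(12)]
ys = [3.0 * x + 2.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

# Hold out the last four points for evaluation.
train_x, test_x = xs[:8], xs[8:]
train_y, test_y = ys[:8], ys[8:]


def fit_line(x_values, y_values):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(x_values) / len(x_values)
    mean_y = sum(y_values) / len(y_values)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    variance = sum((x - mean_x) ** 2 for x in x_values)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def rmse(slope, intercept, x_values, y_values):
    """Root mean squared error of the fitted line on the given points."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(x_values, y_values)]
    return math.sqrt(sum(squared) / len(squared))


# Train on the first eight points, then evaluate on the held-out four.
slope, intercept = fit_line(train_x, train_y)
test_error = rmse(slope, intercept, test_x, test_y)
print(f"slope={slope:.2f} intercept={intercept:.2f} test RMSE={test_error:.2f}")
```

Evaluating on points the model never saw during fitting is the key design choice: an error measured on the training data alone would give an overly optimistic picture of how the model behaves on new data.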
Module 6: Streaming Analytics
Participants will learn about streaming data and how Spark can be used to process and analyse it in real time.
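One common way to analyse a stream is to cut it into small batches and update a running result after each one. The sketch below shows that idea in plain Python, not with Spark's streaming API; the event names, the batch size and the helpers `micro_batches` and `running_counts` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import islice


def micro_batches(events, batch_size):
    """Yield consecutive lists of up to batch_size events from an iterator."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def running_counts(events, batch_size):
    """Yield a snapshot of the running count per key after every micro-batch."""
    totals = Counter()
    for batch in micro_batches(events, batch_size):
        totals.update(batch)
        yield dict(totals)


# Hypothetical event stream: page names arriving one at a time.
stream = iter(["home", "cart", "home", "pay", "home", "cart"])
snapshots = list(running_counts(stream, batch_size=2))
for snapshot in snapshots:
    print(snapshot)
```

Each printed snapshot reflects everything seen so far, so results are available while data is still arriving instead of only after the whole dataset has been collected.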
Module 7: Practice & Extra Workshops
Participants will have the opportunity to work with various datasets and practise the operations and techniques learnt in the earlier modules. Trainers and Assistant Trainers will guide participants through the exercises and practice sessions, giving them an opportunity to get a thorough grasp of programming with PySpark.
Exam: No exam
Certification: Certificate of Completion
Course Fee: S$481.50 (inclusive of GST)