Data Science at Scale using Spark and Hadoop (DSSH)

Who should attend

Developers
Data analysts
Statisticians

Prerequisites

Proficiency in a scripting language
- Python is strongly preferred
- Perl or Ruby is sufficient
Basic knowledge of Apache Hadoop
Experience working in Linux environments

Course Objectives

After completing this class, you will know:

How to identify potential business use cases where data science can provide impactful results
How to obtain, clean and combine disparate data sources to create a coherent picture for analysis
What statistical methods to leverage for data exploration that will provide critical insight into your data
Where and when to leverage Hadoop streaming and Apache Spark for data science pipelines
What machine learning technique to use for a particular data science project
How to implement and manage recommenders using Spark’s MLlib, and how to set up and evaluate data experiments
What pitfalls to expect when deploying new analytics projects to production at scale

Course Content

Data Science at Scale using Spark and Hadoop is a three-day, instructor-led class in which you learn how data scientists solve problems with data by studying the tools and techniques they use. Through in-class simulations, participants apply data science methods to real-world challenges from different industries and prepare for data scientist roles in the field.

Prices & Training Methods

Online Training

Duration
3 days

Price

On request

Dates and Booking

Request a date

Classroom Training

Duration
3 days

Price

Germany: €2,230

Dates and Booking

Request a date

There are currently no training dates scheduled for this course.
