Data Science at Scale using Spark and Hadoop (DSSH)

Who should attend

  • Developers
  • Data analysts
  • Statisticians

Prerequisites

  • Proficiency in a scripting language
    • Python is strongly preferred
    • Perl or Ruby is sufficient
  • Basic knowledge of Apache Hadoop
  • Experience working in Linux environments

Course Objectives

In this class, you will learn:

  • How to identify potential business use cases where data science can provide impactful results
  • How to obtain, clean and combine disparate data sources to create a coherent picture for analysis
  • What statistical methods to leverage for data exploration that will provide critical insight into your data
  • Where and when to leverage Hadoop streaming and Apache Spark for data science pipelines
  • What machine learning technique to use for a particular data science project
  • How to implement and manage recommenders using Spark’s MLlib, and how to set up and evaluate data experiments
  • What pitfalls to avoid when deploying new analytics projects to production at scale

Course Content

Data Science at Scale using Spark and Hadoop is a three-day, instructor-led class in which you learn how data scientists solve problems with data and which tools and techniques they use. Through in-class simulations, participants apply data science methods to real-world challenges from different industries and prepare for data scientist roles in the field.

Classroom Training
Modality: G

Duration 3 days

Price (excl. tax)
  • Germany: 2,195.- €
E-Learning
Modality: P
Price (excl. tax)
  • Germany: 1,600.- €
Schedule

Currently there are no training dates scheduled for this course.