HDP Analyst: Data Science Training

We can host this training at your preferred location. Contact us!

This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Prerequisites

Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

Who Should Attend

This course is intended for architects, software developers, analysts and data scientists who need to apply data science and machine learning on Hadoop.

What You Will Learn

At the completion of the course, students will be able to:
Recognize use cases for data science
Describe the architecture of Hadoop and YARN
Describe supervised and unsupervised learning differences
List the six machine learning tasks
Use Mahout to run a machine learning algorithm on Hadoop
Describe the data science life cycle
Use Pig to transform and prepare data on Hadoop
Write a Python script
Use NumPy to analyze big data
Use the data structure classes in the pandas library
Write a Python script that invokes SciPy machine learning
Describe options for running Python code on a Hadoop cluster
Write a Pig User-Defined Function in Python
Use Pig streaming on Hadoop with a Python script
Write a Python script that invokes scikit-learn
Use the k-nearest neighbor algorithm to predict values
Run a machine learning algorithm on a distributed data set
Describe use cases for Natural Language Processing (NLP)
Perform sentence segmentation on a large body of text
Perform part-of-speech tagging
Use the Natural Language Toolkit (NLTK)
Describe the components of a Spark application
Write a Spark application in Python
Run machine learning algorithms using Spark MLlib
Take data science into production

Outline

Setting Up a Development Environment
Using HDFS Commands
Using Mahout for Machine Learning
Getting Started with Pig
Exploring Data with Pig
Using the IPython Notebook
Data Analysis with Python
Interpolating Data Points
Defining a Pig UDF in Python
Streaming Python with Pig
K-Nearest Neighbor and K-Means Clustering
Using NLTK for Natural Language Processing
Classifying Text using Naive Bayes
Spark Programming and Spark MLlib

Upcoming Trainings

Join our public courses at our Istanbul, London and Ankara facilities. Private classes can be organized at a location of your choice, according to your schedule.

Date | Format | Locations | Duration
13 May 2024 | Classroom / Virtual Classroom | Istanbul, Ankara, London | 3 Days
19 May 2024 | Classroom / Virtual Classroom | Istanbul, Ankara, London | 3 Days
19 May 2024 | Classroom / Virtual Classroom | Istanbul, Ankara, London | 3 Days
25 May 2024 | Classroom / Virtual Classroom | Istanbul, Ankara, London | 3 Days
04 June 2024 | Classroom / Virtual Classroom | Istanbul, Ankara, London | 3 Days
14 June 2024 | Classroom / Virtual Classroom | Istanbul, Ankara, London | 3 Days
26 June 2024 | Classroom / Virtual Classroom | Istanbul, Ankara, London | 3 Days
24 July 2024 | Classroom / Virtual Classroom | Istanbul, Ankara, London | 3 Days
