We can host this training at your preferred location. Contact us!
Upcoming Training
23 March 2021
3 Days
This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. Topics include tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.
Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.
This course is intended for architects, software developers, analysts, and data scientists who need to apply data science and machine learning on Hadoop.
At the completion of the course, students will be able to:
Recognize use cases for data science
Describe the architecture of Hadoop and YARN
Describe supervised and unsupervised learning differences
List the six machine learning tasks
Use Mahout to run a machine learning algorithm on Hadoop
Describe the data science life cycle
Use Pig to transform and prepare data on Hadoop
Write a Python script
Use NumPy to analyze big data
Use the data structure classes in the pandas library
Write a Python script that invokes SciPy machine learning
Describe options for running Python code on a Hadoop cluster
Write a Pig User-Defined Function in Python
Use Pig streaming on Hadoop with a Python script
Write a Python script that invokes scikit-learn
Use the k-nearest neighbor algorithm to predict values
Run a machine learning algorithm on a distributed data set
Describe use cases for Natural Language Processing (NLP)
Perform sentence segmentation on a large body of text
Perform part-of-speech tagging
Use the Natural Language Toolkit (NLTK)
Describe the components of a Spark application
Write a Spark application in Python
Run machine learning algorithms using Spark MLlib
Take data science into production
Setting Up a Development Environment
Using HDFS Commands
Using Mahout for Machine Learning
Getting Started with Pig
Exploring Data with Pig
Using the IPython Notebook
Data Analysis with Python
Interpolating Data Points
Define a Pig UDF in Python
Streaming Python with Pig
K-Nearest Neighbor and K-Means Clustering
Using NLTK for Natural Language Processing
Classifying Text using Naive Bayes
Spark Programming and Spark MLlib