1. Introduction to Programming for Data Handling
- Describe the pros and cons of using programming languages to work with data.
- Identify the languages most suitable for data handling.
- Explain the challenges of using programming languages versus data analysis tools.
2. Introduction to Python and IDEs
- Describe the key attributes of the Python programming language.
- Explain the role of the Jupyter IDE for Python programming.
- Use the Jupyter IDE to write a basic Python program.
- Write a program which uses string, integer, float and boolean data types.
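As a rough illustration of the last objective, a first program of this kind might look like the sketch below. The variable names and values are invented for illustration and are not taken from the course materials.

```python
# Hypothetical values, chosen only to show one variable of each basic type.
course_name = "Data Handling with Python"  # str
module_count = 8                           # int
pass_mark = 0.5                            # float (an assumed figure)
is_beginner_friendly = True                # bool

values = (course_name, module_count, pass_mark, is_beginner_friendly)

# Collect the type name of each value and print it next to the value.
type_names = [type(value).__name__ for value in values]
for name, value in zip(type_names, values):
    print(name, value)
```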
3. Data Structures, Flow Control, Functions, and Basic Types
- Construct collections to solve data problems.
- Utilise selection and iteration syntax to control the flow of a Python program.
- Write reusable functions which can be used to alter data & automate repetitive tasks.
- Use Python's built-in open function to create, read, and edit files.
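The objectives in this module can be sketched together in one short program. The example below is only an assumed illustration: the function name clean_names, the sample names, and the file name names.txt are all hypothetical.

```python
import os
import tempfile


def clean_names(names):
    """Return the names stripped of whitespace and title-cased, skipping blanks."""
    cleaned = []
    for name in names:           # iteration
        name = name.strip()
        if name:                 # selection: skip empty strings
            cleaned.append(name.title())
    return cleaned


# A list is one of the built-in collections; the contents are made up.
raw_names = ["  ada lovelace", "", "grace hopper  "]
tidy_names = clean_names(raw_names)

# Create, edit (append to), and read a file with the built-in open function.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(tidy_names))
with open(path, "a", encoding="utf-8") as handle:
    handle.write("\nAlan Turing")
with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()

print(lines)
```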
4. Mathematical and Statistical Programming with NumPy
- Describe the core features of NumPy arrays.
- Create, index, and manipulate NumPy arrays to solve data problems.
- Use masking and querying syntax to retrieve desired values.
- Use vectorised ufuncs.
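A minimal sketch of the array skills listed above, assuming NumPy is installed. The array contents and variable names are hypothetical.

```python
import numpy as np

# Hypothetical sensor readings.
readings = np.array([12.5, 7.0, 19.25, 3.5, 15.0])

# Indexing and slicing work much like Python lists.
first_two = readings[:2]

# Masking: a boolean array selects the values that satisfy a condition.
mask = readings > 10
high = readings[mask]

# Vectorised ufuncs apply an operation to every element without a Python loop.
roots = np.sqrt(readings)
scaled = readings * 2  # arithmetic operators dispatch to ufuncs such as np.multiply

print(high, scaled)
```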
5. Introduction to Pandas
- Create, manipulate, and alter Series and DataFrames with Pandas.
- Define and change the indices of Series & DataFrames.
- Use Pandas' functions and methods to change column types, compute summary statistics and aggregate data.
- Read, manipulate, and write data from csv, xlsx, json and other structured file formats.
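The sketch below shows one way these Pandas objectives might fit together, assuming Pandas is installed. The data is hypothetical, and an in-memory string stands in for a csv file so the example is self-contained; xlsx and json are not shown.

```python
import io

import pandas as pd

# Hypothetical csv content; io.StringIO stands in for a file on disk.
csv_text = "city,sales\nLeeds,10\nYork,5\nLeeds,7\n"
df = pd.read_csv(io.StringIO(csv_text))

# Change a column type, then compute a summary statistic.
df["sales"] = df["sales"].astype(float)
mean_sales = df["sales"].mean()

# Change the index, and aggregate the data by group.
indexed = df.set_index("city")
totals = df.groupby("city")["sales"].sum()

# Write the aggregated result back out in csv format.
out = io.StringIO()
totals.to_csv(out)
print(out.getvalue())
```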
6. Data Cleaning with Pandas
- Identify missing data and apply techniques to deal with it.
- Deduplicate, transform and replace values.
- Use Pandas' string methods to manipulate text data in Series and DataFrame columns.
- Write regular expressions which munge text data.
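A possible cleaning workflow covering these objectives is sketched below, assuming Pandas and NumPy are installed. The table, the column names, and the "n/a" placeholder are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: stray whitespace, a duplicate row, a missing name,
# and a text placeholder standing in for a missing phone number.
raw = pd.DataFrame(
    {
        "name": [" alice ", "BOB", "BOB", None],
        "phone": ["020-7946 0000", "n/a", "n/a", "0113 496 0000"],
    }
)

# Identify missing data, then drop rows without a name and exact duplicates.
missing_names = int(raw["name"].isna().sum())
clean = raw.dropna(subset=["name"]).drop_duplicates().copy()

# String methods tidy the text; replace turns the placeholder into a real NaN.
clean["name"] = clean["name"].str.strip().str.title()
clean["phone"] = clean["phone"].replace("n/a", np.nan)

# A regular expression removes every non-digit character from the phone number.
clean["phone_digits"] = clean["phone"].str.replace(r"\D", "", regex=True)

print(clean)
```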
7. Data Manipulation with Pandas
- Construct pivot tables in Pandas.
- Manipulate time series data.
- Stream data into Pandas to handle data size problems.
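The three objectives in this module can be illustrated with a small sketch, assuming Pandas is installed. The data is hypothetical, and reading in chunks via the chunksize parameter is only one way to stream data that is too large to load at once.

```python
import io

import pandas as pd

# Hypothetical daily sales; io.StringIO stands in for a large file on disk.
csv_text = (
    "date,region,units\n"
    "2024-01-01,North,3\n"
    "2024-01-01,South,4\n"
    "2024-01-02,North,5\n"
    "2024-01-08,South,6\n"
)
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Pivot table: one row per date, one column per region, summed units.
pivot = df.pivot_table(index="date", columns="region", values="units", aggfunc="sum")

# Time series manipulation: resample the daily values into weekly totals.
weekly = df.set_index("date")["units"].resample("W").sum()

# Streaming: process the file two rows at a time instead of all at once.
total_units = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_units += int(chunk["units"].sum())

print(pivot, weekly, total_units, sep="\n")
```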
8. Methods for Visualising Data
- Construct and tailor basic data visualisations using Matplotlib & Seaborn for both numeric & non-numeric data.
- Meaningfully visualise aggregate data using Matplotlib and Seaborn.
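A basic aggregate chart might be built along the lines of the sketch below, assuming Matplotlib is installed. The regions and totals are hypothetical, and Seaborn is left out only to keep the example's dependencies small.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical aggregate data: total units sold per region.
categories = ["North", "South", "East"]
totals = [8, 10, 4]

# A bar chart suits a numeric total per non-numeric category.
fig, ax = plt.subplots()
bars = ax.bar(categories, totals)

# Tailor the chart with a title and axis labels.
ax.set_title("Units sold by region")
ax.set_xlabel("Region")
ax.set_ylabel("Units")

# Save to an in-memory buffer; a file path would work the same way.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```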
Related learning
Data Science Learning Pathways can be selected by choosing either Python or R and a Cloud Platform certification:
- QAIDSDP Introduction to Data Science for Data Professionals
- Sourcing and handling data:
  - QADHPYTHON Data Handling with Python
  - QADHR Data Handling with R
  - QAPDHAI Python Data Handling with AI APIs
- Statistics for Data Analysis:
  - QASDAPY Statistics for Data Analysis with Python
  - QASDAR Statistics for Data Analysis with R
- Programming and Software Development skills:
  - QAPYTH3 Python Programming
  - QARPROG R Programming
- Machine Learning Development:
  - QADSMLP Data Science and Machine Learning with Python
  - QADSMLR Data Science and Machine Learning with R
- Mathematics for Developing Algorithms for AI models, Big Data Mining, and working with Neural Networks:
  - QAMFDS Mathematics for Data Science
- Forecasting:
  - QATSFP Time Series and Forecasting with Python
  - QATSFR Time Series and Forecasting with R
Suggested courses leading to Certification:
- MDP100 Designing and Implementing a Data Science Solution on Azure (DP-100)
- AMWSMLP Machine Learning Pipelines on AWS
- GCPMLGC Machine Learning on Google Cloud