Module 1 – Data Preparation Basics
- Data Preparation Defined
- Historical Perspective – How Did We Get Here?
- The Need for Self-Service Data Preparation
- Introduction to Data Preparation Tools
- Types of tools
- Programming/scripting vs. visual interface
- Standalone vs. integrated into analytics platforms
- Cloud, on-premises, and hybrid deployment
- Machine learning and data preparation
- Users of Data Preparation Tools
- Data scientists
- Data engineers
- Business analysts
- Data analysts
- Information workers
- Data Preparation in Analytics Architecture
- Data Preparation and Analytics Life Cycle
- Continuous exploration and discovery
- Iterative and adaptive
Module 2 – Data Discovery
- Data Sources
- Enterprise databases
- Local data
- Desktop data
- Cloud data
- Web data
- Files
- NoSQL
- Geospatial
- Media
- Data Sourcing
- Choosing data sources
- Physical data source connections
- Virtual data source connections
- Data Exploration
- Understanding content
- Estimating quality
- Discovering patterns
- Discovering data types
- Discovering data structure
- Discovering data relationships
- Data enrichment opportunities
- Developing data profiles
- Capturing metadata
Exercise 1 – Data Exploration and Data Sourcing
Module 3 – Data Transformation
- The Scope of Data Preparation
- Improving Data
- Standardization and conforming
- Cleansing and quality
- De-duplication
- Enriching Data
- Derivation
- Appending
- Aggregation
- Formatting Data
- Aggregation
- Sorting and sequencing
- Pivoting / de-pivoting
- Sampling and filtering
- Masking sensitive data
- Constructing records
- Data Blending
- Blending defined
- Blend vs. join
- Blending vs. warehousing
Exercise 2 – Data Transformation and Data Blending
Module 4 – Data Governance
- Data Validation
- Visual validation
- Rules-based validation
- Data auditing
- Data Protection
- Activity logging
- Activity audits
- Data Management
- Metadata management
- Data lineage
- Model on use – “just in time” data models
- Data curation – what and why
Module 5 – The Technology Landscape
- Data Preparation Platforms
- Core functions and features
- Product Overviews and Selected Demonstrations