Building Modern Data Analytics Solutions on AWS
Dive deep into AWS Lake Formation, AWS Glue, Amazon EMR, Amazon Kinesis, and Amazon Redshift, and explore current thinking on building and operating data analytics pipelines that turn data into insights.
The Building Modern Data Analytics Solutions on AWS collection of one-day, intermediate-level, instructor-led courses dives deep into AWS Lake Formation, AWS Glue, Amazon EMR, Amazon Kinesis, and Amazon Redshift, and covers current thinking on building and operating data analytics pipelines that turn data into insights.
Wherever you or your customers are in the data modernization journey, our Building Modern Data Analytics Solutions on AWS collection of courses lets you select the right training to meet your specific learning needs.
Duration
4 days/28 hours of instruction
Public Classroom Pricing
$2700 (USD)
Group Rate: $2600
Private Group Pricing
Have a group of 5 or more students? Request special pricing for private group training today.
Building Data Lakes on AWS
Part 1: Introduction to Data Lakes
- Describe the value of data lakes
- Compare data lakes and data warehouses
- Describe the components of a data lake
- Recognize common architectures built on data lakes
Part 2: Data ingestion, cataloging, and preparation
- Describe the relationship between data lake storage and data ingestion
- Describe AWS Glue crawlers and how they are used to create a data catalog
- Identify data formatting, partitioning, and compression for efficient storage and query
- Lab 1: Set up a simple data lake
Part 3: Data Processing and Analytics
- Recognize how data processing applies to a data lake
- Use AWS Glue to process data within a data lake
- Describe how to use Amazon Athena to analyze data in a data lake
Part 4: Building a Data Lake with AWS Lake Formation
- Describe the features and benefits of AWS Lake Formation
- Use AWS Lake Formation to create a data lake
- Understand the AWS Lake Formation security model
- Lab 2: Build a data lake using AWS Lake Formation
Part 5: Additional Lake Formation Configurations
- Automate AWS Lake Formation using blueprints and workflows
- Apply security and access controls to AWS Lake Formation
- Match records with AWS Lake Formation FindMatches
- Visualize data with Amazon QuickSight
- Lab 3: Automate data lake creation using AWS Lake Formation blueprints
- Lab 4: Data visualization using Amazon QuickSight
Part 6: Architecture and Course Review
- Post-course knowledge check
- Architecture review
- Course review
Building Batch Data Analytics Solutions on AWS
Part 1: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Part 2: Introduction to Amazon EMR
- Using Amazon EMR in analytics solutions
- Amazon EMR cluster architecture
- Interactive Demo 1: Launching an Amazon EMR cluster
- Cost management strategies
Part 3: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
- Storage optimization with Amazon EMR
- Data ingestion techniques
Part 4: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
- Apache Spark on Amazon EMR use cases
- Why Apache Spark on Amazon EMR
- Spark concepts
- Interactive Demo 2: Connect to an EMR cluster and run Scala commands using the Spark shell
- Transformation, processing, and analytics
- Using notebooks with Amazon EMR
- Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR
Part 5: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
- Using Amazon EMR with Hive to process batch data
- Transformation, processing, and analytics
- Practice Lab 2: Batch data processing using Amazon EMR with Hive
- Introduction to Apache HBase on Amazon EMR
Part 6: Serverless Data Processing
- Serverless data processing, transformation, and analytics
- Using AWS Glue with Amazon EMR workloads
- Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions
Part 7: Security and Monitoring of Amazon EMR Clusters
- Securing EMR clusters
- Interactive Demo 3: Client-side encryption with EMRFS
- Monitoring and troubleshooting Amazon EMR clusters
- Demo: Reviewing Apache Spark cluster history
Part 8: Designing Batch Data Analytics Solutions
- Batch data analytics use cases
Building Streaming Data Analytics Solutions on AWS
Part 1: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Part 2: Using Streaming Services in the Data Analytics Pipeline
- The importance of streaming data analytics
- The streaming data analytics pipeline
- Streaming concepts
Part 3: Introduction to AWS Streaming Services
- Streaming data services in AWS
- Amazon Kinesis in analytics solutions
- Demonstration: Explore Amazon Kinesis Data Streams
- Practice Lab: Setting up a streaming delivery pipeline with Amazon Kinesis
- Using Amazon Kinesis Data Analytics
- Introduction to Amazon MSK
- Overview of Spark Streaming
Part 4: Using Amazon Kinesis for Real-time Data Analytics
- Exploring Amazon Kinesis using a clickstream workload
- Creating Kinesis data and delivery streams
- Demonstration: Understanding producers and consumers
- Building stream producers
- Building stream consumers
- Building and deploying Flink applications in Kinesis Data Analytics
- Demonstration: Explore Zeppelin notebooks for Kinesis Data Analytics
- Practice Lab: Streaming analytics with Amazon Kinesis Data Analytics and Apache Flink
Part 5: Securing, Monitoring, and Optimizing Amazon Kinesis
- Optimize Amazon Kinesis to gain actionable business insights
- Security and monitoring best practices
Part 6: Using Amazon MSK in Streaming Data Analytics Solutions
- Use cases for Amazon MSK
- Creating MSK clusters
- Demonstration: Provisioning an MSK Cluster
- Ingesting data into Amazon MSK
- Practice Lab: Introduction to access control with Amazon MSK
- Transforming and processing in Amazon MSK
Part 7: Securing, Monitoring, and Optimizing Amazon MSK
- Optimizing Amazon MSK
- Demonstration: Scaling up Amazon MSK storage
- Practice Lab: Amazon MSK streaming pipeline and application deployment
- Security and monitoring
- Demonstration: Monitoring an MSK cluster
Part 8: Designing Streaming Data Analytics Solutions
- Use case review
- Class Exercise: Designing a streaming data analytics workflow
Part 9: Developing Modern Data Architectures on AWS
- Modern data architectures
Building Data Analytics Solutions Using Amazon Redshift
Part 1: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Part 2: Using Amazon Redshift in the Data Analytics Pipeline
- Why Amazon Redshift for data warehousing?
- Overview of Amazon Redshift
Part 3: Introduction to Amazon Redshift
- Amazon Redshift architecture
- Interactive Demo 1: Touring the Amazon Redshift console
- Amazon Redshift features
- Practice Lab 1: Load and query data in an Amazon Redshift cluster
Part 4: Ingestion and Storage
- Ingestion
- Interactive Demo 2: Connecting your Amazon Redshift cluster using a Jupyter notebook with Data API
- Data distribution and storage
- Interactive Demo 3: Analyzing semi-structured data using the SUPER data type
- Querying data in Amazon Redshift
- Practice Lab 2: Data analytics using Amazon Redshift Spectrum
Part 5: Processing and Optimizing Data
- Data transformation
- Advanced querying
- Practice Lab 3: Data transformation and querying in Amazon Redshift
- Resource management
- Interactive Demo 4: Applying mixed workload management on Amazon Redshift
- Automation and optimization
- Interactive Demo 5: Resizing an Amazon Redshift cluster from dc2.large to ra3.xlplus
Part 6: Security and Monitoring of Amazon Redshift Clusters
- Securing the Amazon Redshift cluster
- Monitoring and troubleshooting Amazon Redshift clusters
Part 7: Designing Data Warehouse Analytics Solutions
- Data warehouse use case review
- Activity: Designing a data warehouse analytics workflow
Part 8: Developing Modern Data Architectures on AWS
- Modern data architectures
Professionals who would benefit from this training include:
- Data warehouse engineers
- Data platform engineers
- Solutions architects
In this training, you will learn:
- How to leverage AWS data services to store, process, analyze, stream, and query data to make decisions with speed and agility at scale
- How to modernize data solutions end to end
- Skills to put your data to work to make better, more informed decisions, respond faster to the unexpected, and uncover new opportunities
A full refund will be issued for class cancellations made at least 15 business days before the course begins. Payment is non‑refundable for cancellations or reschedules made within 15 business days from the course start date and for No‑Shows (students who do not attend class).
For reschedules made within 15 business days from the course start date, students must reschedule immediately for a current, published course, up to a maximum of 90 days from the original date.
A student may reschedule a class or exam up to 2 times. Any additional reschedules will not be allowed.