Data Engineering
Overview
Our Data Engineering program offers you a comprehensive overview of foundational and advanced concepts. Beginning with a deep dive into Python, you will gain practical skills through hands-on exercises that range from basic syntax to sophisticated data processing libraries. The course will then guide you through data formats and storage solutions, from traditional databases to modern cloud platforms, so that you work in a range of scenarios and environments.
Advancing through the curriculum, you will master data modeling and learn to design the efficient, normalized databases and structures that robust projects depend on. Our mentors will introduce you to the scale and complexity of big data challenges and teach you data warehousing and data lake strategies. With lessons on distributed systems and parallel computing, you will soon excel at large-scale data processing and become an expert in big data, graph analytics, and ETL (Extract, Transform, Load) processes.
Course Details
Program Length: 16 weeks
Weekday Class Length: 6 hours
Weekend Class Length: 4 hours
Lecture Details
Lectures: 11 + Final Project
Lecture Length: 153 hours
Class Schedule:
- Monday – 2 hours
- Tuesday – 2 hours
- Thursday – 2 hours
- Saturday – 4 hours
Outcomes
This course will provide you with:
- A solid foundation in data engineering
- Preparation to tackle real-world data challenges
- A path to a successful career in data engineering
System Requirements
- Supported operating systems: macOS, Linux, or Windows (Pro edition required).
- Latest OS version, fully up to date.
- All security updates installed.
- At least 100 GB of free space on the hard drive.
- At least 16 GB of RAM; 32 GB is strongly preferred.
- Support for video conferencing and screen-sharing, with a reliable webcam and microphone.
How do you use Python for data analysis?
- Introduction to Python: basic syntax, data types, variable declarations, conditions, loops, and strings.
- Data structures: lists, tuples, and ranges.
- Functions and modules.
- Python libraries for data processing and data visualization, and virtual environments.
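To give a feel for the topics in this module, here is a minimal sketch that combines a function, a condition, a loop, a list, a range, strings, and a tuple. The function name `summarize_readings` and the sample values are hypothetical and are not taken from the course material.

```python
def summarize_readings(readings):
    """Return (count, minimum, maximum, mean) for a list of numbers."""
    if not readings:  # condition: guard against an empty list
        return (0, None, None, None)
    total = 0.0
    for value in readings:  # loop over a list
        total += value
    return (len(readings), min(readings), max(readings), total / len(readings))


# A list of hypothetical readings, plus labels built from a range.
readings = [3.5, 4.0, 2.5, 6.0]
labels = [f"sample-{i}" for i in range(len(readings))]

# The function returns a tuple, which can be unpacked into variables.
count, low, high, mean = summarize_readings(readings)
report = f"{count} readings, min={low}, max={high}, mean={mean}"
```

In the course itself, a data processing library would typically replace the hand-written loop; the plain version is shown here only to illustrate the basic syntax.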
What are data formats?
- The essential data formats used in data engineering, including JSON, XML, and CSV.
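As an illustration of how the three formats differ, the sketch below writes the same record as JSON, CSV, and XML and reads it back, using only Python's standard library (`json`, `csv`, `xml.etree.ElementTree`). The record and the `person` element name are hypothetical examples.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

record = {"id": 1, "name": "Ada", "city": "London"}  # hypothetical sample record

# JSON: supports nesting and keeps basic types (the number stays a number).
json_text = json.dumps(record)
from_json = json.loads(json_text)

# CSV: flat rows under a header; every value is read back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
from_csv = next(csv.DictReader(io.StringIO(buffer.getvalue())))

# XML: a tree of elements; element text is a string unless converted.
root = ET.Element("person")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}
```

Note that only the JSON round trip returns `id` as an integer; the CSV and XML versions return the string `"1"`, which is one reason type handling matters when moving data between formats.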
What are storage and data modeling?
- Various storage options, from relational and NoSQL databases to vector and graph databases.
- Cloud storage solutions like Amazon S3 and Azure Blob.
- Data design and modeling.
- Normalization.
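A small example may help show what normalization means in practice. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `customers` and `orders` tables and their columns are hypothetical. Customer details are stored once, and each order refers to its customer by key instead of repeating the name and email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.executescript(
    """
    -- Customer details live in one place only...
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL UNIQUE
    );
    -- ...and each order refers to its customer by key.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "keyboard"), (1, "monitor")],
)

# Changing the email touches a single row, not every order.
conn.execute("UPDATE customers SET email = 'ada@example.org' WHERE customer_id = 1")

# A join reassembles the combined view when it is needed.
rows = conn.execute(
    """
    SELECT o.item, c.email
    FROM orders AS o JOIN customers AS c USING (customer_id)
    ORDER BY o.order_id
    """
).fetchall()
```

Because the email is stored in a single row, both orders see the updated value after one `UPDATE`, which is the kind of update anomaly that normalization is meant to prevent.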
What are data processes?
- Data discovery, integration, transformation, and enrichment.
Introduction to big data, big data analytics, and graph analytics
- Definition and characteristics of big data.
- Use Cases.
- Big Data Enabling Technologies.
- Decision trees for Big Data Analytics.
- Big Data Analytics.
- Graph Analytics.
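To make the graph topic concrete, here is a minimal sketch of two basic graph measures: the degree of each node and the shortest path length between two nodes, found with breadth-first search. The graph, node names, and function names are hypothetical; at big data scale a dedicated graph library or engine would be used instead of plain dictionaries.

```python
from collections import deque

# Hypothetical undirected graph stored as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def degrees(g):
    """Number of neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in g.items()}


def shortest_path_length(g, start, goal):
    """Breadth-first search: edges on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in g[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

For example, `shortest_path_length(graph, "A", "E")` walks A, B, D, E and returns the number of edges on that path.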
What is data warehousing?
- Introduction to Data Warehousing.
- The role of Data Warehousing.
What are data lakes?
- Introduction to Data Lakes.
- The role of Data Lakes.
Distributed Systems and Parallel Computing
- Introduction to Distributed Systems.
- Basic concepts of distributed computing.
- Introduction to Parallel Computing.
- Understanding parallel processing and its role in handling large datasets.
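The core idea behind parallel processing of large datasets is to split the data into chunks, process the chunks independently, and combine the partial results. The sketch below shows that pattern with Python's standard `concurrent.futures` module. It uses a thread pool only to keep the example short and portable; this is an assumption made for illustration, and for CPU-heavy work in Python a process pool (`ProcessPoolExecutor`) or a distributed framework would usually be the better fit. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def partial_sum_of_squares(chunk):
    """Work done independently on one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, chunk_size=1000, workers=4):
    """Split the data, process chunks in a worker pool, then combine."""
    chunks = list(chunked(items, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


data = list(range(10_000))
result = parallel_sum_of_squares(data)
```

The combine step works here because addition can be applied to partial results in any grouping; operations without that property need a different way of merging chunk results.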
ETL processes
- Basics of ETL processes and tools.
- Building ETL processes.
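As a rough picture of what building an ETL process involves, the sketch below extracts rows from CSV text, transforms them (trimming whitespace, normalizing names, dropping rows without an amount), and loads them into an SQLite table. It uses only the standard library; the sample data, the `payments` table, and the function names are hypothetical, and production pipelines would normally rely on dedicated ETL tools.

```python
import csv
import io
import sqlite3

RAW_CSV = """name,amount
ada, 10.5
grace,
alan,7
"""  # hypothetical source data


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert amounts, drop incomplete rows."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip rows with no amount
        cleaned.append((row["name"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
```

Keeping the three stages as separate functions makes each one easy to test and replace, for example by swapping the CSV source for an API or the SQLite target for a data warehouse.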