
Data Engineering

Overview

Course Details

Schedule

Program Length: 16 weeks  

Class schedule

Weekday Classes Length: 6 hours  

Weekend Classes Length: 4 hours

Lectures Details

Lectures

Lectures: 11 + Final Project

Schedule

Lectures Length: 153 hours  

Class schedule

Classes Schedule:  

  • Monday – 2 hours 
  • Tuesday – 2 hours 
  • Thursday – 2 hours 
  • Saturday – 4 hours 

Outcomes

This course will provide you with:

  • A solid foundation in data engineering
  • Preparation to tackle real-world data challenges
  • A path to a successful career in data engineering
Please make sure you have a computer that meets the following requirements:

  • Supported operating systems: macOS, Linux, or Windows (Pro edition required).
  • Latest OS version, fully up to date.
  • All security updates installed.
  • At least 100GB of free space on the hard drive.
  • At least 16GB of RAM; 32GB is strongly preferred.
  • Support for video conferencing and screen-sharing, with a reliable webcam and microphone.

How do you use Python for data analysis?

  • Introduction to Python: basic syntax, data types, variable declarations, conditions, loops, and strings.
  • Data structures: lists, tuples, and ranges.
  • Functions and modules.
  • Python libraries for data processing and data visualization, and virtual environments.

What are Data Formats?

  • Learning the essential data formats used in data engineering, including JSON, XML, and CSV.
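
As a rough illustration of how these three formats relate, here is a minimal sketch that writes one record as JSON, CSV, and XML and reads it back using only the Python standard library. The sample record and the helper function names are made up for this example and are not part of the course material.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record, invented purely for illustration.
record = {"id": "1", "name": "Ada"}


def to_json(rec):
    """Serialize a flat dict to a JSON string."""
    return json.dumps(rec)


def from_json(text):
    """Parse a JSON string back into a dict."""
    return json.loads(text)


def to_csv(rec):
    """Serialize a flat dict to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec))
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def from_csv(text):
    """Read the first data row of CSV text as a dict."""
    return dict(next(csv.DictReader(io.StringIO(text))))


def to_xml(rec):
    """Serialize a flat dict to XML, one child element per key."""
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML produced by to_xml back into a dict."""
    return {child.tag: child.text for child in ET.fromstring(text)}
```

Note that CSV and this simple XML layout carry every value as text, so the sketch keeps all values as strings; JSON alone would also preserve numbers and booleans.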

What are storage and data modeling?

  • Various storage options, from relational and NoSQL databases to vector and graph databases.
  • Cloud storage solutions like Amazon S3 and Azure Blob.
  • Data design and modeling.
  • Normalization.

What are data processes?

  • Data discovery, integration, transformation, and enrichment.

Introduction to big data, big data analytics, and graph analytics

  • Definition and Characteristics of big data.
  • Use Cases.
  • Big Data Enabling Technologies.
  • Decision trees for Big Data Analytics.
  • Big Data Analytics.
  • Graph Analytics.

What is data warehousing?

  • Introduction to Data Warehousing.
  • The role of Data Warehousing.

What are data lakes?

  • Introduction to Data Lakes.
  • The role of Data Lakes.

Distributed Systems and Parallel Computing

  • Introduction to Distributed Systems.
  • Basic concepts of distributed computing.
  • Introduction to Parallel Computing.
  • Understanding parallel processing and its role in handling large datasets.

ETL processes

  • Basics of ETL processes and tools.
  • Building ETL processes.
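
To give a feel for what building an ETL process can look like, here is a minimal sketch of the extract, transform, and load steps using only the Python standard library. The sample data, the `payments` table, and the function names are assumptions made for this illustration; the course itself may use different tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: one row has a missing amount.
RAW_CSV = "name,amount\nalice,10.5\nbob,\ncarol,7\n"


def extract(text):
    """Extract: read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, normalize names, cast amounts."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((row["name"].title(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run the three steps in order."""
    load(transform(extract(text)), conn)
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` loads the two complete rows into an in-memory database and skips the incomplete one.
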
To pass the class, students should aim to earn at least 90% of the available points. We've created a flexible environment designed to give you the best possible learning experience and help you reach greater heights.
Punctuality, participation in discussions, completion of assignments, and demonstration of professional courtesy to others are required, in accordance with our Code of Conduct. Attendance will be taken at the beginning of every class. Passing requires at least 90% attendance. Students should always contact the instructors ahead of time if they are unable to attend all or part of the published class/lab hours.
