Mining from Massive Data
About This Course
This course is a comprehensive guide to big data analytics, machine learning, and advanced NLP using PySpark, from exploratory data analysis to cloud-native production deployment via Docker and Kubernetes.

Course Syllabus (2025)
Databricks
How to do exploratory data analysis in PySpark (slides)
Linear regression in Python and PySpark (slides)
How to do grid search and classification in PySpark (slides)
Classification models with imbalanced data (slides)
Predicting stock prices (slides)
Docker and Kubernetes (slides)
How to deploy PySpark to a Kubernetes cluster (slides)
Spark NLP Models Hub (slides)
How to use John Snow Labs' Spark NLP for sentiment analysis (slides)
Hugging Face (slides)
Horovod (slides)