Hive Partitioning and Bucketing

Naci Simsek
Docker Hadoop Data Engineering Tutorial Hdfs Hive Mapreduce Postgres Catalog

In the previous article, we created Hive tables and observed their data usage on HDFS and how their metadata is managed.

In this article, we will apply partitioning and bucketing to Hive tables and observe how these techniques can improve query performance.

If you opened this article directly without setting up your Docker environment, I suggest you visit that article first to deploy your cluster.

