10/7/21

Athena vs. Redshift


Amazon Athena should be your preferred choice when running ad-hoc SQL queries on data that is stored in Amazon S3. It doesn’t require you to set up or manage any infrastructure resources, and you don’t need to move any data. It supports structured, unstructured, and semi-structured data. With Athena, you are defining a “schema on read” -- you basically just log in, create a table and you are good to go.


Amazon Redshift is targeted for modern data analytics on large, peta-byte scale, sets of structured data. Here, you need to have a predefined “schema on write”. Unlike serverless Athena, Redshift requires you to create a cluster (compute and storage resources), ingest the data and build tables before you can start to query, but caters to performance and scale. So for any highly-relational data with a transactional nature (data gets updated), workloads which involve complex joins, and latency requirements to be sub-second, Redshift is the right choice.


Athena and Redshift are optimized for read-heavy analytics workloads - and therefore not replacements for write-heavy, relational databases such as Amazon Relational Database System (RDS) and Aurora. At a high level, use Athena for exploratory analytics and operational debugging; use Redshift for business-critical reports and dashboards.