Amazon Web Services (AWS) has become the de facto leader in cloud computing, and it has consistently expanded the breadth of its services since launching in 2006. AWS now provides more than 70 services that span a wide range of areas such as machine learning (Amazon Machine Learning), storage (Amazon S3), computing (Amazon EC2), real-time data processing (Amazon Kinesis), scalable cloud database services (Amazon RDS) and Big Data (Amazon Elastic MapReduce, which runs Hadoop on top of EC2 and S3).
For this post I will focus on AWS’s data warehousing service, Amazon Redshift, which changed the game for a lot of companies looking for a less expensive alternative to traditional massively parallel processing (MPP) database platforms such as Teradata, Vertica and Greenplum. Amazon claims that the cost of implementing Redshift is about a tenth that of standard MPP platforms.
In short, Amazon Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service in the cloud. One of the main differentiators that separates Amazon Redshift from most (usually very expensive) MPP solutions is that it has no upfront costs or commitments. Customers can start small at $0.25 per hour and $1,000 per terabyte per year, scaling up to petabytes as needed. Amazon Redshift achieves very high performance using three strategies:
- Columnar Data Storage – Storing data at the column level allows for much more efficient computation of aggregates.
- Advanced Compression – Amazon Redshift employs multiple compression techniques and so can often achieve significant compression relative to traditional row-oriented data stores. When data is loaded into an empty table, it automatically samples the data and selects the most appropriate compression scheme.
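- Massively Parallel Processing (MPP) – Data and query execution are distributed across multiple nodes, so each node works on its own portion of the data in parallel.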
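To make the three strategies a little more concrete, here is a minimal Python sketch of the ideas behind them. It is only a toy model and not how Redshift is actually implemented: the sales records, the `run_length_encode` and `split_into_slices` helpers, the choice of run-length encoding as the compression example, and the round-robin split are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical sales records; the table and its values are invented for illustration.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "east", "amount": 80.0},
    {"order_id": 3, "region": "west", "amount": 200.0},
    {"order_id": 4, "region": "west", "amount": 50.0},
]

# 1. Columnar storage: keep one list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate such as SUM(amount) now reads a single column and never
# touches order_id or region.
total = sum(columns["amount"])


# 2. Compression: values within one column tend to repeat, so a simple scheme
# such as run-length encoding can shrink them considerably.
def run_length_encode(values):
    """Return (value, run_length) pairs for consecutive equal values."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_region = run_length_encode(columns["region"])


# 3. MPP: split the column across partitions, let each one compute a partial
# result, then combine the partial results into the final answer.
def split_into_slices(values, slice_count):
    """Distribute values round-robin across slice_count partitions."""
    return [values[i::slice_count] for i in range(slice_count)]


partial_sums = [sum(part) for part in split_into_slices(columns["amount"], 2)]
combined_total = sum(partial_sums)

print(total, encoded_region, partial_sums, combined_total)
```

In a real cluster the partitions would live on separate machines and the partial sums would be computed at the same time, which is where the speed-up comes from. Here they simply run one after another in a single process.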
It is also worth noting that Amazon Redshift is based on the open source relational database PostgreSQL, so it is queried with standard SQL.
Various third-party assessments confirm that Amazon Redshift is the real deal. With large organizations growing increasingly comfortable with storing their data in the cloud, those of us who want to stay at the forefront of BI technologies will inevitably need to become familiar with Redshift’s capabilities as well as best practices for designing and implementing a Redshift data warehousing solution.
Here’s another post that describes how Amplitude, a web and mobile analytics company, chose Redshift over Hadoop/Hive and traditional MPP platforms for their data warehousing solution.
You can find the official Amazon Redshift homepage at the following link: Amazon Redshift