AWS Big Data Blog

Official Big Data Blog of Amazon Web Services

  • How Cookpad scaled its Amazon Redshift cluster while controlling costs with usage limits

    This is a guest post by Shimpei Kodama, data engineer at Cookpad Inc. Cookpad is a tech company that builds a community platform where people share recipe ideas and cooking tips. The company’s...

  • Making ETL easier with AWS Glue Studio

    AWS Glue Studio is an easy-to-use graphical interface that speeds up the process of authoring, running, and monitoring extract, transform, and load (ETL) jobs in AWS Glue. The visual interface...

  • Automating bucketing of streaming data using Amazon Athena and AWS Lambda

    In today’s world, data plays a vital role in helping businesses understand and improve their processes and services to reduce cost. You can use several tools to gain insights from your data, such...

  • Best practices using AWS SCT and AWS Snowball to migrate from Teradata to Amazon Redshift

    This is a guest post from ZS. In their own words, “ZS is a professional services firm that works closely with companies to help develop and deliver products and solutions that drive customer value...

  • Bringing the power of embedded analytics to your apps and services with Amazon QuickSight

    In the world we live in today, companies need to quickly react to change—and to anticipate it. Customers tell us that their reliance on data has never been greater than what it is today. To...

  • Building an AWS Glue ETL pipeline locally without an AWS account

    If you’re new to AWS Glue and looking to understand its transformation capabilities without incurring an added expense, or if you’re simply wondering if AWS Glue ETL is the right tool for your use...

  • How to delete user data in an AWS data lake

    General Data Protection Regulation (GDPR) is an important aspect of today’s technology world, and processing data in compliance with GDPR is a necessity for those who implement solutions within...

  • Streaming data from Amazon S3 to Amazon Kinesis Data Streams using AWS DMS

    Stream processing is very useful in use cases where we need to detect a problem quickly and improve the outcome based on data, for example, production line monitoring or supply chain optimization....

  • Using the Amazon Redshift Data API to interact with Amazon Redshift clusters

    Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL, and your existing ETL,...

  • Analyzing Amazon S3 server access logs using Amazon ES

    When you use Amazon Simple Storage Service (Amazon S3) to store corporate data and host websites, you need additional logging to monitor access to your data and the performance of your...

  • Implementing LDAP authentication for Hive on a multi-tenant Amazon EMR cluster

    As Amazon EMR continues its widespread adoption, it’s important to enforce separation of duties using role-based access when submitting your Hive jobs on EMR clusters in multi-tenant environments....

  • Enhanced monitoring and automatic scaling for Apache Flink

    Thousands of developers use Apache Flink to build streaming applications to transform and analyze data in real time. Apache Flink is an open-source framework and engine for processing data...

  • Stream CDC into an Amazon S3 data lake in Parquet format with AWS DMS

    Most organizations generate data in real time and in ever-increasing volumes. Data is captured from a variety of sources, such as transactional and reporting databases, application logs,...

  • Developing AWS Glue ETL jobs locally using a container

    AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. In the fourth post of the series, we discussed optimizing...

  • Amazon EMR supports Apache Hive ACID transactions

    Apache Hive is an open-source data warehouse package that runs on top of an Apache Hadoop cluster. You can use Hive for batch processing and large-scale data analysis. Hive uses Hive Query...

  • Zoopla drives KPIs with centralized data using Fivetran ELT for Amazon Redshift

    This is a guest post by Steven Collings, Senior Data Consultant at Zoopla. Zoopla is a property website that enables users to find residential or commercial property to buy or rent in the UK and...

  • Fast and predictable performance with serverless compilation using Amazon Redshift

    Amazon Redshift is a fast, fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI)...

  • Power data analytics, monitoring, and search use cases with the Open Distro for Elasticsearch SQL Engine on Amazon ES

    Amazon Elasticsearch Service (Amazon ES) is a popular choice for log analytics, search, real-time application monitoring, clickstream analysis, and more. One commonality among these use cases is...

  • How Aruba Networks built a cost analysis solution using AWS Glue, Amazon Redshift, and Amazon QuickSight

    This is a guest post co-written by Siddharth Thacker and Swatishree Sahu from Aruba Networks. Aruba Networks is a Silicon Valley company based in Santa Clara that was founded in 2002 by Keerti...

  • Build a self-service environment for each line of business using Amazon EMR and AWS Service Catalog

    Enterprises often want to centralize governance and compliance requirements, and provide a common set of policies on how Amazon EMR instances should be set up. You can use AWS Service Catalog to...
