Amazon Web Services Inc. announced five new capabilities across its database and analytics products today during AWS re:Invent, designed to give customers the tools they need to manage and analyze data at petabyte scale.
Amazon announced capabilities coming to DocumentDB, OpenSearch Service and the interactive query service Athena that will enhance high-performance database analytics at scale. In addition, the data integration service AWS Glue has been updated to automatically manage data quality at scale, and Redshift, a managed data warehouse product, has been updated to support high-availability configurations across multiple AWS availability zones.
“Data is inherently dynamic, and harnessing it to its full potential requires an end-to-end data strategy that can scale with a customer’s needs and accommodate all types of use cases — both now and in the future,” said Swami Sivasubramanian (pictured), vice president of databases, analytics and machine learning at AWS. “To help customers make the most of their growing volume and variety of data, we are committed to offering the broadest and deepest set of database and analytics services.”
Sivasubramanian voiced Amazon’s commitment to building resources that customers could use to manage and query data at scale in order to make better decisions with their data.
According to Amazon, customers today face ever-increasing data needs as they create and store petabytes, and even exabytes, of data from numerous sources. The tools to access, query and analyze that data have become even more complex, from integrating data to storing it and finally making it available to generate insights.
Amazon DocumentDB has released a new type of cluster that allows customers to elastically scale their document databases. The new capability, known as DocumentDB Elastic Clusters, lets customers scale document databases within minutes to handle millions of reads and writes per second and store up to 2 petabytes of data, a capacity beyond what a single node can provide. Previously, customers needed to write specialized code to spread demanding workloads across multiple nodes; with Elastic Clusters, that sharding is built in and managed automatically for customers who need it.
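To make that concrete, here is a minimal sketch of provisioning an elastic cluster through the AWS SDK for Python (boto3) and its "docdb-elastic" client. The cluster name, credentials and shard settings are placeholder values chosen for illustration, not part of Amazon's announcement.

```python
# Minimal sketch: provisioning a DocumentDB elastic cluster with boto3.
# The identifiers and shard settings below are illustrative placeholders.
import boto3

docdb_elastic = boto3.client("docdb-elastic", region_name="us-east-1")

response = docdb_elastic.create_cluster(
    clusterName="orders-elastic-cluster",       # placeholder name
    adminUserName="admin",
    adminUserPassword="REPLACE_WITH_SECRET",     # fetch from Secrets Manager in practice
    authType="PLAIN_TEXT",
    shardCapacity=2,    # compute capacity per shard
    shardCount=4,       # number of shards the workload is spread across
)

print(response["cluster"]["status"])  # e.g. "CREATING" while the cluster provisions
```

Raising the shard count or shard capacity later is how a workload grows past the limits of a single node without any application-side sharding code.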
With the release of Amazon OpenSearch Serverless, search indexes are automatically provisioned, configured and scaled, allowing for petabyte-scale search. The service does this by decoupling indexing from search, so either side can scale rapidly without a performance hit during massive spikes in workload. Customers of OpenSearch Serverless get those scalability benefits along with standard features such as built-in data visualization for understanding log data and search relevance rankings.
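For readers who want a concrete starting point, the sketch below creates a serverless collection with boto3's "opensearchserverless" client. The collection name is a placeholder, and in practice an encryption security policy covering the collection must exist before it can be created.

```python
# Minimal sketch: creating an OpenSearch Serverless collection with boto3.
# Assumes an encryption security policy covering the collection name already exists.
import boto3

aoss = boto3.client("opensearchserverless", region_name="us-east-1")

collection = aoss.create_collection(
    name="app-logs",      # placeholder collection name
    type="TIMESERIES",    # "SEARCH" for search workloads, "TIMESERIES" for log analytics
    description="Log analytics collection; indexing and search capacity scale independently",
)

print(collection["createCollectionDetail"]["status"])  # e.g. "CREATING"
```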
Amazon Athena now supports Apache Spark, an open-source processing framework for big data workloads, extending an interactive query service that is already one of the fastest ways to search petabytes of data in Amazon Simple Storage Service. The addition of Apache Spark lets developers write applications in the languages they prefer, such as Java, Scala, Python and R, without needing to set up, manage and scale their own Apache Spark instance every time they want to run a query. With support for Apache Spark on Athena, customers can now run interactive queries and complex analyses and quickly visualize the results.
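The kind of PySpark code this targets looks roughly like the sketch below. The S3 path and column names are hypothetical, and in an Athena notebook a Spark session is already provisioned, so getOrCreate() simply reuses it.

```python
# Minimal PySpark sketch of the kind of interactive analysis Athena's Spark support targets.
# The S3 path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("athena-spark-sketch").getOrCreate()

# Read Parquet data directly from Amazon S3.
events = spark.read.parquet("s3://example-bucket/clickstream/")  # placeholder path

# Interactive aggregation: daily event counts per page.
daily_counts = (
    events
    .withColumn("day", F.to_date("event_timestamp"))
    .groupBy("day", "page")
    .agg(F.count("*").alias("events"))
    .orderBy("day")
)

daily_counts.show(10)
```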
AWS Glue, a serverless, scalable data integration service that makes it possible to integrate and manage data from multiple sources, is getting a preview of AWS Glue Data Quality. The feature automatically analyzes data, gathers statistics and then recommends data quality rules to get customers started, and customers can also define their own rules. If the quality of the data being ingested falls below certain thresholds, the customer is alerted and can take action.
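As a rough illustration of what customer-defined rules might look like, the sketch below registers a small ruleset written in Glue's Data Quality Definition Language against a hypothetical Glue catalog table. The database, table, rule thresholds and ruleset name are placeholders, and the feature was in preview at the time of the announcement.

```python
# Minimal sketch: registering a Glue Data Quality ruleset with boto3.
# Database, table and thresholds are illustrative placeholders.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Rules are written in Glue's Data Quality Definition Language (DQDL).
ruleset = (
    'Rules = ['
    ' IsComplete "order_id",'       # every row must have an order_id
    ' ColumnValues "price" > 0,'    # prices must be positive
    ' RowCount > 1000'              # expect at least 1,000 rows per load
    ' ]'
)

glue.create_data_quality_ruleset(
    Name="orders-basic-quality",    # placeholder ruleset name
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "sales_db", "TableName": "orders"},  # placeholders
)
```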
Finally, Amazon Redshift, AWS’ large-scale, fully managed data warehouse service, is gaining support for deployment across multiple availability zones with Redshift Multi-AZ. Redshift already increases availability and reliability by automatically backing up clusters in case of critical failures and by allowing workloads to relocate to other clusters without applications noticing. With Multi-AZ, however, clusters are deployed across multiple availability zones simultaneously and are still managed as a single data warehouse with one endpoint, so if one zone fails, live data can be shifted quickly to another zone.
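A hedged sketch of how such a deployment might be requested programmatically is shown below, using boto3's Redshift client; the cluster identifier, node type and credentials are placeholders, the MultiAZ flag on CreateCluster is an assumption on our part, and the feature was announced in preview.

```python
# Minimal sketch: creating a Multi-AZ Redshift cluster with boto3.
# Identifiers and credentials are placeholders; the MultiAZ flag is assumed here,
# and Multi-AZ deployments use RA3 node types.
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

redshift.create_cluster(
    ClusterIdentifier="analytics-warehouse",    # placeholder identifier
    NodeType="ra3.4xlarge",                     # Multi-AZ targets RA3 nodes
    ClusterType="multi-node",
    NumberOfNodes=2,
    MasterUsername="admin",
    MasterUserPassword="REPLACE_WITH_SECRET",   # fetch from Secrets Manager in practice
    MultiAZ=True,                               # deploy across two availability zones
)
```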
“The new capabilities announced today build on this by making it even easier for customers to query, manage, and scale their data to make faster, data-driven decisions,” said Sivasubramanian.
Photo: Robert Hof/SiliconANGLE