Tableau on Big Data: How to enhance and optimize performance
August 23, 2019
Author: Brahmajeet Desai
Response times in Tableau depend upon several factors such as the size and complexity of your data, number of concurrent users, and the way your data is modeled. When datasets are smaller, you can improve performance by optimizing your queries, worksheets, and dashboards. Additionally, you can review your data store and optimize the tables and data structures for better performance.
Performance of Tableau on big data
However, when it comes to big data, these techniques bring only marginal performance improvements. As a result, users face significant slowdowns while working on large datasets. Poor analytics can slow down a business, and improving performance becomes critical.
Tableau is designed to simplify visual analytics. It enables users to create intuitive visualizations and analyze their data interactively. However, like any other BI tool, it is not equipped to handle big data analysis on its own. Therefore, it should not be plugged in directly to a big data platform, whether on-premises or in the cloud.
A popular method to handle this is to pull data extracts and use them for analytics. However, if you use extracts, you will hit a performance wall once your data grows beyond a certain size. As your data grows to hundreds of billions of rows, pulling extracts can become cost-prohibitive. Besides, when you extract data, irrespective of whether it goes into an in-memory solution or an external data mart, you are no longer working on live data, and the delay can cost the business.
Achieving high performance with a BI acceleration layer
An innovative way to enhance the performance of Tableau on big data is to prepare the data for consumption before it lands in Tableau. This can be done on a separate layer that sits between Tableau and your big data platform.
Several Fortune-listed enterprises have achieved superior performance and high scalability with Tableau by building a BI acceleration layer directly on their big data platform. This layer uses the scalable storage and processing capacity of the big data platform to pre-aggregate massive data and prepare it for Tableau consumption. Since it leverages the big data platform for pre-aggregation as well as querying, it can deliver high performance even as the data grows.
Once this layer is in place, instead of sending queries to the big data platform, Tableau connects to this layer and gets immediate responses even for complex queries. There is no need to pull extracts or move data out of the big data platform for analysis. The distributed architecture accelerates big data access for Tableau users and also solves concurrency issues.
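To make the idea of pre-aggregation concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product implements it: the sample rows, the dimension names, and the `build_aggregates` and `query` helpers are all hypothetical. It pre-computes sales totals for every combination of dimensions once, and then answers group-by questions from those stored totals instead of rescanning the raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw fact rows: (region, product, month, sales)
RAW_ROWS = [
    ("East", "Vitamins", "2019-01", 120.0),
    ("East", "Vitamins", "2019-02", 80.0),
    ("East", "Skincare", "2019-01", 50.0),
    ("West", "Vitamins", "2019-01", 200.0),
    ("West", "Skincare", "2019-02", 70.0),
]
DIMENSIONS = ("region", "product", "month")


def build_aggregates(rows):
    """Pre-compute SUM(sales) for every combination of dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]
            cube[tuple(DIMENSIONS[i] for i in dims)] = dict(totals)
    return cube


def query(cube, group_by):
    """Answer a group-by from the stored aggregates, not the raw rows."""
    ordered = tuple(d for d in DIMENSIONS if d in group_by)
    return cube[ordered]


# The expensive pass over the raw data happens once, ahead of time.
cube = build_aggregates(RAW_ROWS)

# Each dashboard query is then a dictionary lookup.
print(query(cube, {"region"}))
```

In a real deployment the aggregation would run as distributed jobs on the big data platform and only selected combinations would typically be materialized, but the trade-off is the same: compute once up front so that each interactive query is cheap.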
The BI acceleration layer can also be used to store business logic. If you are doing complex calculations or have embedded business logic in Tableau, you can move them to this layer. Offloading complicated data manipulation or modeling tasks to the BI acceleration layer results in significant performance improvements.
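As a rough sketch of what moving business logic into the layer could look like, the Python below defines calculated measures once, next to the aggregated data, so that every dashboard requests them by name instead of re-implementing the formula. The aggregate values, the measure names, and the `serve` function are hypothetical and only illustrate the pattern.

```python
# Hypothetical pre-aggregated totals per region, as the layer might hold them.
AGGREGATES = {
    "East": {"revenue": 1000.0, "cost": 600.0},
    "West": {"revenue": 500.0, "cost": 400.0},
}


def profit(measures):
    return measures["revenue"] - measures["cost"]


def margin_pct(measures):
    # Computed from aggregated totals, so the ratio is correct at any level.
    if measures["revenue"] == 0:
        return 0.0
    return 100.0 * profit(measures) / measures["revenue"]


# Business logic is registered once in the layer, not in each workbook.
CALCULATED_MEASURES = {"profit": profit, "margin_pct": margin_pct}


def serve(region, measure):
    """Return a stored or calculated measure for one region."""
    base = AGGREGATES[region]
    if measure in base:
        return base[measure]
    return CALCULATED_MEASURES[measure](base)


print(serve("East", "margin_pct"))
print(serve("West", "profit"))
```

Keeping the formula in one place means a change to the definition of margin is made once and every connected dashboard picks it up.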
Case Study: How Walgreens achieved instant response times with Tableau on big data
Let us take the case of Walgreens, one of the largest pharmacy store chains in the United States, to understand how they built a BI acceleration layer on their Hadoop platform to achieve instant responses with Tableau on big data. Being a hundred-year-old company with a massive network of suppliers, distribution centers, and stores, Walgreens had huge volumes of data coming from a wide variety of internal and external sources. They found it difficult to provide timely insights to their business users as their existing architecture could not deal with the scale and cardinality of their data.
Having centralized all data across their business on Hadoop, they wanted to enable business users to pull data from the big data platform fast enough to find out what was going on at a granular level. They were using Tableau for analytics, and their data volumes and query complexity had exceeded Tableau’s recommended best practices. As a result, supply chain leaders at Walgreens found it challenging to consume and digest their data in a meaningful way.
To solve this, they built a BI acceleration layer within Hadoop using Kyvos’ OLAP on big data technology. The technology helped them achieve sub-second response times for queries and enabled seamless connectivity with Tableau and other reporting tools. They could analyze two years of historical inventory, operations, sales, and supplier data amounting to hundreds of billions of rows on their Tableau dashboards without any performance issues. They could slice and dice data across several dimensions, explore it to the lowest level of granularity, and get instant answers to their business questions. The solution helped them reduce costs and transform the way they manage their supply chain operations.
In Conclusion
If you have a big data system, you can use it for pre-aggregation to enhance and optimize Tableau performance. By building a separate layer that handles pre-aggregation, you can eliminate the limitations of extracts and live connections. The new architecture will speed up big data analytics, enable concurrent access to thousands of users, and allow you to scale up quickly to meet future requirements.
About the Author
Brahmajeet Desai is the Director of Marketing at Kyvos Insights, a leading BI acceleration platform. He has been working on Big Data technologies for over 11 years now. He’s an avid blogger, a technology enthusiast, and an experienced marketer who loves to solve business challenges through innovative technology. Brahmajeet’s career spans more than 20 years and includes several sales and marketing roles at HCL, Giesecke & Devrient, and Intellicus. He holds a Master’s in Management Studies from Devi Ahilya Vishwavidyalaya, India.