Big Data Analytics with TIBCO Spotfire®

Interactive, Visual Analytics for Hadoop and other Big Data Stores

Democratizing Big Data with Visual Analytics

Turbo-charge your business analytics and address everything from routine to complex Big Data challenges with the Spotfire analytics platform. Spotfire is the only platform that empowers business users with an intuitive, easy-to-use interface to leverage the full spectrum of big data analytics technology, without requiring any data science or IT expertise.

The Spotfire interface remains consistent whether you are analyzing a small dataset or performing advanced analytics on a multi-terabyte big data cluster with complex data from sensor, social, Point-of-Sale (PoS), and geo-location sources. Users of any skill level navigate rich, insightful dashboards and analytical workflows simply by interacting with visualizations that represent aggregations of billions of data points.

Overview

Big Data Connectivity for High Performance Analytics

Spotfire offers three primary types of native integration with Hadoop and other big data sources:

  • Visualizing Data: Native out-of-the-box data connectors that enable fast, interactive data visualizations.
  • Performing Calculations:
    • Bring the engine to the data: Integration with in-datasource distributed computing frameworks that enable data calculations of any complexity on big data.
    • Bring the data to the engine: Integration with external statistical engines that get data directly from any data source, including traditional databases.

Together, these modes of integration offer a combination of visual data discovery and advanced analytics. They enable business users to access, combine, and analyze data from any underlying data structures with dashboards and workflows that are powerful and easy to use.

Big Data Connectors

Spotfire Big Data connectors support in-datasource, in-memory and on-demand data access modes. This flexibility enables fast, interactive visualizations: calculations run within the data stores, and data is moved into client memory only if and when it is needed. Spotfire native data connectors include:

  • Certified Hadoop data connectors for Apache Hive, Apache Spark SQL, Cloudera Hive, Cloudera Impala, Databricks Cloud, Hortonworks, MapR Drill and Pivotal HAWQ
  • Certified big data connectors for Teradata, Teradata Aster and Netezza
  • Connectors for OSI PI historical and real-time sensor data sources

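To make the difference between the in-memory and in-datasource access modes concrete, here is a minimal Python sketch. It does not use any Spotfire connector API: Python's built-in sqlite3 module stands in for a big data store, and the table name pos_sales and its rows are hypothetical. Both functions compute the same per-store totals, but the in-datasource version pushes the GROUP BY into the store, so only one row per group is transferred to the client.

```python
import sqlite3
from typing import Dict, Tuple


def make_sample_store() -> sqlite3.Connection:
    # Hypothetical stand-in for a big data store: an in-memory SQLite table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pos_sales (store TEXT, amount REAL)")
    rows = [("north", 10.0), ("north", 5.0), ("south", 7.5), ("south", 2.5), ("east", 4.0)]
    conn.executemany("INSERT INTO pos_sales VALUES (?, ?)", rows)
    return conn


def in_memory_totals(conn: sqlite3.Connection) -> Tuple[Dict[str, float], int]:
    # In-memory mode: move every row to the client, then aggregate locally.
    rows = conn.execute("SELECT store, amount FROM pos_sales").fetchall()
    totals: Dict[str, float] = {}
    for store, amount in rows:
        totals[store] = totals.get(store, 0.0) + amount
    return totals, len(rows)  # (result, rows transferred)


def in_datasource_totals(conn: sqlite3.Connection) -> Tuple[Dict[str, float], int]:
    # In-datasource mode: the store does the aggregation; only the summary rows come back.
    rows = conn.execute("SELECT store, SUM(amount) FROM pos_sales GROUP BY store").fetchall()
    return dict(rows), len(rows)  # (result, rows transferred)


if __name__ == "__main__":
    conn = make_sample_store()
    print(in_memory_totals(conn))
    print(in_datasource_totals(conn))
```

With five rows the saving is trivial, but the same pattern applied to billions of rows is what keeps client memory and network traffic proportional to the size of the result rather than the size of the data.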
In-Datasource Distributed Computing

In addition to point-and-click SQL operations that Spotfire runs distributed within the datasource, advanced statistical and machine learning algorithms can be launched from Spotfire and run in-datasource on very large datasets, returning only the results needed for visualizations in Spotfire:

  • Users interact with point-and-click dashboards that call scripts using the TIBCO Enterprise Runtime for R (TERR) instance embedded in Spotfire.
  • The TERR scripts launch distributed computing jobs via MapReduce, H2O, SparkR, or Fuzzy Logix.
  • These jobs drive high-performance engines deployed on the Hadoop or other datasource nodes.
  • TERR can also be deployed as the advanced analytics engine on Hadoop nodes driven by MapReduce or Spark, and it can be called on Teradata nodes.
  • Results are visualized in Spotfire.
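The "bring the engine to the data" pattern can be illustrated with a generic map/reduce computation. This is a sketch under stated assumptions, not TERR, H2O, or SparkR code: each hypothetical partition stands for data held on one node, map_partition summarizes it into a count, mean, and sum of squared deviations (Welford's method), and combine merges those summaries using the parallel variance formula of Chan et al. Only the three-number summaries, not the raw rows, travel back to the client.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Partial:
    """Summary of one partition: row count, mean, and sum of squared deviations (M2)."""
    count: int
    mean: float
    m2: float


EMPTY = Partial(0, 0.0, 0.0)


def map_partition(values: Iterable[float]) -> Partial:
    # "Map" step, conceptually run on the node that holds the partition.
    # Welford's online update keeps the running mean and M2 numerically stable.
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return Partial(count, mean, m2)


def combine(a: Partial, b: Partial) -> Partial:
    # "Reduce" step: merge two partition summaries (Chan et al. parallel formula).
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Partial(n, mean, m2)


def summarize(partitions: List[List[float]]) -> Tuple[int, float, float]:
    # Only the small Partial summaries are returned for visualization; raw rows stay put.
    total = reduce(combine, (map_partition(p) for p in partitions), EMPTY)
    if total.count == 0:
        raise ValueError("no data")
    return total.count, total.mean, total.m2 / total.count  # population variance


if __name__ == "__main__":
    # Hypothetical partitions standing in for data spread across cluster nodes.
    print(summarize([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]))
```

Because combine is associative (up to floating-point rounding) and EMPTY acts as its identity, a distributed framework can merge the partial summaries in any grouping across many nodes and still arrive at the same statistics.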

Putting it all together

Combining these capabilities means that sophisticated, robust analytic use cases can be encapsulated in easy-to-use interactive workflows. Business users can then visualize, analyze, and share the results without worrying about the details of the underlying data architecture.

Example: A Spotfire interface for configuring, running and visualizing the results of a model that identifies characteristics of lost shipments. Through this interface, business users can run calculations with both TERR and the H2O distributed computing framework against shipment transaction data stored in a Hadoop cluster.