Big Data Analytics with TIBCO Spotfire®

Interactive, Visual Analytics for Hadoop and other Big Data Stores

Democratizing Big Data with Visual Analytics

Turbo-charge your business analytics and address your Big Data challenges, from routine to complex, with the Spotfire analytics platform. Spotfire is the only platform that empowers business users with an intuitive, easy-to-use interface to leverage the full spectrum of big data analytics technology, without requiring any data science or IT expertise.

The Spotfire interface remains consistent whether you are analyzing a small dataset or performing advanced analytics on a multi-terabyte big data cluster with complex data from sensor, social, Point-of-Sale (PoS), and geo-location sources.  Users of any skill level navigate rich, insightful dashboards and analytical workflows simply by interacting with visualizations that represent aggregations of billions of data points.

Overview

Big Data Connectivity for High Performance Analytics

Spotfire offers three primary types of native integration with Hadoop and other big data sources:

  • Visualizing Data: Native out-of-the-box data connectors that facilitate super fast interactive data visualizations.
  • Performing Calculations:
    • Bring the engine to the data: Integration with in-datasource distributed computing frameworks that enable data calculations of any complexity on big data.
    • Bring the data to the engine: Integration with external statistical engines that get data directly from any data source, including traditional databases.

Together, these modes of integration offer a combination of visual data discovery and advanced analytics. They enable business users to access, combine, and analyze data from any underlying data structures with dashboards and workflows that are powerful and easy to use.

Big Data Connectors

Spotfire Big Data connectors support in-datasource, in-memory, and on-demand data access modes. This flexibility makes fast interactive visualizations possible: calculations run within the data store, and data is moved into client memory only if and when it is needed. Spotfire native data connectors include:

  • Certified Hadoop data connectors for Apache Hive, Apache Spark SQL, Cloudera Hive, Cloudera Impala, Databricks Cloud, Hortonworks, MapR Drill and Pivotal HAWQ
  • Other certified big data connectors for Teradata, Teradata Aster, and Netezza
  • Connectors for OSI PI historical and real-time sensor data sources

Learn more about data access with Spotfire data connectors.  
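
The difference between in-datasource and in-memory access can be illustrated with a small sketch. It does not use any Spotfire connector API; an in-memory SQLite database stands in for the remote data store, and the table name `sales` and both function names are assumptions made for illustration only. The point is that pushing the aggregation into the store returns one row per group, while the in-memory approach moves every row to the client first.

```python
import sqlite3
from collections import defaultdict

# An in-memory SQLite database stands in for the remote big data store
# (an assumption for illustration; no Spotfire API is used here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
rows = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 7.5), ("north", 4.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


def in_datasource_totals(conn):
    """Aggregate inside the store; only one row per region is transferred."""
    cur = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(cur.fetchall())


def in_memory_totals(conn):
    """Pull every row to the client, then aggregate locally."""
    fetched = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = defaultdict(float)
    for region, amount in fetched:
        totals[region] += amount
    return dict(totals), len(fetched)


pushed = in_datasource_totals(conn)
pulled, rows_moved = in_memory_totals(conn)
print(pushed == pulled, len(pushed), rows_moved)
```

Both paths produce the same totals; what differs is how much data crosses the connection, which is what matters at multi-terabyte scale.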

In-Datasource Distributed Computing

In addition to Spotfire point-and-click SQL operations that run distributed within the data source, advanced statistical and machine learning algorithms can be initiated from Spotfire and run in-datasource on very large datasets, returning only the results needed for visualizations in Spotfire:

  • Users interact with point-and-click dashboards that call scripts using the TERR instance embedded in Spotfire.    
  • The TERR scripts initiate distributed computing jobs via MapReduce, H2O, SparkR, or Fuzzy Logix.
  • These jobs drive high-performance engines deployed on the Hadoop or other datasource nodes.  
  • TERR can be deployed as the advanced analytics engine in Hadoop nodes that are driven by MapReduce or Spark. It can also be called on Teradata nodes.  
  • Results are visualized in Spotfire.
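
The pattern behind the steps above can be sketched in a few lines. This is not TERR, MapReduce, or H2O code; it is a minimal Python illustration under the assumption that each partition of the data lives on a separate node, with a thread pool standing in for the cluster. The function names are hypothetical. Each "node" reduces its partition to a small summary, and only the combined result travels back to the client.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(values):
    """Runs 'on the node': summarize one partition as (count, sum)."""
    return (len(values), sum(values))


def combine(a, b):
    """Reduce step: merge two partial summaries."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(partitions):
    """Compute a global mean from per-partition summaries only."""
    # The thread pool is a stand-in for worker nodes (illustrative assumption).
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    count, total = reduce(combine, partials, (0, 0.0))
    return total / count if count else float("nan")


partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean = distributed_mean(partitions)
print(mean)
```

The client never sees the raw values, only one `(count, sum)` pair per partition, which mirrors the idea of returning just the results needed for a visualization.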

Putting it all together

Combining these capabilities means that sophisticated and robust analytic use cases can be encapsulated in easy-to-use interactive workflows. This empowers business users to visualize, analyze, and share the results without worrying about the details of the underlying data architecture.

Example: Spotfire interface for configuring, running and visualizing the results of a model that identifies characteristics of lost shipments. Through this interface business users can perform calculations using both TERR and the H2O distributed computing framework against shipment transaction data stored in a Hadoop cluster.

Analytical Breadth for Big Data

Advanced and Predictive Analytics for Big Data

Users interact with point-and-click Spotfire dashboards to drive a rich array of advanced capabilities that enable prediction, simulation, and optimization. With big data, analysis can be performed in-datasource, bringing back only the aggregations and results needed to populate Spotfire visualizations.

Content Analytics for Big Data

Spotfire provides visualization and analytics on the largely untapped dimension of big data: unstructured text that is captured but hidden in documents, reports, CRM notes, weblogs, social posts, and other sources. Spotfire allows you to visually analyze text-based data in 27 languages and blend it with structured data to add context and detail and obtain deeper insights.

Location Analytics for Big Data

Multi-layer, high-resolution maps are an excellent way to visualize big data. Spotfire's rich mapping capabilities allow you to create maps with as many reference and feature layers as you need, including layers calculated with advanced analytics. In addition to geographical maps, Spotfire supports custom maps to visualize data for warehouses, factory floors, semiconductor wafers, and many other layouts.

Machine Learning for Big Data

A broad class of machine learning methods is available in Spotfire as point-and-click data functions that users can invoke. Data scientists have access to the underlying R code and can extend the data function collection. The machine learning functions are shared with the user community for easy reuse.

Machine learning methods for continuous and categorical response variables are available in Spotfire and TERR, including:

  • Linear and logistic regression
  • Decision trees, random forests, gradient boosting machines (gbm)
  • Generalized additive models 
  • Neural networks

Real-time Event Analytics for Big Data

Insights from visual analytics and modeling in Spotfire can be deployed, at the press of a button, to event processing systems and scored or run on real-time streaming data. This allows you to monitor real-time data and alert end users, such as marketers or engineers, when an anomaly occurs or a new trend begins to emerge. The alerts can combine recent event data with historical data, providing context that helps users investigate an event's importance and quickly decide on any necessary intervention.

TIBCO StreamBase is integrated with Spotfire for such real-time streaming analytics. StreamBase performs real-time calculations on streaming data using rules and models published in Spotfire. It applies the Spotfire insights to streaming data in an automated manner, pushing notifications to a wide array of channels including text, email, database, and BPM systems.
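
As a rough illustration of this flow, the sketch below derives a simple rule from historical data and then applies it to events as they arrive, emitting alerts that carry historical context. It is not StreamBase or Spotfire code; the rule (mean plus three standard deviations), the function names, and the alert fields are all assumptions chosen to keep the example small.

```python
from statistics import mean, stdev


def fit_rule(history, k=3.0):
    """Derive a simple anomaly rule from historical readings (hypothetical model)."""
    mu, sigma = mean(history), stdev(history)
    return {"mean": mu, "limit": mu + k * sigma}


def score_stream(events, rule):
    """Apply the published rule to each event as it arrives; yield alerts only."""
    for timestamp, value in events:
        if value > rule["limit"]:
            # Attach the historical mean so the recipient has context.
            yield {"time": timestamp, "value": value, "historical_mean": rule["mean"]}


history = [10.0, 11.0, 9.0, 10.0, 10.0]
rule = fit_rule(history)

# A list stands in for the live event stream (illustrative assumption).
stream = [(1, 10.5), (2, 25.0), (3, 9.8)]
alerts = list(score_stream(stream, rule))
print(alerts)
```

Because `score_stream` is a generator, it handles one event at a time and would work the same way on an unbounded source.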

Key Features

Scalable data visualizations

Spotfire big data visualizations can scale to represent billions of rows of data within an analysis.

Intuitive user interface

Spotfire dashboards and analytic workflows can encapsulate sophisticated use cases that enable business users to visualize, analyze, run calculations, and share the results.

Flexible data architecture

Spotfire's seamless user experience is made possible by the richness of options to access data of any size, perform calculations of any type, and efficiently visualize data aggregations or row-level details.

Agile platform

Spotfire's agile platform empowers business analysts to drive advanced analytic workflows and applications for big data and become truly data-driven.