-
In this week's post, we will introduce you to another excited new feature of TIBCO Spotfire 3.1, Conditional Coloring. Let’s assume you have a Data Table which provides sales data for a particular store. The sales are broken up into three separate product categories: Furniture, Office Supplies, and Technology, and sales are measured across the different store locations per state.

Let’s assume we want to highlight the top 5 sales result and the bottom 5 sales result for the combination of products and states. To do this, we can use TIBCO Spotfire’s Conditional Coloring to color the top 5 values green and the bottom 5 values red.
We configure these settings inside the Color section of the Table’s Properties Dialog. First we create a Color Scheme Group that includes all the product columns: Furniture, Office Supplies, and Technology. We then add a Conditional Coloring rule, using the ‘Add Rule’ button and select the Top rule and set the value to 5 as shown below:

This will then color the table so that the top 5 highest sales are colored green.

As you can see, all 5 are from the office supplies column. What if we wanted to see the top 5 for each product? We then change the color scaling for this group to be by Column.
 The result now shows the top 5 sales in each product colored.

We can now repeat this, but using the Bottom rule instead of Top to color the bottom 5 sales in each product red.

Some of the additional benefits of using the Conditional Coloring Rules are that the coloring is updated to reflect the changes in the data , including filtering. In addition, you can expose the values for top and bottom as Property Controls so that the business user can alter the values when desired.
In the screen shot below, you can see we exposed the top and bottom values as a slider control to let the user define how many values to color.

More details on Conditional Coloring, including Color Schemes, Color Scaling, Using Expressions in Coloring, additional Conditional Coloring Rules, and other updates are discussed in detail in both the 3.1 Delta Training Course and the TIBCO Spotfire Business Author Bootcamp.
|
-
Last week we discussed an important new concept in TIBCO Spotfire version 3.1, scenario analysis. This week we will continue learning about new 3.1 features by discussing another type of analytics: Predictive Analytics.
With TIBCO Spotfire version 3.1, you can communicate directly with S+ or R to execute scripts and functions. You can then return the data back into Spotfire as rows, columns, tables, and even values to properties, like Document Properties, Data Table Properties, and Column Properties.
For this example, we will be looking at interest rate differentials (IRD) (http://www.investopedia.com/terms/i/interest-rate-differential.asp) for US and Canada from February 1976 to June 1996 and will try to calculate the IRD for future months.
 The first thing we need to do is to create a S+ or R script that will take the data in and will predict the IRD out 5 months. There are a variety of ways to do time series predictions: two of the common ones are Spline Analysis and the ARMA process (http://en.wikipedia.org/wiki/Time_series) For this tip we will use the ARMA process. If you would like additional information or training on other models, please consider taking our S+ or R training courses.
To execute the S+ or R script in Spotfire we have to create a Data Function. A Data Function is basically a S+ or R Script or Function that is stored in the Spotfire Library to be used in Spotfire analysis files.
We create the Data Function by going to Tools > Register Data Functions and filling in the required information. The first part is filling in the script itself: 
Then we need to define the input parameters. These are the expected inputs going into the script. In this case, we are going to expect the script to receive a Data Table.
 We then define the output via the output tab. In this case, we are expecting the output to be a Data Table called result.

When you are done, you click the Save button to save the Data Function in the Spotfire Library.
Next, you need to add the Data Function into your document. To do this, you go to the Tools menu and click on the Data Functions tool and select the desired function. 
Once you click OK, you can now define what to send into the Data Function as inputs and what to send the output of the Data Function to. These are called Input and Output Handlers. The Input Handler will be both the date and Interest Ratio Differential columns from the Data Table. 
The Output Handler will be a new Data Table called result.

Once we click OK, the calculation will occur and return a new Data Table called result, which includes the original date column, a value column for the Interest Rate Differentiator, and then a column called valueType which tells us if the row was an original row or a new, predicted row.

We can then create a line chart showing the date on the x-axis and the average value on the y-axis and color by Value Type, so you can see the predicted months at the end.

To take this example further, it is likely that a consumer may want to adjust how many months to predict. They may also want to adjust some variables in the algorithm. With TIBCO Spotfire 3.1, we can implement this by exposing variables like the length of prediction (in months) as properties where the consumer can pass in the value.
See the example below, where the consumer set the number of months to predict to 3 and the result Data Table and Line Chart get updated to show 3 month’s predictions.

If you would like more information on Data Functions and how to create them, please consider taking our 3.1 Delta Training Course SP1631 For more information on the specific algorithms and statistics we used, please consider taking our S+ or R training courses.
|
-
I’d like to be one of the first to start introducing you to the amazing new features of TIBCO Spotfire Professional 3.1. This will kick off an arc of tips related to 3.1 features. This tip will introduce you to the updated Property capabilities.
If you remember our previous tip discussing Column from Marked, you will remember that one use of it was a workaround to allow users of the Web Player to update axis in visualizations. You may also remember an earlier tip on using Global Variables inside TIBCO Spotfire , where we discussed creating a document property that held the exchange rate, which could then be used in various visualizations in place of hard coded values.
What if you wanted to allow consumers who use the web player or any client to update the property themselves? This would allow consumers to update the expressions included in many visualizations, creating a kind of scenario analysis.
Let’s assume we have a Bar Chart that shows the initial amount of an investment for three separate investments (100, 200, and 300).

We can then create a new property, just as we could have before version 3.1. If we want to create a Document Property , we can do that from the Edit > Document Properties. In this case, we will create a property called rInterestRate. It will be a Real and the starting value will be 0.05. There are new ways to create these properties as well in the UI for 3.1, but we will not be discussing those in this tip.

Now that we have the property created, we need to include the properties into our logic for our analysis file. Let’s assume we want to create a second Bar Chart, and this one will show amount of the three investments after 10 years using a user specified interest rate. This user specified interest rate is going to be stored in the property we just created.
In this example, we will right-click on the Y-Axis of the second Bar Chart and select Custom Expression…
In the resulting Custom Expression dialog, there is new support to enter a property into the expression. There was a method for doing this before 3.1, which is still supported, but there is a new and improved Custom Expression dialog which shows all the available properties as a pane. This will allow you to highlight the property and click the Insert Property button to add the property into your expression.
The specific expression we need to use to calculate the total of your investment over 10 years is the following:
Sum([Amount]) * Power(1 + <<interestRate>>,10)
where <<interestRate>> needs to be replaced with the interest rate property.

If you use this method, you will notice a new syntax, ${propertyName}, has been used to add your property into the expression .This is a preprocessor , which will preprocess the variable and replace it with the value back into the expression. This allows you to use the precocessor syntax in other locations of the expression, if you are familiar with our expression language, included in the value for the As keyword.
You can now see the second Bar Chart displaying the total of each of the 3 initial investment amounts over 10 days using the default 0.05 interest rate.

The final step is to now expose the rInterestRate property to Business Users so they can update the value themselves, thus updating the expression and the second Bar Chart.
To do this, we can use the 3.1 Property Controls (new in version 3.1) to expose a property in a form control, which can be set. This is supported in the Text Area. When in edit mode in the Text Area, click on the Property Control button, as shown below and select one of the form controls to use. Assume we want our interest rate to go from 0.0 to .10, we could use a list box, a drop-down list, or a Slider. A slider is probably most appropriate so we will select that.

When defining the property controls in the resulting dialog, we hook these controls up to a specific property. In this example, we will attach the control to the rInterestRate property we created earlier.

Once we do that, we can add a little text before it, and we have a property control which allows users to update the rInterestRate property:

If we, for example, update the slider to 0.08, the second Bar Chart will update accordingly:

Later on, we can add another property control, this time an input box, to allow the user to adjust the number of years as well:

If you are interested getting formal training on all new 3.1 features, please register for one of our 3.1 delta trainings. They are delivered using our blended training model which provides access to asynchronous lectures and demos for 30 days, and then provides access to a live 2 hour webinar to review concepts and ask questions of a live instructor. Exercises and solutions are available as well for people to download and use when they have 3.1 installed, or they can just take the training without doing the exercises (no need to have 3.1 installed to get value from the course). The course is $300 globally.
Topics include new ways to create parameterized applications, enhancements to coloring, including conditional coloring, calling S+ and R scripts and functions, new and updated visualizations and statistical capabilities, API updates, Administration updates.
Click here for registration details.
|
-
Data preprocessing describes the concept of processing raw data to prepare it for analysis. The preprocessing will transform the data into a format that will be more easily and effectively processed for the purpose of the data analysis. There are a number of different tools and methods used for preprocessing. Some people will perform preprocessing in a database, others will perform it in a 3rd party application, and yet another solution is to use Spotfire to both pre process your raw data and also analyze the output.
Even though Spotfire is quite good at supporting large scale enterprise-wide analytical applications, it is also useful for smaller-scale analysis. Typically, in this scenario, the data that needs to be quickly analyzed is not in a suitable form, but there is too much overhead pre-processing them in another application. This is where some of the built-in capabilities come in handy. Spotfire provides a variety of capabilities to pre-process data before (and after) it is loaded inside Spotfire. One such capability we will look at in this article is unpivoting and pivoting.
Assume you have survey data which asks a series of questions where responders can choose one of many possible responses. When the data is collected, it may be stored in a data table as shown below:

However, we want to analyze the distribution of results for each individual question. To do this, we need to transform the data. But how? This is actually a two step process.
The first process is we need to normalize the data so that we have a single column for the questions asked in the survey and then another single column for the responses. We do this by using the ‘unpivot’ transformation. We select the questions we want to analyze as the ‘Columns to transform’ as shown in the screen shot below:

At this point we have the normalized data we need. The next step is we need to then pivot the data to have a row for each unique question which has fields showing the count for each response for the question.
You can do this using the ‘pivot’ transformation to pivot as shown below.

Using TIBCO Spotfire Professional, you can combine both the unpivot and pivot into the data access process so this process happens before the data is even loaded into TIBCO Spotfire (or you can also do this to the raw data set after it is loaded in TIBCO Spotfire so you can have both the raw data and the transformed data into Spotfire at once):

The final result is one row for each question and a column for each response:

If you have many different questions with different responses, then you should unpivot and pivot each grouping so they end up as a different data table in Spotfire.
|
-
A couple weeks ago we did a post that discussed how to rank something over various groups in their dataset. As a variation of this, many situations arise where customs need to find values in a specific column inside an expression available in another column. We will explain this scenario with an example.
Let’s assume we have the following data set: 
As part of the analytics required for a workflow, we may need to retrieve the value from the Volume column that corresponds to the last Date entry per month. This value will eventually be used in another expression. So , for example, using the data above, for November 2009, we want to use the Volume value from Nov 20,2009 (the last entry in November). This value is 32212. First, we need to use what we learned in our post from a couple weeks ago to create a unique id column for each month. For this we can create a calculated column using the expression below , and call the column UID:
integer(Year([Date])*100 + Month([date]))

Now, what we need to do is ideally use the expression below to get the value from the Volume column:
Sum(if(([Date])=(Max([Date]) OVER [UID]),[Volume],0))
However, this will not work, as we cannot put the Max([Date]) aggregate function inside another expression like this.
To solve this, we need to add an intermediate step, which is to create another calculated column. This calculated column will contain the last date entry per each month:
Max([Date]) OVER UID 
Once we have this, assuming the newly created column is called “Max Date per Month”, we can use the following expression to retrieve the value from the Volume column for the last entry in each Month:
if(([Date]) = ([Max Date per Month]),[Volume],0)

To go one step further, what if we wanted each row to have the value corresponding to the last entry for that month, so all rows from November 2009 would have the value 32212? For this, we would need to add another OVER statement to group by the Max Date per Month.
Sum(if(([Date])=([Max Date per Month]),[Volume],0)) OVER ([Max Date per Month])
Since this has an OVER method in it, we need to add an aggregation method. For this, we used Sum. 
|
-
This week’s tip explains two concepts for drilling down into your data: marking and filtering.
Marking is the ability to visually highlight records of interest very easily. The effects of marking are not just to visually highlight data on the page, it is also so that we can apply further analytics to that marked data, whether it is applying statistics to it, exporting it, displaying it in another visualization, or a variety of other capabilities. In this tip we will look at using marking to create details visualizations.
In the example below, we show sales data from stores in four locations: Boston, Los Angeles, New York, and Seattle. Suppose we want to drill-down into the data and only look at sales from the Los Angeles stores?
For this, we can setup a details visualization, which will only show data that is marked in another parent visualization.
Assume the Bar Chart below is the parent visualization:

If we want the details visualization to be a Line Chart, only showing data marked in the Bar Chart, we can right click in the Bar Chart and select ‘Create Details Visualization > Line Chart’.
This will create a new Line Chart that appears just like any other Visualization. We can configure it just like any visualization also. However, there will be no data loaded in it by default.

Since this is a details visualization, we only see data when records are marked in the Bar Chart.

With Spotfire, you can also define multiple markings, so you can create a series of unique markings, each their own details visualization(s), which can also span across multiple pages. We will see an example of this later on.
We can also combine marking with filtering. Filtering allows us to filter out records based off certain criteria using the Filter Panel.
Suppose we only want to look at one specific age group in our analysis? Let’s say, for example, we now want to only show customers in Los Angeles who are in our target age group of 45-60. We can then update the ‘Customer age’ filter in the Filter panel to only show records between 45 and 60 using the slider.
This will now update all our Visualizations to only show records where the customer age is between 45-60. Combine this with the criteria we already used to drill-down into Los Angeles stores only, we are able to see and analyze a specific cross-section of our customers.

Want to see what happens when you put this all together? View our latest Analytics in Action file where we walk using through a Spotfire analysis to help the store answer two analytical questions: Which 50 customers should they target for a marketing campaign, and which single customer should they select as a new spokesperson. You will see marking and filtering, combined with some other Spotfire techniques like bookmarks, used to create a Spotfire analytical file. Also note that use of multiple markings, each with their own sets of details visualizations…try doing this in Excel.
http://ondemand.spotfire.com/Public/ViewAnalysis.aspx?file=Public/Spotfire%20Analytics&waid=b4ffe34134c8533b76e30-c14e
|
-
The last couple tips have been more advanced uses of expressions, so I thought I would follow those with a basic, yet powerful example. The first thing that came to mind was to explain how to rename an expression. Even though you can create powerful and complex expressions to alter how the data in your visualizations is displayed, there are also very simple expressions. In fact, expressions are used behind the scenes even when setting an axes value through the User Interface.
Suppose you want a Bar Chart that displays the Sum of ‘Electronics’ sold per ‘Store Location’. You can set the X-axis to use the ‘Store Location’ column and the Y-axis to use the ‘Electronics’ column and select Sum as the aggregation method.
 When you do that, you actually have created a basic expression. You can view the expression dialog (by right clicking on the axis and selecting ‘Custom Expressions…’) to see the expression created:
 Let’s say you do not want the axes to show Sum(Electronics). If you want to rename the output, you can use the ‘Display name:’ field , also in the expression dialog, to enter the display name,

This works great. However, let’s say you want the Bar Chart to have multiple columns on the y-axis (or any access for that matter). For example, the Bar Chart below shows both the Sum of Electronics and the Sum of Furniture on the Y-axis.

The expression used internally for this is:

With this, we cannot use the display name field to rename the output as there are multiple columns used . If you want to rename the output expression for each column, you must use the As keyword inside your expression.
The As keyword will be paired with an expression and allow you to specify an output that is displayed anywhere that expression should be displayed (in all the axis selectors, tooltips, labels, etc…)
In this example, we can right click on any of the y-axis columns and select ‘Custom Expressions…’ and enter the following in the expression dialog:
You can see from the result below that anywhere the expression is used, in this case the Color By and y-Axis, the display names set from the As keyword are used.
 When using the As keyword, if your desired display name is a single string without spaces, you can just enter it as is (i.e. As Electronics). However, if your display name is multiple strings, you need to enclose the name in brackets (i.e. As [Total Amount of Electronics Sold])
The As keyword as well as many other features are discussed in our SP 141 Computational Analytics class , which can be taken using one of our various delivery options: instructor-led onsite or public, blended training, or our bootcamp offering.
|
-
One request that we get a lot is a way to
calculate the difference between two dates, but only using business or working
days (so not including the weekend). As
with many of these types of solutions, we will need to create a new calculated
column to hold these values.
If we just wanted to calculate the number of
days between two dates, we can do that easily using our DateDiff function
available anywhere you can create an expression in Spotfire. For example,
assume we have a column called ‘ORDER DATE’ and another one called ‘DELIVERY
DATE’, the expression to calculate the number of days between them would be the
following:
DateDiff("day",[ORDER
DATE],[DELIVERY DATE])
However, this calculation includes all days,
not just weekdays.
Suppose we want to output the number of
weekdays (not Saturday or Sunday)? We can use a more complicated expression to calculate
only days in between the two dates that are Monday through Friday:
((((Integer(((Integer(DateDiff("day",[ORDER
DATE],[DELIVERY DATE]) / 7) * 5) + DayOfWeek([DELIVERY DATE])) - DayOfWeek([ORDER
DATE])) + (if((integer(DayOfWeek([ORDER DATE])))>(integer(DayOfWeek([DELIVERY
DATE]))),5,0))) - (if((integer(DayOfWeek([DELIVERY DATE])))=(6),1,0))) +
(if((integer(DayOfWeek([DELIVERY DATE])))<>(0),1,0))) + (if((integer(DayOfWeek([DELIVERY
DATE])))=(0),1,0))) - (if((integer(DayOfWeek([ORDER DATE])))=(0),1,0))
If you have a non-standard work week (like
you also include Saturdays), you can update the expression above to include
Saturdays and only exclude Sundays.
This of course will not pick up holidays that
fall into the work week. If you want to
include this, there are a couple of workarounds that are not ideal , but will
do the trick. One such workaround is to create an if statement for each holiday
you want to include, and check if the holiday is between the ‘ORDER DATE’ and ‘DELIVERY
DATE’. Then sum up all of the if statements and you will have a list of
holidays that fell between the two dates.
You can subtract this from the first column you created and you have
your working days that exclude holidays (assuming you only include the holidays
that fall on working days)…I did say it was a workaround after all.
As an example, suppose we wanted to exclude
Christmas and Easter. In 2009, Christmas was the 359th day of the
year and Easter was the 102nd day of the year. We can then create another
calculated column that gives us the
total number of holidays falling on weekdays between our two dates, using the
following expression:
sum(if (DayOfYear([ORDER DATE]) < 359 and 359
< DayOfYear([DELIVERY DATE]),1,0),if (DayOfYear([ORDER DATE]) < 102 and 102
< DayOfYear([DELIVERY DATE]),1,0))
Of course ,maintaining a hard-coded list of
holidays inside your calculation may not be the most effective solution. If you
want a more robust solution that allows the user to enter the holidays in ,
then we would recommend using our SDK to either create a tool that allows the
user to configure a working week and specify holidays, or create a calculation
that shows up in the function list inside any expression dialog. Similar to the
tool, the calculation would take in as arguments, the two date columns, and
then a list of strings defining the work week (i.e. ‘M,T,W,Th,F’), and also a
listing of holidays (‘Apr-12-2009, Dec-25-2009,etc…’)
|
-
One common question we hear all the time in training is how a person can rank something over various groups in their dataset. The answer to this depends on your groups and how they are constructed. Ill explain by using two examples.
First, let’s assume your groups are subjects for a film. 
Since all the records in each desired group (subject) have the same values, we can use the rank function inside our expression language and build a calculated column to define the rank per subject. In this case, let’s assume we want to rank the length of each film in each subject, in descending order.

The Rank function takes in one mandatory argument and two optional arguments. The mandatory argument is the column you wish to rank (in this case the Length column). The next argument, which is optional, defines the sort order. The default value is ascending “asc”, and if you want to sort in descending order you use “desc”. The final argument, also optional, defines a column you want to group the rankings by. In this case, we want to group based off the subject column.
This works fine and the length of the movies in each subject are ranked independently of movies in other subjects. But what if we wanted to group the rankings based off something that does not have unique values, like the Subject column does? Well, the best answer is to create a new column that does create a unique value for each group. The example we will show below is using a Date column where we want to group by each month in each year.
In the data shown below, each value in the date column is unique.

We can create an intermediary calculated column to create a value per month/year. To do this, we can retrieve the Year from each date, multiply it by 100, and then add the numeric representation of the Month to it (1 for Jan ,2 for Feb, etc…). Finally we cast it all as an integer and we get a Column which has the same value for all rows of the same month and year.
 
You can then use the UID column as the grouping column inside the Rank function:
|
-
Sometimes you would like to link certain values in your document to
external information, maybe a google search, maybe a database that
pulls back images or video, etc… You can always create a hyperlink
inside of a Text Area to goto an external website, but there is no way
to pass in specific cell values. This is where the Link Renderer for a
Table visualization comes in handy.
Imagine you have a Table included in your Analysis file , which shows a list of films from 1920 to 1997, as shown below. 
You
may want to link this analysis to some outside source, say the Internet
Movie Database (IMDB.com) so that if the user wants more information on
one of the films, they can be redirected to that webpage passing in a
value from one of the cells as a key.
To do this, you need to open the properties dialog for the table and click on the Column section, as shown below: 
Then
in the ‘Selected columns:’ pane, select the column you wish to apply
the hyperlink to and change the ‘Renderer’ option from Text to Link.
You then need to click the ‘Settings’ button to configure it.
In
here, you need to specify the URL that you want to connect to and you
can use the {$} keyword wherever you want to enter the cells value.
For the IMBD URL, there is a query parameter called q which allows you
to pass in the title. An example of this is shown below:

Once you click ‘OK’, each cell in the Title column inside your Table becomes a hyperlink:

Interested
in learning how to become an expert Spotfire Business Author? Enroll in
our winter bootcamps now. Bootcamps are an intensive 8 week learning
program, using a unique blended delivery model. Students get weekly
notes and training modules available to consume via our training portal
at their own convenience, and this is complemented with course forums
and also 3 live 2-hour webinars throughout the course. Details are
available at:
http://inter.viewcentral.com/events/cust/search_results.aspx?cid=tibco&pid=1&event_id=592
If you have any questions, please contact us at spotfireregistrar@tibco.com
|
-
People always ask us if there is a way to show changes in values from within the same column over different intervals, like weeks or months. For example, what was the sales this week compared to the sales last week, how much did a patient weight this visit compared to last visit, or what was the change in stock price from this quarter compared to last quarter? You can do this using a custom expression, but then the values are not available to use elsewhere. If you wish to view them elsewhere (like in a table plot), you will need to use a calculated column.
Let’s assume we have the following data loaded that displays sales data per week: . If we plot this on a Line Chart , as shown below, we can see the trend of sales per week. 
We may want to be more specific and show the change in sales from one week to another. For this, we can use the previous Node Navigation Method in a calculated column expression as shown below:

The result will be a new column called Weekly Change. We can then plot the Weekly Change on a Bar chart as shown below.  What if your data contains another category you want to group by? Say you have multiple salespeople and you want to look at weekly change per salesperson?  For this you can use the same expression as before , but also use the Intersect method to say to group by the Salesperson column.
Once this is done, the newly created column contains the weekly change per salesperson. 
Notice the Weekly Change value for the first row is empty, since there was no previous value to compare it to. You can add methods into the expression to remove the null value with a zero if desired. You can also plot this on a Bar Chart using the Trellis feature to Trellis by Salesperson to see their individual weekly change.  Keep in mind the differences between using calculated columns and custom expressions. If you filter your data, the custom expression will be recalculated, but the calculated column will not be.
If you want more information on the Intersect method, the previous method (or any other Node Navigation) methods, please take our SP 141 Computational Analytics course using one of our three delivery options: onsite, regional, or blended. This course will teach you all the core concepts required to understand the Spotfire Expression Language in either a calculated column or a custom expression.
|
-
Many people are unaware of the Nest and Cross features inside TIBCO Spotfire so this tip is aimed at explaining what they are and how they can be used. Let’s assume we have the following survey data loaded in TIBCO Spotfire Professional:

We may want to plot the total number of responses for each answer category (yes, maybe, no) for each question (A,B,C). Using the default settings on a Bar Chart, we can configure it to show this info, as shown below:

This will interact with filtering, so if someone decides to filter out the Answer ‘No’ using the Filter Panel, the result will look like the following:

You can clearly see ‘No’ is removed altogether from the bar chart. What if you wanted to still include the ‘No’ category to show its there, but filtered out? If you wanted to do this, you would you would click on the ‘Settings’ button inside the X-axis Properties Dialog.

If you change the category mode to ‘Nest’, then this will show all values in the data set, regardless of whether the filter is applied. This is shown in the Bar Chart below where the ‘No’ category in the Answer column is still filtered out, but it shows up on the x-axis.

Lastly, if you want to show all possible values, you would use the Cross setting. In this scenario, lets say there is no filtering applied ,and you want to show all possible values (yes, no, maybe), even if there are no responses with that value.
This is shown in the Bar Chart below. There were no responses for no in Question A, yes in Question B, and maybe in Question C, but the values are shown in the Bar Chart as a possible value asked during the survey .
Below are screen shots showing all the three different settings in action when there is no filtering and then when the 'No' values are filtered out.  No filter
 The 'No' values are filtered out of the 'Answer' Column
|
-
One question that I get all the time when people are looking at Spotfire analysis files is “This data does not look like the raw data, what transformations and calculations were applied?” This can be useful if you need to keep a log or record of all actions applied to the data, or if you just want to understand the steps performed in order to replicate them on similar data. TIBCO Spotfire captures the ‘Source Information’ for each Data Table. This ‘Source Information’ acts as a recipe of how the Data Table was generated, started with loading the raw data, merging in any columns and rows, applying any transformations, and then adding any calculated columns, and any other manipulations on the data table , like renaming or freezing columns.
To access the Data Table properties from within TIBCO Spotfire Professional, click on Edit > Data Table Properties. Click on a Data Table and then select the ‘Source Information’ tab.
Publish
|
-
Did you know TIBCO Spotfire Professional has built in regular expression capabilities that can be used to validate and clean your data either before or after it has been loaded?
Data quality is a big issue with all customers. Whether the data quality is an issue with customers merging data from different sources that use different standards, or whether the data is coming straight from an instrument or a public forum, like a survey, the data needs to be ‘standardized’ before it is analyzed in TIBCO Spotfire.
This standardization can be accomplished in TIBCO Spotfire by using the RxReplace function inside an expression (either in the Calculated Column UI, or the Custom Expression UI).
RxReplace is a Regular Expression Replace function which allows you to search for a pattern in your data, and then replace it with another pattern. For example, search for all dates (in the form dd/mm/yyyy) and replace them with the pattern mm/dd/yyyy)
We will see this example and a few other common ones below.
Removing New Line Characters
The first example is one that happens all the time in training. We allow people to load their own data and they will typically load it from Excel. Many Excel data cells have carriage returns in them , as shown below. 
When importing this into Spotfire it imports the carriage returns as an ASCII special character, which makes it hard to read (as shown below in both the header of a table visualization and in a properties dialog).

If the columns are string based, it will not affect the analysis too much, but it can be an annoyance to look at in general. To fix it, when loading data, we can apply a ‘replace column’ transformation.
We can use the RxReplace function, passing in the original column, searching for the carriage return (identified as ‘\n’), and replacing it with an empty string.

This will input the values in an easier to view format.

Converting Date Formats
Assume you are loading data from Europe that will be merged with data from US. The date formats are different. One is DD/MM/YYYY and the other is MM/DD/YYYY. What do you do? Assuming your data is in the format show below, DD/MM/YYYY, you can use the RxReplace method shown below to transform it to MM/DD/YYYY.


Without getting into too many details on the Regular Expression syntax, the pattern we are looking for is a string that starts with 2 digits (note that the \ character needs to be escaped in Spotfire’s expression language) following by a / character, then 2 more digits, followed by another / character, and then 4 digits at the end of the string. When found, it will be replaced with the month ($2), followed by the day ($1), and then the year ($3), all separated by the / character.
And the result is just what we want. Below you can see the original column and then the updated column after the expression has been executed. 
Converting Name Formats
A third example would be one I got from Brian Prather. Assume you are in a situation where one data table had a “Names” column that was formatted as <Last name, First name> and a second data table had a “Names” column that was formatted as <First name Last name>. If you need to merge these into Spotfire, you will need to perform a regular expression transformation on one of them to standardize on one format.
Assuming we want to format into <First name Last name>, we could use the following expression using the RxReplace method:

In this expression, we want to search for a string that starts with any amount of characters followed by a comma and then any amount of characters. This will match any names in the format <last name, first name>. When found, it will be replaced with the last name ($2) followed by the first name ($1). This is since grouping was used inside the pattern.
And the result is just what we want. Below you can see the original column and then the updated column after the expression has been executed. 
This tip is not meant to be a tip teaching regular expression, you can spend months on that. There are plenty of good resources on the internet to help you out. This tip was meant to show you the power of using them to transform, clean, and standardize your data either before you load it in Spotfire, either as a new column in the data table, or as a replacement for an original column in the data table (as shown below).

If you are intersted in learning more about regular expressions and other types of data transformations, please have a look at our Data Cleaning and Transformation techniques TIBCO Spotfire Professional webinar.
Have any other cool Regular expressions to share? Reply to this post with them.
|
-
One of the most commonly asked questions I get in training
is how to show distinct rows inside a Data Table. People need the full set of rows for most of
the analysis, but may also want to be able to display a table with just
distinct values.
Assume you have a simple data set like the following: 
And in at least one location, you want to show only distinct
rows, as shown below: 
What you need to do is create a new Data Table , which takes
your original Data Table as the source and then pivots it.
To start, you goto File > Add Data Table. 
You will then choose the source type to be ‘Existing Data
Table in my Analysis’ (shown as Step 1 in the figure above), then if you have
more than one Data Table loaded, make sure you select the right Data Table
(shown in Step 2 in the figure above), and then select the ‘Pivot’ Transformation and click ‘Add…’ (shown
in Step 3 in the figure above). The Pivot Data dialog will appear.

The ‘Row identifier’ should be the first column in the data
set, (or the key column).
The ‘Column titles’ and ‘Values and aggregation methods’ should
both be ‘None’.
The ‘Transfer columns and aggregation methods’ should list
all the remaining columns in the original Data Table, with the expression set
to UniqueConcatentate for all columns.
You can then set the ‘Transfer column naming pattern’ to %T (which just references the Column name and
no aggregation method).
What this does is not really pivot the data, rather we use
the pivot tool to transfer over all the columns in our original Data Table, but
we apply a UniqueConcatenate aggregation to all the transfer columns. This will
create the distinct effect we are looking for.
When you are done, you can click ‘OK’ to close the Pivot
Data dialog and then ‘Finish’ to close the Add Data Table dialog. 
When you are done, you will see a new data table added that
has just distinct row.
You can also add a Data Relation so that the two Data Tables
are related. This will allow users to mark rows in one of the data tables , and
also mark related rows in the second Data Table. This is shown below where the
user marked a row in the ‘Distinct’ Data Table, and then the related rows were
also marked in the ‘Original Data Table’
|
More Posts Next page »
|
|
|