Conjoint Analysis using R


Combining concepts from choice theory (economics) and parameter estimation (statistics), conjoint analysis is a quantitative market research technique that helps identify which attributes of a product or service users value most.

Conjoint analysis is employed in product development, marketing (product positioning), and operations. Knowing which options users prefer allows a business to improve its product offering (by modeling and testing a focused set of product options) and to allocate resources better.

A conjoint exercise starts with designing the study, followed by estimating the model parameters. The design involves breaking the product or service into its constituent parts (attributes), combining their levels into profiles, and gathering preference data through surveys.

Designing a conjoint study typically involves the following steps:

  1. Choose the approach (the type of analysis the situation requires)
  2. Identify the attributes and assign their levels
  3. Define utilities and design the experiment (based on choice situations)
  4. Estimate the parameters and synthesize the results
  5. Develop implications

For the purposes of this post, I present a hypothetical situation: I want to develop the next billion-dollar cola based on the attributes cola drinkers like most.


In this survey, 10 respondents rated 8 cola profiles. The profiles were created by varying the levels of a cola's taste attributes and its price. The four attributes are “kola” (the kola-nut flavor, ranging from bitter to a vanilla-sweet finish), “fizz” (effervescence), “sweetness”, and “price”.
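With four two-level attributes, crossing every level gives 2^4 = 16 possible profiles, twice the 8 used here. The sketch below enumerates that full factorial and one textbook half fraction of 8 profiles, built with the defining relation D = ABC on -1/+1-coded factors, which keeps every attribute balanced. It is an illustration under assumptions (attribute order kola, fizz, sweetness, price; the half-fraction rule), not necessarily how this survey's profiles were built; in this survey, the profiles come in pairs that differ only in price.

```python
from itertools import product

# Attributes and levels from the cola example (order assumed: kola, fizz, sweetness, price)
ATTRIBUTES = {
    "kola": ["low", "high"],
    "fizz": ["low", "high"],
    "sweetness": ["low", "high"],
    "price": ["1.5", "2"],
}


def full_factorial(attributes):
    """Every combination of attribute levels, returned as a list of dicts."""
    names = list(attributes)
    return [dict(zip(names, combo)) for combo in product(*attributes.values())]


def half_fraction(attributes):
    """2^(4-1) half fraction for exactly four two-level attributes.

    Levels are coded -1/+1 and the fourth attribute is set to the product of
    the first three (defining relation I = ABCD), so each level appears
    equally often and the four main-effect columns stay orthogonal.
    """
    names = list(attributes)
    design = []
    for a, b, c in product((-1, 1), repeat=3):
        coded = (a, b, c, a * b * c)
        # Map -1 to the first listed level and +1 to the second
        design.append({n: attributes[n][(x + 1) // 2] for n, x in zip(names, coded)})
    return design


full = full_factorial(ATTRIBUTES)
half = half_fraction(ATTRIBUTES)
print(len(full), len(half))  # 16 8
```

The trade-off of the half fraction is that each main effect is aliased with a three-way interaction, which is usually acceptable when interactions are assumed to be negligible.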


Respondents were then asked to rate how much they liked each product profile.



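#Load the conjoint package (install.packages("conjoint") if it is not installed)
library(conjoint)

#Read in the files
profiles <- read.csv("profiles.csv", header=TRUE, na.strings=c(""))
preferences <- read.csv("preference.csv", header=TRUE, na.strings=c(""))

#Add the levels (attribute by attribute, in the same order as the columns of profiles)
levels <- c("low", "high", "low", "high", "low", "high", "1.5", "2")
levels.df <- data.frame(levels)

#Estimate part-worth utilities and attribute importance, and draw the plots
Conjoint(preferences, profiles, levels.df)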

For simplicity, each of the four attributes has two levels: low and high for kola, fizz, and sweetness, and $1.50 and $2.00 for price.

Once the data is loaded, we call the Conjoint() function from the conjoint package with three inputs:

  1. (dataset: preferences) Survey responses: each participant's ratings of every flavor-and-price profile
  2. (dataset: profiles) The profiles, built from varying levels of the flavor attributes and price
  3. (data frame: levels.df) The level labels for the four attributes
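Ratings-based conjoint analysis typically estimates each respondent's part-worth utilities with a linear regression of the ratings on the coded attribute levels. The sketch below shows that idea for a single respondent; it is not the conjoint package's own code. It assumes a balanced, orthogonal design coded -1/+1 (a hypothetical half fraction), for which ordinary least squares reduces to averages: the intercept is the mean rating and each coefficient is the mean of x*y. The ratings are made up.

```python
from itertools import product

ATTRS = ["kola", "fizz", "sweetness", "price"]

# Hypothetical 8-profile half fraction coded -1 (low / $1.5) and +1 (high / $2),
# with price = kola * fizz * sweetness so all columns are balanced and orthogonal.
DESIGN = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


def part_worths(design, ratings):
    """OLS estimates for an orthogonal -1/+1 design.

    Every column sums to zero and the columns are mutually orthogonal, so
    X'X = n * I and the normal equations give intercept = mean(y) and
    beta_j = sum(x_ij * y_i) / n. Under effects coding, the high level's
    part-worth is +beta_j and the low level's is -beta_j.
    """
    n = len(ratings)
    intercept = sum(ratings) / n
    betas = {
        name: sum(row[j] * y for row, y in zip(design, ratings)) / n
        for j, name in enumerate(ATTRS)
    }
    return intercept, betas


# Made-up ratings for one respondent (1-10 scale), one per profile in DESIGN
ratings = [4, 6, 5, 8, 3, 7, 6, 9]
intercept, betas = part_worths(DESIGN, ratings)
print(intercept, betas)
```

For designs that are not orthogonal, the shortcut no longer holds and a general least-squares solver is needed.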

Plot: Utilities of individual attributes
Plot: Average importance of factors

Based on the survey results, the most valued attribute is fizz, followed by kola. So we might want to create a cola with more fizz and a stronger kola flavor (like Thums Up). Sweetness is relatively less important, but the positive utility estimated for sweetness suggests that some respondents prefer a sweeter cola. The least important attribute of all is price. This can also be seen by comparing the paired profiles that share the same taste profile but differ in price (for example, profiles 1 and 2). It suggests that respondents are willing to pay a $0.50 premium for the cola that appeals to them most.
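The importance figures follow the usual conjoint convention: an attribute's importance is the range of its part-worths (best level minus worst level) as a share of the sum of ranges across all attributes. The sketch below computes this for hypothetical part-worths (not this survey's results) and, as is common, averages per-respondent importances for a group.

```python
def relative_importance(part_worths):
    """Map each attribute to (max - min part-worth) / sum of all ranges, in percent."""
    ranges = {
        attr: max(levels.values()) - min(levels.values())
        for attr, levels in part_worths.items()
    }
    total = sum(ranges.values())  # assumed non-zero: at least one attribute matters
    return {attr: 100 * r / total for attr, r in ranges.items()}


def average_importance(respondents):
    """Average the per-respondent importances (one part-worth dict per respondent)."""
    per_resp = [relative_importance(pw) for pw in respondents]
    return {attr: sum(r[attr] for r in per_resp) / len(per_resp) for attr in per_resp[0]}


# Hypothetical part-worths for one respondent (effects coded, so each pair sums to zero)
example = {
    "kola": {"low": -0.75, "high": 0.75},
    "fizz": {"low": -1.0, "high": 1.0},
    "sweetness": {"low": -0.25, "high": 0.25},
    "price": {"1.5": 0.125, "2": -0.125},
}

for attr, pct in relative_importance(example).items():
    print(f"{attr}: {pct:.1f}%")
```

Because importance uses only the range, it says how much an attribute moves preference, not in which direction; the utilities plot is still needed for that.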

Setting up a conjoint study poses complex, multi-faceted challenges. In a more realistic setting, products or services have more attributes and more levels per attribute, which leads to a huge number of candidate profiles. And when respondents evaluate such a large set of profiles, the results become subject to various types of response bias.
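The number of profiles in a full factorial design is the product of the number of levels of each attribute, so it grows multiplicatively. A quick check with the cola study and two hypothetical larger designs:

```python
from math import prod


def full_factorial_size(levels_per_attribute):
    """Number of distinct profiles when every level of every attribute is crossed."""
    return prod(levels_per_attribute)


print(full_factorial_size([2, 2, 2, 2]))        # the cola study: 16
print(full_factorial_size([3, 3, 3, 3, 3, 3]))  # six 3-level attributes: 729
print(full_factorial_size([4, 3, 3, 2, 2, 5]))  # a mixed hypothetical design: 720
```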

Data aggregation using the dplyr package in R (sample snippet)


In this post, I am using SuperStore data to explore some of the data wrangling functions from these two packages.

The data is an Excel workbook with three sheets: Orders, Returns, and Region. I load the three sheets into separate datasets in R, join them, and perform the necessary aggregations.


#For accessing and dumping excel files

#Used for data wrangling
library(dplyr)

#Load three individual sheets into separate datasets
#(the data frame names Orders and Users used below are assumed)

# Sum of Sales by Product Category
Orders %>%
 group_by(Product.Category) %>%
 summarise(Total.Sales = sum(Sales))

#Join data sets and aggregate as per requirement

# Inner join the two data sets Orders and Users by Region and look at Total Sales by Region
Orders %>%
 inner_join(Users, by = "Region") %>%
 group_by(Region) %>%
 summarise(Total.Sales = sum(Sales))
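For readers coming from Python, the two dplyr pipelines above amount to a group-by-and-sum, and an inner join followed by a group-by-and-sum. The sketch below shows that logic with the standard library only, on a few made-up rows; the column names follow the R code, and the Manager column in the lookup table is a hypothetical example.

```python
from collections import defaultdict

# Made-up rows standing in for the Orders sheet
orders = [
    {"Region": "East", "Product.Category": "Furniture", "Sales": 120.0},
    {"Region": "East", "Product.Category": "Technology", "Sales": 300.0},
    {"Region": "West", "Product.Category": "Furniture", "Sales": 80.0},
    {"Region": "Central", "Product.Category": "Office Supplies", "Sales": 50.0},
]
# Made-up lookup table keyed by Region ("Central" is deliberately missing)
users = [
    {"Region": "East", "Manager": "A"},
    {"Region": "West", "Manager": "B"},
]


def sum_by(rows, key, value):
    """Like group_by(key) %>% summarise(total = sum(value))."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def inner_join(left, right, on):
    """Like inner_join(left, right, by = on): keep only rows whose key matches."""
    index = defaultdict(list)
    for right_row in right:
        index[right_row[on]].append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[on], [])
    ]


print(sum_by(orders, "Product.Category", "Sales"))
print(sum_by(inner_join(orders, users, "Region"), "Region", "Sales"))
```

The unmatched "Central" order disappears after the inner join, which is the same behavior dplyr's inner_join shows for keys missing from either table.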


Spotfire IronPython: Reset Markings


Here is a code snippet for removing markings across visualizations and data tables within an analysis. The script resets only the markings; it does not touch the filters. To illustrate this, I have narrowed the date range on a couple of date filters in the Filters panel.


I will now mark some bars on the bar chart.


To remove the markings without resetting the applied filters, we can run the script from the Reset Markings button.


Code snippet for reference: Reset_Markings_4

# Import required libraries
from Spotfire.Dxp.Data import *
from Spotfire.Dxp.Application.Filters import *

def resetMarking():
    # Loop through each data table
    for dataTable in Document.Data.Tables:
        # Loop through each marking defined in the document
        for marking in Document.Data.Markings:
            # Unmark the selection: an IndexSet with every row set to False selects nothing
            rows = RowSelection(IndexSet(dataTable.RowCount, False))
            marking.SetSelection(rows, dataTable)

# Call the function
resetMarking()

Below is another use of the snippet, applied to multiple visualizations that use cascading markings.


When an analysis contains multiple markings, the visualizations that use them have to be unmarked individually. Below, I am removing the marking for a single data table.


Although we unmark a single visualization, markings applied through cascading are not reset in the other visualizations. In the bottom visualization, the markings remain because it inherits them from the center visualization.


In such cases, the script above clears every marking in one go.


Cohort Analysis


Cohort analysis is a technique for analyzing the characteristics of a cohort (a group of customers who share a common characteristic, such as the month of their first purchase) over time. It is essentially another type of customer segmentation, one that extends the analysis over a defined period.

One frequently applied use case in the sales function is to segment the customer base on a set of characteristics. For example, customers could be categorized into those who are likely to continue buying, those who are likely to defect, and those who have already defected (gone inactive).
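One simple way to form the three groups above is a recency rule, where the number of days since a customer's last purchase decides the group. The 90- and 180-day cut-offs and the customer data below are hypothetical; in practice the thresholds would come from the business's purchase cycle or from a churn model.

```python
from datetime import date


def segment(last_purchase, as_of, likely_defect_after=90, defected_after=180):
    """Assign a customer to a group by days since last purchase (hypothetical cut-offs)."""
    days = (as_of - last_purchase).days
    if days >= defected_after:
        return "defected"
    if days >= likely_defect_after:
        return "likely to defect"
    return "likely to continue"


# Made-up last-purchase dates
customers = {
    "C1": date(2024, 5, 20),
    "C2": date(2024, 3, 1),
    "C3": date(2023, 11, 15),
}
as_of = date(2024, 6, 1)

for customer, last in customers.items():
    print(customer, segment(last, as_of))
```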

Once these groups are formed, common applications of the analysis include:

  1. Study customer retention: use the results to learn about the conversion rates of certain groups and focus marketing initiatives accordingly (for example, on customers who could still be retained)
  2. Forecast transactions for cohorts or individual customers and predict purchase volume
  3. Bring in more business: identify groups for upselling and cross-selling
  4. Estimate marketing costs by calculating customer lifetime value by cohort
  5. Improve customer experience based on individual customer needs across websites and stores
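A cohort retention table is a common starting point for the retention study in item 1: customers are grouped by the month of their first purchase, and for each later month we compute the share of the cohort that bought again. The sketch below builds one from hypothetical (customer, month) purchase records, with months as integers to keep it short.

```python
from collections import defaultdict


def retention_table(purchases):
    """purchases: iterable of (customer_id, month_index) pairs.

    Returns {cohort_month: {months_since_first_purchase: share_of_cohort_active}}.
    """
    purchases = list(purchases)  # iterated twice below
    first = {}
    for cust, month in purchases:
        first[cust] = min(month, first.get(cust, month))

    cohort_size = defaultdict(int)
    for month in first.values():
        cohort_size[month] += 1

    # Distinct customers active at each offset from their cohort month
    active = defaultdict(lambda: defaultdict(set))
    for cust, month in purchases:
        cohort = first[cust]
        active[cohort][month - cohort].add(cust)

    return {
        cohort: {off: len(custs) / cohort_size[cohort] for off, custs in sorted(offsets.items())}
        for cohort, offsets in sorted(active.items())
    }


# Made-up purchase history
purchases = [
    ("a", 0), ("a", 1), ("a", 2),
    ("b", 0), ("b", 2),
    ("c", 1), ("c", 2),
    ("d", 1),
]
for cohort, row in retention_table(purchases).items():
    print(cohort, row)
```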


Marketing Analytics


Marketing is vital to a business's success. Clearly defining marketing objectives and prioritizing marketing spend accordingly is one of the major challenges marketers face. To tune their approach, marketers need key metrics from various business functions to determine marketing effectiveness. Below is an attempt to categorize some commonly applied analytic techniques for measuring marketing performance.

The first step in this analysis is to identify relevant data sources and build automation that streamlines data into well-defined repositories. Next, we can use a combination of descriptive and predictive analytic techniques to gain insights. Finally, we can integrate different models and automate their execution to perform prescriptive analytics for continuous monitoring and feedback.

Marketing drives sales, and sales in turn should help improve marketing strategy. Let’s look at some techniques to identify sales patterns and then improve our mix of marketing activities.

| Applications | Applicable Tools/Techniques | Required Measures/Expected Results |
|---|---|---|
| Sales Performance (Descriptive) | Visualizing data with time series analysis and other metrics through standard/ad hoc reporting and operational dashboards that cater to different audiences; ARIMA models for time series data | Use data accumulated over time to learn about correlations and identify patterns |
| Sales Performance (Predictive) | Simple and multiple linear regression for forecasting and simulation | Determine future possibilities and predict events to make better-informed decisions |

Customer Service

| Applications | Applicable Tools/Techniques | Required Measures/Expected Results |
|---|---|---|
| Customer Acquisition and Retention | Logistic regression (churn analysis) | Use historical data to identify the inflow and outflow of customers |
| Customer Segmentation | Cluster analysis; Decision trees; Hypothesis testing | Identify potential markets and improve promotion, product, pricing, and distribution decisions |
| Product and Brand Feedback | Text analytics using the Natural Language Toolkit (NLTK) for Python; Sentiment analysis using Stanford NLP | Analyze unstructured data from social media platforms such as Facebook, Twitter, and Yelp |
| Customer Loyalty | Logistic regression; Multivariate analysis using factor analysis, principal component analysis, or canonical correlation analysis | Understand customer behavior and improve decisions around targeted promotions |
| E-Marketing | Clickstream analysis (traffic and e-commerce based); Google Analytics for website statistics | Improve conversion and sales; Drive email marketing campaigns; Search engine optimization (SEO); Channel adaptation |
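To make the "Sales Performance (Predictive)" row concrete, here is a minimal simple-linear-regression forecast: fit sales against time with the closed-form least-squares estimates (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)) and extrapolate. The monthly figures are invented; a real forecast would also check residuals and, for seasonal data, consider a time-series model such as ARIMA.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return mean_y - slope * mean_x, slope


# Invented monthly sales figures
months = [1, 2, 3, 4, 5, 6]
sales = [110, 125, 138, 150, 166, 180]

intercept, slope = fit_line(months, sales)
forecast = {m: intercept + slope * m for m in (7, 8)}
print(f"trend: {slope:.2f} per month, forecast: {forecast}")
```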

Note: The techniques above are not limited to the application they are listed against; each can be applied to other problems where it fits.

After analyzing the results from our analytical models, we need to act on improving crucial marketing activities such as lead generation, demand creation, and product promotion. Further, the above analysis can be used to design and implement marketing strategies, including product and brand promotion, pricing strategy, distribution, and customer service. The findings can also be used to improve questionnaires and other mechanisms for collecting marketing data and customer feedback, to learn about product performance and brand value.

With these new analytics capabilities, we can make more accurate predictions and give our marketing teams new ideas to drive promotions and boost sales.

In general, adopting and effectively applying these analytic techniques is challenging. Building the right analytics should be informed by industry knowledge and by the business function in context. It is a process that requires constructive iteration over the long term, but in most cases it should help optimize marketing performance and deliver substantial value to the organization.

Spotfire IronPython: Accessing Column Values in Script Context


Below is a code snippet that pulls data from a data table into the script context. Values from one or more columns can then be used to perform validations or be compared against values from other data tables.


IronPython code:

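from Spotfire.Dxp.Data import *

# Table to read from: the active data table (it could also be passed in as a script parameter)
activeTable = Document.ActiveDataTableReference
columnToFetch = 'Order Date'
rowCount = activeTable.RowCount
# An IndexSet with every row set to True includes all rows
rowsToInclude = IndexSet(rowCount, True)
# Cursor that returns the formatted values of the chosen column
cursor1 = DataValueCursor.CreateFormatted(activeTable.Columns[columnToFetch])
ctr1 = 0
for row in activeTable.GetRows(rowsToInclude, cursor1):
    rowIndex = row.Index
    val1 = cursor1.CurrentValue
    ctr1 = ctr1 + 1
    # Stop after the first five rows
    if ctr1 == 5:
        break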

Further, we could push the values into an array (a Python list) for temporary storage and use them as required.