Wednesday, February 19, 2020

Association Analysis in a Nutshell

In this post, I explain what association analysis is used for and how it can be applied to real-life problems. Let’s begin by defining association analysis.

What is Association analysis?

In short, association analysis is used to determine how input variables are associated with output variables, that is, the relationships between them. Inputs are termed “antecedents” and outputs are termed “consequents”.

A common application of association analysis is market basket analysis, which determines which items consumers frequently purchase together. Using the results, a store can create new discounts or bundles, or even change its layout, to increase sales of targeted products.

An example of market basket transactions is as follows:

Itemset 1: {jam,ham,bread}

Itemset 2: {jam,milk,bread}

Itemset 3: {milk,rice,jam}

By running an association analysis on the market basket transactions, the analyst can obtain various relationships between the items a customer buys. For example, jam -> bread (if a customer buys jam, they may also buy bread).

One of the most commonly used algorithms for Association analysis is the Apriori algorithm. The Apriori algorithm generates association rules in the form of antecedents and consequents, as mentioned above.

A rule is written X -> Y, where X is the antecedent and Y is the consequent. The strength of a rule is described by two measures, its “support” and its “confidence”, both defined below.

However, unlike usual logical rules, association rules involve some level of uncertainty. To quantify this uncertainty, we can apply the Support and Confidence Framework.

The framework uses two measures: the rule support, which is the proportion of transactions in which X and Y appear together, and the confidence, which is the proportion of transactions containing X that also contain Y.

Rule Support = P(X and Y occurring together)

Confidence = [P(X and Y) / P(X)]
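The two formulas above can be checked by hand on the three example baskets from earlier in the post. The following is a minimal sketch in plain Python; the function names `support` and `confidence` are chosen here for illustration and do not come from any particular library.

```python
# The three example baskets listed earlier in the post.
transactions = [
    {"jam", "ham", "bread"},
    {"jam", "milk", "bread"},
    {"milk", "rice", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(X and Y) / P(X): how often Y appears in baskets that contain X."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

rule_support = support({"jam", "bread"}, transactions)
rule_confidence = confidence({"jam"}, {"bread"}, transactions)
print(f"support(jam -> bread)    = {rule_support:.2f}")
print(f"confidence(jam -> bread) = {rule_confidence:.2f}")
```

Jam and bread appear together in two of the three baskets, and jam appears in all three, so both values come out to 2/3 for this rule.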

Additionally, the Apriori algorithm works best with categorical data in a tabular or transactional format. It does not work directly with numeric data; numeric fields must first be binned or otherwise converted into categories, which I will not cover in detail in this post.
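As a brief illustration of what binning means, here is a minimal sketch that maps a number to a category label. The helper `bin_value`, the bin edges and the spend figures are all hypothetical and chosen only for this example; real bin boundaries depend on the data.

```python
def bin_value(value, upper_edges, labels):
    """Map a number to a category label.

    `upper_edges` holds the inclusive upper bound of every bin except the
    last, in ascending order; `labels` has one more entry than `upper_edges`.
    """
    for upper, label in zip(upper_edges, labels):
        if value <= upper:
            return label
    return labels[-1]  # value is above the last edge

# Hypothetical spend per visit (made-up numbers, for illustration only).
spend = [4.50, 18.00, 72.30]
categories = [bin_value(s, [10, 50], ["low", "medium", "high"]) for s in spend]
print(categories)
```

Once each numeric value has been replaced by a label such as "low" or "high", the field can be treated like any other categorical item.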

The tabular data format, also known as truth-table or basket data, uses one flag field per item to indicate whether that item is present (T) or absent (F) in each record, as seen in the table below.

ID     | Item 1 | Item 2 | Item 3
Cust 1 | T      | T      | T
Cust 2 | F      | F      | T
Cust 3 | T      | F      | F

Unlike the tabular data format, the transactional format has a separate record for each item in each transaction, as seen in the table below.

ID     | Items
Cust 1 | A
Cust 2 | B
Cust 3 | C
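To make the difference between the two formats concrete, the sketch below converts transactional records into tabular flags. The function name `to_tabular` is made up for this example, and one extra record has been added to the three rows above (marked in the code) so that at least one customer has more than one item.

```python
# Transactional format: one (ID, item) record per purchased item.
records = [
    ("Cust 1", "A"),
    ("Cust 2", "B"),
    ("Cust 3", "C"),
    ("Cust 1", "B"),  # hypothetical extra purchase, added for illustration
]

def to_tabular(records):
    """Return (item columns, {ID: {item: True/False}}) built from the records."""
    items = sorted({item for _, item in records})
    baskets = {}
    for cust, item in records:
        baskets.setdefault(cust, set()).add(item)
    table = {
        cust: {item: item in basket for item in items}
        for cust, basket in baskets.items()
    }
    return items, table

items, table = to_tabular(records)
for cust, flags in table.items():
    # Print T/F flags in the same style as the tabular example.
    print(cust, " ".join("T" if flags[item] else "F" for item in items))
```

Each customer ends up with exactly one row and one flag per item, which is the shape the tabular format expects.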

Thus, by applying the Apriori algorithm, we can generate rules based on user-specified minimum support and confidence percentages. These act as the thresholds that a rule must meet in order to be created.
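The role of the two thresholds can be sketched with a small, unoptimised version of the level-wise Apriori idea in plain Python. This is a teaching sketch rather than a production implementation: the function names and the threshold values of 0.6 are assumptions made for this example, and the data is the three-basket example from earlier in the post.

```python
from itertools import combinations

transactions = [
    {"jam", "ham", "bread"},
    {"jam", "milk", "bread"},
    {"milk", "rice", "jam"},
]

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, keeping only
    those whose support meets the threshold."""
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = sum(1 for t in transactions if candidate <= t) / n
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent pairs."""
    rules = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), s, conf))
    return rules

frequent = frequent_itemsets(transactions, min_support=0.6)
rules = generate_rules(frequent, min_confidence=0.6)
for antecedent, consequent, s, conf in rules:
    print(f"{sorted(antecedent)} -> {sorted(consequent)}  support={s:.2f}  confidence={conf:.2f}")
```

Raising either threshold shrinks the list of rules. With these settings, ham and rice are dropped at the first level because each appears in only one of the three baskets.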

However, not all rules with high support and confidence values are useful. For example, if nearly all customers buy jam and nearly all customers buy bread, the confidence of jam -> bread will be high regardless of whether there is any real association between the two items.
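This pitfall can be reproduced with synthetic data. The sketch below builds 100 made-up baskets (not from any real store) in which jam and bread are each bought 90% of the time but independently of each other, then compares the rule's confidence with how often bread is bought overall.

```python
# Synthetic data (illustration only): jam and bread are each in 90 of 100
# baskets, arranged so that buying one says nothing about buying the other.
baskets = []
for i in range(100):
    basket = set()
    if i >= 10:        # jam is missing only from baskets 0-9
        basket.add("jam")
    if i % 10 != 0:    # bread is missing only from every tenth basket
        basket.add("bread")
    baskets.append(basket)

n_jam = sum("jam" in b for b in baskets)
n_bread = sum("bread" in b for b in baskets)
n_both = sum({"jam", "bread"} <= b for b in baskets)

rule_support = n_both / len(baskets)   # P(jam and bread)
confidence = n_both / n_jam            # P(bread | jam)
baseline = n_bread / len(baskets)      # P(bread), with no rule at all

print(f"support    = {rule_support:.2f}")
print(f"confidence = {confidence:.2f}")
print(f"baseline   = {baseline:.2f}")
```

Support and confidence are both high, yet the confidence is no higher than the share of all baskets that contain bread, so knowing that a customer bought jam adds no information.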

There are also alternative measures which one can use to evaluate association rules. These include:

  • Confidence Difference
  • Confidence Ratio
  • Information Difference
  • Normalised Chi-Square
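The first two measures can be sketched on the example baskets. The definitions used here are assumptions, since the post does not spell them out: confidence difference is taken as the absolute difference between the rule's confidence and the prior probability of the consequent, and confidence ratio as one minus the ratio of the smaller of those two values to the larger. Information difference and normalised chi-square are not covered by this sketch.

```python
transactions = [
    {"jam", "ham", "bread"},
    {"jam", "milk", "bread"},
    {"milk", "rice", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def confidence_measures(antecedent, consequent):
    """Return (confidence, prior, difference, ratio) under the assumed
    definitions described in the text above."""
    posterior = support(antecedent | consequent) / support(antecedent)
    prior = support(consequent)  # how often Y occurs with no rule at all
    difference = abs(posterior - prior)
    if posterior == 0:
        ratio = 1.0  # avoid dividing by zero when the rule never holds
    else:
        ratio = 1 - min(posterior / prior, prior / posterior)
    return posterior, prior, difference, ratio

for antecedent, consequent in [({"bread"}, {"jam"}), ({"ham"}, {"bread"})]:
    conf, prior, diff, ratio = confidence_measures(antecedent, consequent)
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"confidence={conf:.2f} prior={prior:.2f} "
          f"difference={diff:.2f} ratio={ratio:.2f}")
```

Both rules have a confidence of 1.0, but bread -> jam scores zero on both measures because jam is already in every basket, while ham -> bread scores higher because bread is in only two of the three.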

Well, this pretty much sums up association analysis. What would you apply it to?
