Association Rule Learning

cem istanbullu
6 min read · Feb 22, 2021


This article covers the workflow of the Apriori algorithm, association rule learning, and how we can make logical recommendations to customers.

Suggesting a tire to a customer who adds a t-shirt to their cart, or recommending R&B to someone listening to opera, doesn’t make sense. So how do we recommend the right product to the customer, and how can we make decisions that increase profits by increasing transactions?

Association analysis shows up everywhere in daily life, from e-commerce sites to market shelves, from movie sites to music applications. As long as you have data, you can analyze it and make decisions based on the results.

Let’s illustrate this with real data and two seemingly unrelated products: diapers and beer 😅.

According to data from the American retail company Walmart, beer was also sold in 47% of the sales that included diapers. Statistically, that points to a clear relationship between these two products.

After that analysis, Walmart rearranged its shelves to reduce the distance between these two seemingly unrelated products, aiming to increase profit. After a while, it was observed that arranging the aisles according to these associations increased total transactions.

It’s time to dive into the Apriori algorithm. Apriori is a very useful algorithm that shows the relation between two groups of products in a statistical way. So how do we interpret the values it produces?

I will discuss association rule learning using a shopping cart dataset.

Support: Probability of X and Y being seen together in the same cart.

Confidence: Probability of Y being sold when X is purchased.

Lift: How many times more likely Y is to be purchased when X is purchased, compared with how often Y is purchased overall.

Antecedents: First product group (X).

Antecedent Support: Probability of observing the first product group on its own, regardless of the second.

Consequents: Second product group (Y).

Consequent Support: Probability of observing the second product group on its own, regardless of the first.
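To make these definitions concrete, here is a minimal sketch that computes each value by hand. The five carts are made-up toy data, not the dataset used in this article, and the helper name `support` is only for illustration.

```python
# Toy shopping carts (hypothetical data, not the article's dataset).
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


x, y = {"diapers"}, {"beer"}

antecedent_support = support(x, baskets)        # P(X)
consequent_support = support(y, baskets)        # P(Y)
rule_support = support(x | y, baskets)          # P(X and Y)
confidence = rule_support / antecedent_support  # P(Y | X)
lift = confidence / consequent_support          # P(Y | X) / P(Y)

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two products appear together more often than they would if they were bought independently; a lift close to 1 means there is no real association.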

How did we get here!!! 😱

I promise you that it isn’t a hard job. It took just 14 lines of code 🤪. All the necessary algorithm functions are ready to import. You just need to prepare your data the same way as I did for CRM. Here, invoices are our shopping carts, and we need to clean the outliers in quantity and price from the dataset and prepare the total price.

(invoice: shopping cart) (stockCode-Description: item) (quantity: frequency)
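The preparation step could look roughly like the sketch below. The column names (`Invoice`, `StockCode`, `Quantity`, `Price`), the sample rows, and the IQR-based capping rule are all assumptions for illustration; the article does not say which outlier rule was actually used.

```python
import pandas as pd

# Hypothetical invoice lines; column names and values are assumptions.
df = pd.DataFrame({
    "Invoice":   ["536370", "536370", "536371", "536372", "536373", "536374"],
    "StockCode": ["10002", "21791", "10002", "22326", "21791", "22326"],
    "Quantity":  [12, 6, 24, 5000, 3, 8],
    "Price":     [0.85, 1.25, 0.85, 2.95, 1.25, 900.0],
})


def cap_outliers(frame, column, low_q=0.25, high_q=0.75, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those limits."""
    q1, q3 = frame[column].quantile([low_q, high_q])
    iqr = q3 - q1
    return frame[column].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Drop incomplete rows and non-positive quantities or prices.
clean = df.dropna(subset=["Invoice", "StockCode"]).copy()
clean = clean[(clean["Quantity"] > 0) & (clean["Price"] > 0)]

# Cap extreme values instead of deleting the rows.
for col in ["Quantity", "Price"]:
    clean[col] = cap_outliers(clean, col)

clean["TotalPrice"] = clean["Quantity"] * clean["Price"]
print(clean)
```

Capping keeps the invoice in the data, which matters here because Apriori only needs to know that the product was in the cart.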

Apriori doesn’t care about the quantity of a product. It only cares about true or false states. So we fill the NA values with 0 and convert every purchased value to 1. This output shows us which products were observed together in a shopping cart. Product quantities don’t concern us; we just need to know whether a product was bought or not. With that in place, we can move on to Apriori and calculate the statistics.
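A minimal sketch of that conversion with pandas is shown below. The column names and the three toy invoices are assumptions; the idea is simply one row per invoice, one column per product, and a 0/1 flag in each cell.

```python
import pandas as pd

# Hypothetical invoice lines (invoice: shopping cart, StockCode: item).
lines = pd.DataFrame({
    "Invoice":   ["A1", "A1", "A2", "A2", "A3"],
    "StockCode": ["10002", "21791", "10002", "22326", "21791"],
    "Quantity":  [12, 6, 24, 3, 1],
})

basket = (
    lines.groupby(["Invoice", "StockCode"])["Quantity"]
    .sum()
    .unstack()     # invoices become rows, products become columns
    .fillna(0)     # product not in this invoice
    .gt(0)         # any purchased quantity counts as "bought"
    .astype(int)   # True/False -> 1/0
)

print(basket)
```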

I reduced the dataset to France for time efficiency.

I set a minimum support value, which is subjective and depends on your own project, so the lower limit can be changed. Use the apriori and association_rules functions to get the rules. Let’s interpret the first row together.
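To show what happens behind ready-made functions like these, here is a small pure-Python sketch of the Apriori idea: find frequent itemsets level by level, then turn them into rules. It is a teaching sketch on toy data, not the library code used in this article; the function names `apriori_itemsets` and `rules_from_itemsets` are hypothetical. (The names apriori and association_rules match functions in the mlxtend library, but the article does not name the library, so treat that as an assumption.)

```python
from itertools import combinations


def apriori_itemsets(baskets, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: keep the single items that are frequent enough.
    current = {}
    for item in {i for basket in baskets for i in basket}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune: every (k-1)-subset of a frequent itemset must be frequent.
            if all(frozenset(sub) in current for sub in combinations(cand, k - 1)):
                s = support(cand)
                if s >= min_support:
                    next_level[cand] = s
        frequent.update(next_level)
        current = next_level
        k += 1
    return frequent


def rules_from_itemsets(frequent, min_confidence=0.0):
    """Split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = s / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append({
                        "antecedents": antecedent,
                        "consequents": consequent,
                        "support": s,
                        "confidence": confidence,
                        "lift": confidence / frequent[consequent],
                    })
    return rules


# Toy carts (hypothetical data).
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

frequent = apriori_itemsets(baskets, min_support=0.4)
rules = rules_from_itemsets(frequent)
for rule in rules:
    print(sorted(rule["antecedents"]), "->", sorted(rule["consequents"]),
          round(rule["confidence"], 2), round(rule["lift"], 2))
```

The pruning step is what keeps Apriori fast: an itemset can only be frequent if all of its smaller subsets are frequent, so most combinations never need to be counted.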

10002 is the first product’s stock code and 21791 is the second product’s stock code. The probability of observing the first product on its own is 2%, and for the second one it is also 2%. The probability of seeing the first and second products together is 1%. The probability of selling the second item when the first one is purchased is 50%, which means that half of the customers who buy the first item also take the second. When the first product is purchased, the second product is 17.6 times more likely to be purchased than it is overall, which is not bad.
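The percentages quoted for this row are rounded, and a quick check shows how much the rounding matters. The sketch below only uses the numbers from the paragraph above; the "implied" consequent support is an inference from the reported lift, not a value read from the output.

```python
# Rounded values quoted in the text for the rule 10002 -> 21791.
antecedent_support = 0.02
consequent_support = 0.02
support = 0.01
reported_lift = 17.6

# Confidence = support / antecedent support.
confidence = support / antecedent_support            # 0.5, matches the quoted 50%

# Lift = confidence / consequent support.
lift_from_rounded = confidence / consequent_support  # about 25, not 17.6

# Working backwards from the reported lift gives the unrounded consequent support.
implied_consequent_support = confidence / reported_lift  # about 0.028

print(confidence, lift_from_rounded, round(implied_consequent_support, 4))
```

So the second product’s support is probably closer to 2.8% than to 2%. When you interpret rules, work from the unrounded values in the table.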

whole project with 14 lines of code ✌🏼

What to do with these ratios?

  • You get to see which product combinations are purchased together, according to a threshold value that you set.
  • The aisle layout can be arranged accordingly. Companies place campaign products between high-confidence products.
  • Additional products that customers may like can be recommended.
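As a rough sketch of the last point, a recommendation can be picked by filtering the rules for a given product and sorting by lift. The first rule below uses the values from the row interpreted earlier; the other two rules, the thresholds, and the `recommend` helper are hypothetical.

```python
# Rules as plain dictionaries. Only the first one comes from the article;
# the others are made-up examples.
rules = [
    {"antecedent": "10002", "consequent": "21791", "support": 0.01, "confidence": 0.50, "lift": 17.6},
    {"antecedent": "10002", "consequent": "22326", "support": 0.01, "confidence": 0.40, "lift": 3.2},
    {"antecedent": "22326", "consequent": "21791", "support": 0.02, "confidence": 0.20, "lift": 7.0},
]


def recommend(product, rules, min_confidence=0.3, top_n=1):
    """Return up to `top_n` consequents for `product`, best lift first."""
    matches = [
        r for r in rules
        if r["antecedent"] == product and r["confidence"] >= min_confidence
    ]
    matches.sort(key=lambda r: r["lift"], reverse=True)
    return [r["consequent"] for r in matches[:top_n]]


print(recommend("10002", rules))           # best match for product 10002
print(recommend("10002", rules, top_n=2))  # two best matches
print(recommend("22326", rules))           # filtered out by min_confidence
```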

The final topic is recommendation for segmented customers. It is a little more complex than just observing the ratios. First, you again need to prepare your data the same way as I did for CRM and create CLTV (I talked about it in my previous story, as you may remember). The critical point here is that we expect the rules to be learned from all the data, within each segment. However, we expect the recommendations to be country-specific and segment-specific.

For example, the rules will be learned from segment A across the whole dataset, but the recommendation will be made to segment A in the United Kingdom.

The biggest advantage of this method is that if customer data in one country is too sparse to learn from, rules learned from other countries can fill the gap.

Getting the IDs || reducing the data frames according to these IDs

Here I get the IDs for each segment and store their indexes to create new segment-based data frames. Now it’s time to create rules from these segmented frames. The create_rule function is the same as the previous one, and after running it we need to create country-based rules. Every country has customers in segments A, B and C, so we must create rules for all of them to get the most suitable recommendation.
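The segment split could be sketched like this. The `cltv` and `df` frames, their column names, and the sample rows are assumptions made for illustration; the original code is not shown in the text.

```python
import pandas as pd

# Hypothetical CLTV table: one row per customer with a segment label.
cltv = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Segment":    ["A", "B", "A", "C", "A"],
})

# Hypothetical invoice lines for all countries.
df = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 4, 5],
    "Invoice":    ["I1", "I1", "I2", "I3", "I4", "I5"],
    "StockCode":  ["10002", "21791", "22326", "10002", "21791", "22326"],
    "Country":    ["France", "France", "United Kingdom",
                   "United Kingdom", "Germany", "United Kingdom"],
})

# Customer IDs per segment.
segment_ids = {
    segment: group["CustomerID"].tolist()
    for segment, group in cltv.groupby("Segment")
}

# One data frame per segment, covering every country, to learn the rules from.
segment_frames = {
    segment: df[df["CustomerID"].isin(ids)]
    for segment, ids in segment_ids.items()
}

# Customers who will receive segment A recommendations in one country.
uk_a_customers = (
    segment_frames["A"]
    .loc[lambda d: d["Country"] == "United Kingdom", "CustomerID"]
    .unique()
    .tolist()
)

print(segment_ids)
print(uk_a_customers)
```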

Now the machine has learned which associated products should be recommended to each customer segment in each country.

You just need to add the recommended_product to the CLTV dataframe.
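A minimal sketch of that last step, assuming the recommendation for each segment has already been picked from the rules. The `segment_recommendation` mapping and the sample customers are hypothetical; only the `recommended_product` column name comes from the text.

```python
import pandas as pd

# Hypothetical CLTV table.
cltv = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Country":    ["United Kingdom", "United Kingdom", "France", "United Kingdom"],
    "Segment":    ["A", "B", "A", "C"],
})

# Hypothetical result of the rule learning: best product per segment.
segment_recommendation = {"A": "21791", "B": "22326", "C": "10002"}

# Attach the recommendation to every customer through their segment.
cltv["recommended_product"] = cltv["Segment"].map(segment_recommendation)

# Recommendations are then served per country and segment.
uk_segment_a = cltv[(cltv["Country"] == "United Kingdom") & (cltv["Segment"] == "A")]

print(cltv)
print(uk_segment_a)
```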

You can see that every customer segment has a logical recommended product. Based on these values, e-commerce websites can make product suggestions and music applications can suggest what to play next. This type of analysis can increase a company’s profits by making customers happy.

Written by cem istanbullu

Data Scientist | Computer Engineer
