Introduction

Scalpers, or “Yellow Cows,” have always been problematic in store data analysis, audience selection, event recruitment, and target marketing. For example, when selecting potential participants for Live store events, we used to pick customers with high Demand per Member (DPM). However, selecting members based on only one or two dimensions tends to accidentally include scalpers. There is therefore strong demand for a quick and scientific way to identify scalpers, so that we can make a fair selection of our valuable customers, not scalpers.

The model presented in this report, the “Inline Scalper Detection Machine,” takes many more dimensions into consideration and can reliably separate high-value customers from scalpers.

Method

1. Frequency and Cadence

Before we go any further, I want to introduce the concept of Frequency and Cadence.

Frequency: how many times a customer purchases within a given period of time.

Cadence: how many days pass before a customer comes back and makes another purchase.

Currently I am using frequency and cadence to help stakeholders improve the unlock box strategy. Gifts in the unlock box are refreshed periodically; however, if the gifts are refreshed much later than the customer’s cadence, customers may lose interest in them.

2. Features

We found that a scalper may have these characteristics:

  • Scalpers have abnormally high frequency and very low cadence, meaning they tend to purchase very frequently in a very short period of time.

  • Scalpers purchase many units of the same SKU on the same day.

  • Scalpers hire other people to purchase limited-quantity items to get around purchase restrictions, but use the same membership cellphone number. As a result, they have an abnormally high number of transactions in a day.

  • Scalpers generally have high total demand and a very high number of units purchased.

With these findings, the following features were selected for Exploratory Data Analysis and Feature Selection:

  • Purchase Pattern: Average Transactions per Day, Average Units of the Same SKU per Purchase, Frequency, Cadence, Total Demand, Total Units.

  • Membership Information: Gender, Age.

3. Data

1. Inline Transaction Data 1

The inline transaction data used includes all inline transactions from all stores in Shanghai. Purchase activity from the most recent one-year period is used for training.

2. Member Data 2

Member data includes information about the member who makes inline purchases, including gender and age. The upm_id from member data is used as the index/identifier for each member.

4. Procedure

  • Feature Engineering: Calculate purchase-pattern features from the inline transaction data and key each feature to the member’s upm_id. Scale the data using z-score standardization. Transform store visit dates into visit cadence.

  • Exploratory Data Analysis: Examine each feature’s pattern, distribution, and correlations.

  • Unsupervised Learning: Use data mining and unsupervised learning to identify clusters with possible scalpers.

  • Marking: Deep dive into each cluster, identify borderline clusters, and manually separate possible high-value members from scalpers. Mark possible scalpers. Use the Predictive Power Score to measure the predictive relationship between each feature and the newly created dependent variable.

  • Supervised Learning: Use machine learning algorithms to train binary classification models, and select the algorithm that outperforms the others on the testing set.

  • Road Test and Use Case: Use the model to predict which customers are scalpers from a list of customers previously selected for a promotional activity. Use the model to calculate the percentage of demand purchased by scalpers during NVS.

Feature Engineering

The data is processed and transformed into the format needed for training. All features require calculation or transformation and are aggregated by UPM_ID. For example, cadence is derived from transaction dates. Below is sample data after transformation.

upm_id gender age trans_day same_sku freq amount units cadence
1 000b2ee8-ebbd-412e-a325-5cbc32010266 F 39 2 1 2 1642.4781036377 4 3
2 015ad623-687f-4cdb-ba9d-35f31e11b73c F 20 1 1 5 5581.41725921631 20 21.75
3 01cc98cd-ccc2-4a8c-9053-3e15611949e9 U 33 1 1.22 3 3869.02709960938 11 32.5
4 01fa16fd-0807-4bb4-bfa4-038ed35e2a7f M 22 1 1 6 4596.46044921875 6 33.2
5 036ce041-5005-4cde-9187-91c6948aa037 F 31 1 1 2 2387.61071777344 2 244
6 06cd454a-e5b0-48d7-8f97-8e40d2186d6a M 38 1 1 2 1555.75230407715 2 140
7 088965ad-486f-4fe6-94ce-8b69ecf8e78e M 31 1 1 2 2227.43383789062 3 10
8 08f64b45-0711-4d6a-9eab-4b4ce0795473 F 19 1 1 2 1856.63732910156 2 293
9 0aa5e96e-9b29-4e06-acd8-8edaef9c2917 M 23 1 1 2 2404.42514038086 8 271
10 0c3b89f5-7cf0-43f2-bdf6-693ba9e5950f F 32 1 1 3 2144.24855041504 12 126
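The console output elsewhere in this report suggests the original pipeline was built in R; as an illustration only, the aggregation step can be sketched in Python/pandas. The toy data, the `cadence` helper, and the reading of frequency as "distinct purchase days" are assumptions for this sketch, not the report's exact definitions.

```python
import pandas as pd

# Toy inline-transaction data; real columns come from the order-line feed.
tx = pd.DataFrame({
    "upm_id": ["a", "a", "a", "b", "b"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-31",
                            "2021-02-01", "2021-03-03"]),
    "amount": [100.0, 100.0, 50.0, 80.0, 80.0],
    "units": [2, 1, 1, 1, 1],
})

def cadence(dates):
    """Mean gap in days between a member's distinct visit dates."""
    d = dates.drop_duplicates().sort_values()
    gaps = d.diff().dt.days.dropna()
    return gaps.mean() if len(gaps) else None

# Aggregate everything to upm_id, as the report describes.
feats = tx.groupby("upm_id").agg(
    freq=("date", "nunique"),      # here: number of distinct purchase days
    amount=("amount", "sum"),
    units=("units", "sum"),
    cad=("date", cadence),
)

# Z-score standardization, as used before clustering.
z = (feats - feats.mean()) / feats.std()
```

Keying every feature to upm_id up front makes the later join with member data (gender, age) a simple index merge.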

Dealing with One-time Buyer in Training Data

Most customers are one-time buyers at inline stores, and one-time buyers are considered non-scalpers. So that the model instantly classifies one-time buyers as non-scalpers, I marked those with a frequency of 1 with a very distinctive value: a cadence of 999. The machine learns this obvious marker very easily.
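The sentinel marking described above is a one-line assignment; a minimal pandas sketch (column names match the sample table, the toy rows are invented):

```python
import pandas as pd

feats = pd.DataFrame({
    "upm_id": ["a", "b", "c"],
    "freq":   [1, 5, 1],
    "cadence": [None, 21.75, None],  # one-time buyers have no revisit gap
})

# One-time buyers (freq == 1) get a sentinel cadence of 999, so the
# model can separate them from repeat buyers immediately.
feats.loc[feats["freq"] == 1, "cadence"] = 999
```

Because 999 sits far outside the real cadence range (mostly under 300 days in the sample data), any tree-based model will split it off in a single cut.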

Exploratory Data Analysis

1. Frequency and Cadence

The histogram shows that most customers revisit our inline stores within 50 days, with cadence concentrated around 30 days.

Customers with higher frequency have lower cadence. Extreme cases exist, revealing some very odd, stand-out customers.

2. Correlations

The correlation matrix shows a very high correlation between amount and units purchased, which makes perfect sense. Because of this near-perfect correlation, I do not use units as a modeling feature. Frequency and cadence are negatively correlated, as expected.

Unsupervised Learning for Data Mining Purpose

1. Objective

The sole objective of using unsupervised learning here is to find the odd, stand-out observations, so that we can mark which observations are likely to be scalpers. When we later train the model, the machine learns the patterns from these marked observations.

2. Clustering Result

The graph is a 2D representation of the clustering result, projected onto the first two Principal Components.
Five of the continuous variables are used for clustering, with the Hartigan-Wong algorithm. The clusters separate cleanly; several stand out and get separated immediately. To assess the quality of a distance-based clustering model, we can use the ratio of between-cluster sum of squares to total sum of squares, which here is 83.5%.
This shows that the clustering result is usable for our purpose.
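Hartigan-Wong is the default algorithm of R's `kmeans`, which suggests the original run was in R. As a rough illustration, the same quality ratio can be reproduced with scikit-learn's `KMeans` (which uses Lloyd's algorithm, not Hartigan-Wong): `inertia_` is the within-cluster sum of squares, so between/total is one minus its share of the total sum of squares. The synthetic data below is invented for the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic standardized features: a large "normal" blob plus a small,
# extreme "scalper-like" blob far away in feature space.
X = np.vstack([
    rng.normal(0, 1, size=(500, 5)),
    rng.normal(10, 1, size=(20, 5)),
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Quality check used in the report: between-cluster SS / total SS.
total_ss = ((X - X.mean(axis=0)) ** 2).sum()
within_ss = km.inertia_
ratio = 1 - within_ss / total_ss   # close to 1 when clusters are well separated
```

A ratio like the report's 83.5% means most of the variance in the features is explained by cluster membership, which is what makes the stand-out clusters trustworthy starting points for marking.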

3. Marking

Unsupervised learning is a powerful tool for exploring a question. However, to proceed to the supervised learning stage, we need to mark the data to engineer our dependent variable.

##    trans_per_day same_sku_each_time        freq       amount    cadence
## 1    -0.23847295       144.00714874 -0.23323064  -0.01614685 -0.7317354
## 2    -0.06700777        -0.10357478 -0.23323064  -0.04632527  3.0693977
## 3    -0.12741546        -0.11483271 -0.22178514  -0.04604916  1.6240220
## 4     0.77244233         0.42430993  3.77177376   0.47483279 -0.8888389
## 5    58.36202403         0.23082836 32.55967944 116.16021132 -0.9914609
## 6    -0.13351380        -0.05334024 -0.05348606  -0.03360003 -0.7780144
## 7     3.59045331         1.75609613 11.60883519   3.87197356 -0.9635556
## 8    -0.15657613        -0.10595499 -0.10362770  -0.03901638 -0.1597557
## 9     1.61509921         0.15124613  0.03679525   0.01497496 -0.5620205
## 10   -0.13629487        -0.12012179 -0.17972362  -0.04358757  0.5964371
## 11   11.14550411         4.08097273  1.30430825   1.18891773 -0.7481037
## 12   13.63343732         3.00109111 17.34772281  15.90558023 -0.9786579
## 13    0.85127577         3.10815029  0.14291990   0.05262178 -0.5497743
## 14    2.19310092        25.01242754 -0.08066470   0.17486208 -0.3221669

The obvious scalper clusters are marked automatically; however, several clusters require manual marking to separate high-value members from scalpers. After automatic and manual marking, the binary dependent variable “ox” is created.
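Mechanically, the automatic part of the marking is just a cluster-to-label mapping. In this sketch, the choice of which cluster ids count as "obvious scalper clusters" is an assumption made for illustration (picked from the extreme centers in the table above); the report's actual selection combined this with manual review.

```python
import pandas as pd

# Cluster assignment per member (toy data; cluster ids are illustrative).
df = pd.DataFrame({
    "upm_id":  ["a", "b", "c", "d"],
    "cluster": [5, 1, 12, 8],
})

# Assumed "obvious" scalper clusters, chosen from extreme cluster centers;
# borderline clusters would still be reviewed and marked by hand.
scalper_clusters = {5, 7, 11, 12}
df["ox"] = df["cluster"].isin(scalper_clusters).astype(int)
```

The resulting 0/1 column `ox` is the dependent variable carried into the supervised learning stage.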

Predictive Power Score

Predictive Power Score (PPS) is a newer approach to feature selection and to exploring relationships between variables. It fits simple decision trees and compares variable importance and other validation metrics. Compared with a correlation matrix, which only considers continuous variables, it has the following advantages:

  • PPS can detect non-linear relationships between columns, which correlation cannot capture.

  • The correlation matrix works only with numerical variables, while PPS can handle categorical data as well.

  • Unlike the correlation matrix, PPS is asymmetric: the fact that column A can predict column B does not mean that column B can also predict column A, and PPS reflects this. 3

Based on the PPS graph, we see that when ox is the target (dependent variable), freq, amount, and cadence are the most powerful predictors.
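To make the idea concrete, here is a deliberately simplified, self-contained PPS-flavored score: it normalizes the skill of a one-split decision stump against the majority-class baseline. This is not the full PPS definition (the real score uses cross-validated decision trees and metrics such as F1), and all data below is synthetic.

```python
import numpy as np

def stump_pps(x, y, n_thresholds=50):
    """PPS-flavored score for a binary target y and numeric feature x:
    (best stump accuracy - baseline) / (1 - baseline).
    0 = no better than guessing the majority class; 1 = perfect."""
    baseline = max(np.mean(y), 1 - np.mean(y))   # majority-class accuracy
    best = baseline
    for t in np.quantile(x, np.linspace(0.02, 0.98, n_thresholds)):
        for pred in ((x > t).astype(int), (x <= t).astype(int)):
            best = max(best, np.mean(pred == y))
    return (best - baseline) / (1 - baseline) if baseline < 1 else 0.0

rng = np.random.default_rng(0)
freq = rng.exponential(2, size=1000)
ox = (freq > 3).astype(int)        # "scalpers" are exactly the high-frequency tail
noise = rng.normal(size=1000)      # feature unrelated to the target

informative = stump_pps(freq, ox)      # high: freq predicts ox well
uninformative = stump_pps(noise, ox)   # near zero: noise predicts nothing
```

A feature like freq, which carves out the scalper tail with a single threshold, scores near 1, while an unrelated feature scores near 0, mirroring the ranking the PPS graph gave for freq, amount, and cadence.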

Supervised Machine Learning

Now, with the dependent variable created from the previous marking, and the features we engineered and examined with PPS, we can train models using supervised learning algorithms.

Algorithms

  • Tree-based Ensemble Algorithms : Gradient Boosting Machine (GBM), Distributed Random Forest (DRF)

  • Neural Networks : Deep Learning - Forward Feeding Deep Neural Networks (FNN)

Distributed Random Forest Performance

A model trained with DRF outperformed FNN and GBM when validated against the validation set on the model leaderboard. The validation metric used is AUC.

ROC Curve on Testing Set

AUC achieved 0.98.
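To interpret that 0.98: AUC equals the probability that a randomly chosen scalper receives a higher predicted score than a randomly chosen non-scalper (ties count one half). A minimal pure-Python computation of this rank statistic, on invented scores:

```python
def auc(y_true, scores):
    """AUC as the win rate of positive scores over negative scores."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1,    1,    0,    0,    1,    0]
scores = [0.97, 0.91, 0.40, 0.15, 0.66, 0.70]
value = auc(y_true, scores)   # 8 of 9 positive/negative pairs ranked correctly
```

An AUC of 0.98 therefore means that in roughly 98 of 100 such pairs, the model ranks the scalper above the regular customer.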

Variable Importance

Frequency, amount, transactions per day, and cadence contributed the most predictive importance to this model.

Road Test - Scalper Detection on Previous Member Selection

Prediction Result

upm_id predict
1 14525782998 1
2 15799447555 1
3 e861ba04-1ace-43f4-b795-2db2606677d2 1
4 d6b94669-5075-460f-bdbe-f6b49c513f86 1
5 166e1188-4586-49c0-9e12-54e040a576f1 1
6 13192017067 1
7 14146224804 1
8 13616225672 1
9 14665921368 1
10 33fe659b-4cad-418e-b20e-9d1944ab87da 1
11 16097490180 1
12 cc174d04-e3ad-473a-b072-144a6c377f4a 1
13 19c1272c-d437-42aa-b3f2-4c04fdd7578f 1
14 0d04ec33-861d-4574-ba29-ed3fe081119f 0.999298245625363
15 c431d8fe-d0b6-4343-ba9d-22fe3fedc49c 0.997672229425775
16 d4ca5172-41ab-405f-afb0-a5dac81d3328 0.991518062932624
17 15896028801 0.988888888888889
18 68fa7ec5-577a-4e13-8785-8346debae773 0.974298245625363
19 a875fbb7-2441-46f0-8ac4-9ab3252abc74 0.973029486338298
20 fbf5ddce-4488-42af-8e0c-cb3185816437 0.969549954185883
21 d2ce3649-cd22-48c3-89cc-ae338e82e37c 0.963994398630328
22 11069918166 0.957769845591651
23 10995514189 0.311114870177375
24 16016907917 0.155555555555556
25 44e22784-4bcf-4d0f-9d63-1f393176ad44 0.128193180097474
26 10263084764 0.0667743629879421
27 65a530e7-9462-4772-b084-dbe86d03e0a8 0.0666666666666667
28 c0662acc-c631-426e-9f01-de912ce118dd 0.0445521407657199
29 77458519-091f-43c8-ba89-1dad07cb1f3e 0.0444663405418396
30 958df274-8849-4b2d-b1c6-d5398f54aba0 0.0222479091750251

The model predicts the probability of being a scalper.

Use Case - NVS Scalper Purchases

2,105 of 52,533 customers (4%) were identified as scalpers in this study.

During the September NVS promotion period, 2,105 scalpers spent 35,322,387 RMB, or 16,780 RMB per scalper, while 50,428 regular customers spent 42,207,453 RMB, or just 837 RMB per person.
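The reported figures are internally consistent, as a quick arithmetic check shows:

```python
# Sanity-check the NVS numbers reported above.
scalpers, regulars = 2105, 50428
scalper_spend, regular_spend = 35_322_387, 42_207_453

share = scalpers / (scalpers + regulars)   # scalpers' share of customers (~4%)
per_scalper = scalper_spend / scalpers     # ~16,780 RMB per scalper
per_regular = regular_spend / regulars     # ~837 RMB per regular customer
```

So 4% of customers accounted for roughly 46% of the NVS demand, spending about twenty times as much per head as regular customers.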

Conclusion

The Inline Scalper Detection Machine learns the typical scalper purchase pattern. It is not just a single model, but a quick and easy way to train new models, predict scalpers, and deploy the result. The resulting model file is in MOJO format, which can be used or deployed on any platform. Based on real-life, work-related use cases, this machine learning method is robust and reliable. We can use it to clean outliers when we perform data analysis, to pinpoint scalpers in store operations, or even to generate a blacklist when we want to invite genuinely high-value customers to our store events.


  1. Source: Nike B&M Order Line↩︎

  2. Source: Nike Membership Data↩︎

  3. What is Predictive Power Score (PPS), https://machinelearningknowledge.ai/predictive-power-score-vs-correlation-with-python-implementation/↩︎