Introduction

A/B testing is widely used online, mostly in web-page and app design. In brick-and-mortar stores, however, A/B testing is much harder: unlike online, many variables can interfere with the variable under test, and even the test variable itself can be hard to isolate. This study uses a new method to test two variants of signage in the Shanghai Live store, as an exploration of new ways to test the effectiveness of in-store signage.

Experiment Setup

1. Testing Signages

[Figure: original design (left) and new design (right)]

The original signage (left) is orange (A); the new variant (right) is multi-colored (B). Content and size are identical, so only color is being tested and all other variables on the signage are held fixed.

2. Store

Shanghai Live Store is selected for the experiment.

The signage is placed at the location shown on the floor plan. This area is covered by 4 trackers, which gives good coverage for the FPA data.

3. FPA data [1]

The raw data for Full-path Analysis (FPA) is a set of traffic and dwell time records collected by trackers and sensors installed across Shanghai Live. Aggregating this raw data yields a heatmap of each floor, as well as KPIs such as area traffic and average dwell time. In this study, the testing metric is the dwell time of each customer who stayed in the area where the sign was displayed.
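To make the aggregation step concrete, here is a minimal sketch of how area traffic and average dwell time could be derived from per-visit records. The record layout `(customer_id, area_id, dwell_seconds)`, the area names, and the values are all hypothetical, since the actual FPA schema is not described in this report.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout: (customer_id, area_id, dwell_seconds).
# The real FPA schema is not given in the report; these rows are made up.
records = [
    ("c1", "signage_area", 80.0),
    ("c2", "signage_area", 120.0),
    ("c3", "entrance", 15.0),
    ("c1", "entrance", 10.0),
]


def area_kpis(rows):
    """Return {area_id: (traffic, average_dwell_seconds)}."""
    dwell_by_area = defaultdict(list)
    for _customer, area, dwell in rows:
        dwell_by_area[area].append(dwell)
    # Traffic is counted as the number of recorded visits to the area.
    return {area: (len(d), mean(d)) for area, d in dwell_by_area.items()}


kpis = area_kpis(records)
print(kpis["signage_area"])  # (traffic, mean dwell) for the signage area
```

For the A/B test itself, only the individual dwell times in the signage area are needed; the per-area averages are shown here because they are the KPIs mentioned above.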

4. Procedure

Display signage A for two hours, then replace it with signage B for another two hours. Make six rotations a day for 5 consecutive days, from 2021/06/11 to 2021/06/15. Each rotation is limited to two hours to avoid confounding the test with time-of-day effects: if the display time were too long, we would effectively be testing AM vs. PM. Mixing up the sequence is also important, so that each sign is displayed at a mix of times of day. Each customer exposed to the signage was tracked by the sensors near it, and their dwell time was recorded.
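The rotation procedure can be sketched as a small schedule generator plus a lookup that assigns a tracked visit to the sign on display at that moment. This is only an illustration under stated assumptions: the 10:00 start time is hypothetical, "six rotations a day" is read as six two-hour slots, and alternating the starting sign from day to day is just one possible way to mix the sequence. The report does not specify any of these details.

```python
from datetime import datetime, timedelta

OPEN_HOUR = 10       # assumption: the first slot of each day starts at 10:00
SLOT_HOURS = 2       # from the procedure: two hours per sign
SLOTS_PER_DAY = 6    # assumption: "six rotations" means six two-hour slots
FIRST_DAY = datetime(2021, 6, 11)
NUM_DAYS = 5


def build_schedule():
    """Return a list of (start, end, sign) slots for the whole experiment."""
    schedule = []
    for day in range(NUM_DAYS):
        # Assumed mixing rule: alternate the starting sign from day to day,
        # so that both signs appear in every time slot across the experiment.
        order = "AB" if day % 2 == 0 else "BA"
        day_start = FIRST_DAY + timedelta(days=day, hours=OPEN_HOUR)
        for slot in range(SLOTS_PER_DAY):
            start = day_start + timedelta(hours=slot * SLOT_HOURS)
            end = start + timedelta(hours=SLOT_HOURS)
            schedule.append((start, end, order[slot % 2]))
    return schedule


def sign_at(timestamp, schedule):
    """Return the sign displayed at `timestamp`, or None outside all slots."""
    for start, end, sign in schedule:
        if start <= timestamp < end:
            return sign
    return None


schedule = build_schedule()
print(sign_at(datetime(2021, 6, 11, 10, 30), schedule))
print(sign_at(datetime(2021, 6, 12, 10, 30), schedule))
```

With five days, a strict alternation cannot balance the two signs perfectly within each time slot; a randomized order per day would be another reasonable choice.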

The time periods for signage A are marked in gray, and those for signage B in green.

5. Sample Size

Because the experiment ran for 5 days, we consider the sample collected to be sufficient. The total sample size (n) is 1109.

Sign    n
A     530
B     579

530 customers were exposed to sign A, and 579 customers were exposed to sign B.

Distributions

The distributions of dwell time for customers exposed to sign A and to sign B are very similar. Outliers have been removed.
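The report does not state which rule was used to remove outliers. As one common possibility, the sketch below applies the 1.5 × IQR rule to a list of dwell times. The function name `drop_outliers_iqr` and the sample values are illustrative only and are not taken from the experiment.

```python
from statistics import quantiles


def drop_outliers_iqr(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up dwell times; the last value mimics a tracking glitch
# or a staff member standing in the area.
dwell_seconds = [60, 75, 80, 90, 95, 100, 110, 120, 1800]
cleaned = drop_outliers_iqr(dwell_seconds)
print(cleaned)
```

Whatever rule is chosen, it should be applied identically to both signs so that the filtering cannot itself create a difference between the groups.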

Two Sample T-test

\(H_0\): the true difference in means is equal to 0

 p-value = 0.3989
95 percent confidence interval:
 -14.035046   5.592479
sample estimates:
mean of x mean of y 
 91.24348  95.46476 
 

The test gives a p-value of 0.3989, so we fail to reject the null hypothesis that the true difference in means is equal to 0. The 95% confidence interval for the difference runs from -14.04 to 5.59, which includes 0. The difference in dwell time between sign A and sign B is therefore not statistically significant.
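As a sanity check, the reported figures can be tested for internal consistency using only the numbers printed in the test output. The sketch below recovers an approximate standard error from the confidence interval and recomputes a two-sided p-value. It uses a normal approximation to the t distribution, which is an assumption made here for simplicity; it should be close with n = 1109 but will not reproduce the reported p-value exactly.

```python
from statistics import NormalDist

# Values copied from the test output above.
ci_low, ci_high = -14.035046, 5.592479
mean_a, mean_b = 91.24348, 95.46476
reported_p = 0.3989

std_normal = NormalDist()

diff = mean_a - mean_b                  # point estimate, A minus B
half_width = (ci_high - ci_low) / 2     # half-width of the 95% interval
z_crit = std_normal.inv_cdf(0.975)      # normal critical value, about 1.96
std_err = half_width / z_crit           # approximate SE of the difference
z_stat = diff / std_err
approx_p = 2 * (1 - std_normal.cdf(abs(z_stat)))

print(f"difference = {diff:.3f}")
print(f"approx. SE = {std_err:.3f}, z = {z_stat:.3f}")
print(f"approx. two-sided p = {approx_p:.4f} (reported {reported_p})")
```

The point estimate sits at the center of the interval and the recomputed p-value lands close to the reported one, so the printed mean difference, interval, and p-value agree with each other.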

Conclusion

Based on our experiment, the new variant (multi-colored signage) showed no statistically significant difference in customers’ dwell time at the signage area compared with the original (orange signage).

Suggestions

  • The signs are very small and hard for customers to notice. If customers cannot see the signage, it is impossible to test its effect.

  • The dwell time data comes from a set of sensors in the store. Sensors suffer from double counting and inaccurate tracking. In addition, the tracking area is not easily adjustable due to technical limitations. Newer technology, such as computer vision applied to surveillance camera footage, could address these problems and make future A/B testing more reliable.


  1. Source: Full-path Analysis raw data, 2021/06/11 - 2021/06/15