The Story Behind Our 1,200% Improvement in Model Accuracy

Jan 31, 2018


Jonathan Kay

Co-Founder & Chief Executive Officer

Estimating app usage is one of the most challenging things we do in the App Intelligence industry.

With Downloads or IAP Revenue there are clear correlations with app store rank, which make designing models relatively straightforward. Yet when it comes to estimating DAUs and MAUs, there are so many different components, features, and indicators that the picture becomes very complex, very quickly. And while usage estimates are difficult to generate, they are oftentimes the most important metrics to watch.

Now it’s time for a confession: our early attempts at building usage models really missed the mark.

So, about 18 months ago, we decided to start from scratch. We asked ourselves: if time wasn’t a factor, how could we build the most actionable models possible?

And today, over a year later, we can finally share the results with you.

Our new models have made unbelievable gains in their ability to explain trends in usage volume and usage movement (e.g. daily, weekly, and monthly trends in overall user base).

They are, by far, the most accurate models we’ve ever created. And we want you to know why.

Our Approach & Data Methodology

There are four key reasons we were able to dramatically improve the quality of our usage models:

1. More Data Collected

Over the last six months, we have ramped up data collection dramatically and kicked off quite a few new relationships with developers who share their developer accounts with Apptopia. What's more, through these partnerships we've increased the amount of data we have on top-ranked apps by over 25%.

2. New Data Sources

Until now, we had only ever used "Direct Measurement Data" to build our estimation algorithms. Direct Measurement Data is data we collect directly from iTunes Connect and Google Play, from app developers who share this data with us.

As part of this new model release, we’ve partnered with a couple of today’s biggest SDKs to get device data on more than 40 million devices worldwide. This data includes what apps are on these devices as well as what apps are being opened/used. We’ve found that layering this data on top of our existing data set (collected directly from iTunes Connect and Google Play) allows us to really broaden our view of a user’s lifecycle/journey.

3. Strong Understanding of Retention

As outlined earlier, many different ingredients go into a full understanding of app usage. One core element our previous models were missing was Retention. This matters because Retention is a metric we have a lot of data on, and a feature our user base has given us consistently positive feedback on.

While our previous models looked at each day of usage in isolation and tried to create the most accurate estimate, our new models utilize a much longer time period in combination with Retention to achieve our new, higher quality results.
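To make the idea concrete, here is a minimal sketch (an illustration of the general technique, not Apptopia's actual model or numbers): instead of estimating each day in isolation, a DAU estimate can be built by weighting a window of daily download history with a retention curve.

```python
# Illustrative sketch: estimate today's DAU by combining a window of daily
# download history with a retention curve, rather than looking at each day
# in isolation. All numbers below are hypothetical.

def estimate_dau(daily_downloads, retention_curve):
    """Estimate today's DAU from the last N days of downloads.

    daily_downloads: downloads per day, oldest first.
    retention_curve: fraction of an install cohort still active
                     d days after install (retention_curve[0] = day 0).
    """
    dau = 0.0
    for days_ago, installs in enumerate(reversed(daily_downloads)):
        if days_ago < len(retention_curve):
            # Each past cohort contributes the share of its installs
            # that the retention curve says are still active today.
            dau += installs * retention_curve[days_ago]
    return dau

# Hypothetical inputs: 7 days of downloads and a decaying retention curve.
downloads = [1000, 1200, 900, 1100, 1300, 1250, 1400]  # oldest -> newest
retention = [0.55, 0.35, 0.28, 0.24, 0.21, 0.19, 0.17]

print(round(estimate_dau(downloads, retention)))
```

The key property is the one described above: today's estimate is now a function of the app's recent history, not a single day's signals.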

4. Core Model Improvements

In addition to the above, we've made major improvements to the actual code and core model infrastructure used to create these estimations. For instance, previous models looked at each app and each day as an isolated data point to estimate. Estimating Snapchat's DAU for today therefore had no correlation with the number of downloads Snapchat received last week. Two major negatives came from this logic:

1. Models were shortsighted and missed out on key learnings and indicators that come from understanding an app's history.

2. Models could produce "illogical" results in certain edge cases. For example, many of you have noticed that in certain situations an app's downloads for a 30-day period could be greater than that app's MAU. While this is technically possible (users can download an app and never open it), the situations we had been presenting were highly unlikely.

Our new models not only look at a long history of an app, they also use other models as part of their inputs. This means that DAU has a view into downloads, and MAU a view into DAU. The issues above are not just resolved; with this release, they are technically impossible.
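The chaining described above can be sketched as a simple clamping step. This is assumed logic for illustration only (the function, numbers, and `open_rate` parameter are hypothetical, not Apptopia's code): because the MAU model sees its upstream inputs, it can enforce hard floors that rule out contradictory outputs.

```python
# Illustrative sketch of model chaining: the MAU estimate sees its upstream
# inputs (daily DAU and monthly downloads) and is clamped so it can never
# contradict them. All names and numbers are hypothetical.

def consistent_mau(raw_mau_estimate, daily_dau, monthly_downloads,
                   open_rate=0.8):
    """Clamp a raw MAU estimate against its upstream inputs.

    raw_mau_estimate: the MAU model's unconstrained output.
    daily_dau: daily DAU estimates for the month.
    monthly_downloads: total downloads over the same period.
    open_rate: assumed fraction of new installs opened at least once.
    """
    # MAU can never be below the busiest single day of the month...
    floor = max(daily_dau)
    # ...and should at least cover the installs we expect were opened.
    floor = max(floor, monthly_downloads * open_rate)
    return max(raw_mau_estimate, floor)

# A raw estimate of 40,000 MAU is inconsistent with 90,000 monthly
# downloads, so the clamp raises it to the download-based floor.
dau_series = [50_000 + 1_000 * (i % 7) for i in range(30)]
print(consistent_mau(raw_mau_estimate=40_000,
                     daily_dau=dau_series,
                     monthly_downloads=90_000))
```

With this structure, the "downloads greater than MAU" edge case can only surface when the inputs themselves genuinely support it.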

An Analysis of Our Accuracy

It’s extremely important to us to be transparent about how accurate our new Usage Models are. This is something we agonize over internally, and since we measure accuracy in a lot of different ways, we wanted to give you some insight into our process. Below is a summary of our internal analysis:

Apptopia vs. Actuals

This measures, on average, how far our estimates are from the Actual Data we collect from developers.

Average difference from actuals = 24.53%
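The exact formula behind this figure isn't specified above, but one common way to compute an "average distance from actuals" is the mean absolute percentage error (MAPE). A minimal sketch, with hypothetical numbers:

```python
# Sketch of a mean absolute percentage error (MAPE) calculation, one common
# way to express average distance between estimates and actuals. The exact
# formula used for the 24.53% figure is not specified; numbers are made up.

def mape(estimates, actuals):
    """Mean absolute percentage error, in percent."""
    errors = [abs(est - act) / act for est, act in zip(estimates, actuals)]
    return 100 * sum(errors) / len(errors)

# Hypothetical DAU estimates vs. developer-reported actuals.
estimates = [120_000, 95_000, 210_000]
actuals = [100_000, 100_000, 200_000]

print(round(mape(estimates, actuals), 2))
```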

Apptopia Old vs. Apptopia New

We are big believers in consistent improvement, of any shape or size. We are always checking ourselves to make sure that what we did today was better than what we did yesterday. As such, I want to give you some insight into how our New Usage models compare to our Old Usage models…

New Usage = 1,291% closer to actuals than Old Usage. Here is an example from a Top 50 Grossing App Overall in the United States:

Put Apptopia to The Test

Don't believe us? Put our data to the test. Tell us the name of any app and we’ll provide DAU and MAU estimations. Simply book time for a 1-on-1 demo and we’ll bring the data.

Still have questions? Email us.
