
How we count “visits”

Written by Jason Cohen on April 9, 2012


Our pricing is partially based on the number of monthly visits to your site, so we’d better have an accurate definition of “visits.”

This is an interesting question anyway, because it’s one of the primary web analytics metrics.

But, this is harder to define than it seems.  There are two fundamental questions:

1. How should a “visit” be defined?

2. How do you measure “visits” in practice?

Defining a “visit”

Let’s just write down some events that we think should and shouldn’t be a visit:

  1. When a human being first arrives on the site and loads the page, staying there for 31 seconds, that’s a visit.
  2. If that same human then clicks a link and sees another page, that’s not a new visit; that’s part of the same visit.
  3. If that same human doesn’t have cookies or javascript enabled, still all that should count as one visit.
  4. If that same human loads the site with a different browser, that’s still not a new visit; that’s part of the same visit.
  5. If that same human bookmarks the site, then 11 days later comes back to the site, that is a new visit.
  6. When a robot loads the site (like a Google or Bing search bot), that’s a visit, but if one robot scans 100 pages quickly, that’s one visit.  (You might disagree that a robot is a “visit,” but consider that from a hosting perspective, we still had to process and serve all those pages just like it was a human being, so from a cost or scaling perspective, bots count the same as humans.)
  7. If a robot scans 20,000 pages over the course of a month, that’s not just one visit.  It shouldn’t be 20,000 visits, but neither should it be 1. Something in the range of 100-1,000 visits is acceptable.
  8. There are additional cases too where the “right thing to do” is less clear. For example, take the case of a “quick bounce.” Suppose a human clicks a link to the site, then before the site has a chance to load the human clicks “back.” Does that count as a visit? Our servers still had to render and attempt to return the page, so in that sense “yes.”  But a human didn’t see the site and Google Analytics isn’t going to see that hit, so in that sense “no.” Because we need the notion of a “visit” to correspond to “the amount of computing resources required to serve traffic,” we round off in favor of saying “yes.”

Rather than attempting to write down an exact definition of a “visit,” we’ll just say that whatever it is, it has to be consistent with all the notions written above.

Measuring a “visit”

This is where things get tricky.

It’s tempting to say “Whatever Google Analytics says is the ‘number of visitors’ in a month, that’s the number of visits in a month.” But it’s clear that this metric does not satisfy the definition above. GA doesn’t measure bot traffic or “quick bounces.” And GA would double-count a human who uses two browsers or (sometimes) one who has cookies disabled.

We also need something clear and simple, so that it’s trivial to compute and easy to analyze when it doesn’t behave as we expect.

So we’ve settled on this metric:

We take the number of unique IP addresses seen in a 24-hour period as the number of “visits” to the site during that period. The number of “visits” in a given month is the sum of those daily visits during that month.
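A minimal Python sketch of this metric, assuming each hit is available as a (timestamp, IP address) pair, for example parsed from server access logs. The post doesn’t say where one 24-hour period ends and the next begins, so the sketch assumes UTC calendar days; `monthly_visits` is a hypothetical name, not WP Engine’s actual code, and the addresses come from the reserved documentation ranges.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_visits(hits, year, month):
    """Sum of per-day unique IP counts for one month.

    hits: iterable of (timestamp, ip) pairs; timestamps must be timezone-aware.
    """
    ips_by_day = defaultdict(set)
    for ts, ip in hits:
        # Assumption: a "24-hour period" is a UTC calendar day.
        day = ts.astimezone(timezone.utc).date()
        if day.year == year and day.month == month:
            ips_by_day[day].add(ip)  # a set ignores repeat hits from the same IP that day
    # Unique IPs per day, added up across the days of the month.
    return sum(len(ips) for ips in ips_by_day.values())


def utc(day, hour):
    """Helper for the example: a UTC timestamp in April 2012."""
    return datetime(2012, 4, day, hour, tzinfo=timezone.utc)


hits = [
    (utc(1, 9), "203.0.113.5"),    # a person loads the first page
    (utc(1, 9), "203.0.113.5"),    # same person clicks through to another page
    (utc(1, 10), "198.51.100.7"),  # a crawler fetches a page...
    (utc(1, 10), "198.51.100.7"),  # ...and another, from the same IP
    (utc(12, 8), "203.0.113.5"),   # the person returns 11 days later
]

print(monthly_visits(hits, 2012, 4))  # 3
```

Running it prints 3: two distinct IPs on April 1 and one on April 12. Repeat hits from one address on one day collapse into a single visit, while the same address on a later day counts again.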

Does this satisfy the conditions above?

  1. Yes, because that’s an IP address.
  2. Yes, because that’s the same IP address, so it won’t be counted again.
  3. Yes, because we’re not using cookies or Javascript or any other feature of the browser.
  4. Yes, because it’s tied to the network, not the browser.
  5. Yes, because we reset our notion of “unique IP address” every day.
  6. Yes, because robots and humans are treated the same — both have an IP address.
  7. Yes, because robots have the same few IP addresses, so they will be consolidated within one day, but will count again the next day.
  8. Yes, because we’ll see the hit in our logs.
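Point 7 is easiest to see with the rule written as a formula (the notation here is ours, not the post’s). If $U_d$ is the set of distinct IP addresses seen on day $d$, the monthly count is the sum of $\lvert U_d \rvert$ over the month. A crawler that fetches pages from the same $k$ addresses on each of $D$ days therefore adds exactly $kD$ visits, however many pages it requests. For a hypothetical crawler using $k = 5$ addresses on every day of a 30-day month, a 20,000-page crawl would count as 150 visits.

```latex
V_{\text{month}} = \sum_{d \in \text{month}} \lvert U_d \rvert,
\qquad
V_{\text{crawler}} = \sum_{d=1}^{D} k = kD
```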

This does mean there are some cases where you could theoretically argue we’re counting visitors too often. For example, a person visits a site from work, then drives home and visits the site again later that day. That will count as two visits because the IP addresses will be different.

But, we’d argue, (a) that doesn’t happen much, (b) it’s not terribly unreasonable for that to count as two visits, and (c) those events are counterbalanced by times when we count only one visit where there were really two.

As an example of that last point, what if two people in the same office visit a site from two computers? That should be two visits; even Google Analytics would count it as two. But we count it only as one because their IP addresses (from our perspective) are the same.
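Reduced to a single day, the two miscounting cases look like this in a minimal Python sketch. `visits_for_day` is a hypothetical helper, not WP Engine code, and the addresses are illustrative values from the reserved documentation ranges.

```python
def visits_for_day(ips):
    """Visits for one day under the unique-IP rule: the number of distinct addresses."""
    return len(set(ips))


# One person: office IP in the morning, home IP in the evening.
work_then_home = ["192.0.2.10", "198.51.100.20"]

# Two coworkers on separate computers, both behind the same office IP.
two_coworkers = ["192.0.2.10", "192.0.2.10"]

print(visits_for_day(work_then_home))  # 2 (one person, counted twice)
print(visits_for_day(two_coworkers))   # 1 (two people, counted once)
```

The first case over-counts and the second under-counts, which is the trade-off the next paragraph relies on.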

So the cases where we count too few are counterbalanced, to a first approximation anyway, by those where we count too many, and therefore we think this is still a fair metric.

13 Responses

  1. Sky says:

    Thanks for your nice explanation, Jason. I’ve been having conversations with WPEngine engineers about this for the last month and this matches what I’ve been told, with a few additional explanations, for which I’m grateful.

    One issue that is perplexing to those of us running WP sites is the number of “unsolicited” and “useless” bots that parasitically survive by feeding on the “Twitter ecosystem.” Many of these bots are experimental, hit us from multiple (sometimes -many-) IP addresses (most frequently based on Amazon AWS) and may spider hundreds or thousands of pages very quickly. They place great stress on our site (or on WPEngine in this case) and since they do not identify themselves in many cases (User-Agent:), we cannot perceive any value in serving pages to them. All they do is hit pages and walk all around the site hitting page after page, which is the most expensive object to serve, and they generally do not hit graphics at all, so it’s clear they are not “real browsers” (human beings).

    So, when we tweet something and include a link to our WPEngine site, we may be hit with hundreds of bot requests within 10 to 30 seconds. Almost all unsolicited. And all of which end up “costing us” (and WPEngine of course) by adding to the load on the site.

    Operating my own WP servers, which I have done for years, I see this kind of behavior in logs and I firewall these guys out when they do not properly identify themselves or do not obey robots.txt. That is a constant, intensive, and ongoing battle. But on WPEngine I 1) don’t have logs; and 2) can’t firewall them; and finally 3) they cost me incrementally if they hit from enough IP addresses on enough days. And they end up kicking me and my customers into a higher service tier.

    My thought has been to firewall out all “bot-like” requests that do not present a useful User-Agent: … and perhaps that is a way of applying pressure to these guys to get them to properly ID, but it would have to be done on a larger scale to make a difference. Otherwise they just move to more IP addresses and change their User-Agent: to look more like a real browser (which some of them do).

    There is a further issue, which is that if someone wants to “attack” a site because it espouses views they do not like, one way to attack it is increase (useless) traffic in such a way to cause it to scale up its number of servers and/or its level of service until the sponsor can no longer afford the cost. This is called an “Economic Denial of Sustainability” attack or EDoS attack. I can give you several examples of this from my personal experience, since we deal with clients who have been attacked in this fashion. If WPEngine has not seen this yet, you may rest assured that at some point you will. (It depends a bit on your client base, of course.)

    You are clearly (all) thinking about this, but what are your ideas specifically about all of these seemingly-useless bots?

    • Hey Sky,

      I wanted to follow up on this question. In general, what you do to attract traffic varies site by site, and of course that’s an independent decision that WP Engine does not monitor. We want to make sure all our customers can employ their preferred strategies to grow site traffic, whether that’s via twitter strategies or otherwise. Bots, in any case, will be a part of managing and growing a site.

      However, with that in mind, this is a question best suited to directly engaging our sales team, which I understand you may have already done. I’m going to close comments on this particular post because I want to make sure that questions like this one, which may depend on the particulars of one site or another, can be fully addressed one-on-one.

      If you have more followup questions, please let us know at sales@wpengine.com.

      Thanks again!

      -Austin Gunter

  2. This is an interesting approach to a problem with a lot of technical and social pitfalls. Initially, I was concerned that you would miss a tremendous number of visits where users were behind a gateway.

    But, I went back to check, and the EFF’s Panopticlick (https://panopticlick.eff.org/) browser fingerprinting dataset did not see a huge number of visitors behind gateways, and their audience is skewed toward the technical end of the spectrum.

    From the PDF: “We saw interleaved cookies from 2,585 IP addresses, which was 3.5% of the total number of IP addresses that exhibited either multiple signatures or multiple cookies.”

    Thanks for sharing.

  3. Devin Price says:

    Thanks for clarifying what a “visit” is. I was curious about the metric you used since pageviews are very different than visitor counts in Google Analytics, for example, but that might even be thought of as a new visit since each pageview requires additional resources.

    If someone is browsing exactly when the 24-hour period restarts, there’s the chance their unique IP will get logged twice, correct? (Edge case, but I thought I’d ask).

    If you exceed your visitor count of the personal plan for a single month, are you automatically moved to the professional plan?

    Is the visitor count displayed anywhere in the dashboard so you might have an idea when to upgrade plans?

  4. jim says:

    This is a very reasonable and well-articulated policy. (The anti-example would be the Google Maps rep I asked about what constitutes a map view – I felt like I was speaking Venutian trying to get an approximation so I could budget.) The key point is you’ve got no motivation for nickel-and-diming.

  5. Per-BKWine says:

    Whether it is “correct” is not terribly important.

    It is important that it is something that is
    a) measurable (by you)
    b) consistent (doesn’t give different results for similar traffic)
    c) relatively easy to understand (from a customer perspective)
    d) relatively consistent with how other people measure “visits” (to give it some credibility)

    Google Analytics fails on a) (since you don’t have access to Analytic data for customers).

    Your previous measure, where you used something akin to “server request” (if I understood right) was wildly off the mark on c).

    This new measure seems reasonable.

    If you then start giving customers more than just a single number as stats (“Usage – Last 31 Days”) and instead give, say, daily numbers and at least a yearly history in addition to a monthly number, that would be even better.

    With the added benefit that customers could compare it to their own stats from e.g. Analytics, which should show similar trends even if the exact numbers are different.

  6. How do you count visits that come through a proxy server, such as those used by large companies and ISPs? In those cases, doesn’t each computer on the network appear to the web server as if it came from a single IP address?

    • BonnieBlue says:

      The answer is in the post:

      “As an example of that last point, what if two people in the same office visit a site from two computers? That should be two visits; even Google Analytics would count it as two. But we count it only as one because their IP addresses (from our perspective) are the same.”

  7. Syed Karim says:

    My guess is that many of your customers are bloggers, most of whom monetize their content through advertising. The ones that are making money from their blogs understand the importance of the term RPM, revenue per thousand. So if that is a term that everyone understands, why not just price the service in a similar manner? Instead of pricing by visits, why not price by pageviews? As a publisher, I should be most concerned with three things: creating good content, the RPM of that content, and the cost of serving those pages.

  8. When I first contacted sales, I found that the “visits” stat I know about is not the same as the one WP Engine uses. This article helped a lot in understanding how “visits” are counted and I can see the logic in all arguments, but when I come to the point of deciding to go either way, the “hidden” visits (as mentioned by @Sky) are coming out of left field for most of us.

    I became aware of those when testing Cloudflare for the first time and noticed a drop in my stats. At first there was a bit of panic, but then I realized that it actually does the job it should and well… blocks those unsolicited and potentially harmful visits, which saves resources and speeds up the site.

    I did not use it for long due to other conflicts with my WP site setup, but I keep thinking about getting back to it, especially in tandem with WP Engine, so that the total visits count will reflect the true count of human visitors.

    It might be better if WP Engine put a system in place that allows us to filter / firewall these extra visits so that they do not get counted as they are today. This blog post is great and should be linked to the visits number in the pricing plan so people are more aware of it – I was surprised to learn the first time that my GA visits stats are one or two tiers lower than what I actually need ;)

    The main problem in this situation is that you can’t have a stable, predictable notion of the costs if you can’t track the hidden visits metrics.

    Does that make sense?

  9. Michael says:

    A very good explanation of what a visit is. I think using IPs as a metric is a fair deal. Thanks

  10. Phil Ruggera says:

    One issue that concerns me is the number of useless (any non search page related) bots that scan WP sites. I don’t perceive any value in serving pages to them. And they are coming from all over the world. Do you firewall out these useless requests?

  11. Gerard says:

    I hope this is easy to answer. It’s clear that you guys must measure the visits to our sites to see whether we are going over the limit of our account. So – is there anywhere we can look up the number of visits to our site in our WPE account? I know you must have the numbers, but I don’t know if we can easily see where we are in a given month. If there is no way for us to access this in our account, let me be the first to suggest this as something you guys start doing…

    Thanks!

    Gerard
