
Museum of London Webstats Talk

These are the notes from a talk I gave at the Museum of London in May 2012 about how to gather, understand and make use of their web stats.

It covers a lot of ground in a short amount of time (but in quite a long page) and was mainly a way to kick off lots of discussion, but hopefully it's a useful overview.




The old Google Analytics glossary gives short, simple definitions of terms you're likely to come across. Anything in a block quote below is probably from there....

hits

Hits are a measure of requests for every element on a web page - including stylesheets, images etc. So a single page can be hundreds of hits. This means that it's a pretty useless metric and you should be very suspicious of anyone who still talks about the number of hits their website is getting. Unless of course you need a big number for a funding application ... although you'll have to hope the funding body hasn't read that previous sentence.

pageviews

A pageview (or file in old money) is a request for the main html file (or php or asp or whatever) rather than any of the assets such as stylesheets or images that appear on the page. This means that counting pageviews is much more useful than counting hits.
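To make the distinction concrete, here's a toy Python sketch counting one page request and its assets - many hits, but only one pageview. (The list of "page" extensions is an assumption for illustration; real stats packages use more sophisticated rules.)

```python
# Extensions we'll treat as "pages" - an assumption for this sketch.
PAGE_EXTENSIONS = {".html", ".php", ".asp"}

# One page load: the HTML file plus the assets it pulls in.
requests = [
    "/visit/index.html",
    "/css/main.css",
    "/img/logo.png",
    "/img/banner.jpg",
    "/js/menu.js",
]

hits = len(requests)  # every request counts as a hit
pageviews = sum(1 for path in requests
                if any(path.endswith(ext) for ext in PAGE_EXTENSIONS))
print(hits, pageviews)  # 5 1
```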

Google also distinguish between Pageviews and Unique Pageviews

A pageview is defined as a view of a page on your site that is being tracked by the Analytics tracking code. If a visitor hits reload after reaching the page, this will be counted as an additional pageview. If a user navigates to a different page and then returns to the original page, a second pageview will be recorded as well.

A unique pageview, as seen in the Top Content report, aggregates pageviews that are generated by the same user during the same session. A unique pageview represents the number of sessions during which that page was viewed one or more times.

There are issues with counting unique pageviews which we'll get onto later, but Google Analytics probably does as good a job as can be done with working them out.

visits / visitors / unique visitors

A visit is counted when a user comes to your site and looks at one or more pages (this is also known as a session). Depending on which stats package you're using and how the limits have been defined, any pages viewed within a certain time will count as the same visit. Normally if a user doesn't load a page for 30 mins then any subsequent pages will count as a new visit.

Note though that they don't count as a new visitor. Any future sessions from the same user during a selected time period are counted as additional visits, but not as additional visitors.

Unique Visitors represents the number of unduplicated (counted only once) visitors to your website over a specified time period. This relies heavily on the use of cookies to track your visitors.

bounce rate

Bounce rate is the percentage of visitors to your site who look only at a single page of your website. A high bounce rate generally indicates that entrance pages aren't relevant to users reaching your site and they are leaving straight away. Of course it could mean that they are finding exactly what they want straight away and don't need to look any further....
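As a rough sketch of the arithmetic, assuming each visit is modelled simply as a list of pages viewed (a simplification, just to make the definition concrete):

```python
def bounce_rate(sessions):
    """Percentage of visits that viewed exactly one page.

    `sessions` is a list of visits, each a list of pages viewed -
    a toy model, not how a real stats package stores its data.
    """
    if not sessions:
        return 0.0
    bounces = sum(1 for pages in sessions if len(pages) == 1)
    return 100.0 * bounces / len(sessions)

sessions = [
    ["/"],                          # bounce: landed and left
    ["/", "/whats-on", "/visit"],
    ["/collections"],               # bounce
    ["/", "/visit"],
]
print(bounce_rate(sessions))  # 50.0
```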

Ways of collecting stats.

server logs

Any web server worth its salt will log requests for web pages and assets from a user. These will typically record IP addresses, user agents (i.e. which browser is being used), the file requested, and time and date. They look a bit like this: - - [31/May/2012:13:10:01 +0100] "GET /millcolib/layout/quickLinks_arrow_bg.png HTTP/1.1" 304 - "http://www.chilternsaonb.org/about-chilterns/chilterns-commons-project/commons.html" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB7.3; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET4.0C; .NET CLR 3.0.30729; .NET4.0E)"
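Lines in that "combined" log format are easy to pick apart with a regular expression. This is a sketch - real log formats vary, and the sample line below is adapted from the one above with a documentation IP address (192.0.2.1) standing in for the stripped one and the user agent shortened:

```python
import re

# Rough regex for the Apache/nginx "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('192.0.2.1 - - [31/May/2012:13:10:01 +0100] '
        '"GET /millcolib/layout/quickLinks_arrow_bg.png HTTP/1.1" 304 - '
        '"http://www.chilternsaonb.org/" "Mozilla/4.0 (compatible; MSIE 8.0)"')

m = LOG_PATTERN.match(line)
if m:
    print(m.group('path'))    # /millcolib/layout/quickLinks_arrow_bg.png
    print(m.group('status'))  # 304
```

A status of 304 ("not modified") means the browser already had the file cached - one more reason raw request counts overstate what was actually downloaded.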

javascript trackers

Google Analytics uses JavaScript to track users, which it does by setting a cookie. Of course awkward types like me who block cookies will upset things, and if JavaScript is broken or not turned on then equally that'll muck things up. However when it does work (which is most of the time) then it's very effective.

url shorteners

If you use a URL shortener such as bit.ly - which we use for the QR codes in the Galleries of Modern London - or t.co which is used by Twitter then these will collect statistics when they send users to the full URL. We'll look at these in more detail in a moment because these are particularly useful for us.

CMS calls / download trackers etc.

Each time a page is generated or content is pulled from the database then your CMS should count this (obviously this depends on your CMS). If you use a download manager to handle any documents on your website then this should also collect statistics.

user interactions and conversions

Sell things... honestly, those ecommerce sites actually have a much better metric to use than sites like ours which are providing information (and education and entertainment and all the rest of course). If you can show that your sales increase because of changes you've made then that's great.

You might be able to collect other information as well of course - signups to newsletters or membership applications.

Why none of them can be trusted

Ah - my favourite bit. There are lies, damn lies and webstats.

None of the methods above will ever give you an accurate number of visitors for at least the following reasons:

arbitrary values

Google Analytics and Webalizer default to 30 mins for a visit unless you specify other values. But why should a page loaded at 29 mins be a single visit but one at 31 mins count as 2 separate visits? I often have a dozen tabs open at once and can easily be distracted by kittenwar or whatever in another tab for longer than 30 minutes. Hell, by the time I finish this page it will probably take longer than half an hour to read.
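The 30-minute cutoff is easy to sketch - and its arbitrariness easy to see - in a few lines of Python. This is a simplified model of sessionisation, not how Google Analytics actually stores visits:

```python
from datetime import datetime, timedelta

def count_visits(timestamps, timeout=timedelta(minutes=30)):
    """Group one user's pageview timestamps into visits.

    A new visit starts whenever the gap since the previous pageview
    exceeds `timeout` - the same arbitrary cutoff described above.
    """
    visits = 0
    last = None
    for t in sorted(timestamps):
        if last is None or t - last > timeout:
            visits += 1
        last = t
    return visits

views = [datetime(2012, 5, 31, 13, 0),
         datetime(2012, 5, 31, 13, 29),   # 29 min gap: same visit
         datetime(2012, 5, 31, 14, 1)]    # 32 min gap: a "new" visit
print(count_visits(views))  # 2
```

Nudge that middle gap by three minutes and the same reading session becomes one visit instead of two - which is exactly the problem with the metric.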

pdf downloads

If you click on a PDF link and it opens up in your browser then (depending on your browser) it almost certainly won't be downloaded in one go but will be broken up into several requests so that you can start reading the first pages. A single PDF file of 100 pages may end up counting as 100 requests. We had clients in the past who were really pleased with the number of times their annual report was being downloaded until we pointed out it was more than the number of visits their site was getting...

ajax and direct links

Data loaded without a page reload might not be counted as a visit. So you might spend an hour playing with a fancy map but have it counted as a single pageview.

Links directly to assets won’t involve a pageview and hence might not be tracked at all. For example search for the MOLA Archaeology Handbook:


The first result on Google is a direct link to the PDF. This won't necessarily be tracked (Google does now seem to be doing redirects from its search results so it may track them, but not on other search engines).

firewalls, proxies and caches

The maintainer of Analog (a logfile analysis package) has some great information on what results you can get from logfiles if you fancy a more technical discussion:


It's a little out of date now that we rely on Google Analytics and its javascript tracking but it's still worth considering a few of the points:

  • You can't tell how many visitors you've had just from an IP address and request to your server. Your browser will cache web pages, your IT department might cache pages, your ISP will probably cache pages. Every computer in the Museum might appear as a single IP address to the outside world.
  • You often can't tell where users entered your site, or where they found out about you from. If they are using a cache server, they will often be able to retrieve your home page from their cache, but not all of the deeper pages they visit. Then the first page you know about them requesting will be one in the middle of their true visit.
  • You can't tell how they left your site, or where they went next. They never tell you about their connection to any subsequent site.
  • You can't really tell how long people spent reading each page. You can't tell which pages they are reading between successive requests for pages. They might be reading some pages they downloaded earlier. They might have followed a link out of your site, and then come back later.
  • You can't really tell how long people spend on your site. Programs which report the length of a visit count the time between the first and the last request. But they don't count the time spent on the final page, and this is often the majority of the whole visit.


OK - but what you can use them for

trends over time

How traffic to your website is changing over time (although it's not necessarily a bad thing for traffic to decrease). Again this could be affected by some big ISP changing their caching regime, but over a long period you should be able to follow trends.

The important thing here is to choose a stats package and stick with it. We have a client who have always used Webalizer to work out their visitor numbers - when we suggested using Google Analytics, which is much harsher at counting visitors, they told us they couldn't really switch because they'd have to explain to the trustees why the stats had dropped so dramatically.

user flows

How users are interacting with your site - which bits are the most popular and may need better signposting.


Why big numbers aren't always better

data driven vs. audience driven

Baseline metrics don't really matter for most organisations (they may well if you're a shop, but probably not if you're a museum).

Chasing higher visitor numbers will undermine your long term positioning because you'll design gimmicks rather than build features that bring people back and turn them into devotees and customers.

It's very easy to form an over-reliance on tools and stats packages - these users are people, remember (well some of them may be robots but mostly they'll be people).

Visitor Segmentation ("Unique Visitors" Must Die)

There's a nice piece from Jakob Nielsen (high priest of usability) about visitor segmentation:


He says that "unique visitors" is not a useful metric for measuring site success and that we should concentrate on bounce rate instead.

However - you have different kinds of visitors - you need to make sure you're looking at the right ones:

Low-value referrers, such as Digg. People arriving through these sources are notoriously fickle and are probably not in your target audience. You should expect most of them to leave immediately, once they've satisfied their idle curiosity. Consider any value derived from Digg and its ilk as pure gravy; don't worry if this traffic source has a sky-high bounce rate.

Direct links from other websites. These links are the equivalent of a vague recommendation: "You might want to check out this site." People who click such links haven't expressed a direct intent to engage with your topic to the same degree as someone who actively enters a search engine query. These visitors do have some degree of interest, however, so a high bounce rate is a symptom of a user experience problem.

Search engine traffic, whether from organic SEO or paid links. By clicking your link, these users have actively indicated an acute interest in the topic and should engage intensely with your content. If they leave immediately, it's a sign that something is seriously wrong with your landing pages.

Loyal users who return repeatedly to your site. On the one hand, you'd expect the highest engagement from your biggest fans. On the other hand, this engagement might not show up on every visit if they visit often. As long as people keep coming back, there's nothing wrong with having them sometimes leave after a page view or two.


Analysing our data

Server logs / webalizer etc.

There are several packages that analyse server logs for you. The ones you'll come across most often are Webalizer and Analog. Neither of them are particularly pretty but they can provide useful information.

What Webalizer looks like

bit.ly redirects

Adding a + to the end of a bit.ly link will show stats for that link.

eg. http://bit.ly/Mf2DqF+

Logging into your bitly account will show stats for all of your links:


Looking at the stats for our QR codes we see a big jump for 7th May which was a Bank Holiday.

Example bitly stats

The great thing about these stats from bit.ly is that because they only appear in the QR codes in the exhibition we can be pretty certain that each count for these really is someone getting out their phone and scanning our code.

You could even go further with QR codes and use different ones in different places - even if the user gets sent to the same place.

Other URL shorteners do similar things with stats. Goo.gl will reveal stats by adding .info to the end of a short URL

eg. http://goo.gl/VwrDK.info

Email campaigns / mailchimp

You can get lots of interesting statistics from newsletter campaigns. We use a newsletter provider called MailChimp for several of our clients and it's always fascinating watching mailings go out and watching the emails being opened in real time.


Caveat: a lot of users won’t allow tracking from within their email client (me for one), but for those that do you can see who tweets or forwards your newsletter - and you can see which subscribers always open your newsletters and so on.

The open rate is about 20% for the non-profit sector.

Example Mailchimp stats

MailChimp will now also let you do A/B testing on a newsletter campaign so that you can try out a couple of slight variations on a subset of your list and then send the variation that gets the most response to the rest of the list.


rss feeds / feedburner

Feedburner provides statistics about subscribers to your RSS feeds. Again these are really useful because people have actively saved them.

Social media aggregators - ShareThis / AddThis etc

If you use a service to add social media icons to your page then they will also collect stats.


YouTube

Logging in to your channel will give you metrics on how many videos have been watched.


Remember - YouTube is now the second biggest search engine... if you’ve got videos then get them up on YouTube.

Internal CMS metrics

Depending on your CMS you might have stats about page requests or downloads etc.

You can also get information about users that have logged in if you have areas of your site that require a login. This is very handy for working out who are your regular users, but making people log in is bad of course.

Open graph

OG is a protocol that can be added to web pages to enable tracking within social media sites (i.e. Facebook) and control how information is displayed when a Facebook user adds your page or app to their timeline.

Google Analytics

I shan't go into much detail about all the ins and outs of GA in these notes, but here's a brief overview:

Remember that you can specify a date range by clicking on the dates in the top right hand corner of pretty much any analytics page. What is particularly useful is to be able to compare recent stats with stats from a previous date range. Don't forget that what analytics is best at is comparing trends rather than absolute numbers - comparing this month with the same month last year is a very effective way of seeing how traffic to your site is changing.

Audiences

The Audiences section tells you where in the world your users are coming from and whether they're new or returning visitors. It also tells you what technology they use to get there - the rise of mobile is particularly interesting at the moment. At the moment about 15% of visitors to the MoL site use mobile devices (including tablets) - it's something you should definitely keep an eye on.

Also under the Audiences section is visitor flows which gives a graphical demonstration of how users move through your site.

Traffic Sources

Let's face it - most of your visitors get to your site via Google. You can find out what they searched for under the Search / Organic menu.

It's also interesting to view the Social dimension section here. Google wants you to set up goals to track your Social Value, but even without doing that it's interesting to see how many visitors arrive via Facebook, Twitter etc.

Content

Under the content section you can drill down through your site to find data on specific pages.

A very interesting item here is In-Page Analytics, which shows data on the percentage of clicks each link on your site gets. It's a very immediate way of seeing what users are looking for when they arrive at your site.

In page analytics example

Remember though that if you have two links to the same place on a page, then the percentage of clickthroughs displayed is the aggregate for clicks on both - so if you want to see e.g. whether a banner ad or a link in an article is getting more clicks, then you need to add some kind of label to the links so as to be able to differentiate between them.
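One simple way to label duplicate links is to append a query-string parameter to each. This is a generic sketch - the `from` parameter name here is made up for illustration, not a Google Analytics convention:

```python
from urllib.parse import urlencode

def label_link(url, source):
    """Append a label so two links to the same page can be told apart
    in your stats. The parameter name 'from' is just an example."""
    sep = '&' if '?' in url else '?'
    return url + sep + urlencode({"from": source})

print(label_link("/whats-on/exhibitions", "banner"))
# /whats-on/exhibitions?from=banner
print(label_link("/whats-on/exhibitions", "article"))
# /whats-on/exhibitions?from=article
```

Both links still land on the same page, but your stats package now sees two distinct URLs and can report clicks on each separately.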

Other fun sources of data

Google Trends


You can see there's a slight downward trend in general interest in the search phrase 'Museum of London'; this is bound to impact on your website traffic.

Google Webmaster Tools / Bing Webmaster Tools

These tools provide very interesting information on what users are searching for to get to your site.



One thing we often use Webmaster Tools for is finding sites that link to content no longer on our site (normally old content that we've removed or renamed) and then we can set up redirects to relevant content, which keeps the users happy (they don't get a 404 error) and also is good for search engine optimisation.
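On an Apache server a redirect like that is a one-liner in a .htaccess file. The paths below are made up for illustration:

```apache
# Permanently redirect a removed page to its nearest current equivalent
# (hypothetical paths, just to show the shape of the rule)
Redirect 301 /old-exhibition/pompeii.html /whats-on/exhibitions/
```

A 301 (permanent) redirect tells search engines to transfer the old page's ranking to the new URL, which is why it helps SEO as well as users.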

Google Adwords Tool

The Adwords tool can tell you what Google thinks people who visit your site are interested in. If the keywords that it suggests to you are different from those you expect it may indicate that you need to look again at the content on your site.


(remember to turn off AdBlock if you have it on, otherwise you won't see anything ....).

What you can do now

Search engine optimization

SEO is a bit too wide a topic to cover in this talk - but you can always, always do things better. If you have a search engine on your own site, make sure that it's up to the job too.

Information Architecture

Are users keen to get to an area of your site that’s deeper than it should be?
Can you provide an easier route to that information?

Use in-page analytics to work out how to let users get straight to the info they want.

Card sorting is a great way of being able to group pages and find out from your users where they would expect to find information.

Usability testing. Just get your users in front of the designers - the more often the better. If you can, do your own usability testing - you should know your audience better than anyone. If you don't have the resources to do it in-house then of course we are available at reasonable rates :-)

A/B testing

Great for making small changes that can make a big difference.

Especially if you want to get users to a particular goal such as buying something.

Google provide a tool for setting up A/B tests:


MailChimp will also let you run A/B tests on email campaigns. You can send a small percentage of emails out with variations in your email campaign and then whichever one gets the most interest is then sent out to the rest of your list.
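The "pick the winner" step is simple to sketch. This toy model just compares open rates of a test send - it's not MailChimp's API, and the subject lines and numbers are invented:

```python
def pick_winner(variants):
    """Pick the variant with the best open rate from a test send.

    `variants` maps a variant name to (emails_sent, emails_opened) -
    a toy model of the A/B step, not a real mailing service's API.
    """
    def open_rate(stats):
        sent, opened = stats
        return opened / sent if sent else 0.0
    return max(variants, key=lambda name: open_rate(variants[name]))

test_send = {
    "Subject A: May newsletter":      (500, 95),   # 19% open rate
    "Subject B: What's on this May?": (500, 130),  # 26% open rate
}
print(pick_winner(test_send))  # Subject B: What's on this May?
```

Bear in mind that with small test batches the difference between variants may just be noise - the bigger the test send, the more you can trust the winner.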

Ask real people

It's obvious - although sometimes it gets overlooked with all these tools - but going out into the museum and asking real people wandering around whether they used the website and if so how easy they found it is just a really good way of getting information.

Get your web developers to sit down with real users (as often as possible) to watch them using the website. It really does make a big difference. You know how the website works, your designers do ... I do now as well, but watching someone coming to it for the first time can be a real eye opener.

Jun 1, 2012
in: stats