Measuring online influence: The case for Big Data

Measuring influence online is in its infancy. Unrefined ‘metrics’ dominate the space, and much of what exists is of little value and carries little statistical meaning. Most tools measure only what is easy to measure: number of Twitter followers, number of times a word was mentioned in the last week, Facebook ‘likes’, or someone’s online social ‘rank’ calculated by bizarre, often undisclosed, methods. There are even gimmicky schemes to produce ‘measurements’, like Fast Company’s “Influence Project”*.

Why do we do things this way? Because it is easy and because dealing with big data is hard.

The most valuable measurements that come out of this space live in the analysis of big data. To be effective, one needs a global perspective and all the connections: not just the 100 million active Twitter users, but the 4 billion connections between them and, beyond that, the billions of additional connections implied by mentions, retweets and replies.

A simple search of Twitter will yield you a small sample of unfiltered tweets in a short time frame. A count of followers is as easy as visiting a user’s profile page.
Those metrics miss what is important. Which tweets were most significant? Which users made the most impact? Are someone’s followers actual people or just spam bots? Are they just part of an auto-follow-back Ponzi scheme initiated by some live-for-3-days-and-milk-AdWords-for-all-it’s-worth web app?

But what should we measure?

Last week, HP published a study on what makes a tweet influential and the problems around the measure of “Influence”. Several bloggers responded (1,2,3). In general it is agreed that “Influence” is still not fully defined and that retweets alone are not the definition. Retweets are one way to measure, links are another. “Engagement” as a whole is an intersecting issue. Clearly though, there is an interest in measuring all of these ‘things’, if only we could define them.

How should measurements be delivered?

Companies such as Klout give composite numbers that, while useful, fall short for a wider range of use cases such as spam filtering, topic relevancy and understanding relationships in one’s network. “Influence” is a broader topic that spans much more than just global rank or one’s own reach. Transparency is an issue as well. A single number is easy to act on, but it is also a set of magically combined factors that comes with no clue as to how they are combined and weighted. People should know where their data came from and how it was produced if they are to trust it.
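To make the transparency point concrete, here is a minimal sketch of a composite score that reports its weights and each factor's contribution alongside the final number. It is not how Klout or anyone else computes a score; the function name `composite_score`, the factor names and the weights are all hypothetical, and the factors are assumed to be already scaled to a common range.

```python
def composite_score(factors, weights):
    """Combine factors into one number and show how it was produced.

    Hypothetical illustration: `factors` maps a factor name to a value
    assumed to be pre-scaled to 0..1, and `weights` maps the same names
    to non-negative weights. Neither comes from any real scoring service.
    """
    missing = set(weights) - set(factors)
    if missing:
        raise KeyError(f"no value supplied for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Each factor's share of the final score, after normalizing the weights.
    contributions = {
        name: weights[name] * factors[name] / total_weight
        for name in weights
    }
    return {
        "score": sum(contributions.values()),
        "weights": dict(weights),          # disclosed, not hidden
        "contributions": contributions,    # where the number came from
    }
```

Calling it with, say, a reach value and a retweet value returns the single actionable number together with the breakdown, so a reader can see which factor drove the result.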

Solving for Influence

The data is out there and the tools exist to analyze it, but the world is still unsure of what to tell those tools to do.

Infochimps uses the full Twitter friend graph and historical tweets back to 2006 to produce metrics similar to those HP discusses: Sway (how much a user gets retweeted), trstrank (a global ranking of influence, Google PageRank style) and Enthusiasm (how often someone retweets someone else), which is the reverse of what HP refers to as “Passivity”.
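As a rough illustration of what metrics of this kind look like in practice, the sketch below computes toy versions of all three from a handful of records. The exact Infochimps formulas are not given here, so everything in it is an assumption: the record layout (`user`, `retweet_of`), the choice to define sway as retweets received per original tweet, enthusiasm as the fraction of a user's tweets that are retweets, and a textbook PageRank iteration over follow edges standing in for trstrank.

```python
from collections import Counter


def sway_and_enthusiasm(tweets):
    """Toy per-user sway and enthusiasm (definitions are assumptions).

    Each tweet is a dict with "user" (the author) and "retweet_of"
    (the user being retweeted, or None for an original tweet).
    """
    authored = Counter()           # every tweet a user posted
    originals = Counter()          # tweets that are not retweets
    retweets_made = Counter()      # retweets a user posted
    retweets_received = Counter()  # times a user was retweeted
    for tweet in tweets:
        user, source = tweet["user"], tweet.get("retweet_of")
        authored[user] += 1
        if source is None:
            originals[user] += 1
        else:
            retweets_made[user] += 1
            retweets_received[source] += 1

    users = set(authored) | set(retweets_received)
    sway = {u: retweets_received[u] / originals[u] if originals[u] else 0.0
            for u in users}
    enthusiasm = {u: retweets_made[u] / authored[u] if authored[u] else 0.0
                  for u in users}
    return sway, enthusiasm


def pagerank(follows, damping=0.85, iterations=50):
    """Textbook PageRank over (follower, followee) edges.

    Rank flows from a follower to the accounts they follow. Users who
    follow nobody spread their rank evenly over everyone.
    """
    nodes = sorted({user for edge in follows for user in edge})
    if not nodes:
        return {}
    following = {u: [] for u in nodes}
    for follower, followee in follows:
        following[follower].append(followee)

    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not following[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n
                    for u in nodes}
        for u, targets in following.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank
```

The hard part at Twitter scale is not these few lines but running them over billions of edges and deciding whether these definitions capture the behavior we actually care about.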

There exists a very basic problem in nomenclature.  What are the words we should be using and what are the human behaviors they represent?

More importantly, HP, Infochimps and others are still discovering just what should be analyzed. What is still needed is for people in other fields to fill in those gaps. In sociology and marketing: what is the difference between fame, interestingness, influence and even infamy? What constitutes the ‘humanness’ of a user (are they real, or are they just a bunch of employees tweeting for a celebrity)? What are influence and engagement? What should we be looking for in relationships? In the social CRM space: how should you identify relationship networks? What are the best ways to find a route to new leads?

What are the use cases in each of these fields for the data?  The big data world has the tools to quantify the data; it just needs to know which questions to answer.  Once we know what behaviors to look for, they can be translated into signatures that are identifiable in data.

Finally, we also need a methodology for combining the data into actionable metrics.

Measuring what is difficult instead of what is easy is a game changer. In fact, not just a game or a ballpark, but a whole sport. The knowledge is in the data.

* If you must visit it, here’s the address:
