Three years can often feel like a long time, but in the world of social media, it goes by in a flash.
For three years, behind the scenes, Converseon's team of data scientists and machine learning experts has been toiling away at a particularly difficult challenge: providing the next generation of text analytics for the social age, analysis that comes close to the gold standard of human coding and can be delivered at scale. Language is a challenge: sarcasm and slang make up vast parts of social conversation, and syntax is still not fully understood.
But plunge into this we did, and for good reason: we have long recognized what many of you probably have too, that text analytics, sentiment analysis, and the like for social conversation data were, well… just not very good. In fact, they were quite poor. And that limits their value and uses.
We also knew that to truly do it right would take a massive effort and a great deal of time. Three years, in fact. And millions of meticulously human-coded records across industries and brands. We recognized the need to build an end-to-end, semi-supervised system that could continue to evolve and learn as human language evolves and transforms; one that could be trained for specific industries and companies, because language means different things in different contexts. Off-the-shelf, generic solutions simply couldn't get us where we wanted to go.
But we made this effort because we believe that solving this challenge would open up a world of amazing insight and value.
And today, almost a thousand days later, we have achieved what we believe to be the most accurate social intelligence data in the industry — which now enables us to fuel many other applications, including advanced uses like predictive modeling.
Today we introduce to the world ConveyAPI.
Yes, the performance numbers cited below are impressive… and they're real. In the interest of transparency, we have put forth not only how we tested the system but also a process we think the rest of the industry should follow, so that everyone can work with standards they can believe in.
ConveyAPI is designed to truly convey the meaning of social conversation. We look forward to showing it to you as it rolls out.
For a buyer of social media analytics, comparing the performance of various technologies is nothing short of baffling. This is especially true with respect to sentiment analysis — indeed text analytics in general — where scientific jargon, marketing puffery, and a laundry list of features can often obscure what really matters: using a technology meant to measure human expression, are we obtaining the value of a human analysis?
This notion of human performance as the ultimate goal is based on an important observation: when people analyze social media, we get valuable results.
When we built our social text analytics solutions, we recognized that, if only we could somehow take a few thousand people, shrink them and put them into a little box, and then get them to work thousands of times faster (to deal with seriously big data), we would have an incredible solution to our clients’ problems. Yes, people do make mistakes, and they disagree with each other about things. (Consider: “At this price point, I guess the smartphone meets the minimum requirements”. Three different people might fairly call this either positive or negative or neutral.) But even though human performance is imperfect, we know from our long-tested experience that human analysis provides all kinds of value that clients need.
So, when building and benchmarking our social media analysis technology, we set our sights on how close our system could get to human performance. One doesn’t need the technology to be 100% perfect, because people aren’t perfect, and we know people can get the job done just fine. (See the second paragraph again.) The right goal is for the technology to be as good as people.1
With that in mind, here’s how we’re approaching the measurement challenge. The first step is to figure out how well people can do at the analysis we care about, so we know what we’re aiming for. How can you do that? Well, take someone’s analysis and have a second person judge it. Hmm. Wait a second. How do we judge whether the second person is a good judge? Add a third person to judge the second person. How do you now judge whether the third person is a good — Uh oh. You see the problem.
The problem is that there’s no ultimate, ideal judge at the end of the line. Nobody’s perfect. (But that’s ok, because we know that when people do the job, it delivers great value despite those imperfections. See that second paragraph yet again.) As it turns out, there’s a different solution: let your three people take turns judging each other. Here’s how it works. Treat Person 1’s analysis as “truth”, and see how Persons 2 and 3 do. Then treat Person 2’s analysis as truth, and see how Persons 1 and 3 do. Then treat Person 3’s analysis as truth, and see how Persons 1 and 2 do. It turns out that if we take turns allowing each person to define the “true” analysis for the others, and then average out the results, we’ll get a statistically reliable number for human performance — without ever having to pick any one of them as the person who holds the ultimate “truth”. This will give us a number that we can call the average human performance. 2
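For concreteness, here's a minimal Python sketch of that turn-taking calculation. The three annotators and their labels are made-up toy data, not an actual coding exercise:

```python
from itertools import permutations

def average_human_performance(annotations):
    """Take turns treating each annotator's labels as 'truth',
    score everyone else against them, and average the results."""
    scores = []
    for truth, judged in permutations(annotations, 2):
        agreement = sum(t == j for t, j in zip(truth, judged)) / len(truth)
        scores.append(agreement)
    return sum(scores) / len(scores)

# Three hypothetical coders labeling the same six messages
p1 = ["pos", "neg", "neu", "pos", "neg", "pos"]
p2 = ["pos", "neg", "pos", "pos", "neg", "neu"]
p3 = ["pos", "neu", "neu", "pos", "neg", "pos"]

print(average_human_performance([p1, p2, p3]))  # ≈ 0.667
```

Because agreement is symmetric, averaging over all ordered pairs gives the same result as averaging the three pairwise agreements; no single annotator ever has to be anointed the holder of the ultimate "truth".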
If we want to know if our system is good, we’ll compare how it does to average human performance. It’s the same turn-taking idea all over again, this time comparing system to humans rather than comparing humans to humans. That is: Treat Person 1’s analysis as “truth” and see how the system does. Do it again with Person 2 as “truth”. And Person 3. Average those three numbers, and we’ve got raw system performance.
The final step: what we really want to know is, how close is the raw system performance to average human performance? To get this you divide the former by the latter to get percentage of human performance. For example, let’s suppose that the average human performance is 74%. That is, on average, humans agree with each other 74% of the time. (If that number seems low, yes, you guessed it; second paragraph.) Suppose Systems A and B turn in raw system performances of 69% and 59%, respectively. Is one system really better than the other? How can you tell? System A is achieving 69/74 = 93% of human performance. System B achieves 59/74 = 80% of human performance. Out of all this numbers soup comes something that you can translate into understandable terms: System A is within spitting distance of human performance, but System B isn’t even within shouting distance. System A is better. 3
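In code, the final step is a single division. The helper name below is just illustrative, and the numbers are the ones from the example above:

```python
def percent_of_human(raw_system, avg_human):
    """How close raw system performance comes to average human performance."""
    return raw_system / avg_human

avg_human = 0.74   # humans agree with each other 74% of the time
system_a = 0.69    # raw system performance for System A
system_b = 0.59    # ... and for System B

print(round(percent_of_human(system_a, avg_human) * 100))  # 93
print(round(percent_of_human(system_b, avg_human) * 100))  # 80
```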
What we’ve just described is a rigorous and transparent method for evaluating the performance of social analytics methods. When you’re evaluating technologies on your short list, we suggest you use this approach, too.
If you don’t have the resources for such a rigorous comparison, let us know, and we’ll lend you a hand.
1 In a seminal paper about evaluation of language technology, Gale, Church, and Yarowsky established the idea of benchmarking systems against an upper bound defined by “the ability for human judges to agree with one another.” That’s been the standard in the field ever since. (William Gale, Kenneth Ward Church, and David Yarowsky. 1992. Estimating upper and lower bounds on the performance of word-sense disambiguation programs. In Proceedings of the 30th annual meeting on Association for Computational Linguistics (ACL ’92). Association for Computational Linguistics, Stroudsburg, PA, USA, 249-256. DOI=10.3115/981967.981999 http://dx.doi.org/10.3115/981967.981999).
2 This is an instance of a general statistical technique called cross-validation.
3 You’re about to ask how we decide that 93% is “spitting distance” and 80% isn’t, aren’t you? Fair enough. But we never said that the buyer’s judgment wasn’t going to be important. Our point is that you should be asking 93% of what and 80% of what, and the what should be defined in terms of the goal that matters to you. If what you’re after is human-quality analysis, then percentage of human performance is the right measure. Subjectively we’ve found that if a system isn’t comfortably over 90% on this measure, it might be faster and more scalable, but it’s not providing the kind of quality that yields genuine insights for buyers.
I presented last week at the 2012 CASRO Technology Conference. Having come from a ‘traditional’ research background it’s always great to catch up with old colleagues and find out what’s top of mind for them. Additionally, from a presentation perspective, there are so many parallels between survey methodology and social research that it’s relatively straightforward to address some of the methodological issues that surround the latter by borrowing concepts from the former.
The focus of my presentation was how researchers can use social data. I wasn’t coming at this from a business function perspective – i.e. discussing how to use social for product development, or competitive insights, etc. – but rather from the perspective of thinking about some of the questions researchers are now having to address in terms of the enabling technologies used for analysis of social data.
First, researchers need to ensure they're looking at both the text of a social media message and its metadata. The metadata often include information that is crucial to deriving insight, for example the consumer segment of the author, which, when aggregated (and anonymized), is crucial to knowing whether you're analyzing the right conversations. It's just like a survey: you need to know who's answering your questionnaire.
Second, social data need to be sorted before a researcher can work with them. Messages need to be sorted by relevancy, by the topic discussed, by the sentiment expressed, by the emotion shown, and so on; with the exception of relevancy, what you're sorting for depends on the type of research question you're going to use the data to answer. So researchers need to define the pieces of information they're sorting for, and make sure that the data are classified in such a way that this sorting is possible.
Third, there are a number of ways you can do this sorting. Machines are great at doing a lot of tasks in a short space of time, and humans are great at doing tasks to a high degree of quality. If you can combine those approaches, you get the best of both worlds. That's what machine learning does, and we've spent a lot of time here at Converseon developing ways to measure, and optimize, the performance of our machine learning technology, to the point that in many scenarios you cannot tell the difference between a human's work and the machine's. One crucial point that we've embedded in our measurement efforts is that not all mistakes are created equal; in most cases, it's worse to classify a positive message as negative than it is to classify it as neutral. So if you're just looking at a machine's raw 'accuracy' score, you might be getting a distorted picture of how well or badly it's doing.
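One common way to reflect that asymmetry is to score a classifier against a cost matrix instead of counting raw hits and misses. The costs and labels below are hypothetical illustrations, not Converseon's actual weighting:

```python
# Hypothetical cost matrix: a full polarity flip (positive read as
# negative, or vice versa) costs twice as much as a near-miss.
COSTS = {
    ("pos", "neg"): 2.0, ("neg", "pos"): 2.0,
    ("pos", "neu"): 1.0, ("neu", "pos"): 1.0,
    ("neg", "neu"): 1.0, ("neu", "neg"): 1.0,
}

def average_cost(gold, predicted):
    """Average misclassification cost; correct labels cost nothing."""
    total = sum(COSTS.get((g, p), 0.0) for g, p in zip(gold, predicted))
    return total / len(gold)

gold   = ["pos", "pos", "neg", "neu"]
flips  = ["neg", "neg", "neg", "neu"]   # two polarity flips
misses = ["neu", "neu", "neg", "neu"]   # two near-misses

# Same plain accuracy (2 of 4 correct), very different cost:
print(average_cost(gold, flips))   # 1.0
print(average_cost(gold, misses))  # 0.5
```

Both sets of predictions score 50% on plain accuracy, but the cost-weighted view shows the polarity flips doing twice the damage, which is exactly the distortion a raw accuracy count hides.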
You can download the presentation here or email me at email@example.com if you’d like to talk more about accurate social data and how to use it for market research purposes.
The explosion of social media data is having a transformative effect on market intelligence and research.
An Economist article from late last year states the context well: “Big companies now obsessively monitor social media to find out what their customers really think about them…As communication grows ever easier, the important thing is detecting whispers of useful information in a howling hurricane of noise…the new world will be expensive. Companies will have to invest in ever more channels to capture the same number of ears. For listeners, it will be baffling. Everyone will need better filters—editors, analysts, middle managers and so on—to help them extract meaning from the blizzard of buzz.”
Being able to extract this meaning is a challenge – it’s not easy to do – but it represents a significant opportunity for market researchers to gain competitive advantage. In a series of posts, we’ll be addressing some of the questions that a researcher has to answer before they can drive that advantage for their employer. These questions include:
1) How do I make sure my research is based on relevant data?
2) Which social data are most useful to a researcher?
3) Is ‘automated’ analysis – for example, sentiment analysis software – usable by market research professionals?
Before we address the first question, let's take a moment to consider the context. The fundamental point that anyone trying to make sense of how social media fits into a researcher's toolkit has to understand is that social media 'conversation' essentially has two different types of use. First, it can be used for 'monitoring' purposes (e.g., crisis response or customer service); second, it can be used for 'insights' purposes (e.g., analyzing online conversations to inform product development, or to measure brand perception). These purposes have different data requirements, but the way social media monitoring tools are used today often obscures this distinction. Buyers end up looking for a silver bullet to hit both targets. The difficulty with that approach is that when you're using a monitoring tool for customer service, for example, you need to see every message that might be relevant; you have to err on the side of not missing any content, so you set your keywords and searches up with that in mind. On the other hand, someone trying to analyze social media conversation to understand whether their company's key brand values are resonating online needs to make sure they're only analyzing relevant content; irrelevant content here only serves to muddy the analytical waters.
The tension between these two types of purpose (an analog, in fact, of the trade-off between recall and precision in text analytics) should be clearly understood by any researcher looking to use social media for market research purposes.
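A small sketch makes the trade-off concrete. The message IDs below are hypothetical; the point is that a broad, monitoring-style query trades precision for recall:

```python
def precision_recall(retrieved, relevant):
    """Precision: share of what you pulled that was relevant.
    Recall: share of what was relevant that you pulled."""
    hits = retrieved & relevant
    return len(hits) / len(retrieved), len(hits) / len(relevant)

# A broad 'monitoring' query pulls eight messages; five messages
# in the stream are actually relevant to the research question.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8}
relevant = {2, 3, 5, 9, 10}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.375 0.6
```

Tightening the query would raise precision at the cost of recall, and vice versa, which is why a single keyword setup rarely serves both the monitoring and the insights use case well.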
So how do we identify which data are relevant and represent the opinions we want to analyze? Last year, my colleague Chris Boudreaux co-authored a research paper looking at the correlation between online sentiment and an offline brand-tracking study. The research showed that there is a correlation between the two measures, but only after controlling for a range of factors. One of the controls identified as key to any correlation was making sure that the person commenting online had experience with the brand in question.
This makes total sense: make sure you're listening to the right people. Analyzing social media data without controlling for whose comments you're looking at would be like sending out an online survey to everyone in your sample database; you just wouldn't do it. Do you want to know what your customers think about your latest product? If so, don't listen to your own employees, and don't listen to your competitors. The opinions of both of these groups have their place, but not in answering that specific question. So how do you configure your social media research with that in mind?
First, at the author level, you can choose to only include messages in your analysis that are posted by the people whose opinions you’re interested in. The way you define groups of people here may in fact map to your existing customer segmentation taxonomy.
Second, you could choose to ‘listen’ only in those venues where the audience whose opinion you’re interested in is likely to be engaging.
Third, you can make sure that you’re only including in your analysis messages where your product is being talked about in a relevant context.
Using these three approaches will help you make sure you’re analyzing data from the relevant people, discussing the relevant issues – giving you a solid foundation from which to start your analysis.
For information on how Converseon can help you get to the right data, contact firstname.lastname@example.org.
Language is the holy grail of artificial intelligence. When we imagine sharing a world with smart machines, we don't think about logic, or problem solving, or winning at chess. We hear HAL-9000 declining to open the pod bay doors, and the Terminator saying he'll be baaack. Researchers have been working on building computers we can talk to for 60 years; in the 1990s, Bill Gates predicted that speech would soon be "a primary way of interacting with the machine". So why aren't we talking to our computers yet… or are we? Thanks to new developments in human language technology (also known as "natural language processing") and text analytics, computers are analyzing everything from e-mail and tweets to clinical records and speed-date conversations. How does the technology work, when does it work well (and when not), what's it doing for us, and where is it headed?
Our senior data scientist, Jason Baldridge, will be presenting.
Here are three ways to create differentiated and compelling content that you won't find mentioned in blogs about content marketing:
1. Web Apps
Build a unique and compelling web application to drive ongoing, evergreen traffic to your property. A common example is a dealer locator on the website of a tire manufacturer, but that is somewhat obvious. If you really want to differentiate your brand, create an application that no competing brand offers, like, oh… I don't know… maybe an online database of social media policies — a simple example, but that page has generated thousands of visits per day for three years.
What kind of web app could you create to give your customers something of value, establish a relationship based on trust, and keep them coming back? Bonus points if you build it atop an asset that your competitors do not possess.
Do you know the attributes of your content that generate the greatest engagement or sharing? Most brands don’t. Most brands outside of the media industry don’t think about it at the level required to optimize content development at large scale.
3. Intent Research
When your marketers, or SMEs, or agency staff are writing content for the brand, they should have ready access to the latest search trends and conversation insights to understand the language that online audiences are using at that time.
Most use cases do not require up-to-the-moment information, but many campaigns would benefit from daily updates, if not weekly or monthly ones. When was the last time you wrote a press release whose keywords were informed by SEO and SEM goals, and the latest search volumes on those keywords? These tools and approaches are growing more widely understood and blogged about, but almost no large brand is executing them with consistency.
Chris Boudreaux leads Converseon’s Strategy and Measurement practice, which designs and delivers social and digital measurement and content optimization services to global brands. Follow Chris on Twitter, or email Chris to continue the conversation.
2011 was a whirlwind here at Converseon. After more than doubling in size in 2010, our mission in 2011 was to focus on stabilizing and evolving new "socially intelligent" solutions — products and services — that will come to market in 2012. In fact, we nearly doubled our technology spend in 2011, purposefully, to build the robust infrastructure and technologies needed to help brands leverage social media to meet business objectives. Some of these are now in beta and others will be coming soon. On the services side, we doubled down on our talent and solutions, and expanded our offerings, especially in the areas of creative and social CRM consulting. In short, it was a time of great metamorphosis as we again challenged ourselves to evolve ahead of the marketplace and meet the needs of the market as we move into 2012.
In fact, while we celebrated our ten-year anniversary — and were cited by Shel Israel as the industry's first pure-play social media agency — we believe 2011 represented some of our most significant internal evolution. We made these changes because we see 2012 as the year of "social rigor" and have evolved our technologies and solutions to uniquely meet these market demands.
What is "social rigor"? In our experience, 2007-2011 was a time of significant experimentation with social at brands. The approach was often to seed the garden, see what took root, let it grow, pilot, evolve, and do it all again. The result for some is messy gardens and, in many cases, far too unclear an impact on business outcomes. This isn't surprising, as it mirrors very much the earlier days of digital. But those days are coming to an end, quickly.
Santa’s bag is going to be full of iPhones and iPads this year, judging from the products people mention in tweets with the #DearSanta hashtag. Converseon pulled all of the tweets between 12/5 and 12/12 for a total of 10,680 messages. In these tweets, 25% mention a specific product by name in a positive manner for a total of 2,670 free consumer endorsements. Electronics was the most frequently mentioned product category, and Apple was the most frequently mentioned brand.
- Microsoft Xbox was the most frequently mentioned video game system, and Microsoft the second most frequently mentioned brand.
- Samsung’s Galaxy tab also had a strong showing, appearing in around 4% of #DearSanta tweets — this is the same number of tweets that mention the iPad.
- 40% of #DearSanta product tweets mentioned electronics products.
- Around 20% of #DearSanta product tweets mentioned apparel products.
- Shoes were the most frequently mentioned product type in the apparel category.
n = 800, confidence level of 95% and a confidence interval of +/- 4%
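For reference, here's the textbook margin-of-error calculation for a simple random sample at 95% confidence, using the conservative p = 0.5 (a sketch only; any rounding or design adjustments behind the figure above are not modeled):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(800) * 100, 1))  # ≈ 3.5 percentage points
```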
Customers are normally the best sources of product and marketing ideas*, and social media are perfect for harvesting those ideas, every day. In fact, one Converseon client recently used our social media research to dramatically improve campaign performance, simply by understanding how customers wished they could use the product.
Simply stated, they recognized a trend in real — and imagined — product usage, then tailored their marketing messages to fit customer beliefs and needs, in real time.
For example, see the following chart, which summarizes disguised conversation data pertaining to a technology server product shortly after its release to market:
At product launch, the server’s manufacturer promoted the server’s superior performance in business applications, such as retail analytics and cloud computing. During the campaign, our research found that customers were attracted to the server’s role in cloud computing, but expressed less interest in retail analytical applications. Instead, many conversations focused on the server’s reliability in data protection.
Recognizing the conversational trend, the brand adapted its marketing messages with significant improvement in the performance of the campaign.
In addition to improving campaign performance, social media research has helped brands understand how product applications evolve over time, as input into product development.
Learn how to use social media research to improve campaign performance or generate new insights for product development.
* “Re-invention”, as coined by social scientist Everett M. Rogers in his book, Diffusion of Innovations.