Structured and unstructured data

The Guardian recently used a useful analogy to explain the difference between content and metadata: content is the letter and metadata is the envelope.

So, for example, in an email the “to”, “from” and “cc” fields are metadata, but the subject line is content.

Essentially, the metadata is structured and the content is unstructured.
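The email example can be sketched in a few lines of Python using the standard library's email parser. This is only an illustration: the message, the addresses and the `split_email` helper are invented here, and the split simply follows the analogy above (address fields as envelope, subject line and body as letter).

```python
from email import message_from_string

# Hypothetical message, invented for illustration.
RAW = """\
From: alice@example.com
To: bob@example.com
Cc: carol@example.com
Subject: Lunch on Friday?

Are you free at noon? The new place on the corner looks good.
"""


def split_email(raw):
    """Split a raw email into envelope-style metadata and letter-style content."""
    msg = message_from_string(raw)
    # The envelope: fields with a fixed, predictable shape.
    metadata = {field: msg[field] for field in ("From", "To", "Cc")}
    # The letter: free text, including the subject line.
    content = {"subject": msg["Subject"], "body": msg.get_payload().strip()}
    return metadata, content


metadata, content = split_email(RAW)
print(metadata)
print(content)
```

The metadata can be counted, sorted and filtered straight away. The content has to be read and interpreted before it tells you anything.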

In social media research, the distinction between the two is not always made clear.

While you can glean information from the structured data, analysing the unstructured data is the only way to uncover insights. It’s also the most difficult part to do (and do well).

Senior European consumer insight manager at Avery Dennison, Edward Appleton, defines insight as:

1. Invariably below the surface. It isn’t immediately visible or apparent.

2. Not already common knowledge or part of prevailing wisdom.

3. Leading to a new opportunity or growth potential that can be effectively exploited.

The structured data can provide the what, where and when, but not the how or the why.

Unfortunately, attempts to standardise the measurement of social media often focus on the structured data: quantitative metrics like retweets, pins and likes, the kind of data that allows you to do a network analysis of who’s talking to whom, or (attempt to) measure ‘influence’ and ‘engagement’.
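To make the point concrete, here is a rough sketch of the kind of "who's talking to whom" count that structured data supports. The posts, the field names (`author`, `reply_to`, `likes`, `text`) and both helper functions are assumptions made up for this example; they do not correspond to any particular platform's API.

```python
from collections import Counter

# Hypothetical posts; field names are assumptions, not a real platform schema.
posts = [
    {"author": "ana", "reply_to": "ben", "likes": 3, "text": "Not sure I agree."},
    {"author": "ben", "reply_to": "ana", "likes": 1, "text": "Why not?"},
    {"author": "cat", "reply_to": "ben", "likes": 7, "text": "Same question here."},
    {"author": "ana", "reply_to": "ben", "likes": 0, "text": "Long story."},
    {"author": "dan", "reply_to": None, "likes": 2, "text": "Unrelated, but hello."},
]


def reply_edges(posts):
    """Count directed (author, reply_to) pairs, skipping posts that reply to no one."""
    return Counter((p["author"], p["reply_to"]) for p in posts if p["reply_to"])


def replies_received(edges):
    """Total the replies each person received across all edges."""
    totals = Counter()
    for (_, target), count in edges.items():
        totals[target] += count
    return totals


edges = reply_edges(posts)
received = replies_received(edges)
total_likes = sum(p["likes"] for p in posts)

print(edges)
print(received)
print(total_likes)
```

Note that the `text` field is never read. The counts show who replied to whom and how often, but nothing about what was said or why.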

All of this can be interesting and provides valuable context for any subsequent analysis of the content. However, when it comes to establishing a single framework for how to approach the research, treating the structured and unstructured data as the same thing makes about as much sense as having a single framework for running a focus group and a survey.

It also betrays a digital dualism in viewing social media as a single entity, in which a throwaway tweet about a trending hashtag is treated the same as an Instagrammed photo, or an in-depth discussion on a message board.

It further fails to differentiate between the many ways different people use different sites.

Simply put, there will never be one way to interpret a conversation. It will always depend entirely on the context of how the discussion is being categorised and to what purpose.
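A small sketch shows how the same posts yield different data depending on the purpose behind the categorisation. Everything here is hypothetical: the posts, the two codebooks and the `categorise` helper are invented, and keyword matching is a deliberately crude stand-in for the judgement a human analyst would apply.

```python
def categorise(text, codebook):
    """Return the sorted labels whose keywords appear in the text (crude substring match)."""
    lowered = text.lower()
    return sorted(
        label
        for label, keywords in codebook.items()
        if any(keyword in lowered for keyword in keywords)
    )


# Hypothetical posts, invented for illustration.
posts = [
    "The delivery was late again and nobody answered my email.",
    "Love the new colour, but the price is too high for me.",
]

# Two hypothetical codebooks, each reflecting a different research objective.
service_codebook = {"delivery": ["delivery", "late"], "support": ["email", "answered"]}
product_codebook = {"design": ["colour", "design"], "price": ["price", "expensive"]}

for post in posts:
    print(post)
    print("  service view:", categorise(post, service_codebook))
    print("  product view:", categorise(post, product_codebook))
```

Under the service codebook the second post produces no data at all, and under the product codebook the first one disappears. Neither result is wrong; each reflects the question being asked.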

You cannot always make the unstructured data from social media do what you want it to do, which is why examining it in isolation does not always work.

Pursuing a single framework strikes me as being as erroneous as pursuing a single metric to measure influence. And you don’t find too many credible people continuing to advocate the latter.

Rather than imposing rigid standards, I think we should aim to explain clearly and transparently, each time, how we went about collecting, organising and interpreting the data in a way that makes sense for the original objective.

Creating broad guidelines, rather than a standardised framework, will also enable us to respond more quickly to emerging types of social media.

We should also always aim to distinguish between the structured data and the conversations, which are not data as such until we categorise them in some manner.

You can bring order to unstructured data but you cannot impose order on it.