3 Ways to Improve the Quality of Your Website Data

Often, we think of data quality as an issue for our data warehouse or customer relationship management (CRM) system. In truth, data quality is an issue for all of us as users of the World Wide Web. If the internet is to remain a useful resource, the indexes we rely on must be reliable, current and accurate so that information stays easy to find.

Google continually revises its algorithms to ensure that publishers of quality content are rewarded with more prominence in search results. Its actions are a reminder that the web is always evolving, and that Google is effectively an enormous database linking data consumers with good-quality information.

So: what can we do to give Google the quality data it craves?

1. Remove Duplicate Content

In search engine optimisation (SEO) terms, this is one of the core principles of quality content and quality data. To give users a good experience on the web, publishers and writers have a responsibility to produce original content.

In data quality terms, Google is asking us to deduplicate our content. When publishing on the web, we can use matching software to identify duplicates that need to be removed. And if other publishers have duplicated our data, services like Copyscape and CopyGator can match that content using fuzzy search and alert us to its existence.
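
To make this concrete, here is a minimal sketch of how matching software scores pairs of pages for near-duplicate text, using Python's standard-library difflib. The URLs, page text and 0.9 threshold below are illustrative assumptions, not taken from any particular tool.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity ratio between two text passages."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical page texts keyed by URL.
pages = {
    "/widgets": "Our widgets are handmade from recycled steel.",
    "/products/widgets": "Our widgets are hand made from recycled steel!",
    "/about": "We have been making widgets since 1972.",
}

# Flag page pairs whose text is suspiciously similar.
THRESHOLD = 0.9  # arbitrary cut-off for this sketch
urls = list(pages)
for i, first in enumerate(urls):
    for second in urls[i + 1:]:
        score = similarity(pages[first], pages[second])
        if score >= THRESHOLD:
            print(f"Possible duplicate: {first} vs {second} ({score:.2f})")
```

Dedicated tools use more sophisticated fuzzy-matching techniques than this, but the principle is the same: score candidate pairs and flag anything above a threshold for review.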

2. Implement Schema.org Metadata

Google is keen for the data we publish to be rich and detailed, and that means standardising our metadata in a format it can process.

Schema.org is a shared vocabulary, backed by the major search engines, that standardises rich snippets: small chunks of structured data that enhance the search engine’s understanding of a page. Google Authorship is a good example of a rich snippet, as are the star ratings seen in search results.
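
As a minimal sketch of what that markup looks like, the snippet below generates Schema.org star-rating metadata as JSON-LD, a common way of embedding structured data in a page. The product name and rating figures are placeholders, not real data.

```python
import json

# Schema.org AggregateRating markup: the kind of structured data
# that powers star ratings in search results. All values below
# are illustrative placeholders.
snippet = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Widget",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "89",
    },
}

# The output would be embedded in the page's HTML inside a
# <script type="application/ld+json"> element.
print(json.dumps(snippet, indent=2))
```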

3. Use Co-Citation

As Google has tweaked and honed its algorithms, it has identified content themes using keywords. The problem is that keywords are easy to manipulate, and they can be used to trick search engines, which rely mainly on automated methods to determine the theme of a page.

Now, Google is using something new: co-citation. The idea is simple: if you link to quality content, you associate your own content with it and create a semantic link. Even if no specific keywords are used, Google understands that the two articles relate to each other and share a common theme.

Think of co-citation as a way of assuring Google of the quality of your content. There’s no need to turn keywords into links to emphasise their importance; Google can work it out from the context of your link. All of the sites involved in this shared linking benefit from the assurance that they hold the same high standards.
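
Because co-citation is built from the links you choose, it helps to audit exactly what your pages link out to. Here is a minimal sketch, assuming Python's standard-library html.parser and an illustrative HTML fragment, that collects a page's outbound links for review:

```python
from html.parser import HTMLParser

class LinkAuditor(HTMLParser):
    """Collect a page's outbound links so they can be reviewed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("http"):
                self.links.append(href)

# Illustrative fragment; in practice, feed in your page's HTML.
html = '<p>See <a href="https://example.com/research">this study</a>.</p>'
auditor = LinkAuditor()
auditor.feed(html)
print(auditor.links)  # ['https://example.com/research']
```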

Why Web Data Quality Matters

In business, we ensure our data is deduplicated, matched and merged in order to create reliable records and (ideally) a single customer view. This gives everyone in the organisation a consistent and reliable outlook on the business’ activities, and it helps the business serve customers effectively.

On the web, data quality matters too: we all rely on search engines to present us with the best match for our query so we can find what we’re looking for quickly. Next time you produce content for your website, make a concerted effort to ensure the data you publish is of the highest possible quality.

Claire Broadley

Claire Broadley is a technical content writer working with DQGlobal.
