How MajesticSEO Metrics Predict Penguin Vulnerability

Posted on Oct 10, 2013 | 2 comments




Hopefully by now you have heard of Remove’em’s latest innovation, Penguin Analysis. Many folks in the SEO world have been asking which factors in our machine-learned algorithm appear to have the greatest impact on Penguin vulnerability, and how we collect those factors for our Penguin Vulnerability Score. While most of the raw data is available over at Open Penguin Data, we definitely have a few other tricks up our sleeves.

However, today I want to talk specifically about some of the factors that we were able to build using data from MajesticSEO’s Site Explorer and, in particular, their amazing API. We grabbed millions of data points from MajesticSEO and crunched them into little 1s and 0s representing the triggers behind the Penguin algorithm. So, let’s get to it. Which factors from MajesticSEO’s data appear to have the biggest impact on the Penguin Vulnerability Score?

Domain Trust Flow

This one was shocking, to say the least. Both MajesticSEO and Moz give us excellent metrics related to trust, but the single underived metric with the greatest impact among the trust factors was MajesticSEO’s Domain Trust Flow. (To be fair, a derived factor, MozTrust being lower than MozRank, was slightly more predictive, but it is not a single metric in itself.)

MajesticSEO provides a great tutorial on exactly what their Trust Flow and Citation Flow metrics are. For our machine learning project, we needed to compact this value into 1s and 0s rather than a sliding scale. Consequently, we built a metric called “Domain Trust Flow 1+ Standard Deviations Below the Mean”. Simply put, if a URL had a Domain Trust Flow in the bottom 1/3 of all the links we analyzed, it would receive a 1. If it was above that, the URL would receive a 0. Using this methodology, we could highlight domains that performed particularly poorly on trust metrics.
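To make the 1s and 0s idea concrete, here is a minimal sketch of that kind of binarization. It is not our production code: the function name and the sample values are hypothetical, it assumes a population standard deviation, and it follows the metric's name (a cutoff at one standard deviation below the mean) rather than the "bottom 1/3" shorthand.

```python
from statistics import mean, pstdev

def low_trust_flags(trust_flows):
    """Return 1 for values at least one standard deviation below the mean, else 0."""
    mu = mean(trust_flows)
    sigma = pstdev(trust_flows)  # assumption: population standard deviation
    if sigma == 0:
        # Every value is identical, so nothing stands out as unusually low.
        return [0 for _ in trust_flows]
    cutoff = mu - sigma
    return [1 if value <= cutoff else 0 for value in trust_flows]

# Hypothetical Domain Trust Flow values for six linking domains.
sample = [5, 40, 42, 45, 48, 60]
flags = low_trust_flags(sample)
print(flags)
```

With these made-up values only the first domain is flagged, because it is the only one sitting well below the rest of the group.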

Now, it is very important to point out that we are not saying that these metrics cause a Penguin penalty. Rather, they simply help us predict which sites have characteristics similar to those regularly impacted by Penguin. Whatever factors improve one’s Domain Trust Flow are what you need to target, not the metric itself. Google isn’t consuming MajesticSEO’s API to determine which pages to penalize. However, if you have a low Domain Trust Flow, you had better be on the lookout.

No Government Links to Domain

MajesticSEO’s API provides quick and ready access to a number of link metrics, one of which is links from .gov domains. The lack of .gov links was more than twice as predictive of a Penguin penalty as the lack of .edu links. This seems to be a pretty fair assessment, as getting .gov links is significantly harder than getting .edu links. Of course, it is important to recognize that having .gov links is not necessarily an inoculation against Penguin, but having content and a site that earns .gov links likely is. So don’t just go out trying to spam .gov sites for links. Create a site that deserves them, and work on outreach that helps reel them in.

The No Government Links metric does not appear to be influential at the URL level. It is far too sparse a data point (i.e., too few URLs have any .gov links) to be an effective categorization metric.
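A quick way to see what "too sparse" means is to check how often a feature is present at all. The sketch below is only an illustration: the function name and both lists of .gov link counts are hypothetical, not figures from our data set.

```python
def coverage(values):
    """Fraction of rows where the feature is present (non-zero)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v) / len(values)

# Hypothetical .gov link counts, one entry per URL and one per domain.
gov_links_per_url = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
gov_links_per_domain = [0, 3, 0, 1, 5, 0, 2, 0, 1, 0]

url_coverage = coverage(gov_links_per_url)
domain_coverage = coverage(gov_links_per_domain)
print(url_coverage, domain_coverage)
```

When nearly every row has the same value, as in the made-up URL-level list, a 1/0 flag built from it separates almost nothing, which is the problem described above.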

If you are paying attention, you should have noticed that the No Government Links metric likely influences Domain Trust Flow from before. Getting trustworthy links is definitely starting to add up to avoiding the Penguin algorithm!

Anchor Text

Time and time again we return to this clear signal of over-optimization. Optimized anchor text is still the fastest way to rank and the fastest way to get penalized. Simply having a single link with the anchor text set to the exact keyword for which you are trying to rank is one of the strongest influencers.

However, what we find to be most influential is the mix of anchor text metrics across the board: the combination of phrase match anchor text to the domain or page plus exact match anchor text. The higher these metrics are (for example, when the most common anchor is your keyword), the greater your risk of being caught in a Penguin update.
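As a rough illustration of what an anchor text mix looks like in practice, the sketch below computes exact match and phrase match shares from a list of anchors. The function name, the sample anchors, and the definition of phrase match as "the anchor contains the keyword but is not identical to it" are all assumptions made for this example, not the definitions used in our model.

```python
from collections import Counter

def anchor_mix(anchors, keyword):
    """Summarize how heavily a link profile leans on one keyword."""
    keyword = keyword.strip().lower()
    cleaned = [a.strip().lower() for a in anchors]
    total = len(cleaned)
    if total == 0:
        return {"exact_share": 0.0, "phrase_share": 0.0, "top_anchor_is_keyword": False}
    # Exact match: the anchor is the keyword itself.
    exact = sum(1 for a in cleaned if a == keyword)
    # Phrase match (assumed definition): the anchor contains the keyword plus other words.
    phrase = sum(1 for a in cleaned if keyword in a and a != keyword)
    top_anchor, _ = Counter(cleaned).most_common(1)[0]
    return {
        "exact_share": exact / total,
        "phrase_share": phrase / total,
        "top_anchor_is_keyword": top_anchor == keyword,
    }

# Hypothetical anchors pulled from a backlink export.
anchors = ["blue widgets", "Blue Widgets", "buy blue widgets online", "Example.com", "click here"]
mix = anchor_mix(anchors, "blue widgets")
print(mix)
```

In this made-up profile, two of five anchors are exact match, one is phrase match, and the most common anchor is the keyword itself, which is the kind of pattern worth a closer look.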

Conclusions

While there are tons of other factors that go into the Penguin Vulnerability Score, MajesticSEO is definitely one of the larger data sources upon which we relied for building our risk assessment model. If you haven’t had a chance yet, sign up and get your Penguin Vulnerability Score, which is only $0.99! If your score turns out risky, you can use the tool to find which of these factors appear to be impacting your score. After that, head on over to MajesticSEO to dig deeper into your link profile and find the problem areas.

2 Comments

  1. Hmm… not sure about the governmental links that you’re talking about. Penguin doesn’t necessarily just focus on links coming from pages with a low TrustFlow either, so not a great metric to base it around.

    I’ve been working with a lot of websites that have suffered at the hands of Penguin 1, 2 and now 2.1. The main thing that I always look for is anchor text distribution. If a linking URL has a low TrustFlow it doesn’t necessarily mean it is ‘low quality’. It could be a new URL or just not have many links pointing to it – the Penguin algorithm isn’t going to penalise you for that.

    Using Majestic SEO, I find the best way to find patterns is to export all of the links to a .csv and use the COUNTIF function in Excel to find the number of times that any given anchor text has been used to link to a site. Once I have this for all of the anchor text, I place it into a bar chart, sorted from highest to lowest.

    With this, you can start to see the distribution of anchors and see if there’s any over-optimisation that’s occurred. I also do the same for Trust/Citation flow around each anchor just to get an idea of the type of site that’s linking.

    Having said all of this, this is just one factor of Penguin and there’s loads more.

    • Hi Matthew,

      Thanks for your comments. These correlations are just part of the OpenPenguinData.org analysis, which we ultimately built into the machine-learned model for Penguin vulnerability. Anchor text related issues are one of the largest factors, but only about a quarter of sites impacted by Penguin even have exact match anchor text for the keyword for which they lost traffic.

      Russ
