itesservices · 1 year
Text
Make Informed Business Decisions With Customized Data Processing Solutions
We live in a world of data. It can fuel almost everything in your business—from SEO to scalability, but only if you know how to harness its power. This is where data processing services come into the picture. Data processing is the practice of transforming huge volumes of raw data into valuable information. Businesses in every industry, from financial analysis to healthcare, leverage data processing to make sense of their data. No doubt it has become an increasingly important aspect of operations and a valuable tool for modern businesses looking to gain an edge in their industry.
Data Processing Cycle
Raw data is of no use to any business. It needs to be collected, cleansed, processed, analyzed, and presented in a readable format, making it fit for business consumption. By translating raw facts and figures into readable formats such as charts, graphs, and documents, employees can understand and use this data to propel growth—and this is what data processing does.
The process includes a series of steps where input in the form of raw data is converted into actionable insights using machines. The entire procedure is repeated in a cyclic manner and each step is taken in a specific order. The output of the first cycle can be saved and stored as well as fed as the input for the next data processing cycle.
Step 1: Collection
Data collection lays the foundation for the data processing cycle. The quality of raw data collected hugely impacts the output. Therefore, data should be collected from defined and verified sources to ensure that subsequent findings are valid and reliable. This includes user behavior, website cookies, monetary figures, profit/loss statements of a company, and so on.
Step 2: Preparation
As the name suggests, data is prepared in this stage. The raw data is sorted and filtered to eliminate inaccuracies, duplicates, miscalculations, and incomplete or missing entries. It is then transformed into a suitable format for further processing and analysis.
The data cleansing/preparation step removes bad data (redundant, erroneous, or unnecessary data) so that only high-quality information is readily available. This information can then be put to the best possible use for business intelligence. In short, the purpose of this step is to ensure that only the highest quality data enters the processing unit.
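To make the preparation step concrete, here is a minimal, hypothetical sketch in Python using pandas. The file name and column names are assumptions for illustration only, not part of any particular service or toolset:

```python
import pandas as pd

# Hypothetical raw export collected from several sources in Step 1.
raw = pd.read_csv("raw_transactions.csv")

# Remove exact duplicates and rows missing critical fields.
prepared = raw.drop_duplicates().dropna(subset=["customer_id", "amount"]).copy()

# Normalize formats so downstream processing sees consistent values.
prepared["amount"] = pd.to_numeric(prepared["amount"], errors="coerce")
prepared["order_date"] = pd.to_datetime(prepared["order_date"], errors="coerce")

# Drop rows where conversion failed (malformed or miscalculated entries).
prepared = prepared.dropna(subset=["amount", "order_date"])

# Hand the cleansed data to the next step of the cycle.
prepared.to_csv("prepared_transactions.csv", index=False)
```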
Step 3: Input
Raw data is translated into a machine-readable language, which is then fed into the processing unit. The input can be in the form of data entry through a keyboard, RFID tags through a barcode scanner, or any other input device.
Step 4: Processing
Here, raw data is processed using Machine Learning, Deep Learning, and Artificial Intelligence algorithms to generate an output. This step may vary slightly from process to process, depending on the source of the data being processed (online databases, data lakes, connected devices, etc.) as well as the intended use of the results.
Step 5: Output
The processed data is finally translated and displayed in a human-readable format such as pie charts, graphs, tables, audio, video, vector files, or documents. The results of this data processing cycle can be stored for further processing in the next cycle.
Step 6: Storage
As the last step of the data processing cycle, results from the previous steps are stored for future use. It not only facilitates quick transfer, access, and retrieval of information but also allows analysts to use this result as input for the next data processing cycle directly.
Types of Data Processing
Just as every business has unique requirements, there’s no one-size-fits-all approach that can be used to process data. Based on the source of data as well as the steps taken by the processing unit to generate an output, there are different types of data processing. Some of these are listed below:
Batch Processing: As the name suggests, data is gathered and processed in batches. This method is ideal for processing large amounts of data; a payroll system is a typical example.
Real-time Processing: You can use this type to process small amounts of data and get the output within seconds. Withdrawing money from an ATM is an apt example of this.
Online Processing: As soon as the data becomes available, it is automatically fed into the CPU. Online processing is best used for the continuous processing of data. For example, barcode scanning.
Multiprocessing: Also known as parallel processing. Unlike online processing, multiprocessing leverages two or more CPUs within a single computer system to break the data into chunks and process them simultaneously; weather forecasting is a common example. (A minimal code sketch of batch and parallel processing follows this list.)
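The batch and multiprocessing ideas are easier to see in code. Below is a small Python sketch in which a whole batch of records is split across CPU cores; the transformation itself is a stand-in, not a real workload:

```python
from multiprocessing import Pool

def process_record(record):
    # Stand-in transformation: in practice this could be a validation rule,
    # a currency conversion, a model score, and so on.
    return {"id": record["id"], "value": record["value"] * 1.1}

def process_batch(records, workers=4):
    # Batch + parallel processing: the batch is divided across several CPU cores.
    with Pool(processes=workers) as pool:
        return pool.map(process_record, records)

if __name__ == "__main__":
    batch = [{"id": i, "value": float(i)} for i in range(1000)]
    results = process_batch(batch)
    print(f"Processed {len(results)} records in parallel")
```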
Data Processing Applications
Data processing helps businesses to unlock their true potential, streamline workflows, enhance security, make data-based decisions, and maintain their competitive edge. Here are some important reasons why companies should lay special emphasis on high-quality data processing:
Extract Valuable Insights: Data processing enables enterprises to extract useful information and game-changing insights from large, complicated sets of raw data.
Foster Data-driven Decision-Making: Organizations can make decisions based on in-depth insights gained with the help of customized data processing solutions. Hence, it becomes an essential tool for business intelligence.
Enhanced Efficiency: You can leverage data processing insights to streamline operations and remodel existing workflows (if needed) by identifying inefficiencies in the process and optimizing them.
Customer Behavior Analysis: Customer data such as demographics, purchasing behavior, purchase frequency, and items bought can be processed to determine purchasing patterns and preferences. This helps e-tailers develop customized marketing strategies and improve customer satisfaction.
Delivering Better Treatment: Data processing insights can be used to improve diagnosis and treatment plans in the healthcare industry. Patient data such as medical history, symptoms, and test results can be processed to get a comprehensive understanding of their condition. This results in improved patient outcomes and better treatment decisions.
Financial Management: The financial sector leverages data processing results to monitor and manage financial performance, which includes monitoring cash flow, tracking investments, and analyzing financial statements. Processing financial data enables enterprises to make informed decisions about investment, spending, and other financial management strategies.
Wrapping Up
Data processing is the method of gathering and deciphering raw data into valuable information. Experienced data processing companies can help enterprises collect, filter, sort, process, analyze, and present their most valuable digital asset as usable insights. Thus, businesses in both B2B and B2C contexts can leverage these insights to create data-driven strategies and maintain a competitive edge.
john-dennis · 3 years
Text
Transform Your Team Collaboration With Team Chat Software
When building an organization of different employees, you must ensure that your staff communicates effectively. Smooth and swift communication will ensure that information and instructions can be transmitted seamlessly. There are various means and channels through which staff members can communicate effectively, but Team Chat Software stands out from the rest of the pack.
Communication media has undergone numerous changes that improved its scaling capabilities. For example, team chat software provides organizations with a simple means to communicate with administrative staff and team members. Some people prefer to refer to it as a business communication tool or channel, but this application is more than just a communication tool. It's powerful enough to connect multiple departments and reach people in other countries.
Recently, team chat apps and collaboration tools have become an increasingly popular solution for small and large companies. If you operate a small remote business, this communication tool lets you hire a small staff across different locations and quickly integrate them into an efficient communication channel. In addition, this software solution can also foster collaboration between employees and team members. This article will look at how team chat software can spur cooperation among employees.
Let's begin!
How Can Team Chat Software Transform Team Collaboration?
Many organizations have gradually agreed that team chat software can boost their work operations. According to reports gathered by Statista in 2016, 53% of organizations had successfully adopted and implemented team chat software into their work operations. These small and large companies have fallen in love with the increased functionality and scale these apps offer. As communication between employees becomes more crucial than ever, the software built to support it must keep pace. There are now hundreds of business communication tools to choose from, but it also helps to understand why they matter.
So, how can team chat software inspire more collaboration in your work operations? The following points explain how these apps can affect workplace interaction:
It can help to manage conversations
In simple terms, team chat software is designed to allow teams to communicate optimally. Everyone and anyone will be carried along as duties, roles, and responsibilities are handed out by administrative officers.
Modern-day team chat software is customizable and can manage conversations on any level. It is beneficial because it allows for different scales of conversation: it can support interaction between two people or across the entire organization at once. In certain scenarios, it can also host conversations within a single team, or solely among administrative officers.
Depending on the manner of operation in the organization, you can have preset channels for specific purposes or create them immediately if needed. This software will also allow workforce members to be tagged into conversations when their input is required.
It has an effective search filter
A crucial aspect of communication between employees is accountability. This software makes it possible to keep an accurate record of what anyone says. Then, when you need to refer back to a conversation, all you need to do is search for a specific keyword.
This solution is quite impressive because you don't have to remember everything said during meetings. Instead, you can go back and refer to the record. These conversations will be recorded in compliance with workforce regulations. Employees should be encouraged to keep private conversations off these official channels to prevent such content from being discovered unintentionally.
It allows employees to see and hear each other regularly
There's no better way to foster collaboration and teamwork in a workforce than providing a communication channel to interact through text, voice, and video. Some conversations and meeting types are better organized when participants can see and hear each other, even when they are not in the same geographical location. Voice and video communication are now an integral part of most business communication tools. In the long run, these tools boost teamwork and efficiency through clear communication.
It encourages faster decision-making
When members of your workforce can communicate over long distances without having to see one another physically, your decision-making becomes remarkably faster. To make crucial decisions, employees no longer have to travel long distances. Instead, they can log on to the communication tool and let their views be heard. As a result, different teams and departments can meet on short notice and weigh in on important issues. This ability to make prompt decisions can help you respond to unplanned market changes and the occasional crisis.
It allows joint sharing and storage of files
Quick sharing and storage of files can save time spent on work operations. When using team chat software, every member of the organization is given equal access to a database of files. Employees can also share files quickly with whoever needs them. These shared files are stored in the conversation history and are readily available when required; they can be found with the tool's search filter and used for future reference.
Conclusion
Finally, we've come to the end of this article. That's all on how team chat software can boost teamwork and team collaboration. Before the advent of this software solution, organizations had to rely on manual and less effective communication channels. These channels were slow and failed to involve more than a few employees at once. However, modern-day innovative solutions have taken care of all these problems. If you're an organization that cares about staff collaboration, you should consider investing in a business communication tool. You'll be taking a significant step towards boosting your efficiency and productivity. It could be the final piece that you need for an autonomous unit.
onstipe · 4 years
Text
How to Embed Instagram Feed on Your Website for Free
From 1 million to over 1 billion users in 2020. Amazing!
It’s all about Instagram. Instagram doesn’t need any kind of introduction now. Its popularity and huge numbers of daily active users tell everything.
Imagine a day without Instagram. Sounds like a nightmare, right?
Such is the pull of Instagram that we can't spend even a single day without it. We can barely remember the days before it existed.
Now it is basically part of our daily life; for many of us, checking Instagram is the first task of the day.
From "I don't use Instagram" to "I can't live without Instagram," we have all lived through that shift.
Not only do everyday users open it frequently, but brands also rely on it in their day-to-day marketing planning.
Why not? Because Instagram is the most powerful social network for marketers to reach out to their target audience.
Instagram is one such platform that helps to bridge the gap between brands and audiences in a direct way.
Instagram is the only platform that gives brands a great opportunity to increase awareness and generate authentic user-generated content.
But how does it sound if you could connect your website visitors to your Instagram and display your user-generated content to them?
Sounds great, right?
So, embedding Instagram feeds on websites is the best option for you.
Now so many questions are popping up in your mind: How is it possible? What are the benefits? Is it free or at least affordable?
Calm down, calm down.
Yes, it is possible. And Yes free and affordable options also are available.
Wait, what?
Free?
Yes, you heard it right. Absolutely free.
Now you can embed Instagram feeds on your website with free and affordable methods that easily fit a tight budget. But before we explain the methods, you need to understand the benefits of embedding Instagram feeds on a website.
Benefits of embedding Instagram feed on your website
There are so many benefits of embedding Instagram feeds on websites. You can improve your brand image, gain more followers and likes, build more trust, and boost your marketing game at any time.
Not enough? Then there are more–
Drive your website visitors to Instagram profile;
Generate more authentic and real user-generated content;
Improve website UI by adding Instagram feed on the website;
Turn your leads to sales, boost conversions;
Build more trust among users and improve brand reputation;
Extend the reach of your Instagram content;
Generate more conversations and discussions;
Hold visitors for a long time and improve dwell time;
Encourage website visitors to use your brand hashtag;
Increases product visibility and reach;
Add more spark and style to the website;
Display real reviews of real users – Social Proofs;
Increase audience engagement.
Are you feeling it too? Excited to embed an Instagram feed on your website now that you know all these benefits?
So, what are you waiting for?
Let's start embedding Instagram feeds on websites.
In this blog, we will guide you on how to embed an Instagram feed on a website, so you can save time and money and show Instagram photos on your website beautifully.
Best 3 ways to embed Instagram feed on your website
By using these 3 ultimate options, you can easily embed Instagram feeds on the website.
Onstipe– Best Instagram Aggregator Tool
Instagram’s Official Embedding Tool – Individual Posts
Instagram Feed WD – Feed Plugin
Table of Contents
1. Embed Instagram Feed Using Onstipe
What is Onstipe?
What is an Instagram Wall?
How to Embed Instagram Feed on Website?
2. Instagram’s Official Embedding Tool – Individual Posts
3. Instagram Feed Plugins
1. Embed Instagram Feed on Website Using Onstipe for Free
Before starting to embed Instagram feed on the website using Onstipe, you need to know about Onstipe.
A). What is Onstipe?
Onstipe is an Instagram aggregator tool that helps you to collect, curate, and display Instagram feed on Website or Tv/digital screens. Apart from Instagram, you can also use Onstipe to aggregate social media content from multiple social media channels such as Facebook, Twitter, LinkedIn, Vimeo, etc.
It is also known as social media aggregator, hashtag aggregator, feed aggregator, UGC aggregator, and so on.
Onstipe allows you to collect Instagram feeds through handles or hashtags and creates an amazing Instagram wall. Onstipe helps brands to boost their marketing efforts and increase audience engagement. Onstipe can be used for various purposes such as events, websites, hashtag campaigns, conferences, weddings, eCommerce, etc.
This is the best free option for everyone. You can embed Instagram feed on the website for free with Onstipe. Onstipe has a free forever plan for startups and small or medium-sized businesses. You can create a free Instagram wall for your website using Onstipe and embed the Instagram feeds in a beautiful way. It also has premium plans, you can unlock more features of Onstipe in these pro plans.
B). What is an Instagram Wall?
Instagram Wall is a platform that helps you to display Instagram feeds or posts beautifully on websites or on digital screens. You can create an Instagram wall with the help of the Instagram aggregator tool such as Onstipe.
An Instagram aggregator tool empowers you to collect all your live Instagram feeds in one place, from where they can be displayed anywhere on a website or screen.
The Instagram wall is the most powerful way to increase engagement and generate buzz on websites or during events, conferences, exhibitions.
If you are a brand and seeking a solution that can help you to boost your brand’s social reach and build trust among people, then you should employ an Instagram wall for your marketing strategy.
By embedding Instagram feed on website through an Instagram wall, you can display your earned user-generated content in an attractive and effective way to your visitors and audience.
C). How to Embed Instagram Feed on Website
Create a free account on Onstipe
Add a hashtag or handle
Connect your Instagram business account (Connect with Facebook)
Generate your Instagram feed embed code
Copy and paste the Instagram feed embed code on your website.
As simple as it looks, right.
For a more detailed way, read the step-by-step guide below.
Step-by-Step Guide to Embed Instagram Feed on Website
Follow the below steps to embed Instagram feeds on the website-
1. Create a free account with Onstipe (start with a 14-day free trial). If you are an existing user then log in to your Onstipe account from here.
2. Create a Stipe. Enter a suitable name for your Stipe (Instagram Wall)
3. Select Stipe for Website Embed.
4. Click on the Create button.
5. After clicking on the Create button, you can see multiple social channels. Select Instagram Business Icon as a source.
6. Select your preferred connection type >> Hashtag, Handle, or Tagged
Hashtag (#) – To embed Instagram hashtag feed on website. Collect your earned user-generated content or hashtag included posts.
Handle (@) – To embed Instagram profile on website. Collect your profile Instagram posts using your username or handle.
Tagged (@) – To embed Instagram tagged feed on website. Collect posts in which you are tagged.
7. After choosing your connection type, tick or untick the Enable Moderation option as you prefer.
If you tick this option, your Instagram feeds will go into the private section, from where you can make them public manually. If you want all the feeds to be public, leave this option unticked.
Now click on the Create button.
8. Connect your account to an Instagram Business Profile to authorize your feed.
For feed authorization, you need to sign in with your Facebook account.
Click on Continue with Facebook. Your Facebook account must be connected with your Instagram Business Profile and must have a business page.
9. After feed authorization, your Instagram feed will come in the Moderation section.
10. If you wish to filter and design your feeds then you can do it from the Moderation section and Layout design section. If you are okay with your feeds then skip this step.
11. Now click on the Display Option tab. And click Embed on Website.
12. Set Width and Height for your Instagram wall as per your need.
13. Click on Copy Code to copy the generated embed code. You can see a preview of your Instagram wall anytime by clicking on the Display Stipe button.
14. Simply paste the generated embed code on any page of your website. Place this code into the <body> section of your web page.
Now enjoy your live Instagram feed on Website.
2. Instagram’s Official Embedding Tool – Individual Posts
Instagram gives you an easy and free way to embed Instagram feed on website. You can use this feature to embed your Instagram feed on any website building platform such as WordPress, Wix, Weebly, Shopify, Jimdo, Squarespace, and so on.
But there are two major drawbacks you can face if you go with this feature of Instagram.
First, you can only embed a single Instagram post at a time using this feature. You need to add all the posts one by one manually on the website, so the feed does not update and you can't pull in new posts automatically from Instagram.
The second major drawback: you can't customize or design your Instagram feed the way you want.
Follow the below steps to embed Instagram post on website Using Instagram’s Official Embedding Tool-
1. Open your Instagram Business App in a web browser.
2. Pick a post that you want to embed and open it.
3. Select the three dots menu option from the right top corner of the post.
4. Choose the Embed option and copy the code.
5. Simply, paste the copied embed code on any page of your Website.
Now you can see your Instagram post on Website.
3. Instagram Feed Plugins
If you have a WordPress website then plugins can do the job for you. The third-party plugins allow you to embed your Instagram feeds on the website with ease.
There are a lot of plugins available, some are free and some are paid. Choose the best Instagram feed plugin as per your need and budget too.
You can visit the WordPress plugin directory from here and get the best one for you. Make the right choice from Smash Balloon, Instagram Feed, Social Feed Gallery, or many more.
Summing up
Here we have shared everything about how to embed Instagram feeds on a website for free. So, what are you waiting for? Start displaying all your earned user-generated content on your website and show your website visitors your Instagram presence.
Source - How to Embed Instagram Feed on Your Website for Free
ghostmartyr · 7 years
Text
Attack on Titan Episode 30
LIVEBLOG
I have this acquaintance who seems to believe that I’ve been unfairly circumspect regarding my opinion of this (and other) episodes. I am aghast (aghast, I tell you) at this ruthless judgment of how I best enjoy my cartoons.
To defang such a callous accusation, this seemed like the way to go.
(Featuring xtreme whining, manga spoilers like whoa, more whining, and maybe a few spots of joy. Who can say. I haven’t started yet, and I’ve never done a liveblog before. It’s a surprise for everyone.)
So, Attack on Titan Episode 30, “Historia.” Let us begin!
I appreciate that it starts with the opening instead of pretending that the content outside of this week means anything.
Tumblr media
Tag your spoilers though. Sheesh. That’s going to continue to bug me every time I watch an episode from this era.
Yes, we could have given these characters with a surprising amount of lines this season something new and exciting to do in the opening considering that we’re going to exclude them from all the group shots (they aren’t traitorous  enough for traitoring, but boy howdy are they too shady to pal up with their innocent buddies), or, or... we could just go ahead and borrow animation from six episodes in and throw it through some filters.
Complete with dramatic stills. Still. The other one can have dramatic motion. She’s going to be a main character soon, after all.
It still makes me happy that the opening spends time remembering that these two matter outside of everything else that’s going on. Their dramatic anvil of emotional trauma has meaning enough to be dropped in the first minute and thirty seconds of every episode kind enough to skip flashbacks. Most good and excellent.
I like this opening on its own, too. The first one has the epic music that goes with anything, the second has the epic music and really tired anime tropes, but this one manages to grasp that the epic music belongs with suitable animation. I don’t know how it would compare head-to-head, but this one feels like a more complete work.
But enough with the opening.
Bring me the feels that I have graciously waited four years for.
Tumblr media
Yes, good, excellent.
Tumblr media
...
You mock me.
I don’t understand. Is there something wrong with suddenly shifting your story’s entire focus to two girls who have yet to contribute anything relevant to the plot in a season where there are only twelve episodes and the fanbase has not been reared on monthly frustration?
Why would you want to give the filler moments to characters that people already know something about and care for? How very dare.
(I have watched this before, in case that was unclear, and I don’t remember my exact reaction to this episode opening with filler, but I do remember moments of pain as the snowy boot failed to lead to the scene I wanted it to.
You cut the flashbacks to taunt me with filler, WIT.)
However much it floats about the wrong people, the snow is really beautiful. I don’t live anywhere I get to experience snow, but I like the feeling of muted emptiness it brings an atmosphere. Things are allowed to be still and quiet.
As a bunch of young recruits are trying not to freeze to death, but it’s okay. We already know everyone we care about makes it through.
Tumblr media
Hark, the first reference to this episode’s true purpose!
(Why couldn’t Crunchyroll show me kindness and use the K version of her name? It isn’t like it’s going to matter soon.)
Tumblr media
I am against this filler on general principle of not getting exactly what I want at all times, but Mikasa showing awareness of what Krista gets up to is always going to blindside me with feels. Mikasa doesn’t know it, but they’ve both watched their mother die thanks to the world’s malevolence, and they both latch on to the person who comes to shape their new place in life.
Neither Eren or Ymir is especially delicate about it, but when they speak their hearts, Mikasa and Kristoria hear them like they’ve heard nothing else.
Of course, that’s all based on later things, but whenever Mikasa has a scene with Kristoria, there’s this extra weight of subtextual understanding that just sings to me.
It helps that it’s mostly one-sided. Everyone in the 104th knows Mikasa, because how could you not, but Kristoria, outside of being rescued repeatedly and bargaining for certain people’s lives, doesn’t show any special acknowledgment of Mikasa.
Meanwhile, Mikasa notices Krista. She’s not the blonde or tiny one, she’s the one who sticks with Ymir--or, in this case, stays behind with Daz.
In this section of the story, Mikasa really has no idea how alike she and Kristoria are, but I like that even before she knows, she notices. ...Or maybe more accurately, some part of the writing staff notices the similarities, so allows them to be continually linked.
...I really like Historia and Mikasa’s nonexistent irrefutable bond.
Why is the OVA that has more of it not stateside when we were given the crack one.
BUT HEY GUESS WHAT THAT’S NOT WHAT THIS EPISODE’S ABOUT!
Tumblr media
Look, look, it’s what the episode didn’t start with.
Tumblr media Tumblr media
...
...
Oh help.
Excuse me, I think my heart grew three sizes and I need to lie down thanks to unforeseen feels because oh wow, this is somehow the perfect and I don’t know how to deal.
How.
Just how.
I don’t care if it’s a translation flair or not. There’s something--heck, just help.
Not “no.” “Never.”
Kristoria is a melodramatic stubborn moppet and what even.
You’re dragging a dying body through the snow. Be less perfect.
Ymir, of course, continues to talk, going through all the reasons why a dead body is going to be involved in their night--because some titans get their energy from sunlight, and some get it from pointing out as many inconvenient truths as they can in the space of a single conversation--and Kristoria, of course, continues to be perfect.
Tumblr media
I swear, my favorite part of half of the training scenes between these two is that Ymir spends most of her time rightfully criticizing every single thing Kristoria does, and after the initial confusion, Kristoria just refuses to listen.
She puts up a good fight, and can talk with shining eyes about Sasha choosing to be herself regardless of her word choices, and play the heroic role of still believing that there’s a way out while she’s basically in the middle of a suicide attempt, but she is so, so wrong.
This kid is so wrapped up in whatever role her head thinks she’s playing that she listens to her common sense maybe about half as much as any rational person would. Then she uses whatever’s left to try and defend herself to Ymir, because Ymir has the nerve to suggest that she’s thinking about as little as she actually is.
Tumblr media
And good grief I just love this scene.
Because yeah, she’s about ten seconds away from being bashed over the head with how unproductive this all is, but look at that face.
The anime version is going with a lot less dead eyes here, and I should and will maybe find time to complain about that, but what it’s turned so horribly glorious is Kristoria’s overall tone when she starts telling Ymir to get lost. It’s downright mocking.
Tumblr media
Also fake.
So, so so so fake.
Yet somehow, one of the genuine things Kristoria does as Krista. She doesn’t try to convince Ymir to save herself with a warm smile and proper actions; she plays Ymir’s own game and taunts her into wanting to leave Kristoria and Daz behind.
Kristoria’s basically given up at this point. She’s marching in the middle of a blizzard tugging a pre-corpse behind her, and I don’t think she considers her own life to be in better shape than Daz’s. They’re both dead. Game over man, game over.
Ymir’s outside of that picture, though. Ymir’s heart is still beating, and she obviously doesn’t want to stay, so why should she stick around and watch all of this misery?
This is the early version of how Historia always negotiates. Whenever there’s something she wants, she picks her arguments based on what the other person will find convincing, not necessarily her own logic for making a case.
So with Ymir, she chooses to be obnoxiously cocky about her chances.
(help.)
The manga has this byplay so much quieter, and you can see so much more of Historia from the next arc coming through, but Kristoria makes affected arrogance look damn good and why why why.
Tumblr media Tumblr media Tumblr media
WELL NOW THAT’S RUINED, ISN’T IT.
Tumblr media
Tough break, Kristoria. You’re going to have to earn being cool from now on.
The anime does such a good job of this moment.
What always gets me in the manga, and what carries over here, is the look of pure horror on Kristoria’s face when Ymir puts words to her thinking. When it’s said out loud, it sounds horrible. She isn’t trying to save someone’s life. She’s given up on Daz.
I don’t think the jab about giving up on herself hits that hard. Kristoria’s a suicidal mess.
But Daz, he who spends this entire scene basically being treated like a sack of potatoes by both of the people responsible for his eventual survival, is a life Kristoria cares about. I think a lot gets lost when that isn’t taken under consideration.
She doesn’t mind killing herself. But what hits is that her resignation regarding her own life has crept out and threatened someone else.
Kristoria’s been responsible for death before. It terrifies her.
Before Ymir draws it out, I honestly don’t think Kristoria has any idea what she’s doing here. Her own life has never mattered to her. Daz’s fate is pretty much inevitable. She’ll stay with him until the end, and put in the token effort, but they’re both screwed, and deep in her heart, all of the talk of third options and hope is a lie. The only thing she can do is keep Ymir from being taken by the hopelessness as well.
But giving up the way she has means that she’s hurt Daz’s chances of survival beyond what they already were. She never asks for help. She just accepts death and carries on walking straight into its embrace.
And when Ymir says it, like this is all on purpose, Kristoria immediately denies it.
She does not want Daz to die. She thought herself a witness, at worst. Not his executioner.
Like I said earlier, Kristoria just does not think about this. Her fatalist tendencies take the wheel and drive her off a cliff that wasn’t even on the route.
So when she’s made to think about what she’s doing, and when she sees, for the first time, where it’s landed her, she’s horrified. She’s a screwed up mess, but she isn’t intending to get anyone else killed.
There’s no denying that that’s where she’s sitting, though.
Tumblr media
This is so well done. It’s... this is one of my favorite scenes in the series. Most ones involving these two are, but these moments make such strong use of silence. There’s nearly a full page of beat panels after Ymir starts this conversation, and the tension and the swirling snow stand out even better in a medium dependent on motion.
The world stops when Ymir calls Kristoria on her actions. They’re probably all going to die, and in what Kristoria is thinking will be her last moments, the deepest part of her soul is on full display, and she can’t come up with a single way to defend herself.
She’s out of hope, doesn’t have a sense of self-worth to begin with, and Ymir is confronting her with every sordid detail of the life she wants to forget.
...That part’s me skipping ahead, but look, that’s the mood. Just this lost little girl in the snow wondering how the hell she’s fallen so low.
Tumblr media
...While Ymir continues to make it worse.
Because why not. Blizzards are a great time to chat.
(Daz ends up dependent on the two people with some of the strongest saving-people instincts in the series, and he still nearly dies because they only know how to have honest conversations if death is nearby. That is his purpose in this scene. He is the conversation starter.)
Tumblr media
"Hey, you’re about to kill a guy, but btw, I am totes not a thief.”
Who are you trying to impress. I mean, Kristoria, obviously, at all hours of the day, but even at this point she knows you too well to buy that you’re too morally pure to steal things when you’re starving.
Also, there’s that blizzard thing. How are you still trying to act cool.
Tumblr media Tumblr media
Oh Ymir...
That ability to instantly empathize and decide a course of action based on those feelings is a little scary, really. Because she knows the story, this girl she’s never met sends a hook through her heart, and suddenly she’s in the military.
Her gift of perception is what makes her so fun when she’s around other characters, but combined with her smarts and impulsiveness... she’s good at finding just enough rope to hang herself with.
Tumblr media
...Yeah, meanwhile there’s you.
Tumblr media Tumblr media
...
Fine, let’s be real, it’s both of you.
These two are so innocent that it physically pains me.
There is some humor in Ymir resorting to blatant lies to cover up having *~feelings~* in a conversation largely about being true to yourself (Ymir and Historia are both human disasters whose emotional maturity lingers somewhere around toddler level), especially when it’s in response to the person lying about her entire identity posing an honest question, but mainly, oh no.
Like.
No.
Ymir and Kristoria are having this dramatic conversation in the middle of a blizzard while some guy dies at their feet. They are working the tension like it’s going out of style, and they aren’t going to stop anytime soon.
They’re reaching Batman levels of extra angst.
...Holy crap, Historia’s Batman.
No no no, listen, see, she’s got the blue blood, and she’s got the piles of influence, she has the tortured dark loneliness, she watches her parents die in front of her (admittedly, one has help), AND SHE ADOPTS SCORES OF ORPHANS. HISTORIA REISS IS THE ONE TRUE BATMAN FIGHT ME.
But then Kristoria swoops in, mid-suicide attempt, and goes all angelic shiny eyes, because oh my gosh, friend??!!
She is the epitome of a kicked puppy, and it is adorable.
Unbelievably tragic, but. That is a puppy expression. Over friendship.
While Ymir tries to pretend she’s too cool to want any of that.
When she’s just as bad.
She’s not the one dragging someone’s body through the snow out of a warped sense of self-hatred and heroism only to go all doki doki over the possibility of someone wanting her as a friend, oh no.
She just joins the military because she hears a story about some girl and she can relate.
I know the episode isn’t there yet, and since we’ve been graciously spared a flashback start, it might be hard to remember. But for the sake of perspective:
Ymir is standing on top of a collapsing tower surrounded by titans entirely because she’s so desperate for human connection that she ran off looking for some girl whose first name she didn’t even know because she thought they had something in common.
THIS IS THE PERSON WHO HAS THE NERVE TO PLAY TSUNDERE ABOUT WANTING FRIENDS.
TO REVIEW.
THIS IS WHAT COMES OUT OF HER MOUTH
Tumblr media
LITERALLY ONE MINUTE AFTER SHE SAYS THIS
Tumblr media
“HI I’M YMIR AND I WEAR METAPHORICAL REINCARNATION BETTER THAN YOU, SEE HOW PRETTY MY BLACK AND BLUE DRESS IS NEXT TO YOUR SILLY WHITE AND GOLD ONE.”
Tumblr media
This is a very mature conversation between two people who have been through too much and come out incredibly damaged.
It’s also two teenagers yelling at each other in the middle of a blizzard.
Tumblr media
For instance, this is a tragic statement about Kristoria’s emotional trauma.
It also sounds vaguely like Ymir is encouraging murder.
It might not sound funny now, but give it time. Around the arc that ends with Historia killing her father, this becomes utterly hilarious.
Tumblr media
And this... this will always hit hard.
Kristoria’s my favorite character, and that’s been the case since I first saw her. This is the arc that gives substance to that fondness, and this moment in particular is one of the most brutally cool parts of Kristoria.
She isn’t just trying to kill herself. She joins the military. She conducts herself admirably. She’s a good enough soldier to earn a spot in the top ten, even if that should more correctly be the top eleven.
Yeah, she doesn’t care about herself. Her care for others is also debatable.
But she isn’t just stumbling her way towards the quickest end. She keeps her head up and finds a way to die that looks appropriate from every angle, and marches toward it. If she had died here, even though that’s exactly her plan, and staying alive isn’t something she’s trying too hard at, she would have died on her feet, still stubbornly clinging to the heroic ideal she wants to decorate herself with.
Krista might be a fake hero, but Kristoria goes the extra mile even when she’s completely out of heart to give.
That unholy stubbornness is headed the exact wrong direction here, but it is such a cool character trait.
Tumblr media Tumblr media
Ymir and Kristoria’s relationship is really just this long debate over which one of them is better at winning arguments.
Tumblr media
I also appreciate that Ymir’s winning argument, in this case, involves throwing people off cliffs.
Sure, she’s right.
But even without titan powers I can totally see her suggesting throwing someone off a cliff as a valid way to keep them alive if it meant finding a way to prove Kristoria wrong in this scene.
She starts out wanting Kristoria to leave Daz behind. Then it turns into a philosophical showdown, and suddenly, nope, there is a way for all of us to live, guess what Krista, YOU ARE WRONG ABOUT EVERYTHING FOREVER.
(Love yourself.)
Tumblr media
...Whatever the anime does wrong, now and in the future, I don’t think I will ever be able to deny the extreme gratitude I feel towards whoever lovingly detailed Ymir picking up a kicking Kristoria and throwing her down a hill and into a tree.
Best love interests ever.
Tumblr media
You three still aren’t supposed to be here, but I begrudgingly appreciate that even when Eren finds Krista creepy, he’s the kind of righteous dude who will do whatever he can for his crew, and of course Mikasa and Armin won’t ever let him do it alone.
Fine, I like the filler this episode.
Tumblr media
“Hello, we are also here, and have absolutely no ulterior motive to making sure that Krista is still breathing. Look at how helpful and great we are.”
Tumblr media
“We’re just good people who love our friends and need more screentime.”
For a good time, count how many times Krista is mentioned by name compared to Daz and Ymir.
Tumblr media
You know, I feel like the full context of what happens here deserves more words.
Ymir literally jumps off a cliff to win an argument with her girlfriend, leaving said girlfriend smacked against a tree and under a pile of snow in the middle of a blizzard, all with the full expectation that Kristoria is going to be just dandy.
AND SHE’S RIGHT.
Kristoria gets a front row seat to two people she sort of wants alive diving off a cliff, and then gets to wander through the wilderness in the dead of night, blizzard raging, entirely by herself.
Tumblr media
Just like Ymir knew she would.
...
Just because it’s a terrible plan doesn’t mean I can’t find her faith heartwarming, shut up.
Tumblr media
I feel like this screencap accurately captures the Ymir experience in its entirety.
Tumblr media
...I always forget how tiny Historia is.
She is incredibly tiny.
Tumblr media
I don’t have a comment.
I just feel something in my chest.
I think it is pain.
The whimpering noises coming from somewhere support this theory.
This level of physical affection is not in the manga version help it doesn’t even make sense for their personal bubbles to be ignored like this where they’re at right now it’s just done to make a smooth transition cut so how dare you make me feel things.
Stop.
Tumblr media
Look, see, we have a perfectly good thing here where even the idea of living under her real name makes Kristoria gasp fearfully, and that is a slice of tension that I should be able to dig my teeth into and enjoy,
BUT INSTEAD WE’RE HERE, DOING THIS!
Tumblr media
My heart is on the floor yet somehow still doing things to me and I have complaints.
Tumblr media
Oh good, this is better.
...Does Ymir just. enjoy jumping off high places?
This is also some epic music to get the party started.
Tumblr media
LET THE BODIES HIT THE FLOOR
Speaking critically for a moment, as much as I dig the music once we’re back from the Information for Public Disclosure, I’m really disappointed in the blocking for Ymir’s initial attack on the titans.
It lasts about ten seconds, so wow get over it, but they go with more long shots than swift cuts for those ten seconds. Considering her fighting style, it feels like the wrong call. It’s impressive to watch how swiftly she’s moving from titan to titan, but some of the brutal strength of the violence is missing. Chomp, nom, move on. There are a few good shots mixed in, but the flow of the scene feels like it could have been way more intense if they’d kept close to Ymir.
Loving that music, though.
Tumblr media
Pictured: Kristoria nearly falling to her death because she hasn’t moved a single inch since trying to reach out and stop Ymir from jumping off yet another high surface.
So. Cause of death?
Could not stop staring at Ymir.
Okay.
Tumblr media
...I’ve been good. Very good, arguably. If Studio WIT wants to take a few liberties with micro expressions, that’s their call, and they even made one really unfair thing out of it, so I shouldn’t complain too loudly.
...
Yeah, fuck it.
SHE DOES NOT SMILE IN THIS PANEL OF THE MANGA. VERY MUCH THE OPPOSITE, AND THAT WAS WITH SIGNIFICANTLY LESS DAMAGE TO HER LEG.
YOU ALSO FAILED TO DEPICT CONNIE’S PANICKED STILL OF REACHING OUT WITH BOTH ARMS TO TRY AND CATCH HER. IT IS PRECIOUS AND ADORABLE AND YOU ARE DEAD TO ME.
Bertolt’s “wtf” expression is a gem, though.
Tumblr media
This is Kristoria’s most vivid recollection of three years of friendship with Ymir.
Bless these two.
Tumblr media
Only two people on island with knowledge of history past a hundred years ago shocked when the person named Ymir has a link to Titans.
Bertolt really does have magnificent background expressions.
Tumblr media
I. feel personally victimized by this episode.
What always gets me about this section of Utgard is how disturbed Kristoria starts out by... all of this. It’s all scary stuff, everyone up safe on the tower is talking about how suspicious everything is, and Kristoria’s a bit of an anxious mess to begin with when it comes to life.
You can see so easily how someone who’s never had a reason to trust anybody could have trouble trusting the motives of a secret like this, and the environment is just waiting to tighten its hold on all of her insecurities.
But Ymir is still Ymir.
Even before the pieces fully snap together, and Kristoria starts breaking out of her anxious shell, she can’t watch Ymir in danger and not worry. She can’t turn off caring for her friend.
And then we just. just.
Oh help they added a montage.
Tumblr media Tumblr media
This should not be allowed at all what even why are you doing this.
Butting heads and marriage proposals. And awkward drinking experiences.
That’s what Kristoria holds dear to her heart when she thinks of Ymir.
I’m fine. Fine fine fine. Fine.
Tumblr media
Help me I love this episode.
I do not have words. They are not found. This world was not meant to waste moments talking about scenes like this when they’re there to be enjoyed. There is no greater high than Kristoria shouting off encouragement about property destruction and generally showing her deep, abiding love for Ymir by calling her an irredeemable jackass while she nobly tries to save them all at her expense.
Tumblr media
Then WIT goes ahead and brings me back to earth when it decides to cut my favorite smile altogether. While I’m grateful for the return of my ability to make words instead of distressed noises, why. You gave the filler its dear sweet time to do whatever it felt like, and now we’re left without an animated form of the bestest smile ever.
Minus bazillion points.
Oh wait.
Tumblr media
Waaait.
Tumblr media
You. can’t just.
Tumblr media
Ow?
Ahaha oh, but this is entirely the anime’s fault and ow. That... that slow hesitance of her feet before they just start going. Ymir’s being torn to shreds, and there are titans everywhere, but running to her side is such a basic instinct for Kristoria that she just... goes.
The manga captures that sense too, but the boots. That tiny little delay before she bolts.
How are you allowed.
Tumblr media
Oh yeah, and here we have Ymir’s eyes opening. Entirely because Kristoria’s calling out to her. That’s good. That’s okay. Yeah.
If I didn’t have things to complain about like WIT turning Kristoria’s kindly request that a titan wait on eating her into the anime version of thought bubbles (WHICH SHE SHOULD NOT HAVE YET), I don’t know what I’d do.
Tumblr media
Mikasa’s auditions for the role of Kristoria’s personal white knight just make me really happy.
Tumblr media
Smiling Erens would, except.
Well.
Sorry about your life, kid.
Tumblr media
....Yours too, but, uh.
Um.
Tumblr media Tumblr media Tumblr media
...oh wow.
This can’t be how they’re supposed to spend their budget. but. This is so amazingly beautiful. The lighting is so, so soft, and Historia’s voice when she tells Ymir her name is one of the most gentle utterances you will ever hear on this show.
You have this episode full of teenagers yelling and being scared and making poor decisions, and so much pain, and so much violence and passion. Then the morning sun rises, and all that’s left is this tender moment between two people who love each other.
And Ymir, battered and bloody, smiling at the sound of Historia’s name.
Tumblr media Tumblr media
More care than I’d dared to hope for goes into the final scene, and... yeah, wow. Thanks for existing.
So.
That’s it.
Episode over.
On the whole, I like the manga version better thanks to a few tiny details that don’t matter to anyone but me, but this is... extraordinary, and I am so glad that they were willing to take their time and let it flourish into everything it’s meant to be. Damn.
I can’t see myself doing one of these again, but it definitely had its moments (this episode hurts me), and I hope some enjoyment can be had from the transcript. Thanks for following along.
Text
Assessment Task 7
Graphic file format types
1. JPEG - Joint Photographic Experts Group
Possibly the most common file type you run across on the web. You can use JPEGs for projects on the web, in Microsoft Office documents, or for projects that require printing at a high resolution. Paying attention to the resolution and file size with JPEGs is essential in order to produce a nice-looking project.
2. PNG - Portable Network Graphics
Useful for interactive documents such as web pages, but not suitable for print. The reason PNGs are used in most web projects is that you can save your image with more colours on a transparent background. This makes for a much sharper, web-quality image.
3. GIF - Graphics Interchange Format
Most common in their animated form. In their more basic form, GIFs are formed from up to 256 colours in the RGB colour space. Due to the limited number of colours, the file size is drastically reduced. They are a common file type for web projects where an image needs to load very quickly, as opposed to one that has to retain a high level of quality.
4. TIFF - Tagged Image File Format
Despite TIFF images' ability to recover their quality after manipulation, you should avoid using this file type on the web; it can take forever to load. TIFF files are also commonly used when saving photographs for print.
5. PSD - Photoshop Document
PSDs are files that are created and saved in Adobe Photoshop, the most popular graphics editing software ever. This type of file contains "layers" that make modifying the image much easier to handle. This is also the program that generates the raster file types mentioned above. The largest disadvantage to PSDs is that Photoshop works with raster images as opposed to vector images.
6. PDF - Portable Document Format
If a designer saves your vector logo in PDF format, you can view it without any design editing software (as long as you have downloaded the free Acrobat Reader software), and they have the ability to use this file to make further manipulations. This is by far the best universal tool for sharing graphics. 
7. RAW - Raw Image Format
RAW images are valuable because they capture every element of a photo without processing and losing small visual details. Eventually, however, you'll want to package them into a raster or vector file type so they can be transferred and resized for various purposes.
8. AI - Adobe Illustrator Document
Adobe Illustrator is the industry standard for creating artwork from scratch and therefore more than likely the program in which your logo was originally rendered. Illustrator produces vector artwork, the easiest type of file to manipulate.
Lossy/Lossless File Compression
Lossless and lossy compression are terms used to describe the distinction between the two broad approaches to file compression. One (lossless) allows all of the original data to be recovered whenever the file is uncompressed again. The other (lossy) works differently: it eliminates the "unnecessary" bits and pieces of information in the original file to make it smaller when compressed.
While lossy file compression is mostly associated with image files, it can also be used for audio files. Files such as MP3s, MP4s, and AAC files, or images such as JPEGs, are usually compressed in a lossy format. Whenever you compress a file using a lossy format, much of the redundant information in the file is eliminated. Because of that, lossy compression tends to be used whenever people are reducing the size of bitmap images or other large files.
Unlike lossy compression, the lossless format reduces a file's size without any loss of the original quality. With the new compression algorithms available now, losslessly compressed files are preserved better than ever before. Compressing a file with lossless compression rewrites the data in a way that allows the original file to be reconstructed exactly.
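As an illustration, here is a small Python sketch using the Pillow library that saves the same image both ways and compares the resulting file sizes. The source file name is a placeholder, and the exact size savings will vary from image to image:

```python
import os
from PIL import Image

img = Image.open("sample_photo.png")  # placeholder source image

# Lossy: JPEG discards detail judged visually redundant; lower quality = smaller file.
img.convert("RGB").save("photo_lossy.jpg", format="JPEG", quality=60)

# Lossless: PNG compresses without discarding anything, so the original
# pixel data can be recovered exactly when the file is reopened.
img.save("photo_lossless.png", format="PNG", optimize=True)

for path in ("photo_lossy.jpg", "photo_lossless.png"):
    print(path, os.path.getsize(path), "bytes")
```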
Metadata and Meta Files
Metadata is data that describes other data. A metafile is a file format that can store multiple types of data, such as graphics; these graphics files can contain raster, vector, and type data.
Meta is a prefix that in most information technology usages means "an underlying definition or description.” 
Metadata summarises basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created, date modified, and file size are examples of very basic document metadata. Having the ability to filter through that metadata makes it much easier for someone to locate a specific document. Metadata can be created manually, or by automated information processing. Manual creation tends to be more accurate, allowing the user to input any information they feel is relevant or needed to help describe the file.
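For example, a short Python sketch (assuming the Pillow library and a placeholder file path) can read both filesystem-level metadata and any embedded EXIF metadata:

```python
import os
from datetime import datetime
from PIL import Image
from PIL.ExifTags import TAGS

path = "photo_example.jpg"  # placeholder path

# Filesystem-level metadata: size and modification time.
stats = os.stat(path)
print("Size (bytes):", stats.st_size)
print("Modified:", datetime.fromtimestamp(stats.st_mtime))

# Embedded metadata (EXIF), where the format supports it (common in JPEG/TIFF).
exif = Image.open(path).getexif()
for tag_id, value in exif.items():
    print(TAGS.get(tag_id, tag_id), "=", value)
```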
File Naming Conventions
Naming records consistently, logically and in a predictable way will distinguish similar records from one another at a glance, and by doing so will facilitate the storage and retrieval of records, which will enable users to browse file names more effectively and efficiently. 
Following a few simple rules like those below will greatly improve your file management; a small helper script after the list shows one way to apply them.
Keep file names short, but meaningful
Avoid unnecessary repetition and redundancy in file names 
Use capital letters to delimit words, not spaces or underscores
If using a date in the name of the file, always state the date backwards such as YYYYMMDD
Avoid using common words at the start of the file names, unless doing so will make it easier to retrieve the record 
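As a small illustration of these rules, the hypothetical helper below builds a name with a YYYYMMDD date and capital letters delimiting the words:

```python
from datetime import date

def make_filename(project, description, version, ext):
    # YYYYMMDD date first, then capitalised words with no spaces or underscores.
    stamp = date.today().strftime("%Y%m%d")
    words = "".join(word.capitalize() for word in description.split())
    return f"{stamp}{project.capitalize()}{words}V{version}.{ext}"

print(make_filename("animation", "character turnaround sheet", 2, "psd"))
# e.g. 20240101AnimationCharacterTurnaroundSheetV2.psd
```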
Importance of File Management
Having files spread around your hard drive in different directories will add time to your day as you have to search for them. Choose or create one folder as a central place to store files, preferably backed up daily. Storing files all over your hard drive also has the added problem of slowing down your computer, and if there is a technical malfunction it may mean that you lose essential files and information. Creating a hierarchy rather than putting everything in one single folder makes it much easier to find the exact information you are looking for. It is easiest if it follows a logical progression.
e.g. Animation ‘Year 1 > Block 3 > Technologies > Assessment Task 3’
digital-strategy · 8 years
Link
http://ift.tt/2nIeskH
Posted by Everett
This guide provides instructions on how to do a content audit using examples and screenshots from Screaming Frog, URL Profiler, Google Analytics (GA), and Excel, as those seem to be the most widely used and versatile tools for performing content audits.
{Expand for more background}
It's been almost three years since the original “How to do a Content Audit – Step-by-Step” tutorial was published here on Moz, and it’s due for a refresh. This version includes updates covering JavaScript rendering, crawling dynamic mobile sites, and more.
It also provides less detail than the first in terms of prescribing every step in the process. This is because our internal processes change often, as do the tools. I’ve also seen many other processes out there that I would consider good approaches. Rather than forcing a specific process and publishing something that may be obsolete in six months, this tutorial aims to allow for a variety of processes and tools by focusing more on the basic concepts and less on the specifics of each step.
We have a DeepCrawl account at Inflow, and a specific process for that tool, as well as several others. Tapping directly into various APIs may be preferable to using a middleware product like URL Profiler if one has development resources. There are also custom in-house tools out there, some of which incorporate historic log file data and can efficiently crawl websites like the New York Times and eBay. Whether you use GA or Adobe Sitecatalyst, Excel, or a SQL database, the underlying process of conducting a content audit shouldn’t change much.
TABLE OF CONTENTS
What is an SEO content audit?
What is the purpose of a content audit?
How & why “pruning” works
How to do a content audit
The inventory & audit phase
Step 1: Crawl all indexable URLs
Crawling roadblocks & new technologies
Crawling very large websites
Crawling dynamic mobile sites
Crawling and rendering JavaScript
Step 2: Gather additional metrics
Things you don’t need when analyzing the data
The analysis & recommendations phase
Step 3: Put it all into a dashboard
Step 4: Work the content audit dashboard
The reporting phase
Step 5: Writing up the report
Content audit resources & further reading
What is a content audit?
A content audit for the purpose of SEO includes a full inventory of all indexable content on a domain, which is then analyzed using performance metrics from a variety of sources to determine which content to keep as-is, which to improve, and which to remove or consolidate.
What is the purpose of a content audit?
A content audit can have many purposes and desired outcomes. In terms of SEO, they are often used to determine the following:
How to escape a content-related search engine ranking filter or penalty
Content that requires copywriting/editing for improved quality
Content that needs to be updated and made more current
Content that should be consolidated due to overlapping topics
Content that should be removed from the site
The best way to prioritize the editing or removal of content
Content gap opportunities
Which content is ranking for which keywords
Which content should be ranking for which keywords
The strongest pages on a domain and how to leverage them
Undiscovered content marketing opportunities
Due diligence when buying/selling websites or onboarding new clients
While each of these desired outcomes and insights is a valuable result of a content audit, I would define the overall “purpose” of one as:
The purpose of a content audit for SEO is to improve the perceived trust and quality of a domain, while optimizing crawl budget and the flow of PageRank (PR) and other ranking signals throughout the site.
Often, but not always, a big part of achieving these goals involves the removal of low-quality content from search engine indexes. I’ve been told people hate this word, but I prefer the “pruning” analogy to describe the concept.
How & why “pruning” works
{Expand for more on pruning}
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove. Optimizing crawl budget and the flow of PR is self-explanatory to most SEOs. But how does a content audit improve the perceived trust and quality of a domain? By removing low-quality content from the index (pruning) and improving some of the content remaining in the index, the likelihood that someone arrives on your site through organic search and has a poor user experience (indicated to Google in a variety of ways) is lowered. Thus, the quality of the domain improves. I’ve explained the concept here and here.
Others have since shared some likely theories of their own, including a larger focus on the redistribution of PR.
Case study after case study has shown the concept of “pruning” (removing low-quality content from search engine indexes) to be effective, especially on very large websites with hundreds of thousands (or even millions) of indexable URLs. So why do content audits work? Lots of reasons. But really...
Does it matter?
¯\_(ツ)_/¯
How to do a content audit
Just like anything in SEO, from technical and on-page changes to site migrations, things can go horribly wrong when content audits aren’t conducted properly. The most common example would be removing URLs that have external links because link metrics weren’t analyzed as part of the audit. Another common mistake is confusing removal from search engine indexes with removal from the website.
Content audits start with taking an inventory of all content available for indexation by search engines. This content is then analyzed against a variety of metrics and given one of three “Action” determinations. The “Details” of each Action are then expanded upon.
The variety of combinations of options between the “Action” of WHAT to do and the “Details” of HOW (and sometimes why) to do it are as varied as the strategies, sites, and tactics themselves. Below are a few hypothetical examples:
You now have a basic overview of how to perform a content audit. More specific instructions can be found below.
The process can be roughly split into three distinct phases:
Inventory & audit
Analysis & recommendations
Summary & reporting
The inventory & audit phase
Taking an inventory of all content, and related metrics, begins with crawling the site.
One difference between crawling for content audits and technical audits:
Technical SEO audit crawls are concerned with all crawlable content (among other things).
Content audit crawls for the purpose of SEO are concerned with all indexable content.
{Expand for more on crawlable vs. indexable content}
The URL in the image below should be considered non-indexable. Even if it isn’t blocked in the robots.txt file, with a robots meta tag, or with an X-robots header response, and even if it is frequently crawled by Google and shows up as a URL in Google Analytics and Search Console, the rel="canonical" tag shown below essentially acts like a 301 redirect, telling Google not to display the non-canonical URL in search results and to apply all ranking calculations to the canonical version. In other words, not to “index” it.
I'm not sure “index” is the best word, though. To “display” or “return” in the SERPs is a better way of describing it, as Google surely records canonicalized URL variants somewhere, and advanced site: queries seem to show them in a way that is consistent with the "supplemental index" of yesteryear. But that's another post, more suitably written by a brighter mind like Bill Slawski.
A URL with a query string that canonicalizes to a version without the query string can be considered “not indexable.”
A content audit can safely ignore these types of situations, which could mean drastically reducing the amount of time and memory taken up by a crawl.
Technical SEO audits, on the other hand, should be concerned with every URL a crawler can find. Non-indexable URLs can reveal a lot of technical issues, from spider traps (e.g. never-ending empty pagination, infinite loops via redirect or canonical tag) to crawl budget optimization (e.g. How many facets/filters deep to allow crawling? 5? 6? 7?) and more.
It is for this reason that trying to combine a technical SEO audit with a content audit often turns into a giant mess, even though it seems like an efficient idea in theory. When dealing with a lot of data, I find it easier to focus on one or the other: all crawlable URLs, or all indexable URLs.
Orphaned pages (i.e., with no internal links / navigation path) sometimes don’t turn up in technical SEO audits if the crawler had no way to find them. Content audits should discover any indexable content, whether it is linked to internally or not. Side note: A good tech audit would do this, too.
Identifying URLs that should be indexed but are not is something that typically happens during technical SEO audits.
However, if you're having trouble getting deep pages indexed when they should be, content audits may help determine how to optimize crawl budget and herd bots more efficiently into those important, deep pages. Also, many times Google chooses not to display/index a URL in the SERPs due to poor content quality (i.e., thin or duplicate).
All of this is changing rapidly, though. URLs as the unique identifier in Google’s index are probably going away. Yes, we’ll still have URLs, but not everything requires them. So far, the words “content” and “URL” have been mostly interchangeable. But some URLs contain an entire application’s worth of content. How to do a content audit in that world is something we’ll have to figure out soon, but only after Google figures out how to organize the web’s information in that same world. From the looks of things, we still have a year or two.
Until then, the process below should handle most situations.
Step 1: Crawl all indexable URLs
A good place to start on most websites is a full Screaming Frog crawl. However, some indexable content might be missed this way. It is not recommended that you rely on a crawler as the source for all indexable URLs.
In addition to the crawler, collect URLs from Google Analytics, Google Webmaster Tools, XML Sitemaps, and, if possible, from an internal database, such as an export of all product and category URLs on an eCommerce website. These can then be crawled in “list mode” separately, then added to your main list of URLs and deduplicated to produce a more comprehensive list of indexable URLs.
Some URLs found via GA, XML sitemaps, and other non-crawl sources may not actually be “indexable.” These should be excluded. One strategy that works here is to combine and deduplicate all of the URL “lists,” and then perform a crawl in list mode. Once crawled, remove all URLs with robots meta or X-Robots noindex tags, as well as any URL returning error codes and those that are blocked by the robots.txt file, etc. At this point, you can safely add these URLs to the file containing indexable URLs from the crawl. Once again, deduplicate the list.
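The combine-and-deduplicate step is easy to script. Here is a minimal Python sketch, assuming each source has already been exported to a plain text file containing one URL per line (the file names are hypothetical).

sources = ["crawl_urls.txt", "ga_landing_pages.txt", "gsc_urls.txt", "sitemap_urls.txt"]

urls = set()
for path in sources:
    with open(path) as f:
        urls.update(line.strip() for line in f if line.strip())

# One deduplicated seed list, ready to re-crawl in list mode.
with open("combined_url_list.txt", "w") as out:
    out.write("\n".join(sorted(urls)))

print(len(urls), "unique URLs collected")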
Crawling roadblocks & new technologies
Crawling very large websites
First and foremost, you do not need to crawl every URL on the site. Be concerned with indexable content. This is not a technical SEO audit.
{Expand for more about crawling very large websites}
Avoid crawling unnecessary URLs
Some of the things you can avoid crawling and adding to the content audit in many cases include:
Noindexed or robots.txt-blocked URLs
4XX and 5XX errors
Redirecting URLs and those that canonicalize to a different URL
Images, CSS, JavaScript, and SWF files
Segment the site into crawlable chunks
You can often get Screaming Frog to completely crawl a single directory at a time if the site is too large to crawl all at once.
Filter out URL patterns you plan to remove from the index
Let’s say you’re auditing a domain on WordPress and you notice early in the crawl that /tag/ pages are indexable. A quick site:domain.com inurl:tag search on Google tells you there are about 10 million of them. A quick look at Google Analytics confirms that URLs in the /tag/ directory are not responsible for very much revenue from organic search. It would be safe to say that the “Action” on these URLs should be “Remove” and the “Details” should read something like this: Remove /tag/ URLs from the index with a robots noindex,follow meta tag. More advice on this strategy can be found here.
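If a pattern like /tag/ has already been written off, you can also drop those URLs from your seed list before crawling (or use your crawler’s exclude feature to the same effect). A small Python sketch along those lines; the patterns and file names are assumptions to adapt to your own site.

import re

exclude = re.compile(r"/tag/|\?replytocom=")  # patterns you have already decided to prune

with open("combined_url_list.txt") as f:
    keep = [u.strip() for u in f if u.strip() and not exclude.search(u)]

with open("urls_to_crawl.txt", "w") as out:
    out.write("\n".join(keep))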
Upgrade your machine
Install additional RAM on your computer, which is used by Screaming Frog to hold data during the crawl. This has the added benefit of improving Excel performance, which can also be a major roadblock.
You can also install Screaming Frog on Amazon Web Server (AWS), as described in this post on iPullRank.
Tune up your tools
Screaming Frog provides several ways for SEOs to get more out of the crawler. This includes adjusting the speed, max threads, search depth, query strings, timeouts, retries, and the amount of RAM available to the program. Leave at least 3GB off limits to the spider to avoid catastrophic freezing of the entire machine and loss of data. You can learn more about tuning up Screaming Frog here and here.
Try other tools
I’m convinced that there's a ton of wasted bandwidth on most content audit projects due to strategists releasing a crawler and allowing it to chew through an entire domain, whether the URLs are indexable or not. People run Screaming Frog without saving the crawl intermittently, without adding more RAM availability, without filtering out the nonsense, or using any of the crawl customization features available to them.
That said, sometimes SF just doesn’t get the job done. We also have a process specific to DeepCrawl, and have used Botify, as well as other tools. They each have their pros and cons. I still prefer Screaming Frog for crawling and URL Profiler for fetching metrics in most cases.
Crawling dynamic mobile sites
This refers to a specific type of mobile setup in which there are two code-bases –– one for mobile and one for desktop –– but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting that URL. In such cases, you will essentially be performing two separate content audits. Proceed as usual for the desktop version. Below are instructions for crawling the mobile version.
{Expand for more on crawling dynamic websites}
Crawling a dynamic mobile site for a content audit will require changing the User-Agent of the crawler, as shown here under Screaming Frog’s “Configure ---> HTTP Header” menu:
The important thing to remember when working on mobile dynamic websites is that you're only taking an inventory of indexable URLs on one version of the site or the other. Once the two inventories are taken, you can then compare them to uncover any unintentional issues.
Some examples of what this process can find in a technical SEO audit include situations in which titles, descriptions, canonical tags, robots meta, rel next/prev, and other important elements do not match between the two versions of the page. It's vital that the mobile and desktop version of each page have parity when it comes to these essentials.
It's easy for the mobile version of a historically desktop-first website to end up providing conflicting instructions to search engines because it's not often “automatically changed” when the desktop version changes. A good example here is a website I recently looked at with about 20 million URLs, all of which had the following title tag when loaded by a mobile user (including Google): BRAND NAME - MOBILE SITE. Imagine the consequences of that once a mobile-first algorithm truly rolls out.
Crawling and rendering JavaScript
One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.
{Expand for more on crawling Javascript websites}
Most crawlers have made a lot of progress lately when it comes to crawling and rendering JavaScript content. Now, it’s as easy as changing a few settings, as shown below with Screaming Frog.
When crawling URLs with #! (hashbangs), use the “Old AJAX Crawling Scheme.” Otherwise, select “JavaScript” from the “Rendering” tab when configuring your Screaming Frog SEO Spider to crawl JavaScript websites.
How do you know if you’re dealing with a JavaScript website?
First of all, most websites these days are going to be using some sort of JavaScript technology, though more often than not (so far) these will be rendered by the “client” (i.e., by your browser). An example would be the .js file that controls the behavior of a form or interactive tool.
What we’re discussing here is when the JavaScript is used “server-side” and needs to be executed in order to render the page.
JavaScript libraries and frameworks are used to develop single-page web apps and highly interactive websites. Below are a few different things that should alert you to this challenge:
The URLs contain #! (hashbangs). For example: http://ift.tt/2nQK6ch (AJAX)
Content-rich pages with only a few lines of code (and no iframes) when viewing the source code.
What looks like server-side code in the meta tags instead of the actual content of the tag (for example, unrendered template variables).
You can also use the BuiltWith Technology Profiler or the Library Detector plugins for Chrome, which shows JavaScript libraries being used on a page in the address bar.
Not all websites built primarily with JavaScript require special attention to crawl settings. Some websites use pre-rendering services like Brombone or Prerender.io to serve the crawler a fully rendered version of the page. Others use isomorphic JavaScript to accomplish the same thing.
Step 2: Gather additional metrics
Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.
Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.
Once the Screaming Frog scan is complete (only crawling indexable content) export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).
This is what my URL Profiler settings look like for a typical content audit of a small- or medium-sized site. Also, under “Accounts” I have connected via API keys to Moz and SEMrush.
Once URL Profiler is finished, you should end up with something like this:
Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.
The risk of getting analytics data from a third-party tool
We've noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes incorporating Analytics Canvas to get the full, unsampled data from GA. Then VLookups are used in the spreadsheet to combine the data, with URL being the unique identifier.
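Whether you use VLOOKUP in Excel or a script, the join logic is the same: the URL is the unique key. Below is a small pandas sketch, assuming the crawl/URL Profiler output and an unsampled GA landing-page report have each been exported to CSV; the column names are assumptions and will differ depending on your exports.

import pandas as pd

crawl = pd.read_csv("url_profiler_combined.csv")   # one row per indexable URL
ga = pd.read_csv("ga_landing_pages.csv")           # sessions and revenue per landing page

ga = ga.rename(columns={"Landing Page": "URL"})
dashboard = crawl.merge(ga, on="URL", how="left")  # keep every crawled URL, matched or not
dashboard[["Sessions", "Revenue"]] = dashboard[["Sessions", "Revenue"]].fillna(0)

dashboard.to_csv("content_audit_dashboard.csv", index=False)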
Metrics to pull for each URL:
Indexed or not?
If crawlers are set up properly, all URLs should be “indexable.”
A non-indexed URL is often a sign of an uncrawled or low-quality page.
Content uniqueness
Copyscape, Siteliner, and now URL Profiler can provide this data (a rough similarity check is sketched after this list).
Traffic from organic search
Typically 90 days
Keep a consistent timeframe across all metrics.
Revenue and/or conversions
You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
Publish date
If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
Internal links
Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
External links
These can come from Moz, SEMRush, and a variety of other tools, most of which integrate natively or via APIs with URL Profiler.
Landing pages resulting in low time-on-site
Take this one with a grain of salt. If visitors found what they want because the content was good, that’s not a bad metric. A better proxy for this would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
Landing pages resulting in Low Pages-Per-Visit
Just like with Time-On-Site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
Response code
Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that's the case on your domain.
Canonical tag
Typically only URLs with a self-referencing rel=“canonical” tag should be considered “indexable.” You may not require this metric in the final data if that's the case on your domain.
Page speed and mobile-friendliness
Again, URL Profiler comes through with their Google PageSpeed Insights API integration.
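On the content uniqueness point above: the dedicated tools do this at scale, but a rough internal-duplication check can be approximated by comparing word shingles between two pages. A simplified Python sketch; the two text variables stand in for extracted page copy.

def shingles(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def similarity(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / max(len(sa | sb), 1)  # Jaccard similarity of 5-word shingles

page_a = "Extracted body copy of the first product page goes here..."   # placeholder text
page_b = "Extracted body copy of the second product page goes here..."  # placeholder text
print(round(similarity(page_a, page_b), 2))  # closer to 1.0 means higher duplication risk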
Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).
Things you don’t need when analyzing the data
{Expand for more on removing unnecessary data}
URL Profiler and Screaming Frog tabs: Just keep the “combined data” tab and immediately cut the amount of data in the spreadsheet by about half.
Content Type: Filtering by Content Type (e.g., text/html, image, PDF, CSS, JavaScript) and removing any URL that is of no concern in your content audit is a good way to speed up the process.
Technically speaking, images can be indexable content. However, I prefer to deal with them separately for now.
Filtering unnecessary file types out like I've done in the screenshot above improves focus, but doesn’t improve performance very much. A better option would be to first select the file types you don’t want, apply the filter, delete the rows you don’t want, and then go back to the filter options and “(Select All).”
Once you have only the content types you want, it may now be possible to simply delete the entire Content Type column.
Status Code and Status: You only need one or the other. I prefer to keep the Code, and delete the Status column.
Length and Pixels: You only need one or the other. I prefer to keep the Pixels, and delete the Length column. This applies to all Title and Meta Description columns.
Meta Keywords: Delete the columns. If those cells have content, consider removing that tag from the site.
DNS Safe URL, Path, Domain, Root, and TLD: You should really only be working on a single top-level domain. Content audits for subdomains should probably be done separately. Thus, these columns can be deleted in most cases.
Duplicate Columns: You should have two columns for the URL (the “Address” in column A from URL Profiler, and the “URL” column from Screaming Frog). Similarly, there may also be two columns each for HTTP Status and Status Code. It depends on the settings selected in both tools, but there are sure to be some overlaps, which can be removed to reduce the file size, enhance focus, and speed up the process.
Blank Columns: Keep the filter tool active and go through each column. Those with only blank cells can be deleted. The example below shows that column BK (Robots HTTP Header) can be removed from the spreadsheet.
[You can save a lot of headspace by hiding or removing blank columns.]
Single-Value Columns: If the column contains only one value, it can usually be removed. The screenshot below shows our non-secure site does not have any HTTPS URLs, as expected. I can now remove the column. Also, I guess it’s probably time I get that HTTPS migration project scheduled.
Hopefully by now you've made a significant dent in reducing the overall size of the file and time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.
The analysis & recommendations phase
Here's where the fun really begins. In a large organization, it's tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.
Step 3: Put it all into a dashboard
Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it's ready for the client. If Excel is running slow, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.
Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.
Use Data Validation and a drop-down selector to limit Action options.
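If you build the dashboard programmatically rather than by hand, the same three-option constraint can be added as a dropdown. A sketch using openpyxl, assuming the Action column lives in column B of an existing workbook (the file name and range are assumptions).

from openpyxl import load_workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = load_workbook("content_audit_dashboard.xlsx")
ws = wb.active

# Restrict the Action column to the three allowed values.
dv = DataValidation(type="list", formula1='"Improve,Remove,Leave As-Is"', allow_blank=True)
ws.add_data_validation(dv)
dv.add("B2:B100000")  # assumed location of the Action column

wb.save("content_audit_dashboard.xlsx")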
Step 4: Work the content audit dashboard
All of the data you need should now be right in front of you. This step can’t be turned into a repeatable process for every content audit. From here on the actual step-by-step process becomes much more open to interpretation and your own experience. You may do some of them and not others. You may do them a little differently. That's all fine, as long as you're working toward the goal of determining what to do, if anything, for each piece of content on the website.
A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.
Causes of content-related penalties
These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of issues, which are detailed below.
{Expand to learn more about quality, duplication, and relevancy issues}
Typical low-quality content
Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate...
Completely irrelevant content
OK in small amounts, but often entire blogs are full of it.
A typical example would be a "linkbait" piece circa 2010.
Thin/short content
Glossed over the topic, too few words, or all image-based content.
Curated content with no added value
Comprised almost entirely of bits and pieces of content that exists elsewhere.
Misleading optimization
Titles or keywords targeting queries for which content doesn't answer or deserve to rank.
Generally not providing the information the visitor was expecting to find.
Duplicate content
Internally duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.).
Externally duplicated (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.)
Stub pages (e.g., "No content is here yet, but if you sign in and leave some user-generated-content, then we'll have content here for the next guy." By the way, want our newsletter? Click an AD!)
Indexable internal search results
Too many indexable blog tag or blog category pages
And so on and so forth...
It helps to sort the data in various ways to see what’s going on. Below are a few different things to look for if you’re having trouble getting started.
{Expand to learn more about what to look for}
Sort by duplicate content risk
URL Profiler now has a native duplicate content checker. Other options are Copyscape (for external duplicate content) and Siteliner (for internal duplicate content).
Which of these pages should be rewritten?
Rewrite key/important pages, such as categories, home page, top products
Rewrite pages with good link and social metrics
Rewrite pages with good traffic
After selecting "Improve" in the Action column, elaborate in the Details column:
"Improve these pages by writing unique, useful content to improve the Copyscape risk score."
Which of these pages should be removed/pruned?
Remove guest posts that were published elsewhere
Remove anything the client plagiarized
Remove content that isn't worth rewriting, such as:
No external links, no social shares, and very few or no entrances/visits
After selecting "Remove" from the Action column, elaborate in the Details column:
"Prune from site to remove duplicate content. This URL has no links or shares and very little traffic. We recommend allowing the URL to return 404 or 410 response code. Remove all internal links, including from the sitemap."
Which of these pages should be consolidated into others?
Presumably none, since the content is already externally duplicated.
Which of these pages should be left “As-Is”?
Important pages which have had their content stolen
Sort by entrances or visits (filtering out any that were already finished)
Which of these pages should be marked as "Improve"?
Pages with high visits/entrances but low conversion, time-on-site, pageviews per session, etc.
Key pages that require improvement determined after a manual review of the page.
Which of these pages should be marked as "Consolidate"?
When you have overlapping topics that don't provide much unique value of their own, but could make a great resource when combined.
Mark the page in the set with the best metrics as "Improve" and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
Mark the pages that are to be consolidated into the canonical page as "Consolidate" and provide further instructions in the Details column, such as:
Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
Update all internal links.
Campaign-based or seasonal pages that could be consolidated into a single "Evergreen" landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 ---> Best Sellers).
Which of these pages should be marked as "Remove"?
Pages with poor link, traffic, and social metrics related to low-quality content that isn't worth updating
Typically these will be allowed to 404/410.
Irrelevant content
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Out-of-date content that isn't worth updating or consolidating
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Which of these pages should be marked as "Leave As-Is"?
Pages with good traffic, conversions, time on site, etc. that also have good content.
These may or may not have any decent external links.
Taking the hatchet to bloated websites
For big sites, it's best to use a hatchet-based approach as much as possible, and finish up with a scalpel in the end. Otherwise, you'll spend way too much time on the project, which eats into the ROI.
This is not a process that can be documented step-by-step. For the purpose of illustration, however, below are a few different examples of hatchet approaches and when to consider using them.
{Expand for examples of hatchet approaches}
Parameter-based URLs that shouldn't be indexed
Defer to the technical audit, if applicable. Otherwise, use your best judgment:
e.g., /?sort=color, &size=small
Assuming the tech audit didn't suggest otherwise, these pages could all be handled in one fell swoop. Below is an example Action and example Details for such a page:
Action = Remove
Details = Rel canonical to the base page without the parameter
Internal search results
Defer to the technical audit if applicable. Otherwise, use your best judgment:
e.g., /search/keyword-phrase/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /search/ in the robots.txt file.
Blog tag pages
Defer to the technical audit if applicable. Otherwise:
e.g., /blog/tag/green-widgets/ , blog/tag/blue-widgets/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /blog/tag/ in the robots.txt file.
E-commerce product pages with manufacturer descriptions
In cases where the "Page Type" is known (i.e., it's in the URL or was provided in a CMS export) and Risk Score indicates duplication:
e.g., /product/product-name/
Assuming the tech audit didn't suggest otherwise:
Action = Improve
Details = Rewrite to improve product description and avoid duplicate content
E-commerce category pages with no static content
In cases where the "Page Type" is known:
e.g. /category/category-name/ or /category/cat1/cat2/
Assuming NONE of the category pages have content:
Action = Improve
Details = Write 2–3 sentences of unique, useful content that explains choices, next steps, or benefits to the visitor looking to choose a product from the category.
Out-of-date blog posts, articles, and other landing pages
In cases where the title tag includes a date, or...
In cases where the URL indicates the publishing date:
Action = Improve
Details = Update the post to make it more current, if applicable. Otherwise, change Action to "Remove" and customize the Strategy based on links and traffic (i.e., 301 or 404).
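These hatchet passes translate naturally into rule-based bulk tagging before any manual, scalpel-level review. A hedged pandas sketch over the dashboard file built earlier; the column names, patterns, and thresholds are assumptions to adapt to your own data and the findings of the tech audit.

import pandas as pd

df = pd.read_csv("content_audit_dashboard.csv")
df["Action"] = ""
df["Details"] = ""

# Hatchet pass 1: blog tag pages marked for removal in one sweep.
is_tag = df["URL"].str.contains("/tag/", na=False)
df.loc[is_tag, ["Action", "Details"]] = ["Remove", "Noindex; once deindexed, disallow in robots.txt"]

# Hatchet pass 2: thin pages with no traffic and no external links.
thin = (df["Word Count"] < 200) & (df["Sessions"] == 0) & (df["External Links"] == 0)
df.loc[thin & (df["Action"] == ""), ["Action", "Details"]] = ["Remove", "Prune: allow to 404/410 and update internal links"]

df.to_csv("content_audit_dashboard.csv", index=False)

Everything still blank in the Action column after passes like these is what remains for manual review.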
Content marked for improvement should lay out more specific instructions in the “Details” column, such as:
Update the old content to make it more relevant
Add more useful content to “beef up” this thin page
Incorporate content from overlapping URLs/pages
Rewrite to avoid internal duplication
Rewrite to avoid external duplication
Reduce image sizes to speed up page load
Create a “responsive” template for this page to fit on mobile devices
Etc.
Content marked for removal should include specific instructions in the “Details” column, such as:
Consolidate this content into the following URL/page marked as “Improve”
Then redirect the URL
Remove this page from the site and allow the URL to return a 410 or 404 HTTP status code. This content has had zero visits within the last 360 days, and has no external links. Then remove or update internal links to this page.
Remove this page from the site and 301 redirect the URL to the following URL marked as “Improve”... Do not incorporate the content into the new page. It is low-quality.
Remove this archive page from search engine indexes with a robots noindex meta tag. Continue to allow the page to be accessed by visitors and crawled by search engines.
Remove this internal search result page from the search engine index with a robots noindex meta tag. Once removed from the index (about 15–30 days later), add the following line to the #BlockedDirectories section of the robots.txt file: Disallow: /search/.
As you can see from the many examples above, sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.
After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.
URL, Action, and Details: These three columns will be used by someone to implement your recommendations. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data-points you’ve collected.
Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.
WARNING!
As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove.
The reporting phase
The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.
Counting actions from Column B
It is useful to count the quantity of each Action along with total organic search traffic and/or revenue for each URL. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
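In a spreadsheet this is a COUNTIF/SUMIF job against the Action column; in a script it is a one-line aggregation. A pandas sketch over the finished dashboard (column names are assumptions):

import pandas as pd

df = pd.read_csv("content_audit_dashboard.csv")
summary = df.groupby("Action").agg(
    pages=("URL", "count"),
    organic_sessions=("Sessions", "sum"),
    revenue=("Revenue", "sum"),
)
print(summary)  # e.g. how much traffic and revenue sit on pages marked "Remove"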
Step 5: Writing up the report
Your analysis and recommendations should be delivered at the same time as the audit dashboard. It summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.
Here is a real example of an executive summary from one of Inflow's content audit strategies:
As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:
Removal of about 624 pages from Google’s index by deletion or consolidation:
203 Pages were marked for Removal with a 404 error (no redirect needed)
110 Pages were marked for Removal with a 301 redirect to another page
311 Pages were marked for Consolidation of content into other pages
Followed by a redirect to the page into which they were consolidated
Rewriting or improving of 668 pages
605 Product Pages are to be rewritten due to their use of manufacturer product descriptions (duplicate content); these are prioritized from first to last within the Content Audit.
63 "Other" pages to be rewritten due to low-quality or duplicate content.
Keeping 226 pages as-is
No rewriting or improvements needed
These changes reflect an immediate need to "improve or remove" content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.
The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.
We recommend the following three projects in order of their urgency and/or potential ROI for the site:
Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the "Details" column of the Content Audit Dashboard.
Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.
Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the "Details" column.
Content audit resources & further reading
Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum This thought-provoking post raises the question: How will we perform content inventories without URLs? It helps to know Google is dealing with the exact same problem on a much, much larger scale.
Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.
Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.
The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic Praise for the life-changing powers of a good content audit inventory.
Everything You Need to Perform Content Audits
via SEOmoz Daily SEO Blog
0 notes
robertmcraft · 8 years
Text
How to Do a Content Audit [Updated for 2017]
Posted by Everett
//<![CDATA[ (function($) { // code using $ as alias to jQuery $(function() { // Hide the hypotext content. $('.hypotext-content').hide(); // When a hypotext link is clicked. $('a.hypotext.closed').click(function (e) { // custom handling here e.preventDefault(); // Create the class reference from the rel value. var id = '.' + $(this).attr('rel'); // If the content is hidden, show it now. if ( $(id).css('display') == 'none' ) { $(id).show('slow'); if (jQuery.ui) { // UI loaded $(id).effect("highlight", {}, 1000); } } // If the content is shown, hide it now. else { $(id).hide('slow'); } }); // If we have a hash value in the url. if (window.location.hash) { // If the anchor is within a hypotext block, expand it, by clicking the // relevant link. console.log(window.location.hash); var anchor = $(window.location.hash); var hypotextLink = $('#' + anchor.parents('.hypotext-content').attr('rel')); console.log(hypotextLink); hypotextLink.click(); // Wait until the content has expanded before jumping to anchor. //$.delay(1000); setTimeout(function(){ scrollToAnchor(window.location.hash); }, 1000); } }); function scrollToAnchor(id) { var anchor = $(id); $('html,body').animate({scrollTop: anchor.offset().top},'slow'); } })(jQuery); //]]>
This guide provides instructions on how to do a content audit using examples and screenshots from Screaming Frog, URL Profiler, Google Analytics (GA), and Excel, as those seem to be the most widely used and versatile tools for performing content audits.
{Expand for more background}
It's been almost three years since the original “How to do a Content Audit – Step-by-Step” tutorial was published here on Moz, and it’s due for a refresh. This version includes updates covering JavaScript rendering, crawling dynamic mobile sites, and more.
It also provides less detail than the first in terms of prescribing every step in the process. This is because our internal processes change often, as do the tools. I’ve also seen many other processes out there that I would consider good approaches. Rather than forcing a specific process and publishing something that may be obsolete in six months, this tutorial aims to allow for a variety of processes and tools by focusing more on the basic concepts and less on the specifics of each step.
We have a DeepCrawl account at Inflow, and a specific process for that tool, as well as several others. Tapping directly into various APIs may be preferable to using a middleware product like URL Profiler if one has development resources. There are also custom in-house tools out there, some of which incorporate historic log file data and can efficiently crawl websites like the New York Times and eBay. Whether you use GA or Adobe Sitecatalyst, Excel, or a SQL database, the underlying process of conducting a content audit shouldn’t change much.
TABLE OF CONTENTS
What is an SEO content audit?
What is the purpose of a content audit?
How & why “pruning” works
How to do a content audit
The inventory & audit phase
Step 1: Crawl all indexable URLs
Crawling roadblocks & new technologies
Crawling very large websites
Crawling dynamic mobile sites
Crawling and rendering JavaScript
Step 2: Gather additional metrics
Things you don’t need when analyzing the data
The analysis & recommendations phase
Step 3: Put it all into a dashboard
Step 4: Work the content audit dashboard
The reporting phase
Step 5: Writing up the report
Content audit resources & further reading
What is a content audit?
A content audit for the purpose of SEO includes a full inventory of all indexable content on a domain, which is then analyzed using performance metrics from a variety of sources to determine which content to keep as-is, which to improve, and which to remove or consolidate.
What is the purpose of a content audit?
A content audit can have many purposes and desired outcomes. In terms of SEO, they are often used to determine the following:
How to escape a content-related search engine ranking filter or penalty
Content that requires copywriting/editing for improved quality
Content that needs to be updated and made more current
Content that should be consolidated due to overlapping topics
Content that should be removed from the site
The best way to prioritize the editing or removal of content
Content gap opportunities
Which content is ranking for which keywords
Which content should be ranking for which keywords
The strongest pages on a domain and how to leverage them
Undiscovered content marketing opportunities
Due diligence when buying/selling websites or onboarding new clients
While each of these desired outcomes and insights are valuable results of a content audit, I would define the overall “purpose” of one as:
The purpose of a content audit for SEO is to improve the perceived trust and quality of a domain, while optimizing crawl budget and the flow of PageRank (PR) and other ranking signals throughout the site.
Often, but not always, a big part of achieving these goals involves the removal of low-quality content from search engine indexes. I’ve been told people hate this word, but I prefer the “pruning” analogy to describe the concept.
How & why “pruning” works
{Expand for more on pruning}
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove. Optimizing crawl budget and the flow of PR is self-explanatory to most SEOs. But how does a content audit improve the perceived trust and quality of a domain? By removing low-quality content from the index (pruning) and improving some of the content remaining in the index, the likelihood that someone arrives on your site through organic search and has a poor user experience (indicated to Google in a variety of ways) is lowered. Thus, the quality of the domain improves. I’ve explained the concept here and here.
Others have since shared some likely theories of their own, including a larger focus on the redistribution of PR.
Case study after case study has shown the concept of “pruning” (removing low-quality content from search engine indexes) to be effective, especially on very large websites with hundreds of thousands (or even millions) of indexable URLs. So why do content audits work? Lots of reasons. But really...
Does it matter?
¯\_(ツ)_/¯
How to do a content audit
Just like anything in SEO, from technical and on-page changes to site migrations, things can go horribly wrong when content audits aren’t conducted properly. The most common example would be removing URLs that have external links because link metrics weren’t analyzed as part of the audit. Another common mistake is confusing removal from search engine indexes with removal from the website.
Content audits start with taking an inventory of all content available for indexation by search engines. This content is then analyzed against a variety of metrics and given one of three “Action” determinations. The “Details” of each Action are then expanded upon.
The variety of combinations of options between the “Action” of WHAT to do and the “Details” of HOW (and sometimes why) to do it are as varied as the strategies, sites, and tactics themselves. Below are a few hypothetical examples:
You now have a basic overview of how to perform a content audit. More specific instructions can be found below.
The process can be roughly split into three distinct phases:
Inventory & audit
Analysis & recommendations
Summary & reporting
The inventory & audit phase
Taking an inventory of all content, and related metrics, begins with crawling the site.
One difference between crawling for content audits and technical audits:
Technical SEO audit crawls are concerned with all crawlable content (among other things).
Content audit crawls for the purpose of SEO are concerned with all indexable content.
{Expand for more on crawlable vs. indexable content}
The URL in the image below should be considered non-indexable. Even if it isn’t blocked in the robots.txt file, with a robots meta tag, or an X-robots header response –– even if it is frequently crawled by Google and shows up as a URL in Google Analytics and Search Console –– the rel =”canonical” tag shown below essentially acts like a 301 redirect, telling Google not to display the non-canonical URL in search results and to apply all ranking calculations to the canonical version. In other words, not to “index” it.
I'm not sure “index” is the best word, though. To “display” or “return” in the SERPs is a better way of describing it, as Google surely records canonicalized URL variants somewhere, and advanced site: queries seem to show them in a way that is consistent with the "supplemental index" of yesteryear. But that's another post, more suitably written by a brighter mind like Bill Slawski.
A URL with a query string that canonicalizes to a version without the query string can be considered “not indexable.”
A content audit can safely ignore these types of situations, which could mean drastically reducing the amount of time and memory taken up by a crawl.
Technical SEO audits, on the other hand, should be concerned with every URL a crawler can find. Non-indexable URLs can reveal a lot of technical issues, from spider traps (e.g. never-ending empty pagination, infinite loops via redirect or canonical tag) to crawl budget optimization (e.g. How many facets/filters deep to allow crawling? 5? 6? 7?) and more.
It is for this reason that trying to combine a technical SEO audit with a content audit often turns into a giant mess, though an efficient idea in theory. When dealing with a lot of data, I find it easier to focus on one or the other: all crawlable URLs, or all indexable URLs.
Orphaned pages (i.e., with no internal links / navigation path) sometimes don’t turn up in technical SEO audits if the crawler had no way to find them. Content audits should discover any indexable content, whether it is linked to internally or not. Side note: A good tech audit would do this, too.
Identifying URLs that should be indexed but are not is something that typically happens during technical SEO audits.
However, if you're having trouble getting deep pages indexed when they should be, content audits may help determine how to optimize crawl budget and herd bots more efficiently into those important, deep pages. Also, many times Google chooses not to display/index a URL in the SERPs due to poor content quality (i.e., thin or duplicate).
All of this is changing rapidly, though. URLs as the unique identifier in Google’s index are probably going away. Yes, we’ll still have URLs, but not everything requires them. So far, the word “content” and URL has been mostly interchangeable. But some URLs contain an entire application’s worth of content. How to do a content audit in that world is something we’ll have to figure out soon, but only after Google figures out how to organize the web’s information in that same world. From the looks of things, we still have a year or two.
Until then, the process below should handle most situations.
Step 1: Crawl all indexable URLs
A good place to start on most websites is a full Screaming Frog crawl. However, some indexable content might be missed this way. It is not recommended that you rely on a crawler as the source for all indexable URLs.
In addition to the crawler, collect URLs from Google Analytics, Google Webmaster Tools, XML Sitemaps, and, if possible, from an internal database, such as an export of all product and category URLs on an eCommerce website. These can then be crawled in “list mode” separately, then added to your main list of URLs and deduplicated to produce a more comprehensive list of indexable URLs.
Some URLs found via GA, XML sitemaps, and other non-crawl sources may not actually be “indexable.” These should be excluded. One strategy that works here is to combine and deduplicate all of the URL “lists,” and then perform a crawl in list mode. Once crawled, remove all URLs with robots meta or X-Robots noindex tags, as well as any URL returning error codes and those that are blocked by the robots.txt file, etc. At this point, you can safely add these URLs to the file containing indexable URLs from the crawl. Once again, deduplicate the list.
Crawling roadblocks & new technologies
Crawling very large websites
First and foremost, you do not need to crawl every URL on the site. Be concerned with indexable content. This is not a technical SEO audit.
{Expand for more about crawling very large websites}
Avoid crawling unnecessary URLs
Some of the things you can avoid crawling and adding to the content audit in many cases include:
Noindexed or robots.txt-blocked URLs
4XX and 5XX errors
Redirecting URLs and those that canonicalize to a different URL
Images, CSS, JavaScript, and SWF files
Segment the site into crawlable chunks
You can often get Screaming Frog to completely crawl a single directory at a time if the site is too large to crawl all at once.
Filter out URL patterns you plan to remove from the index
Let’s say you’re auditing a domain on WordPress and you notice early in the crawl that /tag/ pages are indexable. A quick site:domain.com inurl:tag search on Google tells you there are about 10 million of them. A quick look at Google Analytics confirms that URLs in the /tag/ directory are not responsible for very much revenue from organic search. It would be safe to say that the “Action” on these URLs should be “Remove” and the “Details” should read something like this: Remove /tag/ URLs from the indexed with a robots noindex,follow meta tag. More advice on this strategy can be found here.
Upgrade your machine
Install additional RAM on your computer, which is used by Screaming Frog to hold data during the crawl. This has the added benefit of improving Excel performance, which can also be a major roadblock.
You can also install Screaming Frog on Amazon Web Server (AWS), as described in this post on iPullRank.
Tune up your tools
Screaming Frog provides several ways for SEOs to get more out of the crawler. This includes adjusting the speed, max threads, search depth, query strings, timeouts, retries, and the amount of RAM available to the program. Leave at least 3GB off limits to the spider to avoid catastrophic freezing of the entire machine and loss of data. You can learn more about tuning up Screaming Frog here and here.
Try other tools
I’m convinced that there's a ton of wasted bandwidth on most content audit projects due to strategists releasing a crawler and allowing it to chew through an entire domain, whether the URLs are indexable or not. People run Screaming Frog without saving the crawl intermittently, without adding more RAM availability, without filtering out the nonsense, or using any of the crawl customization features available to them.
That said, sometimes SF just doesn’t get the job done. We also have a process specific to DeepCrawl, and have used Botify, as well as other tools. They each have their pros and cons. I still prefer Screaming Frog for crawling and URL Profiler for fetching metrics in most cases.
Crawling dynamic mobile sites
This refers to a specific type of mobile setup in which there are two code-bases –– one for mobile and one for desktop –– but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting that URL. In such cases, you will essentially be performing two separate content audits. Proceed as usual for the desktop version. Below are instructions for crawling the mobile version.
{Expand for more on crawling dynamic websites}
Crawling a dynamic mobile site for a content audit will require changing the User-Agent of the crawler, as shown here under Screaming Frog’s “Configure ---> HTTP Header” menu:
The important thing to remember when working on mobile dynamic websites is that you're only taking an inventory of indexable URLs on one version of the site or the other. Once the two inventories are taken, you can then compare them to uncover any unintentional issues.
Some examples of what this process can find in a technical SEO audit include situations in which titles, descriptions, canonical tags, robots meta, rel next/prev, and other important elements do not match between the two versions of the page. It's vital that the mobile and desktop version of each page have parity when it comes to these essentials.
It's easy for the mobile version of a historically desktop-first website to end up providing conflicting instructions to search engines because it's not often “automatically changed” when the desktop version changes. A good example here is a website I recently looked at with about 20 million URLs, all of which had the following title tag when loaded by a mobile user (including Google): BRAND NAME - MOBILE SITE. Imagine the consequences of that once a mobile-first algorithm truly rolls out.
Crawling and rendering JavaScript
One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.
{Expand for more on crawling Javascript websites}
Most crawlers have made a lot of progress lately when it comes to crawling and rendering JavaScript content. Now, it’s as easy as changing a few settings, as shown below with Screaming Frog.
When crawling URLs with #! , use the “Old AJAX Crawling Scheme.” Otherwise, select “JavaScript” from the “Rendering” tab when configuring your Screaming Frog SEO Spider to crawl JavaScript websites.
How do you know if you’re dealing with a JavaScript website?
First of all, most websites these days are going to be using some sort of JavaScript technology, though more often than not (so far) these will be rendered by the “client” (i.e., by your browser). An example would be the .js file that controls the behavior of a form or interactive tool.
What we’re discussing here is when the JavaScript itself builds the content of the page, and therefore needs to be executed before the page can be rendered and understood.
JavaScript libraries and frameworks are used to develop single-page web apps and highly interactive websites. Below are a few different things that should alert you to this challenge:
The URLs contain #! (hashbangs). For example: http://ift.tt/2nQK6ch (AJAX)
Content-rich pages with only a few lines of code (and no iframes) when viewing the source code.
What looks like server-side code in the meta tags instead of the actual content of the tag. For example:
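For instance, a title tag whose source still shows raw template syntax such as {{ page.title }}, or a framework binding attribute, instead of the rendered text (a hypothetical illustration; the exact syntax depends on the framework in use).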
You can also use the BuiltWith Technology Profiler or the Library Detector plugins for Chrome, which show the JavaScript libraries being used on a page right in the address bar.
Not all websites built primarily with JavaScript require special attention to crawl settings. Some websites use pre-rendering services like Brombone or Prerender.io to serve the crawler a fully rendered version of the page. Others use isomorphic JavaScript to accomplish the same thing.
Step 2: Gather additional metrics
Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.
Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.
Once the Screaming Frog scan is complete (only crawling indexable content), export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).
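If you'd rather combine and deduplicate those URL lists outside of Excel, a minimal Python sketch might look like this (the file names and the assumption that each export has a “URL” column are mine; rename things to match your actual exports):

import pandas as pd

# One export per source: crawl, GA landing pages, Search Console, XML sitemaps.
sources = [
    "crawl_internal_html.csv",
    "ga_landing_pages.csv",
    "gsc_pages.csv",
    "xml_sitemap_urls.csv",
]

# Each file is assumed to contain a column named "URL".
frames = [pd.read_csv(path, usecols=["URL"]) for path in sources]
urls = pd.concat(frames, ignore_index=True)

# Trim whitespace, then deduplicate the combined list.
urls["URL"] = urls["URL"].str.strip()
urls = urls.drop_duplicates().sort_values("URL")

urls.to_csv("combined_url_seed_list.csv", index=False)

The combined list can then be crawled in list mode to confirm each URL is actually indexable before it joins the master inventory.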
This is what my URL Profiler settings look like for a typical content audit of a small- or medium-sized site. Also, under “Accounts” I have connected via API keys to Moz and SEMrush.
Once URL Profiler is finished, you should end up with something like this:
Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.
The risk of getting analytics data from a third-party tool
We've noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes incorporating Analytics Canvas to get the full, unsampled data from GA. Then VLookups are used in the spreadsheet to combine the data, with URL being the unique identifier.
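If the VLookups become unwieldy at this scale, the same join can be done in a few lines of pandas. This is only a sketch: the file names, the example.com domain, and the “Landing Page” column name are assumptions based on typical exports.

import pandas as pd

crawl = pd.read_csv("combined_crawl_data.csv")    # Screaming Frog / URL Profiler output
ga = pd.read_csv("ga_landing_page_export.csv")    # unsampled GA export

# GA reports landing pages as paths, so prefix the domain to match the crawl URLs.
ga["URL"] = "https://www.example.com" + ga["Landing Page"]

# A left join keeps every crawled URL, with blanks where GA has no data for it.
audit = crawl.merge(ga.drop(columns=["Landing Page"]), on="URL", how="left")
audit.to_csv("content_audit_data.csv", index=False)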
Metrics to pull for each URL:
Indexed or not?
If crawlers are set up properly, all URLs should be “indexable.”
A non-indexed URL is often a sign of an uncrawled or low-quality page.
Content uniqueness
Copyscape, Siteliner, and now URL Profiler can provide this data.
Traffic from organic search
Typically 90 days
Keep a consistent timeframe across all metrics.
Revenue and/or conversions
You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
Publish date
If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
Internal links
Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
External links
These can come from Moz, SEMrush, and a variety of other tools, most of which integrate natively or via APIs with URL Profiler.
Landing pages resulting in low time-on-site
Take this one with a grain of salt. If visitors found what they wanted because the content was good, that’s not a bad metric. A better proxy for this would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
Landing pages resulting in Low Pages-Per-Visit
Just like with Time-On-Site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
Response code
Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that's the case on your domain.
Canonical tag
Typically only URLs with a self-referencing rel=“canonical” tag should be considered “indexable.” You may not require this metric in the final data if that's the case on your domain.
Page speed and mobile-friendliness
Again, URL Profiler comes through with their Google PageSpeed Insights API integration.
Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).
Things you don’t need when analyzing the data
{Expand for more on removing unnecessary data}
URL Profiler and Screaming Frog tabs: Just keep the “combined data” tab and immediately cut the amount of data in the spreadsheet by about half.
Content Type: Filtering by Content Type (e.g., text/html, image, PDF, CSS, JavaScript) and removing any URL that is of no concern in your content audit is a good way to speed up the process.
Technically speaking, images can be indexable content. However, I prefer to deal with them separately for now.
Filtering unnecessary file types out like I've done in the screenshot above improves focus, but doesn’t improve performance very much. A better option would be to first select the file types you don’t want, apply the filter, delete the rows you don’t want, and then go back to the filter options and “(Select All).”
Once you have only the content types you want, it may now be possible to simply delete the entire Content Type column.
Status Code and Status: You only need one or the other. I prefer to keep the Code and delete the Status column.
Length and Pixels: You only need one or the other. I prefer to keep the Pixels and delete the Length column. This applies to all Title and Meta Description columns.
Meta Keywords: Delete the columns. If those cells have content, consider removing that tag from the site.
DNS Safe URL, Path, Domain, Root, and TLD: You should really only be working on a single top-level domain. Content audits for subdomains should probably be done separately. Thus, these columns can be deleted in most cases.
Duplicate Columns: You should have two columns for the URL (the “Address” in column A from URL Profiler, and the “URL” column from Screaming Frog). Similarly, there may also be two columns each for HTTP Status and Status Code. It depends on the settings selected in both tools, but there are sure to be some overlaps, which can be removed to reduce the file size, enhance focus, and speed up the process.
Blank Columns: Keep the filter tool active and go through each column. Those with only blank cells can be deleted. The example below shows that column BK (Robots HTTP Header) can be removed from the spreadsheet.
[You can save a lot of headspace by hiding or removing blank columns.]
Single-Value Columns: If the column contains only one value, it can usually be removed. The screenshot below shows our non-secure site does not have any HTTPS URLs, as expected. I can now remove the column. Also, I guess it’s probably time I get that HTTPS migration project scheduled.
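If the data is sitting in a CSV at this point, the last two checks (blank columns and single-value columns) are easy to automate. A quick sketch, assuming the combined file from the earlier steps:

import pandas as pd

df = pd.read_csv("content_audit_data.csv")

# Columns that are entirely blank, plus columns with only one distinct value
# (e.g., a Scheme column on an all-HTTP site), can usually be dropped.
droppable = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]

df = df.drop(columns=droppable)
df.to_csv("content_audit_data_trimmed.csv", index=False)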
Hopefully by now you've made a significant dent in reducing the overall size of the file and time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.
The analysis & recommendations phase
Here's where the fun really begins. In a large organization, it's tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.
Step 3: Put it all into a dashboard
Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it's ready for the client. If Excel is running slow, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.
Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.
Use Data Validation and a drop-down selector to limit Action options.
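If you build the dashboard programmatically rather than by hand, the same drop-down can be added with openpyxl. A sketch, assuming the Action column is column B and that the three labels below match your own template:

from openpyxl import load_workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = load_workbook("content_audit_dashboard.xlsx")
ws = wb.active

# Restrict the Action column to three values via a drop-down list.
dv = DataValidation(type="list",
                    formula1='"Improve,Remove,Leave As-Is"',
                    allow_blank=True)
ws.add_data_validation(dv)
dv.add("B2:B100000")

wb.save("content_audit_dashboard.xlsx")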
Step 4: Work the content audit dashboard
All of the data you need should now be right in front of you. This step can’t be turned into a single repeatable process that fits every content audit; from here on, the work becomes much more open to interpretation and your own experience. You may do some of the things below and not others, or do them a little differently. That's all fine, as long as you're working toward the goal of determining what to do, if anything, for each piece of content on the website.
A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.
Causes of content-related penalties
These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of issues, which are detailed below.
{Expand to learn more about quality, duplication, and relevancy issues}
Typical low-quality content
Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate...
Completely irrelevant content
OK in small amounts, but often entire blogs are full of it.
A typical example would be a "linkbait" piece circa 2010.
Thin/short content
Glossed over the topic, too few words, or all image-based content.
Curated content with no added value
Composed almost entirely of bits and pieces of content that exists elsewhere.
Misleading optimization
Titles or keywords targeting queries that the content doesn't answer or deserve to rank for.
Generally not providing the information the visitor was expecting to find.
Duplicate content
Internally duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.).
Externally duplicated (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.)
Stub pages (e.g., "No content is here yet, but if you sign in and leave some user-generated-content, then we'll have content here for the next guy." By the way, want our newsletter? Click an AD!)
Indexable internal search results
Too many indexable blog tag or blog category pages
And so on and so forth...
It helps to sort the data in various ways to see what’s going on. Below are a few different things to look for if you’re having trouble getting started.
{Expand to learn more about what to look for}
Sort by duplicate content risk
URL Profiler now has a native duplicate content checker. Other options are Copyscape (for external duplicate content) and Siteliner (for internal duplicate content).
Which of these pages should be rewritten?
Rewrite key/important pages, such as categories, home page, top products
Rewrite pages with good link and social metrics
Rewrite pages with good traffic
After selecting "Improve" in the Action column, elaborate in the Details column:
"Improve these pages by writing unique, useful content to improve the Copyscape risk score."
Which of these pages should be removed/pruned?
Remove guest posts that were published elsewhere
Remove anything the client plagiarized
Remove content that isn't worth rewriting, such as:
No external links, no social shares, and very few or no entrances/visits
After selecting "Remove" from the Action column, elaborate in the Details column:
"Prune from site to remove duplicate content. This URL has no links or shares and very little traffic. We recommend allowing the URL to return 404 or 410 response code. Remove all internal links, including from the sitemap."
Which of these pages should be consolidated into others?
Presumably none, since the content is already externally duplicated.
Which of these pages should be left “As-Is”?
Important pages which have had their content stolen
Sort by entrances or visits (filtering out any that were already finished)
Which of these pages should be marked as "Improve"?
Pages with high visits/entrances but low conversion, time-on-site, pageviews per session, etc.
Key pages that require improvement determined after a manual review of the page.
Which of these pages should be marked as "Consolidate"?
When you have overlapping topics that don't provide much unique value of their own, but could make a great resource when combined.
Mark the page in the set with the best metrics as "Improve" and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
Mark the pages that are to be consolidated into the canonical page as "Consolidate" and provide further instructions in the Details column, such as:
Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
Update all internal links.
Campaign-based or seasonal pages that could be consolidated into a single "Evergreen" landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 ---> Best Sellers).
Which of these pages should be marked as "Remove"?
Pages with poor link, traffic, and social metrics related to low-quality content that isn't worth updating
Typically these will be allowed to 404/410.
Irrelevant content
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Out-of-date content that isn't worth updating or consolidating
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Which of these pages should be marked as "Leave As-Is"?
Pages with good traffic, conversions, time on site, etc. that also have good content.
These may or may not have any decent external links.
Taking the hatchet to bloated websites
For big sites, it's best to use a hatchet-based approach as much as possible, and finish up with a scalpel in the end. Otherwise, you'll spend way too much time on the project, which eats into the ROI.
This is not a process that can be documented step-by-step. For the purpose of illustration, however, below are a few different examples of hatchet approaches and when to consider using them.
{Expand for examples of hatchet approaches}
Parameter-based URLs that shouldn't be indexed
Defer to the technical audit, if applicable. Otherwise, use your best judgment:
e.g., /?sort=color, &size=small
Assuming the tech audit didn't suggest otherwise, these pages could all be handled in one fell swoop. Below is an example Action and example Details for such a page:
Action = Remove
Details = Rel canonical to the base page without the parameter
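In HTML terms, that usually means each parameterized URL carries a canonical link element in its head pointing at the clean version, e.g. <link rel="canonical" href="https://www.example.com/category-name/" /> (the domain and path here are purely illustrative).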
Internal search results
Defer to the technical audit if applicable. Otherwise, use your best judgment:
e.g., /search/keyword-phrase/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /search/ in the robots.txt file.
Blog tag pages
Defer to the technical audit if applicable. Otherwise:
e.g., /blog/tag/green-widgets/, /blog/tag/blue-widgets/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /blog/tag/ in the robots.txt file.
E-commerce product pages with manufacturer descriptions
In cases where the "Page Type" is known (i.e., it's in the URL or was provided in a CMS export) and Risk Score indicates duplication:
e.g., /product/product-name/
Assuming the tech audit didn't suggest otherwise:
Action = Improve
Details = Rewrite to improve product description and avoid duplicate content
E-commerce category pages with no static content
In cases where the "Page Type" is known:
e.g., /category/category-name/ or /category/cat1/cat2/
Assuming NONE of the category pages have content:
Action = Improve
Details = Write 2–3 sentences of unique, useful content that explains choices, next steps, or benefits to the visitor looking to choose a product from the category.
Out-of-date blog posts, articles, and other landing pages
In cases where the title tag includes a date, or...
In cases where the URL indicates the publishing date:
Action = Improve
Details = Update the post to make it more current, if applicable. Otherwise, change Action to "Remove" and customize the Strategy based on links and traffic (i.e., 301 or 404).
For content marked for improvement, lay out more specific instructions in the “Details” column, such as:
Update the old content to make it more relevant
Add more useful content to “beef up” this thin page
Incorporate content from overlapping URLs/pages
Rewrite to avoid internal duplication
Rewrite to avoid external duplication
Reduce image sizes to speed up page load
Create a “responsive” template for this page to fit on mobile devices
Etc.
For content marked for removal, include specific instructions in the “Details” column, such as:
Consolidate this content into the following URL/page marked as “Improve”
Then redirect the URL
Remove this page from the site and allow the URL to return a 410 or 404 HTTP status code. This content has had zero visits within the last 360 days, and has no external links. Then remove or update internal links to this page.
Remove this page from the site and 301 redirect the URL to the following URL marked as “Improve”... Do not incorporate the content into the new page. It is low-quality.
Remove this archive page from search engine indexes with a robots noindex meta tag. Continue to allow the page to be accessed by visitors and crawled by search engines.
Remove this internal search result page from the search engine index with a robots noindex meta tag. Once removed from the index (about 15–30 days later), add the following line to the #BlockedDirectories section of the robots.txt file: Disallow: /search/.
As you can see from the many examples above, sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.
After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.
URL, Action, and Details: These three columns will be used by someone to implement your recommendations. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data-points you’ve collected.
Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.
WARNING!
As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove.
The reporting phase
The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.
Counting actions from Column B
It is useful to count how many URLs were assigned each Action, along with the total organic search traffic and/or revenue for those URLs. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
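A pivot table handles this well, but the same summary can also be produced with pandas. A sketch, where the “Action”, “Organic Entrances”, and “Revenue” column names are assumptions; use whatever your dashboard actually contains:

import pandas as pd

df = pd.read_excel("content_audit_dashboard.xlsx")

summary = (df.groupby("Action")
             .agg(urls=("URL", "count"),
                  organic_entrances=("Organic Entrances", "sum"),
                  revenue=("Revenue", "sum"))
             .sort_values("urls", ascending=False))

print(summary)

The resulting counts map directly onto the kind of executive summary shown in the next step.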
Step 5: Writing up the report
Your analysis and recommendations should be delivered at the same time as the audit dashboard. The write-up summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.
Here is a real example of an executive summary from one of Inflow's content audit strategies:
As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:
Removal of about 624 pages from Google’s index by deletion or consolidation:
203 Pages were marked for Removal with a 404 error (no redirect needed)
110 Pages were marked for Removal with a 301 redirect to another page
311 Pages were marked for Consolidation of content into other pages
Followed by a redirect to the page into which they were consolidated
Rewriting or improving of 668 pages
605 Product Pages are to be rewritten due to the use of manufacturer product descriptions (duplicate content), prioritized from first to last within the Content Audit.
63 "Other" pages to be rewritten due to low-quality or duplicate content.
Keeping 226 pages as-is
No rewriting or improvements needed
These changes reflect an immediate need to "improve or remove" content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.
The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.
We recommend the following three projects in order of their urgency and/or potential ROI for the site:
Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the "Details" column of the Content Audit Dashboard.
Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.
Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the "Details" column.
Content audit resources & further reading
Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum: This thought-provoking post asks how we will perform content inventories without URLs. It helps to know Google is dealing with the exact same problem on a much, much larger scale.
Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.
Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow: An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.
The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic: Praise for the life-changing powers of a good content inventory.
Everything You Need to Perform Content Audits
Let’s say you’re auditing a domain on WordPress and you notice early in the crawl that /tag/ pages are indexable. A quick site:domain.com inurl:tag search on Google tells you there are about 10 million of them. A quick look at Google Analytics confirms that URLs in the /tag/ directory are not responsible for very much revenue from organic search. It would be safe to say that the “Action” on these URLs should be “Remove” and the “Details” should read something like this: Remove /tag/ URLs from the indexed with a robots noindex,follow meta tag. More advice on this strategy can be found here.
Upgrade your machine
Install additional RAM on your computer, which is used by Screaming Frog to hold data during the crawl. This has the added benefit of improving Excel performance, which can also be a major roadblock.
You can also install Screaming Frog on Amazon Web Server (AWS), as described in this post on iPullRank.
Tune up your tools
Screaming Frog provides several ways for SEOs to get more out of the crawler. This includes adjusting the speed, max threads, search depth, query strings, timeouts, retries, and the amount of RAM available to the program. Leave at least 3GB off limits to the spider to avoid catastrophic freezing of the entire machine and loss of data. You can learn more about tuning up Screaming Frog here and here.
Try other tools
I’m convinced that there's a ton of wasted bandwidth on most content audit projects due to strategists releasing a crawler and allowing it to chew through an entire domain, whether the URLs are indexable or not. People run Screaming Frog without saving the crawl intermittently, without adding more RAM availability, without filtering out the nonsense, or using any of the crawl customization features available to them.
That said, sometimes SF just doesn’t get the job done. We also have a process specific to DeepCrawl, and have used Botify, as well as other tools. They each have their pros and cons. I still prefer Screaming Frog for crawling and URL Profiler for fetching metrics in most cases.
Crawling dynamic mobile sites
This refers to a specific type of mobile setup in which there are two code-bases –– one for mobile and one for desktop –– but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting that URL. In such cases, you will essentially be performing two separate content audits. Proceed as usual for the desktop version. Below are instructions for crawling the mobile version.
{Expand for more on crawling dynamic websites}
Crawling a dynamic mobile site for a content audit will require changing the User-Agent of the crawler, as shown here under Screaming Frog’s “Configure ---> HTTP Header” menu:
The important thing to remember when working on mobile dynamic websites is that you're only taking an inventory of indexable URLs on one version of the site or the other. Once the two inventories are taken, you can then compare them to uncover any unintentional issues.
Some examples of what this process can find in a technical SEO audit include situations in which titles, descriptions, canonical tags, robots meta, rel next/prev, and other important elements do not match between the two versions of the page. It's vital that the mobile and desktop version of each page have parity when it comes to these essentials.
It's easy for the mobile version of a historically desktop-first website to end up providing conflicting instructions to search engines because it's not often “automatically changed” when the desktop version changes. A good example here is a website I recently looked at with about 20 million URLs, all of which had the following title tag when loaded by a mobile user (including Google): BRAND NAME - MOBILE SITE. Imagine the consequences of that once a mobile-first algorithm truly rolls out.
Crawling and rendering JavaScript
One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.
{Expand for more on crawling Javascript websites}
Most crawlers have made a lot of progress lately when it comes to crawling and rendering JavaScript content. Now, it’s as easy as changing a few settings, as shown below with Screaming Frog.
When crawling URLs with #! , use the “Old AJAX Crawling Scheme.” Otherwise, select “JavaScript” from the “Rendering” tab when configuring your Screaming Frog SEO Spider to crawl JavaScript websites.
How do you know if you’re dealing with a JavaScript website?
First of all, most websites these days are going to be using some sort of JavaScript technology, though more often than not (so far) these will be rendered by the “client” (i.e., by your browser). An example would be the .js file that controls the behavior of a form or interactive tool.
What we’re discussing here is when the JavaScript is used “server-side” and needs to be executed in order to render the page.
JavaScript libraries and frameworks are used to develop single-page web apps and highly interactive websites. Below are a few different things that should alert you to this challenge:
The URLs contain #! (hashbangs). For example: http://ift.tt/2nQK6ch (AJAX)
Content-rich pages with only a few lines of code (and no iframes) when viewing the source code.
What looks like server-side code in the meta tags instead of the actual content of the tag. For example:
You can also use the BuiltWith Technology Profiler or the Library Detector plugins for Chrome, which shows JavaScript libraries being used on a page in the address bar.
Not all websites built primarily with JavaScript require special attention to crawl settings. Some websites use pre-rendering services like Brombone or Prerender.io to serve the crawler a fully rendered version of the page. Others use isomorphic JavaScript to accomplish the same thing.
Step 2: Gather additional metrics
Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.
Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.
Once the Screaming Frog scan is complete (only crawling indexable content) export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).
This is what my URL Profiler settings look for a typical content audit for a small- or medium-sized site. Also, under “Accounts” I have connected via API keys to Moz and SEMrush.
Once URL Profiler is finished, you should end up with something like this:
Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.
The risk of getting analytics data from a third-party tool
We've noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes incorporating Analytics Canvas to get the full, unsampled data from GA. Then VLookups are used in the spreadsheet to combine the data, with URL being the unique identifier.
Metrics to pull for each URL:
Indexed or not?
If crawlers are set up properly, all URLs should be “indexable.”
A non-indexed URL is often a sign of an uncrawled or low-quality page.
Content uniqueness
Copyscape, Siteliner, and now URL Profiler can provide this data.
Traffic from organic search
Typically 90 days
Keep a consistent timeframe across all metrics.
Revenue and/or conversions
You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
Publish date
If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
Internal links
Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
External links
These can come from Moz, SEMRush, and a variety of other tools, most of which integrate natively or via APIs with URL Profiler.
Landing pages resulting in low time-on-site
Take this one with a grain of salt. If visitors found what they want because the content was good, that’s not a bad metric. A better proxy for this would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
Landing pages resulting in Low Pages-Per-Visit
Just like with Time-On-Site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
Response code
Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that's the case on your domain.
Canonical tag
Typically only URLs with a self-referencing rel=“canonical” tag should be considered “indexable.” You may not require this metric in the final data if that's the case on your domain.
Page speed and mobile-friendliness
Again, URL Profiler comes through with their Google PageSpeed Insights API integration.
Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).
Things you don’t need when analyzing the data
{Expand for more on removing unnecessary data}
URL Profiler and Screaming Frog tabs Just keep the “combined data” tab and immediately cut the amount of data in the spreadsheet by about half.
Content Type Filtering by Content Type (e.g., text/html, image, PDF, CSS, JavaScript) and removing any URL that is of no concern in your content audit is a good way to speed up the process.
Technically speaking, images can be indexable content. However, I prefer to deal with them separately for now.
Filtering unnecessary file types out like I've done in the screenshot above improves focus, but doesn’t improve performance very much. A better option would be to first select the file types you don’t want, apply the filter, delete the rows you don’t want, and then go back to the filter options and “(Select All).”
Once you have only the content types you want, it may now be possible to simply delete the entire Content Type column.
Status Code and Status You only need one or the other. I prefer to keep the Code, and delete the Status column.
Length and Pixels You only need one or the other. I prefer to keep the Pixels, and delete the Length column. This applies to all Title and Meta Description columns.
Meta Keywords Delete the columns. If those cells have content, consider removing that tag from the site.
DNS Safe URL, Path, Domain, Root, and TLD You should really only be working on a single top-level domain. Content audits for subdomains should probably be done separately. Thus, these columns can be deleted in most cases.
Duplicate Columns You should have two columns for the URL (The “Address” in column A from URL Profiler, and the “URL” column from Screaming Frog). Similarly, there may also be two columns each for HTTP Status and Status Code. It depends on the settings selected in both tools, but there are sure to be some overlaps, which can be removed to reduce the file size, enhance focus, and speed up the process.
Blank Columns Keep the filter tool active and go through each column. Those with only blank cells can be deleted. The example below shows that column BK (Robots HTTP Header) can be removed from the spreadsheet.
[You can save a lot of headspace by hiding or removing blank columns.]
Single-Value Columns If the column contains only one value, it can usually be removed. The screenshot below shows our non-secure site does not have any HTTPS URLs, as expected. I can now remove the column. Also, I guess it’s probably time I get that HTTPS migration project scheduled.
Hopefully by now you've made a significant dent in reducing the overall size of the file and time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.
The analysis & recommendations phase
Here's where the fun really begins. In a large organization, it's tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.
Step 3: Put it all into a dashboard
Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it's ready for the client. If Excel is running slow, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.
Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.
Use Data Validation and a drop-down selector to limit Action options.
Step 4: Work the content audit dashboard
All of the data you need should now be right in front of you. This step can’t be turned into a repeatable process for every content audit. From here on the actual step-by-step process becomes much more open to interpretation and your own experience. You may do some of them and not others. You may do them a little differently. That's all fine, as long as you're working toward the goal of determining what to do, if anything, for each piece of content on the website.
A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.
Causes of content-related penalties
These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of issues, which are detailed below.
{Expand to learn more about quality, duplication, and relevancy issues}
Typical low-quality content
Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate...
Completely irrelevant content
OK in small amounts, but often entire blogs are full of it.
A typical example would be a "linkbait" piece circa 2010.
Thin/short content
Glossed over the topic, too few words, or all image-based content.
Curated content with no added value
Comprised almost entirely of bits and pieces of content that exists elsewhere.
Misleading optimization
Titles or keywords targeting queries for which content doesn't answer or deserve to rank.
Generally not providing the information the visitor was expecting to find.
Duplicate content
Internally duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.).
Externally duplicated (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.)
Stub pages (e.g., "No content is here yet, but if you sign in and leave some user-generated-content, then we'll have content here for the next guy." By the way, want our newsletter? Click an AD!)
Indexable internal search results
Too many indexable blog tag or blog category pages
And so on and so forth...
It helps to sort the data in various ways to see what’s going on. Below are a few different things to look for if you’re having trouble getting started.
{Expand to learn more about what to look for}
Sort by duplicate content risk
URL Profiler now has a native duplicate content checker. Other options are Copyscape (for external duplicate content) and Siteliner (for internal duplicate content).
Which of these pages should be rewritten?
Rewrite key/important pages, such as categories, home page, top products
Rewrite pages with good link and social metrics
Rewrite pages with good traffic
After selecting "Improve" in the Action column, elaborate in the Details column:
"Improve these pages by writing unique, useful content to improve the Copyscape risk score."
Which of these pages should be removed/pruned?
Remove guest posts that were published elsewhere
Remove anything the client plagiarized
Remove content that isn't worth rewriting, such as:
No external links, no social shares, and very few or no entrances/visits
After selecting "Remove" from the Action column, elaborate in the Details column:
"Prune from site to remove duplicate content. This URL has no links or shares and very little traffic. We recommend allowing the URL to return 404 or 410 response code. Remove all internal links, including from the sitemap."
Which of these pages should be consolidated into others?
Presumably none, since the content is already externally duplicated.
Which of these pages should be left “As-Is”?
Important pages which have had their content stolen
Sort by entrances or visits (filtering out any that were already finished)
Which of these pages should be marked as "Improve"?
Pages with high visits/entrances but low conversion, time-on-site, pageviews per session, etc.
Key pages that require improvement determined after a manual review of the page.
Which of these pages should be marked as "Consolidate"?
When you have overlapping topics that don't provide much unique value of their own, but could make a great resource when combined.
Mark the page in the set with the best metrics as "Improve" and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
Mark the pages that are to be consolidated into the canonical page as "Consolidate" and provide further instructions in the Details column, such as:
Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
Update all internal links.
Campaign-based or seasonal pages that could be consolidated into a single "Evergreen" landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 ---> Best Sellers).
Which of these pages should be marked as "Remove"?
Pages with poor link, traffic, and social metrics related to low-quality content that isn't worth updating
Typically these will be allowed to 404/410.
Irrelevant content
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Out-of-date content that isn't worth updating or consolidating
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Which of these pages should be marked as "Leave As-Is"?
Pages with good traffic, conversions, time on site, etc. that also have good content.
These may or may not have any decent external links.
Taking the hatchet to bloated websites
For big sites, it's best to use a hatchet-based approach as much as possible, and finish up with a scalpel in the end. Otherwise, you'll spend way too much time on the project, which eats into the ROI.
This is not a process that can be documented step-by-step. For the purpose of illustration, however, below are a few different examples of hatchet approaches and when to consider using them.
{Expand for examples of hatchet approaches}
Parameter-based URLs that shouldn't be indexed
Defer to the technical audit, if applicable. Otherwise, use your best judgment:
e.g., /?sort=color, &size=small
Assuming the tech audit didn't suggest otherwise, these pages could all be handled in one fell swoop. Below is an example Action and example Details for such a page:
Action = Remove
Details = Rel canonical to the base page without the parameter
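As a small sketch of that Details instruction (the example URL and parameter names are assumptions), the canonical target is simply the URL with its query string stripped, which can be generated and spot-checked like this:
from urllib.parse import urlsplit, urlunsplit

def canonical_for(url):
    # Drop the query string (e.g., ?sort=color&size=small) and keep the base page.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

url = "https://example.com/widgets/?sort=color&size=small"
print(f'<link rel="canonical" href="{canonical_for(url)}" />')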
Internal search results
Defer to the technical audit if applicable. Otherwise, use your best judgment:
e.g., /search/keyword-phrase/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /search/ in the robots.txt file.
Blog tag pages
Defer to the technical audit if applicable. Otherwise:
e.g., /blog/tag/green-widgets/ , blog/tag/blue-widgets/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /blog/tag/ in the robots.txt file.
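Because the robots.txt disallow should only be added after these pages have dropped out of the index, a quick spot-check that the noindex tag is actually being served can save trouble. The sketch below assumes the requests and beautifulsoup4 packages and uses placeholder URLs:
import requests
from bs4 import BeautifulSoup

# Hypothetical internal search and blog tag URLs marked "Remove."
urls_to_check = [
    "https://example.com/search/keyword-phrase/",
    "https://example.com/blog/tag/green-widgets/",
]

for url in urls_to_check:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    robots_meta = soup.find("meta", attrs={"name": "robots"})
    content = robots_meta.get("content", "") if robots_meta else ""
    print(url, "noindex OK" if "noindex" in content.lower() else "noindex MISSING")
# Only after these report "noindex OK" and the URLs have left the index should the
# corresponding Disallow rules (e.g., Disallow: /search/ and Disallow: /blog/tag/) be added.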
E-commerce product pages with manufacturer descriptions
In cases where the "Page Type" is known (i.e., it's in the URL or was provided in a CMS export) and Risk Score indicates duplication:
e.g., /product/product-name/
Assuming the tech audit didn't suggest otherwise:
Action = Improve
Details = Rewrite to improve product description and avoid duplicate content
E-commerce category pages with no static content
In cases where the "Page Type" is known:
e.g. /category/category-name/ or category/cat1/cat2/
Assuming NONE of the category pages have content:
Action = Improve
Details = Write 2–3 sentences of unique, useful content that explains choices, next steps, or benefits to the visitor looking to choose a product from the category.
Out-of-date blog posts, articles, and other landing pages
In cases where the title tag includes a date, or...
In cases where the URL indicates the publishing date:
Action = Improve
Details = Update the post to make it more current, if applicable. Otherwise, change Action to "Remove" and customize the Strategy based on links and traffic (i.e., 301 or 404).
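A hedged sketch of how dated pages could be flagged in bulk is below; the regular expressions are assumptions and will need adjusting to the site's actual URL and title conventions:
import re

# Hypothetical URL/title pairs pulled from the audit spreadsheet.
pages = [
    ("/blog/2013/05/spring-sale/", "Spring Sale 2013 | Example Store"),
    ("/guides/choosing-widgets/", "How to Choose a Widget"),
]

date_in_url = re.compile(r"/(19|20)\d{2}(/\d{1,2})?/")  # e.g., /2013/05/
date_in_title = re.compile(r"\b(19|20)\d{2}\b")         # e.g., "2013"

for url, title in pages:
    if date_in_url.search(url) or date_in_title.search(title):
        print(f"Review for freshness: {url} ({title})")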
Content marked for improvement should lay out more specific instructions in the “Details” column, such as:
Update the old content to make it more relevant
Add more useful content to “beef up” this thin page
Incorporate content from overlapping URLs/pages
Rewrite to avoid internal duplication
Rewrite to avoid external duplication
Reduce image sizes to speed up page load
Create a “responsive” template for this page to fit on mobile devices
Etc.
Content marked for removal should include specific instructions in the “Details” column, such as:
Consolidate this content into the following URL/page marked as “Improve”
Then redirect the URL
Remove this page from the site and allow the URL to return a 410 or 404 HTTP status code. This content has had zero visits within the last 360 days, and has no external links. Then remove or update internal links to this page.
Remove this page from the site and 301 redirect the URL to the following URL marked as “Improve”... Do not incorporate the content into the new page. It is low-quality.
Remove this archive page from search engine indexes with a robots noindex meta tag. Continue to allow the page to be accessed by visitors and crawled by search engines.
Remove this internal search result page from the search engine index with a robots noindex meta tag. Once removed from the index (about 15–30 days later), add the following line to the #BlockedDirectories section of the robots.txt file: Disallow: /search/.
As you can see from the many examples above, sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.
After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.
URL, Action, and Details: These three columns will be used by someone to implement your recommendations. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data-points you’ve collected.
Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.
WARNING!
As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove.
The reporting phase
The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.
Counting actions from Column B
It is useful to count the quantity of each Action along with total organic search traffic and/or revenue for each URL. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
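A minimal Python/pandas sketch of that roll-up is below; it assumes the dashboard has been exported to CSV with columns named roughly as shown, which is an assumption about your column naming rather than a prescribed format:
import pandas as pd

# Hypothetical export of the content audit dashboard.
# Expected columns: URL, Action, Organic Sessions, Revenue.
df = pd.read_csv("content_audit_dashboard.csv")

summary = (
    df.groupby("Action")
      .agg(pages=("URL", "count"),
           organic_sessions=("Organic Sessions", "sum"),
           revenue=("Revenue", "sum"))
      .sort_values("organic_sessions", ascending=False)
)
print(summary)  # e.g., total organic traffic on pages marked "Remove"
These per-Action totals feed directly into the executive summary counts described in Step 5.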
Step 5: Writing up the report
Your analysis and recommendations should be delivered at the same time as the audit dashboard. This report summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.
Here is a real example of an executive summary from one of Inflow's content audit strategies:
As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:
Removal of about 624 pages from Google's index by deletion or consolidation:
203 Pages were marked for Removal with a 404 error (no redirect needed)
110 Pages were marked for Removal with a 301 redirect to another page
311 Pages were marked for Consolidation of content into other pages
Followed by a redirect to the page into which they were consolidated
Rewriting or improving of 668 pages
605 Product Pages are to be rewritten due to use of manufacturer product descriptions (duplicate content), these being prioritized from first to last within the Content Audit.
63 "Other" pages to be rewritten due to low-quality or duplicate content.
Keeping 226 pages as-is
No rewriting or improvements needed
These changes reflect an immediate need to "improve or remove" content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.
The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.
We recommend the following three projects in order of their urgency and/or potential ROI for the site:
Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the "Details" column of the Content Audit Dashboard.
Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.
Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the "Details" column
Content audit resources & further reading
Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum. This thought-provoking post raises the question: How will we perform content inventories without URLs? It helps to know Google is dealing with the exact same problem on a much, much larger scale.
Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.
Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow. An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.
The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic. Praise for the life-changing powers of a good content audit inventory.
Everything You Need to Perform Content Audits
john-dennis · 3 years
Transform Your Team Collaboration With Team Chat Software
When building an organization with many employees, you must ensure that your staff communicates effectively. Smooth and swift communication ensures that information and instructions can be transmitted seamlessly. There are various means and channels through which staff members can communicate effectively, but Team Chat Software stands out from the rest of the pack.
Communication media have undergone numerous changes that have improved their ability to scale. For example, team chat software provides organizations with a simple means to communicate with administrative staff and team members. Some people prefer to refer to it as a business communication tool or channel, but this application is more than just a communication tool. It's powerful enough to connect multiple departments and reach people in other countries.
Recently, team chat apps and collaboration tools have become an increasingly popular solution for small and large companies. If you operate a small remote business, this communication tool lets you hire staff in different locations and quickly integrate them into an efficient communication channel. In addition, this software solution can also foster collaboration between employees and team members. This article will look at how team chat software can spur cooperation among employees.
Let's begin!
How Can Team Chat Software Transform Team Collaboration?
Many organizations have gradually agreed that team chat software can boost their work operations. According to reports gathered by Statista in 2016, 53% of organizations have successfully adopted and implemented team chat software into their work operations. These small and large companies have fallen in love with the increased functionality and scale these apps offer. As communication between employees becomes more crucial than ever, software that can meet those needs must be developed too. Now, there are hundreds of business communication tools to choose from, but it also helps to understand why they are important.
So, how can team chat software inspire more team collaboration in your work operations? The following points explain how these apps can affect workplace interaction:
It can help to manage conversations
In simple terms, team chat software is designed to allow teams to communicate optimally. Everyone and anyone will be carried along as duties, roles, and responsibilities are handed out by administrative officers.
Modern-day team chat software is customizable and can manage conversations on any level. Team chat software is beneficial because it can accommodate different scales of conversation. It can support interaction between two people or the entire organization at once. In certain scenarios, it can also host conversations limited to a single team or solely to administrative officers.
Depending on the manner of operation in the organization, you can have preset channels for specific purposes or create them immediately if needed. This software will also allow workforce members to be tagged into conversations when their input is required.
It has an effective search filter
A crucial aspect of communication between employees is accountability. With this software, it is possible to keep an accurate record of what anyone says. Then, when you need to refer back to a conversation, all you need to do is search for a specific keyword.
This solution is quite impressive because you don't have to remember everything said during meetings. Instead, you can go back and refer to the record. These conversations will be recorded in compliance with workforce regulations. Employees should be encouraged to keep private conversations off these official channels to prevent such content from being discovered unintentionally.
It allows employees to see and hear each other regularly
There's no better way to foster collaboration and teamwork in a workforce than providing a communication channel through which employees can interact via text, voice, and video. Some conversations and meeting types are better organized when participants can see and hear each other, even when they cannot be in the same geographical location. Voice and video communication are now an integral part of most business communication tools. In the long run, these tools boost teamwork and efficiency through clear communication.
It encourages faster decision-making
When members of your workforce can communicate over long distances without having to meet one another physically, your decision-making will be remarkably faster. To make crucial decisions, employees no longer have to travel long distances. Instead, they can log on to the communication tool and air their views. As a result, different teams and departments in the organization can meet on short notice and ponder important issues. This ability to make prompt decisions can help you respond to unplanned market changes and other forms of crisis.
It allows joint sharing and storage of files
Quick sharing and storage of files can save time spent on work operations. When using team chat software, every member of the organization will be provided with equal access to a database of files. Employees will also be able to share files quickly with whoever needs them. These shared files will be stored in the conversation history and readily made available when they are required. They can be found with the tool's search filter and used for future reference.
Conclusion
Finally, we've come to the end of this article. That's all on how team chat software can boost teamwork and team collaboration. Before the advent of this software solution, organizations had to rely on manual and less effective communication channels. These channels were slow and failed to involve more than a few employees at once. However, modern-day innovative solutions have taken care of all these problems. If you're an organization that cares about staff collaboration, you should consider investing in a business communication tool. You'll be taking a significant step towards boosting your efficiency and productivity. It could be the final piece that you need for an autonomous unit.
neilmberry · 8 years
How to Do a Content Audit [Updated for 2017]
Posted by Everett
This guide provides instructions on how to do a content audit using examples and screenshots from Screaming Frog, URL Profiler, Google Analytics (GA), and Excel, as those seem to be the most widely used and versatile tools for performing content audits.
It's been almost three years since the original “How to do a Content Audit – Step-by-Step” tutorial was published here on Moz, and it’s due for a refresh. This version includes updates covering JavaScript rendering, crawling dynamic mobile sites, and more.
It also provides less detail than the first in terms of prescribing every step in the process. This is because our internal processes change often, as do the tools. I’ve also seen many other processes out there that I would consider good approaches. Rather than forcing a specific process and publishing something that may be obsolete in six months, this tutorial aims to allow for a variety of processes and tools by focusing more on the basic concepts and less on the specifics of each step.
We have a DeepCrawl account at Inflow, and a specific process for that tool, as well as several others. Tapping directly into various APIs may be preferable to using a middleware product like URL Profiler if one has development resources. There are also custom in-house tools out there, some of which incorporate historic log file data and can efficiently crawl websites like the New York Times and eBay. Whether you use GA or Adobe Sitecatalyst, Excel, or a SQL database, the underlying process of conducting a content audit shouldn’t change much.
TABLE OF CONTENTS
What is an SEO content audit?
What is the purpose of a content audit?
How & why “pruning” works
How to do a content audit
The inventory & audit phase
Step 1: Crawl all indexable URLs
Crawling roadblocks & new technologies
Crawling very large websites
Crawling dynamic mobile sites
Crawling and rendering JavaScript
Step 2: Gather additional metrics
Things you don’t need when analyzing the data
The analysis & recommendations phase
Step 3: Put it all into a dashboard
Step 4: Work the content audit dashboard
The reporting phase
Step 5: Writing up the report
Content audit resources & further reading
What is a content audit?
A content audit for the purpose of SEO includes a full inventory of all indexable content on a domain, which is then analyzed using performance metrics from a variety of sources to determine which content to keep as-is, which to improve, and which to remove or consolidate.
What is the purpose of a content audit?
A content audit can have many purposes and desired outcomes. In terms of SEO, it is often used to determine the following:
How to escape a content-related search engine ranking filter or penalty
Content that requires copywriting/editing for improved quality
Content that needs to be updated and made more current
Content that should be consolidated due to overlapping topics
Content that should be removed from the site
The best way to prioritize the editing or removal of content
Content gap opportunities
Which content is ranking for which keywords
Which content should be ranking for which keywords
The strongest pages on a domain and how to leverage them
Undiscovered content marketing opportunities
Due diligence when buying/selling websites or onboarding new clients
While each of these desired outcomes and insights is a valuable result of a content audit, I would define the overall “purpose” of one as:
The purpose of a content audit for SEO is to improve the perceived trust and quality of a domain, while optimizing crawl budget and the flow of PageRank (PR) and other ranking signals throughout the site.
Often, but not always, a big part of achieving these goals involves the removal of low-quality content from search engine indexes. I’ve been told people hate this word, but I prefer the “pruning” analogy to describe the concept.
How & why “pruning” works
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove. Optimizing crawl budget and the flow of PR is self-explanatory to most SEOs. But how does a content audit improve the perceived trust and quality of a domain? By removing low-quality content from the index (pruning) and improving some of the content remaining in the index, the likelihood that someone arrives on your site through organic search and has a poor user experience (indicated to Google in a variety of ways) is lowered. Thus, the quality of the domain improves. I’ve explained the concept here and here.
Others have since shared some likely theories of their own, including a larger focus on the redistribution of PR.
Case study after case study has shown the concept of “pruning” (removing low-quality content from search engine indexes) to be effective, especially on very large websites with hundreds of thousands (or even millions) of indexable URLs. So why do content audits work? Lots of reasons. But really...
Does it matter?
¯\_(ツ)_/¯
How to do a content audit
Just like anything in SEO, from technical and on-page changes to site migrations, things can go horribly wrong when content audits aren’t conducted properly. The most common example would be removing URLs that have external links because link metrics weren’t analyzed as part of the audit. Another common mistake is confusing removal from search engine indexes with removal from the website.
Content audits start with taking an inventory of all content available for indexation by search engines. This content is then analyzed against a variety of metrics and given one of three “Action” determinations. The “Details” of each Action are then expanded upon.
The combinations of the “Action” (WHAT to do) and the “Details” (HOW, and sometimes why, to do it) are as varied as the strategies, sites, and tactics themselves. Below are a few hypothetical examples:
You now have a basic overview of how to perform a content audit. More specific instructions can be found below.
The process can be roughly split into three distinct phases:
Inventory & audit
Analysis & recommendations
Summary & reporting
The inventory & audit phase
Taking an inventory of all content, and related metrics, begins with crawling the site.
One difference between crawling for content audits and technical audits:
Technical SEO audit crawls are concerned with all crawlable content (among other things).
Content audit crawls for the purpose of SEO are concerned with all indexable content.
The URL in the image below should be considered non-indexable. Even if it isn’t blocked in the robots.txt file, with a robots meta tag, or an X-robots header response –– even if it is frequently crawled by Google and shows up as a URL in Google Analytics and Search Console –– the rel =”canonical” tag shown below essentially acts like a 301 redirect, telling Google not to display the non-canonical URL in search results and to apply all ranking calculations to the canonical version. In other words, not to “index” it.
I'm not sure “index” is the best word, though. To “display” or “return” in the SERPs is a better way of describing it, as Google surely records canonicalized URL variants somewhere, and advanced site: queries seem to show them in a way that is consistent with the "supplemental index" of yesteryear. But that's another post, more suitably written by a brighter mind like Bill Slawski.
A URL with a query string that canonicalizes to a version without the query string can be considered “not indexable.”
A content audit can safely ignore these types of situations, which could mean drastically reducing the amount of time and memory taken up by a crawl.
Technical SEO audits, on the other hand, should be concerned with every URL a crawler can find. Non-indexable URLs can reveal a lot of technical issues, from spider traps (e.g. never-ending empty pagination, infinite loops via redirect or canonical tag) to crawl budget optimization (e.g. How many facets/filters deep to allow crawling? 5? 6? 7?) and more.
It is for this reason that trying to combine a technical SEO audit with a content audit often turns into a giant mess, even though it seems like an efficient idea in theory. When dealing with a lot of data, I find it easier to focus on one or the other: all crawlable URLs, or all indexable URLs.
Orphaned pages (i.e., with no internal links / navigation path) sometimes don’t turn up in technical SEO audits if the crawler had no way to find them. Content audits should discover any indexable content, whether it is linked to internally or not. Side note: A good tech audit would do this, too.
Identifying URLs that should be indexed but are not is something that typically happens during technical SEO audits.
However, if you're having trouble getting deep pages indexed when they should be, content audits may help determine how to optimize crawl budget and herd bots more efficiently into those important, deep pages. Also, many times Google chooses not to display/index a URL in the SERPs due to poor content quality (i.e., thin or duplicate).
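For illustration, here is a minimal Python sketch of that crawlable-vs-indexable distinction. It assumes, for content-audit purposes, that a URL only counts as indexable if it returns a 200, carries no noindex directive (meta tag or X-Robots-Tag header), and either has no canonical tag or canonicalizes to itself; the example URL and helper name are hypothetical, and robots.txt is not checked here.

import requests
from bs4 import BeautifulSoup

def is_indexable(url, timeout=10):
    # 200 response required
    resp = requests.get(url, timeout=timeout)
    if resp.status_code != 200:
        return False
    # noindex via HTTP header
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return False
    soup = BeautifulSoup(resp.text, "html.parser")
    # noindex via robots meta tag
    robots = soup.find("meta", attrs={"name": "robots"})
    if robots and "noindex" in robots.get("content", "").lower():
        return False
    # A canonical pointing elsewhere means "not indexable" for our purposes
    canonical = soup.find("link", rel="canonical")
    if canonical and canonical.get("href", "").rstrip("/") != url.rstrip("/"):
        return False
    return True

# Example (hypothetical URL):
# print(is_indexable("https://www.example.com/widgets/?sort=color"))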
All of this is changing rapidly, though. URLs as the unique identifier in Google’s index are probably going away. Yes, we’ll still have URLs, but not everything requires them. So far, the words “content” and “URL” have been mostly interchangeable. But some URLs contain an entire application’s worth of content. How to do a content audit in that world is something we’ll have to figure out soon, but only after Google figures out how to organize the web’s information in that same world. From the looks of things, we still have a year or two.
Until then, the process below should handle most situations.
Step 1: Crawl all indexable URLs
A good place to start on most websites is a full Screaming Frog crawl. However, some indexable content might be missed this way. It is not recommended that you rely on a crawler as the source for all indexable URLs.
In addition to the crawler, collect URLs from Google Analytics, Google Search Console (formerly Google Webmaster Tools), XML sitemaps, and, if possible, from an internal database, such as an export of all product and category URLs on an eCommerce website. These can then be crawled in “list mode” separately, then added to your main list of URLs and deduplicated to produce a more comprehensive list of indexable URLs.
Some URLs found via GA, XML sitemaps, and other non-crawl sources may not actually be “indexable.” These should be excluded. One strategy that works here is to combine and deduplicate all of the URL “lists,” and then perform a crawl in list mode. Once crawled, remove all URLs with robots meta or X-Robots noindex tags, as well as any URL returning error codes and those that are blocked by the robots.txt file, etc. At this point, you can safely add these URLs to the file containing indexable URLs from the crawl. Once again, deduplicate the list.
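As a rough sketch of that combine-and-deduplicate step, the short Python script below merges one-URL-per-line exports from each source into a single list ready for a list-mode crawl. The file names are hypothetical placeholders for your own exports.

# Hypothetical one-URL-per-line exports from each source
sources = [
    "screaming_frog_internal_html.txt",
    "ga_organic_landing_pages.txt",
    "search_console_pages.txt",
    "xml_sitemap_urls.txt",
]

urls = set()
for path in sources:
    with open(path, encoding="utf-8") as f:
        for line in f:
            url = line.strip()
            if url:
                # Further normalization (trailing slashes, host casing) is a judgment call
                urls.add(url)

with open("combined_candidate_urls.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(sorted(urls)))

print(len(urls), "unique candidate URLs ready for a list-mode crawl")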
Crawling roadblocks & new technologies
Crawling very large websites
First and foremost, you do not need to crawl every URL on the site. Be concerned with indexable content. This is not a technical SEO audit.
Avoid crawling unnecessary URLs
Some of the things you can avoid crawling and adding to the content audit in many cases include:
Noindexed or robots.txt-blocked URLs
4XX and 5XX errors
Redirecting URLs and those that canonicalize to a different URL
Images, CSS, JavaScript, and SWF files
Segment the site into crawlable chunks
You can often get Screaming Frog to completely crawl a single directory at a time if the site is too large to crawl all at once.
Filter out URL patterns you plan to remove from the index
Let’s say you’re auditing a domain on WordPress and you notice early in the crawl that /tag/ pages are indexable. A quick site:domain.com inurl:tag search on Google tells you there are about 10 million of them. A quick look at Google Analytics confirms that URLs in the /tag/ directory are not responsible for very much revenue from organic search. It would be safe to say that the “Action” on these URLs should be “Remove” and the “Details” should read something like this: Remove /tag/ URLs from the index with a robots noindex,follow meta tag. More advice on this strategy can be found here.
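Before committing to a hatchet decision like that, it only takes a few lines to total what the pattern actually contributes. A minimal pandas sketch, assuming a GA organic landing-page export with "Landing Page", "Sessions", and "Revenue" columns; the file and column names are hypothetical.

import pandas as pd

ga = pd.read_csv("ga_organic_landing_pages.csv")  # hypothetical GA export

tag_pages = ga[ga["Landing Page"].str.contains("/tag/", na=False)]
print("Tag pages in GA:", len(tag_pages))
print("Organic sessions:", tag_pages["Sessions"].sum())
print("Organic revenue:", tag_pages["Revenue"].sum())
print("Share of organic revenue: {:.2%}".format(
    tag_pages["Revenue"].sum() / ga["Revenue"].sum()))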
Upgrade your machine
Install additional RAM on your computer, which is used by Screaming Frog to hold data during the crawl. This has the added benefit of improving Excel performance, which can also be a major roadblock.
You can also install Screaming Frog on Amazon Web Services (AWS), as described in this post on iPullRank.
Tune up your tools
Screaming Frog provides several ways for SEOs to get more out of the crawler. This includes adjusting the speed, max threads, search depth, query strings, timeouts, retries, and the amount of RAM available to the program. Leave at least 3GB off limits to the spider to avoid catastrophic freezing of the entire machine and loss of data. You can learn more about tuning up Screaming Frog here and here.
Try other tools
I’m convinced that there's a ton of wasted bandwidth on most content audit projects due to strategists releasing a crawler and allowing it to chew through an entire domain, whether the URLs are indexable or not. People run Screaming Frog without saving the crawl intermittently, without adding more RAM availability, without filtering out the nonsense, and without using any of the crawl customization features available to them.
That said, sometimes SF just doesn’t get the job done. We also have a process specific to DeepCrawl, and have used Botify, as well as other tools. They each have their pros and cons. I still prefer Screaming Frog for crawling and URL Profiler for fetching metrics in most cases.
Crawling dynamic mobile sites
This refers to a specific type of mobile setup in which there are two code-bases –– one for mobile and one for desktop –– but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting that URL. In such cases, you will essentially be performing two separate content audits. Proceed as usual for the desktop version. Below are instructions for crawling the mobile version.
Crawling a dynamic mobile site for a content audit will require changing the User-Agent of the crawler, as shown here under Screaming Frog’s “Configure ---> HTTP Header” menu:
The important thing to remember when working on mobile dynamic websites is that you're only taking an inventory of indexable URLs on one version of the site or the other. Once the two inventories are taken, you can then compare them to uncover any unintentional issues.
Some examples of what this process can find in a technical SEO audit include situations in which titles, descriptions, canonical tags, robots meta, rel next/prev, and other important elements do not match between the two versions of the page. It's vital that the mobile and desktop version of each page have parity when it comes to these essentials.
It's easy for the mobile version of a historically desktop-first website to end up providing conflicting instructions to search engines because it's not often “automatically changed” when the desktop version changes. A good example here is a website I recently looked at with about 20 million URLs, all of which had the following title tag when loaded by a mobile user (including Google): BRAND NAME - MOBILE SITE. Imagine the consequences of that once a mobile-first algorithm truly rolls out.
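To spot parity problems like that at scale, you can fetch a sample of URLs twice, once with a desktop and once with a smartphone User-Agent, and compare the key elements. A rough Python sketch follows; the sample URLs are placeholders and the User-Agent strings are only illustrative.

import requests
from bs4 import BeautifulSoup

DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
MOBILE_UA = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
             "Mobile Safari/537.36")

def grab(url, ua):
    # Return the title and canonical href as seen by this user agent
    html = requests.get(url, headers={"User-Agent": ua}, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string.strip() if soup.title and soup.title.string else ""
    canonical = soup.find("link", rel="canonical")
    return title, (canonical.get("href", "") if canonical else "")

for url in ["https://www.example.com/", "https://www.example.com/category/"]:
    desktop = grab(url, DESKTOP_UA)
    mobile = grab(url, MOBILE_UA)
    if desktop != mobile:
        print("Mismatch on", url, "desktop:", desktop, "mobile:", mobile)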
Crawling and rendering JavaScript
One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.
Most crawlers have made a lot of progress lately when it comes to crawling and rendering JavaScript content. Now, it’s as easy as changing a few settings, as shown below with Screaming Frog.
When crawling URLs with #! , use the “Old AJAX Crawling Scheme.” Otherwise, select “JavaScript” from the “Rendering” tab when configuring your Screaming Frog SEO Spider to crawl JavaScript websites.
How do you know if you’re dealing with a JavaScript website?
First of all, most websites these days are going to be using some sort of JavaScript technology, though more often than not (so far) these will be rendered by the “client” (i.e., by your browser). An example would be the .js file that controls the behavior of a form or interactive tool.
What we’re discussing here is when the JavaScript is used “server-side” and needs to be executed in order to render the page.
JavaScript libraries and frameworks are used to develop single-page web apps and highly interactive websites. Below are a few different things that should alert you to this challenge:
The URLs contain #! (hashbangs). For example: http://ift.tt/2nQK6ch (AJAX)
Content-rich pages with only a few lines of code (and no iframes) when viewing the source code.
What looks like unrendered server-side or template code in the meta tags, rather than the actual content of the tag, when viewing the source.
You can also use the BuiltWith Technology Profiler or the Library Detector plugins for Chrome, which show the JavaScript libraries being used on a page in the address bar.
Not all websites built primarily with JavaScript require special attention to crawl settings. Some websites use pre-rendering services like Brombone or Prerender.io to serve the crawler a fully rendered version of the page. Others use isomorphic JavaScript to accomplish the same thing.
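If you want a quick programmatic signal before deciding on rendering settings, you can look for framework fingerprints in the raw, unrendered HTML. A simple and intentionally imperfect heuristic in Python; the signature patterns are illustrative, not exhaustive.

import re
import requests

# Rough fingerprints for common frameworks (by no means complete)
SIGNATURES = {
    "React": [r"react(\.min)?\.js", r"data-reactroot", r'id="root"></div>'],
    "Angular": [r"angular(\.min)?\.js", r"ng-app", r"<app-root>"],
    "Ember": [r"ember(\.min)?\.js", r'id="ember\d+"'],
    "Vue": [r"vue(\.min)?\.js", r'id="app"></div>'],
}

def framework_hints(url):
    html = requests.get(url, timeout=10).text
    hits = sorted(name for name, patterns in SIGNATURES.items()
                  if any(re.search(p, html, re.IGNORECASE) for p in patterns))
    # Very little visible text in the raw source is another hint of client-side rendering
    text_ratio = len(re.sub(r"<[^>]+>", "", html)) / max(len(html), 1)
    return hits, round(text_ratio, 3)

# print(framework_hints("https://www.example.com/"))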
Step 2: Gather additional metrics
Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.
Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.
Once the Screaming Frog scan is complete (only crawling indexable content), export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).
This is what my URL Profiler settings look like for a typical content audit of a small- or medium-sized site. Also, under “Accounts” I have connected via API keys to Moz and SEMrush.
Once URL Profiler is finished, you should end up with something like this:
Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.
The risk of getting analytics data from a third-party tool
We've noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes incorporating Analytics Canvas to get the full, unsampled data from GA. Then VLookups are used in the spreadsheet to combine the data, with URL being the unique identifier.
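If you would rather do that join outside of Excel, the VLookup step translates directly to a pandas merge. A minimal sketch, assuming a crawler/URL Profiler export and an unsampled GA landing-page export that both contain full URLs (if GA exports paths only, prefix the domain first); the file and column names are hypothetical.

import pandas as pd

crawl = pd.read_csv("url_profiler_combined_data.csv")   # one row per indexable URL
ga = pd.read_csv("ga_unsampled_landing_pages.csv")      # organic segment export

# Normalize the join key the same way on both sides
crawl["join_url"] = crawl["URL"].str.strip().str.rstrip("/")
ga["join_url"] = ga["Landing Page"].str.strip().str.rstrip("/")

audit = crawl.merge(
    ga[["join_url", "Sessions", "Revenue", "Transactions"]],
    on="join_url",
    how="left",   # keep every indexable URL, even those with no GA rows
)
audit[["Sessions", "Revenue", "Transactions"]] = (
    audit[["Sessions", "Revenue", "Transactions"]].fillna(0)
)
audit.to_csv("content_audit_dashboard.csv", index=False)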
Metrics to pull for each URL:
Indexed or not?
If crawlers are set up properly, all URLs should be “indexable.”
A non-indexed URL is often a sign of an uncrawled or low-quality page.
Content uniqueness
Copyscape, Siteliner, and now URL Profiler can provide this data.
Traffic from organic search
Typically 90 days
Keep a consistent timeframe across all metrics.
Revenue and/or conversions
You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
Publish date
If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
Internal links
Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
External links
These can come from Moz, SEMRush, and a variety of other tools, most of which integrate natively or via APIs with URL Profiler.
Landing pages resulting in low time-on-site
Take this one with a grain of salt. If visitors found what they want because the content was good, that’s not a bad metric. A better proxy for this would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
Landing pages resulting in Low Pages-Per-Visit
Just like with Time-On-Site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
Response code
Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that's the case on your domain.
Canonical tag
Typically only URLs with a self-referencing rel=“canonical” tag should be considered “indexable.” You may not require this metric in the final data if that's the case on your domain.
Page speed and mobile-friendliness
Again, URL Profiler comes through with its Google PageSpeed Insights API integration.
Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).
Things you don’t need when analyzing the data
URL Profiler and Screaming Frog tabs: Just keep the “combined data” tab and immediately cut the amount of data in the spreadsheet by about half.
Content Type: Filtering by Content Type (e.g., text/html, image, PDF, CSS, JavaScript) and removing any URL that is of no concern in your content audit is a good way to speed up the process.
Technically speaking, images can be indexable content. However, I prefer to deal with them separately for now.
Filtering unnecessary file types out like I've done in the screenshot above improves focus, but doesn’t improve performance very much. A better option would be to first select the file types you don’t want, apply the filter, delete the rows you don’t want, and then go back to the filter options and “(Select All).”
Once you have only the content types you want, it may now be possible to simply delete the entire Content Type column.
Status Code and Status: You only need one or the other. I prefer to keep the Code, and delete the Status column.
Length and Pixels: You only need one or the other. I prefer to keep the Pixels, and delete the Length column. This applies to all Title and Meta Description columns.
Meta Keywords: Delete the columns. If those cells have content, consider removing that tag from the site.
DNS Safe URL, Path, Domain, Root, and TLD: You should really only be working on a single top-level domain. Content audits for subdomains should probably be done separately. Thus, these columns can be deleted in most cases.
Duplicate Columns: You should have two columns for the URL (the “Address” in column A from URL Profiler, and the “URL” column from Screaming Frog). Similarly, there may also be two columns each for HTTP Status and Status Code. It depends on the settings selected in both tools, but there are sure to be some overlaps, which can be removed to reduce the file size, enhance focus, and speed up the process.
Blank Columns: Keep the filter tool active and go through each column. Those with only blank cells can be deleted. The example below shows that column BK (Robots HTTP Header) can be removed from the spreadsheet.
[You can save a lot of headspace by hiding or removing blank columns.]
Single-Value Columns: If the column contains only one value, it can usually be removed. The screenshot below shows our non-secure site does not have any HTTPS URLs, as expected. I can now remove the column. Also, I guess it’s probably time I get that HTTPS migration project scheduled.
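If the file is still too heavy for Excel, the same blank-column and single-value-column cleanup can be scripted. A small pandas sketch of the idea; the input file name is hypothetical.

import pandas as pd

df = pd.read_csv("content_audit_dashboard.csv")

before = df.shape[1]
df = df.dropna(axis=1, how="all")  # drop columns that are entirely blank
single_value = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
df = df.drop(columns=single_value)  # drop columns holding only one value

print("Removed", before - df.shape[1], "columns in total; single-value columns:", single_value)
df.to_csv("content_audit_dashboard_trimmed.csv", index=False)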
Hopefully by now you've made a significant dent in reducing the overall size of the file and time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.
The analysis & recommendations phase
Here's where the fun really begins. In a large organization, it's tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.
Step 3: Put it all into a dashboard
Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it's ready for the client. If Excel is running slow, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.
Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.
Use Data Validation and a drop-down selector to limit Action options.
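If you build the dashboard programmatically rather than by hand, the same drop-down can be added with openpyxl. A sketch under the assumption that the dashboard already exists as an .xlsx file with the Action column in column B; the file name and cell range are hypothetical, and some audits add a fourth “Consolidate” option.

from openpyxl import load_workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = load_workbook("content_audit_dashboard.xlsx")
ws = wb.active

# Restrict the Action column to the allowed values
dv = DataValidation(
    type="list",
    formula1='"Leave As-Is,Improve,Remove"',
    allow_blank=True,
)
ws.add_data_validation(dv)
dv.add("B2:B5000")  # adjust the range to the number of rows in your dashboard

wb.save("content_audit_dashboard.xlsx")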
Step 4: Work the content audit dashboard
All of the data you need should now be right in front of you. This step can’t be turned into a repeatable process for every content audit. From here on the actual step-by-step process becomes much more open to interpretation and your own experience. You may do some of them and not others. You may do them a little differently. That's all fine, as long as you're working toward the goal of determining what to do, if anything, for each piece of content on the website.
A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.
Causes of content-related penalties
These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of issues, which are detailed below.
Typical low-quality content
Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate...
Completely irrelevant content
OK in small amounts, but often entire blogs are full of it.
A typical example would be a "linkbait" piece circa 2010.
Thin/short content
Glossed over the topic, too few words, or all image-based content.
Curated content with no added value
Comprised almost entirely of bits and pieces of content that exists elsewhere.
Misleading optimization
Titles or keywords targeting queries for which content doesn't answer or deserve to rank.
Generally not providing the information the visitor was expecting to find.
Duplicate content
Internally duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.).
Externally duplicated (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.)
Stub pages (e.g., "No content is here yet, but if you sign in and leave some user-generated-content, then we'll have content here for the next guy." By the way, want our newsletter? Click an AD!)
Indexable internal search results
Too many indexable blog tag or blog category pages
And so on and so forth...
It helps to sort the data in various ways to see what’s going on. Below are a few different things to look for if you’re having trouble getting started.
Sort by duplicate content risk
URL Profiler now has a native duplicate content checker. Other options are Copyscape (for external duplicate content) and Siteliner (for internal duplicate content).
Which of these pages should be rewritten?
Rewrite key/important pages, such as categories, home page, top products
Rewrite pages with good link and social metrics
Rewrite pages with good traffic
After selecting "Improve" in the Action column, elaborate in the Details column:
"Improve these pages by writing unique, useful content to improve the Copyscape risk score."
Which of these pages should be removed/pruned?
Remove guest posts that were published elsewhere
Remove anything the client plagiarized
Remove content that isn't worth rewriting, such as:
No external links, no social shares, and very few or no entrances/visits
After selecting "Remove" from the Action column, elaborate in the Details column:
"Prune from site to remove duplicate content. This URL has no links or shares and very little traffic. We recommend allowing the URL to return 404 or 410 response code. Remove all internal links, including from the sitemap."
Which of these pages should be consolidated into others?
Presumably none, since the content is already externally duplicated.
Which of these pages should be left “As-Is”?
Important pages which have had their content stolen
Sort by entrances or visits (filtering out any that were already finished)
Which of these pages should be marked as "Improve"?
Pages with high visits/entrances but low conversion, time-on-site, pageviews per session, etc.
Key pages that require improvement determined after a manual review of the page.
Which of these pages should be marked as "Consolidate"?
When you have overlapping topics that don't provide much unique value of their own, but could make a great resource when combined.
Mark the page in the set with the best metrics as "Improve" and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
Mark the pages that are to be consolidated into the canonical page as "Consolidate" and provide further instructions in the Details column, such as:
Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
Update all internal links.
Campaign-based or seasonal pages that could be consolidated into a single "Evergreen" landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 ---> Best Sellers).
Which of these pages should be marked as "Remove"?
Pages with poor link, traffic, and social metrics related to low-quality content that isn't worth updating
Typically these will be allowed to 404/410.
Irrelevant content
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Out-of-date content that isn't worth updating or consolidating
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Which of these pages should be marked as "Leave As-Is"?
Pages with good traffic, conversions, time on site, etc. that also have good content.
These may or may not have any decent external links.
Taking the hatchet to bloated websites
For big sites, it's best to use a hatchet-based approach as much as possible, and finish up with a scalpel in the end. Otherwise, you'll spend way too much time on the project, which eats into the ROI.
This is not a process that can be documented step-by-step. For the purpose of illustration, however, below are a few different examples of hatchet approaches and when to consider using them.
Parameter-based URLs that shouldn't be indexed
Defer to the technical audit, if applicable. Otherwise, use your best judgment:
e.g., /?sort=color, &size=small
Assuming the tech audit didn't suggest otherwise, these pages could all be handled in one fell swoop. Below is an example Action and example Details for such a page:
Action = Remove
Details = Rel canonical to the base page without the parameter
Internal search results
Defer to the technical audit if applicable. Otherwise, use your best judgment:
e.g., /search/keyword-phrase/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /search/ in the robots.txt file.
Blog tag pages
Defer to the technical audit if applicable. Otherwise:
e.g., /blog/tag/green-widgets/ , blog/tag/blue-widgets/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /blog/tag/ in the robots.txt file.
E-commerce product pages with manufacturer descriptions
In cases where the "Page Type" is known (i.e., it's in the URL or was provided in a CMS export) and Risk Score indicates duplication:
e.g., /product/product-name/
Assuming the tech audit didn't suggest otherwise:
Action = Improve
Details = Rewrite to improve product description and avoid duplicate content
E-commerce category pages with no static content
In cases where the "Page Type" is known:
e.g. /category/category-name/ or category/cat1/cat2/
Assuming NONE of the category pages have content:
Action = Improve
Details = Write 2–3 sentences of unique, useful content that explains choices, next steps, or benefits to the visitor looking to choose a product from the category.
Out-of-date blog posts, articles, and other landing pages
In cases where the title tag includes a date, or...
In cases where the URL indicates the publishing date:
Action = Improve
Details = Update the post to make it more current, if applicable. Otherwise, change Action to "Remove" and customize the Strategy based on links and traffic (i.e., 301 or 404).
Content marked for improvement should lay out more specific instructions in the “Details” column, such as:
Update the old content to make it more relevant
Add more useful content to “beef up” this thin page
Incorporate content from overlapping URLs/pages
Rewrite to avoid internal duplication
Rewrite to avoid external duplication
Reduce image sizes to speed up page load
Create a “responsive” template for this page to fit on mobile devices
Etc.
Content marked for removal should include specific instructions in the “Details” column, such as:
Consolidate this content into the following URL/page marked as “Improve”
Then redirect the URL
Remove this page from the site and allow the URL to return a 410 or 404 HTTP status code. This content has had zero visits within the last 360 days, and has no external links. Then remove or update internal links to this page.
Remove this page from the site and 301 redirect the URL to the following URL marked as “Improve”... Do not incorporate the content into the new page. It is low-quality.
Remove this archive page from search engine indexes with a robots noindex meta tag. Continue to allow the page to be accessed by visitors and crawled by search engines.
Remove this internal search result page from the search engine indexes with a robots noindex meta tag. Once removed from the index (about 15–30 days later), add the following line to the #BlockedDirectories section of the robots.txt file: Disallow: /search/.
As you can see from the many examples above, sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.
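That hatchet-style, pattern-based pass lends itself well to a scripted first draft that a human then reviews row by row. A rough pandas sketch, assuming the dashboard has URL, Action, and Details columns; the patterns and wording below are only examples, not recommendations for your site.

import pandas as pd

df = pd.read_csv("content_audit_dashboard_trimmed.csv")

for col in ("Action", "Details"):
    if col not in df.columns:
        df[col] = pd.NA

rules = [
    # (URL substring, Action, Details): hypothetical examples
    ("/search/", "Remove",
     "Apply a robots noindex meta tag; disallow /search/ in robots.txt once deindexed."),
    ("/tag/", "Remove",
     "Apply a robots noindex,follow meta tag to prune blog tag pages."),
    ("?sort=", "Remove",
     "Canonicalize to the base page without the parameter."),
]

for pattern, action, details in rules:
    mask = df["URL"].str.contains(pattern, regex=False) & df["Action"].isna()
    df.loc[mask, ["Action", "Details"]] = [action, details]

print(df["Action"].value_counts(dropna=False))
df.to_csv("content_audit_dashboard_draft.csv", index=False)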
After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.
URL, Action, and Details: These three columns will be used by someone to implement your recommendations. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data-points you’ve collected.
Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.
WARNING!
As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove.
The reporting phase
The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.
Counting actions from Column B
It is useful to count the quantity of each Action along with total organic search traffic and/or revenue for each URL. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
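In pandas, that summary is a short groupby. A sketch, assuming the finished dashboard has URL, Action, Sessions, and Revenue columns; the file and column names are hypothetical.

import pandas as pd

df = pd.read_csv("content_audit_dashboard_final.csv")

summary = (
    df.groupby("Action", dropna=False)
      .agg(pages=("URL", "count"),
           organic_sessions=("Sessions", "sum"),
           organic_revenue=("Revenue", "sum"))
      .sort_values("pages", ascending=False)
)
print(summary)

These totals feed directly into the executive summary, e.g., how much organic traffic currently lands on pages marked “Remove.”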
Step 5: Writing up the report
Your analysis and recommendations should be delivered at the same time as the audit dashboard. It summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.
Here is a real example of an executive summary from one of Inflow's content audit strategies:
As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:
Removal of about 624 pages from the Google index by deletion or consolidation:
203 Pages were marked for Removal with a 404 error (no redirect needed)
110 Pages were marked for Removal with a 301 redirect to another page
311 Pages were marked for Consolidation of content into other pages
Followed by a redirect to the page into which they were consolidated
Rewriting or improving of 668 pages
605 Product Pages are to be rewritten because they use manufacturer product descriptions (duplicate content), prioritized from first to last within the Content Audit.
63 "Other" pages to be rewritten due to low-quality or duplicate content.
Keeping 226 pages as-is
No rewriting or improvements needed
These changes reflect an immediate need to "improve or remove" content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.
The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.
We recommend the following three projects in order of their urgency and/or potential ROI for the site:
Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the "Details" column of the Content Audit Dashboard.
Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.
Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the “Details” column.
Content audit resources & further reading
Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum. This thought-provoking post raises the question: How will we perform content inventories without URLs? It helps to know Google is dealing with the exact same problem on a much, much larger scale.
Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.
Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow. An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.
The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic. Praise for the life-changing powers of a good content audit inventory.
Everything You Need to Perform Content Audits
This guide provides instructions on how to do a content audit using examples and screenshots from Screaming Frog, URL Profiler, Google Analytics (GA), and Excel, as those seem to be the most widely used and versatile tools for performing content audits.
{Expand for more background}
It's been almost three years since the original “How to do a Content Audit – Step-by-Step” tutorial was published here on Moz, and it’s due for a refresh. This version includes updates covering JavaScript rendering, crawling dynamic mobile sites, and more.
It also provides less detail than the first in terms of prescribing every step in the process. This is because our internal processes change often, as do the tools. I’ve also seen many other processes out there that I would consider good approaches. Rather than forcing a specific process and publishing something that may be obsolete in six months, this tutorial aims to allow for a variety of processes and tools by focusing more on the basic concepts and less on the specifics of each step.
We have a DeepCrawl account at Inflow, and a specific process for that tool, as well as several others. Tapping directly into various APIs may be preferable to using a middleware product like URL Profiler if one has development resources. There are also custom in-house tools out there, some of which incorporate historic log file data and can efficiently crawl websites like the New York Times and eBay. Whether you use GA or Adobe Sitecatalyst, Excel, or a SQL database, the underlying process of conducting a content audit shouldn’t change much.
TABLE OF CONTENTS
What is an SEO content audit?
What is the purpose of a content audit?
How & why “pruning” works
How to do a content audit
The inventory & audit phase
Step 1: Crawl all indexable URLs
Crawling roadblocks & new technologies
Crawling very large websites
Crawling dynamic mobile sites
Crawling and rendering JavaScript
Step 2: Gather additional metrics
Things you don’t need when analyzing the data
The analysis & recommendations phase
Step 3: Put it all into a dashboard
Step 4: Work the content audit dashboard
The reporting phase
Step 5: Writing up the report
Content audit resources & further reading
What is a content audit?
A content audit for the purpose of SEO includes a full inventory of all indexable content on a domain, which is then analyzed using performance metrics from a variety of sources to determine which content to keep as-is, which to improve, and which to remove or consolidate.
What is the purpose of a content audit?
A content audit can have many purposes and desired outcomes. In terms of SEO, they are often used to determine the following:
How to escape a content-related search engine ranking filter or penalty
Content that requires copywriting/editing for improved quality
Content that needs to be updated and made more current
Content that should be consolidated due to overlapping topics
Content that should be removed from the site
The best way to prioritize the editing or removal of content
Content gap opportunities
Which content is ranking for which keywords
Which content should be ranking for which keywords
The strongest pages on a domain and how to leverage them
Undiscovered content marketing opportunities
Due diligence when buying/selling websites or onboarding new clients
While each of these desired outcomes and insights are valuable results of a content audit, I would define the overall “purpose” of one as:
The purpose of a content audit for SEO is to improve the perceived trust and quality of a domain, while optimizing crawl budget and the flow of PageRank (PR) and other ranking signals throughout the site.
Often, but not always, a big part of achieving these goals involves the removal of low-quality content from search engine indexes. I’ve been told people hate this word, but I prefer the “pruning” analogy to describe the concept.
How & why “pruning” works
{Expand for more on pruning}
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove. Optimizing crawl budget and the flow of PR is self-explanatory to most SEOs. But how does a content audit improve the perceived trust and quality of a domain? By removing low-quality content from the index (pruning) and improving some of the content remaining in the index, the likelihood that someone arrives on your site through organic search and has a poor user experience (indicated to Google in a variety of ways) is lowered. Thus, the quality of the domain improves. I’ve explained the concept here and here.
Others have since shared some likely theories of their own, including a larger focus on the redistribution of PR.
Case study after case study has shown the concept of “pruning” (removing low-quality content from search engine indexes) to be effective, especially on very large websites with hundreds of thousands (or even millions) of indexable URLs. So why do content audits work? Lots of reasons. But really...
Does it matter?
¯\_(ツ)_/¯
How to do a content audit
Just like anything in SEO, from technical and on-page changes to site migrations, things can go horribly wrong when content audits aren’t conducted properly. The most common example would be removing URLs that have external links because link metrics weren’t analyzed as part of the audit. Another common mistake is confusing removal from search engine indexes with removal from the website.
Content audits start with taking an inventory of all content available for indexation by search engines. This content is then analyzed against a variety of metrics and given one of three “Action” determinations. The “Details” of each Action are then expanded upon.
The variety of combinations of options between the “Action” of WHAT to do and the “Details” of HOW (and sometimes why) to do it are as varied as the strategies, sites, and tactics themselves. Below are a few hypothetical examples:
You now have a basic overview of how to perform a content audit. More specific instructions can be found below.
The process can be roughly split into three distinct phases:
Inventory & audit
Analysis & recommendations
Summary & reporting
The inventory & audit phase
Taking an inventory of all content, and related metrics, begins with crawling the site.
One difference between crawling for content audits and technical audits:
Technical SEO audit crawls are concerned with all crawlable content (among other things).
Content audit crawls for the purpose of SEO are concerned with all indexable content.
{Expand for more on crawlable vs. indexable content}
The URL in the image below should be considered non-indexable. Even if it isn’t blocked in the robots.txt file, with a robots meta tag, or an X-robots header response –– even if it is frequently crawled by Google and shows up as a URL in Google Analytics and Search Console –– the rel =”canonical” tag shown below essentially acts like a 301 redirect, telling Google not to display the non-canonical URL in search results and to apply all ranking calculations to the canonical version. In other words, not to “index” it.
I'm not sure “index” is the best word, though. To “display” or “return” in the SERPs is a better way of describing it, as Google surely records canonicalized URL variants somewhere, and advanced site: queries seem to show them in a way that is consistent with the "supplemental index" of yesteryear. But that's another post, more suitably written by a brighter mind like Bill Slawski.
A URL with a query string that canonicalizes to a version without the query string can be considered “not indexable.”
A content audit can safely ignore these types of situations, which could mean drastically reducing the amount of time and memory taken up by a crawl.
Technical SEO audits, on the other hand, should be concerned with every URL a crawler can find. Non-indexable URLs can reveal a lot of technical issues, from spider traps (e.g. never-ending empty pagination, infinite loops via redirect or canonical tag) to crawl budget optimization (e.g. How many facets/filters deep to allow crawling? 5? 6? 7?) and more.
It is for this reason that trying to combine a technical SEO audit with a content audit often turns into a giant mess, though an efficient idea in theory. When dealing with a lot of data, I find it easier to focus on one or the other: all crawlable URLs, or all indexable URLs.
Orphaned pages (i.e., with no internal links / navigation path) sometimes don’t turn up in technical SEO audits if the crawler had no way to find them. Content audits should discover any indexable content, whether it is linked to internally or not. Side note: A good tech audit would do this, too.
Identifying URLs that should be indexed but are not is something that typically happens during technical SEO audits.
However, if you're having trouble getting deep pages indexed when they should be, content audits may help determine how to optimize crawl budget and herd bots more efficiently into those important, deep pages. Also, many times Google chooses not to display/index a URL in the SERPs due to poor content quality (i.e., thin or duplicate).
All of this is changing rapidly, though. URLs as the unique identifier in Google’s index are probably going away. Yes, we’ll still have URLs, but not everything requires them. So far, the word “content” and URL has been mostly interchangeable. But some URLs contain an entire application’s worth of content. How to do a content audit in that world is something we’ll have to figure out soon, but only after Google figures out how to organize the web’s information in that same world. From the looks of things, we still have a year or two.
Until then, the process below should handle most situations.
Step 1: Crawl all indexable URLs
A good place to start on most websites is a full Screaming Frog crawl. However, some indexable content might be missed this way. It is not recommended that you rely on a crawler as the source for all indexable URLs.
In addition to the crawler, collect URLs from Google Analytics, Google Webmaster Tools, XML Sitemaps, and, if possible, from an internal database, such as an export of all product and category URLs on an eCommerce website. These can then be crawled in “list mode” separately, then added to your main list of URLs and deduplicated to produce a more comprehensive list of indexable URLs.
Some URLs found via GA, XML sitemaps, and other non-crawl sources may not actually be “indexable.” These should be excluded. One strategy that works here is to combine and deduplicate all of the URL “lists,” and then perform a crawl in list mode. Once crawled, remove all URLs with robots meta or X-Robots noindex tags, as well as any URL returning error codes and those that are blocked by the robots.txt file, etc. At this point, you can safely add these URLs to the file containing indexable URLs from the crawl. Once again, deduplicate the list.
Crawling roadblocks & new technologies
Crawling very large websites
First and foremost, you do not need to crawl every URL on the site. Be concerned with indexable content. This is not a technical SEO audit.
{Expand for more about crawling very large websites}
Avoid crawling unnecessary URLs
Some of the things you can avoid crawling and adding to the content audit in many cases include:
Noindexed or robots.txt-blocked URLs
4XX and 5XX errors
Redirecting URLs and those that canonicalize to a different URL
Images, CSS, JavaScript, and SWF files
Segment the site into crawlable chunks
You can often get Screaming Frog to completely crawl a single directory at a time if the site is too large to crawl all at once.
Filter out URL patterns you plan to remove from the index
Let’s say you’re auditing a domain on WordPress and you notice early in the crawl that /tag/ pages are indexable. A quick site:domain.com inurl:tag search on Google tells you there are about 10 million of them. A quick look at Google Analytics confirms that URLs in the /tag/ directory are not responsible for very much revenue from organic search. It would be safe to say that the “Action” on these URLs should be “Remove” and the “Details” should read something like this: Remove /tag/ URLs from the indexed with a robots noindex,follow meta tag. More advice on this strategy can be found here.
Upgrade your machine
Install additional RAM on your computer, which is used by Screaming Frog to hold data during the crawl. This has the added benefit of improving Excel performance, which can also be a major roadblock.
You can also install Screaming Frog on Amazon Web Server (AWS), as described in this post on iPullRank.
Tune up your tools
Screaming Frog provides several ways for SEOs to get more out of the crawler. This includes adjusting the speed, max threads, search depth, query strings, timeouts, retries, and the amount of RAM available to the program. Leave at least 3GB off limits to the spider to avoid catastrophic freezing of the entire machine and loss of data. You can learn more about tuning up Screaming Frog here and here.
Try other tools
I’m convinced that there's a ton of wasted bandwidth on most content audit projects due to strategists releasing a crawler and allowing it to chew through an entire domain, whether the URLs are indexable or not. People run Screaming Frog without saving the crawl intermittently, without adding more RAM availability, without filtering out the nonsense, or using any of the crawl customization features available to them.
That said, sometimes SF just doesn’t get the job done. We also have a process specific to DeepCrawl, and have used Botify, as well as other tools. They each have their pros and cons. I still prefer Screaming Frog for crawling and URL Profiler for fetching metrics in most cases.
Crawling dynamic mobile sites
This refers to a specific type of mobile setup in which there are two code-bases –– one for mobile and one for desktop –– but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting that URL. In such cases, you will essentially be performing two separate content audits. Proceed as usual for the desktop version. Below are instructions for crawling the mobile version.
{Expand for more on crawling dynamic websites}
Crawling a dynamic mobile site for a content audit will require changing the User-Agent of the crawler, as shown here under Screaming Frog’s “Configure ---> HTTP Header” menu:
The important thing to remember when working on mobile dynamic websites is that you're only taking an inventory of indexable URLs on one version of the site or the other. Once the two inventories are taken, you can then compare them to uncover any unintentional issues.
Some examples of what this process can find in a technical SEO audit include situations in which titles, descriptions, canonical tags, robots meta, rel next/prev, and other important elements do not match between the two versions of the page. It's vital that the mobile and desktop version of each page have parity when it comes to these essentials.
It's easy for the mobile version of a historically desktop-first website to end up providing conflicting instructions to search engines because it's not often “automatically changed” when the desktop version changes. A good example here is a website I recently looked at with about 20 million URLs, all of which had the following title tag when loaded by a mobile user (including Google): BRAND NAME - MOBILE SITE. Imagine the consequences of that once a mobile-first algorithm truly rolls out.
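A quick way to catch this kind of mismatch is to request the same URL with a desktop and a mobile user-agent and compare the important elements. Below is a rough sketch in Python; the URL is a placeholder, the user-agent strings are illustrative (check Google's current ones), and the regexes are only a first-pass check rather than a substitute for a rendered crawl:

    import re
    import requests

    URL = "https://www.example.com/some-page/"  # placeholder

    DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    MOBILE_UA = (
        "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 "
        "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    )

    def extract(pattern, html):
        match = re.search(pattern, html, re.I | re.S)
        return match.group(1).strip() if match else None

    def snapshot(user_agent):
        html = requests.get(URL, headers={"User-Agent": user_agent}, timeout=10).text
        return {
            "title": extract(r"<title[^>]*>(.*?)</title>", html),
            "canonical": extract(r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)', html),
            "robots": extract(r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)', html),
        }

    desktop, mobile = snapshot(DESKTOP_UA), snapshot(MOBILE_UA)
    for key in desktop:
        status = "OK" if desktop[key] == mobile[key] else "MISMATCH"
        print(f"{key}: {status}")
        print(f"  desktop: {desktop[key]}")
        print(f"  mobile:  {mobile[key]}")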
Crawling and rendering JavaScript
One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.
{Expand for more on crawling Javascript websites}
Most crawlers have made a lot of progress lately when it comes to crawling and rendering JavaScript content. Now, it’s as easy as changing a few settings, as shown below with Screaming Frog.
When crawling URLs with #! , use the “Old AJAX Crawling Scheme.” Otherwise, select “JavaScript” from the “Rendering” tab when configuring your Screaming Frog SEO Spider to crawl JavaScript websites.
How do you know if you’re dealing with a JavaScript website?
First of all, most websites these days are going to be using some sort of JavaScript technology, though more often than not (so far) these will be rendered by the “client” (i.e., by your browser). An example would be the .js file that controls the behavior of a form or interactive tool.
What we’re discussing here is when the JavaScript is used “server-side” and needs to be executed in order to render the page.
JavaScript libraries and frameworks are used to develop single-page web apps and highly interactive websites. Below are a few different things that should alert you to this challenge:
The URLs contain #! (hashbangs). For example: http://www.example.com/#!/blue-widgets (AJAX)
Content-rich pages with only a few lines of code (and no iframes) when viewing the source code.
What looks like server-side template code in the meta tags instead of the actual content of the tag. For example, a title tag that appears as something like {{ page.title }} in the raw source.
You can also use the BuiltWith Technology Profiler or the Library Detector plugins for Chrome, which show the JavaScript libraries being used on a page in the address bar.
Not all websites built primarily with JavaScript require special attention to crawl settings. Some websites use pre-rendering services like Brombone or Prerender.io to serve the crawler a fully rendered version of the page. Others use isomorphic JavaScript to accomplish the same thing.
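If you just need a quick signal before deciding whether to switch on JavaScript rendering, you can compare how much visible text survives in the raw HTML response. This is only a heuristic sketch; the URL is a placeholder and the 500-character threshold is arbitrary:

    import re
    import requests

    URL = "https://www.example.com/"  # placeholder

    html = requests.get(URL, timeout=10).text

    # Strip scripts, styles, and remaining tags to approximate visible text in the raw response.
    stripped = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.I | re.S)
    visible_text = re.sub(r"<[^>]+>", " ", stripped)
    visible_text = re.sub(r"\s+", " ", visible_text).strip()

    script_count = len(re.findall(r"<script", html, re.I))
    hints = [name for name in ("react", "angular", "ember", "vue") if name in html.lower()]

    print(f"Visible text in raw HTML: {len(visible_text)} characters")
    print(f"Script tags: {script_count}")
    print(f"Framework hints: {hints or 'none'}")
    if len(visible_text) < 500 and script_count > 0:
        print("The raw HTML is thin; this site probably needs JavaScript rendering to audit.")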
Step 2: Gather additional metrics
Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.
Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.
Once the Screaming Frog scan is complete (crawling only indexable content), export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).
This is what my URL Profiler settings look like for a typical content audit of a small- or medium-sized site. Also, under “Accounts” I have connected via API keys to Moz and SEMrush.
Once URL Profiler is finished, you should end up with something like this:
Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.
The risk of getting analytics data from a third-party tool
We've noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes incorporating Analytics Canvas to get the full, unsampled data from GA. Then VLookups are used in the spreadsheet to combine the data, with URL being the unique identifier.
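If spreadsheets start choking on the VLookups, the same join can be done in a few lines of pandas. This is a sketch only; the file names, the placeholder domain, and the GA column headers ("Landing Page", "Sessions", "Revenue") are assumptions that depend on how you export the reports:

    import pandas as pd

    # Hypothetical export file names; adjust to match your own downloads.
    crawl = pd.read_csv("screaming_frog_internal_all.csv")
    ga = pd.read_csv("ga_organic_landing_pages.csv")

    # Normalize the join key so the same page matches in both exports.
    # GA exports landing pages as paths, so prepend the domain to make the keys match.
    crawl["URL"] = crawl["Address"].str.lower().str.rstrip("/")
    ga["URL"] = ("https://www.example.com" + ga["Landing Page"]).str.lower().str.rstrip("/")

    # A left join keeps every indexable URL from the crawl, even those with zero traffic.
    combined = crawl.merge(ga[["URL", "Sessions", "Revenue"]], on="URL", how="left")
    combined[["Sessions", "Revenue"]] = combined[["Sessions", "Revenue"]].fillna(0)

    combined.to_csv("content_audit_combined.csv", index=False)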
Metrics to pull for each URL:
Indexed or not?
If crawlers are set up properly, all URLs should be “indexable.”
A non-indexed URL is often a sign of an uncrawled or low-quality page.
Content uniqueness
Copyscape, Siteliner, and now URL Profiler can provide this data.
Traffic from organic search
Typically 90 days
Keep a consistent timeframe across all metrics.
Revenue and/or conversions
You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
Publish date
If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
Internal links
Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
External links
These can come from Moz, SEMRush, and a variety of other tools, most of which integrate natively or via APIs with URL Profiler.
Landing pages resulting in low time-on-site
Take this one with a grain of salt. If visitors found what they wanted quickly because the content was good, that’s not a bad sign. A better proxy for this would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
Landing pages resulting in Low Pages-Per-Visit
Just like with Time-On-Site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
Response code
Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that's the case on your domain.
Canonical tag
Typically only URLs with a self-referencing rel=“canonical” tag should be considered “indexable.” You may not require this metric in the final data if that's the case on your domain.
Page speed and mobile-friendliness
Again, URL Profiler comes through with their Google PageSpeed Insights API integration.
Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).
Things you don’t need when analyzing the data
{Expand for more on removing unnecessary data}
URL Profiler and Screaming Frog tabs: Just keep the “combined data” tab and immediately cut the amount of data in the spreadsheet by about half.
Content Type: Filtering by Content Type (e.g., text/html, image, PDF, CSS, JavaScript) and removing any URL that is of no concern in your content audit is a good way to speed up the process.
Technically speaking, images can be indexable content. However, I prefer to deal with them separately for now.
Filtering unnecessary file types out like I've done in the screenshot above improves focus, but doesn’t improve performance very much. A better option would be to first select the file types you don’t want, apply the filter, delete the rows you don’t want, and then go back to the filter options and “(Select All).”
Once you have only the content types you want, it may now be possible to simply delete the entire Content Type column.
Status Code and Status: You only need one or the other. I prefer to keep the Code, and delete the Status column.
Length and Pixels: You only need one or the other. I prefer to keep the Pixels, and delete the Length column. This applies to all Title and Meta Description columns.
Meta Keywords: Delete the columns. If those cells have content, consider removing that tag from the site.
DNS Safe URL, Path, Domain, Root, and TLD: You should really only be working on a single top-level domain. Content audits for subdomains should probably be done separately. Thus, these columns can be deleted in most cases.
Duplicate Columns: You should have two columns for the URL (the “Address” in column A from URL Profiler, and the “URL” column from Screaming Frog). Similarly, there may also be two columns each for HTTP Status and Status Code. It depends on the settings selected in both tools, but there are sure to be some overlaps, which can be removed to reduce the file size, enhance focus, and speed up the process.
Blank Columns: Keep the filter tool active and go through each column. Those with only blank cells can be deleted. The example below shows that column BK (Robots HTTP Header) can be removed from the spreadsheet.
[You can save a lot of headspace by hiding or removing blank columns.]
Single-Value Columns: If the column contains only one value, it can usually be removed. The screenshot below shows our non-secure site does not have any HTTPS URLs, as expected. I can now remove the column. Also, I guess it’s probably time I get that HTTPS migration project scheduled.
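If you'd rather do this cleanup programmatically before the file ever hits Excel, the blank-column and single-value-column checks are one-liners in pandas. A minimal sketch, assuming the combined export has been saved to CSV (the file names are placeholders):

    import pandas as pd

    df = pd.read_csv("content_audit_combined.csv")
    before = df.shape[1]

    # Drop columns that are entirely blank.
    df = df.dropna(axis=1, how="all")

    # Drop columns that contain only a single repeated value (e.g., HTTPS = FALSE everywhere).
    single_value_cols = [col for col in df.columns if df[col].nunique(dropna=True) <= 1]
    df = df.drop(columns=single_value_cols)

    print(f"Removed {before - df.shape[1]} columns; {df.shape[1]} remain.")
    df.to_csv("content_audit_trimmed.csv", index=False)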
Hopefully by now you've made a significant dent in reducing the overall size of the file and time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.
The analysis & recommendations phase
Here's where the fun really begins. In a large organization, it's tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.
Step 3: Put it all into a dashboard
Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it's ready for the client. If Excel is running slow, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.
Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.
Use Data Validation and a drop-down selector to limit Action options.
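If you're building the dashboard file with a script rather than by hand, the same drop-down can be added with openpyxl. This is a sketch under a few assumptions: the workbook name, the sheet, the column letter, and the three Action values (some audits add a fourth, "Consolidate") are all placeholders to adapt:

    from openpyxl import load_workbook
    from openpyxl.worksheet.datavalidation import DataValidation

    workbook = load_workbook("content_audit_dashboard.xlsx")  # placeholder file name
    sheet = workbook.active

    # Restrict the "Action" column (assumed to be column B) to a fixed list of values.
    action_list = DataValidation(
        type="list",
        formula1='"Improve,Remove,Leave As-Is"',
        allow_blank=True,
    )
    sheet.add_data_validation(action_list)
    action_list.add("B2:B100000")

    workbook.save("content_audit_dashboard.xlsx")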
Step 4: Work the content audit dashboard
All of the data you need should now be right in front of you. This step can’t be turned into a repeatable process for every content audit. From here on the actual step-by-step process becomes much more open to interpretation and your own experience. You may do some of them and not others. You may do them a little differently. That's all fine, as long as you're working toward the goal of determining what to do, if anything, for each piece of content on the website.
A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.
Causes of content-related penalties
These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of issues, which are detailed below.
{Expand to learn more about quality, duplication, and relevancy issues}
Typical low-quality content
Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate...
Completely irrelevant content
OK in small amounts, but often entire blogs are full of it.
A typical example would be a "linkbait" piece circa 2010.
Thin/short content
Glossed over the topic, too few words, or all image-based content.
Curated content with no added value
Comprised almost entirely of bits and pieces of content that exists elsewhere.
Misleading optimization
Titles or keywords targeting queries that the content doesn't answer or deserve to rank for.
Generally not providing the information the visitor was expecting to find.
Duplicate content
Internally duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.).
Externally duplicated (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.)
Stub pages (e.g., "No content is here yet, but if you sign in and leave some user-generated-content, then we'll have content here for the next guy." By the way, want our newsletter? Click an AD!)
Indexable internal search results
Too many indexable blog tag or blog category pages
And so on and so forth...
It helps to sort the data in various ways to see what’s going on. Below are a few different things to look for if you’re having trouble getting started.
{Expand to learn more about what to look for}
Sort by duplicate content risk
URL Profiler now has a native duplicate content checker. Other options are Copyscape (for external duplicate content) and Siteliner (for internal duplicate content).
Which of these pages should be rewritten?
Rewrite key/important pages, such as categories, home page, top products
Rewrite pages with good link and social metrics
Rewrite pages with good traffic
After selecting "Improve" in the Action column, elaborate in the Details column:
"Improve these pages by writing unique, useful content to improve the Copyscape risk score."
Which of these pages should be removed/pruned?
Remove guest posts that were published elsewhere
Remove anything the client plagiarized
Remove content that isn't worth rewriting, such as:
No external links, no social shares, and very few or no entrances/visits
After selecting "Remove" from the Action column, elaborate in the Details column:
"Prune from site to remove duplicate content. This URL has no links or shares and very little traffic. We recommend allowing the URL to return 404 or 410 response code. Remove all internal links, including from the sitemap."
Which of these pages should be consolidated into others?
Presumably none, since the content is already externally duplicated.
Which of these pages should be left “As-Is”?
Important pages which have had their content stolen
Sort by entrances or visits (filtering out any that were already finished)
Which of these pages should be marked as "Improve"?
Pages with high visits/entrances but low conversion, time-on-site, pageviews per session, etc.
Key pages that require improvement determined after a manual review of the page.
Which of these pages should be marked as "Consolidate"?
When you have overlapping topics that don't provide much unique value of their own, but could make a great resource when combined.
Mark the page in the set with the best metrics as "Improve" and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
Mark the pages that are to be consolidated into the canonical page as "Consolidate" and provide further instructions in the Details column, such as:
Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
Update all internal links.
Campaign-based or seasonal pages that could be consolidated into a single "Evergreen" landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 ---> Best Sellers).
Which of these pages should be marked as "Remove"?
Pages with poor link, traffic, and social metrics related to low-quality content that isn't worth updating
Typically these will be allowed to 404/410.
Irrelevant content
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Out-of-date content that isn't worth updating or consolidating
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Which of these pages should be marked as "Leave As-Is"?
Pages with good traffic, conversions, time on site, etc. that also have good content.
These may or may not have any decent external links.
Taking the hatchet to bloated websites
For big sites, it's best to use a hatchet-based approach as much as possible, and finish up with a scalpel in the end. Otherwise, you'll spend way too much time on the project, which eats into the ROI.
This is not a process that can be documented step-by-step. For the purpose of illustration, however, below are a few different examples of hatchet approaches and when to consider using them.
{Expand for examples of hatchet approaches}
Parameter-based URLs that shouldn't be indexed
Defer to the technical audit, if applicable. Otherwise, use your best judgment:
e.g., /?sort=color, &size=small
Assuming the tech audit didn't suggest otherwise, these pages could all be handled in one fell swoop. Below is an example Action and example Details for such a page:
Action = Remove
Details = Rel canonical to the base page without the parameter
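If you need to hand the developers a mapping of parameter URLs to their canonical targets, stripping the query string is usually all it takes. A minimal sketch in Python; the URLs are placeholders, and it assumes every parameter on these pages is non-canonical, which the tech audit should confirm:

    from urllib.parse import urlsplit, urlunsplit

    # Placeholder faceted URLs pulled from the crawl.
    parameter_urls = [
        "https://www.example.com/widgets/?sort=color",
        "https://www.example.com/widgets/?sort=color&size=small",
    ]

    def canonical_target(url):
        parts = urlsplit(url)
        # Drop the query string and fragment to get the base page.
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    for url in parameter_urls:
        print(f"{url} -> rel=canonical -> {canonical_target(url)}")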
Internal search results
Defer to the technical audit if applicable. Otherwise, use your best judgment:
e.g., /search/keyword-phrase/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /search/ in the robots.txt file.
Blog tag pages
Defer to the technical audit if applicable. Otherwise:
e.g., /blog/tag/green-widgets/, /blog/tag/blue-widgets/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /blog/tag/ in the robots.txt file.
E-commerce product pages with manufacturer descriptions
In cases where the "Page Type" is known (i.e., it's in the URL or was provided in a CMS export) and Risk Score indicates duplication:
e.g., /product/product-name/
Assuming the tech audit didn't suggest otherwise:
Action = Improve
Details = Rewrite to improve product description and avoid duplicate content
E-commerce category pages with no static content
In cases where the "Page Type" is known:
e.g., /category/category-name/ or /category/cat1/cat2/
Assuming NONE of the category pages have content:
Action = Improve
Details = Write 2–3 sentences of unique, useful content that explains choices, next steps, or benefits to the visitor looking to choose a product from the category.
Out-of-date blog posts, articles, and other landing pages
In cases where the title tag includes a date, or...
In cases where the URL indicates the publishing date:
Action = Improve
Details = Update the post to make it more current, if applicable. Otherwise, change Action to "Remove" and customize the Strategy based on links and traffic (i.e., 301 or 404).
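Flagging date-stamped content at scale is easy to script against the dashboard. A rough sketch in pandas; the file name and the "URL" and "Title 1" column names are assumptions based on a typical crawl export, and the regexes only catch the obvious patterns:

    import re
    import pandas as pd

    df = pd.read_csv("content_audit_trimmed.csv")

    # Flag URLs like /2012/06/post-name/ and titles containing a four-digit year.
    url_date = re.compile(r"/(19|20)\d{2}/(0?[1-9]|1[0-2])/")
    title_year = re.compile(r"\b(19|20)\d{2}\b")

    df["Dated URL"] = df["URL"].astype(str).str.contains(url_date)
    df["Dated Title"] = df["Title 1"].astype(str).str.contains(title_year)

    stale_candidates = df[df["Dated URL"] | df["Dated Title"]]
    print(f"{len(stale_candidates)} URLs look date-stamped and may need updating or pruning.")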
Content marked for improvement should lay out more specific instructions in the “Details” column, such as:
Update the old content to make it more relevant
Add more useful content to “beef up” this thin page
Incorporate content from overlapping URLs/pages
Rewrite to avoid internal duplication
Rewrite to avoid external duplication
Reduce image sizes to speed up page load
Create a “responsive” template for this page to fit on mobile devices
Etc.
Content marked for removal should include specific instructions in the “Details” column, such as:
Consolidate this content into the following URL/page marked as “Improve”
Then redirect the URL
Remove this page from the site and allow the URL to return a 410 or 404 HTTP status code. This content has had zero visits within the last 360 days, and has no external links. Then remove or update internal links to this page.
Remove this page from the site and 301 redirect the URL to the following URL marked as “Improve”... Do not incorporate the content into the new page. It is low-quality.
Remove this archive page from search engine indexes with a robots noindex meta tag. Continue to allow the page to be accessed by visitors and crawled by search engines.
Remove this internal search result page from the search engine index with a robots noindex meta tag. Once removed from the index (about 15–30 days later), add the following line to the #BlockedDirectories section of the robots.txt file: Disallow: /search/.
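Once the robots.txt change is live, it's worth confirming that the new Disallow rule blocks what you intended and nothing more. A small sketch using Python's built-in robotparser; the domain and paths are placeholders:

    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser("https://www.example.com/robots.txt")  # placeholder domain
    parser.read()

    for path in ("/search/widgets/", "/blog/tag/green-widgets/", "/widgets/"):
        allowed = parser.can_fetch("Googlebot", "https://www.example.com" + path)
        print(f"{path}: {'crawlable' if allowed else 'blocked by robots.txt'}")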
As you can see from the many examples above, sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.
After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.
URL, Action, and Details: These three columns will be used by someone to implement your recommendations. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data-points you’ve collected.
Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.
WARNING!
As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove.
The reporting phase
The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.
Counting actions from Column B
It is useful to count the quantity of each Action along with total organic search traffic and/or revenue for each URL. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
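That roll-up is a pivot table in Excel, or a few lines of pandas if the dashboard lives in a CSV. A sketch only; the "Action", "Organic Sessions", and "Revenue" column names are assumptions to match against your own dashboard:

    import pandas as pd

    df = pd.read_csv("content_audit_dashboard.csv")

    summary = (
        df.groupby("Action")
          .agg(URLs=("Action", "size"),
               Organic_Sessions=("Organic Sessions", "sum"),
               Revenue=("Revenue", "sum"))
          .sort_values("URLs", ascending=False)
    )
    print(summary)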
Step 5: Writing up the report
The written report should be delivered at the same time as the audit dashboard. It summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.
Here is a real example of an executive summary from one of Inflow's content audit strategies:
As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:
Removal of about 624 pages from Google's index by deletion or consolidation:
203 Pages were marked for Removal with a 404 error (no redirect needed)
110 Pages were marked for Removal with a 301 redirect to another page
311 Pages were marked for Consolidation of content into other pages
Followed by a redirect to the page into which they were consolidated
Rewriting or improving of 668 pages
605 Product Pages are to be rewritten due to use of manufacturer product descriptions (duplicate content), prioritized from first to last within the Content Audit.
63 "Other" pages to be rewritten due to low-quality or duplicate content.
Keeping 226 pages as-is
No rewriting or improvements needed
These changes reflect an immediate need to "improve or remove" content in order to avoid an obvious content-based penalty from Google (e.g. Panda) due to thin, low-quality and duplicate content, especially concerning Representative and Dealers pages with some added risk from Style pages.
The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.
We recommend the following three projects in order of their urgency and/or potential ROI for the site:
Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the "Details" column of the Content Audit Dashboard.
Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.
Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the "Details" column.
Content audit resources & further reading
Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum: This thought-provoking post raises the question of how we will perform content inventories without URLs. It helps to know Google is dealing with the exact same problem on a much, much larger scale.
Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.
Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow: An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.
The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic: Praise for the life-changing powers of a good content audit inventory.
Everything You Need to Perform Content Audits
What Absolutely Everybody Is Saying About above Ground Pools and What You Have to Do
The Bad Secret of above Ground Pools
The Pain of above Ground Pools
Here's What I Know About above Ground Pools
Whatever They Told You About above Ground Pools Is Dead Wrong...And Here's Why
In any event, it is necessary to offer a safe source of power. It really does not serve any beneficial purpose for virtually any area of the body. When you have somewhere to store the unopened boxes, you may want to take advantage of a great deal; however, you ought to keep in mind that if the pool has a problem, you will usually be unable to send it back after 30–60 days.
Its aluminum construction provides weather-resistant capabilities and long-term durability. You may want to try out skiing or snowboarding. The exclusive Therma-Seal technology gives superior sealing for greatest durability.
The experts have warehouses and storage facilities all over the country. We'll ship everything to your local distribution warehouse and direct to you in a matter of days. FILTRATION: The filter is among your most significant pieces of equipment. There are various sorts of pools available to suit the requirements of different kinds of consumers. Check out above ground pool reviews here.
You can certainly expect a broad range of styles and colours. Our pools are offered in a range of styles and sizes.
Also, the price ranges above don't include any pool decking. There are many kinds of pools to pick from depending on what you're searching for. There are several forms of swimming pools based on the construction of the basin.
If you find these prices somewhat expensive for your budget, you can try checking out Wal-Mart and Sears. Don't forget to shop smart and you'll be rewarded. The purchase and installation were first-rate.
Typically, after installation you can see part of the liner along the top of the exterior of the pool. There are many factors that can influence the install time for a pool. There are many different sorts of above ground pools.
A good pool installation package ought to be a significant part any pool buy. Whichever you opt for every model is designed utilizing the best quality materials and the newest manufacturing technology available. If you're contemplating an above ground pool, the most significant aspect is quality.
Actually, during the hottest months it can be regarded as too hot, but all the buildings and vehicles have air conditioning, so that's never an issue for the residents. To prolong the life span of their carpeting and protect against premature wear, the majority of people know that they ought to frequently vacuum their carpeting. The neighborhood phone directory must be reprinted twice annually in order to accommodate all of the new residents who are moving in by the thousands.
If your choice is the most suitable model, it can endure for many years. Whatever model you choose to go with, something you can rely on is fast, professional and very affordable installation with KiKi Pool Installations, LLC.
Another model was added a couple of years later. Remember that any safety device you buy should be quite sturdy, and you ought to read all the directions first. The additional price of the main drain, including installation, is $445.00.
The walls of the pool are made from powder-coated steel, and there's an inflatable ring on top, just like an easy-set pool. Aside from the liner, the rest of the components of an above ground pool will last for many years. A main drain also tends to pull in small dirt particles, thereby cutting down the need to vacuum the pool.
Pool sets usually include the essentials, like a filter pump and pool ladder, together with the pool itself. It's true, you can get a pool even if your lawn isn't perfectly flat, though it will likely cost more to install.
An above ground pool can be a major purchase. Your pool company can assist by providing pool maintenance services if you want, or by offering personalized ideas and information if you'd rather take care of your above ground pool yourself. It comes with a 1,200 GPH filter pump with GFCI, which is suitable for its size.
dykeredhood · 6 months
Interrupting the Batman posting for some Winter Soldier posting in honor of yesterday’s Captain America: The Winter Soldier 10 year anniversary mass hysteria event 💪🏼🩶
Segment the site into crawlable chunks
You can often get Screaming Frog to completely crawl a single directory at a time if the site is too large to crawl all at once.
Filter out URL patterns you plan to remove from the index
Let’s say you’re auditing a domain on WordPress and you notice early in the crawl that /tag/ pages are indexable. A quick site:domain.com inurl:tag search on Google tells you there are about 10 million of them. A quick look at Google Analytics confirms that URLs in the /tag/ directory are not responsible for very much revenue from organic search. It would be safe to say that the “Action” on these URLs should be “Remove” and the “Details” should read something like this: Remove /tag/ URLs from the indexed with a robots noindex,follow meta tag. More advice on this strategy can be found here.
Upgrade your machine
Install additional RAM on your computer, which is used by Screaming Frog to hold data during the crawl. This has the added benefit of improving Excel performance, which can also be a major roadblock.
You can also install Screaming Frog on Amazon Web Server (AWS), as described in this post on iPullRank.
Tune up your tools
Screaming Frog provides several ways for SEOs to get more out of the crawler. This includes adjusting the speed, max threads, search depth, query strings, timeouts, retries, and the amount of RAM available to the program. Leave at least 3GB off limits to the spider to avoid catastrophic freezing of the entire machine and loss of data. You can learn more about tuning up Screaming Frog here and here.
Try other tools
I’m convinced that there's a ton of wasted bandwidth on most content audit projects due to strategists releasing a crawler and allowing it to chew through an entire domain, whether the URLs are indexable or not. People run Screaming Frog without saving the crawl intermittently, without adding more RAM availability, without filtering out the nonsense, or using any of the crawl customization features available to them.
That said, sometimes SF just doesn’t get the job done. We also have a process specific to DeepCrawl, and have used Botify, as well as other tools. They each have their pros and cons. I still prefer Screaming Frog for crawling and URL Profiler for fetching metrics in most cases.
Crawling dynamic mobile sites
This refers to a specific type of mobile setup in which there are two code-bases –– one for mobile and one for desktop –– but only one URL. Thus, the content of a single URL may vary significantly depending on which type of device is visiting that URL. In such cases, you will essentially be performing two separate content audits. Proceed as usual for the desktop version. Below are instructions for crawling the mobile version.
{Expand for more on crawling dynamic websites}
Crawling a dynamic mobile site for a content audit will require changing the User-Agent of the crawler, as shown here under Screaming Frog’s “Configure ---> HTTP Header” menu:
The important thing to remember when working on mobile dynamic websites is that you're only taking an inventory of indexable URLs on one version of the site or the other. Once the two inventories are taken, you can then compare them to uncover any unintentional issues.
Some examples of what this process can find in a technical SEO audit include situations in which titles, descriptions, canonical tags, robots meta, rel next/prev, and other important elements do not match between the two versions of the page. It's vital that the mobile and desktop version of each page have parity when it comes to these essentials.
It's easy for the mobile version of a historically desktop-first website to end up providing conflicting instructions to search engines because it's not often “automatically changed” when the desktop version changes. A good example here is a website I recently looked at with about 20 million URLs, all of which had the following title tag when loaded by a mobile user (including Google): BRAND NAME - MOBILE SITE. Imagine the consequences of that once a mobile-first algorithm truly rolls out.
Crawling and rendering JavaScript
One of the many technical issues SEOs have been increasingly dealing with over the last couple of years is the proliferation of websites built on JavaScript frameworks and libraries like React.js, Ember.js, and Angular.js.
{Expand for more on crawling Javascript websites}
Most crawlers have made a lot of progress lately when it comes to crawling and rendering JavaScript content. Now, it’s as easy as changing a few settings, as shown below with Screaming Frog.
When crawling URLs with #! , use the “Old AJAX Crawling Scheme.” Otherwise, select “JavaScript” from the “Rendering” tab when configuring your Screaming Frog SEO Spider to crawl JavaScript websites.
How do you know if you’re dealing with a JavaScript website?
First of all, most websites these days are going to be using some sort of JavaScript technology, though more often than not (so far) these will be rendered by the “client” (i.e., by your browser). An example would be the .js file that controls the behavior of a form or interactive tool.
What we’re discussing here is when the JavaScript is used “server-side” and needs to be executed in order to render the page.
JavaScript libraries and frameworks are used to develop single-page web apps and highly interactive websites. Below are a few different things that should alert you to this challenge:
The URLs contain #! (hashbangs). For example: http://ift.tt/2nQK6ch (AJAX)
Content-rich pages with only a few lines of code (and no iframes) when viewing the source code.
What looks like server-side code in the meta tags instead of the actual content of the tag. For example:
You can also use the BuiltWith Technology Profiler or the Library Detector plugins for Chrome, which shows JavaScript libraries being used on a page in the address bar.
Not all websites built primarily with JavaScript require special attention to crawl settings. Some websites use pre-rendering services like Brombone or Prerender.io to serve the crawler a fully rendered version of the page. Others use isomorphic JavaScript to accomplish the same thing.
Step 2: Gather additional metrics
Most crawlers will give you the URL and various on-page metrics and data, such as the titles, descriptions, meta tags, and word count. In addition to these, you’ll want to know about internal and external links, traffic, content uniqueness, and much more in order to make fully informed recommendations during the analysis portion of the content audit project.
Your process may vary, but we generally try to pull in everything we need using as few sources as possible. URL Profiler is a great resource for this purpose, as it works well with Screaming Frog and integrates easily with all of the APIs we need.
Once the Screaming Frog scan is complete (only crawling indexable content) export the “Internal All” file, which can then be used as the seed list in URL Profiler (combined with any additional indexable URLs found outside of the crawl via GSC, GA, and elsewhere).
This is what my URL Profiler settings look for a typical content audit for a small- or medium-sized site. Also, under “Accounts” I have connected via API keys to Moz and SEMrush.
Once URL Profiler is finished, you should end up with something like this:
Screaming Frog and URL Profiler: Between these two tools and the APIs they connect with, you may not need anything else at all in order to see the metrics below for every indexable URL on the domain.
The risk of getting analytics data from a third-party tool
We've noticed odd data mismatches and sampled data when using the method above on large, high-traffic websites. Our internal process involves exporting these reports directly from Google Analytics, sometimes incorporating Analytics Canvas to get the full, unsampled data from GA. Then VLookups are used in the spreadsheet to combine the data, with URL being the unique identifier.
Metrics to pull for each URL:
Indexed or not?
If crawlers are set up properly, all URLs should be “indexable.”
A non-indexed URL is often a sign of an uncrawled or low-quality page.
Content uniqueness
Copyscape, Siteliner, and now URL Profiler can provide this data.
Traffic from organic search
Typically 90 days
Keep a consistent timeframe across all metrics.
Revenue and/or conversions
You could view this by “total,” or by segmenting to show only revenue from organic search on a per-page basis.
Publish date
If you can get this into Google Analytics as a custom dimension prior to fetching the GA data, it will help you discover stale content.
Internal links
Content audits provide the perfect opportunity to tighten up your internal linking strategy by ensuring the most important pages have the most internal links.
External links
These can come from Moz, SEMRush, and a variety of other tools, most of which integrate natively or via APIs with URL Profiler.
Landing pages resulting in low time-on-site
Take this one with a grain of salt. If visitors found what they want because the content was good, that’s not a bad metric. A better proxy for this would be scroll depth, but that would probably require setting up a scroll-tracking “event.”
Landing pages resulting in Low Pages-Per-Visit
Just like with Time-On-Site, sometimes visitors find what they’re looking for on a single page. This is often true for high-quality content.
Response code
Typically, only URLs that return a 200 (OK) response code are indexable. You may not require this metric in the final data if that's the case on your domain.
Canonical tag
Typically only URLs with a self-referencing rel=“canonical” tag should be considered “indexable.” You may not require this metric in the final data if that's the case on your domain.
Page speed and mobile-friendliness
Again, URL Profiler comes through with their Google PageSpeed Insights API integration.
Before you begin analyzing the data, be sure to drastically improve your mental health and the performance of your machine by taking the opportunity to get rid of any data you don’t need. Here are a few things you might consider deleting right away (after making a copy of the full data set, of course).
Things you don’t need when analyzing the data
{Expand for more on removing unnecessary data}
URL Profiler and Screaming Frog tabs Just keep the “combined data” tab and immediately cut the amount of data in the spreadsheet by about half.
Content Type Filtering by Content Type (e.g., text/html, image, PDF, CSS, JavaScript) and removing any URL that is of no concern in your content audit is a good way to speed up the process.
Technically speaking, images can be indexable content. However, I prefer to deal with them separately for now.
Filtering unnecessary file types out like I've done in the screenshot above improves focus, but doesn’t improve performance very much. A better option would be to first select the file types you don’t want, apply the filter, delete the rows you don’t want, and then go back to the filter options and “(Select All).”
Once you have only the content types you want, it may now be possible to simply delete the entire Content Type column.
Status Code and Status You only need one or the other. I prefer to keep the Code, and delete the Status column.
Length and Pixels You only need one or the other. I prefer to keep the Pixels, and delete the Length column. This applies to all Title and Meta Description columns.
Meta Keywords Delete the columns. If those cells have content, consider removing that tag from the site.
DNS Safe URL, Path, Domain, Root, and TLD You should really only be working on a single top-level domain. Content audits for subdomains should probably be done separately. Thus, these columns can be deleted in most cases.
Duplicate Columns You should have two columns for the URL (The “Address” in column A from URL Profiler, and the “URL” column from Screaming Frog). Similarly, there may also be two columns each for HTTP Status and Status Code. It depends on the settings selected in both tools, but there are sure to be some overlaps, which can be removed to reduce the file size, enhance focus, and speed up the process.
Blank Columns Keep the filter tool active and go through each column. Those with only blank cells can be deleted. The example below shows that column BK (Robots HTTP Header) can be removed from the spreadsheet.
[You can save a lot of headspace by hiding or removing blank columns.]
Single-Value Columns If the column contains only one value, it can usually be removed. The screenshot below shows our non-secure site does not have any HTTPS URLs, as expected. I can now remove the column. Also, I guess it’s probably time I get that HTTPS migration project scheduled.
Hopefully by now you've made a significant dent in reducing the overall size of the file and time it takes to apply formatting and formula changes to the spreadsheet. It’s time to start diving into the data.
The analysis & recommendations phase
Here's where the fun really begins. In a large organization, it's tempting to have a junior SEO do all of the data-gathering up to this point. I find it useful to perform the crawl myself, as the process can be highly informative.
Step 3: Put it all into a dashboard
Even after removing unnecessary data, performance could still be a major issue, especially if working in Google Sheets. I prefer to do all of this in Excel, and only upload into Google Sheets once it's ready for the client. If Excel is running slow, consider splitting up the URLs by directory or some other factor in order to work with multiple, smaller spreadsheets.
Creating a dashboard can be as easy as adding two columns to the spreadsheet. The first new column, “Action,” should be limited to three options, as shown below. This makes filtering and sorting data much easier. The “Details” column can contain freeform text to provide more detailed instructions for implementation.
Use Data Validation and a drop-down selector to limit Action options.
Step 4: Work the content audit dashboard
All of the data you need should now be right in front of you. This step can’t be turned into a repeatable process for every content audit. From here on the actual step-by-step process becomes much more open to interpretation and your own experience. You may do some of them and not others. You may do them a little differently. That's all fine, as long as you're working toward the goal of determining what to do, if anything, for each piece of content on the website.
A good place to start would be to look for any content-related issues that might cause an algorithmic filter or manual penalty to be applied, thereby dragging down your rankings.
Causes of content-related penalties
These typically fall under three major categories: quality, duplication, and relevancy. Each category can be further broken down into a variety of issues, which are detailed below.
Typical low-quality content
Poor grammar, written primarily for search engines (includes keyword stuffing), unhelpful, inaccurate...
Completely irrelevant content
OK in small amounts, but often entire blogs are full of it.
A typical example would be a "linkbait" piece circa 2010.
Thin/short content
Content that glosses over the topic, uses too few words, or is entirely image-based.
Curated content with no added value
Made up almost entirely of bits and pieces of content that exist elsewhere.
Misleading optimization
Titles or keywords targeting queries that the content doesn't actually answer, or doesn't deserve to rank for.
Generally, not providing the information the visitor was expecting to find.
Duplicate content
Internally duplicated on other pages (e.g., categories, product variants, archives, technical issues, etc.).
Externally duplicated (e.g., manufacturer product descriptions, product descriptions duplicated in feeds used for other channels like Amazon, shopping comparison sites and eBay, plagiarized content, etc.)
Stub pages (e.g., "No content is here yet, but if you sign in and leave some user-generated-content, then we'll have content here for the next guy." By the way, want our newsletter? Click an AD!)
Indexable internal search results
Too many indexable blog tag or blog category pages
And so on and so forth...
It helps to sort the data in various ways to see what’s going on. Below are a few different things to look for if you’re having trouble getting started.
Sort by duplicate content risk
URL Profiler now has a native duplicate content checker. Other options are Copyscape (for external duplicate content) and Siteliner (for internal duplicate content).
Which of these pages should be rewritten?
Rewrite key/important pages, such as categories, home page, top products
Rewrite pages with good link and social metrics
Rewrite pages with good traffic
After selecting "Improve" in the Action column, elaborate in the Details column:
"Improve these pages by writing unique, useful content to improve the Copyscape risk score."
Which of these pages should be removed/pruned?
Remove guest posts that were published elsewhere
Remove anything the client plagiarized
Remove content that isn't worth rewriting, such as:
No external links, no social shares, and very few or no entrances/visits
After selecting "Remove" from the Action column, elaborate in the Details column:
"Prune from site to remove duplicate content. This URL has no links or shares and very little traffic. We recommend allowing the URL to return 404 or 410 response code. Remove all internal links, including from the sitemap."
Which of these pages should be consolidated into others?
Presumably none, since the content is already externally duplicated.
Which of these pages should be left “As-Is”?
Important pages which have had their content stolen
Sort by entrances or visits (filtering out any URLs that have already been assigned an Action)
Which of these pages should be marked as "Improve"?
Pages with high visits/entrances but low conversion, time-on-site, pageviews per session, etc.
Key pages that, after a manual review, clearly require improvement.
Which of these pages should be marked as "Consolidate"?
Pages covering overlapping topics that don't provide much unique value on their own, but could make a great resource when combined.
Mark the page in the set with the best metrics as "Improve" and in the Details column, outline which pages are going to be consolidated into it. This is the canonical page.
Mark the pages that are to be consolidated into the canonical page as "Consolidate" and provide further instructions in the Details column, such as:
Use portions of this content to round out /canonicalpage/ and then 301 redirect this page into /canonicalpage/
Update all internal links.
Campaign-based or seasonal pages that could be consolidated into a single "Evergreen" landing page (e.g., Best Sellers of 2012 and Best Sellers of 2013 ---> Best Sellers).
Which of these pages should be marked as "Remove"?
Pages with poor link, traffic, and social metrics related to low-quality content that isn't worth updating
Typically these will be allowed to 404/410.
Irrelevant content
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Out-of-date content that isn't worth updating or consolidating
The strategy will depend on link equity and traffic as to whether it gets redirected or simply removed.
Which of these pages should be marked as "Leave As-Is"?
Pages with good traffic, conversions, time on site, etc. that also have good content.
These may or may not have any decent external links.
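To make those sorts quicker to repeat, the same filters can be expressed in a few lines of pandas. Every column name and threshold below is an assumption; the point is simply to surface likely "Remove" and "Improve" candidates for manual review, not to make the decision for you.

import pandas as pd

df = pd.read_csv("combined_data_clean.csv")

# Placeholder column names and thresholds; match them to your own export.
no_links   = df["External Links"].fillna(0) == 0
no_shares  = df["Social Shares"].fillna(0) == 0
no_traffic = df["GA Entrances"].fillna(0) < 5
high_dupe  = df["Duplicate Risk Score"].fillna(0) >= 70

# Likely "Remove" candidates: duplicated, unlinked, unshared, and effectively unvisited.
remove_candidates = df[high_dupe & no_links & no_shares & no_traffic]

# Likely "Improve" candidates: duplicated, but with links, shares, or traffic worth keeping.
improve_candidates = df[high_dupe & ~(no_links & no_shares & no_traffic)]

print(len(remove_candidates), "likely Remove;", len(improve_candidates), "likely Improve")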
Taking the hatchet to bloated websites
For big sites, it's best to use a hatchet-based approach as much as possible, and finish up with a scalpel in the end. Otherwise, you'll spend way too much time on the project, which eats into the ROI.
This is not a process that can be documented step-by-step. For the purpose of illustration, however, below are a few different examples of hatchet approaches and when to consider using them.
Parameter-based URLs that shouldn't be indexed
Defer to the technical audit, if applicable. Otherwise, use your best judgment:
e.g., /?sort=color, &size=small
Assuming the tech audit didn't suggest otherwise, these pages could all be handled in one fell swoop. Below is an example Action and example Details for such a page:
Action = Remove
Details = Rel canonical to the base page without the parameter
Internal search results
Defer to the technical audit if applicable. Otherwise, use your best judgment:
e.g., /search/keyword-phrase/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /search/ in the robots.txt file.
Blog tag pages
Defer to the technical audit if applicable. Otherwise:
e.g., /blog/tag/green-widgets/, /blog/tag/blue-widgets/
Assuming the tech audit didn't suggest otherwise:
Action = Remove
Details = Apply a noindex meta tag. Once they are removed from the index, disallow /blog/tag/ in the robots.txt file.
E-commerce product pages with manufacturer descriptions
In cases where the "Page Type" is known (i.e., it's in the URL or was provided in a CMS export) and Risk Score indicates duplication:
e.g., /product/product-name/
Assuming the tech audit didn't suggest otherwise:
Action = Improve
Details = Rewrite to improve product description and avoid duplicate content
E-commerce category pages with no static content
In cases where the "Page Type" is known:
e.g., /category/category-name/ or /category/cat1/cat2/
Assuming NONE of the category pages have content:
Action = Improve
Details = Write 2–3 sentences of unique, useful content that explains choices, next steps, or benefits to the visitor looking to choose a product from the category.
Out-of-date blog posts, articles, and other landing pages
In cases where the title tag includes a date, or...
In cases where the URL indicates the publishing date:
Action = Improve
Details = Update the post to make it more current, if applicable. Otherwise, change Action to "Remove" and customize the Strategy based on links and traffic (i.e., 301 or 404).
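Because these hatchet decisions are pattern-based, they can be applied to the dashboard in bulk. The sketch below assumes a dashboard CSV with "Address," "Action," and "Details" columns; the patterns and wording are illustrative only, and every rule should still defer to the technical audit.

import pandas as pd

df = pd.read_csv("content_audit_dashboard.csv")  # assumed file and column names

# Each rule: (URL pattern, Action, Details). Adjust patterns and wording to your site.
rules = [
    (r"[?&](sort|size)=", "Remove",  "Rel canonical to the base page without the parameter"),
    (r"/search/",         "Remove",  "Noindex meta tag now; disallow /search/ in robots.txt once deindexed"),
    (r"/blog/tag/",       "Remove",  "Noindex meta tag now; disallow /blog/tag/ in robots.txt once deindexed"),
    (r"/product/",        "Improve", "Rewrite to replace the manufacturer product description"),
]

# Only fill in rows that haven't already been assigned an Action by hand.
for pattern, action, details in rules:
    mask = df["Address"].str.contains(pattern, regex=True, na=False) & df["Action"].isna()
    df.loc[mask, ["Action", "Details"]] = [action, details]

df.to_csv("content_audit_dashboard.csv", index=False)

Rows already marked by hand are skipped, so the scalpel work done earlier isn't overwritten by the hatchet.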
For content marked for improvement, the “Details” column should lay out more specific instructions, such as:
Update the old content to make it more relevant
Add more useful content to “beef up” this thin page
Incorporate content from overlapping URLs/pages
Rewrite to avoid internal duplication
Rewrite to avoid external duplication
Reduce image sizes to speed up page load
Create a “responsive” template for this page to fit on mobile devices
Etc.
For content marked for removal, the “Details” column should include specific instructions, such as:
Consolidate this content into the following URL/page marked as “Improve”
Then redirect the URL
Remove this page from the site and allow the URL to return a 410 or 404 HTTP status code. This content has had zero visits within the last 360 days, and has no external links. Then remove or update internal links to this page.
Remove this page from the site and 301 redirect the URL to the following URL marked as “Improve”... Do not incorporate the content into the new page. It is low-quality.
Remove this archive page from search engine indexes with a robots noindex meta tag. Continue to allow the page to be accessed by visitors and crawled by search engines.
Remove this internal search result page from search engine indexes with a robots noindex meta tag. Once removed from the index (about 15–30 days later), add the following line to the #BlockedDirectories section of the robots.txt file: Disallow: /search/.
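When a recommendation involves robots.txt changes like the one above, it's worth sanity-checking the proposed rules before they go live. Below is a small sketch using Python's built-in robots.txt parser, with example.com standing in for the real domain.

from urllib.robotparser import RobotFileParser

# Parse the proposed rules directly (no fetch) to confirm they block what you intend.
rp = RobotFileParser()
rp.parse("""
User-agent: *
Disallow: /search/
""".splitlines())

# Internal search results should be blocked; normal category pages should not be.
assert not rp.can_fetch("*", "https://www.example.com/search/blue-widgets/")
assert rp.can_fetch("*", "https://www.example.com/category/widgets/")

Keep the order of operations from the Details text in mind: the noindex tag does the de-indexing, and the disallow rule is only added afterward, or the tag will never be recrawled.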
As you can see from the many examples above, sorting by “Page Type” can be quite handy when applying the same Action and Details to an entire section of the website.
After all of the tool set-up, data gathering, data cleanup, and analysis across dozens of metrics, what matters in the end is the Action to take and the Details that go with it.
URL, Action, and Details: These three columns will be used by someone to implement your recommendations. Be clear and concise in your instructions, and don’t make decisions without reviewing all of the wonderful data-points you’ve collected.
Here is a sample content audit spreadsheet to use as a template, or for ideas. It includes a few extra tabs specific to the way we used to do content audits at Inflow.
WARNING!
As Razvan Gavrilas pointed out in his post on Cognitive SEO from 2015, without doing the research above you risk pruning valuable content from search engine indexes. Be bold, but make highly informed decisions:
Content audits allow SEOs to make informed decisions on which content to keep indexed “as-is,” which content to improve, and which to remove.
The reporting phase
The content audit dashboard is exactly what we need internally: a spreadsheet crammed with data that can be sliced and diced in so many useful ways that we can always go back to it for more insight and ideas. Some clients appreciate that as well, but most are going to find the greater benefit in our final content audit report, which includes a high-level overview of our recommendations.
Counting actions from Column B
It is useful to count the quantity of each Action, along with the total organic search traffic and/or revenue of the URLs assigned to it. This will help you (and the client) identify important metrics, such as total organic traffic for pages marked to be pruned. It will also make the final report much easier to build.
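A pivot table handles this well, or the same roll-up can be produced with a few lines of pandas. The column names below ("Address," "GA Entrances," "GA Revenue") are assumptions; swap in whatever your dashboard uses.

import pandas as pd

df = pd.read_csv("content_audit_dashboard.csv")

# Count pages and total organic traffic/revenue for each Action.
summary = (df.groupby("Action")
             .agg(pages=("Address", "count"),
                  organic_sessions=("GA Entrances", "sum"),
                  revenue=("GA Revenue", "sum"))
             .reset_index())

print(summary)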
Step 5: Writing up the report
Your analysis and recommendations should be delivered as a written report at the same time as the audit dashboard. The report summarizes the findings, recommendations, and next steps from the audit, and should start with an executive summary.
Here is a real example of an executive summary from one of Inflow's content audit strategies:
As a result of our comprehensive content audit, we are recommending the following, which will be covered in more detail below:
Removal of about 624 pages from Google's index by deletion or consolidation:
203 Pages were marked for Removal with a 404 error (no redirect needed)
110 Pages were marked for Removal with a 301 redirect to another page
311 Pages were marked for Consolidation of content into other pages
Followed by a redirect to the page into which they were consolidated
Rewriting or improving of 668 pages
605 Product Pages to be rewritten due to the use of manufacturer product descriptions (duplicate content), prioritized from first to last within the Content Audit.
63 "Other" pages to be rewritten due to low-quality or duplicate content.
Keeping 226 pages as-is
No rewriting or improvements needed
These changes reflect an immediate need to "improve or remove" content in order to avoid an obvious content-based penalty from Google (e.g., Panda) due to thin, low-quality, and duplicate content, especially concerning Representative and Dealers pages, with some added risk from Style pages.
The content strategy should end with recommended next steps, including action items for the consultant and the client. Below is a real example from one of our documents.
We recommend the following three projects in order of their urgency and/or potential ROI for the site:
Project 1: Remove or consolidate all pages marked as “Remove”. Detailed instructions for each URL can be found in the "Details" column of the Content Audit Dashboard.
Project 2: Copywriting to improve/rewrite content on Style pages. Ensure unique, robust content and proper keyword targeting.
Project 3: Improve/rewrite all remaining pages marked as “Improve” in the Content Audit Dashboard. Detailed instructions for each URL can be found in the "Details" column.
Content audit resources & further reading
Understanding Mobile-First Indexing and the Long-Term Impact on SEO by Cindy Krum This thought-provoking post raises the question: How will we perform content inventories without URLs? It helps to know Google is dealing with the exact same problem on a much, much larger scale.
Here is a spreadsheet template to help you calculate revenue and traffic changes before and after updating content.
Expanding the Horizons of eCommerce Content Strategy by Dan Kern of Inflow An epic post about content strategies for eCommerce businesses, which includes several good examples of content on different types of pages targeted toward various stages in the buying cycle.
The Content Inventory is Your Friend by Kristina Halvorson on BrainTraffic Praise for the life-changing powers of a good content audit inventory.