This article covers the difference between 404 errors and soft 404 errors and how to resolve the SEO issues behind them.
Every time a page loads in a web browser, a response code is included in the HTTP headers, whether or not it is displayed on the page itself.
The 404 response code is one of many distinct response codes a server can use to tell you whether or not a page loaded.
Any code between 400 and 499 generally indicates that the page failed to load. Within that range, 404 carries a specific meaning: the page could not be found, which in practice usually means it is genuinely gone and unlikely to return any time soon.
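To make the idea of response codes concrete, here is a minimal Python sketch that labels a status code and reads the code for a URL from the response headers. The function names `describe_status` and `fetch_status` are my own for illustration, and the sketch assumes the server answers HEAD requests, which not every server does.

```python
from http import HTTPStatus
import urllib.error
import urllib.request


def describe_status(code: int) -> str:
    """Return a short label for an HTTP status code."""
    if code == HTTPStatus.NOT_FOUND:  # 404
        return "not found"
    if 200 <= code <= 299:
        return "success"
    if 300 <= code <= 399:
        return "redirect"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Request a URL and return the status code of the final response.

    urllib follows redirects automatically, so a redirected URL reports
    the status of the page it lands on, not the 3xx code itself.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the code is still available on the exception.
        return error.code
```

Calling `describe_status(fetch_status(url))` on a page gives a quick view of what the server reports, independent of what the page itself displays.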
What’s a Soft 404 Error?
A soft 404 error is not an official response code sent to a web browser. It is merely a label that Google adds to a page in its index.
Google carefully distributes resources when it crawls pages to avoid wasting time on pages that aren't needed and shouldn't be indexed.
However, some servers are misconfigured and return a 200 code for a missing page when they should return a 404 response code. Even if the page explicitly states that it cannot be found, it may still be indexed because the invisible HTTP header shows a 200 code, which wastes Google's resources.
To address this problem, Google takes note of the characteristics of 404 pages and checks whether the page in question truly is one. In other words, Google has learned that if a page looks, smells, and acts like a 404 page, it probably is one.
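Google does not publish the exact signals it uses, so the following Python sketch is only a rough approximation of the idea: flag a page that returns a 200 code while its text reads like an error page. The phrase list and the function names are assumptions of mine, not Google's criteria.

```python
import re

# Hypothetical phrases that commonly appear on "not found" pages.
NOT_FOUND_PHRASES = (
    "page not found",
    "error 404",
    "cannot be found",
    "no longer exists",
)


def visible_text(html: str) -> str:
    """Crudely strip tags and collapse whitespace; returns lowercase text."""
    without_tags = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", without_tags).strip().lower()


def looks_like_soft_404(status: int, html: str) -> bool:
    """True when the server says 200 but the page text reads like a 404."""
    if status != 200:
        # A real 404 (or any other non-200 code) is not a soft 404.
        return False
    text = visible_text(html)
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)
```

A check like this will produce false positives, for example on an article that discusses 404 errors, so treat the result as a list of pages to review by hand.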
Potentially Misidentified as Soft 404
There are other cases where the page isn't gone, but Google has classified it as missing because of certain characteristics.
These characteristics include having little or no content on the page and having too many pages on the site that are too similar to one another.
These traits are similar to the factors that the Panda algorithm addresses. The Panda update treats thin and duplicate content as negative ranking factors.
Therefore, resolving these problems will help prevent both Panda problems and soft 404s.
There are two primary sources of 404 errors:
- A linking error sends users to a page that doesn’t exist.
- A link points to a page that used to exist but has since been removed.
If a linking mistake is the root of the 404, all you need to do is fix the links.
Finding all the broken links on a website is the most challenging part of this task.
For huge, complicated sites with thousands or millions of pages, it can be especially difficult. Crawling tools are useful in situations like these. Xenu, DeepCrawl, Screaming Frog, and Botify are among the options.
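The core of what these crawlers do for broken links can be sketched in a few lines of Python: collect the links on a page, resolve them to absolute URLs, and keep the ones that return a 404. This is a simplified illustration using only the standard library, not how any of the tools above is implemented. The `status_of` parameter is a hypothetical callable that returns the status code for a URL, which keeps the sketch independent of any particular HTTP client.

```python
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url: str, html: str) -> List[str]:
    """Return unique absolute http(s) links found in the page, in order."""
    collector = LinkCollector()
    collector.feed(html)
    links: List[str] = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in links:
            links.append(absolute)
    return links


def find_broken_links(
    page_url: str, html: str, status_of: Callable[[str], int]
) -> List[str]:
    """Return the links on the page for which status_of reports a 404."""
    return [link for link in extract_links(page_url, html) if status_of(link) == 404]
```

A full crawler would repeat this for every page it discovers. For a large site, a dedicated tool is still the more practical choice.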
A Page That No Longer Exists
You have two choices when a page is gone:
- If the page was unintentionally deleted, restore it.
- If it was deleted on purpose, use a 301 redirect to take users to the closest relevant page.
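As an illustration of the second option, here is a minimal Python sketch of a redirect map: deleted URLs answer with a 301 and a Location header pointing to the closest relevant page, and everything else that is missing answers with a real 404. The paths in `REDIRECTS` and `EXISTING_PAGES` are made-up examples. On a real site this logic normally lives in the web server or CMS configuration instead of a script like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical mapping of deleted pages to their closest relevant page.
REDIRECTS = {
    "/old-shoes": "/shoes",
    "/summer-sale-2019": "/sale",
}

# Hypothetical set of pages that still exist.
EXISTING_PAGES = {"/", "/shoes", "/sale"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, redirect target or None) for a request path."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    if path in EXISTING_PAGES:
        return 200, None
    return 404, None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers GET requests according to resolve()."""

    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location:
            self.send_header("Location", location)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(f"{status}\n".encode("utf-8"))


def serve(port: int = 8000) -> None:
    """Start a local test server (not called automatically)."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

The important detail is that a missing page without a mapping falls through to a genuine 404 instead of a 200, which is exactly what prevents soft 404s.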
First, find every linking error on the website. You can use crawling tools to find linking issues on a large website. However, crawling tools may not find orphaned pages, which are pages that are not linked from any other page or from the navigation.
Orphaned pages can exist if they were once part of the website but the links to them disappeared after a redesign. External links from other websites may still point to them, however. Several tools can help you confirm whether such pages exist on your website.
Google Search Console
As Google's crawler scans all the pages it can discover, Search Console will report any 404 pages it encounters. This can include pages reached through external links that point to a page on your website that is no longer there.
Google Analytics does not provide a missing page report by default. However, there are several ways to find missing pages with it.
You can, for instance, build a custom report that isolates pages whose page titles contain “Error 404 – Page Not Found.”
Another way to detect orphaned pages in Google Analytics is to create custom content groupings and assign all 404 pages to one of them.
Site: Operator Search Command
All of example.com's pages that are indexed by Google are listed when you search for “site:example.com” in Google. The pages can then be checked one at a time to see if they are loading or returning 404 errors.
To do this on a large scale, I like to use WebCEO, which can run the site: operator not only on Google but also on Bing, Yahoo, Yandex, Naver, Baidu, and Seznam.
Because each search engine gives you only a subset of results, running the query through several of them can help you build a larger list of your site's pages. This list can be exported and used with tools that perform a bulk 404 check. To do this, I simply add all the URLs as links to an HTML file and load it in Xenu to check for 404 errors in bulk.
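The last step, turning exported URL lists into an HTML file of links, is easy to script. The Python sketch below merges several lists, removes duplicates, and builds a bare page of links that a crawler can load. The function names and the sample URLs are placeholders of mine, and the export format of any particular tool is not assumed beyond "a list of URLs".

```python
from html import escape
from typing import Iterable, List


def merge_url_lists(*url_lists: Iterable[str]) -> List[str]:
    """Combine several URL lists, dropping blanks and duplicates, keeping order."""
    seen = set()
    merged: List[str] = []
    for urls in url_lists:
        for url in urls:
            cleaned = url.strip()
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                merged.append(cleaned)
    return merged


def build_link_page(urls: Iterable[str]) -> str:
    """Return a minimal HTML document with one link per URL."""
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in urls
    )
    return f"<!DOCTYPE html>\n<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


# Example: two hypothetical exports with one overlapping URL.
google_urls = ["https://example.com/a", "https://example.com/b"]
bing_urls = ["https://example.com/b ", "https://example.com/c"]
page = build_link_page(merge_url_lists(google_urls, bing_urls))
```

Writing `page` to a file, for example with `open("links.html", "w", encoding="utf-8")`, gives you a single document to load into the link checker.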
Other Backlink Research Tools
Backlink research tools such as Majestic, Ahrefs, Moz Open Site Explorer, Sistrix, LinkResearchTools, and CognitiveSEO are also useful.
Most of these tools can export a list of the backlinks pointing to your domain. From there, you can check all the linked pages for 404 errors.
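Export formats differ from tool to tool, so the following Python sketch rests on an assumption: a CSV export with a column named "Target URL" that holds the page on your site each backlink points to. Check the actual header of your export and pass the right column name. The function collects the unique target pages so they can be checked for 404 errors.

```python
import csv
import io
from typing import List


def target_urls(csv_text: str, column: str = "Target URL") -> List[str]:
    """Return the unique values of one column of a CSV export, in order.

    The default column name is a placeholder; real exports use their own headers.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    urls: List[str] = []
    for row in reader:
        url = (row.get(column) or "").strip()
        if url and url not in urls:
            urls.append(url)
    return urls


# Example with a made-up export: two backlinks point to the same old page.
sample_export = (
    "Source URL,Target URL\n"
    "https://blog.example.org/post,https://example.com/old-page\n"
    "https://news.example.net/a,https://example.com/old-page\n"
    "https://news.example.net/b,https://example.com/shoes\n"
)
pages_to_check = target_urls(sample_export)
```

The resulting list can go through the same bulk 404 check as any other list of URLs.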
How to Fix Soft 404 Errors
Because a soft 404 is not a true 404 error, crawling tools won't detect it directly. However, you can use crawling tools to find the conditions that lead to soft 404s. Things to look for include:
Thin Content: Some crawling tools not only report pages with thin content but also show the total word count. From there, you can sort URLs by the number of words on each page. Start with the pages that have the fewest words and evaluate whether each one has thin content.
Duplicate Content: Some crawling tools can calculate what percentage of a page is made up of template content. If the main content is nearly identical to that of many other pages, you should investigate those pages to learn why duplicate content exists on your website.
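If your crawler does not report these figures, both checks can be approximated with a short Python sketch, assuming you already have the main text of each page. Sorting by word count surfaces the thinnest pages first, and a simple overlap score between two texts hints at duplication. The scoring method here (Jaccard similarity over three-word sequences) is my choice for illustration, not what any specific crawler uses.

```python
import re
from typing import Dict, List, Set, Tuple


def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(re.findall(r"\w+", text))


def thinnest_first(pages: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (url, word count) pairs sorted from fewest to most words."""
    counts = [(url, word_count(text)) for url, text in pages.items()]
    return sorted(counts, key=lambda pair: pair[1])


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of consecutive word sequences of the given length."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        # Texts shorter than one shingle cannot be compared this way.
        return 0.0
    return len(a & b) / len(a | b)
```

Pages at the top of the `thinnest_first` list and pairs of pages with a similarity close to 1.0 are the ones worth reviewing first.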
In addition to crawling tools, you can use Google Search Console to locate pages labeled as soft 404s by looking under crawl errors.
By crawling an entire site to look for likely soft 404s, you can uncover and fix issues before Google even notices them.
Once you have found these soft 404 errors, you need to fix them.
Most of the time, the fixes are straightforward. It can be as simple as adding more content to thin pages or replacing duplicate content with fresh, original material.
Here are a few things to keep in mind as you go through this process:
Consolidate Pages: Making a page's topic too specific can leave you with little to say, which results in thin content. If the topics are related, it may be better to combine several thin pages into one. This addresses not only thin content but also duplicate content. For instance, an online store that sells shoes in several sizes and colors might have a separate URL for every size and color combination. This produces a vast number of pages with thin and often identical content. Putting everything on one page and listing the available options is a more effective approach.
Find Technical Issues That Cause Duplicate Content: Even the most basic web crawling software, such as Xenu, which examines only URLs, response codes, and title tags, can detect duplicate content problems by looking at URLs. These include HTTP vs. HTTPS, www vs. non-www URLs, URLs with and without index.html, and URLs with and without tracking parameters. A nice breakdown of these common duplicate content issues found in URL patterns can be seen on slide 6 of this presentation.
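The URL patterns listed above lend themselves to a simple check: reduce every URL to a normalized key and see which URLs collapse to the same one. The Python sketch below does this for the scheme, the www prefix, a trailing index.html, and tracking parameters. Which parameters count as tracking is an assumption on my part (`utm_*`, `gclid`, `fbclid`), so adjust the lists to your own site.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumed tracking parameters; extend for your own analytics setup.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}


def canonical_key(url: str) -> str:
    """Reduce a URL to a key that ignores common duplicate-causing variations."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    if not path:
        path = "/"
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(TRACKING_PREFIXES)
        and name.lower() not in TRACKING_PARAMS
    ]
    key = host + path  # the scheme (http vs. https) is dropped on purpose
    if query:
        key += "?" + urlencode(sorted(query))
    return key


def group_duplicates(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Return only the keys that more than one distinct URL maps to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        if url not in groups[canonical_key(url)]:
            groups[canonical_key(url)].append(url)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}
```

Each group in the result is a set of URLs that probably serve the same content and are candidates for a redirect or a canonical tag.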
Google Treats 404 Errors & Soft 404 Errors the Same Way
Although a soft 404 is not a true 404 error, Google will deindex those pages if they are not rectified promptly. It is recommended to crawl your site regularly to check for 404 or soft 404 issues. Crawling tools should be an important part of your SEO arsenal.