Broken Link Checker Checker

Source code, including bash deploy scripts, on Bitbucket

Echoback - see the request headers sent by this browser

To see HTTP status codes and have this site return any of them.

Normal Internal Links

/pagea.html

pageb.html

/normal/pagec.html

normal/paged.html

http://brokenlinkcheckerchecker.com/pagea.html

http://brokenlinkcheckerchecker.com/pageb.html

http://brokenlinkcheckerchecker.com/normal/pagec.html

http://brokenlinkcheckerchecker.com/normal/paged.html

/normal/ Subdirectory test with no filename

/normal Subdirectory test with no filename, which redirects to /normal/

/normal2/ Link to a page which links back to /normal, to test duplicate detection across redirects
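The links above mix root-relative (/pagea.html), document-relative (pageb.html) and absolute forms, so a checker has to resolve every href against the URL of the page it was found on before it can compare or de-duplicate them. Below is a minimal Python sketch using the standard library's urllib.parse. The helper name resolve_links is hypothetical, and the example assumes the links sit on the home page http://brokenlinkcheckerchecker.com/.

```python
from urllib.parse import urldefrag, urljoin


def resolve_links(page_url, hrefs):
    """Resolve hrefs found on page_url to absolute URLs.

    Fragments (#...) are dropped and duplicates removed, keeping the order
    in which links were first seen.
    """
    resolved = []
    for href in hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute not in resolved:
            resolved.append(absolute)
    return resolved


if __name__ == "__main__":
    home = "http://brokenlinkcheckerchecker.com/"
    hrefs = ["/pagea.html", "pageb.html", "/normal/pagec.html",
             "normal/paged.html", home + "pagea.html"]
    for url in resolve_links(home, hrefs):
        print(url)
```

String de-duplication alone won't merge /normal and /normal/; they only collapse once the redirect is followed. The trailing slash also changes how relative links resolve: paged.html found on http://brokenlinkcheckerchecker.com/normal resolves to http://brokenlinkcheckerchecker.com/paged.html, whereas the same href on /normal/ resolves to http://brokenlinkcheckerchecker.com/normal/paged.html.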

Normal Internal Redirects

/normal 301 Moved Permanently - should redirect to /normal/ (also tested above; this is the default behaviour for nginx) - tells the browser to forget the old URL

http://www.brokenlinkcheckerchecker 301 Moved Permanently - should redirect to http://brokenlinkcheckerchecker - tells the browser to forget the old URL

/301 301 Moved Permanently - should redirect to /

/301broken 301 Moved Permanently - should redirect to a broken page /brokenurl

/301loop 301 Moved Permanently - should redirect to itself (/301loop), which should cause a Too Many Redirects error in a browser

/302 302 Found - should redirect to / - a temporary redirect (eg useful when performing site maintenance)
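A browser gives up on /301loop with a Too Many Redirects error, and a checker needs the same guard, plus a record of every hop so that /301broken is reported as broken rather than as a working 301. Here is a minimal Python sketch that assumes a hypothetical fetch(url) callable returning the status code and the Location header (or None) for a single request made with automatic redirect following turned off; injecting it keeps the logic testable without a network. The 10-hop default is an arbitrary choice.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_hops=10):
    """Follow a redirect chain starting at url.

    fetch(url) -> (status, location_or_None) is a hypothetical stand-in for
    one HTTP request made without automatic redirect following.
    Returns (chain, outcome): chain is a list of (url, status) hops and
    outcome is "ok", "broken", "loop" or "too_many_redirects".
    """
    chain = []
    visited = set()
    for _ in range(max_hops + 1):  # the first request plus max_hops redirects
        if url in visited:
            return chain, "loop"  # eg /301loop redirecting to itself
        visited.add(url)
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES:
            return chain, "ok" if status < 400 else "broken"
        if not location:
            return chain, "broken"  # a redirect with nowhere to go
        url = urljoin(url, location)  # Location may be relative
    return chain, "too_many_redirects"


if __name__ == "__main__":
    # In-memory stand-in for a few of the URLs above.
    base = "http://brokenlinkcheckerchecker.com"
    site = {
        base + "/301": (301, "/"),
        base + "/": (200, None),
        base + "/301broken": (301, "/brokenurl"),
        base + "/brokenurl": (404, None),
        base + "/301loop": (301, "/301loop"),
    }
    for path in ("/301", "/301broken", "/301loop"):
        print(path, follow_redirects(base + path, site.__getitem__))
```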

Internal broken links

Internal broken link to a page /brokenurl that doesn't exist and goes to the global 404

Internal broken link2 with a trailing slash which should go to the 404 as above

Broken link to an asset image .png

Internal Redirects to Canonical URL

A canonical URL is the preferred URL for a page

http://www.brokenlinkcheckerchecker.com/sc/200.html which does a 301 permanent redirect to http://brokenlinkcheckerchecker.com/sc/200.html

Redirects and https and www variants

1. http://davemateer.com/brokenurl which should redirect to https://davemateer.com/404.html

2. http://www.davemateer.com/brokenurl which should redirect twice to https://davemateer.com/404.html

3. https://davemateer.com/brokenurl (canonical) which should return a 404 status with the page https://davemateer.com/404.html

4. https://www.davemateer.com/brokenurl which should redirect to https://davemateer.com/404.html

External Edge Cases

The links marked with an x (eg 2x) should show up as broken links

I've encountered lots of strange behaviour whilst doing broken link checking, usually to do with anti-scraping mechanisms. These links are part of the test suite I run against my own tool, and they are here to help test other broken link checkers too.

These links should not redirect to something else (that is tested below)

2. LinkedIn - working link. Usually returns a 999 status code or hits a security check through Puppeteer

2x. LinkedIn - not working link. Usually returns a 999 status code or hits a security check through Puppeteer

3. Drupal.org - working link

3x. Drupal.org - not working link

4. mouser.co.uk Akamai problem?

4x. mouser.co.uk Akamai problem?

5. element14.com can time out - web server security.

5x. element14.com can time out - web server security.

6. cert-manager.io/docs Can be strange

6x. cert-manager.io/docsXXX Can be strange

7. zillow.com Hits a captcha

7x. zillow.com Hits a captcha

8. autohotkey.com Cloudflare-fronted

8x. autohotkey.com Cloudflare-fronted

9. autohotkey.com/boards Cloudflare-fronted

9x. autohotkey.com/boards Cloudflare-fronted

10. rayner.com normal WordPress site

10x. rayner.com normal WordPress site

11. https://www.amazon.co.uk Amazon - blocks HEAD requests

11x. https://www.amazon.co.uk/XXX

20. https://www.dell.com/support/article/en-au/sln311129/dell-command-update?lang=en

20x. https://www.dell.com/support/article/en-uk/sln311129XXX/dell-command-update?lang=en HEAD doesn't work but GET does

24. https://twitter.com/dave_mateer Twitter - working link

24x. https://twitter.com/dave_mateerXXX Twitter - not working link, but hard to test
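Several entries above fail only for some request methods or clients: Amazon blocks HEAD, and LinkedIn usually answers with its non-standard 999 code for both the working and the broken link. One common approach, sketched here in Python under stated assumptions, is to try a cheap HEAD first, retry with GET before calling a link broken, and report anti-bot responses as "blocked" instead of "broken". The request(method, url) callable is a hypothetical stand-in for a real HTTP client, and the set of blocked statuses is an assumption to tune against the links above.

```python
# Assumption: statuses that usually mean "anti-bot block", not "missing page".
BLOCKED_STATUSES = {403, 429, 999}


def check_link(url, request):
    """Return (label, status) with label "ok", "broken" or "blocked".

    request(method, url) -> int is a hypothetical stand-in for an HTTP
    client call that returns the response status code.
    """
    status = request("HEAD", url)
    if status >= 400:
        # Some servers reject HEAD (eg with 405 Method Not Allowed) but serve
        # GET normally, so confirm with GET before reporting a failure.
        status = request("GET", url)
    if status in BLOCKED_STATUSES:
        return "blocked", status
    return ("ok" if status < 400 else "broken"), status
```

Whether 403 belongs in the blocked set is a judgment call: treating it as blocked hides genuinely forbidden pages, but many bot-protection layers respond with it.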

Non-Existent Domain Name

Nodomainhere - a link to a domain name that doesn't exist
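A link to a non-existent domain never gets an HTTP status code at all: the request fails at the DNS lookup, so the checker has to turn that failure into a report itself. Here is a minimal Python sketch using the standard socket module. The link's actual target isn't shown here, so the example uses a host under the reserved .invalid top-level domain, which should never resolve.

```python
import socket
from urllib.parse import urlsplit


def domain_resolves(url):
    """Return True if the host name in url resolves via DNS, else False."""
    host = urlsplit(url).hostname
    if not host:
        return False  # no host at all, eg a malformed href
    try:
        socket.getaddrinfo(host, None)
    except (socket.gaierror, UnicodeError):
        return False  # eg NXDOMAIN: the domain doesn't exist
    return True


if __name__ == "__main__":
    print(domain_resolves("http://nodomainhere.invalid/"))
```

A lookup can also fail because of a temporary resolver problem, so retrying once before reporting the domain as missing is a reasonable precaution.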

Files

test_image.jpg A 4KB jpg with MIME type: image/jpeg (source: Wikipedia, https://en.wikipedia.org/wiki/File:Test_image.jpg)

pizigani_10mb.jpg A 10MB jpg with MIME type: image/jpeg (source: https://commons.wikimedia.org/wiki/File:Pizigani_1367_Chart_10MB.jpg)

a17.pdf A 20MB pdf with MIME type: application/pdf (source: https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf)

cp43.pdf A 100MB pdf with MIME type: application/pdf (source: https://cartographicperspectives.org/index.php/journal/article/view/cp43-complete-issue/pdf)

200MB.zip A 200MB zip with MIME type: application/zip (source: https://www.thinkbroadband.com/download)
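For files like these, a checker can usually confirm the link and learn the type and size from the response headers alone, without downloading 200MB. Here is a minimal Python sketch, assuming headers is any mapping with .get() (for example the .headers of a urllib.request response to a HEAD request, whose lookups are case-insensitive). The function name and the 10MB "large" threshold are assumptions. Servers may omit Content-Length, in which case the size is reported as unknown.

```python
def describe_asset(headers, large_bytes=10 * 1024 * 1024):
    """Return (mime_type, size_in_bytes, is_large) from response headers.

    size_in_bytes is None when Content-Length is missing or not a number.
    The default 10MB threshold for "large" is an assumption.
    """
    raw_length = (headers.get("Content-Length") or "").strip()
    size = int(raw_length) if raw_length.isdigit() else None
    # Drop parameters such as "; charset=utf-8" from the media type.
    mime = (headers.get("Content-Type") or "").split(";")[0].strip().lower()
    is_large = size is not None and size >= large_bytes
    return mime or None, size, is_large
```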

Big Pages

big-html-page.html A 2.4MB HTML file with data from the CIA World Factbook (source: https://corpus.canterbury.ac.nz/descriptions/).

big-html-page2.html A 38MB HTML file with repeated data from the CIA World Factbook (source: https://corpus.canterbury.ac.nz/descriptions/).
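Pages this size can exhaust memory or stall a crawler if read in one go. One defensive option, sketched here in Python, is to read a page in chunks up to a byte limit and record whether it was cut short, so the checker can still parse the first part and flag the page as oversized. The function name and the 64KB chunk size are assumptions; stream can be any binary file-like object, such as a urllib.request response.

```python
import io


def read_capped(stream, limit, chunk_size=64 * 1024):
    """Read at most `limit` bytes from a binary file-like object.

    Returns (data, truncated), where truncated is True if the stream
    still had more bytes after the limit was reached.
    """
    parts = []
    remaining = limit
    while remaining > 0:
        # read() may return fewer bytes than requested, so loop until done.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            return b"".join(parts), False
        parts.append(chunk)
        remaining -= len(chunk)
    # Limit reached: read one more byte to see whether anything was cut off.
    truncated = bool(stream.read(1))
    return b"".join(parts), truncated


if __name__ == "__main__":
    page = io.BytesIO(b"<html>" + b"x" * 1000 + b"</html>")
    data, truncated = read_capped(page, 100)
    print(len(data), truncated)
```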

Blank Hyperlink

A link with nothing in it, ie a blank href="" link
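An empty href is a same-document reference, so it resolves to the page it sits on rather than to a missing URL. Reporting it as broken would be a false positive, though a checker may still want to flag it. Here is a minimal Python sketch using urllib.parse; the function name and the labels "blank", "self" and "link" are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def classify_href(page_url, href):
    """Classify an href and resolve it against the page it appears on.

    Returns (kind, url): kind is "blank" for href="", "self" for links
    that point back at the same page (eg "#top"), otherwise "link".
    """
    href = (href or "").strip()
    # urljoin returns the base URL unchanged when href is empty.
    resolved, _ = urldefrag(urljoin(page_url, href))
    if not href:
        return "blank", resolved
    page_without_fragment, _ = urldefrag(page_url)
    if resolved == page_without_fragment:
        return "self", resolved
    return "link", resolved
```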

Rate Limits

A broken link checker should be able to handle 429 Too Many Requests responses and back off and retry accordingly

/ratelimit/index.html 1 request per second

/ratelimit10s/index.html 1 request every 10 seconds
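Here is a minimal Python sketch of that back-off, assuming a hypothetical fetch(url) callable that returns (status, headers). It honours Retry-After when the server sends it as a number of seconds and otherwise doubles its own delay; the HTTP-date form of Retry-After isn't handled in this sketch. The sleep function is injectable so the behaviour can be tested without waiting.

```python
import time


def fetch_with_rate_limit(url, fetch, max_attempts=5, base_delay=1.0,
                          sleep=time.sleep):
    """Call fetch(url) -> (status, headers), backing off on 429 responses.

    Returns the last (status, headers) received, which is still a 429 if
    every attempt was rate limited.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, headers = fetch(url)
        if status != 429 or attempt == max_attempts:
            return status, headers
        retry_after = (headers.get("Retry-After") or "").strip()
        # Use the server's delay if given in seconds, else our own back-off.
        sleep(int(retry_after) if retry_after.isdigit() else delay)
        delay *= 2
```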

Dynamic Server .NET Core

On http://dnet.brokenlinkcheckerchecker.com

Page Load Time

A webmaster needs an overview of all pages on the site to see where the slow pages are

/slowloadingpage A slow loading page simulating a lot of back end processing
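To build that overview, a checker can time each request and sort the results. Here is a minimal Python sketch, assuming a hypothetical fetch(url) callable that downloads the full response; the measured time therefore includes network latency as well as server processing. The clock is injectable so the sorting can be tested deterministically.

```python
import time


def slowest_pages(urls, fetch, top=10, clock=time.perf_counter):
    """Time fetch(url) for each URL and return the `top` slowest as
    (seconds, url) pairs, slowest first."""
    timings = []
    for url in urls:
        start = clock()
        fetch(url)
        timings.append((clock() - start, url))
    timings.sort(reverse=True)
    return timings[:top]
```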

Intermittent Errors

A broken link checker should be able to handle spurious errors

/500 Returns a 500 every time

Returns a 500 to begin with, but after 3 seconds it will return a 200 (todo)
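One way to tell a persistent failure like /500 from a transient one is to retry 5xx responses a few times with a growing delay, and to label a link that recovers as intermittent rather than broken. Here is a minimal Python sketch, assuming a hypothetical fetch(url) callable that returns a status code. Only 5xx responses are retried, and the retry count and delays are assumptions.

```python
import time


def check_with_retries(url, fetch, retries=3, delay=1.0, sleep=time.sleep):
    """Return (status, label) where label is "ok", "intermittent" or "broken".

    fetch(url) -> int status is a hypothetical stand-in for an HTTP call.
    Only 5xx responses are retried; 4xx responses are treated as final.
    """
    status = fetch(url)
    if status < 500:
        return status, "ok" if status < 400 else "broken"
    for attempt in range(retries):
        sleep(delay * 2 ** attempt)  # 1s, 2s, 4s ... with the default delay
        status = fetch(url)
        if status < 500:
            return status, "intermittent" if status < 400 else "broken"
    return status, "broken"
```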

Page Sizes

A webmaster needs to see where all the big pages are

pagewithbigimages.html A page with big images, to simulate forgetting to resize images (todo)

Prioritisation

Quite often the hardest part of broken link checking is working out which links are the most important to fix. There are usually so many things wrong with a website that a webmaster needs a priority list showing which pages are the most important to fix first.

http://dnet.brokenlinkcheckerchecker.com/priority/ Here are 100 links with 5 pages that have broken internal links (todo)

http://dnet.brokenlinkcheckerchecker.com/priority/ Here are 100 links with 5 pages that have broken internal images (todo)

http://dnet.brokenlinkcheckerchecker.com/priority/ Here are 100 links with 5 pages that have issues, and have priority based on number of times the link is broken (todo)

http://dnet.brokenlinkcheckerchecker.com/priority/ Here are 100 links with 5 pages that have issues, and have priority based on number of times the link is broken and recency (todo)
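Here is a minimal Python sketch of the last two orderings, assuming each sighting of a broken link is recorded as a (target_url, seen_at) pair, where seen_at is any sortable timestamp. Targets that were broken more often come first, and ties go to the most recently seen. The input shape and the tie-break rule are assumptions; a real tool might weight the two factors differently.

```python
from collections import Counter


def prioritise(sightings):
    """Order broken link targets by how often they were seen broken, then
    by how recently they were last seen broken (both descending)."""
    counts = Counter()
    last_seen = {}
    for target, seen_at in sightings:
        counts[target] += 1
        last_seen[target] = max(seen_at, last_seen.get(target, seen_at))
    return sorted(counts, key=lambda t: (counts[t], last_seen[t]), reverse=True)
```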

When to give up (max number of links to crawl)

Some websites are massive and may have pathways which go deeper and deeper. A broken link checker should know when to give up

http://dnet.brokenlinkcheckerchecker.com/blackhole This is a black hole of links going deeper and deeper (todo)

http://dnet.brokenlinkcheckerchecker.com/wideblackhole This is a black hole of links going wider and wider (todo)
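Here is a minimal Python sketch of two give-up rules using a breadth-first crawl: a maximum depth for the deep black hole and a maximum page count for the wide one. get_links(url) is a hypothetical callable that fetches a page and returns the absolute URLs it links to, and the default limits are arbitrary assumptions.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=1000, max_depth=10):
    """Breadth-first crawl with two give-up rules.

    get_links(url) -> iterable of absolute URLs is a hypothetical stand-in
    for fetching a page and extracting its links. Returns (crawled, reason)
    where reason is "done", "max_pages" or "max_depth".
    """
    crawled = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    reason = "done"
    while queue:
        if len(crawled) >= max_pages:
            return crawled, "max_pages"  # eg a wide black hole
        url, depth = queue.popleft()
        crawled.append(url)
        links = get_links(url)
        if depth >= max_depth:
            # Don't descend further, but note if there was more to find.
            if any(link not in seen for link in links):
                reason = "max_depth"  # eg a deep black hole
            continue
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled, reason
```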

Thank you for using Broken Link Checker Checker!!
