Category talk:Articles with dead external links


Help wanted - fix broken links

Swept in from the pub

I've written a bot that will flag broken URLs with {{dead link}}, which will print a very noticeable warning next to broken links for people who have enabled the ErrorHighlighter gadget from Special:Preferences (for those who haven't enabled the gadget the warning is invisible). Articles with broken links will also appear in Category:Articles with dead external links. I'm still validating that the bot won't break anything and have thus only run it against Category:Star articles and a handful of other articles, but at present that still leaves over 50 articles needing links fixed (or removed, in cases where the associated business has closed). Please help out by reviewing/fixing broken links in Category:Articles with dead external links, and if you would like to see the bot run against a specific article or group of articles please let me know and I will do so. Feedback appreciated. -- Ryan • (talk) • 17:16, 9 April 2016 (UTC)

It would be nice if you could somehow mention the exact link that is dead, preferably on the talk page in question. Hobbitschuster (talk) 17:38, 9 April 2016 (UTC)
What does it mean that a link is dead, in this context? A website can be temporarily down or unavailable. Even HTTP 404 responses may be due to a temporary misconfiguration. --LPfi (talk) 18:05, 9 April 2016 (UTC)
@Hobbitschuster: if you enable the ErrorHighlighter gadget then you will see a very noticeable "dead link" message right next to the broken link. @LPfi: right now the bot flags links that return 404 errors (page not found) and DNS lookup errors (site not found) as dead links. I've set the bot up so that when it is re-run it will first delete all instances of {{dead link}} in the article, so if a link that was broken somehow comes back to life it would no longer be flagged as dead. -- Ryan • (talk) • 18:22, 9 April 2016 (UTC)
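For readers wondering what those checks look like in practice, here is a minimal illustrative sketch of the two cases Ryan describes (an HTTP 404 response and a DNS lookup failure). It is not the bot's actual code; the function name and the use of the Python requests library are my own assumptions.

<syntaxhighlight lang="python">
import socket
import requests
from urllib.parse import urlparse

def link_status(url, timeout=15):
    """Return 'dead' only for the two unambiguous cases; anything else
    (timeouts, 5xx errors, connection resets) is treated as 'unknown'
    so that temporary outages are not tagged."""
    host = urlparse(url).hostname
    try:
        socket.gethostbyname(host)            # DNS lookup: "site not found"
    except socket.gaierror:
        return "dead"
    try:
        reply = requests.head(url, allow_redirects=True, timeout=timeout)
        if reply.status_code == 404:          # "page not found"
            return "dead"
    except requests.RequestException:
        return "unknown"                      # possibly transient; do not tag
    return "alive"
</syntaxhighlight>

On a re-run, only links still reported as "dead" by a check of this kind would keep their {{dead link}} tag, matching the behaviour Ryan describes.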
Where do you activate that? Hobbitschuster (talk) 18:46, 9 April 2016 (UTC)
In Preferences / Gadgets / Experimental, tick ErrorHighlighter. -- WOSlinker (talk) 19:07, 9 April 2016 (UTC)
Thanks. Hobbitschuster (talk) 19:34, 9 April 2016 (UTC)
OK. I suppose those errors should not occur on well-maintained sites. It would probably still be good to include a timestamp in the template, so that links that have been dead for a long time and those recently marked can be told apart. Then the old templates should also not be removed, but left alone, unless the link has come live again (or the error has turned out to be transient). --LPfi (talk) 18:26, 10 April 2016 (UTC)
The timestamp is already being added - see Special:Diff/2966265/2969005 which added {{dead link|April 2016}} to nine links. The current implementation uses month and year, which matches w:Template:Dead link, but I would need to modify the bot to leave old template timestamps in place when the bot is re-run. -- Ryan • (talk) • 18:36, 10 April 2016 (UTC)
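One way the "leave old timestamps in place" change could work is sketched below. This is purely illustrative, assumes a simplified template format, and is not the bot's actual code.

<syntaxhighlight lang="python">
import re

def retag(url, wikitext, current_stamp="May 2016"):
    """Tag a dead URL, but if it is already followed by a {{dead link}}
    template, keep the existing month/year instead of replacing it."""
    pattern = re.compile(re.escape(url) + r"(\s*\{\{dead link\|[^}]*\}\})?",
                         re.IGNORECASE)

    def _replace(match):
        if match.group(1):        # an existing tag: leave its date in place
            return match.group(0)
        return match.group(0) + " {{dead link|" + current_stamp + "}}"

    return pattern.sub(_replace, wikitext, count=1)
</syntaxhighlight>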
Sometimes, a link looks to have "come live again" but is actually being cybersquatted - the original venue is still dead and some unsavoury characters registered the name the moment the original legitimate registration expired. The site then returns advertising, linkspam or a listing of the domain name for sale at some extortionate price. Often, it merely redirects traffic to some other domain. If we link to that sort of domain, it makes us look spammy. K7L (talk) 18:44, 10 April 2016 (UTC)
The bot is admittedly much more limited than a human editor - for example, there is no way for a bot to accurately determine if a link is to a site that is being cybersquatted, and as noted previously I'm not flagging sites that timeout or have other potentially temporary issues. That said, I think there is significant value in flagging links that are clearly dead, both to ensure we are linking to accurate information and as a way to more easily find listings for places that might have gone out of business. -- Ryan • (talk) • 19:15, 10 April 2016 (UTC)
The bot will not find all links that need updating, but it is finding enough for now. Looks like there is much work to do; it is going to take a concerted effort to fix them all, but this will improve the site for readers and its search engine ranking. --Traveler100 (talk) 19:58, 12 April 2016 (UTC)
Two updates: first, I've been running the bot in batches against Category:Guide articles, but it's slow going since I want to review all changes to catch any bad edits - examples of bad edits include this one to the "Humphrey's" listings, which required fixes to the bot code to handle unexpected characters such as a semicolon in a URL. Second, for some reason I am occasionally seeing DNS lookup failures for valid sites, which the bot then flags incorrectly. I switched to Google Public DNS, but I've still seen a couple of false positives; I'd like to get that issue resolved before having the bot run against too many articles. -- Ryan • (talk) • 20:24, 12 April 2016 (UTC)

Update

As of 17 April the bot has now run against all star & guide articles, so any dead links in those articles should now be tagged with {{dead link}}. While the vast majority of tagging was done without issue, there are a tiny number of edge cases that aren't handled properly and require updates to the code before the bot can be run without supervision. In the meantime, if anyone wants to see the bot run against a specific article or group of articles please let me know. -- Ryan • (talk) • 18:13, 17 April 2016 (UTC)

Until now I've been running the bot in batches and manually reviewing changes in order to catch any problems. Issues that I've fixed include problems with URLs ending in ")", issues with w:Internationalized domain names, occasional DNS lookup failures for valid URLs (I've switched to Google DNS to resolve that one), etc. Since things look fairly good at this point I'm going to let the bot run unsupervised, but if anyone notices any links flagged incorrectly please let me know so I can fix the code. -- Ryan • (talk) • 04:29, 5 May 2016 (UTC)
The bot is doing a good job. I have just noticed on Cramlington that it marked 5 links, but the edit summary only said "Flag 4 potential dead links". In this case it is because two of the links are the same. I think that this is sufficiently rare not to be a problem. AlasdairW (talk) 21:44, 5 May 2016 (UTC)

The bot has finished running against all articles. After ten years the site has unfortunately built up a lot of dead links, but hopefully having a way to tag them will allow for easier future maintenance. -- Ryan • (talk) • 07:03, 10 May 2016 (UTC)

For anyone else who, like me, keeps an eye on articles within a certain region, a useful tool for finding articles in that region with dead links is https://petscan.wmflabs.org/. Here's an example of using that tool to find all articles with dead links within Southern California: [1] (replace "Southern California" with your region of choice). -- Ryan • (talk) • 19:23, 14 May 2016 (UTC)
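If you prefer to script this rather than use the web form, a rough sketch of the same PetScan query is below. The parameter names (language, project, categories, depth, format, doit) reflect my understanding of PetScan's query string rather than anything documented in this thread, so verify them against the tool before relying on this.

<syntaxhighlight lang="python">
import requests

def petscan_query(region, depth=6):
    """Fetch PetScan's JSON result for articles that are both under the
    given region's category tree and in Category:Articles with dead
    external links."""
    params = {
        "language": "en",
        "project": "wikivoyage",
        # One category per line; the default combination should give the
        # intersection of the two category trees.
        "categories": f"{region}\nArticles with dead external links",
        "depth": depth,
        "format": "json",
        "doit": "1",
    }
    reply = requests.get("https://petscan.wmflabs.org/", params=params,
                         timeout=60)
    reply.raise_for_status()
    return reply.json()   # result structure left for the reader to inspect
</syntaxhighlight>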

Dead links bot

Swept in from the pub

For the first time since last May I'm re-running the bot that tags potentially dead external links with {{dead link}} and adds articles to Category:Articles with dead external links in the process. After 24 hours the bot is up to Den Helder, so I expect it will take another 4-5 days to scan everything. This bot is useful for tracking down closed businesses and for updating stale data, so if there is a particular region you like to look after, consider doing the following:

  1. Enable the "ErrorHighlighter" gadget from the "Gadgets" tab of Special:Preferences. Once enabled you will be able to see dead links and other syntax issues highlighted in articles.
  2. To see a list of articles that contain dead links within a region, go to [2] and change "California|6" to whatever region you are interested in (example: "New York City|6").

Let me know if there are any questions or concerns. Kudos to User:Traveler100 and User:AlasdairW, who have already been scrambling to fix dead links as the bot is updating things. -- Ryan • (talk) • 22:18, 24 January 2017 (UTC)

Aw! I was excited that the number of articles with dead links fell below 7000. I started to fix dead links for New South Wales related articles and was hoping to do it for all of Australia, but now it will take longer. Oh well, just more work to do. :) Gizza (roam) 04:41, 25 January 2017 (UTC)
The war against link rot is (unfortunately) never-ending :) -- Ryan • (talk) • 06:13, 25 January 2017 (UTC)
I think it can at least be made easier by people not including stuff like "/home.html" in links in the first place. Quite often dead links are fixed by just cutting off something like that. Please try and be on the lookout for stuff like that when adding links. Hobbitschuster (talk) 15:54, 25 January 2017 (UTC)
I have it on my TODO list to have the bot re-check any dead link of the form "http(s)://www.example.com/(index|default|home)*", and if the link works without the "index|default|home" part to then replace it, but I haven't gotten around to implementing and testing that yet. I did recently run an update that fixes links with extra slashes in the URL ("//") or that have an improper protocol ("htp://", "http//", etc). -- Ryan • (talk) • 16:21, 25 January 2017 (UTC)
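A rough sketch of that planned re-check, together with the kind of slash/protocol cleanup already implemented, is below. None of this is the bot's actual code; the regular expressions are my own approximations of the patterns Ryan describes.

<syntaxhighlight lang="python">
import re
import requests

TRAILING_INDEX = re.compile(r"/(?:index|default|home)\.[a-z]{2,5}$", re.IGNORECASE)

def fix_protocol(url):
    """Repair common typos such as 'htp://' or 'http//' and collapse doubled
    slashes in the path (assumed patterns, not the bot's exact rules)."""
    url = re.sub(r"^htt?ps?:?/+",
                 lambda m: "https://" if "s" in m.group(0) else "http://", url)
    if "://" not in url:
        return url
    scheme, _, rest = url.partition("://")
    return scheme + "://" + re.sub(r"/{2,}", "/", rest)

def try_trimmed(url, timeout=15):
    """If trimming a trailing index/default/home page yields a working URL,
    return the trimmed URL; otherwise return None."""
    trimmed = TRAILING_INDEX.sub("/", url)
    if trimmed == url:
        return None
    try:
        if requests.head(trimmed, allow_redirects=True, timeout=timeout).ok:
            return trimmed
    except requests.RequestException:
        pass
    return None
</syntaxhighlight>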
That would help, indeed. However, the bot also flags a large number of listings for places that have simply closed or changed ownership. I find it a bit depressing... but it's certainly very useful :-) JuliasTravels (talk) 16:35, 25 January 2017 (UTC)
Another thing that might be worth taking into consideration (either in a separate bot or in a future update) is a specific type of link squatting that is falsely labeled as a live link (even if the link has previously been labeled "dead"), such as seen here. I raised the case of this specific link at Talk:Isla de Ometepe, but I think the issue is broader than that, as this particular design (is it a particular hosting service that does this to previously live domains?) is particularly common. So if it is possible and not too much work to implement something that detects those (or to simply not label any previously dead link as live except by hand), that would be useful, especially if such a link became dead prior to the first bot run and so never showed up as dead. Hobbitschuster (talk) 17:24, 25 January 2017 (UTC)
Domain squatters will be out of scope for anything my bot would deal with, unless someone can come up with a simple and reliable way for an automated tool to identify them. -- Ryan • (talk) • 17:32, 25 January 2017 (UTC)
Amazing how many have gone bad in less than a year. We had almost fixed all marked bad links for the United Kingdom and it is getting towards 100 again, and we are not even halfway through the alphabet. --Traveler100 (talk) 18:13, 25 January 2017 (UTC)
Well, it helps to keep our guides up to date, so that's a good thing all around. How many of those links would you say are really dead, and how many are just the above-outlined problem of complicated URLs jumping around? Hobbitschuster (talk) 18:19, 25 January 2017 (UTC)

@Wrh2: Can you at least tell the bot not to mark links as live that have previously been marked dead (unless they have since been marked live by human editors)? I am more comfortable with a handful of false positives than with link squatters being falsely labeled as live links, if there is a way to prevent it. Would that be possible to implement? Hobbitschuster (talk) 18:18, 25 January 2017 (UTC)

I would rather not make that change unless there is a broad agreement to do so. Some sites break temporarily, and sometimes people update links but don't remove the {{dead link}} template, so I think it is safest to reflect which links were active at the time the bot last ran. -- Ryan • (talk) • 18:22, 25 January 2017 (UTC)
+1 for doing the change: I get far more "working" links that go to domain squatters than real dead links. Jlg23 (talk)
I think it is more common for a link to become live through a link squatter being mistaken for the real deal than for a previously dead link to become the genuine article once more. And while some do forget to remove the dead link template, this is caught when checking up on dead links, whereas false negatives are much harder to catch. I personally tend towards any system that produces false positives over one that produces false negatives, as false positives are usually less harmful when it comes to dead web links. Hobbitschuster (talk) 19:06, 25 January 2017 (UTC)
Yes, in the last month I have been going through the ones marked in May last year as bad links. When I clicked on them, they went to an active web site, in the majority of the cases to domain name squatters. --Traveler100 (talk) 21:07, 25 January 2017 (UTC)
I wonder if a custom edit summary would be a feasible middle ground here. Anyone interested in that problem could look for the edit summary. WhatamIdoing (talk) 01:35, 26 January 2017 (UTC)

Status update

The bot has now processed every article. For those interested in helping with cleanups:

Thanks to everyone who has helped with cleanups thus far - a review of recent changes shows that a lot of closed listings have been deleted, and a lot of broken links have been fixed. If anyone sees any bot edits that look incorrect please let me know so that I can fix the issue before running again in the future. -- Ryan • (talk) • 07:00, 28 January 2017 (UTC)

Thanks for doing the update. Now we all have some work to do :-) --Traveler100 (talk) 07:53, 28 January 2017 (UTC)
Too true. It's a bad sign when an article like Trepassey and the Irish Loop breaks in a couple of places in the first four days after its creation because the government in St. John's moved the entire parks and environment web site. K7L (talk) 14:04, 28 January 2017 (UTC)

Statistics

As of 13 August 2018

Articles with dead links by type and status in Category:Articles with dead external links

Type         | Outline | Usable | Guide | Star | Unranked | Total
District     |      42 |    310 |    95 |   18 |          |   465
City         |    2306 |   2561 |   345 |   14 |          |  5226
Airport      |       3 |     11 |     3 |    0 |          |    17
Park         |     121 |     45 |     9 |    1 |          |   176
Dive guide   |       1 |      5 |     0 |    1 |          |     7
Region       |     321 |    116 |     4 |    1 |        7 |   449
Country      |      71 |     16 |     1 |    1 |          |    89
Itinerary    |      19 |     18 |     4 |    0 |          |    41
Travel topic |      38 |     28 |     9 |    1 |          |    76
Total        |    2922 |   3109 |   470 |   37 |        7 |  6555
Articles with dead links by continent and status in Category:Articles with dead external links

Continent     | Outline | Usable | Guide | Star | Unranked | Total
Africa        |     167 |    127 |    15 |    1 |          |   311
Antarctica    |       0 |      0 |     0 |    0 |          |     0
Asia          |     581 |    576 |   133 |    7 |          |  1298
Europe        |     890 |   1038 |   144 |    5 |          |  2080
North America |    1026 |   1089 |   147 |   22 |          |  2286
Oceania       |      64 |     76 |     6 |    1 |          |   147
South America |     144 |    161 |    13 |    0 |          |   318

A note on false positives

As the bot is re-running there have been a few links tagged as "dead" that appear to still be alive. I'd like to call out the reason for a few of these so that people understand what's going on:

  • In a number of cases I've seen Facebook links come back as dead links, but clicking on them in a browser appears to show a valid page. In the cases I've seen, it appears that the pages in question aren't public and only return content if you are logged into Facebook - if you open the same page in an incognito window you get the "not found" error that the bot is seeing. (One possible way of handling such links is sketched at the end of this note.)
  • There are some cases where a site is simply misbehaving. In this diff the https://www.loewen-schwarzenberg.de/ link is clearly returning valid content when you visit it in a browser, but the HTTP response code from that URL indicates that the URL is invalid (404 not found). If the site itself is returning a response saying that the URL is invalid, any bot (mine, Google's search spiders, etc) is going to flag the link as invalid.

I'll post any other issues that may generate false positives as I find them. -- Ryan • (talk) • 01:55, 1 June 2020 (UTC)
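For the Facebook case, one pragmatic workaround (purely a sketch on my part, not something the bot does) would be to treat login-gated domains as unverifiable and leave them for human review rather than tagging them on the strength of a 404:

<syntaxhighlight lang="python">
from urllib.parse import urlparse

# Hypothetical list; extend as needed.
LOGIN_GATED = {"facebook.com", "www.facebook.com", "m.facebook.com",
               "instagram.com", "www.instagram.com"}

def verifiable_anonymously(url):
    """False for hosts whose content is only served to logged-in users;
    links on these hosts should not be auto-tagged just because an
    anonymous request gets a 'not found' response."""
    host = (urlparse(url).hostname or "").lower()
    return host not in LOGIN_GATED
</syntaxhighlight>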

Spring cleaning 2023

Swept in from the pub

Hey, all. Many of you will have seen on your watchlists that User:InternetArchiveBot has been on its travels, touring the world of Wikivoyage and flagging hundreds of dead links. If you find yourselves lost for something to do at any point this weekend, then spending a few minutes fixing some dead links would be a huge help to the project. Alternatively, if you have better things to do, have fun doing those! --ThunderingTyphoons! (talk) 16:29, 27 January 2023 (UTC)

I did this for Indianapolis, Indiana, and Midwest yesterday and learned a few things while doing the work, so it was more engaging than just mechanical edits (if that encourages anyone else). —Justin (koavf)TCM 17:10, 27 January 2023 (UTC)
You can find affected articles in Category:Articles with dead external links. Do please double-check the bot's work, because it'd be sad to lose a listing if a website was only down briefly, but I'll bet that this is a good way to find attractions that have closed during the pandemic.
Justin, did you have any success in finding replacements (e.g., if a restaurant changed its website)? WhatamIdoing (talk) 21:18, 27 January 2023 (UTC)
A problem is that I cannot keep up that speed, even with articles I have been engaged in. The affected articles will be lost from Recent changes and the Watchlist sooner than I can fix them (which also means flooding them isn't really helpful). Luckily there is Category:Articles with dead external links, to which one can return later to find familiar articles (or any articles, if one has got the time). –LPfi (talk) 21:23, 27 January 2023 (UTC)
I did, thank you for asking: nothing was removed entirely, and I found that a certain bus route was outdated and replaced it with another. In no case did I just remove a listing. —Justin (koavf)TCM 02:27, 28 January 2023 (UTC)

In many cases the dead link is because "example.com/In_english" has become "example.com/en" or similar more complicated changes, especially where the link was to some specific aspect of a venue (one example I stumbled upon was a senior home with good lunches, in a town with few restaurants). A few years ago a lot of places in Finland moved from example.com to example.fi. Such things are often reasonably easy to fix if you know the language or the local web trends. Possibly important to fix, but the entry should certainly not be removed just because of the dead link. –LPfi (talk) 10:54, 30 January 2023 (UTC)

Category:Articles with dead external links is the hardest maintenance category to get under control. After every round by a bot, we end up with 8000 or so articles that have dead links. Just have to keep on going at it! Gizza (roam) 23:35, 1 February 2023 (UTC)
Cyberpower678 (but also ThunderingTyphoons!, WhatamIdoing, LPfi, Vidimian), could you tell me how I can activate this bot for it:voy as well? --Andyrom75 (talk) 14:09, 2 February 2023 (UTC)
First, I suggest reading Wikivoyage:Travellers' pub/2022#InternetArchiveBot to get some idea of what it's capable of. The way it works for Wikipedia might or might not be what you want. (It wasn't what we wanted here, but every community is different.) Then I suggest contacting User:Cyberpower678. WhatamIdoing (talk) 22:11, 2 February 2023 (UTC)
Great; I think it's perfect how it has been tuned to work here on en:voy. Thanks, Andyrom75 (talk) 21:38, 3 February 2023 (UTC)

Misidentified links

Cyberpower678 and Harej would probably appreciate a few notes about links that are tagged as dead, but which seem to be working. Please add anything that seems like it could be a pattern that the devs might want to investigate. WhatamIdoing (talk) 17:09, 30 January 2023 (UTC)

I'll start with this pair of links in the COVID-19 article. Alasdair said they're just slow, rather than actually dead, and the bot has been edit-warring to tag them for days. WhatamIdoing (talk) 17:10, 30 January 2023 (UTC)
I think I have identified a pattern of the bot tagging websites that open with a pop-up as dead when they aren't. I haven't been watching here closely these days, so I started a thread at the bot's user talk page on Meta instead. More details can be seen there. Vidimian (talk) 23:49, 1 February 2023 (UTC)
IABot is generally whitelisted as a bot on most Cloudflare-protected sites. In this case, the firewall it's hitting is a geo-restriction, set by the site, to challenge all clients outside of the local region. We'll have to whitelist them. —CYBERPOWER (Chat) 16:07, 3 February 2023 (UTC)
I've temporarily blocked the bot from editing British coast and Boating in Finland (cc AlasdairW and LPfi) to stop it from edit warring. Cyberpower678 and Harej, once you've resolved this issue, please give me a ping so I can unblock the bot from editing those two pages. SHB2000 (talk | contribs | meta) 05:47, 6 February 2023 (UTC)
@SBH2000: I've whitelisted the domain from the British coast page, as the bot is disallowed from accessing it. As for Boating in Finland, there's a syntax parsing bug resulting in repeated tagging. Fortunately there's a workaround. The issue is caused by the <!-- markup for HTML comments: when it is directly attached to an external link, the bug is triggered. Simply adding a single whitespace between the link and the markup will work around the bug. I have opened a Phabricator ticket to track this. —CYBERPOWER (Chat) 18:44, 17 February 2023 (UTC)
@SHB2000: fix ping. —CYBERPOWER (Chat) 18:46, 17 February 2023 (UTC)
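Until the Phabricator ticket is resolved, the whitespace workaround could even be applied semi-automatically. The snippet below is only an illustration of the idea; the regular expression is my own approximation of the problematic pattern, not anything taken from IABot.

<syntaxhighlight lang="python">
import re

# Match an external link immediately followed by an HTML comment opener.
LINK_THEN_COMMENT = re.compile(r"(\[?https?://\S+?\]?)(<!--)")

def add_workaround_space(wikitext):
    """Turn 'http://example.com<!-- note -->' into
    'http://example.com <!-- note -->' so the parsing bug is not triggered."""
    return LINK_THEN_COMMENT.sub(r"\1 \2", wikitext)
</syntaxhighlight>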
In Germany, http://lufthansa.com was tagged as broken, but for me it just forwarded to the https version of the site (which works fine). El Grafo (talk) 11:28, 6 February 2023 (UTC)
On London/Westminster, the bot keeps flagging this website as dead, but it's live. It's an understandable glitch, because it is a very old-looking site ("© Oxford Tube 2012") and there is a much newer official site at this URL ("© Oxford Tube 2021"); however, the latter redirects to the old site when the user attempts to buy tickets, so the old one must be considered the primary official site. This is probably more Stagecoach's fault than IA Bot's, but it's still annoying having to "edit war" with a bot. --ThunderingTyphoons! (talk) 13:58, 6 February 2023 (UTC)
It sounds like we need some anti-edit-warring code. WhatamIdoing (talk) 16:42, 6 February 2023 (UTC)
I have tried both reverting the bot's edits and commenting out the dead link tag, but neither has worked. I think we need a way of flagging that a link may generate a spurious dead link report, so that the bot ignores it. Maybe we need a slow link template, which could also generate a list of such links for manual checking. AlasdairW (talk) 23:50, 10 February 2023 (UTC)
At the English Wikipedia, some of the bots just watch for reverts, and stop edit warring. I'm pretty sure that w:en:User:XLinkBot by Versageek and Beetstra is set up that way. WhatamIdoing (talk) 17:27, 11 February 2023 (UTC)
IABot used to do this as well, but unfortunately, given the scale of the bot, the code was too resource-demanding on current infrastructure to keep running. I had to disable it in favor of making the bot simply do a better job instead. —CYBERPOWER (Chat) 18:25, 17 February 2023 (UTC)
@Cyberpower678, what mechanism were you using for the anti-edit-warring code? Could something simpler be done, like checking to see whether the bot has edited the article in the last month/year? WhatamIdoing (talk) 17:15, 19 February 2023 (UTC)
It was specifically looking at whether it had previously made alterations to the link, and whether it was about to make the same alterations again. Simply checking to see if the bot edited in the recent history is far too blunt a method in this instance, as dead links can happen at any moment. —CYBERPOWER (Chat) 21:58, 19 February 2023 (UTC)
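For what it's worth, the approach described above could be sketched roughly as follows. This is an outsider's illustration under stated assumptions (in-memory storage, invented names), not how IABot is actually implemented.

<syntaxhighlight lang="python">
import hashlib
import json

class AlterationMemory:
    """Remember a fingerprint of every alteration the bot makes, and refuse
    to make the same alteration to the same page twice (a repeat usually
    means a human undid the first attempt)."""

    def __init__(self):
        self._seen = set()

    @staticmethod
    def _fingerprint(page, url, action):
        raw = json.dumps([page, url, action], sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def should_apply(self, page, url, action):
        key = self._fingerprint(page, url, action)
        if key in self._seen:
            return False          # already tried this once; don't edit-war
        self._seen.add(key)
        return True
</syntaxhighlight>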
Well, it is a blunt instrument, but that might be better than edit warring. We have enough dead links tagged to keep us busy for a long time. WhatamIdoing (talk) 22:55, 20 February 2023 (UTC)
You can use {{cbignore}} to direct the bot away from a link, though whitelisting links the bot simply cannot access is probably a better solution. —CYBERPOWER (Chat) 18:24, 17 February 2023 (UTC)
Oxfordtubes.com is set to disallow ALL bots on Cloudflare. I have whitelisted the domain on IABot. —CYBERPOWER (Chat) 18:23, 17 February 2023 (UTC)
Very interesting case here. It had been getting 403s on the specific URL up until yesterday. When I tested the URL with the bot today on the same machine, it loaded fine. Not sure what is going on here, but the bot has self-reset the status to Alive. —CYBERPOWER (Chat) 18:20, 17 February 2023 (UTC)
Return to "Articles with dead external links" page.