Wikinews:Bots/Requests/InternetArchiveBot


 * Closed as successful. [24Cr][talk] 07:08, 30 October 2021 (UTC)


 * Operator:
 * Bot name: InternetArchiveBot
 * Programming language: PHP
 * Already used on: Operates on dozens of additional Wikimedia wikis
 * Task: InternetArchiveBot identifies dead links and adds links to archived versions where available. Per request on Phabricator. Harej (talk) 22:32, 20 January 2021 (UTC)

Task 1
We needed IABot to take care of two different tasks. Upon careful discussion, it was agreed to split them: the first being taking snapshots of all the sources in the archives (currently in progress, overseen by user:acagastya); and the other being taking snapshots of all the sources for new articles. Since the first task is in progress, let's focus the discussion on just the latter. •–• 20:34, 25 January 2021 (UTC)

Task 2

 * All the sources used in a mainspace article are expected to be listed in the url parameter of source; source also accepts an archiveurl parameter. Could you configure IABot so that it archives the sources from mainspace articles (which are NOT yet archived), reading from url and updating archiveurl? Does this sound safe? I hope I am not overlooking something; please let me know if I am. •–• 20:41, 25 January 2021 (UTC)
 * I have configured the bot to recognize the "archiveurl" parameter. Harej (talk) 20:50, 25 January 2021 (UTC)
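For illustration, the parameter update being requested can be sketched as a small helper operating on the wikitext of a {{source}} template. This is a hypothetical Python sketch, not IABot's actual PHP implementation; the function name and output formatting are assumptions.

```python
import re

def add_archiveurl(source_template: str, archive_url: str) -> str:
    """Insert an |archiveurl= parameter into a {{source}} template,
    unless one is already present (hypothetical helper, not IABot code)."""
    if re.search(r"\|\s*archiveurl\s*=", source_template):
        return source_template  # already archived; leave untouched
    # place the new parameter just before the closing braces
    return source_template[:-2].rstrip() + "\n |archiveurl = " + archive_url + "\n}}"
```

Running the helper twice on the same template is a no-op, which matches the request that only sources which are NOT yet archived get updated.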


 * Seems to me one ought to wait until publication plus 24 hours before archiving sources. Other than that, I have no strong objection; the worst technical harm that could possibly be done is quite limited. --Pi zero (talk) 21:05, 25 January 2021 (UTC)
 * Does IABot have specific triggers (like running at the time of new page creation)? Or does it work at a set time interval? Re what Harej has added, I think IABot can just check if(wikitext.categories.includes('Published')) { run; }. Will that work? Additionally, can you control which snapshot instance will be added to the archive URL? The snapshot with the timestamp closest after category:published was added would be ideal. •–• 07:17, 26 January 2021 (UTC)
 * (Yeah, published will likely do, which would admittedly be much easier.) --Pi zero (talk) 17:45, 26 January 2021 (UTC)
 * Acagastya, IABot does not have triggers. However, the bot will use the archive corresponding to the stated access time in the citation, or the closest one to when the URL was added. Harej (talk) 17:20, 1 February 2021 (UTC)
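The selection rule described above (preferring the snapshot nearest to the stated access time, or to when the URL was added) can be sketched as a small function. This is an illustrative Python sketch under the assumption that snapshots carry Wayback-style YYYYMMDDhhmmss timestamps; the name `closest_snapshot` and the tie-breaking choices are not IABot's exact logic.

```python
from datetime import datetime

def closest_snapshot(timestamps, reference):
    """Pick the snapshot timestamp nearest to the reference time,
    preferring the first one at or after it (illustrative only;
    IABot's real selection logic may differ)."""
    parse = lambda t: datetime.strptime(t, "%Y%m%d%H%M%S")
    ref = parse(reference)
    after = [t for t in timestamps if parse(t) >= ref]
    if after:
        return min(after, key=parse)   # earliest snapshot after the reference
    return max(timestamps, key=parse)  # otherwise the latest one before it
```

With the reference set to the moment category:published was added, this yields the "closest after publication" behaviour suggested above.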


 * Would IABot be able to fill in all three parameters on source for pages with dead links? Namely, also brokenURL and archivedescription when links are being recovered? —chaetodipus (talk &middot; contribs) 04:46, 29 July 2021 (UTC)
 * I see the Phabricator note says this is stalled. Could you briefly outline what needs to be done to unstall? --Green Giant (talk) 03:38, 7 August 2021 (UTC)
 * It’s been a fortnight with no response. Please could you provide an update? If not, shall we close this request? [24Cr][talk] 12:54, 21 August 2021 (UTC)
 * I think that is because no one here is voting -- we need to vote it to pass or fail for them to proceed. <span style="color: #000; box-shadow: 0 0 7px #5de; padding-left:2.5px; padding-right:2.5px; border-radius:10px;">•–•</span> 12:59, 21 August 2021 (UTC)
 * Is that why it is listed as stalled? If so, we can vote but I was hoping to see a test run of 10-20 edits first. However, if the same task is being done on another wiki, I guess we can move to approval. [24Cr][talk] 13:10, 21 August 2021 (UTC)


 * I will test the bot for 20 edits. After around 20 edits I will stop the bot, and let you assess. Harej (talk) 22:08, 13 September 2021 (UTC)
 * This may be delayed as the bot is currently down for maintenance. Harej (talk) 23:44, 13 September 2021 (UTC)

We have been trying to do test edits with the bot, without success. Basically, the bot goes through AllPages in alphabetical order, and all the pages it has come across are protected. The bot will work for "single page" runs via the Management Interface (it makes edits on your user account's behalf, so it will work on the pages you can normally edit) and also multi-page runs for unprotected pages. If you want the bot to make background edits, it will need to be promoted to admin. Harej (talk) 19:08, 27 September 2021 (UTC)
 * I’ve promoted the bot to admin for a month to help the test run. Please advise if anything else is needed. [24Cr][talk] 12:53, 2 October 2021 (UTC)
 * @Harej, I saw the bot made a single edit so far (diff). If the bot is rescuing a dead link in sources, can it also add "brokenURL = true" so that it displays the archived link in the article? —chaetodipus (talk &middot; contribs) 04:39, 20 October 2021 (UTC)
 * chaetodipus, it will add that parameter on future edits. Harej (talk) 18:30, 20 October 2021 (UTC)
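To illustrate what that parameter change amounts to, a hypothetical helper might set |brokenURL = true on a {{source}} template when the original link is dead, so the archived copy is displayed. This is a Python sketch for discussion; IABot itself is written in PHP and its real edit logic may differ.

```python
import re

def mark_broken(source_template: str) -> str:
    """Set |brokenURL = true so the template displays the archived copy
    instead of the dead original (hypothetical helper, not IABot code)."""
    if re.search(r"\|\s*brokenURL\s*=", source_template):
        # parameter already present: normalize its value to true
        return re.sub(r"(\|\s*brokenURL\s*=\s*)\S*", r"\1true", source_template)
    # otherwise append the parameter before the closing braces
    return source_template[:-2].rstrip() + "\n |brokenURL = true\n}}"
```

Applying the helper to a template that already carries the parameter leaves it unchanged, so repeated bot passes would be safe.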

Votes
Given the limitations of the bot (the way it was designed, to serve a specific purpose), I think it achieves a part of the task, and I am okay with the compromise. <span style="color: #000; box-shadow: 0 0 7px #5de; padding-left:2.5px; padding-right:2.5px; border-radius:10px;">•–•</span> 17:34, 1 February 2021 (UTC)
 * And thinking about it, if the bot does not edit semi-protected pages, we won't royally screw up. :D <span style="color: #000; box-shadow: 0 0 7px #5de; padding-left:2.5px; padding-right:2.5px; border-radius:10px;">•–•</span> 17:36, 1 February 2021 (UTC)

if it gets the process moved to implementation. [24Cr][talk] 00:01, 27 August 2021 (UTC)

I think this would definitely be useful in recovering the many dead sources in our archives. —chaetodipus (talk &middot; contribs) 05:27, 21 October 2021 (UTC)

I am familiar with IAB and find it extremely useful on other projects. I am surprised to learn in viewing this request that it isn't already approved. I think including this is a no-brainer. --TheSandDoctor (talk) 21:29, 22 October 2021 (UTC)