New season, new algorithm

Richard Martin-Nielsen · March 21, 2023

New season, new bot

Where I live it is starting to look a little like spring. I realise some people are now waiting for autumn and winter, and others may be thinking in terms of dry season or wet.

Still, this week I am going to start running the new bot code.

A drawing describing a complete Stonehenge, seen from above. Pen and brown ink, with indented lines. Drawn by John Webb between 1650-1655. Image from The British Museum, used under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) licence.

What was the old?

I travelled away from home a few weeks ago and this forced me to see if I could get the bot to run by itself.

In the end, the answer to this question was, “partially”.

Nonetheless, having spent some time figuring out how to get the code to run on a regular basis, I then left it to run by itself for a few more weeks.

In the meantime I have been reworking how the bot works in fairly significant ways.

The original “bot” code was a very lightly adjusted version of code to prepare a digest of toots. Intended to be run once or twice a day, it would come up with a list of interesting and possibly-unseen toots, and present them nicely. I reworked this so that the same code would boost them instead, added in some logic to scan some high-volume news accounts and identify toots of interest to an ADN audience, and bolted a UNODA-specific RSS bot onto the side. It’s not pretty, but it ran.

Twice a day, it would dump a series of toots into the timelines of the few people who follow it. (Thank you all.)

Having it running on a somewhat regular basis (I now realise I broke something on the production code a few days ago and didn’t quite notice) let it quietly irritate me on a regular basis with some of the things it was doing, and left me some time to try reworking it.

How is the new bot different?

The new bot’s algorithm and functioning are more explicitly based on the ICYMI (Law) bot. [1]

It now functions in two separate phases, Scanning and Boosting. They might also be called filling the bucket and emptying the bucket.

Scan phase

This is a very rough description of what the algorithm does. It is intended to run roughly every two hours over a span of about ten hours in my (Eastern European Time) day.

  1. Look at its timeline for the past 24 hours.
    • Ignore any posts from accounts with #nobot or #noindex in their profile.
    • Ignore any posts it has already favourited, bookmarked or boosted.
    • Ignore any posts older than 24 hours.
  2. Assign scores to posts
    • Calculate the geometric mean of the number of likes, boosts and replies
    • For filtered accounts (high volume general purpose accounts which post quite a bit of ADN material, mixed in with a lot of other stuff, and which don’t use hashtags), boost this figure slightly when there are keyword matches in the text of the toot for the hashtags the bot follows.
    • Adjust this score according to the number of followers the author account has (having fewer followers gives a boost, since it’s less likely to have been seen broadly)
  3. Drop all posts with a score of zero (it happens)
  4. Drop toots not in the top quartile of scores of the remaining toots (that is, having dropped the zeroes)
  5. Look at the highest scored toot from each account, dropping others from the same account
  6. Choose the top three, and put them into a bucket to be boosted.
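
Steps 2 to 6 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the bot's actual code: the `Post` fields, the 1.2 keyword multiplier and the `log10` follower adjustment are all hypothetical stand-ins, since the description above only says "boost slightly" and "fewer followers gives a boost". It uses only the standard library rather than Pandas.

```python
from dataclasses import dataclass
from math import log10
from statistics import quantiles


@dataclass
class Post:
    """Hypothetical record for one toot seen in the timeline."""
    id: str
    account: str
    favourites: int
    boosts: int
    replies: int
    followers: int
    keyword_match: bool = False  # only set for the "filtered" accounts


def score(post, keyword_bonus=1.2):
    # Geometric mean of the three counts; it is zero if any count is zero.
    base = (post.favourites * post.boosts * post.replies) ** (1 / 3)
    if post.keyword_match:
        base *= keyword_bonus  # assumed size of the "slight" boost
    # Assumed adjustment: the fewer followers, the smaller the divisor.
    return base / log10(post.followers + 10)


def pick_for_bucket(posts, bucket_size=3):
    scored = [(score(p), p) for p in posts]
    # Step 3: drop posts with a score of zero.
    scored = [(s, p) for s, p in scored if s > 0]
    # Step 4: keep only posts at or above the third quartile of what is left.
    if len(scored) >= 2:
        cutoff = quantiles([s for s, _ in scored], n=4, method="inclusive")[2]
        scored = [(s, p) for s, p in scored if s >= cutoff]
    # Step 5: keep only the highest-scored post from each account.
    best = {}
    for s, p in scored:
        if p.account not in best or s > best[p.account][0]:
            best[p.account] = (s, p)
    # Step 6: the top few go into the bucket.
    ranked = sorted(best.values(), key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ranked[:bucket_size]]
```

Because the score is a geometric mean, a toot with plenty of favourites and boosts but no replies still scores zero, which is one way the zero scores mentioned in step 3 can arise.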

Other things which the scan code collects information about but which haven’t been wired in yet:

  • Look at what it has boosted in the past 5 days.
  • Try to determine if a toot originates from Twitter – as with the ICYMI (Law) bot, I may wish to throttle boosts of Twitter-originating toots.
  • Keep track of what url a toot links to – I eventually would like to try to avoid boosting repetitive links by the same account to the same news article, but unfortunately at least one of the accounts which provides good inputs links everything to its front page, so this will need to be looked at carefully.
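
As an illustration of the link-tracking item, here is a minimal sketch of pulling the outbound URLs out of a toot's HTML content. It uses the standard library's `html.parser` rather than BeautifulSoup to stay self-contained. The names `LinkCollector` and `linked_urls` are hypothetical, and it assumes that mention and hashtag links carry a `mention` or `hashtag` class, which should be checked against real toots.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, skipping mentions and hashtags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Assumption: mention and hashtag links are marked with these classes.
        if "mention" in classes or "hashtag" in classes:
            return
        if attributes.get("href"):
            self.urls.append(attributes["href"])


def linked_urls(content_html):
    """Return the outbound link URLs found in a toot's HTML content."""
    collector = LinkCollector()
    collector.feed(content_html)
    return collector.urls
```

Comparing these URLs against those of recently boosted toots would not help with an account that links everything to its front page, so such accounts would still need special handling.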

I am aware that the third quartile measure will slowly decline as I repeatedly “skim off” the high-scoring toots. I’m curious to see how this will function in practice.

Boost phase [2]

This bit of the code is very simple.

Once every twenty minutes or so, the code should look in the bucket, take a toot out, and boost it.
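
A minimal sketch of that step, assuming the bucket is a simple first-in, first-out queue of toot ids and that the scheduler (not shown) calls this every twenty minutes or so. The function name `boost_next` and the `boost` callable are hypothetical; with the Mastodon.py library, for instance, the callable could be the client's `status_reblog` method.

```python
from collections import deque


def boost_next(bucket, boost):
    """Take the oldest toot id out of the bucket and boost it.

    Returns the boosted id, or None if the bucket was empty.
    """
    if not bucket:
        return None
    toot_id = bucket.popleft()
    boost(toot_id)  # e.g. a call to the Mastodon API; injected so it can be faked
    return toot_id
```

Passing the boosting function in as an argument keeps the bucket logic testable without touching a live account:

```python
boosted = []
bucket = deque(["101", "102"])
boost_next(bucket, boosted.append)  # boosted is now ["101"], "102" stays in the bucket
```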

Testing, again

I have learned a lot in adjusting the code, including starting to work with Pandas and revisiting the awesome firepower of BeautifulSoup. I am also having to pick through how to get my code to operate on a scheduled basis – sometimes this is the hardest part, and I may yet end up writing my own notes on how launchctl works.

I will run the bot on a scheduled basis again. The outputs may be a bit more haphazard than they had been but I can see how this is going to get to a better system.

The code isn’t pretty and it’s not doing everything I would like it to. I plan to improve on both of those points in coming weeks.

  1. The text description of the ICYMI (Law) bot algorithm was copied into the source code, commented-out, and translated. 

  2. I just like that as a heading. My MTCR background is showing. 
