Motivation
The ICYMI (ADN) bot has now been running in an automated mode for several weeks, having first been run in semi-automated mode for a few weeks starting in early February. I have a general sense of what it is doing but am not editing or reviewing its algorithmic operation.
I have a sense of one or two things which I may want to adjust about the algorithm but want to have a look at what it’s doing in a slightly more systematic way first. I have a month’s worth of logs and I will try looking at them.
First look
The bot assigns a score to each toot it sees, then boosts the toots which get the highest scores. This can mean that when there aren’t many toots, toots with lower scores, which otherwise wouldn’t be amplified, might get boosted. (I think this happens on weekends.)
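As a rough illustration of why a quiet period can let lower-scoring toots through, here is a minimal sketch of "boost the highest scorers". It is not the bot's actual code: the `Toot` fields, the `pick_boosts` name, the `max_boosts` parameter and the sample scores are all assumptions made up for this example.

```python
from typing import NamedTuple


class Toot(NamedTuple):
    # Hypothetical record; the real bot's data structure is not shown in this post.
    toot_id: str
    account: str
    score: float


def pick_boosts(candidates, max_boosts=2):
    """Return the highest-scoring toots, at most max_boosts of them."""
    ranked = sorted(candidates, key=lambda t: t.score, reverse=True)
    return ranked[:max_boosts]


# Made-up data: a busy window and a quiet window.
busy = [Toot("a", "@x", 9.0), Toot("b", "@y", 7.5), Toot("c", "@z", 3.0)]
quiet = [Toot("d", "@x", 2.0), Toot("e", "@y", 1.0)]

busy_picks = pick_boosts(busy)
quiet_picks = pick_boosts(quiet)
```

In the busy window the toot scoring 3.0 is left out, while in the quiet window a toot scoring 2.0 is boosted, simply because nothing better is available.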
First, a quick look at how the scores range over the month. Not much. There are a few outliers.
But even when I crop out the outliers, there’s not much visible pattern to see.
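One common way to crop outliers before plotting is the interquartile-range rule. The sketch below is an assumption about how this could be done, not a description of what was actually used for the plots here; the function name, the factor of 1.5 and the sample scores are illustrative.

```python
from statistics import quantiles


def crop_outliers(scores, factor=1.5):
    """Drop values further than factor * IQR outside the quartiles."""
    q1, _median, q3 = quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [s for s in scores if low <= s <= high]


# Made-up scores with one obvious outlier.
sample_scores = [4, 5, 5, 6, 6, 7, 50]
cropped = crop_outliers(sample_scores)
```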
Accounts boosted
Anyone who follows the bot will see that it tends to boost some accounts a lot. This isn’t very surprising since (a) it doesn’t follow many accounts and (b) even those accounts that do post about ADN topics don’t all consistently use the hashtags which the bot listens for.
Still, a histogram shows that @nknews, @nukes_of_hazard and @bojacobs are most likely to score well and be boosted.
Scores of toots from regularly boosted accounts
Looking at the scores of toots from the most frequently boosted accounts, there is some variation.
Day of week variation
I’ve wondered whether there is a weekly cycle in the scores of the boosted toots. My hunch was that on weekends, when there is less “official” or “news” traffic, the threshold required to be boosted might drop. There might be a slight dip on Saturday and Sunday, but I suspect that my scoring is blind to whether items are popular and interesting due to being topical or not.
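A day-of-week comparison like this can be done by grouping boosted scores by weekday and averaging. The sketch below assumes each log entry can be reduced to an ISO timestamp and a score; that log shape, the function name and the sample records are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def mean_score_by_weekday(records):
    """Map weekday name -> mean score, for (iso_timestamp, score) pairs."""
    by_day = defaultdict(list)
    for timestamp, score in records:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        day = DAY_NAMES[datetime.fromisoformat(timestamp).weekday()]
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}


# Made-up records: two Saturdays and one Monday in April 2023.
sample_records = [
    ("2023-04-01T10:00:00", 4.0),
    ("2023-04-08T09:00:00", 6.0),
    ("2023-04-03T12:00:00", 8.0),
]
weekday_means = mean_score_by_weekday(sample_records)
```

With only a month of logs, each weekday has just four or five days behind it, so small differences between the means should be read cautiously.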
Sites referred to
Looking at which sites the toots link to, there is greater variation, though nknews.org, the site of the @nknews account, gets almost half the links.
Beyond that, there is a mix of press, NGOs, and other specialised media outlets.
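Counting linked sites comes down to extracting the host from each URL and tallying. This is a minimal sketch under the assumption that the links have already been pulled out of the toots as plain URL strings; the function name and the example URLs are made up.

```python
from collections import Counter
from urllib.parse import urlparse


def count_domains(urls):
    """Tally links per host, treating 'www.example.org' as 'example.org'."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname or ""  # hostname is already lowercased
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts


# Made-up links for illustration.
sample_urls = [
    "https://www.nknews.org/some-article",
    "https://nknews.org/another-article",
    "https://example.org/report",
]
domain_counts = count_domains(sample_urls)
```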
Keyword frequency and topic analysis
A very basic keyword search was used to mark all boosted toots based on the content of the toot itself (not any linked site). This was then used to look at which topics are most frequently referred to, and where they overlap.
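A keyword marking of this kind, plus a count of which keywords appear together, might look roughly like the sketch below. The keyword list, the function names and the sample toots are assumptions for illustration; the matching is plain case-insensitive substring search, so "power" would also match "powerful".

```python
from collections import Counter
from itertools import combinations

# Illustrative keyword list; the real analysis may have used other terms.
KEYWORDS = ("nuclear", "korea", "missile", "russia", "power")


def tag_toot(text):
    """Return the set of keywords found in the toot text."""
    lowered = text.lower()
    return frozenset(k for k in KEYWORDS if k in lowered)


def tally(toots):
    """Count single keywords, keyword pairs, and toots with no keyword."""
    singles, pairs = Counter(), Counter()
    uncategorised = 0
    for text in toots:
        tags = tag_toot(text)
        if not tags:
            uncategorised += 1
        singles.update(tags)
        # Sort so each pair is always counted under the same key.
        pairs.update(combinations(sorted(tags), 2))
    return singles, pairs, uncategorised


# Made-up toot texts.
sample_toots = [
    "North Korea launches missile",
    "Russia to place nuclear weapons in Belarus",
    "New nuclear power plant comes on line",
    "Conference registration closes next week",
]
singles, pairs, uncategorised = tally(sample_toots)
```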
More than half of all the toots referred to “nuclear”, followed by “Korea”, “missile” and “Russia”. Interestingly, “Korea” and “missile” together came up nearly as frequently as “nuclear” alone, followed by “Korea” and “nuclear” together, and “Russia” and “nuclear” together.
31 of the boosted toots (out of 371 total) didn’t fall into any category.
“nuclear” and “power” appear together in 26 toots and it seems the bot is amplifying discussion related to nuclear power plants.
In the time period concerned, North Korea carried out several missile tests, some South Korean figures suggested that the ROK should seek nuclear weapons, and Russia announced that it would place nuclear weapons in Belarus. Germany also announced that it would be closing its last nuclear power plants and Finland brought a new nuclear power plant on line.
Comments
Doing all this has given several negative results. It has shown that a relatively small fraction of boosted toots don’t (obviously) fit into any of the main categories of ADN activity which I think I want to be capturing.
It has confirmed that I don’t have much range in the scores which are being assigned.
It seems to suggest that there isn’t much variation in the threshold being used from day to day, though perhaps with a slight “lowering of standards” on weekends.
It suggests that the scoring does mark for popularity but not so much for strict relevance to ADN.
I have not set up this analysis to compare the age of boosted toots. This information isn’t stored in my logfiles. I can think of a way to store it going forward (which will require changing the bot’s code) and a way to extract it while doing this analysis (as is done for the text of toots), but I’ve just not done that yet.
Next steps
There are some more options for analysis which could be done here.
I would like to get the bot to be less “sequentially enthusiastic” about single accounts. That is, if only one account is above the threshold for a few hours, I’d prefer that it be amplified less, rather than have the bot repeatedly amplify the only active account.
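One possible way to do this is to damp an account's score each time it already appears among the most recent boosts. This is only a sketch of that idea, not something the bot does today; the class name, the window size and the damping factor are all assumptions.

```python
from collections import deque


class RecentBoosts:
    """Remember the last few boosted accounts and damp repeat appearances."""

    def __init__(self, window=5, damping=0.5):
        self.recent = deque(maxlen=window)  # oldest entries fall off automatically
        self.damping = damping

    def adjusted_score(self, account, score):
        # Each recent boost of the same account multiplies the score by damping.
        repeats = self.recent.count(account)
        return score * (self.damping ** repeats)

    def record(self, account):
        self.recent.append(account)


tracker = RecentBoosts()
first = tracker.adjusted_score("@a", 8.0)
tracker.record("@a")
tracker.record("@a")
third = tracker.adjusted_score("@a", 8.0)
other = tracker.adjusted_score("@b", 8.0)
```

Because the window is a fixed-length queue, an account recovers its full score once other accounts have been boosted often enough to push it out.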
I am considering getting the bot to detect and thin out cases where more than one toot refers to the same web page (usually a news article). The danger with this is that the @ArmsControlNow account regularly links to the top of their Arms Control Today website, even when they are referring to specific articles on the site (which they could link to directly). So, filtering out enthusiasm for one set of articles could end up eliminating amplification of hundreds of other articles.
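One way around the homepage problem might be to deduplicate only on links that point at a specific page, and never treat a bare site root as a duplicate. The sketch below assumes toots arrive as (id, url) pairs; the function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def dedup_key(url):
    """Return a key for duplicate detection, or None for bare homepage links."""
    parts = urlparse(url)
    path = parts.path.rstrip("/")
    if not path and not parts.query:
        return None  # site root: too generic to count as "the same article"
    return (parts.hostname or "", path, parts.query)


def thin_duplicates(toots):
    """Keep the first toot for each specific page; keep all homepage links."""
    seen = set()
    kept = []
    for toot_id, url in toots:
        key = dedup_key(url)
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        kept.append(toot_id)
    return kept


# Made-up toots: two links to one article, two links to a site root.
sample = [
    ("1", "https://example.org/news/a"),
    ("2", "https://example.org/news/a/"),
    ("3", "https://example.net/"),
    ("4", "https://example.net/"),
]
kept_ids = thin_duplicates(sample)
```

The trade-off is that repeated homepage links are never thinned at all, which may or may not be acceptable.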
In general I could make the bot store a bit more information about each toot that it’s boosting: to start with, when it does the amplification, and possibly some of the data which it uses to generate the score which decides the amplification, as well as the threshold being applied at each point. Logfiles are good, and these ones aren’t going to get that big. It has been useful to have these files, and I think I could learn more from having a bit more information about how the bot has been working.
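A richer log could be as simple as one JSON object per line, recording the boost time, the score, the threshold and the toot's age. The field names and function names below are assumptions for illustration, not the bot's existing log format.

```python
import json
from datetime import datetime, timezone


def boost_record(toot_id, account, score, threshold, created_at, boosted_at=None):
    """Build a JSON-serialisable record of one boost (hypothetical fields)."""
    boosted_at = boosted_at or datetime.now(timezone.utc)
    return {
        "toot_id": toot_id,
        "account": account,
        "score": score,
        "threshold": threshold,
        "created_at": created_at.isoformat(),
        "boosted_at": boosted_at.isoformat(),
        "age_seconds": (boosted_at - created_at).total_seconds(),
    }


def append_record(path, record):
    """Append one record as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


# Made-up example: a toot boosted two and a half hours after posting.
created = datetime(2023, 4, 1, 10, 0, tzinfo=timezone.utc)
boosted = datetime(2023, 4, 1, 12, 30, tzinfo=timezone.utc)
record = boost_record("123", "@example", 7.5, 6.0, created, boosted)
```

Storing `age_seconds` at boost time would also cover the missing toot-age information without needing to reconstruct it later.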