
Monday, 26 August 2024

About the recent elections

It has been about a month and a half since the recent UK elections, and in some ways it has been the most important election of my life. I haven't seen a major change of government since 2010, when I was 14 years old. Rather coincidental that I'm now double that - 28 - clearly the UK has been planning its transitions around me. Anyhow, I'm not going to delve into any discussion of which party had the best manifesto, or who I wanted to win. Instead, I'm going to take a more philosophical route and discuss some thoughts on democracy and voting systems.

Democracy

The first question - what is a democracy? I've looked through a few definitions, but I'm going to start with the one from Britannica:

 Democracy is a system of government in which laws, policies, leadership, and major undertakings of a state or other polity are directly or indirectly decided by the 'people'

Different countries will have different notions of a democracy - this can be the degree to which decisions are made directly (e.g. referendums) or indirectly (putting trust in a trusted set of officials). It can also be the degree to which decisions are made locally (e.g. local government) or centrally (e.g. central government). There are also some questions around what the 'people' refers to - is it everyone? Is it a 'meritocratic' group?

Usually central to the notion of a democracy is the concept of an election, where the 'people' are able to have their say on the management of the country by voting for different candidates who have their own plans of governance. The theory is that the 'people' will vote in accordance with their own preference relations, and that consequently, candidates will need to have plans that take these preferences into account.

The UK democratic system

In the UK, we have the House of Commons, which comprises 650 elected Members of Parliament (MPs). Each MP is associated with a constituency - a defined area of the country. During elections - which must be called at least every five years - the members of the constituency vote for who will be their representative in the House of Commons. This is also known as their local MP.

When a new law is being passed, all MPs are given a vote and a majority is required to pass it. This creates a delegated authority - the 'people' want to vote for an MP who will in turn vote for policies that will benefit the 'people'.

While voting is done on the individual level (i.e. voters vote for a candidate), candidates tend to represent different political parties. The idea is that members of the same political party would demonstrate similar interests in policy, and more crucially - that they would tend to vote together. Thus if one political party was able to get a majority of the 650 seats, then they could almost guarantee that their laws would pass.

Citizens might not know much about the individual candidates in their constituency, but they do tend to know the individual parties (e.g. Labour, Conservative), which means they will usually vote based on which party they want to win. 

The UK uses a 'First past the post' voting system. This means that voters only vote for a single candidate, and the candidate with the most votes wins the seat. As a result, parties will usually only field a single candidate in each constituency. This avoids the case where the vote is split between two candidates from the same party, letting a rival party get the most votes (and thus win the seat).

What happened in the UK election 2024?

I mentioned that I'm only going to talk about the results from a system level. So I want to begin by showing some interesting statistics from the election. The following image comes from BBC news: https://www.bbc.co.uk/news/articles/c4nglegege1o

A few observations. First is that Labour got 34% of the votes but won 63% of seats. This means that 34% of all votes went to the Labour party, but Labour got the most votes in 63% of constituencies.

In striking comparison, Reform UK and the Green party won 14% and 7% of votes respectively, but only won 1% of seats each. In practice, while around a fifth of voters voted for either Reform or the Green party, together they only won 9 seats out of 650 (about 1.4%).

This oddness is even more explicit when you compare votes for Reform with votes for the Lib Dems. Reform actually got 2% more of the vote than the Lib Dems, but won less than a tenth of the seats.

Now there are a few practical reasons why this occurred. One is that Labour and the Lib Dems coordinated to maximise Conservative losses. This means they did not invest campaign resources in contesting each other's strong seats. In other words, Labour did not try to fight the Lib Dem strongholds. This meant that the competition was usually Labour v Conservative or Lib Dem v Conservative; the vote did not get split between Labour and the Lib Dems. This also happened at the individual voter's level. Many voters focussed on maximising Tory losses and thus would vote for whichever party (be it Labour, Lib Dem or otherwise) was most likely to beat the Tories.

Let's call this observation tactical voting, where voters do not vote for their true preference in order to secure a better overall outcome.

We saw the opposite result happen with the Conservatives and Reform. It was suggested that a large majority of Reform voters would have otherwise voted Conservative. As a result, their vote was effectively split. This was devastating to both parties - it meant that the Conservatives kept having to fight the strongest of Labour / Lib Dem, while their own votes were being eaten by Reform. It also meant that Reform was unlikely to win many seats - many of their votes came from areas that were predominantly Tory, so they ended up splitting the vote to an even greater degree.

Let's call this observation split voting, where a third-preferred party wins as a result of voters splitting their votes between their two preferred parties.

Note how both of these phenomena are indicative of a larger worry: wasted votes. If a party does not win a constituency, the votes it won are effectively useless. Similarly, voters who want to vote for fringe options may effectively be casting wasted votes.

A final point that's worth mentioning. Parties campaign and voters vote with the knowledge that we have a first-past-the-post system. This means that parties would likely campaign differently, and voters vote differently, if the voting system were different. This doesn't mean we should take a neutral stance towards voting systems. For example, we could have a voting system where only one person's vote actually matters (effectively a dictatorship). Everyone else might still vote, but cast votes that don't align with their preferences at all, as they know their vote doesn't matter. This is clearly not a good system.

What makes a good voting system?

This question has been asked for a while. I think one of the challenges here is that there is a distinction and potentially, a tension, between a good voting system, and a good governance system.

For example, we can consider a system where there is only one constituency. All votes are pooled into this one constituency, and the 650 seats are then split based on the proportion of votes given to each party. In some ways this is very positive, as no vote is wasted. On the other hand, try passing a law when there are 30 parties each with 10 MPs. Even with the current vote shares, Labour having only 34% (~221) of seats would make it enormously difficult to pass laws, as they would need another 105 votes from other parties.
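To make this concrete, here is a sketch of how such a purely proportional system might allocate the 650 seats, using the largest-remainder method as one common choice. The vote shares are approximations of the BBC figures quoted above; the Conservative and "Other" shares are my rough fill-ins, not figures from the article.

```python
def proportional_seats(vote_shares, total_seats=650):
    """Allocate seats in proportion to vote share (largest-remainder method)."""
    total = sum(vote_shares.values())
    quotas = {p: v * total_seats / total for p, v in vote_shares.items()}
    seats = {p: int(q) for p, q in quotas.items()}  # whole-seat quotas first
    leftover = total_seats - sum(seats.values())
    # hand the remaining seats to the largest fractional remainders
    by_remainder = sorted(quotas, key=lambda p: quotas[p] - seats[p], reverse=True)
    for p in by_remainder[:leftover]:
        seats[p] += 1
    return seats

# Approximate 2024 vote shares (%); "Other" lumps together the remaining parties
shares = {"Labour": 34, "Conservative": 24, "Reform": 14,
          "Lib Dem": 12, "Green": 7, "Other": 9}
print(proportional_seats(shares))  # Labour gets ~221 seats, not 412
```

Under this allocation no vote is wasted, but no party comes close to the 326 seats needed for a majority - which is exactly the governance tension described above.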

There has been some mathematical analysis involved in this discussion. Probably the most basic and relevant starting point is Arrow's impossibility theorem. Arrow's theorem begins by describing a few useful properties we might expect from a voting system. The idea, of course, is that each individual expresses their vote into the system, and the system outputs a 'collective' vote (which for the UK system is just the final allocation of MPs by party). One background assumption is that preferences are transitive - if an individual prefers A to B and B to C, then they prefer A to C.

  • Unanimity: preferences that are agreed by all individuals must be respected
  • Unrestricted domain: all agents can have any preference and order their choices in any way, including by indicating ties
  • Independence of irrelevant alternatives: an assessment between two options depends only on those options - no other options or comparisons matter
  • Non-dictatorship: there is no individual whose vote fully determines the collective vote

The point of the impossibility theorem is to show that these four properties actually contradict one another. In other words, a voting system that meets the first three properties will always have an individual with dictatorial powers.

Note that there are three dimensions worth considering here:

First is the agent's actual preferences - who they actually prefer (this tends to have no constraint).

Second is the agent's vote - who they end up voting for (this is constrained by the voting system - in first past the post they only have one choice).

Third is what the system outputs from the votes - in the UK this is an allocation of MPs.

Note that mathematically, the second dimension is abstracted away, as the system is simply a mapping from people's preferences to system outputs. But in practice the second dimension is very pertinent, and can very much lead to a divergence between people's preferences and the system outputs.

Let's briefly talk about why these properties matter:

1) Unanimity: If all agents agree on a specific preference (i.e. A over B) then the system output will agree with this preference. Note that first past the post meets this property - if all individuals vote exactly in accordance with their first choice. But if people don't vote according to their preferences, then it's likely the system outputs will not align with those preferences.

2) Unrestricted domain: individuals are free to vote however they wish - there are no artificial restrictions.

3) Independence of irrelevant alternatives (IIA): the idea here is that we can add or remove choices without affecting a preference between two original choices. For example, suppose I ask who you prefer between option A and B, and you choose A. Suppose now I tell you that option C exists - it would seem bizarre for you now to prefer option B to option A.

4) Non-dictatorship: intuitively, a dictator is someone who has total control over a system - whatever they want, goes. A dictator in an election can therefore be considered a voter with the power to fully determine the result. Whatever they vote - goes. Clearly having a dictatorial voter is a bad thing, as it means that no one else's vote really matters.

An analysis of first past the post

Now let's talk again about the UK election system, i.e first past the post. We've previously described issues such as tactical voting, split voting and wasted votes.

Many of the problems with first past the post relate to its violation of IIA. This is usually because even though voters have preferences in line with IIA, their votes often diverge from their preferences in a way that makes the system output violate IIA.

Tactical voting:

The agent might prefer A > B and A > C. But they might think that other people are more likely to vote for B. So they end up voting for B instead, so that C can't win. In other words, the existence of a third choice (C) means they end up voting as if B > A! This violates IIA.

Practically: someone might flat-out prefer the Green party (meaning they prefer Green to Labour and Conservative). But they might know that very few people will vote Green, and they really don't want the Conservatives to win. As a result, they choose to vote Labour.

Note that this leads to another problem - agents have some sense of how other people are voting. If they had no idea what others would vote for, then it would be optimal for them to vote in accordance with their preferences. This is a potential problem with having 'polls' before elections: people use them to get a sense of other votes. That's not to say polls should be banned - I'm not sure how effective that would be - nor is it to say whether poll-influenced voting is better or worse in terms of overall result. But there is a sense in which it does not feel democratic, and is not based on the original interests of the people.

Split voting:

A similar problem that violates IIA. We can have the majority of people have preferences like:

Conservative > Reform > Labour, or Reform > Conservative > Labour.

However, the end result is one in which Labour wins - so the fact that we have two similar options (Conservative and Reform) means the third choice wins.

Or in other words, we could have a system where the majority of people prefer option A to option C. But the existence of an option B (which splits the vote), means that the end system prefers option C to option A.

Note that split voting will occur even without polls. For example, suppose we have the following set up:

30%: A > B > C

30%: B > A > C

40%: C > B > A 

Here we have 60% of people who think that C should not win; they're just split between A and B. If B did not exist, then C would definitely lose.

If people do not have polls, they might vote in accordance with their preferences. But under first past the post, C still ends up winning, as we'd get 30% for A, 30% for B and 40% for C. We'd need a system where second choices matter to avoid this scenario.
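A quick sketch can verify this: under the profile above, C wins a plurality vote even though B would beat C in a head-to-head contest.

```python
from collections import Counter

# The preference profile above: (share of voters in %, ranking best-to-worst)
profile = [(30, ["A", "B", "C"]),
           (30, ["B", "A", "C"]),
           (40, ["C", "B", "A"])]

def plurality_winner(profile):
    """First past the post: only each voter's top choice counts."""
    tally = Counter()
    for share, ranking in profile:
        tally[ranking[0]] += share
    return tally.most_common(1)[0][0]

def head_to_head(profile, x, y):
    """Who is ranked above the other by the larger share of voters
    (assumes shares sum to 100)."""
    x_share = sum(s for s, r in profile if r.index(x) < r.index(y))
    return x if x_share > 50 else y

print(plurality_winner(profile))        # C
print(head_to_head(profile, "B", "C"))  # B - 60% rank B above C
```

This is the split-vote effect in miniature: the plurality winner (C) loses every pairwise contest against B.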

Wasted votes:

As we know, first past the post allows for wasted votes. Wasted votes are very much connected to tactical voting: people vote tactically to ensure their votes are 'less wasted'. But because first past the post means that individuals can only vote for one thing (and said vote is frankly irrelevant if it's not for one of the top two choices), individuals are encouraged to vote differently from their preference!

This is effectively another violation of IIA. The system produces odd outputs because people are not voting in accordance with their preferences.

Back to the UK election results

So I've discussed how disproportionate the UK election results are - where the proportion of votes do not really match the proportion of seats.

I've discussed how this result can in part be attributed to three related phenomena:

Tactical voting

Split voting

Wasted votes

I've also talked about how these all violate certain principles of an optimal voting system, and that this is in large part due to the disparity between individuals' preferences and their actual votes.

I've also talked about how this can in part be mitigated by restricting information on how other people are voting - to encourage people to vote in accordance with their preferences. But fundamentally, first past the post (and all voting systems) will have vulnerabilities that lead to such skewed results. That's frustrating!

I think my overall conclusion is that I don't really feel like our current system is very democratic. My personal choice was actually to spoil my vote, as I wasn't a fan of any of the parties. But it does feel a bit weird that Labour is able to get a super-majority, landslide victory while getting fewer than 10 million of the 48 million available votes. I'm not sure there's ever going to be anything we can do to avoid split voting in first-past-the-post elections, but there's definitely something distasteful to me about tactical voting.

Saturday, 24 August 2024

Setting up a bot network

Coordinated Inauthentic Behavior (CIB) refers to the use of multiple fake or deceptive accounts across social media platforms to manipulate public discourse, influence public opinion, or achieve specific objectives. These accounts often work together in a coordinated manner, using deceptive tactics to create the illusion of genuine activity. To maintain this illusion, bad actors are increasingly turning to generative AI to automate the spread of their agenda, for example developing articles that pump out fake news or manipulative information, or to maintain personas to make their bots more legitimate.

As generative AI improves, there is a concern that it is becoming increasingly cheap for a bot to be convincing. And when bad actors can set up thousands of bots, each spreading their own message that is subtly aligned to a specific agenda - the detection of such networks can be increasingly difficult.

They say to ‘know thy enemy’, and for a self-identified good actor, that means properly understanding how these networks are created and maintained. I suspect that generative AI has reduced the barrier to entry for bad actors spreading CIB, but I want to understand how low this barrier is, and what we might do to raise it.

I’m primarily interested in two questions - how difficult is it to create a bot network - and how much does it cost?

I think the immediate response is “it depends on how sophisticated the bot network is”. So I’m going to explore these questions through the lens of a few case studies that I have identified that involve increasingly sophisticated use of bots and generative AI.

Before we begin, it’s worth saying that I have some basic experience in Python and using APIs, but I’m certainly no expert in AI, software engineering or bot networks. As you’ll see from my findings - this should actually emphasise my point regarding low barriers to entry. If even I’m able to pull together the relevant pieces of such a network - then I can’t imagine what a bad actor can do.

Section 1: Defining the case studies

Case study 1: How do I set up a bot?

I imagine that most of us (including myself) only have an abstract conception of a bot, and have no idea how to actually create one. So my first case study will just articulate the process of manually setting up a bot. This will help inform later discussions around scaling up to larger networks of bots.

Case study 2: #LondonVoterFraud

See this article: https://www.bbc.co.uk/news/uk-england-london-68923015

#LondonVoterFraud was a network of around 300 bots that aimed at (and succeeded in) making #LondonVoterFraud trend on X. It was likely designed to spread disinformation during the UK local elections.

The #LondonVoterFraud bot network was run as follows: each bot only posts a single tweet using the following format:

<motivational quote>

#LondonVoterFraud

What’s interesting about this is that it’s actually quite basic - most of the content is meaningless and each bot only posts once. However, this is also an advantage for evading detection - as each bot posts only once, it is difficult to prove that a given account is a bot, or that the different accounts are connected.

Case Study 3: Crypto bot network

See this article: https://mashable.com/article/twitter-x-crypto-botnet-chatgpt-research

This bot network comprises around 1000 bots spamming links to various ‘news’ websites, that just repost content from legitimate sites. The vast majority of posts by the bots in this network were related to cryptocurrency. The bots would also reply or retweet popular crypto users. The network used ChatGPT to create their posts, which was also the reason they eventually got caught.

Case Study 4: CopyCop

See article: https://go.recordedfuture.com/hubfs/reports/cta-2024-0509.pdf

CopyCop is a (likely) Russian-operated influence network using large language models (LLMs) to produce and disseminate political content across inauthentic U.S., U.K., and French media outlets.

It is difficult to identify the full details or even make informed guesses of how this operation was conducted. So my main focus here is just to highlight how much impact this can cause in the case of a motivated and coordinated operator.

Two features worth highlighting are how prolific their content production is, and their use of ‘bot personas’ to evade detection.

Section 2: analysing the use cases

Case study 1: The cost of setting up a bot

I’m going to go into the full details here, as it’s legal and the information is very much openly available.

First to articulate some assumptions:

  • 1 single bot on X
  • Making one post, programmatically.
  • We don’t care about being caught
  • We’ll just be running this on our laptop / desktop

As we’re only looking at making a single bot, we can do this manually (which is in line with X terms of service).

  1. Making an X account

There are many guides online covering the code. I’m focussing here on the requirements and limitations, as these will be very important later on.

To make an X account we typically need:

  • Name
  • Phone
  • Email
  • Date of birth
  • CAPTCHA / Anti-AI test

Technically, you may only need an email address to make an X account. However, since we’re looking to make a bot, we’ll need a developer account - for which a phone number may be required. Let’s assume for now you have both.

Now, your name and date of birth can be fully made up.

Phone number: we assume for level 1 that you already have a phone number for use.

Email: we assume in this case that you already have an email. Creating one manually however is fairly simple (around 5 minutes), although it may itself require an email address.

We then need to sign up for an X developer account.

I’ve currently signed up for the free tier, which allows me to make about 1,500 posts per month.

From here, we need to create an application and generate an API key and secret.

We may need to change our user authentication settings, to ensure that this API key has write permissions.

We’re nearly done, now we just need to write a script for this bot, allowing us to make posts by making requests to the X API.

There are many examples of how to do this online, from a preliminary search, see: https://thepythoncode.com/article/make-a-twitter-bot-in-python

https://medium.com/@Wardu/creating-a-twitter-bot-api-v2-bff2235f2d5a

If you need additional help, you can also ask ChatGPT. In fact, this is what I did the first time I made a bot.

If you want to minimise cost, you can also run this on something like Google Colab for free Python and computation access.
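As a rough sketch of what such a script looks like - this uses the tweepy library (as several of the guides above do); the credentials are placeholders you’d replace with the keys generated for your application:

```python
def clip_to_limit(text, limit=280):
    """Trim a post to X's character limit."""
    return text if len(text) <= limit else text[: limit - 1] + "…"

def post_tweet(text, api_key, api_secret, access_token, access_secret):
    """Post a single tweet via the X API v2."""
    import tweepy  # pip install tweepy
    client = tweepy.Client(
        consumer_key=api_key,
        consumer_secret=api_secret,
        access_token=access_token,
        access_token_secret=access_secret,
    )
    return client.create_tweet(text=clip_to_limit(text))

# Example call, with placeholder credentials:
# post_tweet("Hello world", "API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
```

That really is the whole bot - a handful of lines once the account and keys exist.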

Overall:

  • Cost: Largely free (requires a computer of some sort)
  • Time: ~30 minutes
  • Skill requirements:
  • Basic programming
  • Basic understanding of APIs

2) The cost of running something like #LondonVoterFraud

Now things are getting a bit complicated. Here are our assumptions:

  • 300 X bots each making 1 post.
  • A list of 300 motivational quotes that we can use as the posts for our X bots
  • An hour of computation sufficient to send out all the posts at once
  • IP address manipulation so that we’re not immediately caught

Let’s start with how to navigate each of these one at a time.

300 X bots.

It’s worth noting straight up that we don’t actually need 300 developer accounts. We only need one developer account, which will be used to create an application, and 299 normal accounts. The rest just need to authenticate with our application so that we can post tweets through it. It may be worth having a few extra developer accounts, however, to make the network harder to detect.

We’re dealing with such a large number of bot accounts that it’s not practical to create them manually. It took me around 15 minutes to set up each of these accounts, and manually capturing the login details and API keys is time-consuming. What’s more, after making around 6 accounts from the same IP, X is probably going to block you.

It is in principle possible to set this up manually, but we’re looking at around 75 hours of work, as well as needing IP-changing technology. Instead, automated methods are recommended.

The ‘easy’ way:

By far the easiest option is just to purchase the accounts outright. Note that this is against X’s terms of service, so we’re moving into what is known as ‘black hat’ territory.

An example is here: https://accsmarket.com/

Note, that the cost of each account will differ based on different factors.

Here is an example:

https://accsmarket.com/en/item/twitter-accounts--the-account-has-about-50-followers-email-is-in-set-may-require-confirmation-by-sms-sex-can-be-both-male-and-female-the-account-profiles-may-be-empty-or-have-limited-entries-such-as-p-14404

  • The account has about 50 followers.
  • Email is in set (may require confirmation by SMS).
  • Sex can be both male and female.
  • The account profiles may be empty or have limited entries such as photos and other information.
  • Tokens are included.

Cost is around $0.37 per account, which totals around $100 for 300 accounts.

Basically - the more ‘human’ the account, the harder it will be to detect - and the more useful it will be.

The ‘hard’ way

So we’re going to need to find an automatic way of creating these accounts. Previously we mentioned that we need:

  • Name
  • Phone
  • Email
  • Date of birth
  • CAPTCHA / Anti-AI test

Again, name and date of birth are arbitrary - use something like ChatGPT if needed to create a list.

Note that we may not need both - email alone is likely to be much easier. However, X will occasionally ask for accounts to be verified, which may require a phone number.

For now, let’s look at the case where you might need 300 phone numbers and 300 emails.


Phone numbers:

Now, we only need the phone numbers temporarily - to make the relevant accounts - so we can potentially get away with using phone-number services. Note that we probably do need a permanent phone number and can’t just use basic one-off verification.

 

The easiest option may be to use a burner phone number - for example something like Burner or Grasshopper. These are, however, ‘long-term solutions’ and can be costly - for example, Burner is $5 a month. From a quick look online, you can buy phone numbers for about £1 a month.

Emails / Captcha / account set-up

I’m putting these all together as now we’re essentially defining a workflow that needs to be automated via script.

Assume now that we have all the details required to make the relevant accounts at hand. Our script will need to do the following:

  1. Find a way to interact with the web browser programmatically
      • The script needs to process the various web forms required to make an account - e.g. it needs to determine where to input the date, name or email, then click confirm to create the account
      • An example library for this is Selenium
  2. Find a way to deal with CAPTCHA / anti-AI methods

There are two ways of doing this, one is human and the other is AI / automated.

Human method: just have a human around to solve all the CAPTCHAs. This can be somewhat slow, but a human can potentially solve a CAPTCHA in under a minute. If we’re only looking at 300 bots, this could be done in about 5 hours. It is still considerably faster than doing the whole thing manually.

AI method: There are many technical libraries that can be used. I tested this using chatgpt plus:

Essentially, I asked ChatGPT to look at two images and tell me which one was correct. Once the correct one is identified, I’m free to submit.

Note that this was a fairly complex example, and I would need to repeat the process up to 5 times to validate each one.

This would require setting up a link with GPT-4 or some other multimodal LLM.

Proxy

Since we’re operating at a large scale, we should start looking at ways that X might try to catch us. One is to look at our IP address - it’s clearly suspicious if the same IP is making a large number of posts from hundreds of accounts.

To avoid this, we should frequently change our IP address to obscure where we are posting from. This is known as using a proxy.

One question to consider is - how often do we want to change our IP? Should each account have its own unique IP (300 proxies?) or can we just have a proxy that rotates the IP address after every post?

The price of a proxy depends in part on whether it is static (uses the same IP address) or rotating (allows you to switch IP addresses).

Some estimates might be around $1-2 per proxy: https://www.blackhatworld.com/seo/static-residential-proxies-tier-1-isps-ultra-fast-private-proxies-only-unlimited-bandwidth.1411857/

For rotating proxies, pricing is often based on mobile data. We’re provisionally planning to send API requests through a proxy to make posts on X; each request uses less than 1 KB of data. If we’re only looking to make around 300 requests (1 post per account), that’s only 300 KB.

A source like https://www.pyproxy.com/?utm-source=bingtg&utm-keyword=?bingde364&msclkid=9a97e5c1295e1a9b2a1c8350908ee651 suggests pricing around $0.77 per GB. So something like this could be genuinely negligible in cost.

Setting up our #LondonVoterFraud network:

I’ll articulate the approximate structure for doing this.

First we note that we have code for making a single tweet through an API, using a developer account.

Suppose now that we have 300 individual X accounts, and we want to make a tweet through each. For the #LondonVoterFraud case, this was done with a list of inspirational quotes. This can be gathered manually by browsing the internet, or by asking an LLM to do it for you.

  1. Store the account details for each of the 300 accounts in a list that we can iterate through.
  2. Make an app with our developer account (as in the first case study).
  3. For each individual account:
      • Authorise our app to make posts on behalf of the account, using the OAuth process. This generates access keys for each account.
      • Use the application to post a tweet.

Overall:

  • Cost: ~$100
  • Account creation
  • ~ $100 to purchase 300 X accounts
  • OR, upwards of $300 to purchase the relevant details to create 300 X accounts. Maybe $20 for a ChatGPT subscription to help solve CAPTCHAs.
  • Proxies
  • Negligible if we’re using a rotating proxy and making only one post per account
  • Otherwise, upwards of $600 if we’re looking at static proxies for each account
  • Running the script
  • As the accounts are run one-off, the computation costs are still relatively negligible.
  • As we’re doing this one-off, we can probably get away with using free credits from AWS, Google Cloud, Heroku etc.

3) 1000 bots posting routine genAI content about cryptocurrencies

Academic paper: http://arxiv.org/pdf/2307.16336

In some ways, this is fairly similar to our second example, except now we’re making mass posts and using genAI to generate our content. Here are our assumptions:

  • 1000 bots each making 5 posts per day.
  • Each post is created using chatgpt, of around 150 characters, or approximately 40 tokens.
  • The bots are running for 3 months on a server that needs to be constantly on.
  • The bots need to be difficult to detect, both in terms of their IP addresses as well as the accounts themselves.

ChatGPT costs

The process from 300 accounts to 1000 accounts is similar. The scripts to manage these accounts will be relatively similar as well.

Let’s suppose we just bought the 1000 accounts - then we’re looking at around $300 one-off. We’ll need to buy more accounts in the future if we get caught, though!

We're looking at around 5000 posts per day. Let’s suppose they’re using ChatGPT:

Now each post will need a prompt and will return an output.

As we’re expecting each tweet to be max 280 characters, let’s assume the average post is around 150 characters. This is equivalent to around 40 tokens.

Prompts can be quite long, but here is a suggested prompt (123 characters, ~30 tokens)

“Write a short tweet describing a recent change in the cryptomarket, encouraging users to engage and join a new initiative”

For those curious, this is what I got back: "Exciting news in the crypto world! 🚀 Join us as we embrace a recent market shift and launch a new initiative aimed at revolutionizing digital currency. Don't miss out—let's shape the future together! #CryptoInnovation #JoinTheMovement"

There will need to be some prompt engineering pipelines to ensure we get original content and are able to advertise actual initiatives (e.g. links).
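As a very rough sketch, such a pipeline could rotate topics and calls-to-action through a template so the ~5000 daily prompts aren’t identical. Everything below (the topics, the example link) is an illustrative placeholder, not a real initiative:

```python
import random

# Hypothetical prompt-variation sketch: rotate topics and calls-to-action
# so the ~5000 daily prompts are not all identical.
TOPICS = [
    "a recent change in the cryptomarket",
    "a new DeFi trend",
    "an upcoming token launch",
]
CALLS_TO_ACTION = [
    "encouraging users to engage and join a new initiative",
    "inviting readers to learn more at {link}",
]

def build_prompt(link: str = "https://example.com") -> str:
    """Assemble one ~30-token prompt from randomly chosen parts."""
    topic = random.choice(TOPICS)
    cta = random.choice(CALLS_TO_ACTION).format(link=link)
    return f"Write a short tweet describing {topic}, {cta}"

print(build_prompt())
```

A real pipeline would also need to deduplicate outputs and substitute genuine campaign links.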

Now let’s look at ChatGPT pricing:

  • GPT-3.5 input: $0.50 per 1M tokens
  • GPT-3.5 output: $1.50 per 1M tokens
  • GPT-4 input: $5 per 1M tokens
  • GPT-4 output: $15 per 1M tokens

So at 5000 posts per day using GPT-3.5, we’re looking at around 150,000 input tokens and 200,000 output tokens per day. That’s about $0.075 + $0.30 = $0.375 per day.

If we use GPT-4, then we’re looking at $3.75 per day.

Note that OpenAI also offers ‘batch processing’, where requests are submitted ahead of time and processed asynchronously. This comes with a 50% discount.

Assuming we batch process all our posts using GPT-3.5, we’re looking at spending as little as $68 per year (~$5.70 per month) in LLM costs.
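The token arithmetic above fits in a few lines of Python (prices are the GPT-3.5 rates from the table; the 30/40 token figures are the assumptions made earlier):

```python
# Daily/annual LLM cost estimate for 5000 generated posts per day.
POSTS_PER_DAY = 5000
INPUT_TOKENS_PER_POST = 30    # ~123-character prompt
OUTPUT_TOKENS_PER_POST = 40   # ~150-character tweet

INPUT_PRICE = 0.50            # GPT-3.5, USD per 1M input tokens
OUTPUT_PRICE = 1.50           # GPT-3.5, USD per 1M output tokens
BATCH_DISCOUNT = 0.5          # batch processing halves the price

def daily_llm_cost(batch: bool = False) -> float:
    cost = (POSTS_PER_DAY * INPUT_TOKENS_PER_POST / 1e6 * INPUT_PRICE
            + POSTS_PER_DAY * OUTPUT_TOKENS_PER_POST / 1e6 * OUTPUT_PRICE)
    return cost * BATCH_DISCOUNT if batch else cost

print(f"standard: ${daily_llm_cost():.4f} per day")
print(f"batched, annual: ${daily_llm_cost(batch=True) * 365:.2f}")
```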

Computation:

So we now need to be running a long-term server that will manage these accounts and make posts each day. The server costs should still be relatively light as this processing does not require much computation. For now, we’ll be using public cloud services to do this processing.

Here is an initial estimate using AWS:

Compute instance: t3.medium (~$31.20 per month)

Load balancer / scaling: potentially free

Database (for storing ChatGPT responses): potentially free; otherwise we can probably make do with a small relational database at around $15 per month.

We realistically will want to have multiple EC2 instances, so let’s say this totals up at around $100 a month.
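As a sanity check on that ~$100 figure, here’s the same estimate in Python (the three-instance count is my assumption for redundancy, not a measured requirement):

```python
# Rough monthly AWS bill for the account-management servers.
# The three-instance count is an assumption for redundancy, not a measured need.
T3_MEDIUM_MONTHLY = 31.2   # one t3.medium EC2 instance, on-demand
EC2_INSTANCES = 3
DB_MONTHLY = 15            # small relational database

aws_monthly = EC2_INSTANCES * T3_MEDIUM_MONTHLY + DB_MONTHLY
print(f"~${aws_monthly:.0f} per month")
```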

Proxies

We’re making a lot more posts now, and we’ll still need to rotate our proxies. There are often limits on how frequently a proxy can rotate (e.g. basic ones can swap IPs once a minute).

https://www.illusory.io/pricing

We could for example purchase 1000 static proxies, at around $2000 a month. Or we could potentially purchase a number of rotating proxies.

Even though we’re only posting 5 times per day, it might look suspicious if each account posts exactly once every 4.8 hours. Let’s assume instead that each account posts somewhat randomly within a 12-hour ‘day period’.

This means each account needs to post on average every 144 minutes. If our proxy can rotate every minute then it can make 144 posts in 144 minutes.

Now we have 1000 bots, so we’re looking at around 1000 posts per 144 minutes, meaning we’d need at least 7 of these standard proxies; call it 8 for headroom.

The pricing here: https://www.illusory.io/pricing suggests a cost of $5 per hour per proxy.

Suppose then we’re running 8 proxies per hour for 12 hours a day. Then we’re looking at 96*$5 = $480 a day.
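The proxy sizing works out as follows (rotation limit and 12-hour window as assumed above):

```python
import math

# Proxy sizing under the assumptions above: 1000 bots, 5 posts each,
# spread over a 12-hour window, with proxies that rotate once per minute.
BOTS = 1000
POSTS_PER_BOT = 5
WINDOW_MINUTES = 12 * 60
ROTATIONS_PER_MINUTE = 1
COST_PER_PROXY_HOUR = 5.0   # illusory.io rate quoted above

avg_gap = WINDOW_MINUTES / POSTS_PER_BOT              # 144 minutes between posts
posts_per_proxy = avg_gap * ROTATIONS_PER_MINUTE      # 144 posts per proxy per gap
proxies_min = math.ceil(BOTS / posts_per_proxy)       # strict minimum: 7
proxies = proxies_min + 1                             # one spare for headroom

daily_proxy_cost = proxies * COST_PER_PROXY_HOUR * 12
print(f"{proxies} proxies, ${daily_proxy_cost:.0f} per day")
```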

Alternatively, we could look at prices based on data sent; we previously said around $0.77 per GB.

If we assume each post is 1 KB of data, and we have 5000 posts per day, then we’re looking at about 5 MB of data per day, which is a negligible cost.

Note that this changes quite significantly if we start posting images. Small images are around 100 KB, while larger images can be 1 MB or more.

If we attached a small image to each post, we’d be using 101 KB per post, or around 0.5 GB per day, which is still pretty negligible in cost.
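A quick calculation of those data volumes (post sizes are rough assumptions):

```python
# Daily proxy bandwidth, priced per GB (post sizes are rough assumptions).
POSTS_PER_DAY = 5000
PRICE_PER_GB = 0.77

for label, kb_per_post in [("text only", 1), ("text + small image", 101)]:
    gb_per_day = POSTS_PER_DAY * kb_per_post / 1e6   # KB -> GB
    print(f"{label}: {gb_per_day:.3f} GB/day, ${gb_per_day * PRICE_PER_GB:.2f}/day")
```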

API Costs

Something we have not needed to consider until now is the cost of making posts on X. X provides a free tier, allowing around 1500 posts to be made per month. This was sufficient for something like #LondonVoterFraud, which only needed one post per account, but it is insufficient when we need to make 150,000 posts per month.

Further details can be found here: https://developer.x.com/en/products/twitter-api

Overall, we’re likely looking at either multiple Basic accounts or a single Pro account.

The cost of a Basic account is around $100 a month and allows 50k posts per app. We’d need three of these to cover our 150,000 posts per month, costing around $300 a month. A single Pro account would keep everything under one app, but costs $5,000 a month.
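Using those caps, the minimum number of Basic apps is easy to check:

```python
import math

# Minimum number of Basic API apps needed for 150,000 posts per month.
POSTS_PER_MONTH = 5000 * 30
BASIC_CAP = 50_000    # posts per month per Basic app
BASIC_PRICE = 100     # USD per month per Basic app

apps_needed = math.ceil(POSTS_PER_MONTH / BASIC_CAP)
monthly_api_cost = apps_needed * BASIC_PRICE
print(f"{apps_needed} Basic apps, ${monthly_api_cost} per month")
```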

Measures to avoid getting caught

There are a few measures to consider. First, we might be better off buying ‘high quality’ accounts that have nothing to do with crypto and appear very much ‘human’.

We might want to give our accounts a bit of personality, so we might for example want to generate some interesting names. We could also make a few API requests that have nothing to do with crypto, for example by following famous celebrities.

We might want to be a bit clever with how we make our posts, for example posting them less frequently.

Overall costs: ~$350 one-off cost for accounts + $150 a month in maintenance + $300 a month in API costs.
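Pulling the post’s own round figures together in one place:

```python
# Roll-up of the round figures above for the 1000-bot operation (USD).
ONE_OFF = 350   # purchased accounts (plus captcha-solving help)
MONTHLY = {
    "maintenance (servers, LLM tokens, proxies)": 150,
    "X API (Basic apps)": 300,
}

monthly_total = sum(MONTHLY.values())
first_year = ONE_OFF + 12 * monthly_total
print(f"~${monthly_total} per month, ~${first_year} in the first year")
```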

  4. Running ‘CopyCop’

The level of sophistication of ‘CopyCop’ makes it hard to determine the likely costs, especially as I’d expect most of the services to be running on custom scripts and tools.

However, I can speculate on a few differences compared with our third case, the crypto bots.

  1. Running their own LLM

Presumably, CopyCop is not simply calling ChatGPT. Instead, they may have built their own LLM, or fine-tuned an open-source one like Llama 3.

This removes per-token API costs, but they will instead need their own infrastructure (including GPUs) to keep their content machine running.

  2. Creation of a ‘personas’ function.

I am unsure to what extent this is available ‘off the shelf’.

  3. Using multiple sources to post content

They aren’t just using Twitter/X. They’re now hosting their own websites and pumping content out to them.

Further reading:

https://www.bbc.co.uk/news/uk-england-london-68923015

https://mashable.com/article/twitter-x-crypto-botnet-chatgpt-research

https://go.recordedfuture.com/hubfs/reports/cta-2024-0509.pdf