Coordinated Inauthentic Behavior (CIB) refers to the use of multiple fake or deceptive accounts across social media platforms to manipulate public discourse, influence public opinion, or achieve specific objectives. These accounts often work together in a coordinated manner, using deceptive tactics to create the illusion of genuine activity. To maintain this illusion, bad actors are increasingly turning to generative AI to automate the spread of their agenda, for example to generate articles that pump out fake news or manipulative information, or to maintain personas that make their bots appear more legitimate.
As generative AI improves, there is a concern that it is becoming increasingly cheap for a bot to be convincing. And when bad actors can set up thousands of bots, each spreading its own message subtly aligned to a specific agenda, detecting such networks becomes increasingly difficult.
They say ‘know thy enemy’, and for a self-identified good actor, that means properly understanding how these networks are created and maintained. I suspect that generative AI has reduced the barrier to entry for bad actors engaging in CIB, but I want to understand how low this barrier is, and what we might do to raise it.
I’m primarily interested in two questions: how difficult is it to create a bot network, and how much does it cost?
I think the immediate response is “it depends on how sophisticated the bot network is”. So I’m going to explore these questions through a few case studies I have identified, involving increasingly sophisticated use of bots and generative AI.
Before we begin, it’s worth saying that I have some basic experience in Python and using APIs, but I’m certainly no expert in AI, software engineering or bot networks. As you’ll see from my findings, this actually emphasises my point about low barriers to entry. If even I’m able to pull together the relevant pieces of such a network, then I can’t imagine what a bad actor could do.
Section 1: Defining the case studies
Case study 1: How do I set up a bot?
I imagine that most of us (including myself) only have an abstract conception of a bot, and have no idea how to actually create one. So my first case study will just articulate the process of manually setting up a bot. This will help inform later discussions around scaling up to larger networks of bots.
Case study 2: #LondonVoterFraud
See this article: https://www.bbc.co.uk/news/uk-england-london-68923015
#LondonVoterFraud was a bot network of around 300 bots that aimed to get (and succeeded in getting) #LondonVoterFraud trending on X. It was likely designed to spread disinformation during the UK local elections.
The #LondonVoterFraud bot network was run as follows: each bot only posts a single tweet using the following format:
<motivational quote>
#LondonVoterFraud
What’s interesting about this is that it’s actually quite basic: most of the content is meaningless and each bot only posts once. However, this is also an advantage for evading detection: as each bot posts only once, it is difficult to prove that a given account is a bot, or that the different accounts are connected.
Case Study 3: Crypto bot network
See this article: https://mashable.com/article/twitter-x-crypto-botnet-chatgpt-research
This bot network comprises around 1000 bots spamming links to various ‘news’ websites that just repost content from legitimate sites. The vast majority of posts by the bots in this network were related to cryptocurrency. The bots would also reply to or retweet popular crypto users. The network used ChatGPT to create its posts, which was also the reason it eventually got caught.
Case Study 4: CopyCop
See article: https://go.recordedfuture.com/hubfs/reports/cta-2024-0509.pdf
CopyCop is a (likely) Russian-operated influence network using large language models (LLMs) to produce and disseminate political content across inauthentic U.S., U.K., and French media outlets.
It is difficult to identify the full details of how this operation was conducted, or even to make informed guesses about it. So my main focus here is just to highlight how much impact this can cause in the case of a motivated and coordinated operator.
Two features are worth highlighting: how prolific their content production is, and their use of ‘bot personas’ to evade detection.
Section 2: Analysing the case studies
Case study 1: The cost of setting up a bot
I’m going to go into the full details of this, as it’s both legal and the information is very much openly available.
First, some assumptions:
- 1 single bot on X
- Making one post, programmatically.
- We don’t care about being caught
- We’ll just be running this on our laptop / desktop
As we’re only looking at making a single bot, we can do this manually (which is in line with X’s terms of service).
- Making an X account
There are many guides online covering the code. I’m focusing here on the requirements and limitations, as these will be very important later on.
To make an X account we typically need:
- Name
- Email address and/or phone number
- Date of birth
- CAPTCHA / Anti-AI test
Technically, you may only need an email address to make an X account. However, since we’re looking to make a bot, we’ll need a developer account, for which a phone number may be required. Let’s assume for now you have both.
Now, your name and date of birth can be fully made up.
Phone number: we assume for this first case study that you already have a phone number to use.
Email: we assume in this case that you already have an email. Creating one manually however is fairly simple (around 5 minutes), although it may itself require an email address.
We then need to sign up for an X developer account.
I’ve signed up for the free tier, which allows me to make about 1500 posts per month.
From here, we need to create an application and generate an API key and secret.
We may need to change our user authentication settings, to ensure that this API key has write permissions.
We’re nearly done, now we just need to write a script for this bot, allowing us to make posts by making requests to the X API.
There are many examples of how to do this online. From a preliminary search, see:
https://thepythoncode.com/article/make-a-twitter-bot-in-python
https://medium.com/@Wardu/creating-a-twitter-bot-api-v2-bff2235f2d5a
If you need additional help, you can also ask ChatGPT. In fact, this is what I did the first time I made a bot.
If you want to minimise cost, you can also run this on something like Google Colab, which offers free Python and compute access.
Overall:
- Cost: Largely free (requires a computer of some sort)
- Time: ~30 minutes
- Skill requirements:
- Basic programming
- Basic understanding of APIs
Case study 2: The cost of running something like #LondonVoterFraud
Now things are getting a bit complicated. Here are our assumptions:
- 300 X bots each making 1 post.
- A list of 300 motivational quotes that we can use as the posts for our X bots
- An hour of computation sufficient to send out all the posts at once
- IP address manipulation so that we’re not immediately caught
Let’s take each of these in turn.
300 X bots.
It’s worth noting straight away that we don’t actually need 300 developer accounts. We only need 1 developer account, which will be used to create an application, and 299 normal accounts. The rest will just need to authenticate with my application so that I can post tweets through it. It may, however, be worth having a few more developer accounts to make this harder to detect.
We’re dealing with such a large number of bot accounts that creating them manually isn’t practical. It took me around 15 minutes to set up each account, and manually recording the login details and API keys is time consuming. What’s more, after making around 6 accounts from the same IP, X is probably going to block you.
It is in principle possible to set this up manually, but at 15 minutes per account we’re looking at around 75 hours for 300 accounts (closer to 100 with overheads), as well as needing IP-changing technology. Instead, it is recommended to do this using automated methods.
The ‘easy’ way:
By far the easiest option is just to purchase the accounts outright. Note that this is against X’s terms of service, so we’re moving into what is known as ‘black hat’ territory.
An example is here: https://accsmarket.com/
Note that the cost of each account will vary depending on several factors.
Here is an example:
https://accsmarket.com/en/item/twitter-accounts--the-account-has-about-50-followers-email-is-in-set-may-require-confirmation-by-sms-sex-can-be-both-male-and-female-the-account-profiles-may-be-empty-or-have-limited-entries-such-as-p-14404
- The account has about 50 followers.
- Email is in set(may require confirmation by SMS).
- Sex can be both male and female.
- The account profiles may be empty or have limited entries such as photos and other information.
- Token are included.
Cost is around $0.37 per account, which totals around $100 for 300 accounts.
Basically - the more ‘human’ the account, the harder it will be to detect - and the more useful it will be.
The ‘hard’ way
So we’re going to need to find an automatic way of creating these accounts. As mentioned previously, we need:
- Name
- Email address and/or phone number
- Date of birth
- CAPTCHA / Anti-AI test
Again, name and date of birth are arbitrary; use something like ChatGPT if needed to create a list.
Note that we may not need both a phone number and an email address; email is likely to be much easier. However, X will occasionally ask accounts to be verified, which may require a phone number.
For now, let’s look at the case where you might need 300 phone numbers and 300 emails.
Phone numbers:
We only need the phone numbers temporarily, to create the relevant accounts, so we could potentially get away with using phone number services. Note, however, that we probably do need a permanent phone number and can’t just rely on basic one-off verification.
The easiest option may be to use a burner phone number, for example from a service like Burner or Grasshopper. These are, however, ‘long term solutions’ and can be costly; for example, Burner is $5 a month. Looking online, it seems you can buy phone numbers for about £1 a month.
Emails / Captcha / account set-up
I’m putting these all together as now we’re essentially defining a workflow that needs to be automated via script.
Assume now that we have all the details required to make the relevant accounts at hand. Our script will need to do the following:
- Find a way to interact with my web-browser programmatically
- My script needs to be able to process the various web-forms required to make an account. E.g. it needs to determine where to input the date or name or email. It then needs to be able to click confirm to make an account
- An example library for this is Selenium
- Find a way to deal with Captcha / Anti AI methods
There are two ways of doing this, one is human and the other is AI / automated.
Human method: just have a human around to solve all the CAPTCHAs. This is somewhat slow, but a human can potentially solve a CAPTCHA in under a minute, so for 300 bots this could be done in about 5 hours. That is still considerably faster than doing the whole thing manually.
AI method: there are many technical libraries that can be used. I tested this using ChatGPT Plus.
Essentially, I asked ChatGPT to look at two images and tell me which one was correct. Once I’d identified the correct one, I was free to submit.
Note that this is a fairly complex example, and I would need to repeat this up to 5 times to validate each one.
This would require setting up an integration with GPT-4 or some other LLM.
Proxy
Since we’re operating at a large scale, we should start looking at ways that X might try to catch us. One way is to look at our IP address: it’s clearly suspicious if the same IP is making a large number of posts from hundreds of accounts.
To avoid this, we should look at frequently changing our IP address to obscure where we are posting from. This is known as using a proxy.
One question to consider is how often we want to change our IP. Should each account have its own unique IP (300 proxies?), or can I just have a proxy that rotates my IP address after every post?
The price of a proxy depends in part on whether it is static (uses the same IP address) or rotating (allows you to switch IP addresses).
Some estimates might be around $1-2 per proxy: https://www.blackhatworld.com/seo/static-residential-proxies-tier-1-isps-ultra-fast-private-proxies-only-unlimited-bandwidth.1411857/
For rotating proxies, pricing is often based on mobile data usage. We’re provisionally planning to send API requests through a proxy to make a post on X, which uses less than 1kb of data. If we’re only looking to make around 300 requests (1 post per account), that’s only 300kb.
A source like https://www.pyproxy.com/?utm-source=bingtg&utm-keyword=?bingde364&msclkid=9a97e5c1295e1a9b2a1c8350908ee651 suggests pricing of around $0.77 per GB, so something like this could be genuinely negligible in cost.
Setting up our #LondonVoterFraud network:
I’ll articulate the approximate structure for doing this.
First we note that we have code for making a single tweet through an API, using a developer account.
Suppose now that we have 300 individual X accounts, and we want to make a tweet through each. For the #LondonVoterFraud case, this was done by generating a list of inspirational quotes. This can probably be created manually through browsing the internet, or one can ask an LLM to do it for you.
- Store account details for each of the 300 accounts in the form of a list that we can iterate through.
- Make an app with our developer account (as in the first case)
- For each individual account
- Authorise our app to make posts on behalf of this account, using the OAuth process. This will generate account keys for each account
- Use the application to post a tweet
Overall:
- Cost: ~$100
- Account creation
- ~ $100 to purchase 300 X accounts
- OR, upwards of $300 to purchase the relevant details to create 300 X accounts, plus maybe $20 for a ChatGPT Plus (GPT-4) subscription to help solve CAPTCHAs.
- Proxies
- Negligible if we’re using a rotating proxy and making only one post per account
- Otherwise, around $300-600 if we’re looking at a static proxy for each account ($1-2 per proxy)
- Running the script
- As the accounts are run one-off, the computation costs are still relatively negligible.
- As we’re doing this one-off, we can probably get away with using free credits from AWS, Google Cloud, Heroku, etc.
Case study 3: 1000 bots posting routine genAI content about cryptocurrencies
Academic paper: http://arxiv.org/pdf/2307.16336
In some ways, this is fairly similar to our second example, except now we’re making mass posts and have to use genAI to produce our content. Here are our assumptions:
- 1000 bots each making 5 posts per day.
- Each post is created using chatgpt, of around 150 characters, or approximately 40 tokens.
- The bots are running for 3 months on a server that needs to be constantly on.
- The bots need to be difficult to detect, both in terms of their IP addresses as well as the accounts themselves.
Accounts
Scaling from 300 to 1000 accounts follows a similar process, and the scripts to manage these accounts will be similar as well.
Let’s suppose we just bought the 1000 accounts: at around $0.37 each, we’re looking at around $370 one-off. We’ll need to buy more accounts in the future if we get caught, though!
ChatGPT costs
We’re looking at around 5000 posts per day. Let’s suppose they’re using ChatGPT:
Now each post will need a prompt and will return an output.
As we’re expecting each tweet to be max 280 characters, let’s assume the average post is around 150 characters. This is equivalent to around 40 tokens.
Prompts can be quite long, but here is a suggested prompt (around 120 characters, ~30 tokens):
“Write a short tweet describing a recent change in the cryptomarket, encouraging users to engage and join a new initiative”
For those curious, this is what I got back: "Exciting news in the crypto world! 🚀 Join us as we embrace a recent market shift and launch a new initiative aimed at revolutionizing digital currency. Don't miss out—let's shape the future together! #CryptoInnovation #JoinTheMovement"
There will need to be some prompt engineering pipelines to ensure we get original content and are able to advertise actual initiatives (e.g. links).
Now let’s look at ChatGPT pricing:
| Model and token type | Price (USD per 1M tokens) |
|---|---|
| GPT-3.5 input | $0.50 |
| GPT-3.5 output | $1.50 |
| GPT-4 input | $5.00 |
| GPT-4 output | $15.00 |
So at 5000 posts per day using GPT-3.5, with ~30 input tokens and ~40 output tokens per post, we’re looking at around 150,000 input tokens and 200,000 output tokens per day. That’s about $0.075 + $0.30 = $0.375 per day.
If we use GPT-4, then we’re looking at $0.75 + $3.00 = $3.75 per day.
Note that OpenAI also offers ‘batch processing’, where requests are submitted in bulk and processed asynchronously (within 24 hours) rather than in real time, so the posts are generated in advance. This provides a 50% discount.
Assuming we batch process all our posts using GPT-3.5, we’re looking at spending as little as $68 per year (around $5.70 per month) in LLM costs.
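To make the token arithmetic easy to re-run with different assumptions, here is a minimal sketch. It takes the per-post token counts (~30 in, ~40 out) and the per-million-token prices from the table above as given. The names `PRICES_PER_M` and `daily_llm_cost` are my own hypothetical helpers, not part of any OpenAI library, and real prices will have moved since.

```python
# Assumed USD prices per 1M tokens, copied from the pricing table above.
# These are illustrative and almost certainly out of date.
PRICES_PER_M = {
    "gpt-3.5": {"input": 0.50, "output": 1.50},
    "gpt-4": {"input": 5.00, "output": 15.00},
}


def daily_llm_cost(posts_per_day, input_tokens, output_tokens, model, batch=False):
    """Estimated USD per day to generate posts with an LLM API.

    input_tokens / output_tokens are assumed tokens *per post*.
    batch=True applies the 50% batch-processing discount mentioned above.
    """
    price = PRICES_PER_M[model]
    total_in = posts_per_day * input_tokens
    total_out = posts_per_day * output_tokens
    cost = (total_in * price["input"] + total_out * price["output"]) / 1_000_000
    return cost * 0.5 if batch else cost


if __name__ == "__main__":
    for model in ("gpt-3.5", "gpt-4"):
        print(model, round(daily_llm_cost(5000, 30, 40, model), 4))
    yearly = daily_llm_cost(5000, 30, 40, "gpt-3.5", batch=True) * 365
    print("batched gpt-3.5 per year:", round(yearly, 2))
```

With the figures from the text this reproduces $0.375 per day for GPT-3.5, $3.75 per day for GPT-4, and about $68.44 per year for batched GPT-3.5. The point is less the exact numbers than how little the LLM line item contributes compared with accounts, compute and API access.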
Computation:
So we now need to be running a long-term server that will manage these accounts and make posts each day. The server costs should still be relatively light as this processing does not require much computation. For now, we’ll be using public cloud services to do this processing.
Here is an initial estimate using AWS:
Compute instance: t3.medium (~$31.2 per month)
Load balancer / scaling: potentially free
Database (for storing ChatGPT responses): potentially free; otherwise we can probably make do with a small relational database at around $15 per month.
We realistically will want to have multiple EC2 instances, so let’s say this totals up at around $100 a month.
Proxies
We’re making a lot more posts now, and we’ll still need to rotate our proxies. There are often limits on how frequently a proxy can rotate (e.g. basic ones can swap IPs once a minute)
https://www.illusory.io/pricing
We could for example purchase 1000 static proxies, at around $2000 a month. Or we could potentially purchase a number of rotating proxies.
Even though each account posts 5 times per day, it might look suspicious if it posted exactly once every 4.8 hours. Let’s assume instead that each account posts at random times within a 12-hour ‘day period’.
This means they need to post on average every 144 minutes. If our proxy can rotate every minute then it can make 144 posts in 144 minutes.
Now we have 1000 bots, so we’re looking at around 1000 posts per 144 minutes, meaning we’d need at least 7 of these standard proxies (1000 / 144 ≈ 6.9); let’s say 8 to allow some headroom.
The pricing here: https://www.illusory.io/pricing suggests a cost of $5 per hour per proxy.
Suppose then we’re running 8 proxies for 12 hours a day. That is 96 proxy-hours, so we’re looking at 96 × $5 = $480 a day.
Alternatively, we could look at prices based on data sent, we previously said around $0.77 per GB.
If we assume each post is 1kb of data, and we have 5000 posts per day, then we’re looking at about 5MB of data, which is a negligible cost.
Note that this changes quite significantly if I’m suddenly posting images. Small images are around 100kb, while larger images can be 1mb or more.
If in each post we attached a small image, then we’re using 101kb per post, or around 0.5GB per day, which is still pretty negligible in cost.
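The two proxy pricing models above (per proxy-hour versus per GB) give wildly different answers, so it helps to put them side by side. This is a minimal cost-estimation sketch only. It assumes one post per IP rotation, a rotation once per minute, $5 per proxy-hour, $0.77 per GB, and 1 GB = 1,000,000 kB, all taken from the text. The function names are my own hypothetical helpers.

```python
import math


def rotating_proxies_needed(posts_per_day, active_minutes_per_day, rotations_per_minute=1.0):
    """Smallest proxy count if each post needs its own IP rotation (assumption from the text)."""
    # One proxy can serve one post per rotation over the active period.
    capacity_per_proxy = active_minutes_per_day * rotations_per_minute
    return math.ceil(posts_per_day / capacity_per_proxy)


def time_priced_cost_per_day(n_proxies, active_hours_per_day, usd_per_proxy_hour):
    """Daily USD cost when proxies are billed per proxy per hour."""
    return n_proxies * active_hours_per_day * usd_per_proxy_hour


def data_priced_cost_per_day(posts_per_day, kb_per_post, usd_per_gb):
    """Daily USD cost when proxies are billed by data volume (1 GB taken as 10^6 kB)."""
    gb_per_day = posts_per_day * kb_per_post / 1_000_000
    return gb_per_day * usd_per_gb


if __name__ == "__main__":
    posts = 1000 * 5  # 1000 accounts, 5 posts each per day
    print("minimum proxies:", rotating_proxies_needed(posts, active_minutes_per_day=12 * 60))
    print("time-priced, 8 proxies:", time_priced_cost_per_day(8, 12, 5.0))
    print("data-priced, text only:", data_priced_cost_per_day(posts, 1, 0.77))
    print("data-priced, small image:", round(data_priced_cost_per_day(posts, 101, 0.77), 2))
```

Running it gives a minimum of 7 proxies and $480 a day for 8 time-priced proxies. Data-priced costs stay well under $1 a day even with a small image attached. That gap is why the overall estimate below leans on data-based pricing.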
API Costs
Something we have not needed to consider until now is the cost of making posts on X. X provides a free tier, allowing around 1500 posts to be made per month. This was sufficient for something like #LondonVoterFraud, which only needed one post per account, but it is insufficient when we need to make 150,000 posts per month.
Further details can be found here: https://developer.x.com/en/products/twitter-api
Overall, we’re likely looking at either multiple Basic accounts or one Pro account.
The cost of a Basic account is around $100 a month and allows 50k posts per app. To cover 150,000 posts a month we’d need at least 3 of these, costing around $300 a month. A single Pro account would help keep things connected, but would cost $5,000 a month.
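The tier count is just a ceiling division, sketched below under the figures quoted in the text: $100 a month and 50,000 posts per app for Basic, with a 30-day month. `basic_tier_plan` is a hypothetical helper of my own. X has revised its API pricing and limits more than once, so the constants should be treated as placeholders.

```python
import math

# Assumed figures, as quoted in the text above; check X's current developer pricing.
BASIC_USD_PER_MONTH = 100
BASIC_POSTS_PER_APP = 50_000


def basic_tier_plan(posts_per_day, days_per_month=30):
    """Return (number of Basic subscriptions, monthly USD) needed for the post volume."""
    posts_per_month = posts_per_day * days_per_month
    n_apps = math.ceil(posts_per_month / BASIC_POSTS_PER_APP)
    return n_apps, n_apps * BASIC_USD_PER_MONTH


if __name__ == "__main__":
    print(basic_tier_plan(5000))  # 1000 accounts x 5 posts/day -> (3, 300)
```

Because of the ceiling, even one post over a 50,000 boundary adds a whole extra subscription. That is why the estimate jumps in $100 steps rather than scaling smoothly.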
Measures to avoid getting caught
A few to consider. First, we might be better off buying ‘high quality’ accounts that seem to have nothing to do with crypto and appear very much ‘human’.
We might want to give our accounts a bit of personality, so we might for example want to generate some interesting names. We could make a few API requests that have nothing to do with crypto, for example we could try following famous celebrities.
We might want to be a bit clever with how we make our posts, for example posting them less frequently.
Overall costs: ~$370 one-off cost for accounts + ~$150 a month in maintenance + ~$300 a month in API costs.
Case study 4: Running ‘CopyCop’
The level of sophistication in ‘CopyCop’ makes it hard to determine the likely costs, especially as I’d expect most of the services to be running using custom scripts and tools.
However, I can speculate about a few differences compared with our third case, the crypto bots.
- Running their own LLM
Presumably, CopyCop is not just using ChatGPT. Instead, they may want to create their own LLM or fine-tune an open source one like Llama 3.
This removes per-token API costs, but they will instead need their own infrastructure (including GPUs) to keep their content machine running.
- Creation of a ‘personas’ function.
I am unsure to what extent this is available ‘off the shelf’.
- Using multiple sources to post content
They aren’t just using Twitter; they’re also hosting their own websites and pumping out content to them.
https://www.bbc.co.uk/news/uk-england-london-68923015
https://mashable.com/article/twitter-x-crypto-botnet-chatgpt-research
https://go.recordedfuture.com/hubfs/reports/cta-2024-0509.pdf