With Marion Louveaux, we decided to build a Twitter bot for our favorite hashtag. We explored different possibilities, but the truth is I couldn't resist building it with R and {rtweet}. Here are the steps I followed to set up a Twitter bot on my Raspberry Pi.
Create a twitter bot
We will create a Twitter bot that retweets tweets containing #rspatial: https://twitter.com/talk_rspatial
To do so, we need:
- A script to retrieve tweets with #rspatial
- A script to retweet while respecting the Twitter API usage rules
- A server that will regularly execute the scripts
I use the {rtweet} package to communicate with Twitter, and my personal Raspberry Pi with a CRON to execute the R scripts. The scripts are available as functions in the {tweetrbot} package.
Note that the procedure is detailed in the next section, to present the code, but you can jump directly to the following section, “Procedure with package {tweetrbot} and a Raspi”, if you want fewer details.
Detailed procedure and code
Creating your Twitter token
- I recommend using a dedicated email address for this bot, in case Twitter has something to tell you.
- You need to create a dedicated Twitter account for this bot.
- Read the {rtweet} vignette to create your tokens: https://rtweet.info/articles/auth.html
Then run this kind of code to create and save your tokens.
## authenticate via access token
token <- rtweet::create_token(
  app = "my_twitter_research_app",
  consumer_key = "zzz",
  consumer_secret = "zzeee",
  access_token = "1234-zzzzz",
  access_secret = "zzzzaaaaa"
)
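The token above hard-codes the keys. If you would rather not write them in any script, one option (my assumption, not what this post does) is to keep them in environment variables, for instance in ~/.Renviron, which R reads at startup, and read them back. The helper read_twitter_credentials() and the TWITTER_* variable names are hypothetical:

```r
# Hypothetical helper (not part of {rtweet} or {tweetrbot}): read the four
# Twitter credentials from environment variables whose names we chose.
read_twitter_credentials <- function(prefix = "TWITTER_") {
  fields <- c("consumer_key", "consumer_secret", "access_token", "access_secret")
  vars <- paste0(prefix, toupper(fields))
  values <- Sys.getenv(vars, unset = NA)
  # Fail early, naming the variables that are not set or empty
  unset <- vars[is.na(values) | values == ""]
  if (length(unset) > 0) {
    stop("Missing environment variables: ", paste(unset, collapse = ", "))
  }
  setNames(as.list(unname(values)), fields)
}

# Usage (needs real credentials and network access, hence commented out):
# creds <- read_twitter_credentials()
# token <- do.call(
#   rtweet::create_token,
#   c(list(app = "my_twitter_research_app"), creds)
# )
```

The returned list has the same names as the create_token() arguments, so it can be passed with do.call().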
Here is some selected information from the Twitter API rules (https://developer.twitter.com/en/docs/basics/rate-limiting and https://developer.twitter.com/en/docs/basics/rate-limits.html):
- POST: the limit of 300 per 3 hours is shared between the POST statuses/update and POST statuses/retweet/:id endpoints. You can only post 300 Tweets or Retweets, combined, during a 3-hour period.
- GET: all request windows are 15 minutes long. For the GET search/tweets endpoint (resource family: search), the limit is 180 requests per window with user auth and 450 requests per window with app auth.
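A quick back-of-the-envelope check of the bot's schedule against these limits, assuming the values used in this post (one retweet every 10 minutes, one search every 2 hours). posts_per_window() is a hypothetical helper, and it ignores the time spent calling the API:

```r
# Approximate number of posts a loop makes in one rate-limit window,
# given the pause between two retweets (both in seconds).
posts_per_window <- function(sleep_s, window_s = 3 * 60 * 60) {
  floor(window_s / sleep_s)
}
post_limit <- 300                 # POST limit per 3 hours (Tweets + Retweets)
posts_per_window(600)             # 18: one retweet every 10 minutes
posts_per_window(600) <= post_limit
# One search every 2 hours, against 180 searches per 15-minute window
searches_per_window <- (15 * 60) / (2 * 60 * 60)  # 0.125
searches_per_window <= 180
```

Both are far below the limits. By contrast, a 10-second pause, handy for tests, would allow up to 1080 posts per 3-hour window, well above the POST limit.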
Also, be sure to respect the Twitter automation rules: https://help.twitter.com/fr/rules-and-policies/twitter-automation
In this specific case:
Automated Retweets: Provided you comply with all other rules, you may Retweet or Quote Tweet in an automated manner for entertainment, informational, or novelty purposes. Automated Retweets often lead to negative user experiences, and bulk, aggressive, or spammy Retweeting is a violation of the Twitter Rules.
Retrieve tweets and store locally
The first function retrieves tweets from Twitter and stores them on the server. It is the code of the function get_and_store() in package {tweetrbot}.
- For each CRON run, we download the last 20 tweets with #rspatial (excluding retweets).
- We create two databases:
  - A small one that keeps the latest tweets, to make sure we do not retweet tweets that were already retweeted: tweets_rspatial.rds
  - A big one that stores all tweets ever retrieved, for future analyses: complete_tweets_rspatial.rds
- We store the console output of the last CRON run in a log file, just in case.
library(dplyr) # for %>%, mutate(), bind_rows(), arrange(), distinct()
# For logs
sink(file = "rtweet_console.log", append = FALSE)
# Number of tweets to retrieve
n_tweets <- 20
# Retrieve tweets for one hashtag
cat("Retrieve tweets\n") # for log
new_tweets <- rtweet::search_tweets(
  "#rspatial", n = n_tweets, include_rts = FALSE
) %>%
  mutate(
    retweet_order = NA_real_,
    bot_retweet = FALSE)
# Add to the existing database
cat("Add tweets to to-tweet database\n") # for log
tweets_file <- "tweets_rspatial.rds"
if (file.exists(tweets_file)) {
  old_tweets <- readRDS(tweets_file)
  newold_tweets <- new_tweets %>%
    bind_rows(old_tweets) %>%
    arrange(desc(bot_retweet)) %>% # TRUE first
    distinct(status_id, .keep_all = TRUE)
} else {
  newold_tweets <- new_tweets
}
saveRDS(newold_tweets, tweets_file)
# Add to the complete database
cat("Add tweets to complete database\n") # for log
complete_tweets_file <- "complete_tweets_rspatial.rds"
if (file.exists(complete_tweets_file)) {
  complete_old_tweets <- readRDS(complete_tweets_file)
  complete_newold_tweets <- new_tweets %>%
    bind_rows(complete_old_tweets) %>%
    distinct(status_id, .keep_all = TRUE)
} else {
  complete_newold_tweets <- new_tweets
}
saveRDS(complete_newold_tweets, complete_tweets_file)
## # A tibble: 20 x 92
## user_id status_id created_at screen_name text source display_text_wi… reply_to_status… reply_to_user_id
## <chr> <chr> <dttm> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 948421… 12581094… 2020-05-06 19:00:41 statsandda… "📚Li… Twitt… 277 <NA> <NA>
## 2 948421… 12581065… 2020-05-06 18:49:04 statsandda… "📚#r… Twitt… 275 <NA> <NA>
## 3 507582… 12580813… 2020-05-06 17:08:56 chrisprener "Exc… Twitt… 277 <NA> <NA>
## 4 153192… 12580669… 2020-05-06 16:11:48 kaseyzap "Jus… Twitt… 113 <NA> <NA>
## 5 473551… 12577737… 2020-05-05 20:46:38 SpatialPat… "Gi*… Twitt… 114 <NA> <NA>
## 6 973194… 12577220… 2020-05-05 17:21:30 allisongli… "@le… Twitt… 208 125770284998477… 2930387886
## 7 712203… 12577052… 2020-05-05 16:14:41 TimSalabim3 "#rs… Twitt… 47 <NA> <NA>
## 8 986017… 12576860… 2020-05-05 14:58:08 v_valerioh "@Ci… Twitt… 59 125767343544825… 742379544309567…
## 9 742379… 12576794… 2020-05-05 14:31:52 CivicAngela "@Sh… Tweet… 234 125767863578947… 742379544309567…
## 10 742379… 12576734… 2020-05-05 14:08:08 CivicAngela "Goo… Twitt… 196 <NA> <NA>
## 11 346069… 12576761… 2020-05-05 14:18:47 dikayodata "I’v… Twitt… 132 <NA> <NA>
## 12 400694… 12576424… 2020-05-05 12:04:57 yabellini "Mañ… Twitt… 111 <NA> <NA>
## 13 720884… 12576033… 2020-05-05 09:29:45 hanna123987 "Can… Twitt… 276 <NA> <NA>
## 14 103516… 12574612… 2020-05-05 00:04:52 mdsumner "pol… Twitt… 102 <NA> <NA>
## 15 925963… 12573945… 2020-05-04 19:40:05 AlexaLFH "I'm… Twitt… 255 <NA> <NA>
## 16 148518… 12573795… 2020-05-04 18:40:24 edzerpebes… "Hi … Twitt… 168 <NA> <NA>
## 17 148518… 12573776… 2020-05-04 18:32:50 edzerpebes… "@Md… Twitt… 73 125729207432981… 17159131
## 18 148518… 12573763… 2020-05-04 18:27:40 edzerpebes… "sf … Twitt… 212 <NA> <NA>
## 19 281055… 12573099… 2020-05-04 14:03:47 jakub_nowo… "@ed… Twitt… 274 <NA> 148518970
## 20 124906… 12572954… 2020-05-04 13:06:15 davsjob "🗺️A… Twitt… 215 <NA> <NA>
## # … with 83 more variables: reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>, favorite_count <int>,
## # retweet_count <int>, quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>, media_t.co <list>, media_expanded_url <list>,
## # media_type <list>, ext_media_url <list>, ext_media_t.co <list>, ext_media_expanded_url <list>,
## # ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>, lang <chr>, quoted_status_id <chr>,
## # quoted_text <chr>, quoted_created_at <dttm>, quoted_source <chr>, quoted_favorite_count <int>,
## # quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>, quoted_name <chr>,
## # quoted_followers_count <int>, quoted_friends_count <int>, quoted_statuses_count <int>, quoted_location <chr>,
## # quoted_description <chr>, quoted_verified <lgl>, retweet_status_id <chr>, retweet_text <chr>,
## # retweet_created_at <dttm>, retweet_source <chr>, retweet_favorite_count <int>, retweet_retweet_count <int>,
## # retweet_user_id <chr>, retweet_screen_name <chr>, retweet_name <chr>, retweet_followers_count <int>,
## # retweet_friends_count <int>, retweet_statuses_count <int>, retweet_location <chr>, retweet_description <chr>,
## # retweet_verified <lgl>, place_url <chr>, place_name <chr>, place_full_name <chr>, place_type <chr>, country <chr>,
## # country_code <chr>, geo_coords <list>, coords_coords <list>, bbox_coords <list>, status_url <chr>, name <chr>,
## # location <chr>, description <chr>, url <chr>, protected <lgl>, followers_count <int>, friends_count <int>,
## # listed_count <int>, statuses_count <int>, favourites_count <int>, account_created_at <dttm>, verified <lgl>,
## # profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>, profile_banner_url <chr>,
## # profile_background_url <chr>, profile_image_url <chr>, retweet_order <dbl>, bot_retweet <lgl>
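The deduplication step above sorts the rows so that tweets already retweeted by the bot come first, then keeps only the first row for each status_id. Here is a base-R sketch of the same idea on made-up data, with order() and duplicated() standing in for dplyr's arrange() and distinct(). merge_tweets() is a hypothetical helper and the ids are invented:

```r
# Made-up stored tweets: "1" was already retweeted by the bot
old_tweets <- data.frame(status_id = c("1", "2"), bot_retweet = c(TRUE, FALSE),
                         stringsAsFactors = FALSE)
# Made-up new download: "1" and "2" come back, "3" is new
new_tweets <- data.frame(status_id = c("2", "3", "1"), bot_retweet = FALSE,
                         stringsAsFactors = FALSE)
merge_tweets <- function(new, old) {
  all <- rbind(new, old)
  # TRUE before FALSE, NA last: rows already retweeted win
  all <- all[order(all$bot_retweet, decreasing = TRUE, na.last = TRUE), ]
  # Keep the first row for each status_id
  all[!duplicated(all$status_id), ]
}
merged <- merge_tweets(new_tweets, old_tweets)
merged
```

Tweet "1" keeps bot_retweet = TRUE even though the fresh download reports it as FALSE, so it will not be retweeted twice.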
Tweet regularly and kill process
The second function retweets the stored tweets one by one, provided no other retweeting process is running. It is the code of the function retweet_and_update() in package {tweetrbot}.
- Check that no R process is already running a retweet loop:
  - The PID of the running process is written to an external log file. The loop can only start if this file is empty.
- Retweet the tweets that have not been retweeted yet, one every 10 minutes:
  - Define the retweeting order: from the oldest to the most recent tweets.
  - After each retweet, set bot_retweet=TRUE if it worked and bot_retweet=NA if it did not, for further investigation if needed. Failed retweets are also stored in tweets_failed_rspatial.rds. Tweets are created with bot_retweet=FALSE.
- Update the database at the end of each iteration of the loop:
  - Read the latest version of the database (in case another CRON run updated it during the loop)
  - Update it with the retweet information
  - Keep only the most recent tweets if there are more than 3 times the number of tweets retrieved (20 here, so 60)
  - Save the updated database
library(dplyr) # for %>%, filter(), arrange(), mutate(), select(), bind_rows(), distinct(), slice()
# This is the body of a function, hence return() below.
# n_tweets (20) is the number of tweets retrieved by the first script.
# Get current PID
current_pid <- as.character(Sys.getpid())
# Read log PID to verify no running process
loop_pid_file <- "loop_pid.log"
if (!file.exists(loop_pid_file)) {file.create(loop_pid_file)}
loop_pid <- readLines(loop_pid_file)
# Run loop only if not already running
if (length(loop_pid) != 0) {
  cat("Loop already running\n") # for log
  return(NULL)
}
cat("Start the loop\n") # for log
# Fill the log file to prevent other process
writeLines(current_pid, loop_pid_file)
# Add a column to database to define retweeting order
tweets_file <- "tweets_rspatial.rds"
to_tweets <- readRDS(tweets_file) %>%
  filter(bot_retweet == FALSE) %>%
  arrange(desc(created_at)) %>% # older at the end
  mutate(retweet_order = rev(seq_len(n()))) %>% # older tweeted first; seq_len() copes with 0 rows
  select(retweet_order, bot_retweet, everything())
# Retweet
for (i in sort(to_tweets$retweet_order)) {
  cat("Loop: ", i, "/", max(to_tweets$retweet_order), "\n") # for log
  # which to retweet
  w.id <- which(to_tweets$retweet_order == i)
  print(paste(i, "- Retweet: N=",
              to_tweets$retweet_order[w.id],
              "-",
              substr(to_tweets$text[w.id], 1, 180)))
  retweet_id <- to_tweets$status_id[w.id]
  r <- rtweet::post_tweet(retweet_id = retweet_id)
  # Change status (isTRUE() guards against a missing status code)
  if (isTRUE(r$status_code == 200)) {
    # status OK
    to_tweets$bot_retweet[w.id] <- TRUE
  } else {
    # status not OK
    to_tweets$bot_retweet[w.id] <- NA
  }
  # Save failure in other database
  failed_tweets <- to_tweets %>%
    filter(is.na(bot_retweet))
  # _Add failed to the existing database
  tweets_failed_file <- "tweets_failed_rspatial.rds"
  if (file.exists(tweets_failed_file)) {
    old_failed_tweets <- readRDS(tweets_failed_file)
    newold_failed_tweets <- failed_tweets %>%
      bind_rows(old_failed_tweets) %>%
      distinct(status_id, .keep_all = TRUE)
  } else {
    newold_failed_tweets <- failed_tweets
  }
  saveRDS(newold_failed_tweets, tweets_failed_file)
  # Read current dataset on disk again (in case there was an update)
  tweets_file <- "tweets_rspatial.rds"
  current_tweets <- readRDS(tweets_file)
  # Remove duplicates, keep retweet = TRUE (first in list)
  updated_tweets <- to_tweets %>%
    bind_rows(current_tweets) %>%
    arrange(desc(bot_retweet)) %>% # TRUE first
    distinct(status_id, .keep_all = TRUE)
  # Keep only the n_tweets * 3 most recent tweets in the to-tweet database
  if (nrow(updated_tweets) > (n_tweets * 3)) {
    updated_tweets <- updated_tweets %>%
      arrange(desc(created_at)) %>%
      slice(1:(n_tweets * 3))
  }
  # Save updated list of tweets
  saveRDS(updated_tweets, tweets_file)
  # Wait before the following retweet to avoid being banned
  Sys.sleep(60 * 10) # Sleep 10 minutes (use a shorter pause, e.g. 10 s, for tests)
}
# remove pid when loop finished
file.remove(loop_pid_file)
# Stop sink
sink(file = NULL, append = FALSE)
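To test the ordering and status logic without touching Twitter, here is a dry run on toy data. It swaps the dplyr pipeline for base R (sorting by increasing date, which gives the same order as the descending sort followed by rev()) and replaces rtweet::post_tweet() with a hypothetical fake_post() that pretends one retweet fails. The ids and dates are made up:

```r
# Toy tweets still waiting to be retweeted (made-up ids and dates)
to_tweets <- data.frame(
  status_id = c("a", "b", "c"),
  created_at = as.POSIXct(c("2020-05-06 19:00:00", "2020-05-05 17:21:00",
                            "2020-05-06 10:00:00"), tz = "UTC"),
  bot_retweet = FALSE,
  stringsAsFactors = FALSE
)
# Hypothetical stand-in for rtweet::post_tweet(): tweet "c" fails
fake_post <- function(id) list(status_code = if (id == "c") 403L else 200L)
# Oldest tweets get the smallest retweet_order, so they go first
to_tweets <- to_tweets[order(to_tweets$created_at), ]
to_tweets$retweet_order <- seq_len(nrow(to_tweets))
for (i in to_tweets$retweet_order) {
  w.id <- which(to_tweets$retweet_order == i)
  r <- fake_post(to_tweets$status_id[w.id])
  # TRUE if the retweet worked, NA to flag it for later investigation
  to_tweets$bot_retweet[w.id] <- if (isTRUE(r$status_code == 200)) TRUE else NA
  # The real loop would call Sys.sleep() here
}
to_tweets[, c("status_id", "retweet_order", "bot_retweet")]
```

Tweet "b" is the oldest and is handled first. The failed tweet "c" ends up with bot_retweet = NA, which the real code also copies to the failed-tweets database.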
Procedure with package {tweetrbot} and a Raspi
Install R 3.6 on the Raspberry Pi and required packages
R needs to be installed on your server.
The default Raspberry Pi repository provides R 3.3, which is quite old… To get a more recent version, you have to build R from source (code adapted from this interesting resource: Setting up your own shiny-server / rstudio-server on a Raspberry Pi 3B+). This may take a long time!
Note that I installed R in a custom directory, with an option like --prefix=$HOME/R, because a lack of space requires me to store R on an external drive. This is not essential.
bash
sudo apt-get install -y gfortran libreadline6-dev libx11-dev libxt-dev \
libpng-dev libjpeg-dev libcairo2-dev xvfb \
libbz2-dev libzstd-dev liblzma-dev \
libcurl4-openssl-dev libssl-dev \
wget
cd /usr/local/src
sudo wget https://cran.rstudio.com/src/base/R-3/R-3.6.1.tar.gz
sudo su
tar zxvf R-3.6.1.tar.gz
cd R-3.6.1
./configure --enable-R-shlib --prefix=$HOME/R #--with-blas --with-lapack #optional
make
make install
cd ..
rm -rf R-3.6.1*
exit
cd
Now, we install the required R packages, in particular {tweetrbot}. Here I use sudo R to open R in the terminal, so that I can set up {rtweet} as the super-user, which is the same user that will run the CRON. You can also do this with your own account, but then you will have to set the CRON for your account too. If, like me, you installed R in a specific directory, either add it to your system PATH or call R using its full path.
bash
sudo $HOME/R/bin/R
R
install.packages("remotes", repos = "https://cloud.r-project.org/")
remotes::install_github("statnmap/tweetrbot")
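Before configuring the CRON, it can be worth checking that the packages are visible to the R installation and the user that will run it (here, the super-user). A small sketch to run in the same sudo R session; missing_packages() is a hypothetical helper built on requireNamespace():

```r
# Return the packages from `pkgs` that this R installation cannot load
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}
missing_packages(c("rtweet", "dplyr", "tweetrbot"))
```

An empty character vector means everything is installed for this user.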
[EDIT 2019-09-07] If you have problems installing {httpuv} and/or {later} on your Raspberry Pi, you may want to read this issue and compile {later} yourself: https://github.com/r-lib/later/issues/73
Get the package source code:
bash
git clone https://github.com/r-lib/later.git
# sudo apt install libboost-atomic-dev #optional if you don't have libboost
sudo vi later/src/Makevars
Modify Makevars:
PKG_LIBS = -pthread -latomic
# If that doesn't work, try:
PKG_LIBS = -pthread -lboost_atomic
Install manually:
bash
sudo R CMD INSTALL later
Prepare the R-script that will be run regularly
Set up your Twitter tokens with rtweet::create_token() in the appropriate R session (the same user as the one that will run the CRON), before creating the R script. There is no need to leave your credentials in plain text in this script!
Then, create the R script that will be run by the CRON.
bash
mkdir ~/talk_rspatial
cd ~/talk_rspatial
vim rtweet_raspi.R
- Option complete_tweets_file allows you to save the entire list of tweets retrieved since the beginning, in case you would like to do some tweet analysis later on.
- debug=FALSE in function retweet_and_update() will really tweet on Twitter if the account is correctly set. For preliminary tests, use debug=TRUE.
R
library(tweetrbot)
# Where to store tweets and logs
my_dir <- "~/talk_rspatial"
## Retrieve tweets, store on the drive
get_and_store(
query = "#rspatial", n_tweets = 20,
dir = my_dir
)
## Tweet regularly and update the table stored on the drive
retweet_and_update(
dir = my_dir,
n_tweets = 20, n_limit = 3,
sys_sleep = 600, debug = FALSE
)
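I do not reproduce here how debug is implemented in {tweetrbot}. As an illustration only, a dry-run switch could look like this hypothetical wrapper. When debug = TRUE it pretends that the retweet succeeded, and otherwise it calls whichever posting function it is given:

```r
# Hypothetical dry-run wrapper, not the {tweetrbot} implementation.
# `post` is the function that really retweets, e.g. rtweet::post_tweet.
retweet_or_simulate <- function(status_id, debug = TRUE, post = NULL) {
  if (debug) {
    message("[debug] would retweet status ", status_id)
    # Mimic a successful HTTP response so the rest of the loop can be tested
    return(list(status_code = 200L))
  }
  if (!is.function(post)) stop("`post` must be a function when debug = FALSE")
  post(retweet_id = status_id)
}
# Dry run: nothing is sent to Twitter
r <- retweet_or_simulate("123", debug = TRUE)
r$status_code
# Real run (needs a token):
# retweet_or_simulate(status_id, debug = FALSE, post = rtweet::post_tweet)
```

Passing the posting function as an argument also makes it easy to plug in a fake one in tests.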
Configure CRON
Here, CRON refers to a crontab: a system time table that tells your system to execute tasks at specific times of the year, month, day…
We edit the crontab to execute our R script.
bash
# edit the crontab of the super-user
sudo crontab -e
# if you want to run it for a specific user
sudo crontab -u yourusername -e
This opens a crontab file, to which you can add a command of the following form:
Minute Hour Day-of-Month Month Day-Of-Week Command
So, to run the R script rtweet_raspi.R every 2 hours, every day of the year, add the following line to the crontab file. If, like me, you installed R in a specific directory, either add it to your system PATH or call R using its full path.
0 */2 * * * sudo $HOME/R/bin/Rscript ~/talk_rspatial/rtweet_raspi.R
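A quick check of what this schedule implies, with the values of this post (20 tweets retrieved per run, sys_sleep = 600 seconds between retweets): if all 20 tweets are new, one retweet loop lasts longer than the interval between two CRON runs. This is why retweet_and_update() refuses to start while loop_pid.log is not empty.

```r
# "0 */2 * * *": minute 0 of every even hour, every day
run_hours <- seq(0, 22, by = 2)
runs_per_day <- length(run_hours)            # 12
interval_min <- 24 * 60 / runs_per_day       # 120 minutes between two runs
# Duration of one retweet loop with 20 new tweets and 600 s between retweets
loop_min <- 20 * 600 / 60                    # 200 minutes
loop_min > interval_min                      # TRUE: the next run starts before the loop ends
```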
[EDIT: 2020-05-06] By the way, if your Raspi reboots or the script fails in the middle of a loop, loop_pid.log is never emptied, so the retweet loop will not start again. I updated the functions of the {tweetrbot} package to handle that!
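One possible way to detect such a stale lock (my assumption, not necessarily how the updated {tweetrbot} handles it) is to check whether the PID stored in loop_pid.log still belongs to a running process. On Linux, including Raspbian, every live process has a /proc/<pid> directory. The helpers below are hypothetical. A PID can be reused after a reboot, so this is only a heuristic:

```r
# Hypothetical helpers, not part of {tweetrbot}.
# TRUE if the lock file is missing, empty, or holds the PID of a process
# that no longer exists (Linux only: relies on /proc/<pid>).
lock_is_stale <- function(lock_file) {
  if (!file.exists(lock_file)) return(TRUE)
  pid <- readLines(lock_file, warn = FALSE)
  if (length(pid) == 0 || !nzchar(pid[1])) return(TRUE)
  !dir.exists(file.path("/proc", pid[1]))
}
# Take the lock if it is free or stale; return FALSE if another loop is alive
acquire_lock <- function(lock_file) {
  if (!lock_is_stale(lock_file)) return(FALSE)
  writeLines(as.character(Sys.getpid()), lock_file)
  TRUE
}
# Usage inside the function running the loop:
# if (!acquire_lock("loop_pid.log")) return(NULL)
# on.exit(file.remove("loop_pid.log"), add = TRUE)  # also runs if the loop errors
```

on.exit() covers a script that fails. The /proc check covers a reboot, where no R code gets a chance to clean up.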
Go further
Now that you have an R script to retrieve and store the tweets of a specific community, you can imagine some analyses. Also, now that you know how to set up a CRON, you can imagine scheduled tweets, for instance with graphical analyses of the past month's tweets.
Use your own imagination, but try not to bother too many (real) people on Twitter too much!
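As a starting point, a first analysis could count the tweets per day over the past month in the complete database. A base-R sketch; tweets_per_day() is a hypothetical helper and the toy timestamps are made up:

```r
# Hypothetical analysis: number of tweets per day over the last `days` days.
# In practice, `tweets` would be readRDS("complete_tweets_rspatial.rds").
tweets_per_day <- function(tweets, end = Sys.time(), days = 30) {
  start <- end - days * 24 * 60 * 60
  keep <- tweets$created_at > start & tweets$created_at <= end
  # drop = FALSE keeps a data frame even with a single column
  recent <- tweets[keep, , drop = FALSE]
  table(as.Date(recent$created_at, tz = "UTC"))
}
toy <- data.frame(created_at = as.POSIXct(c(
  "2020-05-06 19:00:41", "2020-05-06 18:49:04",
  "2020-05-05 20:46:38", "2020-03-01 10:00:00"), tz = "UTC"))
counts <- tweets_per_day(toy, end = as.POSIXct("2020-05-07 00:00:00", tz = "UTC"))
counts
# barplot(counts) gives a quick graphic to attach to a scheduled tweet
```

The March tweet falls outside the 30-day window, so only the two May days are counted.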
Now that the bot is set up, do not hesitate to use #rspatial in your tweets and to follow @talk_rspatial. Be patient with the retweets: tweets are gathered every 2 hours, then retweeted one every 10 minutes. Also, my server is a Raspi, so be kind!
Other resources
- [2017-11-12] - tidyverse tweets bot, a bot that tweets out tidyverse-related material: blog post and GitHub repository
- [2017-08-13] - GitHub repository - Bots retweeting language-specific R hashtags
- [2015-12-29] - Rrobot, a twitterbot that tweets about #rstats: blog post and GitHub repository
- [2015-01-19] - Programming a Twitter bot – and the rescue from procrastination
- [2015-11-26] - How to program a Twitter bot
- Other Raspi Twitter Bots
Citation:
For attribution, please cite this work as:
Rochette Sébastien. (2019, Aug. 30). "Create a twitter bot on a Raspberry Pi 3 using R". Retrieved from https://statnmap.com/2019-08-30-create-a-twitter-bot-on-a-raspberry-pi-3-using-r/.
BibTex citation:
@misc{Roche2019Creat,
author = {Rochette Sébastien},
title = {Create a twitter bot on a Raspberry Pi 3 using R},
url = {https://statnmap.com/2019-08-30-create-a-twitter-bot-on-a-raspberry-pi-3-using-r/},
year = {2019}
}