Create a twitter bot on a Raspberry Pi 3 using R

Marion Louveaux and I decided that we had to build a Twitter bot for our favourite hashtag. We explored different possibilities, but the truth is I couldn’t resist building it with R and {rtweet}. Here are the steps I followed to set up a Twitter bot on my Raspberry Pi.

Create a twitter bot

We will create a Twitter bot that retweets tweets containing #rspatial: https://twitter.com/talk_rspatial
To do so, we need:

  • A script to retrieve tweets with #rspatial
  • A script to retweet while respecting the Twitter API usage rules
  • A server that will regularly execute the scripts

I use the {rtweet} package to communicate with Twitter, and my personal Raspberry Pi with a CRON to execute the R scripts. The scripts are available as functions in the {tweetrbot} package.

Figure 1: Procedure by [Marion Louveaux](https://marionlouveaux.fr)

Note that the next section, “Detailed procedure and code”, walks through the code step by step. If you want fewer details, you can jump directly to the section “Procedure with package {tweetrbot} and a Raspi”.

Detailed procedure and code

Creating your Twitter token

  • I recommend using a dedicated email address for this bot, in case Twitter has something to tell you.
  • You need to create a dedicated Twitter account for this bot.
  • Read the {rtweet} vignette to create your tokens: https://rtweet.info/articles/auth.html

Then you will run this kind of code to properly save your tokens.

## authenticate via access token
token <- rtweet::create_token(
  app = "my_twitter_research_app",
  consumer_key = "zzz",
  consumer_secret = "zzeee",
  access_token = "1234-zzzzz",
  access_secret = "zzzzaaaaa")
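To keep the keys out of any script, one option is to read them from environment variables before calling rtweet::create_token(). The sketch below is only an illustration under assumptions: the variable names (TWITTER_CONSUMER_KEY and so on) and the helper read_twitter_credentials() are hypothetical, not part of {rtweet} or {tweetrbot}.

```r
# Hypothetical environment variable names: choose your own
credential_vars <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

# Hypothetical helper: read the four credentials, fail early if one is missing
read_twitter_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(vars, unset = NA)
  names(values) <- names(vars)
  absent <- vars[is.na(values) | !nzchar(values)]
  if (length(absent) > 0) {
    stop("Missing environment variables: ", paste(absent, collapse = ", "))
  }
  as.list(values)
}

# Create the token only if {rtweet} is installed and all variables are set
if (requireNamespace("rtweet", quietly = TRUE) &&
    all(nzchar(Sys.getenv(credential_vars)))) {
  creds <- read_twitter_credentials()
  token <- rtweet::create_token(
    app = "my_twitter_research_app",
    consumer_key = creds$consumer_key,
    consumer_secret = creds$consumer_secret,
    access_token = creds$access_token,
    access_secret = creds$access_secret
  )
}
```

The variables can be defined, for instance, in the .Renviron file of the user that runs the scripts.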

The Twitter API usage rules can be found here https://developer.twitter.com/en/docs/basics/rate-limiting and here https://developer.twitter.com/en/docs/basics/rate-limits.html . Some selected information:

  • POST: The limit of 300 per 3 hours is combined across the POST statuses/update and POST statuses/retweet/:id endpoints. You can only post 300 Tweets or Retweets during a 3-hour period.
  • GET: All request windows are 15 minutes in length. Endpoint GET search/tweets: Resource family search: Requests / window (user auth) 180, Requests / window (app auth) 450
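As a quick sanity check, the limits above can be compared with the settings used later in this post (20 tweets retrieved per run, one retweet every 10 minutes, a CRON every 2 hours). This is simple arithmetic, assuming only one retweet loop runs at a time:

```r
post_limit      <- 300     # Tweets + Retweets allowed per POST window
post_window_min <- 3 * 60  # POST window length, in minutes
sleep_min       <- 10      # pause between two retweets
n_tweets        <- 20      # tweets retrieved at each CRON run
cron_every_min  <- 2 * 60  # CRON interval, in minutes

# Maximum number of retweets a single loop can post in one POST window
max_retweets_per_window <- post_window_min / sleep_min
cat("Retweets per 3 hours:", max_retweets_per_window, "of", post_limit, "allowed\n")

# Worst case: all retrieved tweets are new and must be retweeted
worst_case_loop_min <- n_tweets * sleep_min
overlaps_next_cron  <- worst_case_loop_min > cron_every_min
cat("Worst-case loop duration:", worst_case_loop_min, "minutes\n")
cat("Can a loop outlast the CRON interval?", overlaps_next_cron, "\n")
```

The bot stays far below the POST limit, but a full loop can last longer than the CRON interval, which is why the retweet script checks that no other loop is running before it starts.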

Also, be sure to respect the Twitter automation rules: https://help.twitter.com/fr/rules-and-policies/twitter-automation
In this specific case:

Automated Retweets: Provided you comply with all other rules, you may Retweet or Quote Tweet in an automated manner for entertainment, informational, or novelty purposes. Automated Retweets often lead to negative user experiences, and bulk, aggressive, or spammy Retweeting is a violation of the Twitter Rules.

Retrieve tweets and store locally

The first function is used to retrieve tweets from Twitter and store them on the server. It is the code of function get_and_store() in package {tweetrbot}.

  • For each iteration of the CRON, we download the last 20 tweets with #rspatial.
  • We create two databases:
    • A small one that keeps the latest tweets, to be sure we do not retweet tweets that were already retweeted: tweets_rspatial.rds
    • A big one that will store all tweets ever retrieved, for future analyses: complete_tweets_rspatial.rds
  • We store the console output of the last CRON in a log file, just in case.

library(dplyr) # provides %>%, mutate(), bind_rows(), arrange(), distinct()

# For logs
sink(file = "rtweet_console.log", append = FALSE)

# Number of tweets to retrieve
n_tweets <- 20

# Retrieve tweets for one hashtag
cat("Retrieve tweets\n") # for log
new_tweets <- rtweet::search_tweets(
  "#rspatial", n = n_tweets, include_rts = FALSE
) %>% 
  mutate(
    retweet_order = NA_real_,
    bot_retweet = FALSE)

# Add to the existing database
cat("Add tweets to to-tweet database\n") # for log
tweets_file <- "tweets_rspatial.rds"
if (file.exists(tweets_file)) {
  old_tweets <- readRDS(tweets_file)
  newold_tweets <- new_tweets %>% 
    bind_rows(old_tweets) %>% 
    arrange(desc(bot_retweet)) %>% # TRUE first 
    distinct(status_id, .keep_all = TRUE)
} else {
  newold_tweets <- new_tweets
}
saveRDS(newold_tweets, tweets_file)

# Add to the complete database
cat("Add tweets to complete database\n") # for log
complete_tweets_file <- "complete_tweets_rspatial.rds"
if (file.exists(complete_tweets_file)) {
  complete_old_tweets <- readRDS(complete_tweets_file)
  complete_newold_tweets <- new_tweets %>% 
    bind_rows(complete_old_tweets) %>% 
    distinct(status_id, .keep_all = TRUE)
} else {
  complete_newold_tweets <- new_tweets
}
saveRDS(complete_newold_tweets, complete_tweets_file)
## # A tibble: 30 x 92
##    user_id status_id created_at          screen_name text  source display_text_wi… reply_to_status… reply_to_user_id
##    <chr>   <chr>     <dttm>              <chr>       <chr> <chr>             <dbl> <chr>            <chr>           
##  1 301593… 11639279… 2019-08-20 21:37:00 AndrewRenn… Tryi… Twitt…              182 <NA>             <NA>            
##  2 332333… 11638136… 2019-08-20 14:02:51 FelipeSMBa… #SIG… Twitt…               33 <NA>             <NA>            
##  3 114378… 11632647… 2019-08-19 01:41:54 terusteran… "Jau… Twitt…              159 <NA>             <NA>            
##  4 394517… 11629914… 2019-08-18 07:35:39 EAGLE_MSc   @zev… Twitt…              206 112320469522760… 1909185565      
##  5 582682… 11629699… 2019-08-18 06:10:25 m_wegmann   @zev… Twitt…              168 112320469522760… 1909185565      
##  6 895591… 11678036… 2019-08-31 14:17:46 StatnMap    "#rs… Twitt…              268 <NA>             <NA>            
##  7 116748… 11678015… 2019-08-31 14:09:18 talk_rspat… Hey … Twitt…              272 <NA>             <NA>            
##  8 739968… 11674866… 2019-08-30 17:18:11 elmudge3    Toda… Twitt…              208 <NA>             <NA>            
##  9 739968… 11671002… 2019-08-29 15:42:49 elmudge3    When… Twitt…              106 <NA>             <NA>            
## 10 331537… 11668783… 2019-08-29 01:00:48 dwwolfson   Do p… Twitt…              133 <NA>             <NA>            
## # … with 20 more rows, and 83 more variables: reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>, reply_count <int>, hashtags <list>, symbols <list>,
## #   urls_url <list>, urls_t.co <list>, urls_expanded_url <list>, media_url <list>, media_t.co <list>,
## #   media_expanded_url <list>, media_type <list>, ext_media_url <list>, ext_media_t.co <list>,
## #   ext_media_expanded_url <list>, ext_media_type <chr>, mentions_user_id <list>, mentions_screen_name <list>,
## #   lang <chr>, quoted_status_id <chr>, quoted_text <chr>, quoted_created_at <dttm>, quoted_source <chr>,
## #   quoted_favorite_count <int>, quoted_retweet_count <int>, quoted_user_id <chr>, quoted_screen_name <chr>,
## #   quoted_name <chr>, quoted_followers_count <int>, quoted_friends_count <int>, quoted_statuses_count <int>,
## #   quoted_location <chr>, quoted_description <chr>, quoted_verified <lgl>, retweet_status_id <chr>,
## #   retweet_text <chr>, retweet_created_at <dttm>, retweet_source <chr>, retweet_favorite_count <int>,
## #   retweet_retweet_count <int>, retweet_user_id <chr>, retweet_screen_name <chr>, retweet_name <chr>,
## #   retweet_followers_count <int>, retweet_friends_count <int>, retweet_statuses_count <int>, retweet_location <chr>,
## #   retweet_description <chr>, retweet_verified <lgl>, place_url <chr>, place_name <chr>, place_full_name <chr>,
## #   place_type <chr>, country <chr>, country_code <chr>, geo_coords <list>, coords_coords <list>, bbox_coords <list>,
## #   status_url <chr>, name <chr>, location <chr>, description <chr>, url <chr>, protected <lgl>, followers_count <int>,
## #   friends_count <int>, listed_count <int>, statuses_count <int>, favourites_count <int>, account_created_at <dttm>,
## #   verified <lgl>, profile_url <chr>, profile_expanded_url <chr>, account_lang <lgl>, profile_banner_url <chr>,
## #   profile_background_url <chr>, profile_image_url <chr>, retweet_order <dbl>, bot_retweet <lgl>
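The merge step relies on a small trick: rows with bot_retweet = TRUE are sorted first, so that removing duplicated status_id keeps the "already retweeted" version of a tweet. Here is a minimal base-R sketch of the same idea on toy data (the status_id values are made up, and order()/duplicated() stand in for arrange()/distinct()):

```r
# Toy "database" already on disk: tweet "a" was retweeted, "b" not yet
old_tweets <- data.frame(
  status_id = c("a", "b"),
  bot_retweet = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)
# Newly retrieved tweets always arrive with bot_retweet = FALSE
new_tweets <- data.frame(
  status_id = c("b", "c", "a"),
  bot_retweet = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

combined <- rbind(new_tweets, old_tweets)
# TRUE first, so the retweeted copy of a status comes before its fresh copy
combined <- combined[order(combined$bot_retweet, decreasing = TRUE), ]
# Keep the first occurrence of each status_id only
merged <- combined[!duplicated(combined$status_id), ]
print(merged)
```

Tweet "a" keeps bot_retweet = TRUE after the merge, so it will not be retweeted a second time.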

Tweet regularly and kill process

The second function retweets the stored tweets one by one, provided no other retweeting process is running. It is the code of function retweet_and_update() in package {tweetrbot}.

  • Verify that there is not already an R process running a retweet loop.
    • Write the process PID in an external log file. The loop can run only if this file is empty.
  • Create a script that will retweet every 10 minutes, if not already retweeted
    • Define the retweeting order: from the oldest to the most recent tweets
  • Update the information after each retweet: bot_retweet=TRUE if it worked, and bot_retweet=NA if it did not, for further investigation if needed. When a tweet is first stored, this is set to bot_retweet=FALSE.
  • Update the database at the end of the loop
    • Read the last version of the database (in case another CRON ran during the loop)
    • Update with information after retweet
    • Remove tweets if size is bigger than 3 times the number of tweets retrieved (20 here, so 60).
    • Save updated database

# Get current PID
current_pid <- as.character(Sys.getpid())

# Read log PID to verify no running process
loop_pid_file <- "loop_pid.log"
if (!file.exists(loop_pid_file)) {file.create(loop_pid_file)}
loop_pid <- readLines(loop_pid_file)

# Run loop only if not already running
# (return() is valid here because this code is the body of a function)
if (length(loop_pid) != 0)  {
  cat("Loop already running\n") # for log
  return(NULL)
}

cat("Start the loop\n") # for log
# Fill the log file to prevent other process
writeLines(current_pid, loop_pid_file)  

# Add a column to database to define retweeting order
tweets_file <- "tweets_rspatial.rds"
to_tweets <- readRDS(tweets_file) %>% 
  filter(bot_retweet == FALSE) %>% 
  arrange(desc(created_at)) %>% # older at the end
  mutate(retweet_order = rev(seq_len(n()))) %>% # older tweeted first; seq_len() also works with 0 rows
  select(retweet_order, bot_retweet, everything())

# Retweet
for (i in sort(to_tweets$retweet_order)) {
  cat("Loop: ", i, "/", max(to_tweets$retweet_order), "\n") # for log
  # which to retweet
  w.id <- which(to_tweets$retweet_order == i)
  print(paste(i, "- Retweet: N=", 
              to_tweets$retweet_order[w.id],
              "-",
              substr(to_tweets$text[w.id], 1, 180)))
  retweet_id <- to_tweets$status_id[w.id]
  r <- rtweet::post_tweet(retweet_id = retweet_id)
  # Change status
  if (r$status_code == 200) {
    # status OK
    to_tweets$bot_retweet[w.id] <- TRUE
  } else {
    # status not OK
    to_tweets$bot_retweet[w.id] <- NA
  }
  
  # Save failure in other database
  failed_tweets <- to_tweets %>% 
    filter(is.na(bot_retweet))
  
  # _Add failed to the existing database
  tweets_failed_file <- "tweets_failed_rspatial.rds"
  if (file.exists(tweets_failed_file)) {
    old_failed_tweets <- readRDS(tweets_failed_file)
    newold_failed_tweets <- failed_tweets %>% 
      bind_rows(old_failed_tweets) %>% 
      distinct(status_id, .keep_all = TRUE)
  } else {
    newold_failed_tweets <- failed_tweets
  }
  saveRDS(newold_failed_tweets, tweets_failed_file)
  
  # Read current dataset on disk again (in case there was an update)
  tweets_file <- "tweets_rspatial.rds"
  current_tweets <- readRDS(tweets_file)
  # Remove duplicates, keep retweet = TRUE (first in list)
  updated_tweets <- to_tweets %>% 
    bind_rows(current_tweets) %>% 
    arrange(desc(bot_retweet)) %>% # TRUE first 
    distinct(status_id, .keep_all = TRUE)
  # Keep only the most recent 3 * n_tweets rows (60 here) in the to-tweets database
  # (n_tweets is defined in the first script)
  if (nrow(updated_tweets) > (n_tweets * 3)) {
    updated_tweets <- updated_tweets %>% 
      arrange(desc(created_at)) %>% 
      slice(1:(n_tweets * 3))
  }
  # Save updated list of tweets
  saveRDS(updated_tweets, tweets_file)
  
  # Wait before the next retweet to avoid being banned
  # Sys.sleep(60*10) # Sleep 10 minutes
  Sys.sleep(10) # Short pause, convenient for testing
}

# remove pid when loop finished
file.remove(loop_pid_file)

# Stop sink
sink(file = NULL, append = FALSE)
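One weakness of the PID file approach above is that if the script stops with an error in the middle of the loop, loop_pid.log is never removed and the next CRON runs will all report "Loop already running". A possible variation, shown here only as a sketch, wraps the loop in a function that removes the lock file with on.exit(). The helper run_with_pid_lock() is hypothetical and not part of {tweetrbot}.

```r
# Hypothetical helper: run `task` only if no other process holds the lock
run_with_pid_lock <- function(lock_file, task) {
  # A non-empty lock file means another process is still looping
  if (file.exists(lock_file) && length(readLines(lock_file)) > 0) {
    message("Loop already running")
    return(invisible(FALSE))
  }
  # Fill the lock file with the current PID to block other processes
  writeLines(as.character(Sys.getpid()), lock_file)
  # Remove the lock when the function exits, even after an error
  on.exit(file.remove(lock_file), add = TRUE)
  task()
  invisible(TRUE)
}

# Usage with a temporary lock file and a dummy task
lock <- tempfile(fileext = ".log")
ran <- run_with_pid_lock(lock, function() cat("Retweet loop would run here\n"))
```

In the script above, the task would be the retweet loop, and lock_file would be loop_pid.log.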

Procedure with package {tweetrbot} and a Raspi

Install R 3.6 on the Raspberry Pi and required packages

R needs to be installed on your server.
The default Raspberry Pi repository provides R 3.3, which is quite old… To get the latest version, you have to build R from source. (Code adapted from this interesting resource: Setting up your own shiny-server / rstudio-server on a Raspberry Pi 3B+). This may take a long time!
Note that I installed R in a custom directory, with an option like --prefix=$HOME/R, because lack of space forces me to store R on an external drive. This is not essential.

bash

sudo apt-get install -y gfortran libreadline6-dev libx11-dev libxt-dev \
       libpng-dev libjpeg-dev libcairo2-dev xvfb \
       libbz2-dev libzstd-dev liblzma-dev \
       libcurl4-openssl-dev libssl-dev \
       wget
cd /usr/local/src
sudo wget https://cran.rstudio.com/src/base/R-3/R-3.6.1.tar.gz
sudo su
tar zxvf R-3.6.1.tar.gz
cd R-3.6.1
./configure --enable-R-shlib --prefix=$HOME/R #--with-blas --with-lapack #optional
make
make install
cd ..
rm -rf R-3.6.1*
exit
cd

Now, we will install the required R packages, in particular {tweetrbot}. Here I use sudo R to open R in the terminal, so that I can set up {rtweet} as the super-user, the same user that will run the CRON. You can also run it with your own account, but you will then have to set the CRON for your account too. If, like me, you installed R in a specific directory, either change your system PATH or call R using the full path.

bash

sudo $HOME/R/bin/R

R

install.packages("remotes", repos = "https://cloud.r-project.org/")
remotes::install_github("statnmap/tweetrbot")

[EDIT 2019-09-07] If you have problems installing {httpuv} and/or {later} on your Raspberry Pi, you may want to read this issue and compile {later} yourself: https://github.com/r-lib/later/issues/73
Get package:
bash

git clone https://github.com/r-lib/later.git
# sudo apt install libboost-atomic-dev #optional if you don't have libboost
sudo vi later/src/Makevars

Modify Makevars:

PKG_LIBS = -pthread -latomic
# If that doesn't work, try:
PKG_LIBS = -pthread -lboost_atomic

Install manually:
bash

sudo R CMD INSTALL later

Prepare the R-script that will be run regularly

Set up your Twitter tokens with rtweet::create_token() in the appropriate R session (the one of the user that will run the CRON) before creating the R script. This way, your credentials do not need to appear in clear text in the script!
Then, create the R script that will be run by the CRON.

bash

mkdir ~/talk_rspatial
cd ~/talk_rspatial
vim rtweet_raspi.R

  • Option complete_tweets_file lets you save the entire list of tweets retrieved since the beginning, in case you would like to do some tweet analysis later on.
  • debug=FALSE in function retweet_and_update() will really retweet on Twitter if the account is correctly set up. For preliminary tests, use debug=TRUE.

R

library(tweetrbot)
# Where to store tweets and logs
my_dir <- "~/talk_rspatial"
## Retrieve tweets, store on the drive
get_and_store(
  query = "#rspatial", n_tweets = 20,
  dir = my_dir
)
## Tweet regularly and update the table stored on the drive
retweet_and_update(
  dir = my_dir,
  n_tweets = 20, n_limit = 3,
  sys_sleep = 600, debug = FALSE
)

Configure CRON

CRON is used here as shorthand for the crontab ("cron table"), the file that tells the system's cron daemon which tasks to execute at specific times (minute, hour, day, month…).
We edit the crontab to execute our R script.

bash

sudo crontab -e
# if you want to run for a specific user
crontab -u yourusername -e 

A crontab file then opens, to which you can add a command of the following form:

Minute Hour Day-of-Month Month Day-Of-Week Command

To run the R script rtweet_raspi.R every 2 hours, every day of the year, add the following line to the crontab file. If, like me, you installed R in a specific directory, either change your system PATH or call R using the full path.

0 */2 * * * sudo $HOME/R/bin/Rscript ~/talk_rspatial/rtweet_raspi.R
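To make the five schedule fields more concrete, here is a small base-R sketch that splits the line above into its fields and lists the times of day at which the script would start. It only handles the simple "*/n" step syntax used here and is not a general crontab parser.

```r
cron_line <- "0 */2 * * * sudo $HOME/R/bin/Rscript ~/talk_rspatial/rtweet_raspi.R"

# The first five space-separated fields are the schedule, the rest is the command
parts <- strsplit(cron_line, " ", fixed = TRUE)[[1]]
schedule <- parts[1:5]
names(schedule) <- c("minute", "hour", "day_of_month", "month", "day_of_week")
command <- paste(parts[-(1:5)], collapse = " ")

# "*/2" in the hour field means every 2nd hour, starting at hour 0
step <- as.integer(sub("*/", "", schedule[["hour"]], fixed = TRUE))
run_hours <- seq(0L, 23L, by = step)
run_times <- sprintf("%02d:%02d", run_hours, as.integer(schedule[["minute"]]))

print(schedule)
print(run_times)
```

The three remaining "*" fields mean that the job runs every day of the month, every month, and every day of the week.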

Go further

Now that you have an R script to retrieve and store the tweets of a specific community, you can imagine some analyses. Also, as you know how to set up a CRON, you can imagine scheduled tweets with graphical analyses of the tweets of the past month, for instance.
Use your own imagination, but try not to bother too many (real) people on Twitter too much!
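As a starting point for such analyses, here is a minimal sketch that counts tweets per day and per user. It runs on a toy data frame that mimics two of the columns shown in the output above (created_at and screen_name). In practice, you would read the stored database instead, for instance with readRDS() on the complete_tweets_rspatial.rds file.

```r
# Toy data with the same column names as the stored tweets
tweets <- data.frame(
  created_at = as.POSIXct(
    c("2019-08-29 01:00:48", "2019-08-29 15:42:49", "2019-08-30 17:18:11"),
    tz = "UTC"
  ),
  screen_name = c("dwwolfson", "elmudge3", "elmudge3"),
  stringsAsFactors = FALSE
)

# Number of tweets per day
tweets_per_day <- table(as.Date(tweets$created_at))
# Most active users first
tweets_per_user <- sort(table(tweets$screen_name), decreasing = TRUE)

print(tweets_per_day)
print(tweets_per_user)
```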

Now that the bot is set up, do not hesitate to use #rspatial in your tweets and to follow @talk_rspatial. Be patient for the retweets, as tweets are gathered every 2 hours, then retweeted one every 10 minutes. Also, my server is a Raspi, so be kind!


Citation:

For attribution, please cite this work as:
Rochette Sébastien. (2019, Aug. 30). "Create a twitter bot on a Raspberry Pi 3 using R". Retrieved from https://statnmap.com/2019-08-30-create-a-twitter-bot-on-a-raspberry-pi-3-using-r/.


BibTex citation:
@misc{Roche2019Creat,
    author = {Rochette Sébastien},
    title = {Create a twitter bot on a Raspberry Pi 3 using R},
    url = {https://statnmap.com/2019-08-30-create-a-twitter-bot-on-a-raspberry-pi-3-using-r/},
    year = {2019}
  }