Using tuber

tuber: Access YouTube from R

Install and Load the Package

To install the latest version from CRAN:

install.packages("tuber")

The latest development version of the package is always available on GitHub. To install it from there:

# install.packages('devtools')
devtools::install_github("soodoku/tuber", build_vignettes = TRUE)

Next, load the package:

library(tuber)

Using the package

To get going, get a client ID and client secret from the Google Developers Console (see https://developers.google.com/youtube/v3/getting-started). Enable the YouTube Data API v3 for your project. Create OAuth credentials, being sure to select ‘Other’ as your application type (newer versions of the console label this option ‘Desktop app’). Then pass the client ID and client secret to the yt_oauth function. For more information about YouTube OAuth, see the YouTube OAuth Guide.

# Replace the placeholders with your own client ID and client secret
yt_oauth(app_id = "YOUR_CLIENT_ID.apps.googleusercontent.com", app_secret = "YOUR_CLIENT_SECRET")
## Waiting for authentication in browser...
## Press Esc/Ctrl + C to abort
## Authentication complete.
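Hard-coding the client secret in a script makes it easy to leak. A minimal sketch, assuming you keep the credentials in environment variables named YT_CLIENT_ID and YT_CLIENT_SECRET (hypothetical names; tuber does not read them by itself), looks them up with base R before calling yt_oauth:

```r
# Hypothetical helper: YT_CLIENT_ID and YT_CLIENT_SECRET are assumed names
# chosen for this sketch; tuber does not read these variables on its own.
get_yt_credentials <- function(id_var = "YT_CLIENT_ID",
                               secret_var = "YT_CLIENT_SECRET") {
  app_id <- Sys.getenv(id_var)
  app_secret <- Sys.getenv(secret_var)
  # Sys.getenv() returns "" for unset variables, so fail early and clearly
  if (!nzchar(app_id) || !nzchar(app_secret)) {
    stop("Set ", id_var, " and ", secret_var, " before calling yt_oauth()")
  }
  list(app_id = app_id, app_secret = app_secret)
}

# Usage (opens a browser for the OAuth flow):
# creds <- get_yt_credentials()
# yt_oauth(app_id = creds$app_id, app_secret = creds$app_secret)
```

The variables can be set in a user-level .Renviron file so they never appear in shared code.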

Get Statistics of a Video

get_stats(video_id="N708P-A45D0")
## No. of Views 525112 
## No. of Likes 5576 
## No. of Dislikes 3564 
## No. of Favorites 0 
## No. of Comments 5329
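The YouTube Data API reports counts such as viewCount as strings in its JSON responses. Assuming get_stats() returns a named list in that raw form (the output above is how the package prints it), a small hypothetical helper, stats_to_numeric(), converts the fields ending in "Count" to numbers before you do arithmetic on them:

```r
# Hypothetical helper, assuming `stats` is a named list whose count fields
# (viewCount, likeCount, ...) arrive as character strings, as in the API JSON.
stats_to_numeric <- function(stats) {
  is_count <- grepl("Count$", names(stats))
  # as.numeric rather than as.integer: view counts can exceed R's integer range
  stats[is_count] <- lapply(stats[is_count], as.numeric)
  stats
}

# Usage:
# stats <- stats_to_numeric(get_stats(video_id = "N708P-A45D0"))
# stats$likeCount / stats$viewCount
```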

Get Information About a Video

get_video_details(video_id="N708P-A45D0")

Get Captions of a Video

get_captions(video_id="yJXTXN4xrI8")
## <?xml version="1.0" encoding="utf-8"?>
## <transcript>
##   <text start="6.614" dur="1.549">Every four seconds,</text>
##   <text start="8.163" dur="1.534">someone is diagnosed with</text>
##   <text start="9.697" dur="1.885">Alzheimer&amp;#39;s disease.</text>
##   <text start="11.582" dur="2.172">It&amp;#39;s the most common cause of dementia,</text>
##   <text start="13.754" dur="2.859">affecting over 40 million people worldwide,</text>
##   <text start="16.613" dur="2.52">and yet finding a cure is something that still</text>
##   <text start="19.133" dur="2.482">eludes researchers today.</text>
##   <text start="21.615" dur="3.273">Dr. Alois Alzheimer, a German psychiatrist,</text>
##   <text start="24.888" dur="3.047">first described the symptoms in 1901</text>
##   <text start="27.935" dur="2.46">when he noticed that a particular hospital patient</text>
##   <text start="30.395" dur="1.917">had some peculiar problems,</text>
##   <text start="32.312" dur="1.803">including difficulty sleeping,</text>
##   <text start="34.115" dur="3.37">disturbed memory, drastic mood changes,</text>
##   <text start="37.485" dur="2.166">and increasing confusion.</text>
##   <text start="39.651" dur="1.877">When the patient passed away,</text>
##   <text start="41.528" dur="2.211">Alzheimer was able to do an autopsy</text>
##   <text start="43.739" dur="2.04">and test his idea that perhaps</text>
##   <text start="45.779" dur="2.421">her symptoms were caused by irregularities</text>
##   <text start="48.2" dur="1.963">in the brain&amp;#39;s structure.</text>
##   <text start="50.163" dur="1.951">What he found beneath the microscope</text>
##   <text start="52.114" dur="2.473">were visible differences in brain tissue</text>
##   <text start="54.587" dur="2.194">in the form of misfolded proteins</text>
##   <text start="56.781" dur="1.334">called plaques,</text>
##   <text start="58.115" dur="2.433">and neurofibrillary tangles.</text>
##   <text start="60.548" dur="2.378">Those plaques and tangles work together</text>
##   <text start="62.926" dur="2.419">to break down the brain&amp;#39;s structure.</text>
##   <text start="65.345" dur="1.792">Plaques arise when another protein</text>
##   <text start="67.137" dur="2.643">in the fatty membrane surrounding nerve cells</text>
##   <text start="69.78" dur="2.697">gets sliced up by a particular enzyme,</text>
##   <text start="72.477" dur="2.585">resulting in beta-amyloid proteins,</text>
##   <text start="75.062" dur="1.799">which are sticky and have a tendency</text>
##   <text start="76.861" dur="1.587">to clump together.</text>
##   <text start="78.448" dur="1.952">That clumping is what forms the things</text>
##   <text start="80.4" dur="2.131">we know as plaques.</text>
##   <text start="82.531" dur="1.793">These clumps block signaling</text>
##   <text start="84.324" dur="1.502">and, therefore, communication</text>
##   <text start="85.826" dur="2.336">between cells, and also seem to trigger</text>
##   <text start="88.162" dur="2.536">immune reactions that cause the destruction</text>
##   <text start="90.698" dur="2.134">of disabled nerve cells.</text>
##   <text start="92.832" dur="2.782">In Alzheimer&amp;#39;s disease, neurofibrillary tangles</text>
##   <text start="95.614" dur="3.085">are built from a protein known as tau.</text>
##   <text start="98.699" dur="2.89">The brain&amp;#39;s nerve cells contain a network of tubes</text>
##   <text start="101.589" dur="2.024">that act like a highway for food molecules</text>
##   <text start="103.613" dur="1.563">among other things.</text>
##   <text start="105.176" dur="2.543">Usually, the tau protein ensures that these tubes</text>
##   <text start="107.719" dur="2.256">are straight, allowing molecules</text>
##   <text start="109.975" dur="1.917">to pass through freely.</text>
##   <text start="111.892" dur="1.709">But in Alzheimer&amp;#39;s disease,</text>
##   <text start="113.601" dur="3.463">the protein collapses into twisted strands or tangles,</text>
##   <text start="117.064" dur="1.832">making the tubes disintegrate,</text>
##   <text start="118.896" dur="2.505">obstructing nutrients from reaching the nerve cell</text>
##   <text start="121.401" dur="2.628">and leading to cell death.</text>
##   <text start="124.029" dur="2.336">The destructive pairing of plaques and tangles</text>
##   <text start="126.365" dur="2.332">starts in a region called the hippocampus,</text>
##   <text start="128.697" dur="2.419">which is responsible for forming memories.</text>
##   <text start="131.116" dur="1.713">That&amp;#39;s why short-term memory loss</text>
##   <text start="132.829" dur="2.702">is usually the first symptom of Alzheimer&amp;#39;s.</text>
##   <text start="135.531" dur="1.884">The proteins then progressively invade</text>
##   <text start="137.415" dur="1.616">other parts of the brain,</text>
##   <text start="139.031" dur="1.834">creating unique changes that signal</text>
##   <text start="140.865" dur="2.416">various stages of the disease.</text>
##   <text start="143.281" dur="1.235">At the front of the brain,</text>
##   <text start="144.516" dur="3.536">the proteins destroy the ability to process logical thoughts.</text>
##   <text start="148.052" dur="3.168">Next, they shift to the region that controls emotions,</text>
##   <text start="151.22" dur="2.337">resulting in erratic mood changes.</text>
##   <text start="153.557" dur="1.224">At the top of the brain,</text>
##   <text start="154.781" dur="2.364">they cause paranoia and hallucinations,</text>
##   <text start="157.145" dur="2.053">and once they reach the brain&amp;#39;s rear,</text>
##   <text start="159.198" dur="1.999">the plaques and tangles work together</text>
##   <text start="161.197" dur="2.418">to erase the mind&amp;#39;s deepest memories.</text>
##   <text start="163.615" dur="1.621">Eventually the control centers governing</text>
##   <text start="165.236" dur="2.794">heart rate and breathing are overpowered as well</text>
##   <text start="168.03" dur="1.796">resulting in death.</text>
##   <text start="169.826" dur="2.039">The immensely destructive nature of this disease</text>
##   <text start="171.865" dur="2.999">has inspired many researchers to look for a cure</text>
##   <text start="174.864" dur="3.752">but currently they&amp;#39;re focused on slowing its progression.</text>
##   <text start="178.616" dur="1.387">One temporary treatment</text>
##   <text start="180.003" dur="2.627">helps reduce the break down of acetylcholine,</text>
##   <text start="182.63" dur="2.653">an important chemical messenger in the brain</text>
##   <text start="185.283" dur="2.519">which is decreased in Alzheimer&amp;#39;s patients</text>
##   <text start="187.802" dur="3.063">due to the death of the nerve cells that make it.</text>
##   <text start="190.865" dur="2.316">Another possible solution is a vaccine</text>
##   <text start="193.181" dur="2.461">that trains the body&amp;#39;s immune system to attack</text>
##   <text start="195.642" dur="3.587">beta-amyloid plaques before they can form clumps.</text>
##   <text start="199.229" dur="2.801">But we still need to find an actual cure.</text>
##   <text start="202.03" dur="1.75">Alzheimer&amp;#39;s disease was discovered</text>
##   <text start="203.78" dur="1.669">more than a century ago,</text>
##   <text start="205.449" dur="2.664">and yet still it is not well understood.</text>
##   <text start="208.113" dur="1.667">Perhaps one day we&amp;#39;ll grasp</text>
##   <text start="209.78" dur="2.916">the exact mechanisms at work behind this threat</text>
##   <text start="212.696" dur="2.214">and a solution will be unearthed.</text>
## </transcript>
## 
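The transcript comes back as XML, with each caption line in a text element that carries its start time and duration in seconds. A minimal base-R sketch, assuming you have that XML as a character string like the one printed above, turns it into a data frame. parse_transcript() is a hypothetical helper; a real XML parser (for example the xml2 package) is more robust for anything beyond this simple layout.

```r
# Hypothetical helper: extracts start, dur and text from each <text> element
# of a transcript like the one above. Regex-based, so it only handles the
# simple one-element-per-caption layout shown there.
parse_transcript <- function(xml_text) {
  xml_text <- paste(xml_text, collapse = "\n")
  pattern <- '<text start="([0-9.]+)" dur="([0-9.]+)">([^<]*)</text>'
  hits <- regmatches(xml_text, gregexpr(pattern, xml_text))[[1]]
  text <- sub(pattern, "\\3", hits)
  # The captions above are double-escaped (&amp;#39;); turn that back into '
  text <- gsub("&amp;#39;", "'", text, fixed = TRUE)
  data.frame(
    start = as.numeric(sub(pattern, "\\1", hits)),
    dur   = as.numeric(sub(pattern, "\\2", hits)),
    text  = text,
    stringsAsFactors = FALSE
  )
}
```

With the result you can, for example, compute each line's end time as start + dur.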

Search Videos

res <- yt_search("Barack Obama")
head(res[, 1:3])
##                publishedAt                channelId                                                                           title
## 1 2016-09-06T11:20:20.000Z UC6CZwQv8cbZco3wOwtp7W2g             See How Obama Ignored Xi Jinping & Welcomed PM Modi At G20 Summit !
## 2 2016-09-05T06:01:17.000Z UC6RJ7-PaXg6TIH2BzZfTV7w                                   PM Modi Meets Barack Obama at G20 Summit 2016
## 3 2008-03-24T16:31:53.000Z UC4o-h3-3GhrmHB6ytgO3oIQ                                      Top 20 Obama Pastor Comments You Never Saw
## 4 2016-09-03T06:43:38.000Z UC5aeU5hk31cLzq_sAExLVWg                                     LIVE: Obama arrives in China for G20 summit
## 5 2016-09-20T01:30:42.000Z UCmWHDwXFvdKc8OVBbb2dMZg Buy Lot of 20 Obama, 44th U.S. President, 56th Presidential Inauguration coins!
## 6 2016-09-03T06:57:05.000Z UCgrNz-aDmcr2uuto8_DL2jg                           US President Obama arrives in Hangzhou for G20 Summit
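The publishedAt column holds ISO 8601 timestamps as text, and the results above are not ordered by date. A short sketch, assuming res is the data frame returned by yt_search(), parses them as UTC date-times so the results can be sorted or filtered by date (sort_by_published() is a hypothetical helper):

```r
# Hypothetical helper: parse publishedAt (e.g. "2016-09-06T11:20:20.000Z")
# into POSIXct in UTC and sort the rows newest first.
sort_by_published <- function(res) {
  res$publishedAt <- as.POSIXct(as.character(res$publishedAt),
                                format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")
  res[order(res$publishedAt, decreasing = TRUE), , drop = FALSE]
}

# Usage:
# recent <- sort_by_published(res)
# recent[recent$publishedAt >= as.POSIXct("2016-09-01", tz = "UTC"), 1:3]
```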

Get Comments on a Video

res <- get_comment_threads(filter = c(video_id = "N708P-A45D0"))
head(res)
##    authorDisplayName                                                                              authorProfileImageUrl                                        authorChannelUrl
## 1             Tony Rx https://lh5.googleusercontent.com/-OL_t2gRZ8RI/AAAAAAAAAAI/AAAAAAAAACg/tXnqTDNYXec/photo.jpg?sz=50 http://www.youtube.com/channel/UCqKvEBPMtbmh020d56WrQXQ
## 2       Kevin Manning https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAAAAAAAAAI/AAAAAAAAAAA/4252rscbv5M/photo.jpg?sz=50 http://www.youtube.com/channel/UCEF-WytcLvgijQIklAbteUw
## 3       Kevin Manning https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAAAAAAAAAI/AAAAAAAAAAA/4252rscbv5M/photo.jpg?sz=50 http://www.youtube.com/channel/UCEF-WytcLvgijQIklAbteUw
## 4 TheDrunkenRocketMan https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAAAAAAAAAI/AAAAAAAAAAA/4252rscbv5M/photo.jpg?sz=50 http://www.youtube.com/channel/UCvTypSrsihX3fr2zCZivmwA
## 5          pangratata https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAAAAAAAAAI/AAAAAAAAAAA/4252rscbv5M/photo.jpg?sz=50 http://www.youtube.com/channel/UCETB8ILZueyFMCFWMF0p3Yg
## 6          pangratata https://lh3.googleusercontent.com/-XdUIqdMkCWA/AAAAAAAAAAI/AAAAAAAAAAA/4252rscbv5M/photo.jpg?sz=50 http://www.youtube.com/channel/UCETB8ILZueyFMCFWMF0p3Yg
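Because get_comment_threads() returns its results as a data frame with one row per top-level comment, ordinary data-frame tools apply. For example, a minimal sketch assuming the authorDisplayName column shown above counts comments per author, most active first (comments_per_author() is a hypothetical helper):

```r
# Hypothetical helper: count rows (top-level comments) per author,
# returned as a named integer vector sorted in decreasing order.
comments_per_author <- function(res) {
  counts <- table(as.character(res$authorDisplayName))
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Usage:
# head(comments_per_author(res))
```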

Get Statistics of All the Videos in a Channel

a <- list_channel_resources(filter = c(channel_id = "UCT5Cx1l4IS3wHkJXNyuj4TA"), part="contentDetails")

# ID of the channel's uploads playlist:
playlist_id <- a$items[[1]]$contentDetails$relatedPlaylists$uploads

# Get videos on the playlist
vids <- get_playlist_items(filter = c(playlist_id = playlist_id))

# Video ids
vid_ids <- as.vector(vids$contentDetails.videoId)

# Get stats for every video (one API call per video)
res <- lapply(vid_ids, function(id) get_stats(video_id = id))

# Convert each result to a one-row data frame and stack them
res_df <- do.call(rbind, lapply(res, data.frame))

head(res_df)
##            id viewCount likeCount dislikeCount favoriteCount commentCount
## 1 91gZ4taDiDE       906         4            1             0            0
## 2 bHPCvSqTxn4       706         0            0             0            0
## 3 h2UPH87kjhc       458         1            0             0            0
## 4 E2VtxjljZCE       391         0            0             0            0
## 5 5Ajfk620fA0    175000         5            0             0            0
## 6 PdI3HjulcA4       575         3            2             0            3

If you need to find a channel ID from a username, use the list_channel_resources function:

res <- list_channel_resources(filter = c(username = "GoogleDevelopers"), part = "id")

# Parse out channel_id (items is empty or missing when no channel matches)
if (length(res$items) > 0 && !is.null(res$items[[1]]$id)) {
  channel_id <- res$items[[1]]$id
} else {
  stop("User not found")
}
# channel_id should be UC_x5XG1OV2P6uZZ5FSM9Ttw
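The do.call(rbind, ...) step in the channel-statistics example assumes every video returns exactly the same statistics fields; rbind() stops with an error when the column names differ, which can happen if the API omits a field for some videos. A hedged sketch of a more tolerant combiner, bind_stats() (a hypothetical helper, not part of tuber), uses the union of field names and fills gaps with NA:

```r
# Hypothetical helper: stack a list of named stats lists into one data frame,
# using the union of all field names and NA where a video lacks a field.
bind_stats <- function(stats_list) {
  fields <- unique(unlist(lapply(stats_list, names)))
  rows <- lapply(stats_list, function(s) {
    # Start every row with all fields set to NA, then fill in what exists
    row <- setNames(as.list(rep(NA_character_, length(fields))), fields)
    row[names(s)] <- lapply(s, as.character)
    as.data.frame(row, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Usage, reusing the list of get_stats() results from the channel example:
# res_df <- bind_stats(res)
```

All columns come back as character, so convert the count columns with as.numeric() before summarising them.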