--- title: "Analyzing Politician Instagram Accounts Using clarifai" author: "Gaurav Sood" date: "2015-11-10" vignette: > %\VignetteIndexEntry{Analyzing Politician Instagram Accounts Using clarifai} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Analyzing Politician Instagram Accounts Using clarifai ```{r load_instagram, eval=FALSE} library(instaR) ``` To use the instagram API, go to [https://instagram.com/developer/](https://instagram.com/developer/) and click on manage client and then register a client. Choose a name etc. For website and redirect URL, write in localhost:1410. This will give you client ID and secret. Plug these in as follows: ```{r insta_auth, eval=FALSE} my_oauth <- instaOAuth(app_id="1f1f8228974248ba804b4c02fb3c082f", app_secret="a8a727a6b21e488988207686c88ec49e") save(my_oauth, file="my_oauth") ``` Now it is time to load clarifai: ```{r load_clarifai, eval=FALSE} library(clarifai) ``` Clarifai ships with instagram handles of politicians. Load the file using: ```{r get_data, eval=FALSE} filepath <- system.file("inst/extdata/congress.csv", package = "clarifai") pols <- read.csv(filepath) ``` Next, download data from instagram: ```{r download_data, eval=FALSE} # getUserMedia(pols$instagram[1], token=my_oauth) res <- list() for (i in 1:nrow(pols)) { # Not all politicians have instagram accounts. if (pols$instagram[i]!="") { # Not all have public posts res[[i]] <- tryCatch(getUserMedia(pols$instagram[i], token=my_oauth), error=function(err) NA) } else { res[[i]] <- NA } } # rbind res2 <- do.call(rbind, res) # nrow = 8088 (may change for runs in the future) ``` Merge it with some pols data ```{r merge_write, eval=FALSE} # Get pols data ready small_pols <- pols[,c("first_name", "last_name", "party", "instagram", "dw_nominate")] small_pols_2 <- subset(small_pols, instagram!="") # take out no username/NA # Merge res2[, c("first_name", "last_name", "party", "instagram", "dw_nominate")] <- small_pols_2[match(res2$username, small_pols_2$instagram),] # write.csv(res2, file="res2.csv", row.names=F) ``` Now, get image labels from clarifai: ```{r get_clarifai_labels, eval=FALSE} labs <- list() # Not implemented optimally. # You can push all images at once. And that is the best than 8k requests. for (i in 1:nrow(res2)) { labs[[i]] <- tryCatch(tag_image_urls(res2$image_url[i]), error=function(err) NA) } labs_df <- do.call(rbind, labs) ``` Next merge the labels back into the data: ```{r merge_and_save, eval=FALSE} # Merge labs_df[,names(res2)] <- res2[match(labs_df$img_url, res2$image_url),] # write.csv(labs_df, file="labs_df.csv", row.names=F) # This data frame is available in the extdata folder ``` Let us analyze data. Popular tags: ```{r pop_tags, eval=FALSE} head(table(labs_df$tags)[order(-table(labs_df$tags))], 40) ``` ```{r out_pop, eval=FALSE} ## people politics adult men group government business women portrait leader ## 1592 1137 1132 999 910 795 793 773 763 670 ## clothing politician education speech election indoors meeting room competition many ## 554 472 456 435 433 426 360 352 347 345 ``` Do Republican instagram accounts have more photos with military tags than Democrats? ```{r military, eval=FALSE} table(grepl("military", labs_df$tags), labs_df$party) ``` ```{r out_mil, eval=FALSE} ## D R ## FALSE 19806 16030 ## TRUE 94 90 ``` How about women? ```{r women, eval=FALSE} table(grepl("women", labs_df$tags), labs_df$party) ``` ```{r out_women, eval=FALSE} ## D R ## FALSE 19458 15853 ## TRUE 442 267 ``` ```{r men, eval=FALSE} table(grepl("men", labs_df$tags), labs_df$party) ``` See also for men: ```{r out_men, eval=FALSE} ## D R ## FALSE 18265 14978 ## TRUE 1635 1142 ``` Protest? ```{r protest, eval=FALSE} table(grepl("protest", labs_df$tags), labs_df$party) ``` ```{r out_protest, eval=FALSE} ## D R ## FALSE 19734 16024 ## TRUE 166 96 ```