YouTube content frequently
contains emojis, special Unicode characters, and text in various
languages. The tuber package provides built-in functions
for detecting, extracting, and manipulating emojis without external
dependencies.

library(tuber)

# Get comments from a video
# (assumes you have already authenticated, e.g. with yt_oauth())
comments <- get_all_comments(video_id = "your_video_id")

# Check which comments contain emojis
comments$has_emoji <- has_emoji(comments$textDisplay)

# Count emojis per comment
comments$emoji_count <- count_emojis(comments$textDisplay)

# Keep only comments that contain at least one emoji
emoji_comments <- comments[comments$emoji_count > 0, ]

The package provides five main functions for working with emojis:
has_emoji() - Check for emoji presence
count_emojis() - Count emojis in text
extract_emojis() - Get emojis from text
remove_emojis() - Strip emojis from text

Beyond emojis, tuber handles Unicode text consistently:

safe_utf8() - Ensure UTF-8 encoding

If emojis do not display correctly, your R environment may not support UTF-8 display. The data is still correct; only the display is affected. Try:
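One way to confirm that only the display is affected is to inspect the locale and the underlying code points. The following is a minimal sketch using base R only; it is not taken from the tuber documentation, the sample string is made up, and the locale name in the commented-out line is an assumption that varies by platform.

```r
# Inspect the current locale and whether R treats it as UTF-8
Sys.getlocale("LC_CTYPE")
l10n_info()[["UTF-8"]]

# Made-up sample standing in for a value of comments$textDisplay
x <- paste("Great video", intToUtf8(0x1F44D))  # thumbs up

# The string is valid UTF-8 even if the console cannot draw the glyph
Encoding(x)
validUTF8(x)

# The emoji survives as a code point regardless of how it is rendered
utf8ToInt(x)

# Optionally switch the session to a UTF-8 locale
# (the locale name is platform-dependent, so this may need adjusting):
# Sys.setlocale("LC_CTYPE", "en_US.UTF-8")
```

If `validUTF8()` returns `TRUE` and the code points look right, the data itself is intact and the issue lies with the console or font.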
Compound emojis (like family emojis or skin tone modifiers) may be counted as multiple characters. This is due to how Unicode encodes these as sequences of code points.
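The code point sequences behind compound emojis can be inspected with base R alone. This sketch does not call tuber; it only shows why a single visible glyph can correspond to several code points, which is what a code-point-based counter would see.

```r
# Family emoji: man + zero-width joiner + woman + zero-width joiner + girl.
# Rendered as one glyph on most systems.
family <- intToUtf8(c(0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467))

# Thumbs up followed by a medium skin tone modifier
thumbs <- intToUtf8(c(0x1F44D, 0x1F3FD))

# Number of code points in each "single" emoji
length(utf8ToInt(family))  # 5
length(utf8ToInt(thumbs))  # 2

# nchar() counts code points too, not visible glyphs
nchar(family, type = "chars")  # 5

# Show the individual code points in U+XXXX notation
sprintf("U+%04X", utf8ToInt(thumbs))  # "U+1F44D" "U+1F3FD"
```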
The emoji pattern covers most common Unicode emoji blocks. Very new emojis added in recent Unicode versions may not be detected until the pattern is updated.
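To check whether the installed version recognizes a particular emoji, the helpers can be run on literal strings rather than API results. This is a sketch with made-up sample strings; it assumes `has_emoji()` and `count_emojis()` accept a character vector, as in the comment example above. The results depend on the installed tuber version, so no expected output is shown.

```r
library(tuber)

# Made-up sample strings; swap in the emojis you care about
samples <- c(
  plain   = "No emoji here",
  classic = paste("Nice", intToUtf8(0x1F600)),  # grinning face
  newer   = paste("Wow", intToUtf8(0x1FAE0))    # melting face (Unicode 14)
)

# A FALSE or 0 for an emoji-bearing string suggests the pattern
# in the installed version does not cover that emoji yet
data.frame(
  sample      = names(samples),
  has_emoji   = has_emoji(samples),
  emoji_count = count_emojis(samples)
)
```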