There are nearly 400,000 subscribers to a YouTube account called Rob the Robot – Learning Videos For Children. In a 2020 animated video, the protagonist and his friends visit a stadium-themed planet and attempt heroic feats inspired by Heracles. Their adventures are appropriate for elementary-age viewers, but young readers who turn on YouTube’s auto-captioning may expand their vocabulary in unexpected ways. At one point, YouTube’s algorithms misheard the word “brave” as “rape” and captioned a scene in which a character aspires to be “strong and raped like Heracles.”
A recent study of YouTube’s algorithmic captioning of videos aimed at children documented how the text sometimes veers into “very adult” language. In a sample of more than 7,000 videos from 24 top-rated children’s channels, 40% displayed in their subtitles words from a list of 1,300 “taboo” terms related to swearing. In about 1% of videos, the subtitles included words from a list of “highly inappropriate” terms.
Several videos posted on Ryan’s World, a leading children’s channel with over 30 million subscribers, illustrate the problem well. In one, the spoken phrase “You should also buy corn” is rendered in the subtitles as “You should also buy p*rn” because the AI mistook one word for the other. In other videos, “beach towel” is transcribed as “b*tch towel”, “buster” becomes “bastard”, “crab” becomes “crap”, and a craft video teaching how to make a monster-themed dollhouse carries the caption “bed for p*nis”.
“It is surprising and disturbing,” said Ashique KhudaBukhsh, an assistant professor at Rochester Institute of Technology who has researched the issue.
Automatic captions are not available on YouTube Kids, the child-directed version of the platform. But many families use the standard version of YouTube, where children also watch. The Pew Research Center reported in 2020 that 80% of parents of children aged 11 or younger said their children watched YouTube content, and more than 50% of those children did so daily.
KhudaBukhsh hopes the study will draw attention to a phenomenon he says has received little attention from tech companies and researchers. He calls it “inappropriate content hallucination”: algorithms adding inappropriate content that was not present in the original material. It is the reverse of the way smartphone autocomplete often censors adult language to the point of annoyance.
Meanwhile, YouTube spokeswoman Jessica Gibby said children under 13 should use YouTube Kids, where automatic captions cannot be viewed. On the standard version of YouTube, she said, the feature improves accessibility. “We’re constantly working to improve automatic captioning and reduce errors,” she said.
Alafair Hall, a spokeswoman for Pocket.watch, a children’s entertainment studio that publishes Ryan’s World content, said in a statement that the company “is in close and immediate contact with our platform partners, such as YouTube, to update any inaccurate video subtitles.”
“The benefits of speech-to-text are undeniable, but these systems have blind spots that call for checks and balances,” KhudaBukhsh said.
Those blind spots may seem surprising to humans, in part because it is much easier for us to grasp the broader context and meaning of a person’s words. Algorithms are different: although their language-processing abilities have improved, they still lack a full understanding of context. This has caused problems for other companies that rely on machines to process text. One startup had to overhaul its adventure game after it was found to sometimes generate sexual scenarios involving minors.
Machine learning algorithms “learn” a task by processing large amounts of training data – in this case, audio files and matching transcripts. KhudaBukhsh said YouTube’s system likely inserts profanity in part because its training data consists mainly of adult speech and includes little from children. When the researchers manually examined examples of out-of-place words in the captions, they found these often appeared in speech by children or by people who did not appear to be native English speakers. Previous studies have likewise found that transcription services from Google and other big tech companies make more errors for non-white speakers, and fewer errors for standard American English than for other American dialects.
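Such disparity studies typically work by computing a word error rate (WER) separately for each group of speakers and comparing the results. The sketch below shows that measurement in miniature using the open-source jiwer package; the transcript pairs are invented for illustration and are not data from the study.

```python
from jiwer import wer  # pip install jiwer

# Invented (reference transcript, ASR output) pairs, grouped by speaker
# type, to illustrate how per-group error rates are compared.
samples = {
    "adult_standard_american": [
        ("you should also buy corn", "you should also buy corn"),
        ("grab a beach towel", "grab a beach towel"),
    ],
    "child_speech": [
        ("you should also buy corn", "you should also buy porn"),
        ("strong and brave like heracles", "strong and raped like heracles"),
    ],
}

for group, pairs in samples.items():
    references = [ref for ref, _ in pairs]
    hypotheses = [hyp for _, hyp in pairs]
    # wer() counts substitutions, insertions, and deletions per reference word
    print(f"{group}: WER = {wer(references, hypotheses):.2f}")
```

A systematically higher error rate for one group of speakers is exactly the pattern the cited studies report.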
Rachael Tatman, a linguist, says a simple block list of words not to be used in captions on children’s YouTube videos would fix many of the worst cases. Yet, she said, “clearly no one is overseeing the engineering.”
Still, Tatman says a block list would be an imperfect solution, since inappropriate phrases can be assembled from individually innocuous words. A more sophisticated approach would be to tune the captioning system to avoid adult language when processing children’s content, but Tatman says that would not be perfect either. Machine learning software can steer language in statistical directions, but it is not easily programmed to respect context. According to Tatman, “Language models are not precise tools.”
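For illustration, here is a minimal sketch of the word-level block list Tatman describes, applied as a post-processing step on caption text. The two-word list is a stand-in; the study’s taboo list ran to about 1,300 terms.

```python
import re

# Tiny stand-in block list; a real one would be much longer and curated.
BLOCKED_WORDS = {"crap", "bastard"}

_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(BLOCKED_WORDS))) + r")\b",
    re.IGNORECASE,
)

def mask_caption(caption: str) -> str:
    """Mask each blocked word, keeping only its first letter (e.g. 'c***')."""
    return _PATTERN.sub(
        lambda m: m.group(0)[0] + "*" * (len(m.group(0)) - 1),
        caption,
    )

print(mask_caption("Oh crap, the tower fell over"))    # -> Oh c***, the tower fell over
print(mask_caption("strong and brave like Heracles"))  # unchanged: nothing blocked
```

The limitation Tatman points to is visible in the design: the filter inspects one word at a time, so a phrase assembled entirely from individually innocuous words passes through untouched.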
KhudaBukhsh and his collaborators have devised and tested systems to correct taboo words in transcripts, but even the best was less than 30% effective. The team also ran audio from children’s YouTube videos through an automatic transcription service offered by Amazon, which likewise sometimes made errors that introduced inappropriate words. Amazon spokeswoman Nina Lindsey declined to comment on the matter, but provided links to developer documentation on how to correct or filter unwanted words.
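Amazon Transcribe’s documented mechanism for this is the vocabulary filter, which can mask, remove, or tag listed words in a transcript. The sketch below shows how one might be applied; the bucket, job, and filter names are hypothetical placeholders.

```python
import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Create a filter from a small stand-in word list (hypothetical name).
transcribe.create_vocabulary_filter(
    VocabularyFilterName="kids-content-filter",
    LanguageCode="en-US",
    Words=["crap", "bastard"],
)

# Transcribe a video with the filter attached (hypothetical job and URI).
transcribe.start_transcription_job(
    TranscriptionJobName="kids-video-captions",
    Media={"MediaFileUri": "s3://example-bucket/episode.mp4"},
    MediaFormat="mp4",
    LanguageCode="en-US",
    Settings={
        "VocabularyFilterName": "kids-content-filter",
        # "mask" replaces each filtered word with *** in the transcript
        "VocabularyFilterMethod": "mask",
    },
)
```

Like the caption block list above, a vocabulary filter only catches words someone thought to put on the list.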
Source: Wired