AI startup Mistral has released a new API for content moderation.
The API is the same one that powers moderation in Mistral's chatbot platform, Le Chat, and can be customized to suit specific applications and safety standards, Mistral said. Powered by a fine-tuned model (Ministral 8B), it was trained to classify text in a variety of languages, including English, French, and German, into one of nine categories: sexual, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health, economics, legal, and personally identifiable information (PII).
According to Mistral, the moderation API can be applied to either raw text or conversational text.
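Those two modes imply two request shapes. The sketch below shows what the payloads might look like; the endpoint paths, the model name (`mistral-moderation-latest`), and the field names are assumptions drawn from Mistral's public API documentation at the time of writing, not details confirmed by the announcement, so verify them against the current API reference before use.

```python
# Hypothetical sketch of the two moderation request shapes: raw text
# vs. conversational input. Endpoint paths and the model name are
# assumptions based on Mistral's public docs; no network call is made.
import json

API_BASE = "https://api.mistral.ai"  # assumed base URL


def raw_text_payload(texts):
    """Body for moderating standalone strings (POST /v1/moderations, assumed path)."""
    return {"model": "mistral-moderation-latest", "input": texts}


def conversational_payload(conversation):
    """Body for moderating a chat transcript (POST /v1/chat/moderations,
    assumed path), where the classifier can use conversational context."""
    return {"model": "mistral-moderation-latest", "input": [conversation]}


body = raw_text_payload(["Some user-generated comment to screen."])
chat_body = conversational_payload([
    {"role": "user", "content": "How do I pick a strong password?"},
    {"role": "assistant", "content": "Use a long, random passphrase."},
])
print(json.dumps(body))
print(json.dumps(chat_body))
```

A real response would carry per-category scores for the nine classes listed above; since the exact response schema is not described in the announcement, field names there should also be checked against Mistral's API reference.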
“Over the past few months, we have seen growing enthusiasm across the industry and research community for new AI-based moderation systems, which can help make moderation more scalable and robust across applications,” Mistral wrote in a blog post. “Our content moderation classifier leverages the most relevant policy categories for effective guardrails and addresses model-generated harms such as unqualified advice and PII, introducing a pragmatic approach to safety.”
AI-powered moderation systems could theoretically help. But they are also susceptible to the same biases and technical flaws that plague other AI systems.
For example, some models trained to detect toxicity disproportionately label phrases in African American Vernacular English (AAVE), the informal grammar used by some Black Americans, as “toxic.” Studies have also found that social media posts about people with disabilities are flagged as more negative or harmful by commonly used public sentiment and toxicity detection models.
Mistral claims its moderation model is highly accurate, but admits it is still a work in progress. Notably, the company did not compare the performance of its API to other popular moderation APIs, such as Jigsaw's Perspective API or OpenAI's Moderation API.
“We work with our customers to build and share moderation tools that are scalable, lightweight, and customizable,” the company said. “We will continue to work with the research community to help improve safety in the broader field.”