Sarah Bird, the principal product manager at Microsoft in the field of artificial intelligence, reveals the limit in an interview where her team has designed several new safety features that will be user-friendly for Azure customers who do not employ dedicated red teams to test their AI services. Microsoft states that these tools driven by LLM can identify potential vulnerabilities to track "plausible but unsupported" deviations and block malicious instructions in real-time for Azure AI customers working with any model hosted on the platform.
"We understand that not everyone has deep expertise in immediate injection attacks or hate speech, so the evaluation system generates the necessary guidelines to simulate these types of attacks. Customers can then receive a score and see the results," she says.
Three features: Quick Guards that block immediate injections or malicious instructions from external documents instructing models to act against their training; Grounding Identification, which identifies and blocks deviations; and Safety Evaluations, assessing model vulnerabilities, are now available for preview in Azure AI. Two additional features for directing models towards secure outputs and monitoring instructions for flagging users who may be problematic will arrive soon.
Whether the user inputs an instruction or the model processes third-party data, the monitoring system will evaluate it to see if it contains forbidden words or hidden instructions before deciding to send it to the model for response. The system then tests the model's response and checks if the model has deduced information not in the document or guideline.
In the case of Google Gemini images, filters designed to reduce bias had unintended effects, an area where Microsoft says their Azure AI tools will allow more personalized control. Bird confirms there is a concern that Microsoft and other companies may decide what is suitable or unsuitable for AI models, so her team has added a way for Azure customers to modify the filtering of hate speech or violence that the model sees and blocks.
In the future, Azure users can also receive reports of users attempting to run unsafe outputs. Bird says this allows system administrators to understand which users are their red team and who may be more maliciously inclined individuals.
Bird says the safety features are "integrated" directly into GPT-4 and other popular models like Llama 2. However, since the Azure model zoo contains many AI models, users of smaller and less used open-source systems may need to manually indicate safety features for models.