Large language models (LLMs) such as ChatGPT are increasingly being used in educational and professional settings. It is important to understand and study the many biases present in such models before integrating them into existing applications and our daily lives.
One of the biases I studied in my previous article concerned historical events. I probed LLMs to understand what historical knowledge they encoded in the form of major historical events. I found that they encoded a serious Western bias in their understanding of major historical events.
In a similar vein, in this article I probe language models regarding their understanding of important historical figures. I asked two LLMs who the most important people in history were. I repeated this process 10 times across 10 different languages. Some names, like Gandhi and Jesus, appeared extremely frequently. Other names, like Marie Curie or Cleopatra, appeared much less frequently. Compared to the number of male names generated by the models, there were extremely few female names.
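To make the setup concrete, below is a minimal sketch of how this kind of multilingual probe can be scripted, assuming API access to both models. The prompt wordings, model identifiers, and the two example languages are illustrative placeholders, not the exact prompts or code used in my experiments.

```python
from openai import OpenAI
from anthropic import Anthropic

# Illustrative prompts; the actual experiments covered 10 languages,
# and these translations are placeholders.
PROMPTS = {
    "en": "Who are the most important people in history?",
    "es": "¿Quiénes son las personas más importantes de la historia?",
    # ... 8 more languages
}
N_TRIALS = 10  # each prompt is repeated 10 times per language

openai_client = OpenAI()        # assumes OPENAI_API_KEY is set
anthropic_client = Anthropic()  # assumes ANTHROPIC_API_KEY is set

def ask_gpt4(prompt: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def ask_claude(prompt: str) -> str:
    resp = anthropic_client.messages.create(
        model="claude-3-opus-20240229",  # placeholder model ID
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

# Collect (language, response) pairs for each model.
responses = {"gpt-4": [], "claude": []}
for lang, prompt in PROMPTS.items():
    for _ in range(N_TRIALS):
        responses["gpt-4"].append((lang, ask_gpt4(prompt)))
        responses["claude"].append((lang, ask_claude(prompt)))
```

The generated names can then be extracted from each response and annotated with gender and region for the analysis that follows.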
The biggest question I had was: Where were all the women?
Continuing the theme of evaluating historical biases encoded by language models, I probed OpenAI's GPT-4 and Anthropic's Claude regarding major historical figures. In this article, I show that both models exhibit:
- Gender bias: Both models disproportionately predict male historical figures. GPT-4 generated the names of female historical figures 5.4% of the time and Claude did so 1.8% of the time. This pattern held across all 10 languages (a sketch of how these proportions can be tallied follows this list).
- Geographic bias: Regardless of the language the model was prompted in, there was a bias towards predicting Western historical figures. GPT-4 generated historical figures from Europe 60% of the time and Claude did so 52% of the time.
- Language bias: Certain languages suffered more from gender or geographic biases. For example, when prompted in Russian, both GPT-4 and Claude generated zero women across all of my experiments. Additionally, language quality was lower for some languages. For example, when prompted in Arabic, the models were more likely to respond incorrectly by generating…
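For reference, the percentages above come from simple frequency counts over the generated names. Here is a minimal sketch of that tally, assuming each response has already been parsed into per-name records labeled with gender and region; the field names and sample rows are hypothetical, and the annotation step itself is not shown.

```python
# Hypothetical parsed records: one per name generated across all
# languages and trials, annotated with gender and region.
records = [
    {"model": "gpt-4", "lang": "en", "name": "Gandhi", "gender": "male", "region": "Asia"},
    {"model": "gpt-4", "lang": "en", "name": "Marie Curie", "gender": "female", "region": "Europe"},
    # ... remaining records from the experiments
]

def share(records: list[dict], model: str, field: str, value: str) -> float:
    """Fraction of a model's generated names whose `field` equals `value`."""
    rows = [r for r in records if r["model"] == model]
    return sum(r[field] == value for r in rows) / len(rows) if rows else 0.0

for model in ("gpt-4", "claude"):
    print(
        model,
        f"female share: {share(records, model, 'gender', 'female'):.1%}",
        f"European share: {share(records, model, 'region', 'Europe'):.1%}",
    )
```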