Incorrectly generates Polish diacritical marks "ĄĆĘŁŃÓŚŻŹ"

#2
by JL42 - opened

Like ALL Chinese models (and almost all AI models), it incorrectly generates Polish diacritical marks “ĄĆĘŁŃÓŚŻŹ”.
zoomit

It may be due to there not being much Polish training data.

This is downright ridiculous and incomprehensible. There is no shortage of training data. This is not some dead, unused language. We have been waiting for improvement for several years.

Sorry, I meant that there may not be much Polish in this model's training data. My guess is that the training data is mainly English and Chinese.

I have the same issue. It fails completely in Polish (and I assume in all non-English languages as well).

China is far away, so maybe they don’t know Polish exists 😝

There are a lot of Polish employees at OpenAI, so they’ve taken care of this in ChatGPT models. Also, Nano Banana performs well in Polish.

We should add European-language rendering to the benchmarks so this kind of failure gets caught and exposed next time. 😁🤫
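
As a rough illustration of what such a benchmark check could look like, here is a minimal Python sketch. It assumes the text in a generated image has already been read back by some OCR step, which is not shown here. The function name `missing_diacritics` and the sample strings are made up for illustration and do not come from any existing benchmark.

```python
import unicodedata
from collections import Counter

# Polish letters with diacritics, upper and lower case.
POLISH_DIACRITICS = set("ĄĆĘŁŃÓŚŻŹąćęłńóśżź")


def missing_diacritics(expected: str, recognized: str) -> dict[str, int]:
    """Count Polish diacritic letters that appear in `expected`
    more often than in `recognized` (hypothetical helper)."""
    # Normalize to NFC so "Z" + combining dot above compares equal to "Ż".
    exp = Counter(
        c for c in unicodedata.normalize("NFC", expected) if c in POLISH_DIACRITICS
    )
    rec = Counter(
        c for c in unicodedata.normalize("NFC", recognized) if c in POLISH_DIACRITICS
    )
    # Counter subtraction keeps only the letters that were lost.
    return dict(exp - rec)


# The prompt asked for this text; the second string stands in for OCR output.
lost = missing_diacritics("ZAŻÓŁĆ GĘŚLĄ JAŹŃ", "ZAZOLC GESLA JAZN")
print(sorted(lost))
```

A benchmark could run a check like this over a set of Polish prompts and report the share of images with no lost letters. How reliable the OCR step itself is on diacritics would need to be verified separately.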

For years, I have been wondering why AI models bypass the Polish language. Languages that are less widely used, have less training data, or are harder to handle are sometimes supported, but Polish is NOT.
Almost all image generators have problems with Polish diacritical marks (“ĄĆĘŁŃÓŚŻŹ”). There is also a problem with most AI audio models and the Polish language.
