Language Processing in LLMs: English vs. Japanese
I’m Japanese. I believe there used to be a notion that LLMs tended to give better answers when the prompt was written in English rather than in Japanese (perhaps because of how inputs are processed?). Is that still the case these days? Also, could you explain the differences between the following cases: 1. in: JP, out: JP; 2. in: EN, out: JP; 3. in: JP, out: EN; and 4. in: EN, out: EN? Please consult the most up-to-date information.
Prompt Language and Answer Quality: Early on, users noticed that English prompts often produced more detailed or accurate answers than Japanese prompts. This was largely due to how LLMs were trained and how they process text. Models like GPT-3 were trained predominantly on English data (GPT-3’s training corpus was ~92% English vs. only 0.2% Japanese (www.passaglia.jp)), and their tokenization schemes were optimized for English, so Japanese text can require many more tokens to represent the same content, making processing less efficient (www.passaglia.jp). Initial fine-tuning (RLHF) was also done mostly in English (www.passaglia.jp). Together, these factors meant earlier LLMs understood and responded in English more readily than in Japanese.

Recent models have greatly closed this gap. GPT-4, for example, was designed with strong multilingual capabilities and can follow Japanese instructions almost as well as English ones (www.passaglia.jp). OpenAI’s own evaluation found GPT-4’s benchmark accuracy was ~79.9% in Japanese vs. 85.5% in English (www.passaglia.jp): a real difference, but not a huge gap. In everyday use, GPT-4 produces high-quality answers in Japanese, and the advantage of using English prompts is much smaller than it used to be.

That said, English still has a slight edge in some cases. Studies in 2024 observed that ChatGPT’s answers to the same questions were rated more comprehensive and accurate in English than in Japanese (pmc.ncbi.nlm.nih.gov), and GPT-4 scored higher on a Japanese radiology exam when the questions were translated into English (about 89% accuracy) than when asked in Japanese (~70%) (pmc.ncbi.nlm.nih.gov). This suggests that while Japanese works well, the model’s very best performance still tends to emerge with English prompts. In short, modern LLMs handle Japanese much better than before, but English remains the most “understood” language for many models.
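To see the tokenization point concretely, here is a minimal sketch using the tiktoken library (pip install tiktoken) with the cl100k_base encoding used by GPT-4-era models; the sample sentences are my own illustrations, and exact counts vary by tokenizer:

```python
# Rough comparison of how many tokens roughly equivalent English and
# Japanese sentences consume under a GPT-4-era tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "EN": "Large language models process text as sequences of tokens.",
    "JA": "大規模言語モデルはテキストをトークンの列として処理します。",
}

for label, text in samples.items():
    tokens = enc.encode(text)
    print(f"{label}: {len(text)} characters -> {len(tokens)} tokens")
```

With this encoding, English typically averages several characters per token, while Japanese often lands near one token per character, which is why the same content tends to cost more tokens (and context budget) in Japanese.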
Differences between input/output language combinations: Each combination of prompt and answer language can subtly affect an LLM’s performance and response style:
- JP → JP (Japanese in, Japanese out): You ask in Japanese and get an answer in Japanese. This is a natural scenario for Japanese users. Today’s top LLMs (like GPT-4) can interpret Japanese questions and generate fluent Japanese answers. The content quality is generally high, approaching what you’d get in English (www.passaglia.jp). The model will usually maintain appropriate Japanese style (e.g. polite phrasing). In the past, such answers might have been shorter or less detailed due to limited Japanese training, but recent models give much more thorough answers in Japanese. One caveat is that Japanese text is tokenized into more pieces, so responses might be slightly slower to produce and use more tokens (www.passaglia.jp). Overall, for most purposes you can ask in Japanese and get a good answer in Japanese – the model handles it well, though it may occasionally omit minor details that it might include in an English answer.
- EN → JP (English in, Japanese out): You provide the prompt in English but request the answer in Japanese. Many users tried this strategy to combine the model’s strong English comprehension with its ability to output Japanese. Because the question is in English, the LLM can leverage the full understanding and nuance it learned from English training data, and then generate the answer in Japanese. This often yields very detailed answers: essentially the model arrives at an answer as if it were explaining in English, then expresses it in Japanese. For example, a complex prompt about a technical topic might be understood more deeply if given in English, reducing the chance of misinterpretation. The Japanese output in this case is usually accurate and fluent, since models like GPT-4 have near-native Japanese generation ability. In some cases the phrasing might read as if it were translated (e.g. overly direct, or using structures more common in English), but it is generally natural. This approach can still be useful if you feel the Japanese phrasing of a prompt isn’t capturing the detail you want: writing it in English ensures the model fully grasps the request, and you still get the answer in Japanese. Recent improvements mean the difference isn’t as dramatic as before, but English-in/Japanese-out can sometimes produce a slightly more comprehensive answer than Japanese-in/Japanese-out. Essentially, you’re using English to unlock the model’s best reasoning and Japanese to present the result (see the sketch after this list).
- JP → EN (Japanese in, English out): Here you ask the question in Japanese but the answer is given in English. This scenario might occur if you’re more comfortable writing in Japanese but want an English response (perhaps for practice, or to share with English speakers). A modern LLM will interpret the Japanese prompt and respond in English, and the English output quality is usually very high, as the model excels at articulating answers in English. The main consideration is whether the model correctly understood the Japanese input. If the Japanese prompt is clear, GPT-4 can interpret it accurately and you’ll get an English answer comparable to the English→English case; GPT-4’s fluency in English ensures the answer will be well-structured and detailed. There is a small risk that subtle nuances or specific terms in the Japanese question could be lost or misunderstood, which might make the English answer slightly off-target compared to asking directly in English. But for the most part, JP→EN works almost as seamlessly as EN→EN for a capable multilingual model. Essentially, the language bridging happens on the input side here (as opposed to case 2, where it happens on the output side). As long as the question is well-formed in Japanese, the English answer will be as informative as if you’d asked in English, since expressing itself in English is the model’s strong suit.
- EN → EN (English in, English out): This is the classic use case and usually the strongest scenario for most LLMs. You’re feeding in an English prompt and getting an English answer, so the model doesn’t have to bridge languages at all; it can directly apply its training, which heavily features English data and English instruction-following. In general, this yields the most detailed and nuanced responses. The model can draw on a wide range of knowledge and express it in the English style it was most heavily trained on (elaborate explanations, idiomatic phrasing, etc.). Evaluations consistently show top performance with English prompts: for instance, GPT-4 achieves its highest accuracy on tasks in English (www.passaglia.jp), and human evaluators often rate English responses as more comprehensive (pmc.ncbi.nlm.nih.gov). If you’re comfortable with English, this mode is ideal for getting maximum depth and reasoning. In terms of content, an English→English exchange might include more expansive explanations or examples, simply because the model has an easier time communicating in the language it “knows” best. That said, the gap is not as large as it once was, and a Japanese user can still get excellent results in Japanese. But EN→EN remains the most reliable way to get the very best answer from many LLMs, given their training background.
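To make the four cases concrete, here is a minimal sketch that sends the same question through each input/output combination, assuming the OpenAI Python SDK (pip install openai) with an API key in the environment; the model name, questions, and system instructions are illustrative choices, not a prescribed setup:

```python
# Compare the four prompt/answer language combinations side by side.
# Assumes: OpenAI Python SDK v1.x and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

QUESTION_EN = "Explain how transformer models use attention."
QUESTION_JA = "トランスフォーマーモデルがアテンションをどう使うか説明してください。"

# case name -> (question, requested answer language)
cases = {
    "JP->JP": (QUESTION_JA, "日本語で答えてください。"),
    "EN->JP": (QUESTION_EN, "Answer in Japanese."),
    "JP->EN": (QUESTION_JA, "Answer in English."),
    "EN->EN": (QUESTION_EN, "Answer in English."),
}

for name, (question, instruction) in cases.items():
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name; use whatever you have access to
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": question},
        ],
    )
    answer = response.choices[0].message.content
    print(f"--- {name} ---\n{answer[:300]}\n")
```

Putting the output-language request in the system message keeps it separate from the question itself, so each case varies only the languages involved; running your own questions through all four cases is the most direct way to see whether the English edge still matters for your use.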
Summary: Early large language models often struggled with non-English prompts, but recent models have made great strides in Japanese. You no longer need to use English to get a good answer; you can usually ask and receive answers in Japanese without issue. English prompts still hold a slight advantage on certain complex or knowledge-intensive queries, yielding more detailed answers in some evaluations (pmc.ncbi.nlm.nih.gov).

The differences between the four input/output cases mainly come down to which side of the exchange the model has to bridge languages on. Using English for the prompt can maximize the model’s understanding, while using Japanese for the output lets you read the answer in your native language. Conversely, a Japanese prompt may be slightly harder for the model to parse perfectly, but English output is where the model is most articulate. Overall, modern LLMs like GPT-4 perform well in all four scenarios. If you’re aiming for maximum accuracy or detail (for example, complex problem solving or niche knowledge), you might get a small boost by phrasing the question in English. Otherwise, you should feel comfortable using Japanese for both input and output; the model will still provide a high-quality answer in your language. The “English prompt advantage” has shrunk as models have become more multilingual, though English remains the strongest language for most of them.