Prompting Rules That Reduce Codex Mojibake Accidents on Windows
· Go Komura · Codex, Windows, Mojibake, UTF-8, CP932, AI Coding
When you have Codex work with files containing Japanese text on Windows, the first thing that actually helps is not aligning every editor and shell setting — it is explicitly telling Codex how to read, how to write, and where to stop.
The situations that cause the most trouble look like this.
- UTF-8, CP932, and UTF-16-family files coexist
- The text looks readable on screen, but the interpretation of the actual bytes is off
- You only meant to tweak an existing file, but it gets re-saved in a different encoding
- Breakage happens in “non-code” files: CSV, TXT, logs, Markdown, configuration files
- A throwaway script or raw shell output gets saved as-is, and the accident becomes permanent
OpenAI’s Codex is more stable when you treat it less like a one-off chat partner and more like a teammate you use continuously, with settings and working rules. In particular, if your workflow has Codex read AGENTS.md, encoding rules belong there permanently rather than being repeated verbally every time.
In this article, we organize, from a practitioner’s perspective, the instructions that are most effective to give Codex up front so it can safely handle Japanese files on Windows.
1. The Conclusion First
The single most effective way to reduce Codex mojibake accidents on Windows is to pin down the encoding work procedure in advance.
These are the rules that help the most.
- For existing files containing Japanese, have it check the likely encoding, BOM presence, and newline style before reading
- For files where mojibake is suspected, do not let it save until it is confident
- For existing files, have it preserve the original encoding, BOM, and newlines
- For new files, steer toward UTF-8 per repository convention
- For writes, only allow methods where the encoding can be made explicit
- After saving, have it re-read the file and verify representative Japanese lines
In its short, day-to-day form, it boils down to this.
- Check before reading
- No saving when in doubt
- Preserve existing files; UTF-8 only for new ones
- Ban ambiguous write paths
- Re-read and verify at the end
Conversely, these are the dangerous kinds of instructions.
- “Fix the mojibake”
- “Convert everything to UTF-8”
- “Output a CSV”
- “Just make it match”
- “Save it for now and let’s see”
None of these say at which point Codex should stop. For mojibake prevention, you need to specify not only what to do but also where to stop short of saving.
2. Why Mojibake Accidents Are So Common on Windows
The real problem is not that Codex is weak at Japanese — it is that on the Windows asset side, multiple encodings and multiple write paths coexist.
In practice, this kind of mixture is not unusual.
- Newer sources and Markdown are UTF-8
- Older CSVs, TXT files, logs, and configs are CP932-family
- Some outputs and tool-generated artifacts are UTF-16-family
- Save paths vary across editors, shells, and Excel-derived output
- Newlines are also mixed between LF and CRLF
In this state, if Codex misinterprets the bytes even once, it can proceed to the next edit treating strings it failed to read as if they had been read correctly. And if it then saves, the problem is no longer a display issue — it becomes fixed as corruption of the file itself.
That is why mojibake prevention ultimately comes down to how you manage the I/O procedure.
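The "misread once, then save" failure described above can be reproduced in a few lines. This is a minimal sketch, not a description of what Codex does internally; the sample string is an arbitrary assumption chosen for illustration.

```python
# Illustrative sample text (an assumption, not taken from any real file).
original = "日本語のテスト"
cp932_bytes = original.encode("cp932")

# Misread: CP932 bytes interpreted as UTF-8, with undecodable bytes replaced.
misread = cp932_bytes.decode("utf-8", errors="replace")

# Saving the misread text writes U+FFFD in place of the original bytes.
saved = misread.encode("utf-8")

print("replacement characters present:", "\ufffd" in misread)
print("original bytes survive in saved file:", cp932_bytes in saved)
print("untouched bytes still readable as CP932:", cp932_bytes.decode("cp932") == original)
```

Until the save, the damage is only in the interpretation; after the save, the original bytes are gone and only the untouched copy can be read correctly.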
3. The Rules to Pin Down for Codex First
3.1 Have it check the likely encoding, BOM, and newlines before reading
The first rule is this.
Before reading an existing file that contains Japanese, check its likely encoding, BOM presence, and newline style, and if anything looks suspicious, do not proceed to interpreting the content as-is.
The point is to change the workflow to “before reading the text, first look at the file’s premises.”
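As a rough illustration of "looking at the file's premises" first, the sketch below inspects raw bytes for a BOM, the newline style, and which candidate encodings decode without error. The function name `inspect_bytes` and the candidate list are assumptions made for this example. A clean decode is only evidence, not proof: pure ASCII content decodes under both candidates, and in that case the result should be treated as undetermined.

```python
import codecs

# Checked in this order because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def inspect_bytes(data, candidates=("utf-8", "cp932")):
    """Report BOM, newline style, and which encodings decode the data strictly."""
    bom = next((name for mark, name in BOMS if data.startswith(mark)), None)
    # With a BOM, try only the codec it implies; otherwise try each candidate.
    to_try = (bom,) if bom else tuple(candidates)
    decodable = []
    for enc in to_try:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        decodable.append(enc)

    # Count newlines on decoded text when possible. latin-1 is a byte-preserving
    # fallback that is only adequate for ASCII-compatible encodings.
    text = data.decode(decodable[0] if decodable else "latin-1")
    crlf = text.count("\r\n")
    lf = text.count("\n") - crlf
    newline = "mixed" if crlf and lf else "CRLF" if crlf else "LF" if lf else "none"
    return {"bom": bom, "newline": newline, "decodable_as": decodable}
```

In practice this would be called on `Path(some_file).read_bytes()`, so the file is never opened in text mode before its premises are known.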
3.2 Do not let it save a file with suspected mojibake based on guesswork
This one is especially important.
When mojibake is suspected, treat the file as read-only during investigation and prohibit overwriting until the encoding interpretation is credible.
The same applies to humans: never save a file you have not actually been able to read. Saving on “it looks a bit broken, but this is probably it” turns that guess into the confirmed version of the accident.
3.3 Preserve existing files; default to UTF-8 only for new files
In the context of mojibake prevention, “unify everything to UTF-8” is surprisingly dangerous.
Eventually deciding to move the whole repo to UTF-8 is a legitimate call, but it is safer to do that as a separate task while reviewing the diff and the blast radius. For everyday maintenance, this workflow is the stable one.
- When editing an existing file, preserve its original encoding
- When adding a new file, create it as UTF-8 per repo convention
- If an existing file needs conversion, keep that separate from normal functional fixes
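One way to keep an edit from re-encoding an existing file is to work on bytes and decode and encode with the same codec. The sketch below is an assumption-laden example, not a prescribed method: the name `edit_preserving_layout` is hypothetical, and the caller is expected to pass an encoding that has already been checked.

```python
def edit_preserving_layout(raw, old, new, encoding):
    """Replace `old` with `new` without changing encoding, BOM, or newlines."""
    # Strict decode: a wrong encoding guess raises instead of producing mojibake.
    text = raw.decode(encoding)
    if old not in text:
        raise ValueError("target string not found; refusing to write")
    # Encode back with the same codec. The data never passes through text-mode
    # I/O, so CRLF/LF and a leading U+FEFF (BOM) are carried through as-is.
    return text.replace(old, new).encode(encoding)
```

For a UTF-8 file with a BOM, passing `"utf-8"` (rather than `"utf-8-sig"`) keeps the BOM as an ordinary leading character, so it round-trips unchanged. One caveat: CP932 contains a small number of characters with more than one byte representation, so files using those may not round-trip byte-for-byte; checking the diff afterward is still necessary.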
3.4 Do not let it use ambiguous write paths by default
What multiplies accidents on Windows is the habit of thinking, “it’s just a small output, so I’ll write it straight from the shell.”
- Dumping output directly via redirection
- Saving directly with a convenience command
- Promoting a temporary artifact straight into a production file
These paths often have no explicit encoding, which makes them a breeding ground for accidents. So it is safest to also pin down, for Codex, which write mechanisms it may choose.
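As a contrast to redirection, here is a minimal sketch of a write path where encoding, newline style, and BOM are all stated by the caller. The helper names `to_bytes` and `write_new_file` and the UTF-8 defaults are assumptions for illustration; a repository's own convention should take precedence.

```python
import codecs
from pathlib import Path


def to_bytes(text, encoding="utf-8", newline="\n", bom=False):
    """Build the exact bytes to write, with every choice made explicit."""
    # Normalize to LF first, then apply the requested newline style.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    body = normalized.replace("\n", newline).encode(encoding)
    if bom:
        if encoding.lower() not in ("utf-8", "utf8"):
            raise ValueError("this sketch only adds a BOM for UTF-8")
        return codecs.BOM_UTF8 + body
    return body


def write_new_file(path, text, **options):
    """Write a new file as bytes; refuse to overwrite an existing one."""
    target = Path(path)
    if target.exists():
        raise FileExistsError(f"{target} already exists; use an edit path instead")
    target.write_bytes(to_bytes(text, **options))
```

Writing bytes rather than text means no platform default encoding or newline translation can silently apply.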
3.5 After saving, have it re-read and verify representative Japanese lines
“It saved successfully” and “it is not broken” are not the same thing.
What matters is having it read representative Japanese lines again after saving and check points like these.
- No replacement characters (`U+FFFD`) have crept in
- No unnatural increase in `?`
- The diff is not a huge BOM-only or newline-only change
- Japanese text that was not supposed to change is still intact
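The re-read step can be sketched as a small check on the saved bytes. The function name `verify_saved` and its parameters are assumptions for this example; the representative strings are whatever Japanese text the task says must not break.

```python
def verify_saved(raw_after, encoding, representative, question_marks_before=0):
    """Return a list of problems found when re-reading saved bytes."""
    try:
        # Strict decode: the saved bytes must be valid in the expected encoding.
        text = raw_after.decode(encoding)
    except UnicodeDecodeError as exc:
        return [f"cannot decode as {encoding}: {exc}"]

    problems = []
    if "\ufffd" in text:
        problems.append("replacement character U+FFFD present")
    if text.count("?") > question_marks_before:
        problems.append("number of '?' increased")
    for sample in representative:
        if sample not in text:
            problems.append(f"representative string missing: {sample!r}")
    return problems
```

An empty list means none of these particular checks fired, not that the file is guaranteed intact; the diff review still applies.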
3.6 When warning signs appear, have it report before fixing
In encoding accidents, you limit the damage better by having Codex stop and report rather than forcing a fix.
For example, if any of these signs appear, it is safer to treat the situation as abnormal for the moment.
- An increase in `U+FFFD`
- An increase in `?`
- An unexpected BOM change
- A large newline-only diff
- Only the Japanese lines changing unnaturally and substantially
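Several of these signs can be detected mechanically by comparing the bytes before and after an edit. The following is a hedged sketch: `warning_signs` is a hypothetical helper, it only checks for a UTF-8 BOM, and it does not attempt the "only Japanese lines changed" judgment, which still needs a human look at the diff.

```python
import codecs


def newline_style(text):
    """Return (has_crlf, has_bare_lf) for the given text."""
    crlf = text.count("\r\n")
    return (crlf > 0, text.count("\n") - crlf > 0)


def warning_signs(before, after, encoding):
    """Compare bytes before and after an edit and list reasons to stop."""
    # errors="replace" keeps the comparison running even on damaged input.
    old = before.decode(encoding, errors="replace")
    new = after.decode(encoding, errors="replace")
    signs = []
    if new.count("\ufffd") > old.count("\ufffd"):
        signs.append("U+FFFD increased")
    if new.count("?") > old.count("?"):
        signs.append("'?' increased")
    if before.startswith(codecs.BOM_UTF8) != after.startswith(codecs.BOM_UTF8):
        signs.append("UTF-8 BOM added or removed")
    if newline_style(old) != newline_style(new):
        signs.append("newline style changed")
    return signs
```

A non-empty result is a cue to stop and report rather than to attempt a repair.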
4. A Short Instruction to Attach to Tasks
If you want a short version to attach to each task, this much is already quite effective.
In this task, avoiding encoding accidents is the top priority.
- For existing files containing Japanese, check the likely encoding, BOM presence, and newline style before reading
- Do not save files with suspected mojibake based on guesswork
- Preserve the original encoding / BOM / newlines of existing files
- Create new files as UTF-8 per repo convention
- Only use write methods where the encoding can be made explicit
- After saving, re-read the file and confirm that representative Japanese lines are intact
- Report as abnormal any increase in `U+FFFD` or `?`, BOM / newline accidents, or large diffs
If the target files are already known, adding this one line stabilizes things considerably.
Target files: <paths> / Representative strings: "<examples>"
Providing representative strings is remarkably effective. It gives Codex a concrete watch point: “this Japanese text must not break.”
5. A Template Worth Keeping in AGENTS.md
Rather than repeating the same warnings over and over, put them in AGENTS.md. Below is a practice-oriented template for repos that handle Japanese files on Windows.
# Text Encoding Rules
## Scope
This repository may contain Japanese text and mixed legacy encodings.
Avoid mojibake and accidental re-encoding above all else.
## Mandatory Rules
- Before reading or editing an existing text file that may contain Japanese, first determine:
  - likely encoding
  - BOM presence
  - newline style
- If mojibake is suspected, do not save the file until the encoding interpretation is credible.
- Preserve the original encoding, BOM, and newline style for existing files.
- Treat "convert to UTF-8" as a separate, explicit task.
- New files should follow repository convention. If there is no clear rule, prefer UTF-8 and state whether BOM is used.
- Do not use ambiguous write paths by default, such as shell redirection or convenience commands without explicit encoding control.
- After writing, reopen the file and verify representative Japanese lines.
- If any of the following appears, stop and report:
  - replacement characters
  - unexpected `?`
  - unintended BOM change
  - unintended newline conversion
  - whole-file diffs without a business reason
## Reporting Format
For each changed text file, report:
- path
- detected or preserved encoding
- BOM presence
- newline style
- how verification was performed
- whether representative Japanese text remained intact
The strength of this template is that it pins down not just how to edit but how not to break things. In particular, these two lines pull a lot of weight:
- If mojibake is suspected, do not save ...
- Treat "convert to UTF-8" as a separate, explicit task.
6. Bad Instructions vs. Good Instructions
In mojibake prevention, the granularity of your instructions strongly shapes the outcome.
| Bad instruction | Good instruction |
|---|---|
| Fix the mojibake | First determine whether the file itself is corrupted or it is only a display-side issue, and do not save based on guesswork |
| Convert everything to UTF-8 | Preserve the original encoding of existing files; only create new files as UTF-8 per repo convention. Make converting existing files a separate task |
| Output a CSV | Match the encoding used in existing operations, make the encoding explicit when writing, and re-read the Japanese columns after output to verify |
| Fix whatever you can read | Do not save anything you are unsure about; report candidates and your reasoning instead |
| Just make it match | Do not change the BOM, newlines, or encoding on your own; make sure the diff contains only the business change |
The point is to always write in the checks before touching anything and the verification after saving.
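For the CSV row in particular, "make the encoding explicit and re-read the Japanese columns" could look like the sketch below. CP932 is assumed here only as an example of "the encoding used in existing operations", and the function names are hypothetical.

```python
import csv
import io


def csv_bytes(rows, encoding="cp932"):
    """Serialize rows to CSV bytes with an explicit encoding and CRLF line ends."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\r\n").writerows(rows)
    # Strict encode: raises if a character does not exist in the target encoding.
    return buffer.getvalue().encode(encoding)


def read_back(data, encoding="cp932"):
    """Decode and parse CSV bytes so the Japanese columns can be compared."""
    return list(csv.reader(io.StringIO(data.decode(encoding), newline="")))


rows = [["名前", "備考"], ["小村", "改行,カンマ"]]
data = csv_bytes(rows)
print(read_back(data) == rows)
```

Comparing the parsed rows, rather than eyeballing the output, also catches broken quoting and shifted columns.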
7. A Checklist for Review Time
After Codex has done the work, pinning down the checkpoints on the human side as well makes things even more stable.
- Is the encoding / BOM / newline handling reported for each changed file?
- Have only the Japanese lines changed unnaturally and substantially?
- Are there large newline-only diffs?
- Has `U+FFFD` or `?` increased?
- Are there whole-file diffs unrelated to the business change?
- Have columns or quoting broken in CSVs or logs?
What matters in mojibake prevention is stopping suspicious diffs early, more than accumulating successful diffs.
8. Summary
When you have Codex handle Japanese files on Windows, the first thing that helps is not perfecting your machine’s setup — it is explicitly giving Codex the encoding work procedure.
The five points worth remembering:
- Have it check encoding / BOM / newlines before reading
- If mojibake is suspected, do not let it save on guesswork
- Preserve existing files; steer only new files toward UTF-8
- Ban ambiguous write paths
- Have it re-read after saving and verify representative Japanese lines
And if you find yourself saying it every time, put it in AGENTS.md. That is the most practical move.
The core of mojibake prevention is not asking it to “handle Japanese properly” — it is writing down the conditions under which saving is allowed and the conditions under which it must stop. Once you have that in writing, Codex becomes far easier to work with even on Windows.
9. References
- OpenAI Codex docs, Best practices
- OpenAI Codex docs, Custom instructions with AGENTS.md
- OpenAI Codex docs, Windows
Author: Go Komura, Representative of KomuraSoft LLC. Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.