Use the -c option to exclude invalid characters that iconv cannot convert

Tadashi Shigeoka ·  Fri, July 15, 2016

On Linux, I tried to convert a CSV file exported from a database from UTF-8 to Shift JIS. The file apparently contained characters that cannot be represented in Shift JIS, so iconv failed with an "illegal input sequence at position" error and stopped partway through.

$ iconv -f utf-8 -t sjis -o output-sjis.csv input.csv
iconv: illegal input sequence at position 652782
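
To see which character triggered the error, the reported position can be used to look into the file. The following is a minimal sketch, assuming the position is a zero-based byte offset from the start of the input (that is my understanding of glibc's iconv, not something the error message itself states). The file sample.csv and the value pos=3 are hypothetical stand-ins for input.csv and 652782.

```shell
# Hypothetical sample: "abc", then an emoji (U+1F600, UTF-8 bytes f0 9f 98 80), then "def".
printf 'abc\360\237\230\200def\n' > sample.csv

# Position as iconv would report it for this sample (assumed zero-based byte offset).
pos=3

# Text leading up to the offset, to see where in the file the problem sits.
context=$(head -c "$pos" sample.csv | tail -c 40)
printf 'context before: %s\n' "$context"

# Hex dump of the 4 bytes starting at the offset (tail -c +N is one-based, hence pos + 1).
bytes=$(tail -c +"$((pos + 1))" sample.csv | head -c 4 | od -An -tx1)
printf 'bytes at offset:%s\n' "$bytes"
```

For the real file, replacing sample.csv with input.csv and pos with the number from the error message should show the bytes that Shift JIS could not represent.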

With the -c option, iconv no longer stops partway: it omits the characters it cannot convert from the output and processes the file to the end.

$ iconv -c -f utf-8 -t sjis -o output-sjis.csv input.csv
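
Because -c drops characters silently, it can be worth checking what was lost. One way, sketched below with a hypothetical sample.csv, is to convert the result back to UTF-8 and diff it against the original. Output redirection is used instead of -o only to keep the sketch portable, and the "|| true" is there on the assumption that some iconv builds still exit with a non-zero status when -c has omitted something.

```shell
# Hypothetical sample data: line 2 contains an emoji (U+1F600) that Shift JIS cannot represent.
printf 'id,name\n1,abc\360\237\230\200def\n2,plain\n' > sample.csv

# Convert with -c; the unconvertible character is omitted from the output.
iconv -c -f utf-8 -t sjis sample.csv > sample-sjis.csv || true

# Convert back to UTF-8 so the result can be compared with the original.
iconv -f sjis -t utf-8 sample-sjis.csv > roundtrip.csv

# diff exits non-zero when the files differ, so keep going and just show the affected lines.
changed=$(diff sample.csv roundtrip.csv || true)
printf '%s\n' "$changed"
```

Lines that appear in the diff are the ones where -c removed something, which makes it easier to judge whether the loss matters for the data at hand.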

Sure enough, the option is documented in the help output.

$ iconv --help
Usage: iconv [OPTION...] [FILE...]
Convert encoding of given files from one encoding to another.

 Input/Output format specification:
  -f, --from-code=NAME       encoding of original text
  -t, --to-code=NAME         encoding for output

 Information:
  -l, --list                 list all known coded character sets

 Output control:
  -c                         omit invalid characters from output
  -o, --output=FILE          output file
  -s, --silent               suppress warnings
      --verbose              print progress information

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

For bug reporting instructions, please see:
.

I’d like to be done with converting CSV files to Shift JIS just so that they can be opened in Excel.

That’s all from the Gemba.