Wednesday 18 February 2009

Exporting to CSV, Opening In Excel, Special Character Problem

When creating a csv file from VB.net, then opening this file in Excel using the file association method (ie process.start(file.csv)) special characters such as Nestlé are not recognised by MS Excel. Instead it is shown as Nestlé 
When opening the same CSV file in notepad the foreign characters are shown correctly.

However, if the Import feature is used in Excel, and the encoding is specified as UTF-8 then the data is displayed correctly.

By default, the vb.net streamwriter will use UTF-8 encoding. 
In UTF-8 the two-byte sequence 0xC3 0xA9 = é
In Windows-1252 0xC3 0xA9 = Ã©

To solve this problem I could write the data as an Excel file using COM. However, this ties the application to a particular version of Office and could break if the company upgrades from Office 2003 to 2007.
Instead I have changed the encoding within vb.net.
stwriter = New StreamWriter(File.Create(FilePath), System.Text.Encoding.UTF32)
Using UTF32 did cause Excel to handle foreign characters correctly, although at the cost of automatically splitting the data into columns. 
The TextToColumns feature I added into a macro that automatically selected the first column, split the data into columns and autofit the columns to the contents.

No comments: