User Tools

Site Tools


strings_codepages

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Next revision
Previous revision
strings_codepages [2018/11/09 07:44] – created wolfgangriedmannstrings_codepages [2018/11/09 07:53] (current) wolfgangriedmann
Line 78: Line 78:
 <code visualfoxpro>SetAppLocaleId(MAKELCID(MAKELANGID(LANG_NORWEGIAN,SUBLANG_NORWEGIAN_BOKMAL),SORT_DEFAULT))</code> <code visualfoxpro>SetAppLocaleId(MAKELCID(MAKELANGID(LANG_NORWEGIAN,SUBLANG_NORWEGIAN_BOKMAL),SORT_DEFAULT))</code>
  
-// To use language independent sorting+To use language independent sorting
 <code visualfoxpro>SetAppLocaleId(MAKELCID(MAKELANGID(LANG_NEUTRAL,SUBLANG_NEUTRAL),SORT_DEFAULT))</code> <code visualfoxpro>SetAppLocaleId(MAKELCID(MAKELANGID(LANG_NEUTRAL,SUBLANG_NEUTRAL),SORT_DEFAULT))</code>
  
Line 96: Line 96:
  
 With the move to X# things have become easier and more complicated at the same time. With the move to X# things have become easier and more complicated at the same time.
- 
 === Introduction to X# === === Introduction to X# ===
- 
 To start: X# is a .Net development language as you all know and .Net is a Unicode environment. Each character in a Unicode environment is represented by one (or sometimes more) 16 bit numbers. The Unicode character set allows to encode all known characters, so your program can display any combination of West European, East European, Asian, Arabic etc. characters. At least when you choose the right font. Some fonts only support a subset of the Unicode characters, for example only Latin characters and other fonts only Chinese, Korean or Japanese. To start: X# is a .Net development language as you all know and .Net is a Unicode environment. Each character in a Unicode environment is represented by one (or sometimes more) 16 bit numbers. The Unicode character set allows to encode all known characters, so your program can display any combination of West European, East European, Asian, Arabic etc. characters. At least when you choose the right font. Some fonts only support a subset of the Unicode characters, for example only Latin characters and other fonts only Chinese, Korean or Japanese.
 The Unicode character set also has characters that represent the line draw characters that we know from the DOS world, and also various symbols and emoticons have a place in the Unicode character set. The Unicode character set also has characters that represent the line draw characters that we know from the DOS world, and also various symbols and emoticons have a place in the Unicode character set.
Line 108: Line 106:
 The challenge comes when we want to read or write data created in applications that are or were not Unicode, such as Ansi or OEM DBF files, Ansi or OEM text files. Another challenge is code that talks to the Ansi Windows API, such as the VO GUI Classes, and the VO SQL Classes. The challenge comes when we want to read or write data created in applications that are or were not Unicode, such as Ansi or OEM DBF files, Ansi or OEM text files. Another challenge is code that talks to the Ansi Windows API, such as the VO GUI Classes, and the VO SQL Classes.
 === DBFs and Unicode === === DBFs and Unicode ===
- 
 When you open a DBF inside X#, then the RDD system will inspect the header of the file to detect with which codepage the file was created. This codepage is stored in a single byte in the header (using yet another list of numbers). We detect this codepage and map it to the right windows codepage and use an instance of the Encoding class to convert the bytes in the DBF to the appropriate Unicode characters. When we write to a dbf then the RDD system will convert the Unicode characters to the appropriate bytes and write these bytes to the DBF. This works transparently and may only cause problems when you try to write characters that do not exist in the codepage for which the DBF was created. In these situations the conversion either writes an accented character as the same character without accent, or, when no proper translation can be done then a question mark byte will be written to the DBF. The nice thing about the way things are handled now is that you can read and write to dbf files created for different ansi codepages at the same time. However that may cause a problem with regard to indexes. When you open a DBF inside X#, then the RDD system will inspect the header of the file to detect with which codepage the file was created. This codepage is stored in a single byte in the header (using yet another list of numbers). We detect this codepage and map it to the right windows codepage and use an instance of the Encoding class to convert the bytes in the DBF to the appropriate Unicode characters. When we write to a dbf then the RDD system will convert the Unicode characters to the appropriate bytes and write these bytes to the DBF. This works transparently and may only cause problems when you try to write characters that do not exist in the codepage for which the DBF was created. In these situations the conversion either writes an accented character as the same character without accent, or, when no proper translation can be done then a question mark byte will be written to the DBF. The nice thing about the way things are handled now is that you can read and write to dbf files created for different ansi codepages at the same time. However that may cause a problem with regard to indexes.
  
Line 122: Line 119:
 //Important to realize is that the Ansi codepage from a DBF files can and sometimes will be different from the codepage from Windows. So it is possible to read a Greek DBF file and translate the characters to the correct Unicode strings. But if you want to display the contents of this file with the traditional Standard MDI application on a Western  European windows, then you will still see ‘garbage’ because many of these Greek characters are not available in the Western European codepage.// //Important to realize is that the Ansi codepage from a DBF files can and sometimes will be different from the codepage from Windows. So it is possible to read a Greek DBF file and translate the characters to the correct Unicode strings. But if you want to display the contents of this file with the traditional Standard MDI application on a Western  European windows, then you will still see ‘garbage’ because many of these Greek characters are not available in the Western European codepage.//
 //We recognize that the Ansi UI layer and Ansi SQL layer could be a problem for many people and we have already worked on GUI classes and SQL classes that are fully Unicode aware. As soon as the X# Runtime is stable enough we will add these classes libraries as a bonus option to the FOX subscribers version of X#. These class libraries have the same class names, property names and method names as the VO GUI and SQL Classes. Existing code (even painted windows) should work with little or no changes. We have also emulated the VO specific event model where events are handled at the form level with events such as EditChange, ListBoxSelect etc.// //We recognize that the Ansi UI layer and Ansi SQL layer could be a problem for many people and we have already worked on GUI classes and SQL classes that are fully Unicode aware. As soon as the X# Runtime is stable enough we will add these classes libraries as a bonus option to the FOX subscribers version of X#. These class libraries have the same class names, property names and method names as the VO GUI and SQL Classes. Existing code (even painted windows) should work with little or no changes. We have also emulated the VO specific event model where events are handled at the form level with events such as EditChange, ListBoxSelect etc.//
 +
 //The GUI classes in this special library are created on top of Windows Forms and the SQL classes are created on top of Ado.NET. So these classes are not only fully Unicode but they are also AnyCPU.// //The GUI classes in this special library are created on top of Windows Forms and the SQL classes are created on top of Ado.NET. So these classes are not only fully Unicode but they are also AnyCPU.//
 +
 //Both of these class libraries will be “open”: For the GUI library will use a factory model where there is a factory that creates the underlying controls, panels and forms used by the GUI library. The base factory creates standard System.Windows.Forms objects. However you can also inherit from this factory and use 3rd party controls, such as the controls from Infragistics or DevExpress. All you have to do is to tell the GUI library that you want to use another factory class and of course implement the changes in the factory. And you can mix things if you want. For example replace the pushbuttons and textboxes and leave the rest standard System.Windows.Forms.// //Both of these class libraries will be “open”: For the GUI library will use a factory model where there is a factory that creates the underlying controls, panels and forms used by the GUI library. The base factory creates standard System.Windows.Forms objects. However you can also inherit from this factory and use 3rd party controls, such as the controls from Infragistics or DevExpress. All you have to do is to tell the GUI library that you want to use another factory class and of course implement the changes in the factory. And you can mix things if you want. For example replace the pushbuttons and textboxes and leave the rest standard System.Windows.Forms.//
 +
 //You may wonder why we have chosen to build this on Windows.Forms and not WPF even when Microsoft has told us that Windows.Forms is “dead”.  The biggest reason is that it is much easier to create a compatible UI layer with Windows Forms. And fortunately Microsoft recently has announced that they will come with Windows.Forms for .Net Core 3. The first demos show that this works very well and is much faster. So the future for Windows.Forms looks promising again !// //You may wonder why we have chosen to build this on Windows.Forms and not WPF even when Microsoft has told us that Windows.Forms is “dead”.  The biggest reason is that it is much easier to create a compatible UI layer with Windows Forms. And fortunately Microsoft recently has announced that they will come with Windows.Forms for .Net Core 3. The first demos show that this works very well and is much faster. So the future for Windows.Forms looks promising again !//
  
 //In a similar way we have created a Factory for the SQL Classes. We will deliver a version with 3 different SQL factories, based on the three standard Ado.Net dataproviders: ODBC, OLEDB and SqlClient.// //In a similar way we have created a Factory for the SQL Classes. We will deliver a version with 3 different SQL factories, based on the three standard Ado.Net dataproviders: ODBC, OLEDB and SqlClient.//
 +
 //You should be able to subclass the factory and use any other Ado.Net dataprovider, such as a provider for Oracle or MySql. Maybe we will find a sponsor and will include these factories as well. The SQL factories will also allow you to inspect and modify readers, commands, connections etc. before and after they are opened.//  //You should be able to subclass the factory and use any other Ado.Net dataprovider, such as a provider for Oracle or MySql. Maybe we will find a sponsor and will include these factories as well. The SQL factories will also allow you to inspect and modify readers, commands, connections etc. before and after they are opened.// 
 +
 === Sorting and comparing strings in your code,  the compiler option /vo13 and SetExact() === === Sorting and comparing strings in your code,  the compiler option /vo13 and SetExact() ===
 Creating indexes and sorting files is not the only place in your code where string sorting algorithms are used. They are also used in your code when you compare two strings to determine which us greater (string1 <= string2) , when you compare 2 strings for equality ( string1 = string2, string1 == string2) and in code such as ASort(). One factor that makes this extra complicated is that quite often the compiler cannot recognize that you are comparing strings because the strings are "hidden" inside a usual. The compiler has no idea for code like usual1 <= usual2 what the values can be at runtime. You could be comparing 2 dates, 2 integers, 2 strings, 2 symbols, 2 floats or even a mixture (like an integer with a float). In these cases the compiler will simply produce code that compares 2 usuals (by calling an operator method on the usual type) and “hopes” that you are not comparing apples and pears. When you do, you will receive a runtime error. Creating indexes and sorting files is not the only place in your code where string sorting algorithms are used. They are also used in your code when you compare two strings to determine which us greater (string1 <= string2) , when you compare 2 strings for equality ( string1 = string2, string1 == string2) and in code such as ASort(). One factor that makes this extra complicated is that quite often the compiler cannot recognize that you are comparing strings because the strings are "hidden" inside a usual. The compiler has no idea for code like usual1 <= usual2 what the values can be at runtime. You could be comparing 2 dates, 2 integers, 2 strings, 2 symbols, 2 floats or even a mixture (like an integer with a float). In these cases the compiler will simply produce code that compares 2 usuals (by calling an operator method on the usual type) and “hopes” that you are not comparing apples and pears. When you do, you will receive a runtime error.
  
-But lets start simple and assume you are comparing 2 strings. To achieve the same results in X# as in your Ansi VO App we have added a compiler option /vo13 which tells the compiler to use “compatible string comparisons”. When you use this compiler option then the compiler will call a function in the X# runtime (called __StringCompare) to do the actual string comparison. This function will use the same comparison algorithm that is used for DBF files. So it follows the SetCollation() setting and will convert the strings to Ansi or Unicode and use the same routines that VO does. This should produce exactly the same results as your VO app. Of course it will not know how to compare characters that are not available in the Ansi codepage. So if you have 2 strings with for example difference Chinese characters and use compatible comparisons to compare them, then it is very well possible that they are seen as “the same” because the differences are lost when converting the Unicode characters to Ansi. They might both be translated to a string of one or more questionmarks.+But lets start simple and assume you are comparing 2 strings. To achieve the same results in X# as in your Ansi VO App we have added a compiler option /vo13 which tells the compiler to use “compatible string comparisons”. When you use this compiler option then the compiler will call a function in the X# runtime (called %%__%%StringCompare) to do the actual string comparison. This function will use the same comparison algorithm that is used for DBF files. So it follows the SetCollation() setting and will convert the strings to Ansi or Unicode and use the same routines that VO does. This should produce exactly the same results as your VO app. Of course it will not know how to compare characters that are not available in the Ansi codepage. So if you have 2 strings with for example difference Chinese characters and use compatible comparisons to compare them, then it is very well possible that they are seen as “the same” because the differences are lost when converting the Unicode characters to Ansi. They might both be translated to a string of one or more questionmarks.
  
 If you compile your code without /vo13 then the compiler will not use the X# runtime function to compare strings but will insert a call to System.String.Compare() to compare the strings. Important to realize is that this function does NOT use the SetExact() logic. So where with SetExact(FALSE) the comparison “Aaaa” = “A” would return TRUE with compatible string comparisons, it will return FALSE when the noncompatible comparisons are used. If you compile your code without /vo13 then the compiler will not use the X# runtime function to compare strings but will insert a call to System.String.Compare() to compare the strings. Important to realize is that this function does NOT use the SetExact() logic. So where with SetExact(FALSE) the comparison “Aaaa” = “A” would return TRUE with compatible string comparisons, it will return FALSE when the noncompatible comparisons are used.
Line 139: Line 141:
 If you compare a String with a USUAL or 2 USUALS then the compiler cannot know how to compare the strings . In that case the code produced by the compiler will convert the string to a usual and call the comparison routine for USUALs. If you compare a String with a USUAL or 2 USUALS then the compiler cannot know how to compare the strings . In that case the code produced by the compiler will convert the string to a usual and call the comparison routine for USUALs.
 Unfortunately, when we wrote the runtime we did not know what your compiler option of choice would be, so the runtime does not know what to do. To solve this problem we use the following solution: Unfortunately, when we wrote the runtime we did not know what your compiler option of choice would be, so the runtime does not know what to do. To solve this problem we use the following solution:
-In the startup code of your application the compiler adds a (compiler generated) line of code that saves the setting of the /vo13 option in the so called “runtimestate”. The code inside the runtime that compares 2 usuals will check this setting and will either call the "plain" String.Compare() or the compatible __StringCompare().+In the startup code of your application the compiler adds a (compiler generated) line of code that saves the setting of the /vo13 option in the so called “runtimestate”. The code inside the runtime that compares 2 usuals will check this setting and will either call the "plain" String.Compare() or the compatible %%__%%StringCompare().
 Important to realize is that when your app consists of more than one assembly it becomes important to use the setting of /vo13 consistently over all of your assemblies and main app. Mixing this option can easily lead to problems that are difficult to find. Important to realize is that when your app consists of more than one assembly it becomes important to use the setting of /vo13 consistently over all of your assemblies and main app. Mixing this option can easily lead to problems that are difficult to find.
 It can become even more complicated if you are using 3rd party components especially if you don’t have the source. That is why we recommend that 3rd party developers always compile with /vo13 ON. You may think that that would produce problems if the rest of the components is compiled with /vo13 off. But rest assured, this is a scenario that we have foreseen: inside the __StringCompare() code that is called when /vo13 is enabled there is a check for the same vo13 option in the runtimestate. So if the function detects that the main app is running with /vo13 off, then the function will call the same String.Compare() code that the compiler would have used otherwise. It can become even more complicated if you are using 3rd party components especially if you don’t have the source. That is why we recommend that 3rd party developers always compile with /vo13 ON. You may think that that would produce problems if the rest of the components is compiled with /vo13 off. But rest assured, this is a scenario that we have foreseen: inside the __StringCompare() code that is called when /vo13 is enabled there is a check for the same vo13 option in the runtimestate. So if the function detects that the main app is running with /vo13 off, then the function will call the same String.Compare() code that the compiler would have used otherwise.
Line 165: Line 167:
 Unfortunately there is no universal best solution. You will have to evaluate your application and decide yourself what is good for you. Of course we are available on our forums to help you make that decision. Unfortunately there is no universal best solution. You will have to evaluate your application and decide yourself what is good for you. Of course we are available on our forums to help you make that decision.
  
-Original article URLs: +Original article URLs:  
-[[https://www.xsharp.info/articles/blog/x-strings-and-codepages-part-1|www.xsharp.info/articles/blog/x-strings-and-codepages-part-1]]+[[https://www.xsharp.info/articles/blog/x-strings-and-codepages-part-1|www.xsharp.info/articles/blog/x-strings-and-codepages-part-1]] 
 [[https://www.xsharp.info/articles/blog/x-strings-and-codepages-part-3|www.xsharp.info/articles/blog/x-strings-and-codepages-part-3]] [[https://www.xsharp.info/articles/blog/x-strings-and-codepages-part-3|www.xsharp.info/articles/blog/x-strings-and-codepages-part-3]]
strings_codepages.txt · Last modified: 2018/11/09 07:53 by wolfgangriedmann