ASE Quiz Question Randomizer
 
Randomly selected quiz question (want a different question? Click here):


March 2010
I never know in advance what my next quiz question will be about, but this week the topic came to me easily. I wasted an evening trying to solve a wacky problem while getting some SQL to run on a system (a separate box) that was being prepared for demo purposes.
This turned out to be one of those cases where it's proven true that there's always something left to learn (as I mentioned last month).

The problem first surfaced when using sp_autoformat to format the contents of a table (if you don't know sp_autoformat, you should get acquainted -- try running 'sp_autoformat sysusers' as an example of what it does)
sp_autoformat is supposed to format each column as narrow as possible, given the actual data values it contains, trimming off trailing spaces from [var]char columns, among other things.
My problem was that sp_autoformat was formatting all [var]char columns with a bunch of spaces at the end, which is exactly what should NOT happen. Worse, it really messed up the formatting of some output that was supposed to fit on an 80-character window.

After checking my own SQL code was not at fault, I zoomed in on what sp_autoformat was doing (try adding the @trace=4 parameter). It seemed sp_autoformat was doing everything right, yet every column was still formatted longer than the longest value it contained. Weird.

Eventually I could simplify the problem and reproduce it without sp_autoformat, as follows:
1> select substring('abc', 1, 3)
2> go

 ------
 abc

(1 row affected)
Can you spot the problem? I'm doing a substring() whose result should be 3 characters long, but the resulting column is in fact 6 characters.
This seemed to defy all logic. What on earth is going on here?

Answer:
I was really stuck with this issue, and I didn't see a way out. I had already tried whether it was specific for a particular ASE version. The problem seemed to occur in any ASE 15 version, at least 15.0.3 and 15.5. However, when I ran the same SQL on an ASE 15.x server on my own system (i.e. not the demo box), everything worked fine, and the result was 3 characters long, as expected.
I tweaked both ASE 15 servers until they were fully identical in configuration settings, and still I was stuck: on the demo system, substring('abc',1,3) still generated an incorrect 6-character result, whereas on my own system it produced a 3-character result. At this point there was *no* difference between those two ASE 15 servers, other than that they were running on different hosts.

When I tried running my SQL on a 12.5 server on the demo box, the problem did not occur and the result was 3 characters long as it should be. So was this an ASE 15-specific problem?
I searched the ASE documentation for clues, but to no avail. Then, at the point when hard liquor looked like the only direction where solutions might be found, I realised that there had been some kind of message when I connected to the 12.5 server on the demo box for testing my reproduction:
% isql -Usa -P -S ASE1254 
Msg 2401, Level 11, State 2:
Server 'ASE1254':
Character set conversion is not available between client character set 'utf8'
and server character set 'iso_1'.
No conversions will be done.
1> 
I had ignored this message, but suddenly it seemed a good idea to pay attention. What was this saying?
Apparently the client on the demo box was using utf8 as its character set, and the server was using the default of iso_1. Hmmmm.... but so what? I did not get this message when connecting to the ASE 15 servers.
Curious, I tried running the same repro on a 12.5 server on my own system. This also worked correctly (3-character result), but there was no such message after connecting.

At this point it started to dawn on me that the issue might not be on the side of the server, but somehow involve the client. Something with character sets...
I recalled my little purple booklet has a section on character set stuff, although I don't think I've ever used that section for any practical purpose. The section ('Character set & sort order') didn't provide any clues, but the last line referred to some global variables towards the back of the quickref guide. There, I found the global variable @@client_csname, which supposedly indicates the character set used by the client for the current session.
When I selected this variable on the demo box (where the problem occurs), it appeared to be utf8. On my own system, where the problem did not occur, the client was using iso_1. So when the client uses utf8, somehow the [var]char result set columns are getting longer? That still sounded odd, but at least that's something we can experment with.

So, on the demo box, I ran the following. Using -Jiso_1 forces isql to use iso_1 as its client character set:
% isql -Usa -P -S ASE155     # uses utf8
1> select substring('abc', 1, 3)
2> go

 ------
 abc

(1 row affected)
1> exit
%
%
% isql -Usa -P -S ASE155 -Jiso_1    # uses iso_1
1> select substring('abc', 1, 3)
2> go

 ---
 abc

(1 row affected)
1> exit
%
We seemed to be getting somewhere now... At this point, I could draw at least two conclusions: (i) the issue was caused by the client using utf8 and (ii) the difference between my own system and the demo box was that my own system has environment settings such that the client uses iso_1 by default, whereas on the demo box, the client uses utf8 by default (as it turned out, this was because the environment variable LANG was unset on my own system but set to 'en_US.UTF-8' on the demo box).

Some questions still remained though, such as: why does the problem not occur on the ASE 12.5 servers? And why is there no warning about character set conversion in the ASE 15 servers? And most of all: why does a client character set of utf8 produce a 6-character string, and with iso_1, a 3-character string?

After reading through the documentation on ASE character set matters, things started to become more clear. Obviously, utf8 is a Unicode character set, which can represent many more symbols than, say, ASCII or iso_1. This means that Unicode characters can occupy 2 bytes, whereas iso_1 characters occupy only 1 byte. ASE -- or rather, OpenClient-- will automatically perform character set conversion where possible, and when converting from a single-byte character set (iso_1) to a multi-byte character set (utf8), a single character in the single-byte character set to may be represented by more than one byte in a multi-byte character set. This is actually indicated by the global variable @@client_csexpansion which indicates the factor of this expansion. Indeed, it turns out that this variable contains the value 2 when the client character set is utf8, and 1 when it is iso_1.
So there we are: with a client using utf8 and the server using iso_1, ASE will expand each iso_1 character to a 2-byte utf8 character. In isql, this means that the resulting column is shown as 6 positions, even though it is technically still 3 characters of 2 bytes each. When the client uses iso_1, the result set column consists of 3 1-byte characters, shown as 3 positions by isql.
To return to my original SQL formatting problem, by forcing the client to use iso_1 (with isql -Jiso_1 or by setting LANG to blanks), or suppressing character set conversion altogether (with isql -J), all my formatting worked fine again.

At this point, I thought a single-malt scotch was indeed well-deserved. By now I had spent hours on this problem.

While being really happy with myself, I realised there might have been an even simpler reproduction. And indeed: simply running "select 'a' " would have given a 2-character result column for utf8, but only a 1-character column for iso_1 (you can test this yourself by simply forcing utf8 or iso_1 with isql -Jutf8 or isql -Jiso_1, respectively -- this is assuming that your ASE server uses iso_1; but by now I'm sure you get the idea).

As for the two remaining questions: I'm assuming the OpenClient version that comes with ASE 15 has unicode support provided by default, whereas in 12.5 it does not. This must be the reason why the warning about character set conversion occurred only on 12.5. And this is also the reason why the 12.5 servers produce the correct 3-character result: character set conversion simply didn't happen for utf8, and this was what the warning message from isql was trying to alert me to.
(I guess there must be some kind of way to make character set conversion work for utf8 in 12.5 too -- but as you can tell from all of the above, my awareness of character set aspects in ASE is pretty limited, although recently I've been learning fast).

When you think about it, there's a lot of stuff happening behind the scenes when it comes to character sets. It's one of those things you usually don't notice -- until you work on a different system which may have different environment settings. Keep this in mind when writing SQL code where the formatting of results matters -- and test with different character sets.

How would you rate this ASE quiz question?
Great    OK    Average    Boring     



More ASE quiz questions: 2013 - 2012 - 2011 - 2010 - 2009 - 2008 - 2007 - 2006 - 2005 - 2004 - 2003 - 2002 - 2001
For a randomly selected quiz question, click here.


 
 This document is located at www.sypron.nl/quizrand.html