Sybase ASE
All versions
Last updated: 02 December 2007
ASE Quiz Questions: answers 2007

December 2007
Read the following, then answer 'true' or 'false' without thinking too long about it:

"Since truncate table is a minimally logged operation, it should not be used in a production database: its minimally logged nature may make it impossible to recover the database from transaction log dumps"

The statement above is absolutely false, for reasons I'll explain below. This issue keeps coming back surprisingly often, even with experienced DBAs, so let me try to stamp it out once more.

ASE has a number of so-called 'minimally logged' operations, like select into and fast BCP (some others are: index creation in parallel mode; alter table...lock; alter table to add/modify/drop a column; unlogged writetext; reorg rebuild for a table).
These operations use optimizations that cause them not to generate a transaction log record for each affected row, but to log only each (de)allocated page. Due to these optimizations, minimally logged operations tend to be faster than their regular, fully logged, counterparts (like insert...select and 'slow' BCP).
In order to be able to fully recover a database in case of an unexpected crash or shutdown, ASE relies on the fact that all modifications in a database are always written to the transaction log first. For minimally logged operations like select into, this is however not true since the individual row inserts are not written to the log. Consequently, when a minimally logged operation has been performed, ASE will be unable to recover the database if a calamity comes along.
To stop you from accidentally compromising your database integrity, minimally logged operations are disabled by default. To run such an operation, you must first enable the database option 'select into/bulkcopy/pllsort' with sp_dboption for the database where you want to run the minimally logged operation. Without this option being set, an attempt to run a minimally logged operation will typically result in an error.
Once a minimally logged operation has taken place, ASE no longer allows you to dump the transaction log. This is done to avoid creating a false sense of security that the dumped log could be used to recover the database later (after all, that's what we dump transaction logs for: to load 'm later): because the dumped log would not contain all the information needed to fully reconstruct the database contents, it's better not to allow it to be dumped at all. In order to guarantee recoverability, a full database dump should be made first.
For this reason, minimally logged operations should typically not be used in production databases (excluding, of course, tempdb or other databases whose recoverability may not matter).

So far, nothing new, right?

Now then, truncate table is also a minimally logged operation, since it does not log every deleted row. Instead, it just deallocates all data pages and index pages, and logs those deallocations, which is why it's a much faster way to clean out a table than running 'delete' (which is fully logged). It is essential to understand that simply deallocating these pages does not impact recoverability of the database, since all required information is included in the log: when loading a log dump which contains the log records for a truncate table, those deallocations are simply applied, and the table is truncated again. Likewise, if a truncate table operation is aborted, the deallocations are simply rolled back: the deallocated pages are still there (they cannot be reused until the truncate table command has completed), so they get marked again as allocated. In both cases, database integrity is still guaranteed.
For this reason, truncate table is not subject to 'select into/bulkcopy/pllsort' being enabled.

The fundamental difference between select into, fast BCP etc., versus truncate table is that in the former category, information is missing from the transaction log, while for truncate table, all information is included. Therefore, there is no reason to say that truncate table should never be used in a production database (obviously, you gotta be careful which table you truncate, but that's a different matter).

The reason why this misunderstanding keeps popping up is that we've all learned about the advantages and dangers of minimally logged operations, without realising (or having been told) that truncate table is an exception when it comes to the disadvantages.

I hope this clarifies once and for all. Case closed?


October 2007
Let's assume a client application is performing a 10-million-row BCP-in into your biggest, billion-row table every day. As a DBA, you want to monitor the progress of this BCP job, but without logging into the client system and looking at client app log files (or, for the sake of this question, let's assume that no such client-side BCP log files are accessible to you).
How can you keep track of the progress of the BCP-in job, using only ASE-side functionality?

To monitor the BCP job's progress, you need to figure out how many rows it has already BCP'd into the table. Assuming you know roughly how many rows it'll be doing (remember, we said 10 million), this will tell you how far the job has progressed.
There are actually two ways of doing this, and both involve the MDA tables. The first one is to monitor the value of monProcessStatement.RowsAffected for the session doing the BCP-in. When queried, the value of monProcessStatement.RowsAffected shows the number of rows affected by any statement executing at that moment, including BCP-in and BCP-out, multi-statement inserts, deletes and updates as well as select statements. When the number of rows the statement will affect is (roughly) known, this can be used to track the statement's progress.
Another way of tracking the progress of a BCP-in is to monitor monOpenObjects.RowsInserted. This value also shows the progress of a BCP-in operation, but in a different way than monProcessStatement.RowsAffected: monOpenObjects.RowsInserted shows the total number of rows inserted into a particular table since the server was started (assuming the config parameter 'number of open objects' is set high enough). This is a cumulative count for all sessions, so to track the progress of your BCP-in job, you'd need to know roughly how many rows the table contained before the BCP-in started.
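To make the arithmetic behind the monOpenObjects.RowsInserted approach concrete, here's a small sketch (in Python, not ASE code; the function name and all numbers are made up for illustration):

```python
def bcp_progress_pct(rows_inserted_now, rows_inserted_before, expected_rows):
    """Estimate BCP-in progress from the cumulative monOpenObjects.RowsInserted
    counter: subtract the value sampled just before the job started, then
    divide by the (roughly) known job size."""
    return 100.0 * (rows_inserted_now - rows_inserted_before) / expected_rows

# The counter stood at 1,000,000,000 before the job started; it now reads
# 1,004,000,000 and the job is expected to load 10 million rows:
print("%.0f%% done" % bcp_progress_pct(1004000000, 1000000000, 10000000))  # 40% done
```

The point being: without the 'before' sample (or a rowcount of the table taken before the job started), the cumulative counter by itself tells you nothing about this particular BCP run.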

Some further notes:
  • When doing 'fast' (=minimally logged) BCP-in, monOpenObjects.RowsInserted does not count any of the inserted rows -- in contrast with monProcessStatement.RowsAffected, which does.
  • When doing BCP-in with the '-b NNN' option (i.e. commit after every NNN inserted rows), monProcessStatement.RowsAffected will show a number between 0 and NNN. monOpenObjects.RowsInserted simply keeps counting all inserted rows.
  • To identify sessions doing a BCP-in, look for those sessions where sysprocesses.command or monProcess.Command is showing 'BULK INSERT'.
  • You need to run at least ASE 12.5.4 to use monProcessStatement.RowsAffected. In ASE 15, this column was added in 15.0 ESD#2.
  • You might wonder about alternative approaches. For example, doing a select count(*) from the table will also show how the total rowcount changes, provided you run this at isolation level 0. The problem is that this may take a long time to run for large tables. The same disadvantage applies -though to a lesser extent- when using sp_spaceused to get the table's rowcount.
    Using the MDA table approach as described above is far superior since it provides immediate answers.

NB: Thanks to Jeff Tallman for inspiration for this question.


September 2007
Some time ago, someone I met was trying to solve a problem by creating a view against an application table. I'll spare you the details except one: the view had to contain a column reflecting the session's ID number ('spid'). The obvious solution was to include the session-specific global variable @@spid in the view definition as follows:
create view v
as
select spid=@@spid, *
from my_table
Unfortunately, the obvious solution doesn't work here as it is not allowed to include variables in a view:
Msg 7351, Level 15, State 1
Server 'SYBASE', Line 3
Local or global variables not allowed in view definition
Despite this limitation, is it possible to still create a view with the desired functionality?

As you guessed, indeed there is a trick to work around the limitation of not allowing variables in a view definition. The trick is to use the get_appcontext() built-in function instead of @@spid:
create view v
as
select spid=get_appcontext('SYS_SESSION', 'spid'), *
from my_table
Now, when selecting from the view, the spid column will correspond to the value of @@spid of the executing session (try this yourself!).

The get_appcontext() built-in function, along with its cousins set_appcontext(), list_appcontext() and rm_appcontext(), was added in ASE 12.5 as part of the row-level access control feature (which itself is subject to additional licensing, but the *_appcontext() functions are available irrespective of licensing and can be used independently).
These little-known built-in functions provide access to the so-called 'application context', which lets you create and retrieve attributes in a session; these attributes are accessible for that session only.
If that sounds exotic, what matters here is that you can use these functions to retrieve information about some well-known aspects of a session, such as the current database, the current login ID and DB user ID etc. In order to retrieve this information, the predefined application context name 'SYS_SESSION' must be specified, plus the name of the attribute to retrieve. For more information about these functions, see the ASE documentation.
The get_appcontext() built-in is accepted in a view definition and so allows us to create a view that would not otherwise be possible.

NB: when you don't have sa_role, you'll probably get a permission error when calling get_appcontext(). The remedy is to grant select permission to your user, login or group:

grant select on get_appcontext to public


July 2007
With some regularity, I get emails from Sybase users admitting they've been manually changing system tables -- and now their ASE server doesn't run as well as before. Would I perhaps have a suggestion to fix things?
Now, I don't blame anyone for playing with system tables and getting it wrong -- I've done it myself, and it can be most instructive. However, I'm glad I've always done such things on my own test server only -- unlike some of those customers.
For example, one customer asked me to request that Sybase implement a feature making it impossible to manually delete rows from system tables. When I pointed out that the config parameter 'allow updates on system tables' does exactly that, he explained this was not good enough for them since all users logged in as 'sa', having both sa_role and sso_role, and 'allow updates on system tables' was always enabled, and apparently those users sometimes did some bad things. Obviously though, this customer's problem had to be solved on a different level than by adding some ASE security mechanism: there's only so much features can protect against.

Anyway, this month's quiz question looks at two things customers have asked me about:
- First, let's say someone deletes the 'guest' or 'dbo' user from the 'master' database
- Second, let's say all rows in master..syslogins are also deleted
In both cases, what would be the consequences and how would you repair the damage?

To start with the first one: when deleting the 'guest' user from the master database, any logins that have 'master' as their default database but are not an explicit database user in 'master', will no longer be able to log into ASE. In many cases, this would be the bulk of the ASE logins, so that most applications effectively get locked out of ASE. Logins having sa_role are not affected since such a login is automatically 'dbo' in master.
When also deleting the 'dbo' user from the 'master' database, there's surprisingly little impact: logins with sa_role seem to be unaffected. Apparently the automatic 'dbo' status of sa_role takes precedence here (nevertheless, please do not delete the 'dbo' user, just in case...).
Note that the 'guest' user cannot be dropped accidentally, since sp_dropuser refuses to drop 'guest' from the 'master' database (nor from tempdb, BTW), and refuses to drop 'dbo' anywhere. Therefore, getting rid of the 'guest' user in 'master' requires a manual delete against master..sysusers.
Fixing the situation where the 'guest' user has disappeared is easy: as 'sa', just run sp_adduser 'guest'. Should the 'dbo' user have gone as well, just insert this row:
insert sysusers 
select * from model..sysusers where uid=1

So far no irreversible damage was done, but the second problem (someone having deleted all rows in master..syslogins) is actually pretty serious.
But first, how would you get here? I think the most likely scenario would be something like this: the DBA wants to drop a login, but this is refused since there's still an object owned by this login somewhere. Instead of fixing that (try sp_rv_helplogin), the DBA decides to manually delete the row from master..syslogins... but he makes a mistake in the query and accidentally hits all rows... and of course, he didn't use a transaction.
When a login has no corresponding row in master..syslogins, that login simply cannot connect to ASE anymore. This includes the 'sa' login, so the DBA cannot go in and repair the damage. Since there is no traceflag or trick to log in regardless, you're in trouble -- you've really locked yourself out (if you still have an 'sa' connection into ASE at this point, do NOT log out, and quickly figure out how to manually insert a row into syslogins with name='sa' and suid=1 -- sp_addlogin won't work anymore at this point).
Should you not have an already-connected 'sa' session, there's simply no way to get into ASE anymore, and you basically have to go through the restore procedure for a lost 'master' device: shut down the ASE server (with 'kill -15' please! Avoid 'kill -9' when possible), then run 'dataserver' with the -z and -b options to recreate the 'master' device (rename or copy the existing 'master' device first).
If you have a recent master DB dump (you do, don't you?), load it now (start ASE in standalone mode [with 'dataserver -m'], manually patch up master..sysservers, and load the dump). You should be OK again now. See the ASE Troubleshooting and Error Messages Guide for details on this procedure.
If you do not have a recent master DB dump, you have two reasons for banging your head against the wall (so maybe do that first -- the first reason was that syslogins delete, BTW). You should then run disk reinit commands to recreate your sysdevices contents, followed by disk refit to reconstruct sysusages and sysdatabases. However, to run those disk reinit commands, you need to know the physical names, vdevnos and sizes of your database devices, and if there's no master DB dump, I'd guess this information is not available either. It *may* be possible to extract such information from your old master device (you renamed it instead of deleting it, right?), but this could turn out to be a time-consuming job requiring an experienced (and likely, expensive) consultant.
Incidentally, this is why it is a good idea not only to make daily master DB dumps, but also to make ASCII copies (or BCP-out in -c mode) of sysdevices, sysusages and sysdatabases.

Lessons learned: (i) mess with system tables at your peril; (ii) make sure to have database dumps of your 'master' database.


May 2007
This question is about string manipulation. There may not be an immediate pressing problem for everyone to solve with this, but it's worth having seen the trick behind the solution. Anyway...

When you need to determine whether a string contains a particular substring, you can use the charindex() built-in function. However, let's say you need to determine whether a string contains that particular substring a specific number of times, for example, twice or thrice (I've been waiting for an opportunity to use that word...). Doing this with charindex() is possible but gets messy very quickly. Can this be done in a better way?

Yes, there's a better way of solving this problem, and it's based on the str_replace() built-in function. This function lets you replace every occurrence of a string with another string, e.g.:
1> select str_replace('xyzyx', 'y', 'YYY')
2> go

 -----------
 xYYYzYYYx

(1 row affected)
In 12.5.1, str_replace() was improved by allowing NULL to be specified for the replace-with-string, with the effect of cutting all occurrences of the to-replace-string from the original string:
1> select str_replace('xyzyx', 'y', NULL)
2> go

 -----
 xzx

(1 row affected)
The str_replace() function has as many applications as your imagination allows, and this quiz question is just one example that will hopefully inspire your thoughts.
Turning to the question, here's how you can use str_replace() to determine whether a particular substring, here 'ABC', occurs twice:
create table t1 (c varchar(50))
insert t1 values ('row 1: ABCdefghabcdefghABCdefgh')
insert t1 values ('row 2: abcdefghabcdefghABCdefgh')
insert t1 values ('row 3: ABCdefghABCdefghabcdefgh')
insert t1 values ('row 4: ABCdefghABCdefghABCdefgh')
insert t1 values ('row 5: abcdefghABCdefghabcdefgh')

-- now find all rows with exactly 2 occurrences of 'ABC'
select c 
from t1
where len(c) - len(str_replace(c,'ABC', NULL)) = 2*len('ABC')

 row 1: ABCdefghabcdefghABCdefgh
 row 3: ABCdefghABCdefghabcdefgh

(2 rows affected)

Here's how it works: by replacing 'ABC' with NULL, we remove all occurrences of 'ABC'. This means that the original string gets shorter by an exact multiple of the length of 'ABC'; so a length difference of exactly 2*len('ABC') means 'ABC' occurred exactly twice.
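The same length arithmetic is easy to check outside ASE; here's a minimal Python sketch of the trick, with Python's str.replace(sub, '') standing in for str_replace(c, 'ABC', NULL):

```python
def count_occurrences(s, sub):
    """Remove every occurrence of 'sub', then divide the length difference
    by len(sub) -- the same arithmetic as the str_replace() trick."""
    return (len(s) - len(s.replace(sub, ""))) // len(sub)

# The five test rows from table t1 above:
rows = ["row 1: ABCdefghabcdefghABCdefgh",
        "row 2: abcdefghabcdefghABCdefgh",
        "row 3: ABCdefghABCdefghabcdefgh",
        "row 4: ABCdefghABCdefghABCdefgh",
        "row 5: abcdefghABCdefghabcdefgh"]
print([c for c in rows if count_occurrences(c, "ABC") == 2])  # rows 1 and 3
```

Like the SQL query, this finds rows 1 and 3 as having exactly two occurrences of 'ABC'.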

To take this one step further, you can use the same mechanism to figure out how many times a set of strings occurs. In the following example, table 't2' contains a list of strings for each of which we want to determine how often it occurs in the strings in table 't1':
create table t2 (X varchar(10))
insert t2 values ('ABC')
insert t2 values ('abc')
insert t2 values ('efgh')

select c, X, 
   (len(c) - len(str_replace(c, X, NULL))) / len(X) '#times X occurs in c'
from t1, t2
order by c, X

 c                                X      #times X occurs in c 
 -------------------------------- ------ -------------------- 
 row 1: ABCdefghabcdefghABCdefgh  ABC                       2 
 row 1: ABCdefghabcdefghABCdefgh  abc                       1 
 row 1: ABCdefghabcdefghABCdefgh  efgh                      3 
 row 2: abcdefghabcdefghABCdefgh  ABC                       1 
 row 2: abcdefghabcdefghABCdefgh  abc                       2 
 row 2: abcdefghabcdefghABCdefgh  efgh                      3 
 row 3: ABCdefghABCdefghabcdefgh  ABC                       2 
 row 3: ABCdefghABCdefghabcdefgh  abc                       1 
 row 3: ABCdefghABCdefghabcdefgh  efgh                      3 
 row 4: ABCdefghABCdefghABCdefgh  ABC                       3 
 row 4: ABCdefghABCdefghABCdefgh  abc                       0 
 row 4: ABCdefghABCdefghABCdefgh  efgh                      3 
 row 5: abcdefghABCdefghabcdefgh  ABC                       1 
 row 5: abcdefghABCdefghabcdefgh  abc                       2 
 row 5: abcdefghABCdefghabcdefgh  efgh                      3 

(15 rows affected)
As I said in the introduction, this trick may not immediately help you out with your everyday DBA problems. Nevertheless, I think it's one of those tricks that may come in handy one day.
I've actually used this once -- the problem being the question which customers had ordered a particular product more than once, with the product codes concatenated in a single varchar column for each customer. I'm still waiting for another opportunity to apply this trick...

You can, of course, also use a loop-based algorithm where you count the number of occurrences of a particular string, and this can be done without using str_replace().
declare @str varchar(50), @srch varchar(10), @i int
select @str  = 'ABCdefghabcdefghABCdefgh'
select @srch = 'ABC'
select @i = 0

while charindex(@srch, @str) > 0
begin
   select @str = substring(@str, charindex(@srch, @str)+len(@srch), 99)
   select @i = @i + 1
end
print "'%1!' occurs %2! times", @srch, @i
However, this approach implies a 1-by-1 processing algorithm where you cannot simply join tables anymore as shown above. Since that fully relational approach tends to be faster as well as easier to maintain (from a SQL developer's perspective), I feel the str_replace()-based approach is superior.


April 2007
Let's say you're BCPing a file into a table, but you're getting an error like the following:
% bcp mydb..mytab in data.bcp -Umylogin -Psecret -SSYBASE -c 

Starting copy...
1000 rows sent to SQL Server.
2000 rows sent to SQL Server.
3000 rows sent to SQL Server.
CSLIB Message:  - L0/O0/S0/N36/1/0:
cs_convert: cslib user api layer: common library error: The result is 
truncated because the conversion/operation resulted in overflow.
4000 rows sent to SQL Server.
5000 rows sent to SQL Server.
The question: how do you find out which input record in the file is causing this message?

First, the BCP error message points towards a (var)char input value being too long for the column it is copied into; the row is still inserted but the input value is truncated to the length of the column. Since this means your data in the table may not be identical to what was in the file, you should probably figure out exactly what's happening.

Let's first see what does NOT help you answer this question. Perhaps the most common attempt is to add the option -e x.y to the BCP command line: the -e flag will cause BCP to log rejected records into file x.y. However, nothing will show up in that file since the input record is not rejected: it is still inserted although its data may be truncated.
The only thing we can tell from the BCP session output is that the error occurs somewhere between the 3000th and the 4000th input record, but that still hardly makes it easier to figure out which line is causing the problem: let's assume an eyeball-inspection of the input data won't help (typically when this happens, the input data is only 1 character too long, in accordance with Murphy's law...).

One idea is to try using -b 1 with the BCP-in command, the idea being that each row is inserted as a single transaction, and a message is logged for each of these: hopefully this indicates which line causes the problem.
An interesting idea, but no points: the additional output lines for the committed 1-row transactions all look the same, so this will still not help you very much:
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
3000 rows sent to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
CSLIB Message:  - L0/O0/S0/N36/1/0:
cs_convert: cslib user api layer: common library error: The result is 
truncated because the conversion/operation resulted in overflow.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Batch successfully bulk-copied to SQL Server.
Another slight disadvantage of this approach is that you'll be wasting a lot of space in your table, since BCP by default allocates two full extents (of 8 pages each) for every new transaction -- even when the transaction consists of only a single row.

The quickest way to figure out the offending input record is to use the not so well-known -F and -L options of BCP. -F and -L allow you to specify the first and last row or record to copy (depending on whether you're doing BCP-in or BCP-out). This lets you quickly zoom in on the input record in question by using a binary search approach like the following:
% bcp mydb..mytab in data.bcp -Umylogin -Psecret -SSYBASE -c  -F 3000 -L 3500
By taking half of the range between the 3000th and the 4000th input record, you can eliminate 50% of the range immediately. If you're still seeing the error, zoom in further by halving the range again:
% bcp mydb..mytab in data.bcp -Umylogin -Psecret -SSYBASE -c  -F 3000 -L 3250
With a few steps, you'll be able to find out which input record is causing your data truncation problem.
The BCP options -F and -L are worth remembering!
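The halving procedure above is just a binary search; here's a sketch in Python (not a bcp feature -- triggers_error() is a hypothetical stand-in for running 'bcp ... -F first -L last' and checking the output for the CSLIB message):

```python
def find_bad_record(lo, hi, triggers_error):
    """Binary search for the offending input record, assuming exactly one.
    triggers_error(first, last) stands in for running
    'bcp ... -F first -L last' and checking for the truncation message."""
    while lo < hi:
        mid = (lo + hi) // 2
        if triggers_error(lo, mid):   # error in the lower half of the range?
            hi = mid
        else:                         # no: it must be in the upper half
            lo = mid + 1
    return lo

# Simulated run: pretend record 3456 holds the overlong value; about
# log2(1000), i.e. roughly 10, bcp runs pin it down.
print(find_bad_record(3000, 4000, lambda f, l: f <= 3456 <= l))  # 3456
```

With multiple offending records, the same approach still works: this search finds the first one, after which you repeat the exercise on the remainder of the range.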

NB: after publishing this question, I got various responses saying that it was easier to BCP the table out again and do a 'diff' with the input file. That will indeed work, but only when you're doing character-mode BCP (i.e. 'bcp -c'). When using native-format BCP, or when you're using a BCP format file, this approach is unlikely to be easy.


March 2007
Let's say you need to generate some test data. For example, you need 1 million rows of simulated trading data with a trading date spread evenly over, say, the last 5 years. How can you generate such data quickly?

The following query will do the trick:
set rowcount 1000000
select PK = identity(10),
       TradeDate = dateadd(ss,
          (select rand() + (ascii(c.name)*0 + ascii(i.name)*0 +
                            ascii(o.name)*0 + ascii(u.name)*0))
          * 31536000 * 5, '01-Jan-2002')
into Trades
from syscolumns c, sysindexes i, sysobjects o, sysusers u
This query works as far back as ASE 11.0 (I didn't try earlier ASE versions), although simplifications are possible when you use a recent ASE version. Let's look at the various steps.

First, a unique primary column -with the arbitrary column name 'PK' here- is generated with an identity column (a unique primary key is typically needed for every table). This uses the identity() built-in function, which sequentially numbers the generated rows, starting at 1. Note that the identity() built-in can only be used in a select-into statement.

The real trick is in generating random datetime values between 01-Jan-2002 and 01-Jan-2007. The basic idea is to generate a random integer number and add this number as a number of seconds to '01-Jan-2002' with the dateadd() built-in. To cover a 5-year period, the random integer needs to range between 0 and (31536000*5).
Why 31536000*5? Because there are 31536000 seconds in a year (check this yourself -- you'll see I've ignored leap days: I'll leave it as an exercise for the reader to factor these in).
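Just to double-check the figure (a quick Python sanity check; the leap-day variant below is my own addition, covering the single leap day, 29-Feb-2004, in the 2002-2006 range):

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86400
print(365 * SECONDS_PER_DAY)            # 31536000 seconds in a non-leap year
# Factoring in the one leap day falling in the 5-year span:
print((5 * 365 + 1) * SECONDS_PER_DAY)  # 157766400
```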
Getting closer to the core of the trick, we use the rand() built-in to generate a random float value between 0 and 1, and then turn this into the required integer by multiplying it by (31536000*5).

The real issue here is to have rand() generate a random number for every row in the result set. When simply running:
select PK = identity(10),
       TradeDate = dateadd(ss, rand()*31536000*5, '01-Jan-2002')
into Trades
from syscolumns c, sysindexes i, sysobjects o, sysusers u
... then you'll see that the generated datetime value is identical for all rows; this is because ASE evaluates rand() only once for the entire query. By embedding rand() in a subquery, and adding the dummy expression (ascii(c.name)*0 + ascii(i.name)*0 + ascii(o.name)*0 + ascii(u.name)*0) to its result, we're forcing rand() to be evaluated for every row in the result set (for more information about this trick, see section 8.2 in the second edition of my book 'Tips, Tricks & Recipes for Sybase ASE').
In ASE 15.0.1 esd#1 and later, you don't need this dummy expression. As of this version, ASE provides a new built-in rand2() which is evaluated for every row in the result set, so the following query will do:
select PK = identity(10),
       TradeDate = dateadd(ss, rand2()*31536000*5, '01-Jan-2002')
into Trades
from syscolumns c, sysindexes i, sysobjects o, sysusers u
This brings us to the 'from' clause. Why do we join four system tables without any where-clause? The answer: laziness. We're trying to generate lots of rows, so we're exploiting the concept of the Cartesian product here: the total number of generated rows is the product of the rowcounts of syscolumns, sysindexes, sysobjects and sysusers, respectively (you could also use a 4-way self-join of syscolumns instead -- there's no fundamental difference). Even in ASE 11.0 this will generate at least 1 million rows. The set rowcount 1000000 at the start limits the result to exactly 1 million rows.
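To get a feel for how quickly this product grows, here's a sketch with made-up rowcounts (the actual counts obviously depend on your database):

```python
# Hypothetical rowcounts for the four system tables in some database:
rowcounts = {"syscolumns": 500, "sysindexes": 60, "sysobjects": 80, "sysusers": 5}

total = 1
for n in rowcounts.values():
    total *= n           # the unrestricted join yields the product

print(total)             # 12000000 -- comfortably over the 1 million needed
```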

Note that in ASE 15.0 and later, you can use the following 'into'-clause instead:
select ...
into existing table Trades
from ...
This allows you to insert the generated rows into an existing table instead of creating a new one. The inserts are performed in minimally logged mode, so this is faster than using a regular insert...select statement. Note that the existing table into which the rows are inserted must not have any indexes and cannot appear anywhere else in the query.

Instead of all the above, you can, of course, also write a cursor-based loop to generate rows one-by-one; this also avoids the problem of the evaluation of rand() for every row in the result set. However, the above query is both faster to program and to execute.


February 2007
As you no doubt know, statistics that accurately represent the actual data distribution in a column are key to ensuring good performance in ASE. The accuracy of a histogram (being the form in which those statistics are stored) can often be improved by increasing the number of histogram 'steps' -- a 'step' being a data point in the histogram. The idea is simple: the more data points, the greater the likelihood that the histogram shows an accurate picture of the actual data.

For a pre-15.0.1 esd#1 ASE server with default configuration settings, all histograms have 20 steps (or less) by default. Let's assume we want to increase the number of steps to 500 for all currently histogrammed columns in a particular table. How do you achieve this?

There's a lot to say about this issue, but I'll try to keep it short (when someone says that, it's usually an indication it'll be anything but short, but I'll really try...)

First, the default number of histogram steps can be configured with the config parameter number of histogram steps. Its out-of-the-box default value is 20, but you can change this to a higher value if desired. So let's set this to 500. It's a good start, but it will actually not have any effect on the currently existing histograms.
Why not? The important thing to keep in mind is that running the basic update [index] statistics command does not affect the number of steps in existing histograms -- if the histogram on my_table.my_column had 20 steps (as was most likely), it will still have that same number of steps after running update [index] statistics my_table -- irrespective of the setting of number of histogram steps. Only when running update [index] statistics my_table using N values will the new histograms be created with N steps -- again, irrespective of the setting of number of histogram steps.
Only when a new histogram is created will the setting of number of histogram steps be used for the step count. This happens when a new index is created or when update statistics is run after the existing histograms have been removed (with delete statistics my_table).

Now that we have the basic idea, it's time to be a little more precise: up till now, we've really been talking about the number of requested histogram steps; the number of actual steps in the final histogram may turn out to be lower (for very small tables or for very low-cardinality columns) or higher. The latter situation may occur due to the config parameter histogram tuning factor, introduced in 12.5.2. This parameter aims to generate better statistics by identifying 'frequency cells', reflecting ranges of duplicate values in a column. To this end, the number of requested histogram steps as described above is multiplied by the setting of histogram tuning factor; the resulting number of steps is used to generate a so-called intermediate histogram, which can have many more steps (for example, 500 requested steps * tuning factor 20 = 10000 intermediate steps). If no frequency cells are found, this intermediate histogram is 'compressed' back to the original number of requested steps; but if frequency cells are found, they are retained in the final histogram. For this reason, the histogram may end up with more steps than were originally requested through number of histogram steps or the using N values option.
Unlike number of histogram steps, histogram tuning factor is always taken into account when a histogram is generated.
Basically, histogram tuning factor is a very cheap way of getting better statistics. Note that histogram tuning factor had its default set to 1 when it was introduced in 12.5.2, meaning it had no effect unless you changed it to a higher value. Given the positive experience with this new parameter ever since, its default was changed to 20 in 15.0.1 esd#2.

So how can you tell how many steps (both requested and actual) your histograms have? Either use optdiag (the client app) or use Kevin Sherlock's sp__optdiag procedure (which currently does not yet support ASE 15.0).
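A typical optdiag invocation looks like the sketch below; the database name, table name, login, server name and output file are all placeholders here:

```shell
# Dump the statistics (including histograms) for one table to a file;
# 'mydb', 'my_table', 'sa', 'MYSERVER' and 'my_table.opt' are placeholders
optdiag statistics mydb..my_table -Usa -SMYSERVER -o my_table.opt
```

In the output, look for the 'Requested step count' and 'Actual step count' values reported for each histogrammed column.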

Returning to the original question, the right answer is either of these: run update [index] statistics my_table using 500 values -- or run delete statistics my_table followed by update [index] statistics my_table (with number of histogram steps set to 500).
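Concretely, the two approaches look like this (my_table is a placeholder for your own table name):

```sql
-- Option 1: explicitly request 500 steps, irrespective of the
-- setting of 'number of histogram steps'
update index statistics my_table using 500 values
go

-- Option 2: drop the existing histograms first, so that the
-- rebuild picks up 'number of histogram steps' (set to 500 earlier)
delete statistics my_table
go
update index statistics my_table
go
```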


January 2007
In ASE 15, you can control the types of join the optimizer can choose. For example, when a query uses a hash join but you want to forbid hash joins (say, because you want to compare performance with a merge join or a nested loop join), you can achieve this by running set hash_join 0 first (in 15.0.1+, you can also run set hash_join off): when the query is re-run, it will then no longer choose a hash join.
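In practice this looks as follows (the query and table names are placeholders for your own):

```sql
-- Forbid hash joins for this session (in 15.0.1+, 'off' also works)
set hash_join 0
go

-- Re-run the query of interest; the optimizer should now pick a
-- merge join or nested loop join instead of a hash join
select t1.id, t2.descr
from my_table t1, my_other_table t2
where t1.id = t2.id
go
```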

Recently I was doing exactly this when analysing query performance, but to my surprise, the query kept using a hash join. (NB: this was not a typing error: set hash_join 0 had been executed correctly in a separate batch before resubmitting the query.) Likewise, after setting the optimization goal to allrows_mix, the query still used a hash join.
So although session-level settings were in effect forbidding hash joins to be used, these did not seem to have any effect since the query still used a hash join anyway.
Why did this happen ?

The reason was that the statement cache was enabled.
When the query was executed the first time, hash joins were allowed and the optimizer indeed picked a hash join for this particular query. Since the statement cache was enabled, this query plan was stored in the statement cache.
Now, when exactly the same query was submitted again, the stored query plan (from the statement cache) was used, irrespective of any optimizer-related settings forbidding hash joins: these settings affect only query plans being generated by the optimizer, but stored query plans are not affected since they have already been generated before.
If you were to run an identical query but with a slight difference (like additional whitespace), this would not match the query text stored in the statement cache and therefore the stored query plan would not be used; in this case, the optimizer would generate a new plan, and indeed obey the session-level settings forbidding hash joins to be used.

The thing to keep in mind here is that you should make sure the statement cache is disabled when playing with optimizer-related settings. This can be done either for the session, by running set statement_cache off, or server-wide, by disabling the statement cache altogether with sp_configure 'statement cache size', 0.
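Both variants in SQL form:

```sql
-- Disable the statement cache for the current session only
set statement_cache off
go

-- Or disable the statement cache for the entire server
-- (requires appropriate privileges, e.g. sa_role)
exec sp_configure 'statement cache size', 0
go
```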
(this also applies in ASE 12.x, when using settings like set merge_join off [obsolete in ASE 15]).

Lastly: how can you tell that your query is re-using a stored query plan from the statement cache? Simple: enable showplan, and look for the following text at the top of the query plan:
Executing previously cached statement.
When you see this, the query plan is taken from the statement cache. Instead, when the query plan is not taken from the statement cache but newly generated, you'll see this text:
Executing a newly cached statement.
When you see neither, this means that the statement cache has been disabled for this session (or for the entire server).
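A quick way to check this for any query (the select statement below is just a placeholder):

```sql
-- Enable showplan output for this session
set showplan on
go

-- Run the query; at the top of the showplan output, look for
-- "Executing previously cached statement." (plan reused from the
-- statement cache) or "Executing a newly cached statement." (plan
-- newly generated and stored); neither message means the statement
-- cache is disabled
select * from my_table where id = 1
go
```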

