This page contains answers to the ASE quiz questions from 2004.
More ASE Quiz questions: 2010 - 2009 - 2008 - 2007 - 2006 - 2005 - 2004 - 2003 - 2002 - 2001
December 2004
Suppose you're a RepServer DBA, and you want to maintain a historical overview of all changes made to any RepServer configuration parameters. Unlike ASE, RepServer config parameter changes are not logged into the RepServer errorlog file. How can you still keep track of these changes?
Answer:
The answer is actually pretty simple: triggers.
First, observe that all RepServer configuration parameters are stored in the RSSD table rs_config (actually, there's one exception, namely the disk_affinity settings -- but we'll deal with that later). A change to a configuration parameter (with RepServer commands configure replication server, create/alter connection or create/alter route) translates into either a change of a row in rs_config (if the changed parameter already existed in rs_config) or an insertion of a new row into rs_config (if it's a setting for a particular database connection or a route, which didn't exist yet).
With this in mind, it's pretty straightforward to log all RepServer config changes: just create triggers on rs_config, and log all changes into a separate table:
use your_RSSD
go
create table rs_config_log (dt datetime,
action char(1),
optionname varchar(30),
objid rs_id,
charvalue_new varchar(255) null,
charvalue_old varchar(255) null)
go
create trigger rs_config_trig_i on rs_config for insert
as
insert rs_config_log
select getdate(), 'I', optionname, objid, charvalue, null
from inserted
go
create trigger rs_config_trig_u on rs_config for update
as
insert rs_config_log
select getdate(), 'U', i.optionname, i.objid,
i.charvalue, d.charvalue
from inserted i, deleted d
where i.optionname = d.optionname
and i.objid = d.objid
go
create trigger rs_config_trig_d on rs_config for delete
as
insert rs_config_log
select getdate(), 'D', optionname, objid, null, charvalue
from deleted
go
Pretty simple, right? Just query rs_config_log and you have a timestamped log of all RepServer config changes. You'll notice that changing the value of an existing parameter is implemented by deleting and re-inserting the corresponding row into rs_config.
Also note that connection- or route-specific settings cannot be deleted from rs_config (so as to revert to the RepServer-wide defaults) with regular RepServer commands (not counting dropping the connection or route). If you want to achieve this, you'll have to manually delete the corresponding rows from rs_config.
As mentioned earlier, changes to the disk_affinity settings are not caught this way -- this is simply because disk_affinity is stored in the RSSD table rs_diskaffinity instead of rs_config. Obviously, nothing stops you from writing similar triggers on rs_diskaffinity, which is left as an exercise for the reader.
Although it's a good principle not to mess around in the RSSD, it's actually perfectly safe to create these triggers: the changes made to rs_config by RepServer are neither extremely performance-critical nor frequent. Also, the triggers do not change any functionality in the RSSD from the perspective of RepServer (which acts as the client application from an RSSD perspective). The only thing to keep in mind is that the triggers should never cause an SQL command to fail -- and I don't immediately see a reason why they would. The insert actions in the triggers will perhaps generate some addtional log records, but that's most likely negligable in comparison with the amount of changes to RSSD tables like rs_locater and rs_segments. Also, the amount of space occupied by the table rs_config_log is not likely to be significant.
NB: On a different but related topic: Note that rs_config.objid, and therefore also rs_config_log.objid, is either 0 (for RepServer-global config parameters) or non-zero (for settings for a specific database connection or route). Unfortunately, in the latter case, the ID value of the connection or route is byte-swapped in an undocumented, platform-specific way. To convert rs_config.objid to the corresponding value in rs_databases.dbid or rs_routes.dest_rsid, use the following expression in a 'classic' ASE RSSD: hextoint(substring(bintostr(rs_config.objid),1,8)). In an 'embedded' ASA RSSD in RS 12.6, use this expression instead: convert(int, substring(rs_config.objid,1,4)).
If you have a copy of my book "The Complete Sybase Replication Server Quick Reference Guide", you can download some handy stored procs to display all current non-default config settings for databases and routes. Those procedures can simply be adapted to work with rs_config_log instead.
November 2004
It's a well-known and documented fact that stored procedure calls in ASE cannot be nested endlessly -- the deepest possible nesting level is 16. The following recursive stored procedure illustrates this (try running it!):
create proc p as
select level=@@nestlevel
exec p
go
-- now run it; you'll quickly hit an error
exec p
go
Despite this limitation, do you think it would be possible for a stored procedure to call itself recursively more than 16 times? If so, how?
Answer:
If you've been following these quiz questions for some time, you probably know which way they tend to go... Therefore, it may not come as a surprise that procedure calls can indeed be nested more than 16 levels deep.
To achieve this, we have to use a little trick. It is true that you cannot nest procedure calls deeper than 16 levels, but that limit applies only within the same ASE server.
However, procedures can also call each other in a different way, namely as a CIS RPC (remote procedure call). With a CIS RPC, the called procedure resides in a different ASE server. For example: exec YOUR_SERVER.your_db..your_proc executes procedure your_db..your_proc in the ASE server named YOUR_SERVER.
Some configuration is required: the remote servername (YOUR_SERVER in the example) must be defined in master..sysservers as well as in the interfaces file; also, it must be possible to connect to that server from the ASE server where this statement is executed from. Also, the configuration parameter enable cis must be set to 1.
As has been demonstrated before (here, for example), the CIS functionality can be especially interesting when the remote ASE server being accessed is actually the same as the server executing the procedure call. With this mechanism, a procedure can be called recursively as many times as you have free user connections available: since each CIS procedure call occupies a new session in the ASE server, the maximum nesting level of 16 is not reached.
Here's a quick example to try yourself (please do NOT run this example in a production server since it may cause temporary problems for other users!).
This example uses the server name loopback; if you have configured the MDA tables in 12.5.0.3+, this definition already exists in your server. If not, create it as follows: sp_addserver loopback, null, @@servername.
-- here we go:
use tempdb
go
-- you may also enable the config
-- parameter 'cis rpc handling' instead
set cis_rpc_handling on
go
-- create the procedure
create proc p @n int as
select @n=@n+1 -- keep track of nesting level
print "Level= %1! ; spid= %2!", @n, @@spid
exec loopback.tempdb..p @n -- call ourselves again via CIS
go
-- run it (not in your production server!)
exec p 0
go
You'll see that nested procedure calls is not limited by 16 levels, but rather by the number of available free user connections (to see the effect, make sure you have a bunch of free user connections available, preferably more than 16).
Functionally, there's no difference with normal procedure calls within the same ASE server, but performance will obviously be less since the CIS server-to-server communication takes some time.
Note that running the above procedure will occupy all available user connections in your ASE server, meaning that nobody can connect to the server anymore (to release all occupied user connections, simply disconnect the session). For this reason, you should perhaps not run this in servers where this would cause problems, such as in production servers.
The practical relevance of this trick is probably limited. If you'd find yourself looking at a problem requiring 50-level deep procedure calls, there's either something wrong with the algoritm, or it shouldn't probably be solved in SQL at all. Nevertheless, this example illustrates what can happen when using CIS features improperly: when a query on a proxy table or execution of a CIS RPC ends up bouncing around between ASE servers, user connections may be consumed. I've actually seen something like this happen once, although not nearly as bad as demonstrated here. Nevertheless, keep in mind that perhaps you shouldn't go crazy with CIS connections between servers for this reason (performance is another).
Lastly, should you be concerned about this type of user-connection-eating problem, which can effectively amount to denial-of-service, you can create a login trigger to limit the number of concurrent sessions by a login. See www.sypron.nl/logtrig.html for details.
October 2004
Suppose you're an ASE DBA, and you have been hired to replace another DBA who left abruptly.
Your first task is to change the production ASE server's sa password. However, you don't know the current sa password to start with -- and nobody else knows it either (let's assume the previous DBA isn't available for questions).
You know that a new sa password can be generated by starting ASE with the command-line option -psa, but this requires that you shut down the server first. Fortunately, since you've been following these monthly quiz questions, you know how to do this -- see the question for September 2003.
Unfortunately, however, it then appears that the sa login does not have sso_role or sa_role anymore -- apparently the previous DBA revoked these roles from the sa login... As a result, the -psa command-line option fails (since it requires the name of a login with sso_role) and no new password is assigned.
At this point, you may be able to start and stop the ASE server as much as you like, but you cannot change the password of the DBA login, because you simply don't know which login has sso_role. Let's also assume you don't even know a single login name in this server (which is probably a fair assumption for a server you've never seen before).
The question: it seems as if you've been locked out of the ASE server... is it possible to somehow change the password of a login with sso_role, or do you have to recreate the server from scratch and reconstruct the user databases from dumps or device copies?
Answer:
Indeed, you have been locked out... Fortunately, there is a way around this problem, so that you won't have to rebuild your ASE server...
The basic idea is simple: just restart the ASE server with -pxxx, where xxx is the name of a login in the master..syslogins table -- that's just any login, trying them one by one if necesssary. At worst, you'll have to try this once for each existing login name (before sending me email to point out an omission -- there is indeed no point in trying the probe login).
However, how do you know which logins exist if you cannot log into the server in the first place? (remember, the question assumed you don't even know a single login name in this server, so you cannot log in to check the contents of master..syslogins).
Fortunately, there is a way out of this situation: the trick is to look into the master..syslogins table directly, without even logging into ASE at all.
How? First, stop ASE. Then, make a copy of the master device (if it's a raw device, use the Unix dd command to copy it to a file). Then, run the Unix strings command on this copy of the master device, and open the resulting output file in an editor. In that file, search for the string 'mastersa' -- this line marks the start of the master..syslogins table, and the login names are clearly visible, preceded by the name of their default database. Note that if the default database for the sa login has been changed, you obviously won't find the string 'mastersa'. Instead, search for 'sybsystemdbprobe' (or 'masterprobe' if you have an older version of ASE).
At this point, you can figure out the names of all logins in the server pretty easily from the subsequent lines. Once you know these login names, you can try them with the -p option to find out which one has sso_role, regenerate a password, and log in to ASE.
Altogether, the whole process may take a little while, but it's likely to be faster, and certainly more convenient, than rebuilding your ASE server from scratch...
September 2004
The following T-SQL code interchanges the values of two local variables:
declare @a int, @b int
-- First, assign some arbitrary values to @a and @b:
select @a = 123, @b = 456
-- Then run some magic lines of T-SQL code...
??????????????? --> that's the question! See below...
-- Finally, the values of @a and @b appear to be swapped:
select @a, @b
go
----------- -----------
456 123
(1 row affected)
The essence of this code -the SQL in the position of the question marks- has been omitted.
The question: which are the most efficient (from a resource consumption point of view) T-SQL statement(s) to make this code work?
Answer:
This question has actually very little to do with databases, or with SQL, at all. It's rather a classic trick that works in any programming language supporting logical bit operations.
The solution is as follows:
declare @a int, @b int
-- First, assign some arbitrary values to @a and @b:
select @a = 123, @b = 456
-- Then run some magic lines of T-SQL code...
-- Let the variables do an exclusive-OR (that's the bit operator ^)
-- on each other 3 times...
select @b = @a ^ @b
select @a = @a ^ @b
select @b = @a ^ @b
-- Finally, the values of @a and @b appear to be swapped:
select @a, @b
go
The exclusive-OR (EXOR) operator is a logical operation performed on the individual bits of both operands. If both input bits are identical, the result bit is 0; otherwise the result bit is 1: select 5 ^ 3 returns 6 (i.e. 0101 ^ 0011 = 0110).
You are invited to verify that the three EXOR operations above will indeed swap the values of the two variables without losing any information. This code works in any version of ASE.
Note that this approach works only for integer datatypes (and not for all datatypes, as erroneously mentioned when this quiz question was first published).
There are of course plenty of other ways to swap the values of two variables, but these will either involve an additional variable, or require some kind of I/O (for example to store values in a table). Hence, this solution is the most 'efficient' one from a resource-usage point of view.
In the old days, this kind of thing was commonly used in situations where every byte of memory counted and your programs had to use as little memory as possible (sigh... those long gone days of assembler programming, core memory and self-modifying code... it makes me feel nostalgic... but I digress).
As for the practical relevance of this trick in T-SQL -- well, there may not be one.
In practice, you'd rather declare an additional local variable than use hard-to-understand code like in this example (in other words, if you include programmer effort as a resource as well, the above solution may in fact not be the most resource-friendly one).
Unless of course, complexity is what you're aiming at, for example to deliberately obscure the logic of your SQL code -- or impress your collegues at work.
Thanks to Joop Bruggink for suggesting this topic!
Update 09-Sep-2004:
Soon after this question was published, emails started coming in describing different ways of achieving the same effect. These are too interesting not to mention here (each separated by a -- marker):
select
@a = case when 1=1 then @b else 1 end,
@b = case when 1=1 then @a else 1 end
--
select @a = @a + @b - @a, @b = @a + @b - @b
--
select @a = convert(int, @b), @b = convert(int, @a)
-- note that the following ones all use a 'dummy' operand value
-- to force the proper order of evaluation:
select @a = @b + 0, @b = @a + 0
--
select @a = @b * 1, @b = @a * 1
--
select @a = @b | 0, @b = @a | 0
--
select @a = @b ^ 0, @b = @a ^ 0
--
select @a = @b & -1, @b = @a & -1
--
Thanks to: Igor Titov, Tom Oorebeek, Xiaoyu, Rajesh Kanchi, Ilya Zvyagin, Dimitrios Antonopoulos (as well as anyone else who responded but who I've missed here).
August 2004
Under normal circumstances, a timestamp column is always guaranteed to be unique for a row. For this reason, it seems timestamp columns are ideal as primary keys (or for making an index unique):
create table mytab (my_column int, timestamp)
create unique index pk on mytab(timestamp)
go
insert mytab (my_column) values (1)
insert mytab (my_column) values (2)
go
Right or wrong?
Answer:
Absolutely wrong.
It is true that a primary key should always uniquely identify a row, but there's another requirement that is usually not mentioned explicitly since it is so obvious: the value of a primary key should also be stable.
This is where timestamp columns fall short: they're indeed unique, but they may change. When any column in a row is updated, the value of the timestamp column for the updated row is automatically changed by the ASE server to a different, but still unique, value.
You can easily see this for yourself. First run the code above, and then run the following lines and see how the timestamp column changes:
select * from mytab
go
update mytab set my_column = 3 where my_column = 1
go
select * from mytab
go
-- the timestamp also changes when the column keeps the same value:
update mytab set my_column = 2 where my_column = 2
go
select * from mytab
go
Now imagine what this would mean in practice: once you've updated some rows and their timestamp columns have changed as a result, how are you going to find those rows back? Since you cannot predict what the new value of the updated timestamp column will be, it has become kinda useless for all regular retrieval purposes.
I guess it's fair to say that timestamp columns are a bit too unique to be a unique key...
You'd perhaps think timestamp columns could be used in a read-only database. Theoretically, that might be true, but if you'd ever drop and recreate a table, all timestamp columns would still be changed as a result.
Also note that it is not a good idea to ever include a timestamp column in any index whatsoever: when a row is updated, the index which includes the timestamp column must also be updated since the timestamp column itself is changed as a side-effect of the row update. This means additional overhead, costing performance.
Summarising: if you require a unique column, don't use the timestamp datatype. Using an identity column may be a better choice.
Note that timestamp columns were actually designed for something called 'optimistic locking': when a client retrieves some rows and then decides to update one of those rows, it should check that no other application has updated the row in the mean time. By including a timestamp column in the table, and using the built-in function tsequal() in the where-clause of the update statement, the application can verify that the row to be updated has indeed not been changed since it was retrieved (if it has been, its timestamp will have changed). This way, the client can avoid locking the rows transactionally beforehand. With this mechanism, the value of a timestamp column is expected to be stable only over a short period of time.
See the ASE documentation for more information about tsequal().
NB: for completeness, note that there is one special situation when timestamp columns are not necessarily unique; this is described in the quiz question for February 2004.
July 2004
You have a table with many rows -- say, 10 million. Let's assume about 9.5 million rows, identified by certain criteria, must be deleted from this table.
What's the fastest way to do this?
Answer:
Perhaps surprisingly, the quickest way to solve delete those rows is not to delete them with a delete statement.
Instead, it is faster to first extract the ~0.5 million rows that should be kept and temporarily store 'm a in different table.
Then, truncate the original table, and copy those 0.5 million rows back into it. Something like this:
select *
into #t
from MyBigTable
where KeyColumn = values-for-rows-to-keep
truncate table MyBigTable
insert MyBigTable
select * from #t
The reason this works faster than simply deleting those 9.5 million rows is because inserting 0.5 million rows is less work than deleting 9.5 million rows: an insertion and deletion of a row is roughly the same amount of work since both are individually logged in the transaction log, and need to update all existing indexes.
The additional work lies in truncating the table, which is very fast: all allocated pages are simply deallocated; individual row deletions are not logged. Also, the 0.5 million rows must be written to #t, but since this is a select..into, which also does not log individual rows, it's very fast as well. Lastly, inserting the rows back into the main table is the most expensive step, performance-wise.
I ran a little test to show the sort of difference you can expect. For a table with a row of 300 bytes and one nonclustered index on an int column, deleting the 9.5 million rows took about 20 million logical I/Os. The other method took only 2.1 million I/Os -- basically, that's 10 times faster! (if the table has more indexes, the difference increases).
As with everything, there are some potential disadvantages for this approach. First, there is a small period of time when those 0.5 million rows are not present in the table, so ideally there should be no users accessing the table during this period.
Furthermore, when delete- or insert-triggers exist for the table, you'll have to take the processing by those triggers into account -- maybe you can simply disable the triggers, but you may have to perform any actions by those triggers yourself (which may or may not be feasible).
Lastly, if the table has foreign key constraints, these may be in the way when truncating the table or re-inserting the rows -- but you can easily drop those constraints and recreate them afterwards (that's perhaps a small price to pay when you can save so much time...).
Updated (09-Jul-2004):
There was quite a bit of response to this quiz question, or rather, to my answer to it.
Some readers responded (and rightly so) I should have mentioned the issue of updating the statistics afterwards: after re-inserting those 0.5 million rows, the statistics still reflect the original 10 million rows, which may not lead to identical distribution of values. To fix, this, run update [index] statistics.
There were also some comments about the point where this approach stops being more efficient. That depends (on the number of indexes, for example) -- but the cutoff point is likely to always be well below 50% of the total number of rows. It's quite easy to do the math yourself: just scale the practical problem down to, say, a 1000-row table, and count the number of logical I/Os required for each of the methods. That'll be a good estimate for the main table.
Lastly, it has been pointed out that it may be more efficient to drop the indexes before copying the 0.5 million rows back into the table, and rebuild the indexe afterwards. In fact, you may be able to gain even more by using fast BCP-in into the main table by copying those 0.5 million rows out and then back in. You may also try to copy them back into the main table with select...into existing table (define a proxy table pointing to the main table first). Note that these methods carry additional network I/O overhead, so experiment with different network packet sizes.
Indeed, these aspects were worth mentioning. On the other hand, I've also received complaints about some of the quiz answers being too long... I'll keep trying to find the right balance !
Go here for a full list of all quiz questions.
|