More ASE quiz questions: 2013 - 2012 - 2011 - 2010 - 2009 - 2008 - 2007 - 2006 - 2005 - 2004 - 2003 - 2002 - 2001
For a randomly selected quiz question, click here.
You need to test a stored procedure that operates on a text column in a database table.
How can you generate some sizable test data for that column? For example, let's say you want to put 10 MB of data in the text column -- what's the easiest way to do that?
Historically, handling datatypes text, unitext and image in T-SQL has been notoriously difficult when the size of the data exceeds the capacity of a varchar or varbinary column (that's 16384 bytes as of ASE 12.5).
This would usually mean using CT-library functions (either directly or via a scripting language that supports text/image handling for ASE), doing messy things with the readtext/writetext statements, or using BCP with format files. None of these options is easy or attractive (although section 14.4 of my book "Tips, Tricks & Recipes for Sybase ASE" has a working example of the BCP-with-format-files approach).
All that has changed with ASE 15.7, which supports using local variables (and parameters) of datatypes text, unitext and image. This greatly simplifies handling of these datatypes.
For example, the following SQL code generates 10 MB of text data by concatenating a text variable with itself 10 times:
declare @t text, @i int
set @i = 0
set @t = replicate ('a', 10000)
while @i < 10 begin
    set @t = @t + @t
    set @i = @i + 1
end
select datalength(@t) 'Length of @t'
-- assuming my_table has columns (id int, t text):
insert my_table values (12345, @t)
This example generates just a really long string of 'a' characters, but it's clear how simple it now is to generate arbitrarily long text values in ASE.
Some (not all) ASE built-in string functions can also handle text, unitext and image variables. These are: substring(), charindex() and patindex(). Others, like replicate() as used in the above example, are still restricted to the maximum size of the varchar datatype, namely 16384 bytes (16 KB), even when you specify a higher number.
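As an untested sketch of those functions at work on a text variable longer than any varchar could be (the 'MARKER' value is just an illustration):

```sql
declare @t text
set @t = replicate('a', 16384)   -- replicate() itself is capped at 16 KB...
set @t = @t + 'MARKER'           -- ...but the text variable is not
set @t = @t + replicate('b', 16384)
-- charindex() searches the full text value, past the 16 KB mark
select charindex('MARKER', @t) 'Marker position'
-- substring() can extract from anywhere in the text value
select substring(@t, charindex('MARKER', @t), 6) 'Extracted'
```

If the functions behave as described above, the marker position should come out as 16385, i.e. beyond what any varchar-based workaround could reach.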
ASE 15.7 brings many interesting new features. These new capabilities for handling text and image data are just one of the reasons why you might want to consider upgrading to 15.7 -- if you hadn't already done so.
If you are a RepServer DBA, you often have various active 'isql' sessions to manage your replication system; for example, you need one connection to each RepServer in the system, another connection for each RSSD, and often one or more for primary or replicate dataservers/databases.
Frankly, sometimes all those sessions drive me crazy. Worse, they can waste time when you discover that things don't work because you've been running your commands in the wrong database or against the wrong RepServer.
The good news: you can basically do all of this with just a single 'isql' session. Do you know how?
As of RepServer version 15.2, there is a new feature named 'RepServer Gateway' which makes life easier for DBAs who suffer from open-session excess (that is just the name of the feature in the documentation; there is no actual command named gateway or anything like that).
This feature takes the shape of a new RS command named connect. With this command, when you are connected to a RepServer with 'isql' (or 'sqsh', or whatever favorite client you use), you can instantly switch the connection to another RepServer, a primary or replicate database, or the RSSD without having to enter user names or passwords. All subsequent commands issued in the session are now executed in whatever it is you connected to, until you issue the command disc[onnect].
Here is an example:
-- I'm connected to the RepServer now
-- switching to RSSD
connect to rssd
-- I'm connected to the RepServer's RSSD now
-- returning to the RepServer first, then switching to a primary DB
disconnect
connect to PROD_ASE157.prim_db
-- I'm connected to database 'prim_db' in ASE server PROD_ASE157 now
-- returning to the original RepServer
disconnect
-- I'm connected to the original RepServer again now
That looks very convenient indeed. How does it work?
RepServer already knows the username and password needed to connect to the RSSD and all its databases, and uses that information when you run the connect command. Therefore, you don't have to provide that information again.
When RepServer connects to another server, it keeps an eye out for the disconnect command, but otherwise just forwards your commands to the server you connected to, and sends the results back.
You can connect to the following targets with these commands:
- connect [to] rssd - connects to the RepServer's RSSD, using the RSSD's primary user. This is especially convenient when using an embedded SQL Anywhere RSSD, where the default username and password are always a bit of a challenge to remember (at least for me -- I can't remember it and I always had to consult my little green book when connecting to an ERSSD).
- connect [to] <dataserver>.<database> - connects to the specified primary or replicate database, which must be managed by the RepServer, using the maintenance user (NB: there is a snag - see the remark below)
- connect [to] idserver - connects to the RepServer that is the ID server for your current RepServer (only if the currently connected RepServer is not itself the ID server), using the ID server user.
- connect [to] <repserver_name> - connects to another RepServer that is reachable through a direct route from the current RepServer, using the route user.
Here is something to watch out for: When you're connecting to a primary or replicate ASE database or an ASE-hosted RSSD, you are placed in the ASE database that is the 'default' database for the ASE login used by RepServer to make the connection. This could be a different database than you expected: for example, if you're using the 'sa' login as maintenance user (not a good idea) you'd end up in the 'master' database instead of in the primary or replicate database you aimed for (unless that was also 'master', of course).
The remedy is simple: run the ASE use <db_name> command (but note that you can only access databases to which the maintenance user login has access!).
It is probably a good idea to always do a use <db_name> after connecting to an ASE database, and/or to verify that you're in the right place by running select db_name().
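So a defensive sequence after connecting might look like this (using the server and database names from the example above):

```sql
connect to PROD_ASE157.prim_db
go
-- make sure we're in the intended database, not the login's default
use prim_db
go
-- verify before running anything that matters
select db_name()
go
```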
Some practical notes:
- To return to the previous connection, use disc or disconnect. To return to the original RepServer, irrespective of how deeply 'nested' your session is, use disc[onnect] all.
- The show server command shows what you are currently connected to; show connection shows the current hierarchy of connections in your session. Note that both commands throw a syntax error when no connect command is active.
- Once connected to a primary or replicate ASE database, you may not be able to connect to yet another database or to the RSSD, since the connect command also exists in ASE but has different syntax, so you'd get a syntax error. First disconnect and return to the original RepServer, and from there connect again to the desired target.
One useful application of this feature is that you can create a single script that marks the primary tables for replication (and maybe even creates them), creates the repdef and subscription, and checks the RSSD to see if everything is fine.
You have to be careful though: what if you end up in the wrong database due to some issue (a database not having been created, or not being online, for example)? Unfortunately, the disc[onnect] command cannot be executed conditionally, so your options for error handling are very limited. You should therefore ensure that all required databases are accessible before running such a script.
In my opinion, this feature is certainly a great productivity improvement. If you haven't yet upgraded to RS 15.2 or later, here is another reason why you should.
Lastly, a bunch of years ago, I published a trick with 'sqsh' to basically achieve the same effect. That was a bit of a kludge, and this RepServer 15.2 feature is a much better solution for this problem. For one thing, it is independent of the client you're using.
As every DBA knows, set showplan on will generate the query plan for a query.
However, recently I ran into a situation where I had enabled showplan, but I didn't get a query plan no matter what I tried.
The queries themselves executed without problem (i.e. the result sets were correctly returned).
It took a while before I figured out what was going on.
Any idea what had caused this funny situation?
First, the problem actually looked like this:
1> set showplan on
1> select count(*) from MyTable
(1 row affected)
... which is weird: when you enable set showplan on, you expect to see at least some output for a query.
But in my case, there was nothing.
I spent some time scratching my head, until I found the reason why (I must admit, with more luck than wisdom).
In short, I had been using one of the technical tips from one of my own earlier quiz questions without realising one of the implications.
To be more precise, I had configured a login trigger which kicked in for every new session. This was described in detail in my quiz question for June 2009 (you may want to read that first).
One of the effects of this login trigger is that all showplan output is siphoned off to the trace file. This is why I could run set showplan on as much as I wanted: the plan output was not sent to my client session. The showplan output is not lost, however, since it is written to the trace file.
Note that the same would apply to the output of set statistics io on (and similar commands).
To take a positive view, this proves that application tracing is indeed application-transparent -- but keep these side effects in mind.
If you want to verify whether tracing is enabled for your session, run sp_helpapptrace. If your session number (@@spid) is listed in column 'traced_spid', then your session is indeed being traced.
Every ASE DBA knows that ASE's approach to reconstructing a database is based on an initial database dump followed by multiple transaction log dumps.
However, during a recent customer engagement, I found myself making a database dump, only to immediately delete the dump file once the dump was successfully completed. In contrast, the subsequent transaction log dump files were carefully saved since they were needed in case the database needed to be reconstructed.
It is somewhat counterintuitive that the database dump file was no longer needed and could be deleted. Why did I have to make that dump anyway, and why did I delete it immediately after?
The customer in question was using a backup strategy based on device snapshots made by the storage subsystem.
First, some background.
This backup strategy works by first quiescing the ASE database with quiesce database blahblah hold MyDB for external dump, which makes ASE put the database devices in a consistent state so that a usable copy can be taken.
The devices are then copied at storage level. This is also known as "splitting the mirror" or, in the case of EMC TimeFinder, "splitting the BCV" (Business Continuity Volume); it all means the same thing: a copy of a potentially very large device is made in a very short amount of time (typically just seconds). Other storage-level solutions provide similar features.
Once the copies have been made, the command quiesce database blahblah release should be run in order to let all open transactions proceed as normal.
When the copies of all devices for a database are transferred to another system, they can replace the existing devices for that same database there. By starting the secondary ASE server with the dataserver -q option, the copied database remains offline. Transaction log dumps that were made with the standby_access option in the original database after the devices were copied can then be loaded into the copied database in the secondary server.
Nothing of the above is new: this approach has been supported since ASE version 12.5 (see this whitepaper).
However, the first observation is that there is no database dump involved in all of this: the database device copies act as the initial step of the restore sequence.
Ironically though, you still need to make a database dump first, since otherwise ASE will simply not allow you to make a transaction log dump at all. But since that database dump will never be used in the restore scenario based on device copies, I could immediately delete it again. (Yes, this is somewhat of a contradiction; I guess there may be room for improvement around this aspect of ASE.)
Just for reference, the full sequence on the primary side is:
- Perform a database dump (which you can then discard)
- Run quiesce database ... hold ... for external dump
- In the storage subsystem, make device copies ("split the mirrors")
- Run quiesce database ... release
- Make regular log dumps with dump tran ... with standby_access
You can then take the device copies to another ASE server which already has a database with the same device layout. While that ASE server is shut down, copy the devices from the primary into the expected device pathnames. Then start this ASE server with the -q option, after which you can load the log dumps from the primary database into the copied database.
By default, the copied database remains offline. But you can also try to online it with online database ... for standby_access which will put the database online in read-only mode, and you can still load subsequent log dumps (note that onlining such a database may not always be possible, but you'll find out soon enough).
But in all of this: no database dump needed!
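In concrete T-SQL terms, the primary-side sequence above could look something like this (a sketch only; the dump file paths and the quiesce tag 'mytag' are placeholders):

```sql
-- 1. Database dump, needed only to enable subsequent log dumps;
--    the dump file itself can be discarded afterwards
dump database MyDB to '/dumps/MyDB_throwaway.dmp'
go
-- 2. Quiesce, so the devices are in a consistent state
quiesce database mytag hold MyDB for external dump
go
-- 3. ...make the device copies in the storage subsystem now...
-- 4. Let suspended write activity proceed again
quiesce database mytag release
go
-- 5. Regular log dumps: these are the files you actually keep
dump tran MyDB to '/dumps/MyDB_tran_1.dmp' with standby_access
go
```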
Last month's quiz question was about -F and -L options of the BCP utility -- specifically, how these do not really help you in performing parallel BCP sessions with the aim of getting higher throughput for copying a table to a file.
As it happens, there is actually a way to achieve better BCP-out performance with parallel BCP sessions, though not with -F and -L.
Do you know how?
The trick to scale the throughput of BCP-out with parallel BCP sessions is to BCP out from views on the table, instead of from the table itself. This has been possible for as long as I can remember (the oldest running ASE version I have is 11.0.3, but I know it was possible in earlier versions as well).
How does that work?
Let's say you have a large table with order data over the past X years, looking something like this:
create table Orders (
    order_id    int,
    customer_id int,
    order_date  datetime
    -- ...more columns...
)
What you can do is create views covering a year each:
create view view_yr2005 as
select * from Orders where order_date between '01-Jan-2005' and '31-Dec-2005'
go
create view view_yr2006 as
select * from Orders where order_date between '01-Jan-2006' and '31-Dec-2006'
go
Now, what you need is an index such that select * from view_yr2005 will use that index -- for example, a clustered index on order_date, or a covering index starting with order_date and including all other columns.
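For instance, the clustered-index variant could be created like this (the index name is just an illustration):

```sql
-- clustered index on the views' filter column, so that each
-- 'select * from view_yrNNNN' can position its scan on order_date
create clustered index ix_orders_date on Orders (order_date)
```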
With this all in place, you can now BCP out from the view:
bcp my_db..view_yr2005 out yr2005.bcp -Umyself -Pmypasswd -SSYBASE -c
When looking at the query plan for the BCP, for example with application tracing (example here) or with the MDA table monSysPlanText, you should verify the index is indeed used.
When that works, you can now kick off multiple BCP's in parallel. In the following example, each BCP session will copy the rows for the year corresponding to the view definition (note: I am assuming Unix/Linux here so that you can run a command in the background with '&'):
bcp my_db..view_yr2005 out yr2005.bcp -Umyself -Pmypasswd -SSYBASE -c &
bcp my_db..view_yr2006 out yr2006.bcp -Umyself -Pmypasswd -SSYBASE -c &
bcp my_db..view_yr2007 out yr2007.bcp -Umyself -Pmypasswd -SSYBASE -c &
bcp my_db..view_yr2008 out yr2008.bcp -Umyself -Pmypasswd -SSYBASE -c &
bcp my_db..view_yr2009 out yr2009.bcp -Umyself -Pmypasswd -SSYBASE -c &
The idea is that these BCP sessions all use the index to position the scan at the start of their interval and then read the data from that point onwards. This should give you a faster way of copying all data from the table to a set of files. Of course, this is under the assumption that the network I/O will not be a performance bottleneck.
Nothing stops you from creating views on different periods, for example months or quarters: creating a view is basically free, since it is a quick operation requiring few resources.
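Since the views follow a simple naming pattern, you could even generate the backgrounded bcp command lines with a small shell loop like this (a sketch; server, login and password are the placeholders used above):

```shell
# Generate one backgrounded bcp command per yearly view, followed by
# a 'wait' so the generated script blocks until all sessions finish.
# Pipe or redirect the output into a shell to actually run it.
for yr in 2005 2006 2007 2008 2009; do
  echo "bcp my_db..view_yr${yr} out yr${yr}.bcp -Umyself -Pmypasswd -SSYBASE -c &"
done
echo "wait"
```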
Anyway, I guess you get the idea now...
Lastly, among the reader feedback on last month's question, which was also about BCP, were some comments along the lines of "it's hard to get excited about BCP". I agree BCP is not the sexiest of topics, but there is much more to it than you'd think, as these latest quiz questions show. Just check out this Google search, and you'll see I've done quite a few BCP-related quiz questions over the years. BCP is not as trivial as you might have thought! (though I won't blame you if it still doesn't get you excited).
As you probably know, the BCP utility has two options, -F and -L, which let you specify the first and last row to be copied, respectively (for an example, see the quiz question from April 2007).
Recently a customer asked me why -F and -L didn't provide the scalability he expected when copying rows from a table to a file with multiple parallel BCP sessions. The idea was to use -F and -L to copy out a different portion of the table in each BCP session. However, this did not turn out to provide the performance improvement he was looking for.
Do you know why?
First, it should be noted that the BCP options -F and -L are formally only supported for copying data from a file to a table. Yet, it actually also works when copying data from a table to a file.
This is one of those cases where the documentation is not terribly clear: it only describes the use of these options for BCP-in, and does not mention BCP-out.
Now, when specifying -F with BCP-out to mark the first row to be copied to a file, this may not work the way you'd expect: all preceding rows are still sent to the BCP client over the network.
As an example, when running bcp mydb..mytab out mytab.bcp -Usa -Pmysecret -SSYBASE -c -F10000, the output file will indeed only contain the 10000th row and beyond. But the preceding 9999 rows are still sent to the BCP client which will discard them; only at the 10000th row will it start writing data to the file.
When specifying -L to mark the last row to be copied, BCP will in fact stop once that last row has been reached, and no further network communication (or table reads) are initiated from that point onwards.
The background of this all is that BCP basically performs a table scan on the source table; inside the ASE server it has no knowledge of whether -F and -L were specified: these options are relevant at the client side only.
Since the network I/O for sending data from the ASE server to the BCP client constitutes most of the work for a BCP-out, this means there are hardly any efficiency gains to be expected by using -F and -L in a series of concurrent BCP-out sessions against the same table.
The ASE documentation actually hints that -F and -L are not intended for performance. Even though the following quote is about BCP-in, it suggests these options were never meant for scalability:
Avoid using the -F option when performing heavy-duty, multi-process copying,
as it causes bcp to generally spend more effort to run, and does not provide
you with a faster process. Instead, use -F for single-process, ad-hoc copying.
Just so that you know...