print versionprint version  
Sybase Replication Server
All versions
Last updated: 26 September 2009
Replication Server Tips & Tricks
 
As there appears to be surprisingly little information on the web about Replication Server issues, here's some information from my own experience with RepServer that might be useful. This is basically an arbitrary collection of some bits & pieces of information, without any claim for completeness.

Note: while much of the text applies to any version of RepServer, the examples are all assuming RepServer 12.0 (or later) in a Unix environment, except where indicated otherwise. Also note: "RepServer" is short for "Replication Server", "RepDef" for "Replication Definition", and so on.

Contents




Getting a free version of RepServer
There have been repeated questions in the Sybase community about getting a free version of RepServer. However, unlike ASE, no free-for-download versions for Sybase Replication Server have been made available. Until recently (that's mid-2009), your best bet to gain some experience with RepServer was to get friendly with your Sybase account manager and obtain an evaluation license (and get him to keep extending it).

However, there's an easier option now.
In the summer of 2009, hooking up with the cloud computing trend, Sybase made available some Amazon Machine Images for a number of products, including ASE, RepServer and IQ (details here). This opens up an interesting new way of getting your own private RepServer installation to play with. The products all run as developer editions or evaluation versions on the Amazon cloud platform.
Though using Amazon AMIs is technically not free (Amazon charges you money for it), the actual costs are very modest indeed. I test-drove these AMIs myself and wrote about my experiences -and the costs- on the data management blog. Check it out!



Basic RepServer troubleshooting steps
Sybase Replication Server is a great product as long as everything works fine. When things start to go wrong however, it has all the potential of taking your troubleshooting attempts deep into the night -- and drive you crazy before sunrise. Here's some information that you could try when you're in need of ideas.

1. Start searching at the RepServer(s)
When investigating a suspected problem, your starting point should be the RepServers in your replication system. There's not much point in looking in the primary or replicate dataservers if you haven't checked the status at RepServer level first. Therefore, if you don't know where to start, follow the below steps for all RepServers in your replication system.

2. Is the RepServer running ?
To start with the basics: first check the RepServer is running: log in using isql or do a process listing on OS-level (i.e. using a ps command on Unix or the task manager or services applet on NT). If the RepServer isn't running, this may be because the ASE server containing the RSSD has not been started first. This is a prerequisite because all information RepServer needs to operate is located in the RSSD.
If the RSSD is indeed down, the RepServer won't start and you'll see message #31083 in the RepServer errorlog:
Cannot connect to RSSD server.
When the RSSD server is running, but the RepServer can't login due to an invalid ASE password, you'll see error #31083 again, plus an additional message saying:
Message from server: Message: 4002, State 1, Severity 14 -- 'Login failed.
These messages usually indicate what is causing the problem, so always check the RepServer errorlog.

3. Check the RepServer errorlog
When there seems to be any kind of problem, your should check the RepServer errorlog, as this will usually contain error messages indicating what's been going on. To find out the exact location of the RepServer errorlog, issue the command admin log_name , which will display the full pathname of the errorlog file on the system where the RepServer is running.
Unfortunately, those error messages aren't always very clear, so you should try to figure out what they mean -- see the section Following up error messages.

4. Are there any threads down ?
When investigating a suspected problem, the first thing to check is whether any RepServer threads are down by issuing the RepServer command admin who_is_down (this is a brief version of admin who , which shows all threads irrespective of their status). When this shows nothing, everything is probably OK, although there are some nasty cases when there still is an error. Normally however, a thread which is down indicates a problem. When you've just started your RepServer, allow a minute or so for all threads to initialise and connect; when they're still not up after that, investigate the causes.
Some common situations are the following:
  • A basic rule: when a thread is up and running, it will have a status such as "Active", "Awaiting Command", "Awaiting Message", or other variations of "Awaiting <something>".
  • In a replicate RepServer, a "DSI" thread always comes with a corresponding "DSI EXEC" thread: together they represent the connection from the RepServer to a replicate database. There is also one such pair of threads for the RSSD.
    When the DSI/DSI EXEC threads have the status "Suspended", this can be the result of a suspend connection command (use resume connection to proceed). However, this also occurs when the RepServer was executing a command in that replicate database as part of a replicate transaction, and ran into an dataserver error. A common example is a "duplicate key" error when inserting an already existing key value into a replicate table having a unique index. You can try resuming the connection (using resume connection , but usually this results in the same error occurring again, suspending the connection once more. When this happens, the RepServer errorlog will always contain the error message returned by the dataserver.
    When you want to see the SQL commands that were issued as part of the replicate transaction, dump the first transaction in the corresponding queue (using sysadmin log_first_tran, see the RepServer QuickRef Guide for details).
    Once you understand why the SQL command leads to an error, you will have to take action to resolve the situation so that replication can continue. This may include manually modifying or removing data from the replicate table or deleting the replication transaction from the inbound queue (using the resume connection...skip transaction ).
    When you're running into replicate dataserver errors, you should have a good look at the design of your replication system as these errors suggest things are not going as was originally expected.
  • When you're running RepServer 11.5 or later, a thread called "REP AGENT" represents the ASE RepAgent (or LTM) for the corresponding primary database. When running RepServer 11.0, these threads are called "LTM USER" instead.
    When the status of this thread is "Down", the RepAgent/LTM is not running. For an LTM, you will have to start the LTM process; to start an ASE RepAgent thread, run sp_start_rep_agen from within the primary database. If the RepAgent or LTM won't start, check the dataserver errorlog or LTM errorlog, respectively; there will be some kind of error message logged indicating why it cannot start.
    When the status of this thread is "Suspended", this is because the command suspend log transfer was run; issue resume log transfer to resume the thread.
  • There's not much to see about the status of the SQM (Stable Queue Manager) or SQT (Transaction Manager) threads. You can get detailed information about what's happening to the queues using admin who, sqm or admin who, sqt (but let's leave it for now...).
  • When using routes, an "RSI" thread represents the connection going from the your local RepServer to another RepServer. When the status of such a thread is "Connecting", this means the RepServer at the other end of the route is down, and your RepServer is periodically trying to reconnect.
    A thread called "RSI USER" is the incoming end of a route: a connection coming from another RepServer to your local RepServer. When the status is "Inactive", it means that the remote RepServer isn't running or suspended the route.

5. Is the RSSD log full ?
The RSSD is continuously updated by the RepServer and as a consequence, the transaction log may fill up. When this happens, the RepServer action updating the RSSD will by default go into "log suspend" mode until the log is cleared. This will cause RepServer activity to suspend as well. Therefore, stop the RSSD log from filling up by dumping the log regularly and by installing thresholds.
Note that the both RSSD log and data segment can fill up quickly when running sysadmin log_first_tran or sysadmin dump_queue frequently or for large transactions. Clean up old transactions logged through these commands using the stored procedure rs_delexception from within the RSSD database.

6. Are the stable devices full ?
It's always good to check whether the stable devices have filled up; it's never too early too know. Use the RepServer command admin disk_space to see how far the stable devices have filled up.

7. Stop and restart threads, RepAgents and RepServers
The idea behind the RepServer architecture is that any component (RepAgent, RepServer, dataserver, network) can temporarily go down, and everything automatically continues running when the component comes back up. Usually this works quite well, but sometimes it just doesn't. When you have a situation where you cannot restart a certain RepServer thread, try shutting down and restarting the RepAgent/LTM, log transfer, DSI connection or route in a systematic way. Also, try shutting down the threads, then shut down/restart the RepServer itself, and restart the threads. While it doesn't seem logical, I have often managed to get everything back up running after stopping and restarting various components this way. It's worth trying, and it doesn't cost too much time or effort.
Note that this advice does not apply to the dataservers: surely you just don't want to shut these down, because (unlike the RepServer) there will be users connected to them !



Following up error messages
When something is wrong, there is often an error message of some kind. Here are some suggestions to follow up error messages.

Check error messages in the $SYBASE/$SYBASE_REP/doc directory
The directory $SYBASE/$SYBASE_REP/doc (on NT: %SYBASE%\%SYBASE_REP%\doc ) contains some files named error_messages_* (on NT: errmsg.*) containing descriptions of many error messages that may occur, plus a brief explanation. When you run into any kind of error, these files should be your first stop: check if the error message is listed (easiest by searching for the error number - I always use grep for this), and see what it means. In many cases, this will be very helpful to get an idea of what is going on.
Note that this only applies to error messages from the RepServer, LTM, RSM Server and rs_subcmp. Error messages from the ASE RepAgent are logged in the dataserver errorlog.

Check the RepServer Troubleshooting Guide
If you're stuck with an error message which you cannot find the meaning of, try the RepServer Troubleshooting Guide. As the name suggests, it contains suggestions for solving problems.

Call Sybase TechSupport
If nothing else works, you should consider opening a case with Sybase TechSupport, because you may easily be spending days looking for clues - which you may not find in the end (you do have a support contract, don't you ?).



Problems with rs_init
rs_init is a tool for configuring components of a replication system, such as databases, repservers, etc. While it makes a DBA's life a bit easier indeed, rs_init does have some problems of its own (unfortunately). This section lists a few common issues.

  • rs_init is located in the install subdirectory of the RepServer installation tree ($SYBASE/$SYBASE_REP). Make sure this is in your PATH.
  • A common problem is that you're trying to add a database and it appears that rs_init cannot access the dataserver because that server is not in the interfaces file. Adding the server to the interfaces file is the obvious solution, but then you'll find that it still makes no difference for rs_init. The reason is that rs_init reads the contents of the interfaces file when it starts; when making changes to the interfaces file, rs_init should be stopped and restarted for rs_init to notice the changes. Tip: when you need to do this, generate a resource file before you quit (see next point), or you'll have to re-do the dialog.

  • On Unix, you can generate a resource file from rs_init by hitting CTRL-W (on NT, resource files cannot be generated or used with rs_init). This saves all the information you've entered in the preceding dialog into the resource file, which can be executed from the command line as rs_init -r <your_resource_file_name> .
    One problem is that the file format of rs_init resource files isn't documented very well. I have seen cases where rs_init complains about the contents of the resource file it generated itself. If you run into this sort of problem, there's no documentation that will help, but you may be able to fix it by manually editing the resouce file; this may require some creativity on your side, though.
    In one such case I remember, the error message said that the IDserver name was not specified (it said something like the IDservername being '' (=two single quotes)). This could be fixed by adding a line rs.rs_idserver_name: RSNAME , where RSNAME was the IDserver's name (funny enough, the resource file did contain a line saying rs.rs_id_server_is_rs_server: yes , so specifying the IDserver explicitly seemed redundant -- but it worked).
    If you're having any problems with invalid resource file contents, check the example resource files which exist in the init/rs subdirectory in the RepServer installation tree ($SYBASE/$SYBASE_REP). Try to figure out whatinformation rs_init wants in that resource file, and how you can supply it. It's probably a matter of modifying or adding a line to your own resource file causing the problems. You're on your own though, as there is no documentation here (only some comments in the example files): in the example above, my resource file was almost identical to the setupdb.rs example file, but the line I eventually added was copied from the install.rs file... that's where your creativity comes in.
  • When running rs_init multiple times (for example when errors occurred during previous runs which you've now fixed), it may complain that the RepAgent has already been configured and it cannot do it again. The error message is:
  • Unable to execute query 
    'exec sp_config_rep_agent "YOUR_DBNAME", enable, 
    "YOUR_RSNAME", "YOUR_RSNAME_ra", 
    "YOUR_RSNAME_ra_ps"' against server 'YOUR_DS_NAME'
    
    When this error occurs, disable the RepAgent first as follows: sp_config_rep_agent YOUR_DBNAME, disable . Then re-run rs_init.

    The following error may also be reported (it is related to the above message, and should disappear after disabling the RepAgent):
    INTERNAL ERROR: "Database 'YOUR_DBNAME' is already
    configured to use Replication Agent. Request to
    enable Replication Agent has been ignored."
    	
  • rs_init may generate the following messages, which, despite the message saying it's an error, are really informational messages, and these can therefore be ignored. This often happens when re-running rs_init after fixing a previous error in the setup:
  • INTERNAL ERROR: "The replication status for
    'rs_lastcommit' is already set to false. Replication
    status for 'rs_lastcommit' does not change."
    
    INTERNAL ERROR: "The replication status for
    'rs_update_lastcommit' is already set to true.
    Replication status for 'rs_update_lastcommit' does
    not change."
    
  • When adding a database which was previously dropped from the replication system, the following error may occur:
  • INTERNAL ERROR: "Settrunc() is not permitted when a
    Rep Agent Thread is currently running on the database
    
    This error means that the RepAgent is still running in that database. Stop the RepAgent as follows: sp_stop_rep_agent YOUR_DBNAME . Then re-run rs_init.

  • When setting up a warm standby through rs_init, two common errors may occur:

    • Before adding the active database for the warm standby, you must manually create a logical connection for the warm standby (using the command create logical connection SOME_SERVER.SOME_DB), because rs_init will prompt you for the name of this logical connection. You can not create this logical connection through rs_init. It is actually in the small print in the docs, but many people (including myself) found out the hard way (that's the RepServer way !).

    • When adding the standby database using rs_init, this will normally fail because rs_init cannot log in to the dataserver containing the standby database. This is because the maintenance user login which rs_init tries to use, is not there yet; unlike the maintenance user login for the active database, which is created automatically by rs_init, the maintenance user login for the standby database isn't. Therefore, you must create this login manually and then re-run rs_init.



More Troubleshooting Tips

Message loss
When RepServer detects that messages may have been lost, it goes into "message loss" mode. When this happens, transactions coming from the primary dataserver won't be applied at the replicate dataserver anymore, and they'll accumulate in the stable queues. However, from the admin who output, all threads seem to be up and running, so everything looks OK at first sight. The only way to find out about this status is to check the RepServer errorlog, and check for loss detection messages. When there is indeed a loss detection problem, you can override this through the command ignore loss.
I have once had a pretty weird problem case where message loss had been detected between two RepServers, but ignore loss didn't help at all. This was very strange, and in the end the trick appeared to be as follows: in the RSSD of the replicate RepServer, the rs_exceptslast table contained a row for the route where the status column had the value 2 (meaning it's rejecting transactions due to message loss). The solution was to suspend the RSSD DSI, updated this column to 0 manually, and then resume the DSI. After that, everything was OK again. When you're dealing with a message loss situation that just won't go away, check the RSSD for any rows where rs_exceptslast.status = 2 or rs_oqid.valid = 2; when this exists, you could try updating these to 0 manually. However, note that manually patching the RSSD is risky and probably won't win you the sympathy of Sybase TechSupport afterwards; in other words: this is your own risk.

Transaction cache size too small
I've had a problem case once where everything looked OK, all threads were up, and there were no error messages logged anywhere. Still, no replicate transactions were being applied anymore. After digging into the queues for some time, an error message finally showed up when examining the SQT for the inbound queue:
1> sysadmin sqt_dump_queue , 123, 1, 0
2> go
Msg 6024, Level 12, State 0:
Server 'REP_PRIM1':
Block read failed for queue '123:1', segment 304552,
block 26. OS dependent error is 'Could not initiate
aioread errno = 14

This error sounded pretty alarming, as it suggests there is some low-level disk error. After consulting Sybase TechSupport, it appeared that error 6024 actually indicates that there is a transaction which is too big for RepServer to handle, causing the transaction not to be written from the inbound to the outbound queue (you wouldn't have guessed, would you ?). The solution was to increase the RepServer configuration parameter sqt_max_cache_size (through configure replication server and restart the RepServer).



Various Technical Tips

Viewing current RepServer configuration settings
RepServer configuration parameters can be modified using the "configure replication server" command (some changes will be effective only after a RepServer restart). Config parameter settings are stored in the RSSD in the rs_config table. However, it is not possible to see the current settings from within RepServer. For this, you should log into the RSSD ASE server, and issue the following command from within the RSSD database:
select * from rs_config
Note there is even a brief comment for most config options. RepServer *is* user-friendly here and there !

Use an LTM or a RepAgent ?
In an ASE environment, when you're running at least RepServer 11.5 and ASE 11.5, you can use the internal ASE RepAgent. When you have an earlier version of either ASE or RepServer, you must use an external LTM process. Although LTMs are still supported for current ASE and RepServer versions, it is advisable to use RepAgents when you can, because it makes a DBA's life easier:
  • an LTM is an extra OS-level process requiring an entry in the interfaces file; a RepAgent doesn't need this because it's an internal ASE thread.
  • an LTM needs to be started explicitly by the DBA; once it has been configured, a RepAgent starts automatically when ASE starts.
  • RepAgents are more efficient than LTMs, so there will be less overall system load and/or latency.
  • an LTM can be greedy: when there's a large transaction to be replicated, LTMs have been known to allocate as much memory as needed for this transaction, but never deallocate it again, which may require the DBA to stop & restart the LTM. ASE RepAgents don't have this problem.

Configuring a RepAgent requires (among other things) executing the stored procedure sp_config_rep_agent. See the RepServer QuickRef Guide for details.



DBA tips

Don't remove the secondary truncation point !
When your replication system is up and running, that's how you want to keep it. When something goes wrong, leading to replication being halted somewhere along the chain of events, this can eventually lead to the transaction log of the primary dataserver filling up. In such cases, a DBA could decide to remove the secondary truncation point in the transaction log (a.k.a. the "LTM truncation point"), to allow the primary dataserver to continue processing. While this may indeed be a solution from one point of view, be aware that this will likely cause your replication system to get out of sync because primary transactions won't be replicated anymore. As a DBA, you will then have to re-synch the primary and replicated data; depending on the complexity of your replication system, this may be anywhere between a minor headache and a complete nightmare. Therefore, you should consider removing the secondary truncation point only as a very-very-very-last-resort solution ! It may be more practical to allocate extra disk space for stable queues so that primary log transfer can continue than to reconcile primary and replicate data afterwards.

Have some disk space ready for emergencies
See the previous point. One good feature of RepServer is that you can not only extend the stable queues (using the add partition command), but you can also shrink them again through drop partition -- this command can be issued at any time, and will drop the partition as soon as there's no data on it anymore. Therefore, keep some disk space at hand which you can use to extend the stable queues should this ever be necessary.

How long can your system survive without a DBA ?
There's an interesting (and important) issue with respect to the sizing of your stable queues. Basically, your primary system should be able to continue for at least the time period that no DBA will be looking after the system -- and preferable a little longer than that. Depending on your system and organisation, such a no-DBA period can be a night, a weekend or just a few hours. Getting this sizing right will require monitoring the data volume being replicated.
If you like being paged, you could a cron job checking the free space in the stable queues and paging you when it fills up; very much like the sort of thing that tells you when a dataserver's transaction log fills up (pagers are most popular in America - Europeans can do something similar by having an SMS message sent to a GSM phone). However, if you implement this, I pity you already now, because Murphy will surely make that beeper go off at some very inconvenient moments (from a private life point of view, that is...).

Don't use the RepServer "sa" login
When creating or dropping a subscription, the replicate RepServer will log into the primary RepServer and -when using automatic materialization- into the primary dataserver. It will do this using the same login and password as you used to log into the replicate RepServer (where the subscription is created from). This implies that the same login and password must exist in the replicate and primary RepServer and the primary dataserver.
Therefore, when using the RepServer "sa" login, this means the "sa" password in the dataserver cannot simply be changed anymore as it would be out of sync with the RepServer's "sa" password, causing subscription creation to fail. When there are multiple primary dataservers in the replication system, they would all need to have the same password for their "sa" login. Therefore, best don't use the RepServer "sa" login to avoid this sort of dependency.
I prefer to create a login named "ra" (kinda short for "rep admin") in all RepServers and dataservers with the same passwords everywhere, and give these logins all necessary privileges. This way, I never have to use the RepServer "sa" login anymore, avoiding the problems described above.

Handle your RSSD with care
The RSSD, which is an ASE database, is the place where RepServer stores all the data it needs to be able to operate (not counting its configuration file), including administrative data such as the state and position of the stable queues. Otherwise, the RepServer itself doesn't store any data at all, with the exception of the configuration file, specifying where the RSSD is located. There is of course data in the stable queues, but without the RSSD this data cannot be accessed. This makes the RSSD an important database in your replication system, so ensure the RSSD is treated with the same care as your normal production databases. All considerations about recoverability, database dumps, log dumps and DBCC checks apply to the RSSD as well. For the same reason, don't enable the "trunc log on chkpt" option on the RSSD as this affects recoverability.



Keep the RepServer errorlog
Make sure the RepServer errorlog is not deleted, but kept for at least a few months. In some cases, an indication of an error condition may be logged just once, and never again after that anymore. When there's been a problem you can't resolve, you may want to dig into your errorlog for quite some time back -- it's a bit of a bummer when you've deleted it. This is indeed not very desirable behavior, but it's just one of those typical RepServer features...

Protect your RepServer config file
The RepServer configuration file contains the login and password used to contact the RSSD server and ID server. This makes these files interesting targets for other people of a lesser moral standard, so make sure it is well-protected. In case you use LTMs, the same applies to LTM config files.
Because the RSSD contains the logins and passwords for the primary and maintenance logins, make sure this database (as well as its dumps !) are properly protected. When security is important to you, use the encryption options and security mechanisms where possible.

(pre-12.0) Install RepServer and ASE in separate directory trees
Note: this point applies only to pre-12.0 versions of ASE and RepServer. Starting with versions 12.0, the directory structure has changed so that all Sybase products reside in their own subdirectory under $SYBASE (NT: %SYBASE%). Thus, in this new stucture, there is no danger anymore of libraries being overwritten by installation or upgrade of a different product.

In pre-12.0, by default ASE and RepServer are installed in the same installation directory; however, it's best to install them in separate directory trees. This has the advantage that you can upgrade ASE without risking problems with RepServer (or the other way around) -- these problems may occur because there's only a single lib subdirectory, and the existing libraries (DLLs) in this directory may be overwritten when a different product is installed or upgraded.
When installing ASE and RepServer in separate directories, you'll have to set the $SYBASE variable correctly in each RUN_SERVER file. Also, you should explicitly specify the path to the interfaces file the servers will use (assuming you don't want to maintain multiple copies).
This recommendation mainly applies to Unix because separating RepServer and ASE isn't so easy on NT. Reinoud van Leeuwen describes an interesting trick to install the products separately on NT as well (and still run the servers as NT services), but I found that it is impossible to upgrade RepServer after this (or at least I couldn't find out how to do it). Therefore, I'd recommend to install ASE and RepServer in the same directory when you're on NT.

Create a log of problem cases you've handled
Being indispensable can be good for your ego and your hourly rate, but you may have seconds thoughts when they call you in the weekend or during your holidays because you're the only person who knows what to do when the replication system is having trouble. I recommend creating a log of the problem cases you solved, including the relevant error messages and actions you took to resolve them. Someone else may then be able to fix the problem using this information. Don't forget to include the phone number of Sybase TechSupport and your Sybase customer number.
And don't worry about your job security: when someone else has to do some ad-hoc RepServer troubleshooting while you're away, chances aren't very high they'll say "cool, I'd really like to this every day...".

Keep your replication system simple
A good thing of RepServer is that it's very flexible and powerful. Unfortunately, there is a dark side to this as well: you can create replication systems which are so complex that it may not be possible to recover the entire system in a reasonable amount of time, once something goes wrong somewhere. Therefore, if you have anything to say about the design of your replication system, try to keep it simple and straightforward. If it's not necessary, don't use bi-directional replication, function strings, request functions, and don't mix "normal" replication with function replication; this will go a long way to preserve your sanity...

Use 'sqsh' instead of multiple open sessions
As a RepServer DBA, you're usually having various isql (or sqsh) sessions on your screen to connect to each RepServer, the corresponding RSSDs, and the various primary and replicate dataservers. With a sqsh trick, you can do all that in just a single sqsh session.

Replicating logins, passwords, users & groups
It is not possible to replicate data in system tables, but sometimes you may need exactly that. For example, in a warm standby system, it will be necessary to have identical logins and passwords in the ASE server hosting the primary (active) database and in the server containing the replicate (standby) database. After all, the whole idea of a warm standby is that client applications should be able to switch over to the standby database and continue processing as if nothing changed. Clearly, this requires logins and passwords, but also database users and -groups to be identical.
When running RepServer 11.5 or later, almost all DDL commands within a user database can be automatically replicated to the standby database in a warm standby system using the sp_reptostandby command.
While useful, the sp_reptostandby feature has some limitations: it won't work for logins and passwords (as these are stored in the master database), and it only works in a warm standby environment. Mark A. Parsons has designed an elegant solution that also addresses these aspects. From here, you can download his presentation (note: 330 Kb) for the Australasian TechWave2000 conference explaining the setup, and including script files to actually implement it.
Even if you don't need to replicate logins, passwords etc., I still recommend you check out Mark's solution, because it is a great example of how to use RepServer features to implement some functionality that crosses the boundaries of a single server.

Note: for completeness, note that ASE 12.0 supports the synchronisation (that's something different than replication !) of system table data through the HA (High Availability) feature. However, this is a completely different topic which isn't discuss any further here.



Warm standby tips

Setting up a warm standby
When setting up a warm standby, keep these things in mind:
  • Best keep the database names of active and standby databases the same. This lowers the risk of errors in client- or server-level software which may be referring to the database name explicitly -- remember, the purpose of a warm standby is, presumably, to be able to switch the active database at some point: your applications should still work correctly after that switch.
    Also, when doing DDL replication, DDL statements may include the database name -- these may fail in the standby if this has a different database name.
  • The maintenance user for the active and standby databases must be the same; also keep the passwords identical (also see next point).
  • You must ensure that logins and passwords are synchronised in the ASE servers containing the active and standby databases. This is to ensure that client applications can still log in after a switchover. RepServer doesn't give you solution for this, so you'll have to make something yourself. There are roughly two options for this: first, following Mark A. Parsons' ideas described elsewhere in this document; second, by amending sp_addlogin, sp_modifylogin, sp_droplogin and sp_password in such a way that they execute themselves as an RPC in the standby database after successfully completing in the active database -- this requires some hacking in the SQL text of these procedure.
  • Also, the mapping between suid values (for logins) and uid values (for database users) should be kept identical in both servers. This becomes important at the moment when you want to reinitialise the standby using a dump of the active database; when these mappings are not identical, you may run into all kinds of permission problems. Note that this is always an issue when loading dumps into other servers, as such it is not a replication-specific point.
    The consequence of this issue is that you should not create logins in the standby server when these logins have not been created in the active server first.
  • Be sure to keep replication settings identical in the active and standby databases. A classic mistake is to mark a specific table for replication in the active database, but forget to do this in the standby. There will probably not be a problem until you have to switch to the standby, when you (or likely, someone else) will find out that table is not properly replicating to the new standby...
    Therefore, every time when you execute sp_setreptable or sp_setrepproc in the active database, do the same in the standby. Similarly, when using sp_reptostandby , make sure the standby database has been configured in the same way.
  • Try to keep things simple: stick to a 1-to-1 active-standby setup and simply replicate all tables. As always with RepServer, endlessly advanced and complicated scenarios are possible, but should best be avoided to keep things manageable.

Setting up a warm standby using rs_init
It's probably easiest to use rs_init for setting up the active and standby databases in the warm standby. However, there are some things to be aware of. Go here for more information.

Basic steps for setting up a warm standby
The simplest way of setting up a warm standby is to use the T-SQL stored procedure sp_reptostandby -- this requires you run at least ASE 11.5 and RepServer 11.5. When executing sp_reptostandby my_db, 'all' , all tables in database my_db will be replicated to the standby, without having been marked for replication explicitly using sp_setreptable (this works because a "kind of replicate everything in this database" marker is set for the database, making it irrelevant to mark individual tables for replication). When using sp_reptostandby, DDL replication is also enabled.
Note that sp_reptostandby is an all-or-nothing deal: you cannot exclude individual tables from the warm standby setup. On session level, you can choose to enable or disable DDL replication using the T-SQL command set replication -- see the RepServer QuickRef for command details.

When you need more flexibility in your warm standby definition, don't use sp_reptostandby, but use sp_setreptable and sp_setrepproc to mark all tables and stored procedures which are to be replicated to the standby.


Initialising the standby with a dump
To (re)initialise the standby database, the simplest way is to load a dump from the active database into the standby, after (re)creating the connection to the standby database.
Note that the order of these two steps is essential: you must first run create connection with the use dump marker clause (either manually or through rs_init), and then dump the active database and load it into the standby. If you do it the other way around (first dump the active DB, and then run create connection ), there will be no replication to the standby, and admin logical_status will forever show the status Active/Waiting for dump marker.
More precisely: after executing create connection with the use dump marker clause, wait until admin logical_status shows the status of the standby connection as Suspended/Waiting for Enable Marker and "State of operation in progress" is None; as long as the latter says Attempting to place marker in the Active Log, do not dump the active database yet !

When you've done things the wrong way around, simply drop the connection to the standby that you just created, and then do the steps in the right order.



RepAgent configuration problems
When initialising the standby database by loading a dump from of the active database, you should be aware of the following issue in ASE 11.5 and 11.9.
The dump of the active database contains the config settings for the RepAgent for the active database. When loading this dump into the standby database, the RepAgent config settings are loaded as well, causing these to be wrong, because the "connect_dataserver" and "connect_database" settings are still those that were made for the active database. You'll hit this problem after you've done the switchover and you start the RepAgent for the new active (former standby) database: the RepAgent will shutdown again, but no error message is displayed on the client; the RepServer error log will contain error 14082, and the ASE error log will mention this error as well.
When you have this problem, run sp_config_rep_agent to set the right values for "connect_dataserver" and "connect_database" and restart the RepAgent.
Note: this problem seems to have been fixed in ASE 12.0.


Switching over to the standby
When running a warm standby system, there will (sooner or later) probably be a moment when you'll have to switch the active database to the standby (this is why you have a warm standby in the first place). This consists of the following actions:

1. Switch databases at RepServer level using the switch active command.

2. Start the RepAgent for the new active database (using sp_start_rep_agent ) -- this does not happen automatically as part of the switch active command.

3. Switch client applications to the ASE server and database which has now become the active database. This may not be as straightforward as it sounds, as this typically requires client applications to connect to a different ASE server than before the switch, which means that some special action should be taken.
Here are some possible approaches for tackling this issue:
  • the most elegant solution is to solve this on network (DNS) level, independent of ASE and RepServer. Suppose you have two Unix systems, one named "batman", hosting the primary dataserver and one named "robin" for the standby server. Now make sure the client applications have interfaces files where the dataserver they connect to is specified on to run on a host named "gotham". However, there is no system named "gotham"; instead, the network is configured such that the name "gotham" resolves to "batman" (this can normally be done on DNS level on your LAN). When switching to the standby, reconfigure the DNS so that "gotham" now resolves to "robin". This way, the client applications do not need to do anything at all when switching to the standby (except reconnect). Obviously, the port numbers of the dataservers on both systems should be identical for this to work.
  • the application user explicitly specifies that a different dataserver is now to be used; however this is not desirable, because you would like to hide these issues from the end-user.
  • the interfaces file used by the client applications is changed so that they will connect to a different ASE server while using the same servername on client level. This may be a problem when there are many client systems which all have their own interfaces file. You'll have to figure out a way of getting this done.
    A solution which I've once seen for this is worth mentioning, particularly because of the dangers that come with it: you could add a second "query" line to the interfaces file, so that the first is for the active server, and the second for the standby. The idea is this: as long as the active server is up, all applications will connect to it, as it is the first "query" entry. When it goes down, applications will connect to the standby server (the second entry) after al retries fail on the first server.
    The danger of this approach is that when the first server is running fine but for any reason the connection attempts fail, the application will move on to the second query entry and happily connect to the standby database, and possibly execute DML statements there; obviously, this may lead to a lot of trouble in your database, so think *very* carefully before you'd use this trick...
  • all applications could have a bit of logic built in which retrieve the server name of the ASE server they should connect to. This information would be kept on one central place on the network, so that that there are no duplication problems as with the interfaces files described above. Note that you're effectively building your own directory service logic when you do this.



RepServer versions and platforms

ASE versions vs. RepServer versions
Unlike the version numbering suggests, there is no strict relationship between ASE and RepServer versions. Basically, all ASE versions later than 4.9 will work with any version of RepServer. However, certain features may only be available for specific combinations of ASE and RepServer; for example, you need both ASE 11.5 and RepServer 11.5 to use ASE RepAgents.
Any version of ASE will do for the RSSD.

Differences between RepServer versions 11.0, 11.5 and 12.0
Taking RepServer 11.0 as a starting point (i.e. forgetting about earlier RepServer versions, which are not supported anymore), this gives you all the main replication features as well as function replication and warm standby.
The main new features in RepServer 11.5 are the internal ASE RepAgent, DDL replication for warm standby, publications and improved administration functionality.
RepServer 12.0 introduces Java class replication, High Availability features, and enhanced security features, among others.

Supported platforms
RepServer runs on the main Unix platforms and on NT. Check with Sybase TechSupport for the latest status as products are ported to other platforms regularly. Support for Linux is on its way (see next point)

RepServer for Linux
The first version of RepServer supported on Linux is RepServer 12.0 (released Q1, 2000). While available as a regular product, there seems to be hardly any information about it on Sybase's website. In any case, the product number for RepServer-on-Linux is 18685; if you call Sybase Sales with this number they should be able to give you full details. Note that, unlike ASE-for-Linux, there is no "free" version available for download.
Also, it appears that the standard license for RepServer-on-Linux contractually only allows you to replicate between Linux-based ASE servers (see the next paragraph for an interesting anecdote). For replicating to other platforms than Linux, further licensing seems to be required (note that this is stuff that I've heard from others; actually, I'm staying away from all this contractual and sales-related stuff as much as I can...).

Someone told me the following story: he'd tried to buy one RS-for-Linux license for his personal training, assuming this wouldn't be too expensive. According to Sybase Sales, the base license cost $700. However, they wouldn't sell this to him, unless he also purchased two full ASE-for-Linux deployment licenses at $900 each (one license for each ASE server participating the replication system -- Sybase Sales assumes one of those holds the RSSD I think). And then finally, the price of the RepServer license had to be multiplied by N, where N is the number of ASE servers you're using with RepServer. So the overall pricetag was 2*$700 + 2*$900 = $3200 -- not exactly the kind of price he'd had in mind (for the record: he decided not to buy).

For primary and/or replicate dataservers, ASE 11.9.2 is available for Linux; note that ASE 12.0 will not be available for Linux, but ASE 12.5 allegedly will be (release date somewhere in 2001).

Supported RDBMS's
RepServer allows you to replicate between different types of database, for example from Oracle to DB2. Data can be replicated from ASE, ASA (formerly known as SQL Anywhere), Oracle, DB2, Informix, MVS mainframe data sources, and --expected to be available somewhere in Q1, 2000-- MS SQLServer.
In addition, you can in principle build your own replication agent for other data sources - but don't expect this to be easy. When you plan to do this, it's worth asking around first. For example, it seems there's someone at Sybase South Africa who once built a RepAgent for Adabas.
Data can be replicated to all these databases as well as to just about anything else.
Note that, no matter the types of databases involved, every RepServer always has an RSSD (Replication Server System Database) which must be located in an ASE server (as of RS 12.6, the RSSD can also be located in an 'embedded' ASA server).



Miscellaneous

History
RepServer was developed by Sybase in the beginning of the 90's on request of, and initially funded by a Sybase customer (actually a US-based bank), that wanted a product for data replication. Sybase itself has continued development of the product since then.

RepServer basics
RepServer is a "data movement" product; basically, it just copies data modifications from one location to another. The most important characteristics are:
  • The unit of replication is a transaction, not just an individual DML statement.
  • RepServer guarantees that replicated transactions are applied in the same order as they occurred on the primary side.
  • In principle, when individual components (dataservers, RepServer, network connections) of a RepServer-based system are temporarily unavailable, this should be transparent to the overall replication system. When a component becomes available again, replication should continue with minimal or no further manual DBA action. Most of the time, this works quite well.

How do you get familiar with RepServer ?
I'd say there is no easy, painless way to learn how to use RepServer. As a start, make sure you know everything a normal DBA should know about the RDBMS platforms involved in your replication system. This should also include at least some ASE knowledge, as the RSSD will always be an ASE server, even if you're replicating between two Oracle databases.
When you're replicating between ASE databases, you must understand how the ASE transaction log works, how the log is dumped and truncated, how to enlarge databases and logs, how your choice of disk usage affects recovery issues... in short: the works.
Next, I recommend you take a RepServer training from Sybase ("Fast Track to Replication Server"). After that course, you'll probably be back in your office wondering how you should ever manage to understand this thing -- don't worry, it is possible !
In my experience, the only way to learn RepServer is to create -- and solve -- many problems, because only then will you get a feel for the product. The best way to gain this experience is to set up your own private little replication system from scratch: try to create a setup where you replicate just a simple 1-column table from one database to another. By the time you have succeeded in getting this to work, you will have learned a lot -- even though you already did this on the Sybase training. Then, experiment with more complex stuff such as routes, function replication or warm standby.
Also, make sure you have the RepServer documentation handy: go here for details where to get this from.
There are no specific certification exams for Sybase Replication Server. For replication in an ASE environment, it seems reasonable to expect a RepServer DBA to have an ASE DBA certification.

Why is RepServer so difficult ?
I've heard many different opinions on RepServer, ranging from "wonderful" to, as I once heard someone say, "the worst thing since Pompeii". While the latter may be overstating it a bit, RepServer is certainly not an easy product to set up, administrate or troubleshoot. One reason for this is that RepServer doesn't seem to have been developed with user-friendliness as a prime objective; it's a typical example of a very powerful, flexible, yet difficult-to-use piece of software, which is best appreciated by hard-core technology diehards.
Another factor is that the scope and complexity of your DBA work expands significantly when replication becomes involved: those previously independent dataservers are now suddenly closely related because data is replicated between them. This effectively means that they have become one big system from a DBA point of view, thus making a DBA's job significantly more difficult.

Cost of deploying RepServer
It tends to be pretty expensive to use RepServer. On the one hand, the license fees can be stiff, as they tend to be based on the number of users in your dataservers. More importantly though, you'll need plenty of experienced and skilled DBA staff, both for your dataservers and for RepServer, which tend to be expensive.

What kind of organisations use RepServer ?
Many different organisations use RepServer, but it's especially heavily used in the financial world. Many banks have implemented complex bi-directional replication systems, for example to link their security trading operations at stock exchanges around the world. This is the reason why many RepServer DBAs can be found in places like New York, London and Tokyo.

What about replication for ASA ?
Replication for Adaptive Server Anywhere (formerly known as SQL Anywhere) is done through SQL Remote. This is a different product which is not related to, and works completely different than, RepServer.



RepServer documentation

RepServer manuals
Some of the existing RepServer manuals are the "Reference Manual" (descriptions of all commands), the "Administration Guide" (extensive discussions of administrator tasks, organised by topic) and the "Troubleshooting Guide" (well, for troubleshooting obviously). The "Design Guide" deals with different possible designs of replication systems, such as bi-directional replication and conflict resolution, and changing ownership.

As all Sybase documentation, the RepServer manuals can be downloaded from the Sybase web site as printable PDF files. To obtain these files, go here , and click on the "Core Documentation Set" for the RS version you're interested in. You should then see a list of available manuals, both as PDF documents as well as online searchable documents.

The Sybase web site also provides handy tips and tools at CodeXchange (free registration required).

Certification/compatibility info
Information about compatibility between different versions of RepServer and ASE on different platforms can be found here and here.

RepServer System tables poster
You can download a system tables poster for RepServer 12.0 from the Sybase website as a PDF document. Go here for more information.

RepServer books
There are very few books covering technical aspects of RepServer, but here's a few suggestions:
  • Another book is "Sybase Replication Server Primer" by C.B. Clifford, McGraw-Hill, ISBN 0-07011515-X. This book is only for those who have read all the existing RepServer manuals and still want more. It describes architecture and internals in detail, and is very technical. As such, it isn't easy reading.
    My 1995 edition covers only RepServer version 10 -- I don't know if there are new editions covering more recent versions. The book contains a useful appendix with SQL queries to manually query the RSSD.
RepServer newsgroups
The Sybase news server, forums.sybase.com , carries some newsgroups dedicated to RepServer: sybase.public.rep-agent and sybase.public.rep-server.
There's also a newsgroup about ASA replication (but note: this is quite a different topic!): sybase.public.sqlanywhere.replication.

Professional certifications for RepServer
Since April 2003, it is possible to become Certified Administrator for Replication Server 12.5 by passing a certification exam. More information is here.



 
 This document is located at www.sypron.nl/rs_tips.html