As
there appears to be surprisingly little information on the web about
Replication Server issues, here's
some information from my own experience with RepServer
that might be useful. This is basically an arbitrary
collection of some bits & pieces of information, without any
claim for completeness.
Note:
while much of the text applies to any version of RepServer, the examples are all assuming RepServer 12.0 (or
later) in a Unix environment, except where indicated
otherwise. Also note: "RepServer" is short for
"Replication Server", "RepDef" for
"Replication Definition", and so on.
Contents
Basic RepServer troubleshooting steps
1. Start searching at the RepServer(s)
2. Is the RepServer running ?
3. Check the RepServer errorlog
4. Are there any threads down ?
5. Is the RSSD log full ?
6. Are the stable devices full ?
7. Stop and restart threads, RepAgents and RepServers
Following
up error messages
Check error messages in the $SYBASE/$SYBASE_REP/doc directory
Check the RepServer Troubleshooting Guide
Call Sybase TechSupport
Problems
with rs_init
More
Troubleshooting Tips
Message loss
Transaction cache size too small
Various
Technical Tips
Viewing current RepServer configuration settings
Use an LTM or a RepAgent ?
Some RepServer tips on other websites
DBA tips
Don't remove the secondary truncation point !
Have some disk ready for emergencies
How long can your system survive without a DBA ?
Don't use the RepServer "sa"
login
Handle your RSSD with care
Use 'sqsh' instead of multiple open sessions
Keep the RepServer errorlog
Protect your RepServer config file
(pre-12.0) Install RepServer and ASE in separate directory trees
Create a log of problem cases you've
handled
Keep your replication system simple
Replicating logins, passwords, users & groups
Warm standby tips
Setting up a warm standby
Setting up a warm standby using rs_init
Basic steps for setting up a warm standby
Initialising the standby with a dump
RepAgent configuration problems
Switching over to the standby
RepServer versions and platforms
ASE
versions vs. RepServer versions
Differences between RepServer versions
11.0, 11.5 and 12.0
Supported platforms
RepServer
for Linux
Supported RDBMS's
Miscellaneous
History
RepServer basics
How do you get familiar with
RepServer ?
Why is RepServer so difficult ?
Cost of deploying RepServer
What kind of organisations use
RepServer ?
What about replication for ASA ?
RepServer documentation
RepServer manuals
Certification/compatibility info
RepServer System tables poster
RepServer books
RepServer newsgroups
Professional certifications for RepServer
Getting a free version of RepServer
There have been repeated questions in the Sybase community about getting a free version of RepServer. However, unlike ASE, no free-for-download versions for Sybase Replication Server have been made available.
Until recently (that's mid-2009), your best bet to gain some experience with RepServer was to get friendly with your Sybase account manager and obtain an evaluation license (and get him to keep extending it).
However, there's an easier option now.
In the summer of 2009, hooking up with the cloud computing trend, Sybase made available some Amazon Machine Images for a number of products, including ASE, RepServer and IQ (details here). This opens up an interesting new way of getting your own private RepServer installation to play with. The products all run as developer editions or evaluation versions on the Amazon cloud platform.
Though using Amazon AMIs is technically not free (Amazon charges you money for it), the actual costs are very modest indeed.
I test-drove these AMIs myself and wrote about my experiences -and the costs- on the data management blog. Check it out!
Basic RepServer troubleshooting steps
Sybase Replication Server is a great product
as long as everything works fine. When things start to go
wrong however, it has all the potential of taking your
troubleshooting attempts deep into the night -- and drive
you crazy before sunrise. Here's some information that
you could try when you're in need of ideas.
1.
Start searching at the RepServer(s)
When investigating a suspected
problem, your starting point should be the RepServers
in your replication system. There's not much point in
looking in the primary or replicate dataservers if
you haven't checked the status at RepServer level
first. Therefore, if you don't know where to start,
follow the below steps for all RepServers in your
replication system.
2.
Is the RepServer running ?
To start with the basics: first
check the RepServer is running: log in using
isql
or do a process
listing on OS-level (i.e. using a
ps
command on Unix or
the task manager or services applet on NT). If the
RepServer isn't running, this may be because the ASE
server containing the RSSD has not been started first.
This is a prerequisite because all information
RepServer needs to operate is located in the RSSD.
If the RSSD is indeed down, the RepServer won't start
and you'll see message #31083 in the RepServer errorlog:
Cannot connect to
RSSD server.
When the RSSD server is running, but the RepServer
can't login due to an invalid ASE password, you'll
see error #31083 again, plus an additional message
saying: Message from server: Message: 4002,
State 1, Severity 14 -- 'Login failed.
These messages usually indicate what is causing the
problem, so always check the RepServer errorlog.
3.
Check the RepServer errorlog
When there seems to be any kind of
problem, your should check the RepServer errorlog, as
this will usually contain error messages indicating
what's been going on. To find out the exact location
of the RepServer errorlog, issue the command
admin log_name
, which will display
the full pathname of the errorlog file on the system
where the RepServer is running.
Unfortunately, those error messages aren't always
very clear, so you should try to figure out what they
mean -- see the section
Following up error messages.
4.
Are there any threads down ?
When investigating a suspected
problem, the first thing to check is whether any
RepServer threads are down by issuing the RepServer
command admin
who_is_down
(this is a brief
version of admin
who ,
which shows all threads irrespective of their status).
When this shows nothing, everything is probably OK,
although there are some nasty cases when there still
is an error. Normally however, a thread which is down
indicates a problem. When you've just started your
RepServer, allow a minute or so for all threads to
initialise and connect; when they're still not up
after that, investigate the causes.
Some common situations are the following:
-
A basic rule:
when a thread is up and running, it will have
a status such as "Active", "Awaiting
Command", "Awaiting Message",
or other variations of "Awaiting
<something>".
-
In a replicate
RepServer, a "DSI" thread always
comes with a corresponding "DSI EXEC"
thread: together they represent the
connection from the RepServer to a replicate
database. There is also one such pair of
threads for the RSSD.
When the DSI/DSI EXEC threads have the status
"Suspended", this can be the result
of a suspend
connection
command (use
resume connection
to proceed).
However, this also occurs when the RepServer
was executing a command in that replicate
database as part of a replicate transaction,
and ran into an dataserver error. A common
example is a "duplicate key" error
when inserting an already existing key value
into a replicate table having a unique index.
You can try resuming the connection (using
resume connection
, but usually
this results in the same error occurring
again, suspending the connection once more.
When this happens, the RepServer errorlog
will always contain the error message
returned by the dataserver.
When you want to see the SQL commands that
were issued as part of the replicate
transaction, dump the first transaction in
the corresponding queue (using
sysadmin log_first_tran, see the RepServer QuickRef Guide for details).
Once you understand why the SQL command leads to an error, you will have to take action to
resolve the situation so that replication can continue. This may include manually modifying
or removing data from the replicate table or deleting the replication transaction from the
inbound queue (using the resume connection...skip transaction
).
When you're running into replicate dataserver errors, you should have a good look at the
design of your replication system as these errors suggest things are not going as was
originally expected.
-
When you're
running RepServer 11.5 or later, a thread
called "REP AGENT" represents the
ASE RepAgent (or LTM) for the corresponding
primary database. When running RepServer 11.0,
these threads are called "LTM USER"
instead.
When the status of this thread is "Down",
the RepAgent/LTM is not running. For an LTM,
you will have to start the LTM process; to
start an ASE RepAgent thread, run
sp_start_rep_agen
from within
the primary database. If the RepAgent or LTM
won't start, check the dataserver errorlog or
LTM errorlog, respectively; there will be
some kind of error message logged indicating
why it cannot start.
When the status of this thread is "Suspended",
this is because the command
suspend log transfer
was run;
issue resume
log transfer
to resume the
thread.
-
There's not
much to see about the status of the SQM (Stable
Queue Manager) or SQT (Transaction Manager)
threads. You can get detailed information
about what's happening to the queues using
admin who, sqm
or
admin who, sqt
(but let's
leave it for now...).
-
When using
routes, an "RSI" thread represents
the connection going from the your local
RepServer to another RepServer. When the
status of such a thread is "Connecting",
this means the RepServer at the other end of
the route is down, and your RepServer is
periodically trying to reconnect.
A thread called "RSI USER" is the
incoming end of a route: a connection coming
from another RepServer to your local
RepServer. When the status is "Inactive",
it means that the remote RepServer isn't
running or suspended the route.
5.
Is the RSSD log full ?
The RSSD is continuously updated by the RepServer and
as a consequence, the transaction log may fill up.
When this happens, the RepServer action updating the
RSSD will by default go into "log suspend"
mode until the log is cleared. This will cause
RepServer activity to suspend as well. Therefore,
stop the RSSD log from filling up by dumping the log
regularly and by installing thresholds.
Note that the both RSSD log and data segment can fill
up quickly when running
sysadmin log_first_tran
or
sysadmin dump_queue
frequently or for
large transactions. Clean up old transactions logged
through these commands using the stored procedure
rs_delexception
from within the RSSD
database.
6.
Are the stable devices full ?
It's always good to check whether
the stable devices have filled up; it's never too
early too know. Use the RepServer command
admin disk_space
to see how far the
stable devices have filled up.
7.
Stop and restart threads, RepAgents and RepServers
The idea behind the RepServer
architecture is that any component (RepAgent,
RepServer, dataserver, network) can temporarily go
down, and everything automatically continues running
when the component comes back up. Usually this works
quite well, but sometimes it just doesn't. When you
have a situation where you cannot restart a certain
RepServer thread, try shutting down and restarting
the RepAgent/LTM, log transfer, DSI connection or
route in a systematic way. Also, try shutting down
the threads, then shut down/restart the RepServer
itself, and restart the threads. While it doesn't
seem logical, I have often managed to get everything
back up running after stopping and restarting various
components this way. It's worth trying, and it
doesn't cost too much time or effort.
Note that this advice does not apply to the
dataservers: surely you just don't want to shut these
down, because (unlike the RepServer) there will be
users connected to them !
Following up error messages
When something is wrong, there is often
an error message of some kind. Here are some suggestions
to follow up error messages.
Check
error messages in the $SYBASE/$SYBASE_REP/doc directory
The directory $SYBASE/$SYBASE_REP/doc
(on NT:
%SYBASE%\%SYBASE_REP%\doc
) contains some files
named error_messages_*
(on NT:
errmsg.*)
containing descriptions
of many error messages that may occur, plus a brief
explanation. When you run into any kind of error, these
files should be your first stop: check if the error
message is listed (easiest by searching for the error
number - I always use grep
for this),
and see what it means. In many cases, this will be very
helpful to get an idea of what is going on.
Note that this only applies to error messages from the
RepServer, LTM, RSM Server and rs_subcmp. Error messages
from the ASE RepAgent are logged in the dataserver
errorlog.
Check
the RepServer Troubleshooting Guide
If you're stuck with an error message
which you cannot find the meaning of, try the RepServer
Troubleshooting Guide. As the name suggests, it contains
suggestions for solving problems.
Call
Sybase TechSupport
If nothing else works, you should
consider opening a case with Sybase TechSupport, because
you may easily be spending days looking for clues - which
you may not find in the end (you do have a support
contract, don't you ?).
Problems
with rs_init
rs_init is a tool for configuring components of a
replication system, such as databases, repservers, etc.
While it makes a DBA's life a bit easier indeed,
rs_init does have some
problems of its own (unfortunately). This section lists a
few common issues.
-
rs_init
is located in the install
subdirectory of
the RepServer installation tree ($SYBASE/$SYBASE_REP). Make sure this
is in your PATH.
- A common
problem is that you're trying to add a database
and it appears that rs_init
cannot access the dataserver because that server
is not in the interfaces file. Adding the server
to the interfaces file is the obvious solution,
but then you'll find that it still makes no
difference for rs_init. The
reason is that rs_init reads the
contents of the interfaces file when it starts;
when making changes to the interfaces file, rs_init
should be stopped and restarted for rs_init
to notice the changes. Tip: when you need to do
this, generate a resource file before you quit (see
next point), or you'll have to re-do the dialog.
- On Unix, you
can generate a resource file from rs_init
by hitting CTRL-W (on NT, resource files cannot
be generated or used with rs_init).
This saves all the information you've entered in
the preceding dialog into the resource file,
which can be executed from the command line as
rs_init -r
<your_resource_file_name>
.
One problem is that the file format of rs_init
resource files isn't documented very well. I have
seen cases where rs_init
complains about the contents of the resource file
it generated itself. If you run into this sort of
problem, there's no documentation that will help,
but you may be able to fix it by manually editing
the resouce file; this may require some
creativity on your side, though.
In one such case I remember, the error message
said that the IDserver name was not specified (it
said something like the IDservername being '' (=two
single quotes)). This could be fixed by adding a
line rs.rs_idserver_name:
RSNAME , where
RSNAME was the IDserver's name (funny enough, the
resource file did contain a line saying
rs.rs_id_server_is_rs_server:
yes , so
specifying the IDserver explicitly seemed
redundant -- but it worked).
If you're having any problems with invalid
resource file contents, check the example
resource files which exist in the
init/rs
subdirectory in the RepServer
installation tree ($SYBASE/$SYBASE_REP). Try to figure out
whatinformation rs_init wants in
that resource file, and how you can supply it.
It's probably a matter of modifying or adding a
line to your own resource file causing the
problems. You're on your own though, as there is
no documentation here (only some comments in the
example files): in the example above, my resource
file was almost identical to the
setupdb.rs
example file, but the line I
eventually added was copied from the
install.rs
file... that's where your
creativity comes in.
- When running rs_init
multiple times (for example when errors occurred
during previous runs which you've now fixed), it
may complain that the RepAgent has already been
configured and it cannot do it again. The error
message is:
Unable to execute query
'exec sp_config_rep_agent "YOUR_DBNAME", enable,
"YOUR_RSNAME", "YOUR_RSNAME_ra",
"YOUR_RSNAME_ra_ps"' against server 'YOUR_DS_NAME'
When this error
occurs, disable the RepAgent first as follows:
sp_config_rep_agent YOUR_DBNAME,
disable . Then re-run rs_init.
The following
error may also be reported (it is related to the
above message, and should disappear after disabling
the RepAgent):
INTERNAL ERROR: "Database 'YOUR_DBNAME' is already
configured to use Replication Agent. Request to
enable Replication Agent has been ignored."
- rs_init
may generate the following messages, which,
despite the message saying it's an error, are
really informational messages, and these can
therefore be ignored. This often happens when re-running
rs_init after fixing a previous error in the
setup:
INTERNAL ERROR: "The replication status for
'rs_lastcommit' is already set to false. Replication
status for 'rs_lastcommit' does not change."
INTERNAL ERROR: "The replication status for
'rs_update_lastcommit' is already set to true.
Replication status for 'rs_update_lastcommit' does
not change."
- When adding a
database which was previously dropped from the
replication system, the following error may occur:
INTERNAL ERROR: "Settrunc() is not permitted when a
Rep Agent Thread is currently running on the database
This error means
that the RepAgent is still running in that database.
Stop the RepAgent as follows:
sp_stop_rep_agent YOUR_DBNAME
. Then re-run rs_init.
- When setting
up a warm standby through rs_init,
two common errors may occur:
- Before
adding the active database for the warm
standby, you must manually create a logical
connection for the warm standby (using the
command create
logical connection SOME_SERVER.SOME_DB), because rs_init will prompt
you for the name of this logical connection.
You can not create this logical
connection through rs_init.
It is actually in the small print in the
docs, but many people (including myself)
found out the hard way (that's the RepServer
way !).
- When
adding the standby database using rs_init,
this will normally fail because rs_init cannot log in to the
dataserver containing the standby database.
This is because the maintenance user login
which rs_init tries to use,
is not there yet; unlike the maintenance user
login for the active database, which is
created automatically by rs_init,
the maintenance user login for the standby
database isn't. Therefore, you must create
this login manually and then re-run rs_init.
More
Troubleshooting Tips
Message loss
When RepServer detects that messages
may have been lost, it goes into "message loss"
mode. When this happens, transactions coming from the
primary dataserver won't be applied at the replicate
dataserver anymore, and they'll accumulate in the stable
queues. However, from the admin
who
output, all threads seem to be up and running, so
everything looks OK at first sight. The only way to find
out about this status is to check the RepServer errorlog,
and check for loss detection messages. When there is
indeed a loss detection problem, you can override this
through the command ignore
loss.
I have once had a pretty weird problem case where message
loss had been detected between two RepServers, but
ignore loss
didn't help at all. This
was very strange, and in the end the trick appeared to be
as follows: in the RSSD of the replicate RepServer, the
rs_exceptslast
table contained a row for
the route where the status
column had the value 2 (meaning
it's rejecting transactions due to message loss). The
solution was to suspend the RSSD DSI, updated this column
to 0 manually, and then resume the DSI. After that,
everything was OK again. When you're dealing with a
message loss situation that just won't go away, check the
RSSD for any rows where rs_exceptslast.status
= 2 or
rs_oqid.valid
= 2; when this exists,
you could try updating these to 0 manually. However, note
that manually patching the RSSD is risky and probably
won't win you the sympathy of Sybase TechSupport
afterwards; in other words: this is your own risk.
Transaction
cache size too small
I've had a problem case once where everything looked OK,
all threads were up, and there were no error messages
logged anywhere. Still, no replicate transactions were
being applied anymore. After digging into the queues for
some time, an error message finally showed up when
examining the SQT for the inbound queue:
1> sysadmin sqt_dump_queue , 123, 1, 0
2> go
Msg 6024, Level 12, State 0:
Server 'REP_PRIM1':
Block read failed for queue '123:1', segment 304552,
block 26. OS dependent error is 'Could not initiate
aioread errno = 14
This error sounded pretty alarming, as it suggests there
is some low-level disk error. After consulting Sybase
TechSupport, it appeared that error 6024 actually
indicates that there is a transaction which is too big
for RepServer to handle, causing the transaction not to
be written from the inbound to the outbound queue (you
wouldn't have guessed, would you ?). The solution was to
increase the RepServer configuration parameter sqt_max_cache_size
(through configure
replication server
and restart the RepServer).
Various
Technical Tips
Viewing
current RepServer configuration settings
RepServer configuration parameters can be modified using
the "configure replication server" command (some
changes will be effective only after a RepServer restart).
Config parameter settings are stored in the RSSD in the
rs_config table. However, it is not possible to see the
current settings from within RepServer. For this, you
should log into the RSSD ASE server, and issue the
following command from within the RSSD database:
select * from rs_config
Note
there is even a brief comment for most config options.
RepServer *is* user-friendly here and there !
Use
an LTM or a RepAgent ?
In an ASE environment, when you're running at least
RepServer 11.5 and ASE 11.5, you can use the internal ASE
RepAgent. When you have an earlier version of either ASE
or RepServer, you must use an external LTM process.
Although LTMs are still supported for current ASE and
RepServer versions, it is advisable to use RepAgents when
you can, because it makes a DBA's life easier:
-
an LTM is an extra
OS-level process requiring an entry in the
interfaces file; a RepAgent doesn't need this
because it's an internal ASE thread.
-
an LTM needs to be
started explicitly by the DBA; once it has been
configured, a RepAgent starts automatically when
ASE starts.
-
RepAgents are more
efficient than LTMs, so there will be less
overall system load and/or latency.
-
an LTM can be
greedy: when there's a large transaction to be
replicated, LTMs have been known to allocate as
much memory as needed for this transaction, but
never deallocate it again, which may require the
DBA to stop & restart the LTM. ASE RepAgents
don't have this problem.
Configuring
a RepAgent requires (among other things) executing the
stored procedure sp_config_rep_agent. See the RepServer
QuickRef Guide
for details.
DBA tips
Don't
remove the secondary truncation point !
When your replication system is up and
running, that's how you want to keep it. When something
goes wrong, leading to replication being halted somewhere
along the chain of events, this can eventually lead to
the transaction log of the primary dataserver filling up.
In such cases, a DBA could decide to remove the secondary
truncation point in the transaction log (a.k.a. the
"LTM truncation point"), to allow the primary
dataserver to continue processing. While this may indeed
be a solution from one point of view, be aware that this
will likely cause your replication system to get out of
sync because primary transactions won't be replicated
anymore. As a DBA, you will then have to re-synch the
primary and replicated data; depending on the complexity
of your replication system, this may be anywhere between
a minor headache and a complete nightmare. Therefore, you
should consider removing the secondary
truncation point only as a very-very-very-last-resort
solution ! It may be more practical to
allocate extra disk space for stable queues so that
primary log transfer can continue than to reconcile
primary and replicate data afterwards.
Have
some disk space ready for emergencies
See the previous point. One good feature of RepServer is
that you can not only extend the stable queues (using the
add partition
command), but you can
also shrink them again through
drop partition
-- this command can be
issued at any time, and will drop the partition as soon
as there's no data on it anymore. Therefore, keep some
disk space at hand which you can use to extend the stable
queues should this ever be necessary.
How
long can your system survive without a DBA ?
There's an interesting (and important)
issue with respect to the sizing of your stable queues.
Basically, your primary system should be able to continue
for at least the time period that no DBA will be looking
after the system -- and preferable a little longer than
that. Depending on your system and organisation, such a
no-DBA period can be a night, a weekend or just a few
hours. Getting this sizing right will require monitoring
the data volume being replicated.
If you like being paged, you could a cron job checking
the free space in the stable queues and paging you when
it fills up; very much like the sort of thing that tells
you when a dataserver's transaction log fills up (pagers
are most popular in America - Europeans can do something
similar by having an SMS message sent to a GSM phone).
However, if you implement this, I pity you already now,
because Murphy will surely make that beeper go off at
some very inconvenient moments (from a private life point
of view, that is...).
Don't
use the RepServer "sa" login
When creating or dropping a subscription, the replicate RepServer
will log into the primary RepServer and -when using automatic materialization- into the primary
dataserver. It will do
this using the same login and password as you used to log
into the replicate RepServer (where the subscription is
created from). This implies that the same login and
password must exist in the replicate and primary
RepServer and the primary dataserver.
Therefore, when using the RepServer "sa" login,
this means the "sa" password in the dataserver
cannot simply be changed anymore as it would be out of
sync with the RepServer's "sa" password,
causing subscription creation to fail. When there are
multiple primary dataservers in the replication system,
they would all need to have the same password for their
"sa" login. Therefore, best don't use the
RepServer "sa" login to avoid this sort of
dependency.
I prefer to create a login named "ra" (kinda
short for "rep admin") in all RepServers and
dataservers with the same passwords everywhere, and give
these logins all necessary privileges. This way, I never
have to use the RepServer "sa" login anymore,
avoiding the problems described above.
Handle
your RSSD with care
The RSSD, which is an ASE database, is the place where
RepServer stores all the data it needs to be able to
operate (not counting its configuration file), including
administrative data such as the state and position of the
stable queues. Otherwise, the RepServer itself doesn't
store any data at all, with the exception of the
configuration file, specifying where the RSSD is located.
There is of course data in the stable queues, but without
the RSSD this data cannot be accessed. This makes the
RSSD an important database in your replication system, so
ensure the RSSD is treated with the same care as your
normal production databases. All considerations about
recoverability, database dumps, log dumps and DBCC checks
apply to the RSSD as well. For the same reason, don't
enable the "trunc log on chkpt" option on the
RSSD as this affects recoverability.
Keep the RepServer errorlog
Make sure the RepServer errorlog is not
deleted, but kept for at least a few months. In some
cases, an indication of an error condition may be logged
just once, and never again after that anymore. When
there's been a problem you can't resolve, you may want to
dig into your errorlog for quite some time back -- it's a
bit of a bummer when you've deleted it. This is indeed
not very desirable behavior, but it's just one of those
typical RepServer features...
Protect
your RepServer config file
The RepServer configuration file contains the login and
password used to contact the RSSD server and ID server.
This makes these files interesting targets for other
people of a lesser moral standard, so make sure it is
well-protected. In case you use LTMs, the same applies to
LTM config files.
Because the RSSD contains the logins and passwords for
the primary and maintenance logins, make sure this
database (as well as its dumps !) are properly protected.
When security is important to you, use the encryption
options and security mechanisms where possible.
(pre-12.0) Install
RepServer and ASE in separate directory trees
Note: this point applies only to pre-12.0 versions of ASE and RepServer.
Starting with versions 12.0, the directory structure has changed so that all
Sybase products reside in their own subdirectory under $SYBASE (NT: %SYBASE%).
Thus, in this new stucture, there is no danger anymore of libraries being
overwritten by installation or upgrade of a different product.
In pre-12.0, by default ASE and RepServer are installed in the same
installation directory; however, it's best to install them in
separate directory trees. This has the advantage that you
can upgrade ASE without risking problems with RepServer (or
the other way around) -- these problems may occur because
there's only a single lib
subdirectory, and the existing libraries (DLLs) in this directory
may be overwritten when a different product is installed or upgraded.
When installing ASE and RepServer in separate directories, you'll have to
set the $SYBASE variable correctly in each RUN_SERVER
file. Also, you should explicitly specify the path to the
interfaces file the servers will use (assuming you don't
want to maintain multiple copies).
This recommendation mainly applies to Unix because
separating RepServer and ASE isn't so easy on NT. Reinoud
van Leeuwen describes an
interesting trick
to install the products
separately on NT as well (and still run the servers as NT
services), but I found that it is impossible to upgrade
RepServer after this (or at least I couldn't find out how
to do it). Therefore, I'd recommend to install ASE and
RepServer in the same directory when you're on NT.
Create
a log of problem cases you've handled
Being indispensable can be good for your ego and your
hourly rate, but you may have seconds thoughts when they
call you in the weekend or during your holidays because
you're the only person who knows what to do when the
replication system is having trouble. I recommend
creating a log of the problem cases you solved, including
the relevant error messages and actions you took to
resolve them. Someone else may then be able to fix the
problem using this information. Don't forget to include
the phone number of Sybase TechSupport and your Sybase
customer number.
And don't worry about your job security: when someone
else has to do some ad-hoc RepServer troubleshooting
while you're away, chances aren't very high they'll say
"cool, I'd really like to this every day...".
Keep
your replication system simple
A good thing of RepServer is that it's very flexible and
powerful. Unfortunately, there is a dark side to this as
well: you can create replication systems which are so
complex that it may not be possible to recover the entire
system in a reasonable amount of time, once something
goes wrong somewhere. Therefore, if you have anything to
say about the design of your replication system, try to
keep it simple and straightforward. If it's not
necessary, don't use bi-directional replication, function
strings, request functions, and don't mix "normal"
replication with function replication; this will go a
long way to preserve your sanity...
Use 'sqsh' instead of multiple open sessions
As a RepServer DBA, you're usually having various isql (or sqsh) sessions on your screen to connect to each RepServer, the corresponding RSSDs, and the various primary and replicate dataservers. With a sqsh trick, you can do all that in just a single sqsh session.
Replicating
logins, passwords, users & groups
It is not possible to replicate data in system tables,
but sometimes you may need exactly that. For example, in
a warm standby system, it will be necessary to have
identical logins and passwords in the ASE server hosting
the primary (active) database and in the server
containing the replicate (standby) database. After all,
the whole idea of a warm standby is that client
applications should be able to switch over to the standby
database and continue processing as if nothing changed.
Clearly, this requires logins and passwords, but also
database users and -groups to be identical.
When running RepServer 11.5 or later, almost all DDL
commands within a user database can be automatically
replicated to the standby database in a warm standby
system using the sp_reptostandby
command.
While useful, the sp_reptostandby
feature has some
limitations: it won't work for logins and passwords (as
these are stored in the master database), and it only
works in a warm standby environment.
Mark A. Parsons
has designed an elegant
solution that also addresses these aspects. From here,
you can
download his presentation
(note: 330 Kb) for the
Australasian TechWave2000 conference explaining the
setup, and including script files to actually implement
it.
Even if you don't need to replicate logins, passwords etc.,
I still recommend you check out Mark's solution, because
it is a great example of how to use RepServer features to
implement some functionality that crosses the boundaries
of a single server.
Note:
for completeness, note that ASE 12.0 supports the
synchronisation (that's something different than
replication !) of system table data through the HA (High
Availability) feature. However, this is a completely
different topic which isn't discuss any further here.
Warm standby tips
Setting up
a warm standby
When setting up a warm standby, keep these things in mind:
-
Best keep the
database names of active and standby databases
the same. This lowers the risk of errors in
client- or server-level software which may be
referring to the database name explicitly --
remember, the purpose of a warm standby is,
presumably, to be able to switch the active
database at some point: your applications should
still work correctly after that switch.
Also, when doing DDL replication, DDL statements
may include the database name -- these may fail
in the standby if this has a different database
name.
-
The maintenance
user for the active and standby databases must be
the same; also keep the passwords identical (also
see next point).
-
You must ensure
that logins and passwords are synchronised in the
ASE servers containing the active and standby
databases. This is to ensure that client
applications can still log in after a switchover.
RepServer doesn't give you solution for this, so
you'll have to make something yourself. There are
roughly two options for this: first, following
Mark A. Parsons'
ideas
described
elsewhere in this document; second, by amending
sp_addlogin, sp_modifylogin, sp_droplogin and
sp_password in such a way that they execute
themselves as an RPC in the standby database
after successfully completing in the active
database -- this requires some hacking in the SQL
text of these procedure.
-
Also, the mapping
between suid
values (for
logins) and
uid
values (for
database users) should be kept identical in both
servers. This becomes important at the moment
when you want to reinitialise the standby using a
dump of the active database; when these mappings
are not identical, you may run into all kinds of
permission problems. Note that this is always an
issue when loading dumps into other servers, as
such it is not a replication-specific point.
The consequence of this issue is that you should
not create logins in the standby server when
these logins have not been created in the active
server first.
-
Be sure to keep
replication settings identical in the active and
standby databases. A classic mistake is to mark a
specific table for replication in the active
database, but forget to do this in the standby.
There will probably not be a problem until you
have to switch to the standby, when you (or
likely, someone else) will find out that table is
not properly replicating to the new standby...
Therefore, every time when you execute
sp_setreptable
or
sp_setrepproc
in the
active database, do the same in the standby.
Similarly, when using
sp_reptostandby
, make
sure the standby database has been configured in
the same way.
-
Try to keep things
simple: stick to a 1-to-1 active-standby setup
and simply replicate all tables. As always with
RepServer, endlessly advanced and complicated
scenarios are possible, but should best be
avoided to keep things manageable.
Setting up a warm standby using rs_init
It's probably easiest to use rs_init for setting up the active and standby databases in the warm standby. However, there are some things to be aware of. Go here for more information.
Basic steps
for setting up a warm standby
The simplest way of setting up a warm
standby is to use the T-SQL stored procedure
sp_reptostandby
-- this requires
you run at least ASE 11.5 and RepServer 11.5. When
executing sp_reptostandby
my_db, 'all'
, all tables in database
my_db
will be replicated to the standby, without having been marked for replication
explicitly using sp_setreptable
(this works because a "kind of replicate everything in this
database" marker is set for the database, making it
irrelevant to mark individual tables for replication).
When using sp_reptostandby,
DDL replication is also
enabled.
Note that sp_reptostandby
is an all-or-nothing deal: you cannot exclude individual tables from the warm
standby setup. On session level, you can choose to enable
or disable DDL replication using the T-SQL command
set replication
-- see the RepServer QuickRef for command details.
When you need more flexibility in your warm standby
definition, don't use
sp_reptostandby, but use
sp_setreptable and sp_setrepproc
to mark all tables and stored procedures which are to be replicated to the standby.
Initialising the standby with a dump
To (re)initialise the standby database, the simplest way
is to load a dump from the active database into the
standby, after (re)creating the connection to the standby
database.
Note that the order of these two steps is essential: you
must first run create connection
with the use dump marker clause (either manually or through
rs_init), and then dump the
active database and load it into the standby. If you do
it the other way around (first dump the active DB, and
then run create connection
), there will be no replication to the standby, and
admin logical_status
will forever show the status Active/Waiting for dump marker.
More precisely: after executing create connection
with the use dump marker
clause, wait until admin logical_status
shows the status of the standby connection as Suspended/Waiting for Enable Marker
and "State of operation in progress" is
None; as long as the latter says Attempting
to place marker in the Active Log, do not dump the active database yet !
When you've done things the wrong way around, simply drop
the connection to the standby that you just created, and
then do the steps in the right order.
RepAgent
configuration problems
When initialising the standby database
by loading a dump from of the active database, you should
be aware of the following issue in ASE 11.5 and 11.9.
The dump of the active database contains the config
settings for the RepAgent for the active database. When
loading this dump into the standby database, the RepAgent
config settings are loaded as well, causing these to be
wrong, because the "connect_dataserver" and
"connect_database" settings are still those
that were made for the active database. You'll hit this
problem after you've done the switchover and you start
the RepAgent for the new active (former standby) database:
the RepAgent will shutdown again, but no error message is
displayed on the client; the RepServer error log will
contain error 14082, and the ASE error log will mention
this error as well.
When you have this problem, run
sp_config_rep_agent
to set the right
values for "connect_dataserver" and "connect_database"
and restart the RepAgent.
Note: this problem seems to have been fixed in ASE 12.0.
Switching
over to the standby
When running a warm standby system,
there will (sooner or later) probably be a moment when
you'll have to switch the active database to the standby
(this is why you have a warm standby in the first place).
This consists of the following actions:
1. Switch databases at
RepServer level using the
switch active
command.
2. Start the RepAgent for
the new active database (using
sp_start_rep_agent
) -- this does not
happen automatically as part of the
switch active
command.
3. Switch client
applications to the ASE server and database which has now
become the active database. This may not be as
straightforward as it sounds, as this typically requires
client applications to connect to a different ASE server
than before the switch, which means that some special
action should be taken.
Here are some possible approaches for tackling this issue:
-
the most elegant
solution is to solve this on network (DNS) level,
independent of ASE and RepServer. Suppose you
have two Unix systems, one named "batman",
hosting the primary dataserver and one named
"robin" for the standby server. Now
make sure the client applications have interfaces
files where the dataserver they connect to is
specified on to run on a host named "gotham".
However, there is no system named "gotham";
instead, the network is configured such that the
name "gotham" resolves to "batman"
(this can normally be done on DNS level on your
LAN). When switching to the standby, reconfigure
the DNS so that "gotham" now resolves
to "robin". This way, the client
applications do not need to do anything at all
when switching to the standby (except reconnect).
Obviously, the port numbers of the dataservers on
both systems should be identical for this to work.
-
the application
user explicitly specifies that a different
dataserver is now to be used; however this is not
desirable, because you would like to hide these
issues from the end-user.
-
the interfaces
file used by the client applications is changed
so that they will connect to a different ASE
server while using the same servername on client
level. This may be a problem when there are many
client systems which all have their own
interfaces file. You'll have to figure out a way
of getting this done.
A solution which I've once seen for this is worth
mentioning, particularly because of the dangers
that come with it: you could add a second "query"
line to the interfaces file, so that the first is
for the active server, and the second for the
standby. The idea is this: as long as the active
server is up, all applications will connect to
it, as it is the first "query" entry.
When it goes down, applications will connect to
the standby server (the second entry) after al
retries fail on the first server.
The danger of this approach is that when the
first server is running fine but for any reason
the connection attempts fail, the application
will move on to the second query entry and
happily connect to the standby database, and
possibly execute DML statements there; obviously,
this may lead to a lot of trouble in your
database, so think *very* carefully before you'd
use this trick...
-
all applications
could have a bit of logic built in which retrieve
the server name of the ASE server they should
connect to. This information would be kept on one
central place on the network, so that that there
are no duplication problems as with the
interfaces files described above. Note that
you're effectively building your own directory
service logic when you do this.
RepServer
versions and platforms
ASE
versions vs. RepServer versions
Unlike the version numbering suggests, there is no strict
relationship between ASE and RepServer versions.
Basically, all ASE versions later than 4.9 will work
with any version of RepServer. However, certain features
may only be available for specific combinations of ASE
and RepServer; for example, you need both ASE 11.5 and
RepServer 11.5 to use ASE RepAgents.
Any version of ASE will do for the RSSD.
Differences
between RepServer versions 11.0, 11.5 and 12.0
Taking RepServer 11.0 as a starting
point (i.e. forgetting about earlier RepServer versions,
which are not supported anymore), this gives you all the
main replication features as well as function replication
and warm standby.
The main new features in RepServer 11.5 are the internal
ASE RepAgent, DDL replication for warm standby,
publications and improved administration functionality.
RepServer 12.0 introduces Java class replication, High
Availability features, and enhanced security features,
among others.
Supported
platforms
RepServer runs on the main Unix platforms and on NT.
Check with Sybase TechSupport for the latest status as
products are ported to other platforms regularly. Support
for Linux is on its way (see next point)
RepServer for Linux
The first version of RepServer supported on Linux is
RepServer 12.0 (released Q1, 2000). While available as a
regular product, there seems to be hardly any information
about it on Sybase's website. In any case, the product
number for RepServer-on-Linux is 18685; if you call
Sybase Sales with this number they should be able to give
you full details. Note that, unlike ASE-for-Linux, there
is no "free" version available for download.
Also, it appears that the standard license for RepServer-on-Linux
contractually only allows you to replicate between Linux-based
ASE servers (see the next paragraph for an interesting
anecdote). For replicating to other platforms than Linux,
further licensing seems to be required (note that this is
stuff that I've heard from others; actually, I'm staying
away from all this contractual and sales-related stuff as
much as I can...).
Someone told me the following story: he'd tried to buy
one RS-for-Linux license for his personal training,
assuming this wouldn't be too expensive. According to
Sybase Sales, the base license cost $700. However, they
wouldn't sell this to him, unless he also purchased two
full ASE-for-Linux deployment licenses at $900 each (one
license for each ASE server participating the replication
system -- Sybase Sales assumes one of those holds the
RSSD I think). And then finally, the price of the
RepServer license had to be multiplied by N, where N is
the number of ASE servers you're using with RepServer. So
the overall pricetag was 2*$700 + 2*$900 = $3200 -- not
exactly the kind of price he'd had in mind (for the
record: he decided not to buy).
For primary and/or replicate dataservers, ASE 11.9.2 is
available for Linux; note that ASE 12.0 will not be
available for Linux, but ASE 12.5 allegedly will be (release
date somewhere in 2001).
Supported
RDBMS's
RepServer allows you to replicate between different types
of database, for example from Oracle to DB2. Data can be
replicated from ASE, ASA (formerly known as SQL Anywhere),
Oracle, DB2, Informix, MVS mainframe data sources, and --expected
to be available somewhere in Q1, 2000-- MS SQLServer.
In addition, you can in principle build your own
replication agent for other data sources - but don't
expect this to be easy. When you plan to do this, it's
worth asking around first. For example, it seems there's
someone at Sybase South Africa who once built a RepAgent
for Adabas.
Data can be replicated to all these databases as well as
to just about anything else.
Note that, no matter the types of databases involved,
every RepServer always has an RSSD (Replication Server
System Database) which must be located in an ASE server (as of RS 12.6, the RSSD can also be located in an 'embedded' ASA server).
Miscellaneous
History
RepServer was developed by Sybase in the beginning of the
90's on request of, and initially funded by a Sybase
customer (actually a US-based bank), that wanted a
product for data replication. Sybase itself has continued
development of the product since then.
RepServer
basics
RepServer is a "data movement" product;
basically, it just copies data modifications from one
location to another. The most important characteristics
are:
-
The unit of
replication is a transaction, not just an
individual DML statement.
-
RepServer
guarantees that replicated transactions are
applied in the same order as they occurred on the
primary side.
-
In principle, when
individual components (dataservers, RepServer,
network connections) of a RepServer-based system
are temporarily unavailable, this should be
transparent to the overall replication system.
When a component becomes available again,
replication should continue with minimal or no
further manual DBA action. Most of the time, this
works quite well.
How
do you get familiar with RepServer ?
I'd say there is no easy, painless way
to learn how to use RepServer. As a start, make sure you
know everything a normal DBA should know about the RDBMS
platforms involved in your replication system. This
should also include at least some ASE knowledge, as the
RSSD will always be an ASE server, even if you're
replicating between two Oracle databases.
When you're replicating between ASE databases, you must
understand how the ASE transaction log works, how the log
is dumped and truncated, how to enlarge databases and
logs, how your choice of disk usage affects recovery
issues... in short: the works.
Next, I recommend you take a RepServer training from
Sybase ("Fast Track to Replication Server").
After that course, you'll probably be back in your office
wondering how you should ever manage to understand this
thing -- don't worry, it is possible !
In my experience, the only way to learn RepServer is to
create -- and solve -- many problems, because only then
will you get a feel for the product. The best way to gain
this experience is to set up your own private little
replication system from scratch: try to create a setup
where you replicate just a simple 1-column table from one
database to another. By the time you have succeeded in
getting this to work, you will have learned a lot -- even
though you already did this on the Sybase training. Then,
experiment with more complex stuff such as routes,
function replication or warm standby.
Also, make sure you have the RepServer documentation
handy: go
here
for details where to get
this from.
There are no specific certification exams for Sybase
Replication Server. For replication in an ASE
environment, it seems reasonable to expect a RepServer
DBA to have an ASE DBA certification.
Why
is RepServer so difficult ?
I've heard many different opinions on
RepServer, ranging from "wonderful" to, as I
once heard someone say, "the worst thing since
Pompeii". While the latter may be overstating it a
bit, RepServer is certainly not an easy product to
set up, administrate or troubleshoot. One reason for this is
that RepServer doesn't seem to have been developed with
user-friendliness as a prime objective; it's a typical
example of a very powerful, flexible, yet difficult-to-use
piece of software, which is best appreciated by hard-core
technology diehards.
Another factor is that the scope and complexity of your
DBA work expands significantly when replication becomes
involved: those previously independent dataservers are
now suddenly closely related because data is replicated
between them. This effectively means that they have
become one big system from a DBA point of view, thus
making a DBA's job significantly more difficult.
Cost
of deploying RepServer
It tends to be pretty expensive to use RepServer. On the
one hand, the license fees can be stiff, as they tend to
be based on the number of users in your dataservers. More
importantly though, you'll need plenty of experienced and
skilled DBA staff, both for your dataservers and for
RepServer, which tend to be expensive.
What
kind of organisations use RepServer ?
Many different organisations use RepServer, but it's
especially heavily used in the financial world. Many
banks have implemented complex bi-directional replication
systems, for example to link their security trading
operations at stock exchanges around the world. This is
the reason why many RepServer DBAs can be found in places
like New York, London and Tokyo.
What
about replication for ASA ?
Replication for Adaptive Server
Anywhere (formerly known as SQL Anywhere) is done through
SQL Remote. This is a different product which is not
related to, and works completely different than,
RepServer.
RepServer
documentation
RepServer
manuals
Some of the existing RepServer manuals are
the "Reference Manual" (descriptions
of all commands), the "Administration Guide"
(extensive discussions of administrator tasks, organised
by topic) and the "Troubleshooting Guide"
(well, for troubleshooting obviously). The "Design
Guide" deals with different possible designs of
replication systems, such as bi-directional replication
and conflict resolution, and changing ownership.
As all Sybase documentation, the RepServer manuals can be
downloaded from the Sybase web site as printable PDF
files. To obtain these files, go
here
, and click on the "Core Documentation Set" for the RS version you're interested in.
You should then see a list of available manuals, both as PDF documents as well as online searchable documents.
The Sybase web site also provides handy tips and tools at CodeXchange (free registration required).
Certification/compatibility info
Information about compatibility between different versions of RepServer and ASE on different platforms can be found here and here.
RepServer System tables poster
You can download a system tables poster for RepServer 12.0 from the Sybase website
as a PDF document.
Go here for more information.
RepServer books
There are very few
books covering technical aspects of RepServer, but here's a few suggestions:
- Another book is
"Sybase Replication Server Primer" by C.B. Clifford,
McGraw-Hill, ISBN 0-07011515-X. This book is only for
those who have read all the existing RepServer manuals
and still want more. It describes architecture and
internals in detail, and is very technical. As such, it
isn't easy reading.
My 1995 edition covers only RepServer version 10 -- I
don't know if there are new editions covering more recent
versions. The book contains a useful appendix with SQL
queries to manually query the RSSD.
RepServer newsgroups
The Sybase news server, forums.sybase.com
, carries some newsgroups
dedicated to RepServer: sybase.public.rep-agent and
sybase.public.rep-server.
There's also a newsgroup about ASA replication (but note:
this is quite a different topic!):
sybase.public.sqlanywhere.replication.
Professional certifications for RepServer
Since April 2003, it is possible to become Certified
Administrator for Replication Server 12.5 by passing a certification exam.
More information is here.
|