#!/bin/sh 
#
# qfilter.sh
# ----------
# This is a Bourne Shell script with the purpose of identifying 
# potentially heavy queries before execution.
# It is designed for the sitation where end users can type ad-hoc SQL queries
# through a WWW-form, which should then be executed. Because the end users
# cannot be expected to have sufficient SQL knowledge, nor to be very
# careful when specifying their query, a filter is needed to trap inefficient,
# heavy queries which might hang up the system before they are executed.
# Using this tool, there is a reasonable chance that such queries can be
# identified, although some may still slip through.
# Running this query filter typically takes 1-2 seconds, depending on system
# capacity and workload. 
#
# This tool will examine the query and its query plan, as well as 
# some optimizer information, and search for indications which
# indicate that the query might consume a lot of system resources.
# When such indications are found, diagnostic messages are printed (see below).
#
# This query filter should be used as follows:
# 
#   1. Create a file (say, "query.sql") containing the query to be analysed.
#
#   2. Pass this file to the tool (assuming it's called 'qfilter.sh') :
#         % qfilter.sh query.sql
#
#   3. Messages will be printed indicating the result of the analysis.
#      Based on these, it should be decided whether to reject the
#      query, or to accept it and execute it through an ISQL session.
#
#
#
# Parameters
# ----------
#
# This tool should be called as follows:
#
#     % qfilter.sh <query-filename> [max.I/Os]
#
# The first parameter is mandatory, as it is the file containing the 
# query to be analysed.
#
# The second parameter is optional. It's the maximum number of logical I/O's
# allowed for this query. A value which isn't too high, like 500 or 1000, 
# should normally be sufficient. This value can be compared with the number
# of logical I/O's as printed when "set statistics io" is enabled.
# The tool uses the estimated number of I/O's as reported by the optimizer, so
# if the optimizer gets it wrong, you're out of luck. Therefore, index 
# statistics should be up-to-date for best chances of reliable results.
# If the second parameter is not specified, the I/O limit will be 
# taken from the variable "QF_MAXIO", if defined. If it isn't, a hard-coded 
# constant is used, which you could change -- check the code below.
#
# You can specify a default database context in which the query will be 
# analysed, by defining the variable "QF_DEFDB". If not, it uses a 
# hard-coded value. This default database is effectively a "use <default-DB>"
# statement that is prepended to the query under consideration.
#
#
# Analysis results
# ----------------
# The following messages may be printed. 'qfilter.sh' is the actual filename
# of the tool.
#
#  Message: "qfilter.sh: Query contains syntax error(s)"
#  Meaning: The query contains syntax errors. This is not related 
#  to query performance issues, but could be used to tell notify the end user
#  of a mistake. 
#
#  Message: "qfilter.sh: Query contains SELECT...INTO construct"
#  Meaning: The query contains a SELECT...INTO. This could be undesirable
#  because end users could create tables (in tempdb for example), thus
#  consuming space which could lead to problems later. Such queries
#  could be rejected.
#
#  Message: "qfilter.sh: Query causes table reformatting"
#  Meaning: The query requires an index which is not available. 
#  This message could be tracked to identify the need for additional indexes.
#
#  Message: "qfilter.sh: Query uses index forcing"
#  Meaning: The query contains a forced index and possibly forced I/O size
#  or cache replacement strategy. Tuning decisions like should not be allowe
#  to end users, but solved in a more structural way. Queries generating
#  this message should be rejected.
#
#  Message: "qfilter.sh: Query generates Cartesian Product(s)"
#  Meaning: The query does not contain enough join clauses. The end user 
#  could be notified of this omission.
#  Queries generating this message should be rejected.
#
#  Message: "qfilter.sh: Query is likely to exceed max. I/O limit (xx vs. yy)"
#  Meaning: The estimated number of logical I/O's, as reported by the 
#  optimizer exceeds the limit specified (see section on "QF_MAXIO" usage).
#  (xx=estimated I/O's by the optimizer; yy=I/O limit specified).
#  Queries generating this message should be rejected.
#
#  Message: "qfilter.sh: Ready"
#  Meaning: This message will always be generated when the tool completes. 
#  In case no other messages were produced, this means no obvious flaws could
#  be spotted, and the query could be executed.
#
#
# Setup requirements
# ------------------
# There should be a server login having 'sa_role' which will be used for 
# query analysis. The login name and password should be hard-coded below 
# (search for the string 'SAUSER').
# 
#
# Notes
# -----
# Note that the query analysis is done through a login having SA-role. This
# may be a different login than that which executes accepted queries, and
# therefore may have a different default database. By defining "QF_DEFDB",
# the right database context can be ensured. 
#
# Also note that any permission problems will go unnoticed. During query 
# analysis, queries are executed with "set noexec on", causing permission
# issues not to be evaluated.
#
# Note that, because "sa_role" is required for certain statements,
# certain problems with access rights that will come out as error messages,
# will go unnoticed.
#
# This version was successfully tested on Solaris & Digital Unix. It may also
# run on other platforms which have Bourne Shell available. 
#
# This version was successfully tested with SQL Server versions 10.x and 11.0.x.
# It does not currently work with version 11.5, because the DBCC 311 output 
# is different from 11.0. 
#
#-----------------------------------------------------------------------------
#
# Copyright note & Disclaimer :
# =============================
# This software is provided "as is" -- no warranty.
# You can use this software free of charge for your own professional, 
# non-commercial purposes. 
# You are not allowed to sell this software or use it for any commercial 
# purpose. You may (re)distribute only unaltered copies of this software, which
# must include this copyright notice.
#
# Please send any comments, bugs, suggestions etc. to the below email address.
#
# (c) 1998 Copyright Rob Verschoor
#                    Sypron B.V.
#                    P.O.Box 10695
#                    2501 HR Den Haag
#                    The Netherlands
#
#                    Email: rob@sypron.nl
#                    WWW  : http://www.sypron.nl
#
#-----------------------------------------------------------------------------
#
# History
# =======
# Version 1.0   Aug.97   First complete version
#
#----------------------------------------------------------------------
#
MYNAME=`basename $0`
#
#
if [ $# -lt 1 ] || [ $# -gt 2 ]
then
  echo "Usage: $MYNAME <query-filename> [max.I/Os]"
  exit 1
fi
#
#----------------------------------------------------------------------
#
# Clean up temp. files
#
CLEANUP_TMPFILES()
{
rm -f $TMPFILE.*
}
#
#-------------------------------------------------------------------------------
#
# Common exit point
#
READY()
{
CLEANUP_TMPFILES
echo "$MYNAME: Ready"
exit 0
}
#
#-------------------------------------------------------------------------------
#
# Platform-specific stuff
#
THISPLATFORM=`uname -s`
#
case "$THISPLATFORM" in
"SunOS")   # Solaris
   THISHOST=`uname -n`
;;
"OSF1")    # DecUnix
   THISHOST=`uname -n`
;;
*)
   echo "Note: This software was not tested on this platform -- there may be errors."
   THISHOST=`uname -n`
;;
esac
#
#----------------------------------------------------------------------
#
# Server login settings
#
SRV=$DSQUERY
#
#
# An SA-role login is required for using traceflags
#
SAUSER=sa                   # login having SA-role
SAPSWD=escondido            # your-SA-password
SAPSWD=dokkum            # your-SA-password
#
#----------------------------------------------------------------------
#
# Set up max number of estimated logical I/O's accepted
#
if [ $# -ge 2 ]
then
MAXIO=$2               # First option: limit specified on command line
else
if [ "$QF_MAXIO" -ne "" ]
then
MAXIO=$QF_MAXIO        # Second option: limit specified in variable
else
MAXIO=100              # If nothing specified, take a hard-coded default here
fi
fi
#
#----------------------------------------------------------------------
#
# Set up default database for query context: a statement "use <default-DB>"
# is prepended to the query under consideration
#
if [ "$QF_DEFDB" -ne "" ]
then
DEFDB=$QF_DEFDB        # First option: default DB specified in variable
else
DEFDB=master           # If nothing specified, take a hard-coded default here
fi
#
#----------------------------------------------------------------------
#
# Set up temp. file definitions
#
TMPFILE=/tmp/$MYNAME.$$
CLEANUP_TMPFILES
#
QIN=$TMPFILE.qin
QOUT=$TMPFILE.qout
#
#----------------------------------------------------------------------
#
# Check specified input file is OK
#
INFILE=$1   # file that holds the query
#
if [ ! -r $INFILE ] || [ ! -f $INFILE ]
then
echo "Input file [$INFILE] is not readable or does not exist"
CLEANUP_TMPFILES
exit 1
fi
#
#----------------------------------------------------------------------
#
# Some checks:
#   - server connection is OK
#   - login has sa_role
#   - pick up server version
#
isql -U$SAUSER -S$SRV -P$SAPSWD << --EOF-- > $TMPFILE.chk
print "Logged in"
select show_role()
if substring(@@version,1,13) = "SQL Server/10" 
   print "version: 10.x"
if substring(@@version,1,13) = "SQL Server/11" 
   print "version: 11.0"
if substring(@@version,1,31) = "Adaptive Server Enterprise/11.5" 
   print "version: 11.5"
go
--EOF--
#
if [ `grep -c "Logged in" $TMPFILE.chk` = 0 ]
then
echo "Cannot log in to server $SRV; Server down ? Wrong password ?"
CLEANUP_TMPFILES
exit 1
fi
#
#
if [ `grep -c sa_role $TMPFILE.chk` = 0 ]
then
echo "This tool requires 'sa_role'; please see your Database Administrator"
CLEANUP_TMPFILES
exit 1
fi
#
#
if [ `grep -c "version:" $TMPFILE.chk` = 0 ]
then
echo "Cannot determine server version."
CLEANUP_TMPFILES
exit 1
fi
#
#
SRVVERSION=`grep "version:" $TMPFILE.chk | awk '{print $2}'`
if [ "$SRVVERSION" = "11.5" ]
then
echo "This tool does not yet work with ASE 11.5. Sorry..."
CLEANUP_TMPFILES
exit 1
#
# Explanation:
# In 11.5, DBCC 311 prints only useless rubbish. SHOWPLAN prints I/O estimates,
# although these are I/O *cost* estimates, but this only happens when
# 'allow resource limits' enabled. For the time being, 11.5 will do without
# us...
#
fi
echo "Server version $SRVVERSION"
#
#----------------------------------------------------------------------
#
# Prepare statements to be placed before actual query
# Note that for using "DBCC traceon", you must have "sa_role".
# This may obfuscate certain problems with access rights that would
# otherwise have come out as error messages.
#
cat << --EOF-- > $TMPFILE.hdr
use $DEFDB
go
dbcc traceon(3604,
302) --DBCC 302
310) --DBCC 310
311) --DBCC 311
go
set showplan on
go
set noexec on
go
--EOF--
#
#----------------------------------------------------------------------
#
# Prepare trailer file that switches DBCC off again ...
#
cat << --EOF-- > $TMPFILE.trl
go
set noexec off
go
dbcc traceoff (3604, 302, 310, 311)
go
--EOF--
#
#----------------------------------------------------------------------
#
# Create final query files for analysis
# First a General one as a starter...
#
cat $TMPFILE.hdr $INFILE $TMPFILE.trl > $TMPFILE.q
#
#
# Create the 302 version
#
egrep -v '(--DBCC 310$|--DBCC 311$|set showplan on)' $TMPFILE.q > $QIN.302
#
#
# Create the 310 version
#
egrep -v '(--DBCC 302$|--DBCC 311$|set showplan on)' $TMPFILE.q > $QIN.310
#
#
# Create the 311 version
#
egrep -v '(--DBCC 302$|--DBCC 310$|set showplan on)' $TMPFILE.q > $QIN.311
#
#
# Create the showplan version
#
egrep -v '(--DBCC 3|dbcc traceon|dbcc traceoff)' $TMPFILE.q > $QIN.plan
#
#
# Remove the DBCC & showplan part for the syntax-check version
#
grep -v "set showplan on" $QIN.plan > $QIN.err
#
#----------------------------------------------------------------------
#
# Run the queries through the optimizer & collect output
#
isql -U$SAUSER -S$SRV -P$SAPSWD < $QIN.err > $QOUT.err
#
#----------------------------------------------------------------------
#
# First check for syntax errors in the query 
#
if [ `cat $QOUT.err | wc -l` -gt 0 ]
then
echo "$MYNAME: Query contains syntax error(s)"
cat $QOUT.err
READY
fi
#
#----------------------------------------------------------------------
#
isql -U$SAUSER -S$SRV -P$SAPSWD < $QIN.plan > $QOUT.plan
isql -U$SAUSER -S$SRV -P$SAPSWD < $QIN.302 > $QOUT.302 
isql -U$SAUSER -S$SRV -P$SAPSWD < $QIN.310 > $QOUT.310 
#
#
# 132-col is necessary because isql will break up lines if they get > 80
# While we don't use the physical reads estimate at this moment, we 
# might at some point ..
#
isql -w132 -U$SAUSER -S$SRV -P$SAPSWD < $QIN.311 > $QOUT.311 
#
# 
# Make sure DBCC flags are turned off again.
# This one is executed just in case the above failed.
#
#isql -U$SAUSER -S$SRV -P$SAPSWD < $TMPFILE.trl > /dev/null
#
#----------------------------------------------------------------------
#
# Analyse the query plan for specific issues
#
#
# Check for SELECT...INTO : Note this check isn't 100%
#  - SELECT_INTO is generated by S.10, but not always
#  - "The type of query is CREATE TABLE" comes from S.11. Not sure if always..
#
if [ `egrep -c '(The type of query is CREATE TABLE|SELECT_INTO)' $QOUT.plan` -gt 0 ]
then
echo "$MYNAME: Query contains SELECT...INTO construct"
fi
#
#
# Check for reformatting
#
if [ `grep -c "created for REFORMATTING" $QOUT.plan` -gt 0 ]
then
echo "$MYNAME: Query causes table reformatting"
fi
#
#----------------------------------------------------------------------
# 
# Check for forced indexes. Note that this includes possible forcing of 
# I/O size and cache strategy, as these can only be specified when
# an index is forced.
#
if [ `grep -c "User forces index" $QOUT.302` -gt 0 ]
then
echo "$MYNAME: Query uses index forcing"
fi
#
#----------------------------------------------------------------------
#
# Check for Cartesian Product in 310 output
# If there is a CP, there is a message saying "QUERY IS NOT CONNECTED"	
# If there is no CP, there is a message saying "QUERY IS CONNECTED"	
#
# Strictly speaking, a CP wouldn't have to be a problem if the 
# tables involved were small. However, in the context of this tool,
# it seems reasonable to force the user to specify enough join clauses.
#
# NB: There can be various messages about the query being connected or not. 
# As DBCC 310 prints only a new plan when this is cheaper than the previous 
# one, we should be looking for the last "query is (not) connected" message,
# as this will correspond to the final plan. Plans that were considered before
# (and that did not make it) may report that the query is not connected, 
# and we don't want to react to those messages.
#
#
grep "QUERY IS" $QOUT.310 | tail -1 > $QOUT.310.connect
#
if [ `grep -c "QUERY IS NOT CONNECTED" $QOUT.310.connect` -gt 0 ]
then
echo "$MYNAME: Query generates Cartesian Product(s)"
fi
#
#----------------------------------------------------------------------
#
# Get logical I/O estimates from DBCC 311
#
if [ `grep -c "scan count" $QOUT.311` -gt 0 ]
then
#
#cat $QOUT.311
#
grep "scan count" $QOUT.311 | awk '{print $8}' |\
     sed -e 's/,//g' > $QOUT.311.list
#
#
# Add 'm up ...
#
TOTIO=`awk '{totio += $1} END {print totio}' $QOUT.311.list`
#
#
# ... and check whether it's more than we like.
#
#echo "Total I/O = $TOTIO"
#
if [ $TOTIO -gt $MAXIO ]
then
echo "$MYNAME: Query is likely to exceed max. I/O limit ($TOTIO vs. $MAXIO)"
fi
#
fi
#
#----------------------------------------------------------------------
#
READY
#
#=== end ==============================================================
